Clustering Content for Context-Driven Advertising

CONTEXT

New times, new challenges

In response to shifting industry trends, Globo transformed from a traditional television company into a media tech company, harnessing the advertising potential of major Latin American media properties such as GloboPlay (Streaming), G1 (News), and GE (Sports). In my role on the big data team, I supported this strategy by building data solutions that helped it reach its objectives.

OBJECTIVE

A new way to sell advertising

Segmentation and sponsorship are Globo's traditional advertising approaches. This project aimed to rethink how advertisers connect with Globo's digital properties by enabling theme-based ad delivery, so that advertisers can align their messages with thematic interests for more impactful campaigns.


Segment

The segmentation advertising model targets specific audience segments with tailored campaigns, maximizing relevance and effectiveness.


Sponsorship

Advertisers associate their brand with events or programs, such as football championships or reality shows, for increased exposure.


Themes

This innovative model proposes selling placements based on thematic interests, such as music or gastronomy, aligning ads with relevant content for increased effectiveness.

MY WORK

But... Where is the Design?

Everywhere. Each specific problem calls for its own design decisions and approaches. Let me outline a few:

1

Guide the team

It's easy to lose sight of our objective over multiple iterations.

2

Analyzing the results

Using data analysis skills to assess whether the results align with business expectations.

3

Reporting to stakeholders

Sharing results and managing stakeholders' expectations at each interaction, a crucial step since no model is perfect.

Defining the approach

There are multiple ways to tackle the theme-finding problem. For example, we can treat it as a categorization problem with a supervised model, or cluster the content by similarity with an unsupervised model. Each approach comes with its own set of algorithms and methods. In the end, we opted for clustering: since we did not know in advance how much content each theme would have, the unsupervised approach let themes emerge naturally from the data instead of being defined by our prior assumptions.
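The case study does not say which clustering algorithm was used. As an illustration only, here is a minimal sketch assuming TF-IDF features and spherical k-means (k-means with cosine similarity), written in plain Python with naive whitespace tokenization. The function names `tfidf`, `normalize`, `cosine`, and `kmeans`, and the toy documents, are hypothetical.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf(docs):
    """Return one unit-length sparse TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: rare terms weigh more; the +1 keeps common terms non-zero.
        weights = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        vectors.append(normalize(weights))
    return vectors


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, max_iter=20):
    """Spherical k-means: assign by cosine similarity, re-center on normalized means."""
    # Farthest-first initialization (deterministic): start from the first document,
    # then repeatedly add the document least similar to all chosen centroids.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                summed = Counter()
                for v in members:
                    summed.update(v)
                centroids[j] = normalize(summed)
    return labels


# Toy documents (not G1 data): three about music, three about food.
docs = [
    "show tickets concert music stage",
    "concert music festival tickets",
    "music album singer concert",
    "recipe restaurant food chef",
    "chef food dessert recipe",
    "restaurant menu food wine",
]
print(kmeans(tfidf(docs), k=2))  # [0, 0, 0, 1, 1, 1]
```

Unlike the supervised alternative, nothing here tells the model what "music" or "food" means; the groups come only from shared vocabulary, which is the exploratory trade-off described above. A real pipeline would also need language-aware tokenization, stopword removal, and a way to choose k.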

DISCARDED

Supervised Modeling

'Classification' Method

Requires manual intervention

More control over results

vs
SELECTED

Unsupervised Modeling

'Clustering' Method

Exploratory Nature

Less control over results

RESULT

Clustering thousands of pieces of content

A full year of G1 editorial production


Of 25 clusters,

3 were of high value for advertisers

International Politics
Economy
Gastronomy
Elections
Public Management
Daily News
Homicide
Roads & Streets
Infrastructure
War & Terrorism
City Traffic
Accidents
Job & Exams
Concerts & Arts
War on Drugs
Industry
College Exam
Politics
Natural Disasters
Job Market
Violence
Public Health System
Music
Culture
Education

Let's take a deep dive into those three clusters

Concerts & Arts

4% of documents
Pageviews in 2018
Top 10 Terms Vs Frequency

The graph above helps us assess not only content production but also the available advertising impressions. Even if a cluster contains few pieces of content, its audience volume must also be taken into account when estimating advertising impressions.
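To make the production-versus-audience point concrete, the sketch below compares each cluster's share of documents with its share of pageviews. It assumes one (cluster, pageviews) record per piece of content; the function name `cluster_reach` and the numbers are hypothetical toy values, not Globo data.

```python
from collections import defaultdict


def cluster_reach(records):
    """Summarize each cluster's share of documents and of pageviews.

    `records` is an iterable of (cluster, pageviews) pairs, one per piece of content.
    """
    docs = defaultdict(int)
    views = defaultdict(int)
    for cluster, pageviews in records:
        docs[cluster] += 1
        views[cluster] += pageviews
    total_docs = sum(docs.values())
    total_views = sum(views.values()) or 1  # avoid dividing by zero with no traffic
    return {
        cluster: {
            "doc_share": docs[cluster] / total_docs,
            "view_share": views[cluster] / total_views,
            "views_per_doc": views[cluster] / docs[cluster],
        }
        for cluster in docs
    }


# Toy numbers: Music has fewer documents than Politics but the same audience.
records = [("Music", 900), ("Music", 100),
           ("Politics", 200), ("Politics", 300), ("Politics", 500)]
for cluster, stats in cluster_reach(records).items():
    print(cluster, stats)
```

A cluster whose view share exceeds its document share draws more audience per piece of content, so it can offer more impressions than its production volume alone suggests.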

As we can see, the 2018 Brazilian elections influenced the production and consumption of content classified in the "Concerts & Arts" cluster. This is due to the 'cultural' nature of the events promoted during the elections, such as campaign rallies and artist performances.

Analyzing the cluster's most frequent terms, we can see that content containing words like 'museum', 'theaters', 'tickets', and 'show' is almost always classified in the 'Concerts & Arts' cluster, indicating strong thematic cohesion.
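The two checks described here, the top terms of a cluster and how often a given term lands in that cluster, can be sketched as follows. This assumes plain whitespace tokenization and no stopword removal (real Portuguese news text would need both handled); `top_terms`, `term_cluster_share`, and the toy data are hypothetical.

```python
from collections import Counter


def top_terms(docs, labels, cluster, n=10):
    """Return the n most frequent terms across documents assigned to `cluster`."""
    counts = Counter()
    for doc, label in zip(docs, labels):
        if label == cluster:
            counts.update(doc.lower().split())
    return counts.most_common(n)


def term_cluster_share(docs, labels, term, cluster):
    """Fraction of documents containing `term` that were assigned to `cluster`."""
    hits = [label for doc, label in zip(docs, labels) if term in doc.lower().split()]
    if not hits:
        return 0.0
    return hits.count(cluster) / len(hits)


# Toy documents with cluster labels already assigned.
docs = ["museum show tickets", "show theaters tickets", "show election rally", "election vote"]
labels = ["Concerts & Arts", "Concerts & Arts", "Elections", "Elections"]
print(top_terms(docs, labels, "Concerts & Arts", n=2))
print(term_cluster_share(docs, labels, "museum", "Concerts & Arts"))
```

A share close to 1.0 for a term like 'museum' is the kind of signal behind the "strong thematic cohesion" reading; a generic word like 'show' spread across clusters gives a lower share.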

Gastronomy

1.2% of documents
Pageviews in 2018
Top 10 Terms Vs Frequency

Interestingly, the Gastronomy cluster shows a correlation with festive dates and the end of the year.

Here we found something interesting: 'Olinda' and 'Recife' are Brazilian cities, and 'Pernambuco' is the state where they are located. So why were they related to gastronomy terms? Through this work, we discovered an error in the editorial guidelines: many articles about food were being published under the editorial sections of these cities. As a result, the model associated these place names with gastronomy.
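One way to check for this kind of editorial mismatch is to cross-tabulate a cluster against the editorial section each document was published under. The case study does not describe how the issue was actually traced; this sketch simply assumes each document has a (hypothetical) section field, and `section_mix` and the toy values are illustrative only.

```python
from collections import Counter


def section_mix(sections, labels, cluster):
    """Share of a cluster's documents that came from each editorial section."""
    counts = Counter(s for s, label in zip(sections, labels) if label == cluster)
    total = sum(counts.values())
    return {section: n / total for section, n in counts.most_common()}


# Hypothetical metadata: the editorial section each document was published under.
sections = ["food", "pernambuco", "pernambuco", "music"]
labels = ["Gastronomy", "Gastronomy", "Gastronomy", "Music"]
print(section_mix(sections, labels, "Gastronomy"))
```

If a regional section supplies a large share of a thematic cluster, place names from that section will also rank among the cluster's top terms, which matches the pattern described above.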

Music

3% of documents
Pageviews in 2018
Top 10 Terms Vs Frequency

An interesting pattern in both the terms and the consumption of music content is their connection to the 2018 World Cup. Why did a sports event end up in a music cluster? To understand this, we need to look at the nature of the content and the angle from which each product portrays the World Cup: GE (Sports News) covers it as a sporting event, while G1 (News) focuses on the main events and especially the festivities off the field, such as the opening ceremony. Therefore, in G1's coverage, the World Cup was a musical event.

Conclusions

1


Apple was the first client for our Music cluster

To place its ads for the second-generation AirPods in a relevant context, Apple used our Music cluster, closing a partnership with Globo.

2


Supervised approach for brand safety

Although not used in this project, our supervised modeling studies were applied to a brand safety initiative aimed at protecting advertisers from undesired associations.

3


A better understanding of Globo's portfolio

This was one of the largest studies of Globo's portfolio to date, giving business units a better understanding of the breadth and thematic scope of its editorial production.

E-MAIL

eduardoclechner@gmail.com

TIME ZONE

GMT-3 (Rio de Janeiro)

CREATED BY

Eduardo Lechner