Why Marketers Need to Focus on Data Quality, Not Quantity
The epiphany of the 2010s technological era was that people are streams of data. Likes, dislikes, families, friends, hobbies, jobs—everything that makes up a full life is just data awaiting capture, now amenable to all kinds of sophisticated techniques to maximize the likelihood of a desired action. In 2024, it’s rare that a corporation isn’t applying significant effort to dip its spoon steadily deeper into the river of consumer data.
This unquenchable thirst for customer data is driven by the fundamental belief that more data leads to better models, which in turn drive efficiency and revenue. That belief does not hold up. More data does not always lead to better models, and it can actually degrade a model's power and explainability. The advertising industry is suffering from data overload, which makes us less effective and costs us the trust of the customers we market to.
The data snapshot
Even if all external restrictions were lifted and we could gather from all the data sources we wanted, a smart marketer recognizes that we should restrain ourselves for a more fundamental reason: Much of our data is highly correlated, making it nearly useless.
To understand this, imagine you are a photographer standing an arm’s length from a skyscraper. You can’t step back to get the whole building in one picture; instead, you take many pictures from different positions and angles around the building to stitch them together and make a composite photograph of the whole building.
In this example, each picture is a new source of data we're adding to our model, the reconstruction of the full building. As long as each individual snapshot is of a different part of the building, it's easy to fit them together to get a full view. However, with highly correlated data, our pictures overlap, depicting the same part of the building multiple times. Each overlapping shot adds little that we did not already have, and stitching them into an accurate composite becomes much harder.
No matter how many pictures you take, if the information content of each new one is low, your model cannot improve.
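To make the overlapping-photo idea concrete, here is a minimal sketch on synthetic data. It is not drawn from any real campaign. The variable names (`signal_a`, `signal_a_copy`, `signal_b`), the noise levels, and the use of in-sample R² from a plain least-squares fit are all assumptions chosen for illustration. The sketch compares how much a model improves when we add a near-duplicate data source versus a genuinely new one.

```python
import numpy as np

def r_squared(features, target):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    return 1.0 - residuals.var() / target.var()

rng = np.random.default_rng(0)
n = 2000

# Hypothetical customer signals (synthetic, for illustration only).
signal_a = rng.normal(size=n)                          # first data source
signal_a_copy = signal_a + 0.05 * rng.normal(size=n)   # near-duplicate of signal_a
signal_b = rng.normal(size=n)                          # independent data source

# Assumed outcome: depends on both independent signals, plus noise.
outcome = signal_a + signal_b + 0.5 * rng.normal(size=n)

base = r_squared(signal_a, outcome)
with_duplicate = r_squared(np.column_stack([signal_a, signal_a_copy]), outcome)
with_new_info = r_squared(np.column_stack([signal_a, signal_b]), outcome)

gain_corr = with_duplicate - base    # improvement from the overlapping "photo"
gain_indep = with_new_info - base    # improvement from a new angle

print(f"R^2 with signal_a only:         {base:.3f}")
print(f"Gain from correlated source:    {gain_corr:.4f}")
print(f"Gain from independent source:   {gain_indep:.4f}")
```

Under these assumptions, the near-duplicate source should barely move the fit, while the independent source improves it substantially. The second column of data costs just as much to collect and store either way; only one of them adds a new part of the building.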
Think smaller, build smarter
So, if we cannot wait to deepen our dataset, and if gathering all available data can weaken our results, how then do we build accurate, explainable, and ethical models to advertise to our customers better?