Data Clean Rooms
What is a Clean Room?
“Clean room” has become a buzzword these past few years, yet I still run into a lot of confusion on the subject. At a basic level, clean rooms enable the privacy-safe co-mingling of 1st and 3rd party data. They achieve this by masking both data sets behind anonymized IDs, and subsequently behind anonymized profiles that cannot be re-identified.
Further complicating the topic, there are broadly two “kinds” of clean rooms.
- Activation clean rooms, built for buy-side audience segmentation, activation & measurement use cases.
- Publisher clean rooms, built for sell-side applications: enabling publishers to expose impression logs to advertisers for measurement.
Well-known publisher clean rooms include Facebook’s Advanced Analytics and Google’s Ads Data Hub. But we’re focusing on activation clean rooms today.
Old Clean Room Architecture
Years ago, while at HBO, I built my first clean room. Its main function was building more advanced audiences for acquisition campaigns, incorporating 3rd party data to fill in the behavioral holes in our 1st party data. It was built with a traditional architecture:
- Run our 1st party data through a 3rd party identity provider (LiveRamp, Experian, Neustar, etc.) to anonymize our data behind a 3rd party ID that could not be re-identified.
- Purchase a 3rd party household data set, then run it through the same 3rd party identity provider to anonymize it behind the same 3rd party IDs.
- Ingest both anonymized data sets into a new environment, in which queries could use the anonymous ID to join between the data sets and garner more comprehensive consumer insights.
- To activate audience segments, push the segments’ 3rd party IDs back to the 3rd party identity provider, which translates them into identifiers the ad platforms recognize and pushes them down to those platforms for activation.
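The four steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any identity provider's actual API: `ToyIdentityProvider`, its `anonymize` and `onboard` methods, the demo key, and the sample records are all hypothetical, and a keyed hash simply stands in for whatever identity graph a real provider maintains.

```python
import hashlib
import hmac


class ToyIdentityProvider:
    """Hypothetical stand-in for a 3rd party identity provider."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._reverse = {}  # anon ID -> email, held only by the provider

    def anonymize(self, email: str) -> str:
        # Normalize first so both data sets map to the same anonymous ID.
        normalized = email.strip().lower()
        digest = hmac.new(self._key, normalized.encode(), hashlib.sha256)
        anon_id = digest.hexdigest()[:16]
        self._reverse[anon_id] = normalized
        return anon_id

    def onboard(self, anon_ids):
        """Step 4: translate anonymous IDs back into addressable identifiers."""
        return sorted(self._reverse[a] for a in anon_ids if a in self._reverse)


def anonymize_dataset(provider, rows):
    """Steps 1 and 2: re-key a data set by the provider's anonymous ID."""
    return {provider.anonymize(email): attrs for email, attrs in rows.items()}


def build_segment(first_party, third_party, predicate):
    """Step 3: join on the anonymous ID inside the clean room, then filter."""
    segment = []
    for anon_id, fp_attrs in first_party.items():
        tp_attrs = third_party.get(anon_id)
        if tp_attrs is not None and predicate({**fp_attrs, **tp_attrs}):
            segment.append(anon_id)
    return segment


provider = ToyIdentityProvider(b"demo-key")  # hypothetical key

first_party = anonymize_dataset(provider, {
    "ana@example.com": {"subscriber": False},
    "bo@example.com": {"subscriber": True},
    "cy@example.com": {"subscriber": False},
})
third_party = anonymize_dataset(provider, {
    "Ana@Example.com ": {"watches_drama": True},
    "cy@example.com": {"watches_drama": False},
    "di@example.com": {"watches_drama": True},
})

# Acquisition audience: non-subscribers whom the 3rd party data flags as drama viewers.
segment = build_segment(
    first_party, third_party,
    lambda profile: not profile["subscriber"] and profile["watches_drama"],
)
activation_list = provider.onboard(segment)
print(activation_list)
```

Even in this toy version, both data sets have to be fully re-keyed before a single query can run, and activation requires a round trip back through the provider. Those are the same pain points that show up at scale.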
This was advanced at the time, but over the years several limitations of this type of clean room architecture became apparent:
- Converting large data sets to anonymous IDs, then co-mingling in a new environment, is clunky and takes time. At best, we could get to a monthly cadence.
- Having to transport, convert and co-mingle 3rd party data sets limited how many data sets we could realistically apply to the clean room.
- Audience activation was clunky (nowhere near real-time) and expensive (3rd party identity providers typically charge a premium for “onboarding” their IDs to activate).
- Privacy – while safeguards were put in place to limit leakage, you’re still physically moving PII around in order to convert it to anonymized IDs. That transport is a liability.
Where Clean Room Architecture is Going
In the past couple of years, a number of companies have released products that make clean rooms easier to implement and easier to integrate into an existing tech stack. One in particular, Snowflake DCR (Data Clean Room), has allowed us to think about clean room architecture dramatically differently.
DCR allows different companies to connect data shares to a single DCR instance and join disparate data sets, either on common IDs shared by both data sets or by incorporating 3rd party identity providers’ apps directly within DCR.
All of this ID stitching is done “underneath” DCR, inaccessible to those running the queries on top.
The simplest way to think about it: instead of having to convert whole data sets and then co-mingle them in a new anonymized ID environment, DCR allows querying across disparate data sets, without needing to transport, co-mingle or anonymize anything.
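To make the contrast concrete, here is a rough Python sketch of the “query across” idea. It does not use Snowflake's API or reflect how DCR is implemented; `ToyCleanRoom`, `attach`, `count_by`, the sample rows, and the minimum group size rule are all hypothetical, chosen only to illustrate that data sets stay where they are, the ID stitching happens inside the room, and the person running the query only ever sees aggregates.

```python
from collections import Counter


class ToyCleanRoom:
    """Hypothetical model of a clean room that queries data sets in place."""

    def __init__(self, min_group_size=2):
        # name -> (rows, key column); rows are referenced, never copied or re-keyed
        self._shares = {}
        # Assumed privacy rule: groups smaller than this are left out of results.
        self._min_group_size = min_group_size

    def attach(self, name, rows, key):
        """Register a reference to one party's rows. Nothing is converted."""
        self._shares[name] = (rows, key)

    def count_by(self, left, right, group_col):
        """Join two shares on their keys and return aggregate counts only."""
        left_rows, left_key = self._shares[left]
        right_rows, right_key = self._shares[right]
        # The ID stitching lives here, "underneath" the query interface.
        right_index = {row[right_key]: row for row in right_rows}
        counts = Counter()
        for row in left_rows:
            match = right_index.get(row[left_key])
            if match is not None:
                merged = {**row, **match}
                counts[merged[group_col]] += 1
        return {g: n for g, n in counts.items() if n >= self._min_group_size}


crm = [
    {"email": "a@example.com", "plan": "free"},
    {"email": "b@example.com", "plan": "free"},
    {"email": "c@example.com", "plan": "paid"},
    {"email": "d@example.com", "plan": "free"},
]
vendor = [
    {"email_addr": "a@example.com", "genre": "drama"},
    {"email_addr": "b@example.com", "genre": "drama"},
    {"email_addr": "c@example.com", "genre": "sports"},
    {"email_addr": "e@example.com", "genre": "drama"},
]

room = ToyCleanRoom(min_group_size=2)
room.attach("crm", crm, key="email")
room.attach("vendor", vendor, key="email_addr")
genre_counts = room.count_by("crm", "vendor", "genre")
print(genre_counts)

# Adding another 3rd party data set is one more attach call, with no re-keying.
auto = [
    {"email": "a@example.com", "owns_car": True},
    {"email": "b@example.com", "owns_car": True},
    {"email": "c@example.com", "owns_car": False},
]
room.attach("auto", auto, key="email")
car_counts = room.count_by("crm", "auto", "owns_car")
print(car_counts)
```

The caller never receives a joined row or a join key, only counts that clear the threshold. A production clean room enforces this with far stronger controls, but the shape of the interaction is the same: attach, query, get aggregates back.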
Flexibility and speed are the two most obvious benefits, but there are a myriad of others I’ve discovered as I re-architect our clean room. I’ll save those for deeper dives on future days.
But one of the most compelling benefits is this: adding new 3rd party data sets to enrich our 1st party knowledge is so much easier when we don’t have to transport anything and can simply query across them.
We suddenly have the opportunity to sync in thousands, if not hundreds of thousands, of new 3rd party data attributes, and making sense of that plethora of newly available data points has us thinking about AI and chatbot applications. (Again, I’ll save the deep dive there for a future day.)
This concept of querying across disparate data sets, instead of co-mingling them behind anonymized IDs, is changing how we think of clean rooms. I suspect it will be responsible for shifting clean rooms from an esoteric concept that only huge companies can dabble in to a common component for small and medium-sized businesses operating their own tech stacks.