Synthetic market research

From the first appearance of LLMs, the psychology and behaviour community has asked: "Can I use this tech to replace human subjects?"

It's now taken off in market research. There are lots of companies doing this, and I suspect that’s partly because it's not that hard in principle. The intuition is that LLMs, by hoovering up all the internet, have implicitly picked up a lot of how humans behave.

In essence, this works by prompting an LLM with as much individual-level information as you can, and then asking it a related question. Effectively, you're pushing the LLM into a state to act like a specific person, and then letting it auto-complete your questionnaire.
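As a rough sketch of that idea, here is what the prompt assembly might look like in Python. The field names, the prompt wording and the `build_prompt` helper are all my own illustrative assumptions, not any particular company's method. It only builds the text; sending it to a model is left out.

```python
def build_prompt(profile: dict[str, str], question: str, options: list[str]) -> str:
    """Assemble a persona-style prompt from individual-level data (hypothetical format)."""
    # One bullet per known fact about the person.
    facts = "\n".join(f"- {key}: {value}" for key, value in profile.items())
    # Number the answer options so the model can reply with a single digit.
    choices = "\n".join(f"{i}. {option}" for i, option in enumerate(options, start=1))
    return (
        "You are answering a survey as the person described below.\n"
        f"What we know about them:\n{facts}\n\n"
        f"Question: {question}\n{choices}\n"
        "Answer with the number of the option this person would choose:"
    )


# Made-up respondent, for illustration only.
profile = {"age": "34", "household": "two adults, one child", "weekly shop": "mostly online"}
prompt = build_prompt(profile, "How often do you buy crisps?", ["Weekly", "Monthly", "Rarely"])
print(prompt)
```

The more you can put in `profile`, the more the model has to go on, which is why the data matters so much.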

The hard part is getting the data to power your synthetic research subjects. I've been working on a project with a company that has the data, to see if this does indeed hold up. That's for another day.

* * *

What has triggered all this is a classic paper: Generative Agent Simulations of 1,000 People (2024). This is from a Stanford/DeepMind group who found that LLMs can replicate human survey responses with roughly 70% to 85% accuracy, depending on how you measure it.

To do that they used a lot of data: two-hour interviews with humans, transcribed for use in an LLM prompt. To get the accuracy, they asked the original humans and the LLM to answer other surveys, and looked to see if the humans and LLMs agreed.

They found that humans agree with themselves around 81% of the time, if you ask them the same thing two weeks apart. An LLM will agree with the human about 68% of the time.
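One way to put those two numbers on the same scale is to divide the LLM's agreement by the human's own test-retest agreement. Here is a minimal Python sketch of that arithmetic. The `agreement` and `normalised_accuracy` helpers and the toy answers are my own assumptions for illustration; the paper's exact scoring may differ in detail.

```python
def agreement(a: list[str], b: list[str]) -> float:
    """Fraction of questions on which two sets of answers match exactly."""
    if len(a) != len(b) or not a:
        raise ValueError("answer lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def normalised_accuracy(llm_vs_human: float, human_vs_self: float) -> float:
    """LLM agreement as a share of the human's own test-retest agreement."""
    return llm_vs_human / human_vs_self


# Toy data (made up): one person's answers, their retest, and the LLM's answers.
human_first = ["agree", "disagree", "agree", "neutral", "agree"]
human_retest = ["agree", "disagree", "neutral", "neutral", "agree"]
llm_answers = ["agree", "agree", "neutral", "neutral", "agree"]

self_consistency = agreement(human_first, human_retest)
llm_accuracy = agreement(human_first, llm_answers)
print(self_consistency, llm_accuracy, normalised_accuracy(llm_accuracy, self_consistency))

# The rounded figures quoted above: 68% against a human ceiling of 81%.
print(round(normalised_accuracy(0.68, 0.81), 2))
```

With the rounded figures, 0.68 / 0.81 comes out at about 0.84, which is roughly the top of the 70% to 85% range. The point is that the human retest score is the realistic ceiling, not 100%.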

The more information you give, the better the result: they also ran the experiment giving the LLM just a persona (a short narrative description of the person), but that didn't work anything like as well as the transcript.

[Figure: Accuracy of the LLMs and human participants (re-drawn from the paper)]

The potential is to survey your (synthetic) customers much faster. The trick is knowing if it works, and that's what I'm looking at in a project.

* * *

Related to all that, I was at Digital FMCG 2026 (Fast Moving Consumer Goods). Friendly bunch, and fun to see people walking around with KitKat notepads, or Yorkshire Tea folders, with titles like Head of Connected Pack Experience for Cheez-It and Pringles, or something like that. "Are there any Pringles fans in the audience?" Crowd goes wild, knowingly.

Everyone I spoke to was lovely and helpful in giving me clues. I've almost zero idea about any of this. My newbie impression is: wow, it's complicated. I was surprised by the level of deals done to get information from A to B. You put that product in your basket, but then checked out with another? The manufacturers could well get that fact back to inform their plans. 

Buzzing around my mind was: do they know what problem they want to solve with all this? I don't know.

I did sit in on a "connected pack experience" session, to find out what those words mean. And it's about putting QR codes in the lids of your products to take customers to a game or competition.

Imagine every Pringles can as a portal to an ever-refreshing digital world: Just a quick scan of a QR code on the pack transports you into a vibrant world brimming with exclusive content, games, and competitions.

Can you imagine that? As I heard more about the experiences and data possibilities I wanted to scream: why are you bothering? And then the penny dropped when they said (if I recall) that engagement on the site is 1.5 minutes, which is huge compared to a TV ad.

I'm still learning.