On connected users, network interference, and why experiments fail when people affect one another
A/B testing is one of the most trusted tools in product management. That trust comes from a simple promise: show two groups different versions, isolate the difference, and let behavior tell you what works.
The method is powerful. It is also built on a strong assumption that often goes unstated: users are independent of one another. One person's treatment is not supposed to change another person's behavior. Statisticians call this the stable unit treatment value assumption, or SUTVA. On many products, it is shaky. On social products, marketplaces, creator platforms, collaboration tools, and networked systems, it is frequently false.
That is where a lot of experimentation confidence becomes overconfidence. If users affect each other, then the test is not only measuring a variant. It is measuring a changing social environment.
Imagine testing a recommendation system on a professional network. If one group sees more weak-tie recommendations, the effect does not end with the users who got the treatment. The new connections they make can alter opportunity, information flow, and engagement for other people in the network too. The unit of analysis may look individual. The mechanism is not.
The same thing happens in messaging products when one group gets a sharing feature before another. It happens in marketplaces when one side of the market gets better ranking logic. It happens in social feeds when creators in one condition change what audiences in another condition end up seeing and responding to. Even something that looks isolated, like a new badge or social proof treatment, can spill over once people compare notes or react to the changed behavior of others.
Once interference exists, clean causal claims get harder, because the control group no longer shows what would have happened without the treatment. A statistically significant result may still be telling you the wrong story about why behavior moved.
The danger is not just technical purity. It is product misdiagnosis. If a team believes a variant worked because of button copy when the real effect came from network spillover, they will scale the wrong lesson. If they think a social feature failed because users did not like it, when in reality it never reached enough density to create the intended network effect, they may kill a promising idea too early.
LinkedIn's own experimentation research has been explicit about this problem, and Iavor Bojinov, Guillaume Saint-Jacques, and Martin Tingley made the same point in Harvard Business Review: connected customers do not behave like independent lab subjects. On social products, the behavior of one user can change the treatment context of another. Sociology would describe this less as noise and more as the normal condition of social life.
That is why experimentation often looks most decisive precisely where the underlying model is too simple. The spreadsheet is clean. The social world is not.
The answer is not to stop testing. It is to test with the right unit in mind. Sometimes that means randomizing at the level of groups, networks, teams, regions, or communities rather than individuals. Sometimes it means explicitly measuring spillover rather than pretending it is contamination. Sometimes it means running smaller social experiments alongside qualitative work so you can understand the mechanism rather than only the lift.
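The difference between those two units of randomization can be made concrete with a toy simulation. In the sketch below, every parameter is an illustrative assumption, not a measurement: each user's outcome gets a direct lift from their own treatment plus a spillover proportional to the share of treated peers in their group. Under individual randomization, both arms soak up similar spillover, so the naive comparison recovers only the direct effect. Under group-level (cluster) randomization, treated groups experience the full network effect, and the comparison recovers the effect of launching to everyone.

```python
import random

def estimate_effect(cluster, n_groups=200, group_size=10,
                    direct=1.0, spill=0.5, seed=0):
    """Toy spillover model (illustrative parameters, not real data):
    outcome = direct lift if treated
            + spillover proportional to the share of treated peers in the group.
    Returns the naive treated-minus-control difference in means."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_groups):
        if cluster:
            # One coin flip per group: the whole group shares a condition.
            flags = [rng.random() < 0.5] * group_size
        else:
            # One coin flip per user: conditions mix inside each group.
            flags = [rng.random() < 0.5 for _ in range(group_size)]
        n_on = sum(flags)
        for is_treated in flags:
            peer_share = (n_on - is_treated) / (group_size - 1)
            outcome = direct * is_treated + spill * peer_share
            (treated if is_treated else control).append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)

# In this model, the effect of launching to everyone is direct + spill = 1.5.
print(estimate_effect(cluster=False))  # near 1.0: spillover cancels across arms
print(estimate_effect(cluster=True))   # 1.5: the cluster design captures it
```

The sketch also shows why a social feature can look like a failure under individual randomization: the measured lift understates what the feature would do at full density, which is exactly the misdiagnosis described above.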
It also means being honest about when experimentation cannot carry the whole burden of judgment. Norm shifts, community interventions, trust features, and social discovery systems may need longer horizons and more interpretive methods than a typical product dashboard likes to allow.
The core question I would ask is simple: if one user's experience can change another user's behavior, why are we still acting like the test is happening in isolation? Once the product is social, the experiment has to become more social too.
The practical shift is to stop treating experimentation as automatically neutral. Every method sees some things clearly and misses others by design. A/B testing is great at measuring isolated response. It is much weaker when the phenomenon itself is relational.
That is why this essay belongs next to The Product Funnel Starts Too Late. Both are really about the same problem: product teams often inherit tools built for individuals and then apply them to systems shaped by interaction. The result is not just imperfect measurement. It is a narrower theory of the product than the product actually deserves.