Designing for Trust in Voice Interfaces

The Constraint

Voice interfaces are a fundamentally different product category from screen-based software. There is no visual layer to absorb friction: no button to A/B test, no layout to iterate on, no color to guide attention. The interface is the interaction, entirely.

Three questions kept coming back. What does this feel like to someone who never chose to become a "user"? How do we know whether it is working, judged not by our metrics but by what the person actually needed? What makes people trust a device they can't see making decisions?

Trust Is Not Additive

In most product categories, trust accumulates. A person uses your product, it works, they come back. Voice breaks this model.

Trust in voice is built slowly and lost instantly, and the loss is asymmetric: a single failure can outweigh a long run of successes.

A device that mishears a medication reminder, calls the wrong person, or misunderstands a request in a high-stakes moment doesn't just cause frustration. It resets the entire trust relationship.

This creates a product challenge that most standard PM frameworks aren't built for. The strategy has to account for asymmetry from the beginning, which means deciding carefully which interactions the device should attempt and which it should gracefully decline.

The Measurement Problem

Engagement metrics in voice products are unreliable in a particular way: they count interactions without capturing intent. A person who invokes a skill ten times and then stops is counted, in most systems, as a successful ten-interaction user. The abandonment is invisible.

The framework I settled on: measure the gap between what the person tried to do and what they actually got. That gap, not the engagement number, is where product quality lives.

What This Clarified

Voice forced honesty. Every feature had to justify itself within the conversation. That discipline of assuming the interaction alone must explain the product, with no visual design and no marketing copy to lean on, is what I now apply to every product I work on. The constraint turned out to be the most useful one I have worked under.