Knowing when not to respond is as important as knowing how to respond. The product question isn't just "can the device do this?" — it's "should it, in this moment, for this person?"
Voice interfaces are a fundamentally different product category. There is no visual layer to absorb friction. No button to A/B test, no layout to iterate, no color that can guide attention. The interface is the interaction — entirely.
The three questions I kept returning to: What does this feel like to someone who didn't choose to become a "user"? How do we know if this is working — not by our metrics, but by what the person actually needed? What makes people trust a device they can't see making decisions?
In most product categories, trust accumulates. A person uses your product, it works, they come back. Voice breaks this model.
A device that mishears a medication reminder, calls the wrong person, or misunderstands a request in a high-stakes moment doesn't just cause frustration. It resets the entire trust relationship.
This creates a product challenge most standard PM frameworks aren't built for. Trust builds slowly across many successful interactions and can be lost in a single failure. The strategy has to account for that asymmetry from the beginning, which means thinking carefully about which interactions the device should attempt and which it should gracefully decline.
Engagement metrics in voice products are unreliable in a particular way: they count interactions without understanding intent. A person who invokes a skill ten times and stops is counted, in most systems, as a successful ten-interaction user. The abandonment is invisible.
The framework I kept returning to: measure the delta between what the person tried to do and what they actually got. That gap — not the engagement number — is where product quality lives.
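To make that gap concrete, here is a minimal sketch of how it could be computed next to a plain engagement count. Everything in it is an assumption for illustration: the `Interaction` record, the `intended` and `delivered` labels, and the sample log are hypothetical, and in practice the intent label would have to come from research or follow-up signals rather than from the device itself.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    user_id: str
    intended: str   # what the person was trying to do (hypothetical label)
    delivered: str  # what the device actually did (hypothetical label)


def engagement_count(log):
    """The usual metric: every interaction counts, whatever its outcome."""
    return len(log)


def intent_gap(log):
    """Fraction of interactions where the outcome differed from the intent."""
    if not log:
        return 0.0
    misses = sum(1 for i in log if i.intended != i.delivered)
    return misses / len(log)


# Hypothetical log: ten invocations by one person, six of which missed.
log = (
    [Interaction("u1", "set_reminder", "set_reminder") for _ in range(4)]
    + [Interaction("u1", "call_daughter", "call_neighbor") for _ in range(6)]
)

print(engagement_count(log))  # 10
print(intent_gap(log))        # 0.6
```

By the engagement count this is a healthy ten-interaction user. By the gap it is someone who was let down six times out of ten, which is a more plausible explanation for why they stopped.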
Voice forced honesty. Every feature had to justify itself in the conversation. That discipline, assuming you have only the interaction to explain yourself, with no visual design and no marketing copy, is what I now apply to every product I work on. The constraint became the most useful thing I worked under.