We've learned to design with AI. We haven't learned to design for AI.
That's not a semantic difference. It's the whole problem.
The wrong questions
Most of what gets called "AI design" is standard product design applied to a new output type. The AI generates something. The designer figures out how to display it. Card layout, loading states, error handling. Real questions. Not the new questions.
The new questions are upstream:
When should the AI speak? Not every moment of user uncertainty is an opportunity for intervention. The signal-to-noise calculation is fundamentally different from traditional features.
What is the AI's authority? Is this a suggestion, an answer, or a directive? Users calibrate trust differently for each. The interface needs to communicate that distinction: not through disclaimers, through design.
What happens when it's wrong? Not as an error state. As a designed expectation. AI systems are wrong sometimes. That's not a bug; it's a property. The interface needs to be built for it.
Who is accountable for the outcome? When an AI recommends something, the user acts on it, and the outcome is bad: where does blame land? How you design that attribution affects trust, retention, and long-term usage patterns.
Most teams aren't asking these. They're asking about card layout.
The research problem
Traditional UX research assumes you can observe what users do and ask them why. AI breaks this in two specific ways.
First: the AI's output is probabilistic. What you test in research isn't what appears in production. The model changes. Context changes. Prior interaction history changes. You can test the structure of the experience. You can't test the content.
Second: users don't have language for their AI experiences yet. They know when they trust something and when they don't. They can't always explain why. Post-rationalisation is worse with AI than in any other feature category I've worked in.
The research question changes. You're not testing usability; you're building a theory of calibration: what conditions make trust form, and what conditions break it. That requires different questions, a longer observation horizon, and tolerance for ambiguous data.
Most teams haven't updated their practice. Same usability studies. Different feature.
The spec problem
How do you write a design spec for a system whose output you can't fully predict?
Traditional handoff is deterministic. Here's the component. Here are the states. Engineers implement. QA tests against the spec.
AI breaks that. You can spec the container, the fallback states, the edge cases. You cannot spec the content of an AI-generated response.
The shift is from specifying outputs to specifying constraints. What should the AI never say? What's the maximum response length before it becomes unusable? What confidence threshold triggers a result? What categories of response require human confirmation before action?
These are design decisions. Most designers aren't making them; they're leaving them implicit, delegating to the model team, or discovering them in post-launch bug reports.
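To make the shift concrete, here's a minimal sketch of what a constraint-style spec could look like. It's hypothetical TypeScript; the field names, categories, and thresholds are illustrative assumptions, not a real schema or a prescription.

```typescript
// Hypothetical constraint spec for an AI-generated response surface.
// Every field here is a design decision, not a model-team default.
interface ResponseConstraints {
  // Content the AI must never produce, regardless of confidence.
  forbiddenCategories: string[];
  // Beyond this length the response stops being usable in the UI.
  maxResponseChars: number;
  // Below this confidence, show nothing rather than something shaky.
  minConfidenceToDisplay: number;
  // Response types that require explicit user confirmation before any action runs.
  requiresHumanConfirmation: string[];
}

// Illustrative values for a recommendations feature.
const recommendationConstraints: ResponseConstraints = {
  forbiddenCategories: ["medical-advice", "legal-advice", "speculative-pricing"],
  maxResponseChars: 480,
  minConfidenceToDisplay: 0.7,
  requiresHumanConfirmation: ["bulk-edit", "send-on-behalf", "purchase"],
};
```

The point isn't the shape of the object. It's that each value has an owner, a rationale, and a place in the handoff, instead of surfacing for the first time in a bug report.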
The trust ratchet
Trust in AI features is not symmetric. It's far easier to destroy than build.
One high-confidence wrong answer. One recommendation that costs someone time or money or embarrassment. The feature becomes invisible. Users don't give it a second chance; they just stop using it.
This means the design disposition needs to be more conservative than most tech culture allows. Move fast, ship and learn, A/B test everything: those instincts are right for most features. For AI features they're dangerous. The cost of a failed trust moment isn't a bounce rate you can optimise. It's a permanent behaviour change.
Slower launch cycles, more conservative confidence thresholds, explicit recovery patterns. These aren't signs of a timid team. They're signs of a team that understands the asymmetry.
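One way to make the asymmetry operational is to decide, up front, what the interface does at each confidence level. The sketch below is hypothetical TypeScript; the thresholds and presentation modes are illustrative assumptions, not a recommended policy.

```typescript
// Hypothetical gate between model output and the interface.
type Presentation =
  | { kind: "assertive"; text: string }
  | { kind: "hedged"; text: string; caveat: string }
  | { kind: "suppressed" };

function presentSuggestion(text: string, confidence: number): Presentation {
  if (confidence >= 0.9) {
    return { kind: "assertive", text };
  }
  if (confidence >= 0.7) {
    // Recovery pattern: the interface admits uncertainty instead of bluffing.
    return { kind: "hedged", text, caveat: "This might be off. Double-check before acting." };
  }
  // Below the floor, silence costs less than a broken trust moment.
  return { kind: "suppressed" };
}
```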
The conversation we're not having
Not more frameworks. We have enough.
What we need: honest accounts of what broke. Cases where the AI was wrong and the interface failed to catch it. Features that worked in research and fell apart in production. Design decisions that moved trust in the wrong direction.
Until that conversation happens, we'll keep building AI features that work technically and fail experientially. Features users try once, decide they don't trust, and never open again.