§01 The premise
The model is good. It has been for two years. Claude Opus 4.7 reads code, writes code, runs tools, holds context across hours of work without supervision. The bottleneck moved.
What it moved to is operator throughput. How fast can one person direct autonomous software, observe its state, course-correct, approve, intervene? Almost no one is researching this seriously. The dominant pattern across the industry is a chat textbox in a sidebar — the same input rectangle that has been on the web since 1996.
We started Cadence Labs because that pattern is wrong. The interface to AI is the unsolved problem of this decade. The companies that solve it at the operating-system level will own the next platform shift.
§02 The research questions
Four questions guide most of the work. We don't have closed answers to any of them. We have working hypotheses and a product where we test them on ourselves daily.
Operator capacity. How many concurrent agents can one person manage before quality degrades? Internal data so far suggests four to six with proper UI scaffolding, one to two with chat. The variable that moves the number isn't model speed. It's interrupt latency — the time between an operator noticing a problem and the agent actually changing course.
Ambient state. What does "this agent is fine, leave it alone" look like at a glance, when six of them are running in parallel? Logs are wrong: too dense, too unstructured, too easy to miss a signal. Progress bars are wrong: they encode time, not health. Animated state icons with tight motion design encode more bits per pixel than either.
Interruption design. When an agent is mid-tool-call and the operator notices it's heading the wrong way, the cost of interrupting must be lower than the cost of letting it finish. Most products invert this. A well-placed stop command should land in under 200ms and unwind cleanly. We've measured ours; we're not done tuning.
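To pin down what "land in under 200ms and unwind cleanly" means mechanically, here is a minimal TypeScript sketch with made-up names (`CancellableCall`, `runTool`); it is an illustration of the constraint, not our actual API. The constraint it encodes: worst-case interrupt latency is the longest uninterruptible unit of tool work plus the unwind, so the tool loop has to poll the stop signal often.

```ts
import { setTimeout as sleep } from "node:timers/promises";

// Illustrative sketch, not our actual API: a stop command that is
// timestamped when issued, so interrupt latency is measurable.
class CancellableCall {
  private controller = new AbortController();
  private stopIssuedAt: number | null = null;

  get signal(): AbortSignal {
    return this.controller.signal;
  }

  // Operator noticed a problem: record the moment, then abort.
  stop(): void {
    this.stopIssuedAt = performance.now();
    this.controller.abort();
  }

  // Called once the tool has observed the abort and finished unwinding
  // (locks released, partial writes rolled back). Returns latency in ms.
  latencyMs(): number {
    return this.stopIssuedAt === null
      ? 0
      : performance.now() - this.stopIssuedAt;
  }
}

// A long-running tool that polls the signal between units of work.
// Worst-case latency = longest uninterruptible unit + unwind time.
async function runTool(call: CancellableCall): Promise<void> {
  for (let step = 0; step < 1_000; step++) {
    if (call.signal.aborted) {
      // Unwind cleanly here, then report how long the stop took to land.
      console.log(`interrupt latency: ${call.latencyMs().toFixed(1)}ms`);
      return;
    }
    await sleep(10); // one ~10ms unit of simulated tool work
  }
}

const call = new CancellableCall();
const running = runTool(call);
setTimeout(() => call.stop(), 150); // operator interrupts mid-run
await running; // prints roughly 10ms of latency, well under 200ms
```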
Voice in, screen out. The asymmetry is the design point. Voice is fast to emit and slow to consume. Screen is the opposite. So the operator speaks commands and reads results. Trying to use either channel for both directions degrades both.
§03 What has shipped from the research
A few findings that moved from internal experiments into the product:
- Per-agent mute toggles cut audio fatigue by an order of magnitude. We initially had global TTS. Operators stopped speaking after the third concurrent agent because the noise was worse than the gain. Mute-by-default for new spawns was the fix.
- Inline approval banners resolve roughly ten times faster than approval lists in a separate panel. The context for the decision is the action being approved; separating them adds a switch tax that compounds across agents.
- Named addressing beat structured turn-taking. When operators can say "Atlas, kill that. Cypher, keep going," their mental model is a roster, not a thread. Latency to correct one of N agents drops to a single sentence. A routing sketch follows this list.
- Tool-call summaries as TTS output land. Verbose narration ("I'm now going to read the routing file in order to understand...") does not. We trimmed our progress summaries to subject-verb-object on tool name and target; see the second sketch below.
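What named addressing implies mechanically, sketched in TypeScript; the roster names and the clause-splitting heuristic are assumptions for illustration, not our shipped parser:

```ts
// Illustrative sketch, not our shipped parser: route a transcript like
// "Atlas, kill that. Cypher, keep going" to agents by roster name.
type Command = { agent: string; utterance: string };

const roster = new Set(["Atlas", "Cypher", "Vega"]); // names are made up

function route(transcript: string): Command[] {
  return transcript
    .split(/[.!?]/) // one clause per sentence
    .map((clause) => clause.trim())
    .filter(Boolean)
    .flatMap((clause) => {
      // A clause addresses an agent iff it opens with a roster name.
      const m = clause.match(/^(\w+),\s*(.+)$/);
      return m && roster.has(m[1]) ? [{ agent: m[1], utterance: m[2] }] : [];
    });
}

// route("Atlas, kill that. Cypher, keep going")
// => [ { agent: "Atlas", utterance: "kill that" },
//      { agent: "Cypher", utterance: "keep going" } ]
```

The design point is that the addressing grammar is flat: one name, one clause, no turn state to track.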
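And the subject-verb-object trim, sketched the same way; the verb table, the `ToolCall` shape, and the file path are assumptions for illustration, not the shipped schema:

```ts
// Illustrative sketch: collapse a tool call to subject-verb-object for TTS.
// The verb table and ToolCall shape are assumptions, not our schema.
interface ToolCall { tool: string; target: string }

const verbs: Record<string, string> = {
  read_file: "Reading",
  write_file: "Writing",
  run_command: "Running",
};

function summarize(agent: string, call: ToolCall): string {
  return `${agent}: ${verbs[call.tool] ?? "Using"} ${call.target}`;
}

// summarize("Atlas", { tool: "read_file", target: "src/router.ts" })
// => "Atlas: Reading src/router.ts"
// versus the narration we cut: "I'm now going to read the routing file
// in order to understand..."
```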
§04 Why this is research, not just product work
Two reasons it has to be done as research and not as a feature backlog.
One, the answers don't exist yet. Multi-agent orchestration isn't a solved problem you can copy out of a design system. There's no Apple HIG chapter on driving four agents at once. About ten products on Earth ship any version of it, and most are derivative of the same chat-panel layout that everyone settled on in 2023.
Two, the design space is wide enough that getting it right requires writing the decisions down. Not on a whiteboard, where they vanish. In durable form, with the reasoning attached. When we revisit a choice in six months, we should be able to find out why it was made and what conditions would invalidate it. That's part of what this blog is for. The other part: our product is built on these calls, and we owe the people running it on their machines a record of how we got here.
§05 What gets published here
Three to four posts a quarter. The material is what we've found running the product on ourselves and on the early users who put up with our first decisions. Some of it will turn out to be wrong. Most of it will be specific enough to argue with.
What this isn't: SEO bait, vendor-neutral hedging, or the kind of "X tools to boost your AI workflow" listicle that has eaten most of the technical web. We took a position on the model layer (we ship only Claude), the input layer (voice, not keyboard), and the orchestration layer (multi-agent with named addressing). Those positions are how the product works. The blog is where we explain why.
The bet underneath everything: the interface to AI is the unsolved problem of this decade, most of the field is still iterating on the textbox, and the company that fixes the operator-throughput layer at the OS level wins the next platform shift. Cadence Labs is the research effort to figure out what that layer should look like.
§06 What lands in the next quarter
Three pieces are in draft.
The first is a teardown of how operators actually distribute attention across concurrent agents. Eye-tracking data, mouse-acquisition latency, and the moment quality starts dropping. The number we keep arriving at is six. We don't yet understand why six.
The second is on the sound design of agent state. We currently encode running, blocked, succeeded, and failed in audible cues, and we've found that operators stop hearing them after about fifteen minutes. There's a fix involving spectral spread and timed silence. We'll publish it when we're sure it works.
The third is a response to a critique we've been getting in private: "isn't 'voice as primary input' just a niche power-user move?" The short answer is no. The long answer involves a stack of input-throughput data, and we're going to publish that data so the question stops being a vibes argument.
If you want a notification when each lands, the GitHub repo gets release notes that mention new posts. We don't have a newsletter. We're not building one. The writing index is the canonical place to follow along.