Flagship case study / 2026
Shipped
Transforming a Design System into a Living Knowledge Platform
Our AI could access the entire codebase, but it still couldn’t make design decisions. I built a machine-readable knowledge layer that helps agents choose components, patterns, and flows, and make UX decisions, across design and development workflows.
- Role: [Lead Product Designer]
- Focus: Design Systems + AI
- Duration: [duration needed]
- Outcome: Knowledge platform
Documentation → Knowledge → Decisions
The project was not a chatbot or a single MCP server. It was a shift from publishing information to operationalizing design judgment.
01 / The gap
AI understood the code. Not the design.
The company already had an official MCP server that could navigate the codebase, find components, and explain their APIs. That solved access to implementation. It did not solve design judgment.
An agent could discover that a dialog component existed, inspect its props, and generate valid code. It still could not tell whether a dialog was the right interaction, when a destructive action needed an extra confirmation step, what should happen after failure, or which exception had already been approved. Those answers lived across long guides, examples, review conversations, and the memories of design-system contributors.
Code-aware MCP
How to implement it
- Component names and source code
- Properties, types, and examples
- Existing implementation references
- Technical constraints
Design knowledge layer
What should be built
- Patterns and complete flows
- Semantic rules and principles
- Required states and edge cases
- Exceptions and decision rationale
This changed the problem definition. The goal was not to duplicate the official MCP or build a better search box. It was to model the layer of knowledge that turns available components into coherent product decisions.
02 / Maturity
Design systems keep evolving.
The familiar maturity story ends with documentation. In practice, documentation was only another passive layer people could ignore, misread, or never discover.
Components
Reusable interface building blocks.
Tokens
Shared visual decisions encoded as variables.
Documentation
Guidance that explains APIs and usage.
Patterns
Repeatable solutions to product problems.
Flows
Complete scenarios, states, and recovery paths.
Agent-ready knowledge
Structured judgment that software can retrieve and apply.
Components made interfaces reusable. Tokens made visual decisions reusable. Documentation explained how both worked. Patterns and flows then captured larger product decisions: not only which control to use, but how an entire scenario should guide a user through uncertainty, risk, error, and recovery.
The AI layer was the next maturity step. It adapted those assets for a new class of consumer: agents that need explicit, retrievable, bounded instructions. The design system became capable of participating in decisions instead of waiting for a person to open a documentation page.
03 / Knowledge model
Components were only one layer of the system.
To make design knowledge useful to agents, I separated it by the kind of decision it could support. This prevented every question from loading the entire design system into context.
Components
Button, dialog, navigation, form field
Patterns
Search, filtering, confirmation, progressive disclosure
Flows
Onboarding, creation, deletion, recovery
States
Empty, loading, error, success, overflow
Principles
Hierarchy, clarity, interruption cost
Exceptions
Approved deviations and the reason they exist
This distinction matters because a correct component can still create the wrong experience. A deletion journey is not solved by finding the destructive button. It includes consequence copy, confirmation, permissions, progress, failure, recovery, and the state left behind. Similarly, onboarding is not a collection of tooltips; it is a flow shaped by progressive disclosure and the user’s current level of context.
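A minimal sketch of how such a layered model might be represented, so that a question about a flow loads only that flow and its direct relations. The `Kind`, `Entry`, and `KnowledgeLayer` names, the `related` field, and the sample entries are assumptions made for illustration, not the production schema.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    COMPONENT = "component"
    PATTERN = "pattern"
    FLOW = "flow"
    STATE = "state"
    PRINCIPLE = "principle"
    EXCEPTION = "exception"


@dataclass(frozen=True)
class Entry:
    kind: Kind
    name: str
    guidance: str
    related: tuple = ()  # names of entries this one depends on (hypothetical field)


class KnowledgeLayer:
    def __init__(self, entries):
        self._by_name = {entry.name: entry for entry in entries}

    def retrieve(self, kind, name):
        """Return the named entry plus its direct relations, and nothing else."""
        entry = self._by_name.get(name)
        if entry is None or entry.kind is not kind:
            return []
        related = [self._by_name[r] for r in entry.related if r in self._by_name]
        return [entry] + related


# Illustrative entries only; the guidance strings paraphrase the case study.
layer = KnowledgeLayer([
    Entry(Kind.COMPONENT, "button", "Reusable action control."),
    Entry(Kind.COMPONENT, "dialog", "Interrupting surface for focused decisions."),
    Entry(Kind.PATTERN, "confirmation",
          "Explain consequences before a risky action.", ("dialog",)),
    Entry(Kind.STATE, "error", "Say what failed and how to recover."),
    Entry(Kind.FLOW, "deletion",
          "Consequence copy, confirmation, permissions, progress, failure, recovery.",
          ("confirmation", "error")),
])

context = layer.retrieve(Kind.FLOW, "deletion")
print([entry.name for entry in context])  # ['deletion', 'confirmation', 'error']
```

In this sketch the deletion flow pulls in its confirmation pattern and error state, while unrelated components such as the button stay out of context.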
04 / Platform
MCP was the first interface, not the final product.
The durable asset was a shared knowledge layer. MCP exposed it to agents, while skills, chat, Figma, Slack, and a future CLI could reuse the same rules without creating new knowledge silos.
Knowledge layer
Components, patterns, flows, principles, and exceptions.
MCP interface
Typed access to design-system knowledge for development agents.
Point-of-work clients
Chat and design-tool prototypes reuse the same source.
CLI and wider distribution
Cheaper retrieval and more specialized workflows.
Structured design-system knowledge, typed MCP tools, and task-specific guidance available to development agents.
In-context question answering for designers and team communication surfaces, tested as prototypes or partial flows.
Skills, routing instructions, evaluation suites, and a focused CLI to reduce repeated context and token consumption.
This platform framing also reduced bus-factor risk, because fewer answers depended on a handful of experts being available. Routine questions could be answered from approved knowledge at the point of work. Ambiguous or policy-level decisions could still escalate to a design-system owner with the relevant context attached. The system scaled access to expertise without pretending that every design decision could be automated.
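The answer-or-escalate split can be sketched as a small routing function. This is an illustration under stated assumptions: the `Answer` and `Routing` types, the `policy_level` flag, and the `CONFIDENCE_FLOOR` value are hypothetical and do not describe the shipped implementation.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_FLOOR = 0.7  # hypothetical threshold, chosen only for this sketch


@dataclass(frozen=True)
class Answer:
    text: str
    source: str          # where the approved guidance lives
    confidence: float    # 0.0 to 1.0
    policy_level: bool = False


@dataclass(frozen=True)
class Routing:
    action: str          # "answer" or "escalate"
    payload: dict


def route(question: str, answer: Optional[Answer]) -> Routing:
    """Answer routine questions; escalate the rest with context attached."""
    if answer is None:
        return Routing("escalate", {"question": question,
                                    "reason": "no approved guidance"})
    if answer.policy_level:
        reason = "policy-level decision"
    elif answer.confidence < CONFIDENCE_FLOOR:
        reason = "low confidence"
    else:
        return Routing("answer", {"text": answer.text, "source": answer.source})
    # Attach the draft and its source so the owner does not start from zero.
    return Routing("escalate", {"question": question, "reason": reason,
                                "draft": answer.text, "source": answer.source})


routine = route("Which control confirms a delete?",
                Answer("Use the confirmation pattern.", "patterns/confirmation", 0.9))
print(routine.action)  # answer
```

The escalation payload carries the question, the reason, and any draft answer, which is what lets an owner decide quickly instead of reconstructing the situation.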
05 / Tool design
One goal. Three tools.
Icon selection exposed an important AI product-design principle: the same user goal can require different levels of certainty, context, and cost.
recommend_icon
Turns an intent such as “export data” into a short semantic candidate list.
- Context cost: Low
- Use when: The agent needs direction but has little product context.
match_icon
Checks whether a candidate already exists and identifies the closest approved asset.
- Context cost: Medium
- Use when: A likely symbol is known and duplication must be avoided.
select_icon
Makes a final contextual choice and returns rationale, constraints, and usage notes.
- Context cost: High
- Use when: Meaning depends on workflow, neighboring actions, or exceptions.
A single “find an icon” tool looked simpler, but it forced the agent to pay for deep context even when a lightweight semantic suggestion was enough. Splitting the workflow made the trade-off explicit. The agent could stop after a recommendation, verify that an asset existed, or spend more context on a final decision only when the product situation demanded it.
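A minimal sketch of the three-tool split, assuming in-memory lookups in place of the real icon library. Only the tool names come from the case study; the icon set, the semantic index, the closest-asset table, and the return shapes are hypothetical.

```python
# Hypothetical data: the real icon library and semantic index are not public.
APPROVED_ICONS = {"Download", "Upload", "Save", "Trash"}
SEMANTIC_INDEX = {"export data": ["Download", "FileDown", "Export"]}
CLOSEST_APPROVED = {"FileDown": "Download", "Export": "Download"}


def recommend_icon(intent):
    """Low context cost: return a short semantic shortlist for an intent."""
    return list(SEMANTIC_INDEX.get(intent.strip().lower(), []))


def match_icon(candidate):
    """Medium context cost: check the approved set and name the closest asset."""
    if candidate in APPROVED_ICONS:
        return {"exists": True, "closest": candidate}
    return {"exists": False, "closest": CLOSEST_APPROVED.get(candidate)}


def select_icon(intent, neighboring_icons):
    """High context cost: pick a final icon using surrounding workflow context."""
    for candidate in recommend_icon(intent):
        match = match_icon(candidate)
        if match["exists"] and candidate not in neighboring_icons:
            return {"icon": candidate,
                    "rationale": "Approved asset that no neighboring action already uses."}
    return {"icon": None,
            "rationale": "No approved match; escalate to a design-system owner."}


shortlist = recommend_icon("Export data")
existence = match_icon("FileDown")
decision = select_icon("export data", neighboring_icons={"Save"})
print(shortlist)          # ['Download', 'FileDown', 'Export']
print(existence)          # {'exists': False, 'closest': 'Download'}
print(decision["icon"])   # Download
```

Each function can be called on its own, so an agent that only needs a direction stops after `recommend_icon` and never pays for the neighboring-action context that `select_icon` requires.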
Reconstructed and anonymized tool trace
User
Choose an icon for exporting a table as CSV.
recommend_icon
Shortlist: Download, FileDown, Export. Exclude Save because the action creates an external file rather than persisting an in-session edit.
[tokens]
match_icon
Download already exists in the approved set. FileDown is not part of the current library.
[latency]
select_icon
Use Download. It matches the established export convention and avoids introducing a second symbol for the same intent.
[cost]
Real icon-tool benchmark
Prove that specialized routes changed token use, retries, or first-pass acceptance.
- Required source: Anonymized agent traces for comparable recommend, match, and select tasks.
- Anonymization: Remove company names, repository paths, proprietary icon names, and user identifiers.
- Recommended format: 1600 × 1000 px transcript and a compact table covering tokens, latency, retries, and outcome.
06 / Prototyping
The highest leverage appeared before designs existed.
When an agent was implementing an existing mockup, the design system mostly improved accuracy. When no mockup existed, the knowledge layer helped shape the experience itself.
Starting prompt
“Design a flow for deleting a workspace that contains active projects and multiple members.”
Component-level response
A dialog and a red button
- Finds an Alert Dialog component
- Adds a destructive primary action
- Asks the user to confirm
- Stops at the happy path
System-guided response
A complete destructive flow
- Explains impact before confirmation
- Checks permissions and blocking conditions
- Includes progress, failure, and recovery
- Defines the post-deletion empty or redirect state
Retrieved decision package
- Pattern: Destructive action with explicit consequences
- States: Blocked, confirming, processing, failed, complete
- Components: Inline warning, dialog, progress, notification
- Rules: Do not rely on color; preserve a safe exit
- Exception: Skip re-auth only for low-risk sandbox data
- Recovery: Explain retention and restoration policy
This moved the design system upstream. Instead of checking compliance after a screen had been designed, it could guide early exploration: compare several visual approaches, choose an interaction model, identify missing states, and explain how the product should lead a user through the flow.
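The retrieved decision package can be sketched as a typed record that an agent checks a draft against. The field values below are taken from the package shown above; the `DecisionPackage` type and the `missing_states` helper are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionPackage:
    pattern: str
    states: tuple
    components: tuple
    rules: tuple
    exception: str
    recovery: str


DELETE_WORKSPACE = DecisionPackage(
    pattern="Destructive action with explicit consequences",
    states=("blocked", "confirming", "processing", "failed", "complete"),
    components=("inline warning", "dialog", "progress", "notification"),
    rules=("Do not rely on color", "Preserve a safe exit"),
    exception="Skip re-auth only for low-risk sandbox data",
    recovery="Explain retention and restoration policy",
)


def missing_states(package, designed_states):
    """List required states that a draft flow has not covered yet."""
    designed = {state.strip().lower() for state in designed_states}
    return [state for state in package.states if state not in designed]


# A component-level draft that stops at the happy path.
happy_path_draft = ["Confirming", "Complete"]
print(missing_states(DELETE_WORKSPACE, happy_path_draft))
# ['blocked', 'processing', 'failed']
```

Run against the component-level response, the check surfaces exactly the gaps a reviewer would otherwise catch late: the blocked, processing, and failed states.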
07 / Feedback
The design system started talking back.
Traditional documentation broadcasts guidance and waits for people to report problems. Agent interactions can produce structured evidence about where that guidance succeeds, conflicts, or disappears.
Interaction
An agent applies a component, pattern, or flow.
Feedback
It records uncertainty, conflict, modification, and outcome.
Pattern update
A maintainer adds a rule, example, or approved exception.
Evaluation
With-guidance and baseline outputs are compared.
Release
Human-approved improvements return to the shared layer.
Short feedback records could reveal what surveys rarely captured: which rule was missing, where two sources contradicted each other, what an agent changed after a user request, and which workaround repeatedly appeared. This made design-system quality observable at scale, with less dependence on people remembering to send feedback.
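A minimal sketch of how short feedback records might be grouped into recurring gaps for a maintainer to review. The `FeedbackRecord` fields, the issue labels, and the sample records are invented for illustration and are not real interaction logs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackRecord:
    topic: str    # e.g. a pattern or flow name
    issue: str    # "missing_rule", "conflict", "modification", or "workaround"
    outcome: str  # "accepted", "changed", or "escalated"


def recurring_gaps(records, min_count=2):
    """Group records by (topic, issue) and keep clusters that recur."""
    counts = Counter((record.topic, record.issue) for record in records)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Synthetic sample data, not taken from production.
SAMPLE = [
    FeedbackRecord("destructive-confirmation", "missing_rule", "escalated"),
    FeedbackRecord("icon-export", "conflict", "changed"),
    FeedbackRecord("destructive-confirmation", "missing_rule", "changed"),
    FeedbackRecord("empty-state", "workaround", "accepted"),
    FeedbackRecord("icon-export", "conflict", "escalated"),
    FeedbackRecord("destructive-confirmation", "missing_rule", "escalated"),
]

for (topic, issue), count in recurring_gaps(SAMPLE):
    print(topic, issue, count)
# destructive-confirmation missing_rule 3
# icon-export conflict 2
```

The output is a proposal queue, not a decision: clusters above the threshold go to a maintainer, who decides whether a rule, example, or approved exception is warranted.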
Human governance remains explicit.
AI can identify clusters, propose guidance, and run comparisons. Design-system owners approve changes that alter principles, policy, or product behavior.
Feedback taxonomy and evaluation result
Show one concrete case where recurring agent feedback exposed missing or conflicting design-system guidance.
- Required source: Anonymized interaction logs plus one before-and-after guidance update.
- Anonymization: Aggregate by topic and remove prompts, product names, repository identifiers, and personal data.
- Recommended format: Two 1400 × 900 px visuals: a feedback-cluster view and a with-guidance versus baseline evaluation.
08 / Adoption
It looked like vibe coding. Then it became a standard.
The strongest resistance came from engineers who saw AI-generated design-system guidance as less trustworthy than conventional tooling. Adoption changed when the system proved that it constrained agents rather than giving them more freedom to improvise.
Skepticism
AI output was associated with generic UI and fragile code.
Focused pilot
Narrow tools answered concrete design-system questions.
Visible proof
Decisions became more consistent and easier to review.
Default workflow
The approach moved from experiment toward company standard.
The project did not remove disagreement. It made disagreement more useful. The system could answer established questions immediately, expose the source behind an answer, and escalate low-confidence or disputed cases. Design-system experts spent less time repeating settled guidance and more time deciding the situations the system did not yet understand.
For design leadership, this was the larger organizational change. Adoption no longer depended only on training sessions, office hours, or individual reviewers catching the same mistakes. Approved guidance could travel with the work, while unresolved questions arrived with enough evidence to improve the system rather than disappear inside another one-off conversation.
Adoption proof and team quotation
Substantiate the shift from resistance to routine use without overstating company-wide adoption.
- Required source: Dated rollout milestones, usage records, and an approved quote from an engineer or design leader.
- Anonymization: Remove names and product identifiers unless written permission is available.
- Recommended format: 1600 × 700 px timeline with one short quotation and a source note.
09 / Outcomes
The platform made design-system impact measurable.
Exact production numbers still need to be cleared for publication. The measurement model is already defined, so the final case can distinguish demonstrated impact from future potential.
Decision quality
[metric needed]
First-pass design-system approval or review correction rate.
Self-service speed
[metric needed]
Median time from a design-system question to a usable answer.
Token efficiency
[metric needed]
Tokens per resolved task, split by tool and confidence level.
State coverage
[metric needed]
Required default, empty, error, success, and recovery states included.
Expert interruptions
[metric needed]
Routine questions resolved without involving a design-system owner.
Feedback volume
[metric needed]
Structured gaps and conflicts captured from agent interactions.
These measures connect design-system work to decisions rather than page views. Documentation traffic says that someone opened a guide. First-pass acceptance, state coverage, token cost, escalation rate, and recurring conflict topics show whether the system actually changed how products were designed and built.
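A sketch of how several of these measures could be computed from per-task records, assuming a hypothetical `TaskRecord` shape. The sample values are synthetic and say nothing about production results, and "tokens per resolved task" is interpreted here as total tokens spent divided by the number of resolved tasks, so unresolved attempts still count as cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    approved_first_pass: bool
    resolved: bool
    escalated: bool
    tokens: int
    required_states: frozenset
    included_states: frozenset


def first_pass_rate(records):
    return sum(r.approved_first_pass for r in records) / len(records)


def escalation_rate(records):
    return sum(r.escalated for r in records) / len(records)


def tokens_per_resolved_task(records):
    resolved = sum(r.resolved for r in records)
    if resolved == 0:
        return None  # nothing resolved, so the ratio is undefined
    return sum(r.tokens for r in records) / resolved


def state_coverage(records):
    required = sum(len(r.required_states) for r in records)
    included = sum(len(r.required_states & r.included_states) for r in records)
    return included / required if required else None


# Synthetic records for illustration only.
SAMPLE = [
    TaskRecord(True, True, False, 1200,
               frozenset({"empty", "error", "success"}),
               frozenset({"empty", "error", "success"})),
    TaskRecord(False, True, False, 1800,
               frozenset({"empty", "error", "success", "recovery"}),
               frozenset({"empty", "success"})),
    TaskRecord(True, True, False, 600, frozenset({"error"}), frozenset({"error"})),
    TaskRecord(False, False, True, 400,
               frozenset({"error", "recovery"}), frozenset()),
]

print(first_pass_rate(SAMPLE))   # 0.5
print(escalation_rate(SAMPLE))   # 0.25
print(state_coverage(SAMPLE))    # 0.6
print(round(tokens_per_resolved_task(SAMPLE), 1))  # 1333.3
```

Splitting the same functions by tool or confidence level only requires filtering the records before calling them.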
10 / Reflection
“The design system stopped being a place people had to visit. It became an active participant in how products were designed and built.”
The project expanded the role of a design-system team beyond components and documentation. It required knowledge architecture, AI tool design, routing, evaluation, observability, and governance. It also created a practical path for carrying the same expertise into engineering agents, design tools, chat, and future workflows.
The next design-system interface is not necessarily a website. It is the decision layer available wherever work happens.