Palette Forge

An agent-first utility for building accessible color palettes. You hand it an image and say "surprise me"; it composes four distinct light and dark UI palettes, checks every pairing against WCAG contrast itself, self-corrects until it passes, and lets you refine and branch from any take. You are the final word on taste.

The reason an agent earns its place here — and doesn't in most apps that bolt one into a sidebar — is that color has a free, automatic verifier: contrast ratio is math, not opinion. So there are two checks stacked on every output: contrast (non-negotiable, the machine's job) and you (taste, live). Delete the agent and what's left is a color picker. The agent is the product.

The distinctive bet is the knowledge/ folder: plain markdown you can read and rewrite. It both guides what the agent proposes and is the rubric it judges against. Change the markdown, change the output — no code required. That's the whole thing made legible.

The lane is welded: image in, four UI palettes out. Not a design-system builder, not an image editor, not a SaaS. You can retune what good color means; you can't talk it into being something else.

This is the flagship of three small agent-first tools being built in public. It goes first, all the way to shipped, before the others get real attention — one shipped tile beats three planned ones.

The agent has a free referee

The whole bet is that an agent earns its place here only because color has a free, automatic verifier — contrast is math, not opinion. The first real palette proved it. A rule I'd written by hand — "text on the accent must hit AA" — rendered a loud red failure on the very first render. Not a fluke: that pairing is physically unsatisfiable (dark text on a dark accent can't reach 4.5:1). The tool caught a mistake in its own rulebook before I did. I reversed the rule to a light label on the accent, and scores jumped.

Faking it first paid off twice

The entire journey — animation, scoring, branching, the feel of watching the agent work — was built against a deterministic simulator behind one engine seam: no key, no tokens. When the real Claude engine dropped into that exact socket, it worked on the first run. The simulator didn't get thrown away; it's now the no-key demo, the fallback, and the test fixture. Code owns the verifier and picks the winner — Claude composes and explains, but never grades its own work.

Then I cut the core idea

The hard part. The app was built on color theory — triadic, complementary, the wheel. In a single-accent UI palette those types are invisible: every variation collapsed to "neutrals plus one accent." People don't think in wheel types anyway; they think in vibes — Braun, Teenage Engineering. So v1 got reframed: image in, four distinct UI palettes, "surprise me," characters instead of taxonomy. Fewer concepts, less code, better output. The contrast engine, the six roles, light and dark, the refine-and-branch trail all stayed welded — only the framing that never worked got removed.

Where it stands: live on a real key, "surprise me" shipped, the geometric library cards next.

Progress

The whole build in a day — and one idea thrown out

The agent has a free referee

Faking it first paid off twice

Then I cut the core idea