Ohiyo — From a One-Line Idea to Eight Finished Formats
Four specialized agents — Researcher, Creator, Reviewer, Archivist — relay through research, writing, QA and archiving: one sentence in, eight finished formats out in five minutes.

Ohiyo is my full bet on one judgment: in 2026, an AI product's experience is 60% model and 40% harness — the layer of orchestration, memory, tools and context engineering around the model. Anyone can rent the model; the gap lives in the harness. So Ohiyo sinks all complexity into the backend, and the user sees a single chat box.
The scenario it serves is the three hours after a creator's idea: researching, drafting, reformatting for each platform, being your own editor. In Ohiyo, one sentence goes in; research, writing and review relay through in about 90 seconds; out come eight publishable formats — article, tweet thread, video script, email, slides, PDF, chart, poster — fifteen format cards on the frontend in total.
One demolition and rebuild
In V2 I designed five preset agents (scout, researcher, creator, stylist, designer) wired into a pipeline — it looked very "multi-agent" and felt rigid in use: real user needs always fell into the gaps between preset pipelines. In April 2026 I deleted the entire preset-agent system and replaced it with one Orchestrator wielding 20 tools directly, wrapped in a budget controller: at most 30 tool iterations and a 100K-token ceiling, with a live warning pushed at 80%.
Deleted code says more about judgment than written code. Control is exposed in three layers: ordinary users get the chat box (L1); power users get 12 slash commands (L2); developers find the full configuration panel deep in settings (L3). The complexity exists — it just never interrupts anyone.
Memory is layered, and every layer degrades gracefully
- Conversations auto-compress into summaries past 80K tokens;
- Persistent memories are auto-extracted from dialogue, with conflict detection;
- Semantic recall runs on pgvector: vector similarity, multiplied by a 30-day half-life time decay, then filtered through MMR diversity — at most 15 memories injected.
Every layer has a fallback — if the embedding service dies, recall degrades to full injection. The memory system is never allowed to become a single point of failure. One selection judgment along the way: pgvector over Pinecone, because at MVP stage a pure $70/month cost is not worth it, and Postgres is more than enough below a million vectors.
The invisible engineering
The two places I spent the most effort that users will never see:
- Tenant isolation: every UPDATE and DELETE must carry both the record id and the user_id. The test suite installs a SQL Spy that listens to the SQL the ORM actually generates and asserts, statement by statement, that every destructive operation's WHERE clause contains user_id — defense in depth that does not rely on programmer discipline;
- Billing atomicity: balance deduction and ledger writes share one database transaction. Tests monkey-patch the ledger insert to fail and assert the user's balance moved by exactly zero — the transactionless version leaves behind the disaster of "money taken, nothing recorded."
Scale: 54K lines of backend and 12K of frontend TypeScript, 36 tables, 103 test files, seven LLM providers with per-key priority failover. The name Ohiyo plays on the Japanese for "good morning" — a head start on a creator's day.