---
title: "A CPO we worked with shipped production code daily. Here's the system that made it safe."
date: "2026-06-25"
excerpt: "A product person with zero engineering background shipping to production sounds reckless. What makes it work is the system around the model."
author: "Marcin Ostrowski"
---

At nerds.family, a typical feature started with the CPO, who has zero engineering background, writing a PRD together with an AI that had access to the repository. That detail does more work than it sounds like: because the AI could read the codebase, he learned in the first ten minutes what was possible, what was cheap, and what was expensive, instead of finding out three weeks later from an estimate. I filled in the technical gaps and decisions, handed the spec to a coding agent, and reviewed what came out. If it was fine, it shipped. We ran this way, to production, daily, from December 2025 until the engagement closed this June.

When I describe this to engineering leaders, the first reaction is usually some version of "that sounds reckless." It would be, if safety rested on the CPO. It doesn't. That's the whole design.

## Safety is a property of the system, not the person

Here's the uncomfortable mirror question: in your org, what actually stands between a mid-level developer's mistake and production? If the honest answer is "a colleague skims the PR," then the difference between your safety and ours is thinner than it looks. We just stopped pretending the human skim was the load-bearing part, and built the load-bearing part on purpose.

The system has four layers, and none of them is a vibe. Conventions live in the repo and load into the agent before it writes a line, so the code comes out the way our codebase does things, not the way the model's training data does. Deterministic gates run before any human looks: linters with our rules, security scanning, a check that fails when changed code lacks tests, the full suite locally before push. Work arrives in slices small enough that a wrong one is a small revert. And a release that fails its checks rolls back automatically instead of waiting for somebody to notice the graphs. I've written up the verification side in [How do you know the software is working?](/blog/how-do-you-know-the-software-is-working) and the conventions are [public](https://github.com/marostr/superpowers-rails).

What's left for a person is judgment: is this the right change, does it do what the spec says. That's [the review that survived](/blog/the-review-gap-reviewing-ai-generated-code) when we restructured everything else, and it's where I spend my time. It still catches things the gates can't. The difference is that the floor no longer depends on it.

## The interface is the spec

The CPO's contribution entered through exactly one door: the spec. He never edited code, and he didn't need to. What made his specs good enough to build from was the repo-aware AI on his side of the table. A PRD written against the actual codebase doesn't ask for the impossible and doesn't reinvent something that exists. The edge cases showed up while he was still writing, not in week three.

This is the part most teams get backwards. They give non-engineers a code-generating tool and hope the output is safe. We gave a non-engineer a spec-writing tool and made the pipeline behind it safe. His intent had high bandwidth into the system; his hands never touched the parts where an honest mistake is expensive.

## Autonomy is sized to blast radius

Honesty requires a distinction here, because "the CPO ships production code" describes two different arrangements we've run.

At nerds.family, the platform carried real client work, so the loop kept me in it: he designed, agents built, I made the calls before production. What he gained wasn't merge rights. It was that the distance from his idea to shipped behavior collapsed from "wait for a developer" to "write the spec well."

At Gyfted, where we installed the same harness on a stack we don't usually work in, their CPO ships landing pages to production on his own schedule, with no developer in the loop at all. Nobody decided CPOs are trustworthy now. The lane is bounded: the worst landing-page mistake is cheap, so the gates alone are enough, and a human reviewer would add latency without adding safety.

That's the rule worth stealing: autonomy is sized to blast radius, not to job title. Widen a lane only after the gates have caught real mistakes in it, and keep the expensive lanes gated.
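One way to encode that rule is a merge policy keyed on which files a change touches. The following is a minimal sketch, assuming a hypothetical `site/landing/` directory as the only lane allowed to ship on gates alone. The `Change` type and the lane prefixes are illustrative, not Gyfted's actual setup.

```python
"""Hypothetical merge policy: autonomy decided by what a change touches."""
from dataclasses import dataclass

# Path prefixes whose worst-case mistake is cheap enough that passing
# gates alone is sufficient (assumed layout, for illustration only).
AUTONOMOUS_PREFIXES = ("site/landing/",)


@dataclass
class Change:
    author: str
    files: list[str]
    gates_passed: bool


def in_autonomous_lane(path: str) -> bool:
    # str.startswith accepts a tuple and matches any of its prefixes.
    return path.startswith(AUTONOMOUS_PREFIXES)


def merge_decision(change: Change) -> str:
    """Return 'blocked', 'auto-merge', or 'needs-review'.

    The author is deliberately unused: the decision depends on the gates
    and on the blast radius of the touched files, not on who wrote them.
    """
    if not change.gates_passed:
        return "blocked"
    if change.files and all(in_autonomous_lane(f) for f in change.files):
        return "auto-merge"
    return "needs-review"
```

`author` never enters the decision. In this framing, widening a lane means adding a prefix to `AUTONOMOUS_PREFIXES` once the gates have a track record of catching real mistakes there.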

## What it cost to build

I won't sell this as a weekend setup. Starting in September 2025, I spent my first two to three months at nerds.family, and real money, trying everything the ecosystem offered: GSD, Superpowers, spec-kit, kiro.dev, claude-swarm, claude-on-rails, the BMAD method. Most of it didn't survive contact with a production codebase. The harness we run is built from the parts that did, and it was built out of necessity: I was the only technical person, and the bottleneck was me.

The cost has a shape worth knowing in advance: it's front-loaded, and it compounds. Every rejected change becomes a new rule, every production surprise becomes a new check, so the system you have in month six is meaningfully better than the one you started with. Two people, part-time, shipped more in a week than the product got from a full-time developer in a month before the harness existed. That's our count on one product, and the multiple matters less than who's in the loop: the person closest to the customer designs the feature, and the machinery carries it to production.

If your AI adoption stalled at "the demos were nice," the missing piece is usually the system around the model, one strict enough that the people with the product knowledge can finally use it directly.
