The Rosetta Stone Problem: Why Policy-to-Code Translation Is a Human Rights Issue

When you spend enough time working in public benefit systems, a familiar frustration starts to surface: the people who understand the policy and the people who build the systems to enforce it often talk past each other.

One group is fluent in regulations, exceptions, and intent. The other speaks in logic, constraints, and system architecture. Two sets of brilliant minds in the same room with no common language. And quietly, invisibly, the misunderstanding starts making decisions. About who gets help. About who falls through.

I've seen that breakdown cause slow, expensive, and sometimes catastrophically flawed system implementations. And when the Oxford Commission on AI Governance released its framework for responsible AI deployment in public institutions, it finally gave that failure a precise name.

The Oxford Commission Named What I Had Been Watching

The accountability gap in AI-assisted governance isn't primarily a technical failure. It's a translation failure.

The Commission was unambiguous about something the tech industry has spent years softening. Human-in-the-loop is not a safety checkbox. It is not a liability hedge. It is the structural foundation of democratic accountability. And here is the part that stopped me cold: if the humans closest to policy intent cannot read, verify, or correct what the system is doing, then the loop is not closed. It is just labeled closed.

There is a real person on the other side of that label. Someone who applied for food assistance, or Medicaid, or housing support. Someone whose eligibility got processed by a system that a policy expert could not fully see, could not fully interrogate, and therefore could not fully stand behind.

The Question That Changed My Approach

What if we could close that communication gap using generative AI, in a way that kept humans meaningfully in the loop rather than just nominally?

While many teams focused on improving experiences for applicants or navigators, I turned my attention upstream to the policy experts. I wanted to empower the people who best understand the rules to also shape how those rules are implemented in code. Not hand their authority to a model. Shape it. Maintain it. Keep what the model produces legible to them.

A Rosetta Stone Between Policy and Code

So I started experimenting with large language models to generate a sort of Rosetta Stone between policy and software. The result was an intermediate format: a domain-specific language built specifically for describing public benefit policies. Structured enough for engineers to implement directly, but still readable and verifiable by the policy experts who hold the actual institutional knowledge.
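
To make that concrete, here is a minimal sketch of what a single rule might look like in such a format. Everything in it (the program, the field names, the 130% threshold) is an illustrative assumption for this post, not the actual language and not real policy.

```python
# Hypothetical sketch of one rule in an intermediate, reviewable format.
# The program, field names, and thresholds are illustrative, not real policy.

SNAP_GROSS_INCOME_TEST = {
    "rule_id": "snap.gross_income.example",   # stable ID a reviewer can cite
    "description": (
        "Household gross monthly income must not exceed 130% of the "
        "poverty line for its household size."
    ),
    "applies_to": "snap",
    "inputs": ["gross_monthly_income", "household_size"],
    "condition": {
        "op": "<=",
        "left": "gross_monthly_income",
        "right": {
            "op": "*",
            "left": 1.30,
            "right": {"lookup": "poverty_line_monthly", "key": "household_size"},
        },
    },
    "on_fail": "ineligible_gross_income",      # an outcome code, not a verdict
}
```

The pairing is the point: the plain-language description and the formal condition sit side by side, so a policy expert can check one against the other, and an engineer gets something unambiguous to build from.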

The Oxford Commission draws a distinction that I think about constantly now. There is nominal human oversight, where a person technically signs off on an AI output. And then there is meaningful human oversight, where that person has enough interpretive access to catch an error, flag an edge case, or push back on a logic flaw. For meaningful oversight to exist in policy-to-code work, the policy expert has to be able to see themselves in the output. They have to be able to say: this is what the rule means, or it is not.

The domain-specific language I built became that connective tissue. The model generates. The policy expert verifies. The engineer implements. The loop stays genuinely closed because the human in the middle is not approving something they cannot read.
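
As a sketch of how the engineer's side of that loop could work, still under the same hypothetical format, a small interpreter can evaluate a verified rule directly against an applicant's facts. The function names and dollar figures here are assumptions for illustration:

```python
# Minimal interpreter for the hypothetical format above; assumes the
# SNAP_GROSS_INCOME_TEST dict from the previous sketch is in scope.
POVERTY_LINE_MONTHLY = {1: 1255, 2: 1704, 3: 2152}  # illustrative figures only

def evaluate(expr, facts):
    """Recursively evaluate a condition tree against one applicant's facts."""
    if isinstance(expr, str):              # a named input field
        return facts[expr]
    if not isinstance(expr, dict):         # a literal number
        return expr
    if "lookup" in expr:                   # a table lookup, keyed by a fact
        table = {"poverty_line_monthly": POVERTY_LINE_MONTHLY}[expr["lookup"]]
        return table[facts[expr["key"]]]
    left = evaluate(expr["left"], facts)
    right = evaluate(expr["right"], facts)
    return {"<=": left <= right, "*": left * right}[expr["op"]]

def apply_rule(rule, facts):
    """Return 'pass' or the rule's outcome code, nothing more."""
    return "pass" if evaluate(rule["condition"], facts) else rule["on_fail"]

# The expert verifies the rule text; the engineer runs the same artifact:
print(apply_rule(SNAP_GROSS_INCOME_TEST,
                 {"gross_monthly_income": 2100, "household_size": 2}))
```

Nothing in the interpreter decides anything the rule text does not already say, which is the design choice that matters: the artifact the expert approved is the artifact the system runs.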

Where Accountability Actually Lives

I spent the summer working with various LLMs to refine this idea, testing prototypes and stress-testing how reliably models could generate policy logic in this format. What I kept returning to was not model performance. It was the moment of human review. That moment is where accountability lives. That is where you catch the exception the model flattened, the nuance that did not survive the training data, the community the system forgot.
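
One way to make that stress-testing mechanical, again assuming the hypothetical format sketched above, is to gate every generated rule through a structural check before it ever reaches a reviewer, so the human moment is spent on meaning rather than on malformed output:

```python
# A structural gate on generated rules. This checks form only; whether a
# well-formed rule means what the statute means stays with the policy expert.
REQUIRED_KEYS = {"rule_id", "description", "applies_to",
                 "inputs", "condition", "on_fail"}

def structural_errors(rule):
    """Return a list of structural problems; an empty list means well-formed."""
    errors = []
    missing = REQUIRED_KEYS - set(rule)
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    if not isinstance(rule.get("inputs"), list):
        errors.append("'inputs' must be a list of named fields")
    if not isinstance(rule.get("condition"), dict):
        errors.append("'condition' must be an expression tree")
    return errors
```

A gate like this measures how often generations are well-formed. Whether a well-formed rule says what the law says is exactly the judgment that has to stay human.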

The Oxford Commission puts it plainly: AI systems operating in high-stakes domains must be interpretable in real time by the humans who bear accountability for the decisions. In public benefit administration, those humans are policy experts. And for too long, the systems built to carry out their work have been opaque to them.

Legitimacy Is Not Something a Model Can Generate

This work drew on everything I care about: responsible AI, civic tech, human-centered design, and the promise of government systems that actually serve the people they were designed for. The Oxford framework gave me language for why the human in the loop cannot be a formality. When that person is a genuine partner rather than a rubber stamp, the whole system earns a legitimacy that no model can generate on its own.

If you are working in the public benefit space, or thinking about how to turn policy into code in your own world, I would love to compare notes. The translation problem is solvable. But only if we are honest about who needs to be in the room, and what they need to be able to read.
