We've Rebuilt Government Digital Infrastructure Before. Here's What AI Requires Us to Get Right This Time.

How AI and LLMs will change government websites

One of the first jobs I ever had in government technology was leading the complete redesign of a hospital quality reporting system. That was over seven years ago. And I still see the same core challenge playing out today, just wearing different clothes.

It was 2017. I was working with the United States Digital Service on the CCSQ quality reporting systems, which determined how hospitals got paid under Medicare and Medicaid. At the time, most government websites were organized like org charts. If you did not already know which agency owned what you needed, you were not finding it. And the people who suffered most for that were the ones who could least afford to navigate a maze.

We made a deliberate choice to organize content around user need instead of bureaucratic structure. Resident. Business owner. State employee. We built pathways based on who you were and what you actually needed, not on how the government had arranged itself internally. That decision sounds obvious now. In 2017, it was a genuine fight.

What I understood then, and what has only deepened through my later work at CMS and now advising enterprise AI strategy at Metric5, is that reorganizing how information flows is never just a design problem. It is a decision about who bears responsibility when the system gets it wrong. When I was helping shape how hospitals reported quality data and how that data connected to payment systems, every structural choice I made had a downstream consequence for real institutions and real patients. I had to be able to name that and own it.

That accountability does not disappear when you introduce AI. It concentrates.

Now We Are at the Same Inflection Point, and the Stakes Are Higher

Large language models are doing to government content what search engines did twenty years ago. They are ingesting it, reinterpreting it, and serving it back to people who trust the output because it sounds authoritative. And just like in the early search era, government agencies are not yet building for the infrastructure that is actually consuming their information.

But there is a critical difference this time. Search engines indexed what you published and ranked it. LLMs synthesize across sources and generate responses that may blend your authoritative content with seventeen other things the model encountered during training. The accuracy problem is real, but the deeper problem is interoperability.

Government agencies do not operate in isolation. A question about Medicaid eligibility may touch CMS policy, a state agency's implementation, a county navigator's interpretation, and a benefits portal's outdated FAQ. When an LLM processes that question, it is not consulting one clean source of truth. It is working across a fragmented ecosystem where those sources are not designed to talk to each other, let alone to be machine-readable in consistent ways.

That is the interoperability crisis underneath the AI readiness conversation. And it is one I watched develop up close.

At CMS, I saw how even within a single federal agency, systems built in different eras, on different platforms, with different data standards, created enormous friction for the humans trying to make policy work in practice. When I led learner insights research on technology skill gaps among CMS employees, one of the clearest findings was that people were not struggling because the tools were hard. They were struggling because the tools did not speak to each other. Staff were manually bridging gaps that should have been closed at the infrastructure level.

AI does not fix that. It inherits it.

Interoperability Is a Prerequisite, Not a Feature

If government agencies want LLMs to return accurate, trustworthy responses about their programs and services, they have to think about content architecture the way they think about policy architecture: with explicit attention to how pieces connect, who is responsible for each piece, and what happens when two pieces say different things.

This means structured metadata so that a model can understand not just what a page says but what program it governs, what population it serves, and when it was last verified. It means shared data standards across agencies so that an LLM synthesizing across sources is not trying to reconcile apples and org charts. It means building document question-answering pipelines that route queries to authoritative sources rather than averaged guesses.
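
To make that concrete, here is a minimal sketch in Python of what machine-readable content metadata and authoritative routing could look like. The field names and the routing rule are hypothetical illustrations, not an existing federal standard; the point is that the information a model needs lives in structure, not buried in prose.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class ContentRecord:
        """Machine-readable description of one published piece of guidance (illustrative schema)."""
        url: str
        program: str             # which program the page governs, e.g. "Medicaid"
        population_served: str   # who the guidance is for, e.g. "state eligibility workers"
        authority_level: int     # 1 = federal policy of record, 2 = state implementation, 3 = derived FAQ
        last_verified: date      # when a human last confirmed the content is current

    def route_to_authoritative(candidates: list[ContentRecord]) -> ContentRecord:
        """Prefer the most authoritative, most recently verified source for a query,
        rather than letting a model average across everything it has seen."""
        return min(candidates, key=lambda r: (r.authority_level, -r.last_verified.toordinal()))

A pipeline that only ever sees records like this can answer "which source governs this question, and is it current?" before a single token is generated.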

And it means being honest about something the Oxford Commission on AI and Good Governance has put plainly: meaningful human oversight in AI systems is not possible when the underlying infrastructure is opaque. If policy experts cannot trace how an AI-generated response got constructed, if they cannot see which source was weighted and which was ignored, then they are not actually in the loop. They are approving something they cannot read.
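
One way to make that inspectable, sketched below as a hypothetical record format rather than any prescribed audit standard: every generated answer carries the sources it drew on and the candidates it set aside, so a policy expert reviewing it is reading evidence, not just output.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class TracedAnswer:
        """A generated response a human reviewer can actually interrogate (illustrative format)."""
        question: str
        answer_text: str
        sources_used: list[str]      # URLs or record IDs the answer was built from
        sources_excluded: list[str]  # candidates retrieved but not used, so omissions are visible
        generated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))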

I know what it costs when that oversight fails in a benefits system. I have seen it.

Personal Responsibility Does Not Transfer to the Model

Here is what I keep coming back to after years of working at the intersection of policy, technology, and human services. The people who make structural decisions about government AI systems carry accountability that does not disappear when the model goes live. When I helped design data flows at CMS, I could not point to the database and say it made the call. The same is true now.

When an agency decides how its content will be structured, what metadata standards it will adopt, and whether it will invest in retrieval-augmented systems that pull from verified, current sources rather than static training data, those are not merely technical choices. They are decisions about who gets helped and who gets a wrong answer at a moment when it matters. Someone has to own them.
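
A rough sketch of what that retrieval step could look like follows, with the caveat that retrieve and generate are placeholders for whatever search index and model an agency actually runs, and the one-year staleness threshold is an invented policy for illustration.

    from datetime import date, timedelta

    MAX_STALENESS = timedelta(days=365)  # hypothetical policy: a source must have been verified within a year

    def answer_from_verified_sources(question: str, retrieve, generate) -> dict:
        """Retrieval-augmented answering restricted to verified, current sources.
        `retrieve` returns dicts with "text", "url", and "last_verified"; `generate` is the model call."""
        candidates = retrieve(question)
        current = [c for c in candidates if date.today() - c["last_verified"] <= MAX_STALENESS]
        if not current:
            # Refuse to guess rather than fall back on whatever the model memorized during training.
            return {"answer": None, "reason": "no verified, current source found", "sources": []}
        answer = generate(question, context=[c["text"] for c in current])
        return {"answer": answer, "sources": [c["url"] for c in current]}

The refusal branch is the governance decision in miniature: an accountable system says "I do not have a verified answer" instead of producing a confident wrong one.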

I have been working hands-on with LLMs to understand where they perform well, where they fail, and what it takes to deploy them responsibly in high-stakes government contexts. The path forward is not complicated, but it requires clarity about what you are actually building toward.

LLM-ready content. Interoperable data architecture. Human reviewers who can genuinely interpret what the system produced. Those three things together are not a technical checklist. They are a governance posture.

If you are thinking about what your agency's AI future looks like, the time to build the foundation is now, before the model is the one being asked the question your content is not ready to answer.

Let's talk about what it takes to get there.
