Our AI Solution to Government’s PDF Problem

We built an AI prototype that can help government break down accessibility barriers

State and local governments rely on their websites as the front door for their constituents. It’s where people get information about programs, check their eligibility for services, and apply for the things they need. But too often, government websites aren’t as easy to navigate as they could be.

There’s one major pain point that prevents many government agencies from having a website that is truly accessible: PDFs. These documents lack critical accessibility features like screen reader compatibility, alt text for images, searchability, and adaptability to different screen and text size preferences. For the one in four Americans (over 70 million people) with a disability, that’s a huge problem.

PDFs can be the barrier between someone and the service they need to access. We believe that AI has huge potential to help government tackle this challenge. That’s why we built our AI PDF prototype.

How big is the PDF problem?

In 2023, the federal government reported 4 billion PDF downloads across agencies. The most frequently downloaded PDFs are forms that help people receive essential government services. But 71% of PDFs from federal agencies have at least one accessibility issue, which can ultimately prevent people from getting what they need. For instance, untagged PDFs with embedded images and form fields show up as blank images for people who use screen readers.

The PDF problem has become urgent because of a rapidly approaching deadline: April 26, 2026. By this date, cities and states with more than 50,000 residents need to bring their websites into compliance with the Americans with Disabilities Act (ADA) so that web content and mobile apps are accessible to all. This includes everything that can be downloaded from government websites, such as application forms, maps, and brochures. Even if a state or local government has made the rest of its website accessible, inaccessible PDFs remain the final barrier to full compliance.

How can AI address this challenge?

Code for America formed our AI Studio as a nimble, tech-forward squad to pilot AI-enabled solutions in partnership with innovative government agencies. Based on our experience partnering with governments to effectively deliver human-centered services, we saw huge potential for AI in this space—but first, we had to understand the full scope of the problem.

We began by talking to those who understand it best: innovation officers, content managers, and digital service strategists who are managing the accessibility efforts of their government websites. We learned that governments are at varying stages of readiness in standing up workflows for auditing their PDFs.

Throughout the process, we worked closely with Rebecca Woodbury, founder of the Department of Civic Things. Rebecca had been conducting a manual PDF audit for a city and documenting her learnings along the way, providing us critical insights into where AI could plug in.

Together, we identified the need for a tool that would support website managers as they developed their processes for auditing PDFs. Our prototype is a web application where users can inspect the PDFs in their web domain and be provided with AI-enabled support so that they can make decisions on what to do with the PDF content.

How does the prototype work?

First, the tool collects all of the public PDFs from a city or state web domain using a script called a web crawler. By gathering all of the PDFs into one place, website managers get a better sense of how many PDFs they have within their web domains to help them estimate the magnitude of the problem. 
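To make the crawling step concrete, here is a minimal sketch of how a script might pull PDF links out of a single web page. It is an illustration only, not the prototype's actual code: the names `PdfLinkParser` and `find_pdf_links` and the `example.gov` addresses are hypothetical, and a full crawler would also need to fetch pages, follow non-PDF links, and respect each site's crawling rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects links from <a> tags whose URL path ends in .pdf (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they were found on.
                absolute = urljoin(self.base_url, value)
                # Check the path only, so query strings like ?v=2 are ignored.
                if urlparse(absolute).path.lower().endswith(".pdf"):
                    self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    """Return the unique PDF URLs linked from one page, in the order found."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return list(dict.fromkeys(parser.pdf_links))


page = (
    '<a href="/docs/agenda.pdf">Agenda</a>'
    '<a href="about.html">About</a>'
    '<a href="https://example.gov/maps/Zoning.PDF?v=2">Zoning map</a>'
    '<a href="/docs/agenda.pdf">Agenda (again)</a>'
)
pdf_links = find_pdf_links(page, "https://example.gov/city/")
print(len(pdf_links), "PDFs found")
```

Running this on every page of a domain and pooling the results would give the kind of single inventory described above.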

When we collect the PDFs through the web crawler script, we also keep track of the associated data, such as the title of each PDF, when it was created, and who created it. We use that information to train a machine learning algorithm to classify each PDF so a user can tell at a glance whether it is a meeting agenda, a map, or a form.
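As a rough illustration of classifying PDFs from their metadata, the sketch below trains a small naive Bayes classifier on document titles alone. This is an assumption made for the example: the specific algorithm behind the prototype is not described here, and `TitleClassifier`, the labels, and the training titles are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a title and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class TitleClassifier:
    """Multinomial naive Bayes with add-one smoothing over title words (illustrative)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocabulary = set()

    def train(self, examples):
        for title, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(title):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, title):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Start from the log prior, then add each word's log likelihood.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(title):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training titles, for illustration only.
classifier = TitleClassifier()
classifier.train([
    ("City Council Meeting Agenda March", "agenda"),
    ("Planning Commission Agenda", "agenda"),
    ("Zoning District Map", "map"),
    ("Bike Route Map", "map"),
    ("Business License Application Form", "form"),
    ("Permit Request Form", "form"),
])
print(classifier.predict("Council Agenda April"))
```

A real audit would need far more labeled examples, and could add the creation date and author as extra signals alongside the title.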

The prototype can provide short summaries of PDFs so they can be quickly reviewed.

We decided to leverage the strengths of generative AI and large language models to address major workflow pain points. PDFs can be hundreds of pages long and generative AI/large language models shine at summarizing long, complex documents. If someone is going through PDFs and needs a quick description, we provide an option for a user to generate a short text summary, which can also be used to generate alt text for images.

Our tool also supports different ways for a user to explore the PDFs. If a user knows the name of the file they’re looking for, they can search by name. If someone wants to investigate a type of PDF like meeting agendas, they can filter by file type. To separate quick wins, we provide a way to sort simple documents from complex ones. We also help teams stay organized by enabling users to sort their PDFs by whether or not they’ve been reviewed.
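These ways of exploring an inventory can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: `PdfRecord` and its fields are hypothetical, and page count stands in for document complexity, which the prototype may measure differently.

```python
from dataclasses import dataclass


@dataclass
class PdfRecord:
    """One PDF in the inventory (hypothetical fields)."""
    name: str
    category: str
    page_count: int
    reviewed: bool = False


def search_by_name(records, query):
    """Case-insensitive substring match on the file name."""
    needle = query.lower()
    return [r for r in records if needle in r.name.lower()]


def filter_by_category(records, category):
    """Keep only PDFs of one type, such as meeting agendas."""
    return [r for r in records if r.category == category]


def quick_wins_first(records):
    """Unreviewed PDFs first, shortest first, so simple documents surface early."""
    return sorted(records, key=lambda r: (r.reviewed, r.page_count))


inventory = [
    PdfRecord("Council-Agenda-2024-03.pdf", "agenda", 4),
    PdfRecord("Zoning-Map.pdf", "map", 1, reviewed=True),
    PdfRecord("General-Plan.pdf", "report", 240),
    PdfRecord("Permit-Form.pdf", "form", 2),
]
for record in quick_wins_first(inventory):
    print(record.name, record.page_count, record.reviewed)
```

Sorting on a `(reviewed, page_count)` pair keeps the two ideas, review status and complexity, in a single ordering that a team can work through from the top.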

The prototype’s analytics dashboard can help PDF auditors track progress.

Some PDFs meet the ADA policy criteria for an exception, meaning they can be archived or removed rather than updated to comply with accessibility criteria. If a PDF doesn’t meet the criteria, it needs more in-depth follow-up. For reviewers who aren’t familiar with the exception policy, our tool offers an “AI exception checker” that uses a large language model to summarize the content of the PDF and then compare that summary with the policy to see whether it meets the criteria for an exception. Ultimately, the reviewer can use this comparison to make their own informed decision on what to do next.
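The summarize-then-compare flow might look something like the sketch below. It is an assumption-laden illustration rather than the prototype's implementation: `check_exception` and `build_exception_prompt` are hypothetical names, the prompt wording is invented, and `llm` is a placeholder for whatever function sends a prompt to a language model and returns its text reply. The policy text is passed in by the caller rather than written into the code.

```python
def build_exception_prompt(pdf_summary, policy_text):
    """Assemble the comparison prompt (wording is illustrative only)."""
    return (
        "You are helping a reviewer audit PDFs for accessibility compliance.\n\n"
        f"Exception policy:\n{policy_text}\n\n"
        f"Document summary:\n{pdf_summary}\n\n"
        "Explain whether the document appears to meet the exception criteria, "
        "and point to the parts of the policy that apply."
    )


def check_exception(pdf_text, policy_text, llm):
    """Summarize the PDF, then ask the model to compare the summary with the policy.

    `llm` is any callable that takes a prompt string and returns a string.
    The result is advice for a human reviewer, not a final decision.
    """
    summary = llm("Summarize this document in three sentences:\n\n" + pdf_text)
    comparison = llm(build_exception_prompt(summary, policy_text))
    return {"summary": summary, "comparison": comparison}


# A stand-in model so the sketch runs without any external service.
prompts_sent = []


def fake_llm(prompt):
    prompts_sent.append(prompt)
    return f"stub response {len(prompts_sent)}"


result = check_exception("Minutes from a past meeting.", "POLICY TEXT GOES HERE", fake_llm)
print(result["comparison"])
```

Returning both the summary and the comparison, rather than a yes or no, keeps the reviewer in charge of the final call.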

Finally, we discussed with our partners how a high-level overview of progress could help them manage their work. We developed an analytics dashboard within the app to show how far along they are in their PDF audits.
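The kind of number a progress dashboard reports can be computed with a simple tally. The sketch below is a hypothetical example: the `audit_progress` function and the `category` and `reviewed` fields are assumptions for illustration, not the prototype's data model.

```python
from collections import defaultdict


def audit_progress(records):
    """Count total and reviewed PDFs per category, with percent reviewed."""
    totals = defaultdict(lambda: {"total": 0, "reviewed": 0})
    for record in records:
        bucket = totals[record["category"]]
        bucket["total"] += 1
        if record["reviewed"]:
            bucket["reviewed"] += 1
    return {
        category: {
            **counts,
            "percent_reviewed": round(100 * counts["reviewed"] / counts["total"], 1),
        }
        for category, counts in totals.items()
    }


# Made-up audit records, for illustration only.
progress = audit_progress([
    {"category": "form", "reviewed": True},
    {"category": "form", "reviewed": False},
    {"category": "map", "reviewed": True},
])
for category, counts in progress.items():
    print(category, counts)
```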

Our AI tool is open source and available to governments conducting their own PDF audits. Explore more here.

What’s next?

We’ve launched the prototype with two partners: Salt Lake City, UT and the state of Georgia. We’re learning from them as they design their PDF audit process and workflows, supported by this system. Those learnings will help us iterate on our tool. We’re partnering closely to make sure that they reach the last mile of their accessibility journey before the ADA deadline next April. 

We know that AI has a lot of different use cases in government: conversational interfaces such as chatbots to triage and manage client success workload, data deduplication to reduce staff processing time, and document intelligence to help government agencies extract meaningful insights from their documents. We believe that PDF auditing could be the next major implementation, and we look forward to supporting governments looking to tackle their PDF problem.

Interested in improving web accessibility, or want to have a conversation with the AI Studio team about the AI use cases you’re considering? Reach out to us.
