How to Start Small with AI Research Experiments

Identifying areas for experimentation where AI could be a big help—even on small projects

At Code for America, we’re leveraging cutting-edge technology to advance our commitment to human-centered design, data-driven solutions, and making government work well for everyone. Whether it’s building a chatbot to improve application assistance or making criminal record clearance more efficient, we are constantly exploring new tools to deliver the best outcomes for clients and government teams.

Right now, there’s a lot of focus on the potential role of artificial intelligence (AI) in government and civic tech projects—like using machine learning algorithms to simplify the process of renewing safety net benefits, or using optical character recognition to scan handwritten documents and make it easier for caseworkers to process them. We’re curious about another use: how AI could help researchers surface critical insights.

Large language models (LLMs) like GPT are well suited to analyzing large datasets. In our context, where we work with large datasets on client experience, our qualitative research team has been working to identify where AI can support research analysis, where it hinders it, and how it can be a good partner.

Through our experiments, we’ve found that AI helps summarize, analyze, and extract insights on a fairly surface level—and is therefore good at capturing patterns for researchers to contextualize and use in designing recommendations. Based on that, we’ve outlined the answers to two big questions for researchers interested in using AI to analyze data sets: where to start, and how to mitigate risk.

Where to start when using AI to analyze data sets

As with any tool, AI should be introduced only when it meets a specific need. A great place to start is by identifying inefficiencies or bottlenecks in existing practices and datasets, such as large surveys, live chat messages, emails, text messages, or other sources that require frequent review but may be limited by staff capacity. In industry terms, these data points are often referred to as Voice of the Client (VOC) data, where people share problems they’re encountering, provide feedback, and seek further help. This data is invaluable for service and product improvement, but running sentiment analysis, highlighting common pain points, and generating concise summaries of the most frequently mentioned topics can be extremely time-consuming.

We recommend experimenting with AI by testing the output of an AI model against a completed project. This way, researchers can learn how to prompt a model to create the kind of output they’re looking for. By leveraging AI for these tasks, researchers can focus on higher-level interpretation and decision-making. Here’s what that process could look like:

  1. Select your dataset and format it in a legible way (for example: date, user, inputs).
  2. Consider your most pressing research questions and draft prompts such as:
    • Identify and prioritize clients’ key needs and challenges to guide the design of user-focused solutions.
    • Analyze and categorize common responses to establish trends and priorities.
    • Delve deeper into the nature of customer inquiries to provide a detailed understanding of recurring themes.
    • Develop actionable recommendations for implementing customer-centric strategies that address the most pressing issues identified.
  3. Run some of your own manual analysis and select random data points to compare with the AI’s analysis. Sometimes AI “hallucinates,” or fabricates incorrect information, so make sure to review for quality.
  4. Treat your AI as a thought partner and continue to iterate on prompts to get the answers you need (see the sketch after this list). For example, AI might identify several common pain points like “Clients commonly get stuck resetting their login and password information.” You might follow up with prompts that ask it to generate specific examples and quotes.
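
To make these steps concrete, here’s a minimal sketch of what the workflow could look like in Python, assuming the OpenAI client library and a CSV export of chat messages. The file name, column names, model, and prompt are illustrative, not a prescribed setup.

```python
# A minimal sketch of steps 1-4 above, using the OpenAI Python client.
# The file name, column names, model, and prompt are illustrative.
import csv

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: load a dataset formatted legibly (date, user, input columns).
with open("voc_messages.csv", newline="") as f:
    rows = list(csv.DictReader(f))
messages = "\n".join(f"{r['date']} | {r['user']} | {r['input']}" for r in rows)

# Step 2: draft a prompt around a pressing research question.
prompt = (
    "Analyze and categorize the client messages below. Identify the most "
    "common pain points, and include a representative quote for each.\n\n"
    + messages
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)

# Steps 3 and 4: spot-check the output against your own manual analysis,
# then iterate on the prompt (e.g., ask for more examples and quotes).
```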

Understanding client needs in Connecticut

In Connecticut, Code for America worked on deploying a text message system to support SNAP renewals. As thousands of text message responses came in, we wanted to identify the top concerns clients were raising. AI produced analyses comparable to our own on common pain points, top reasons clients sought assistance, and the most common technical issues. It also suggested similar technical interventions, such as providing multiple communication channels for client support, collaborating with community partners, and expanding self-service options.

Ways to mitigate risks associated with analyzing data using an AI model

The first risk we addressed in our experiments was data security and privacy. When using an AI model, handle data in a way that adheres to your organization’s terms of use, and exclude any personally identifiable information (PII) that is not absolutely necessary for analysis. PII includes things like addresses, names, contact information, and any other unique identifier that can be used to trace clients’ identities back to them. There are tools that can automate this process by redacting PII from large datasets (our team used Amazon Comprehend). Not all AI models have the same privacy and security measures; some have higher standards built in.
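
Since Amazon Comprehend is the tool our team used, here’s a rough sketch of what automated PII redaction could look like with its detect_pii_entities API via boto3. The region, confidence threshold, and sample text are placeholders, not a production configuration.

```python
# A sketch of automated PII redaction with Amazon Comprehend via boto3.
# The region, confidence threshold, and sample text are hypothetical.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

def redact_pii(text: str, min_score: float = 0.9) -> str:
    """Replace PII spans detected by Comprehend with their entity type."""
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]
    # Work backward through the string so earlier offsets stay valid.
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if e["Score"] >= min_score:
            text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text

print(redact_pii("My name is Jane Doe and my email is jane@example.com."))
# e.g. "My name is [NAME] and my email is [EMAIL]."
```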

The second risk we addressed in our experiments was data validity and reliability. To make sure that the AI model we used reached the same conclusions we would, we chose to recreate past project analyses and compare them against our own completed deliverables. We tested various prompts to do this: counts of how often various types of client feedback came up, which we compared against the tags in our code books, and summaries of key themes, which we compared against our own affinity maps and synthesis. Reviewing the AI outputs against our own allowed us to refine our prompts to get results that most closely mirrored our own, spot hallucinations, and ensure our full dataset was being analyzed.
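
As an illustration of that comparison step, here’s a minimal sketch that checks AI-generated feedback counts against a human code book and flags tags that appear on only one side. The tag names and counts are entirely hypothetical.

```python
# A sketch of checking AI-generated feedback counts against a human code book.
# Tag names and counts are entirely hypothetical.
from collections import Counter

human_tags = Counter({"login issues": 42, "document upload": 31, "wait times": 18})
ai_tags = Counter({"login issues": 45, "document upload": 28, "missed deadline": 6})

for tag in sorted(set(human_tags) | set(ai_tags)):
    h, a = human_tags[tag], ai_tags[tag]
    note = "  <- review: possible hallucination or missed theme" if min(h, a) == 0 else ""
    print(f"{tag}: human={h}, ai={a}{note}")
```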

Researchers may decide against the use of an AI tool if the process does not yield the intended results for the task at hand—that’s okay! AI models and their capabilities are rapidly evolving, so continuous improvement and refinement to documented processes should be expected.

Integrating AI into a research practice

Researchers within a team or organization likely have varying degrees of experience and confidence using AI models. But by following the simple steps outlined above, research teams should feel empowered to lead experiments, using their expertise in data analysis and synthesis to evaluate and refine AI-generated output. Starting with small experiments can help researchers identify where and when in the research process an AI model could help. The key is to approach AI as a partner in research, complementing the team’s expertise rather than replacing it.
