Five Questions to Ask When Evaluating Technical Products
Working in a state benefits program like Medicaid or the Supplemental Nutrition Assistance Program (SNAP) means constantly making choices related to technology. Rapid technological developments, including new generative AI systems, have unlocked new possibilities for service delivery and operational efficiency. These opportunities come at a time when states are under extreme pressure to make major changes to meet the work requirement rules and payment error reductions mandated by H.R. 1. The result is a flurry of new products and offerings to sort through. With a clear, validated problem statement, agencies can identify the solutions that measurably advance their goals.
Code for America’s deep technical expertise, rooted in government experience and a human-centered perspective, has informed five questions state agencies can use to evaluate products and offerings from vendors.
Question 1: How does this product fit into the existing landscape of your work?
Start by clearly defining the problem: identify the desired outcome, who is affected, and how improvement will be measured. Use recent data and frontline feedback to anchor your evaluation to a shared, measurable goal. Once that goal has been defined, the product can be evaluated in context.
The next step is to evaluate your state agency’s resources across technical and non‑technical areas. Does the product require access to data? If so, what kind? Do you have access to that data? For AI-based tools, would this product augment or empower your team and complement their existing background and skill set? How does this product fit into the kind of work your agency does, and with the people who do that work?
If your organization doesn’t have the technical or non-technical resources needed to immediately pick up a product, that doesn’t mean you can’t eventually implement it. But asking these questions will illuminate the gaps that will need to be filled during adoption: preparing data systems, for example, or hiring for specific missing skill sets.
Question 2: What’s the success rate of this product?
It’s easy to make promises about the capabilities of a product. It’s much harder—and more important—to back up those promises with data. No product is perfect. You should request that any product come with data indicating how often it produces a successful result. You should also request details about how that data was produced and in what context.
For example, let’s say a vendor approaches a state with a document upload tool. It’s easy to imagine the marketing pitch here: This tool will let clients upload documents from any device, with no added burden. You can immediately begin clarifying the picture with requests for data:
- In testing, how often did clients successfully upload documents? How often did they fail?
- How long, on average, did clients spend using the tool?
- On what devices were clients most successful? On what devices were they least successful?
- How did the tool perform among non-English speaking clients? Blind or low-vision clients? Clients receiving assistance from others? Clients using the tool alone?
Strong partners share transparent evidence and context about performance, including clear goals, success metrics, methods, and limitations, so you can assess fit in your environment. But having this data is merely the beginning of decisionmaking. Applying it to your specific context comes next. Does this tool align with the needs of the populations you serve? Is the success rate high enough overall to be worth the investment?
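To make that second step concrete, here is a minimal sketch of reweighting vendor-reported results by your own caseload mix. All of the numbers and device categories below are hypothetical placeholders, not data from any real product or agency.

```python
# Illustrative sketch: reweight vendor-reported success rates by your own caseload mix.
# Every figure below is a hypothetical placeholder, not data from any real vendor or agency.

vendor_success_rate = {   # vendor-reported upload success rate, by device type
    "smartphone": 0.91,
    "desktop": 0.96,
    "tablet": 0.88,
}

your_caseload_share = {   # share of your clients on each device type (should sum to 1.0)
    "smartphone": 0.70,
    "desktop": 0.20,
    "tablet": 0.10,
}

# Weighted average: the success rate you would expect given who you actually serve
expected_rate = sum(
    vendor_success_rate[device] * share
    for device, share in your_caseload_share.items()
)

print(f"Expected success rate for your caseload: {expected_rate:.1%}")
```

Even a rough reweighting like this can show how a vendor’s headline average shifts once it reflects the people your program actually serves.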
Question 3: What underlying systems does this product rely on, and what kind of support is offered to make sure those systems are transparent?
The building blocks of a system should be immediately obvious when evaluating a product. Partners should be able to discuss these fundamentals—which are essential to any technical product—without difficulty. If details aren’t available yet, co‑design a time‑boxed pilot or proof‑of‑concept to close gaps and build shared confidence. Depending on the context and implementation specifics of the product, these questions may be useful:
- Is this product cloud-hosted? If so, where? Will we need to host it in our cloud, or will the vendor provide a cloud environment? If that cloud environment needs to change, how difficult is that transition?
- If this is an LLM-based product, which models and vendors is it using? How large is the model being used? Why is it using that particular model over others? How will the product’s pricing model adapt to changes in the LLM’s pricing model? How is the product’s architecture designed to leverage future AI advancements? How will the product adapt to model advancements from the upstream model vendor?
- What support will be provided for upgrading/maintaining this product? Is this a product that an existing vendor can modify and upgrade? Will the product vendor provide ongoing support (such as regular software updates, security patches, troubleshooting help, and training sessions)? What training is needed to help staff understand the product’s technical structure and how its main components work together?
Question 4: How can you evaluate the impact of this product?
Ultimately, you are evaluating a product for a specific goal. That may be reduced work for caseworkers, a better client experience, a decrease in error rates, cost savings over time, or another goal. Evaluate products with the scale of their anticipated impact on these specific goals in mind. Use vendor-provided data to estimate that impact: Given the cost of the product, does its reduction in caseworker burden result in net savings for your agency? The calculations may require some effort, but even an estimate can go a long way in your evaluation.
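As one way to make that estimate, here is a rough back-of-the-envelope sketch. Every figure in it is an assumption for illustration only; you would substitute your agency’s own numbers.

```python
# Back-of-the-envelope net-savings estimate for a caseworker-burden-reduction tool.
# Every value here is a hypothetical assumption for illustration, not real agency data.

annual_product_cost = 250_000        # licenses, hosting, training (USD per year)
caseworkers = 400                    # staff who would use the tool
hours_saved_per_week = 1.5           # estimated time saved per caseworker per week
loaded_hourly_cost = 45              # salary plus benefits and overhead (USD per hour)
weeks_per_year = 50

annual_savings = caseworkers * hours_saved_per_week * loaded_hourly_cost * weeks_per_year
net_savings = annual_savings - annual_product_cost

print(f"Estimated annual savings: ${annual_savings:,.0f}")
print(f"Net impact after product cost: ${net_savings:,.0f}")
```

Even an estimate this simple makes the trade-off explicit and gives you a number to test against the vendor’s own claims.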
The focus in all of these conversations should be on the people the program serves. This can happen in multiple ways: a cost-saving tool frees budget for additional client experience improvements, for example, or an LLM-based tool improves document processing while also analyzing client feedback patterns to proactively identify new service gaps. Reducing caseworker burden is a critical metric, but the end result should be better outcomes for clients. In addition, take this opportunity to evaluate whether the product could unlock new capacity or create new value for your agency.
Question 5: How will the product be evaluated over time to see if it’s performing well?
Choosing the right technology isn’t only about what works today. It’s essential to make sure that your systems keep working over time. This means regularly checking performance and having a plan for growth. Ask the vendor for simple, understandable ways to track how the product is doing, ideally with regular reports or real-time dashboard views anyone on your team can read. Focus on these key areas:
- Reliability: Does the system stay online and work when you need it? Ask about uptime (how often it’s working) and how fast issues get fixed.
- Speed: Does it respond quickly, even when lots of people are using it?
- User experience: How is staff and user satisfaction measured, and are they satisfied? Are they using the system?
- Data quality: Is the information accurate and up-to-date?
- Support help: How quickly does the vendor respond to problems or questions?
- Ongoing costs: Are expenses (like licenses or cloud hosting) staying predictable as your use grows?
Each of these metrics makes it easier to know if the system is doing its job and spot problems early.
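If a vendor dashboard isn’t available on day one, even a small script over exported health-check data can start the habit of tracking reliability and speed. The sketch below assumes a hypothetical export format; the field names and values are illustrative, not any vendor’s actual output.

```python
from statistics import median

# Hypothetical health-check records exported from a monitoring tool.
# The field names and values are illustrative assumptions, not a real vendor format.
checks = [
    {"up": True, "response_ms": 240},
    {"up": True, "response_ms": 310},
    {"up": False, "response_ms": None},   # an outage sample
    {"up": True, "response_ms": 180},
]

# Uptime: share of checks where the system was reachable
uptime = sum(1 for c in checks if c["up"]) / len(checks)

# Speed: median response time across successful checks
response_times = [c["response_ms"] for c in checks if c["response_ms"] is not None]

print(f"Uptime: {uptime:.1%}")
print(f"Median response time: {median(response_times)} ms")
```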
Pilots and proofs of concept
While the questions above can provide context on a solution in general, it is much more valuable to understand how a product will work for your particular agency. Pilots and proofs of concept, defined up front with clear success metrics, measurement methods, and timelines, let you validate value in your context before scaling. A pilot is a small-scale, in-production experiment used to determine whether a solution could yield benefits in a larger implementation. A proof of concept is even smaller: working outside the production context, it demonstrates whether an aspect of the solution is implementable at all.
Asking for pilots or proofs of concept can be an ideal way to ensure that the solution works not only for the average case, but for your specific agency. While these generally take some time to implement, they can both mitigate the risk of procuring a solution that doesn’t fit with your system and empower staff to discover new, more effective ways of delivering services.
A clear picture at the beginning gives your solutions long-term viability
The pressure on states to adapt to changing circumstances is constantly increasing. New services or products can seem like simple solutions to high-urgency problems, but careful evaluation is critical from the very beginning. By asking the questions outlined here, states can confidently select products that meet operational needs, serve users effectively, and support their mission both now and in the future. This approach helps ensure investments deliver lasting value and impact.