A lot of hands touch each product and service that gets built at Code for America. From client research to marketing, everything we create goes through a collaborative process full of feedback loops, iteration, and countless Zoom meetings. We want to home in on one particular part of that process today: how our client success, data science, and qualitative research teams work together to consider the responsible deployment of artificial intelligence (AI) in responding to clients who seek assistance with our products.
During the COVID-19 pandemic, we saw soaring demand for GetCalFresh, our digital application assister for California’s version of the Supplemental Nutrition Assistance Program (SNAP). With more applications came more client requests for help, and with a small client success team at the time, we needed a solution that could ensure all clients received responses in a timely manner. We found one in a chatbot that could handle common, simple client questions and direct them to relevant resources. Considering the sensitive nature of our clients’ needs, we aimed to develop a chatbot that embodies our values of being empathetic and human-centered.
Recently, we sat down with three of our staff members to learn more about their work on this project: Pik Ting Szeto, a Client Success Advocate; Vince Dorie, a Principal Data Scientist; and Jennifer Thom, the Director of Qualitative Research.
Can we hear a bit about the challenge of efficiently handling the volume of client inquiries that the team receives?
Pik Ting: Our human resources are limited, and the demand for our services is expansive—so without any kind of assistance, we’d have a big backlog of requests for help. Currently, a chatbot is a way for us to handle all the “low-hanging fruit”—meaning questions that are commonly submitted and probably only need a simple response. Without the bot, those inquiries would be sitting in our inbox and it would require time and human action to manually handle those cases. When the bot is taking care of those low-hanging fruit, it frees up the people on our team to address questions that are more complex and need a human response.
Vince: Chatbots are commonly used in customer service to handle a large volume of customer inquiries. The word “chatbot” usually evokes a conversational agent or rules-based bot, which is something we specifically did not want to build. A fully fleshed-out chatbot implementation usually replaces a human being or purposefully erects barriers to human interaction. We had no intention of replacing people, but rather wanted to create something that could answer simple questions when the result would be high quality and get out of the way of our support specialists when the result would not. So while our use of automated replies might be called a “chatbot” for simplicity’s sake, it’s not the same experience that people would have with other chatbots out in the world.
How do you start to identify questions that the chatbot should handle—those low-hanging fruit?
Vince: Low-hanging fruit essentially means it’s a question with an answer that is automatable with a high degree of precision—an instance where a bot can provide a simple answer and then the client says “thanks” and signs off. That’s a safely closed inquiry. Finding what questions make for good low-hanging fruit is all about patterns—looking for things that repeat. Our client success team was already using macros—aka templated replies—to respond to some common questions that are phrased in a similar fashion. For example, “I want to cancel my application,” “I lost my EBT card,” “I don’t have my ID,” and “What is my case number?” are some common inquiries. Those are easy ones to have the bot handle because it just requires a response with a resource.
Jennifer: I’ve been involved in building conversational interfaces at other organizations, and one of the challenges is that you have to have a lot of data that’s labeled in a certain way so that the chatbot can understand it. One thing that’s notable at Code for America is how much care and consideration was put into leveraging existing processes to create high-quality data sets. So we had some existing client success workflows that involved data labeling, and Vince ran with those to identify what a chatbot should handle.
Vince: Data labeling is unglamorous but vital work, and a lot of machine learning experts have found a lot of ways to separate themselves from that work. It looks like a machine but it’s a human inside—a human you don’t ever see. So when we were building our chatbot, that’s why we focused on existing workflows. We knew it might lead to a less capable bot, but that was an acceptable tradeoff to not make anyone go through data review like that.
What makes our use of an AI chatbot human-centered?
Vince: I think that a lot of people’s understanding of chatbots is that they get between you and the person you want to talk to, and a lot of these experiences are justifiably frustrating. With the popularization of ChatGPT and large language model-based chatbots, the issue is that there are a lot of examples of bots that at least sound plausibly like human beings, so you can’t tell when they’re giving you the wrong answer. I don’t think we’ve collectively reckoned with how wrong they are sometimes. In our work on the chatbot for GetCalFresh, we chose to be very conservative (hence the focus on low-hanging fruit) because we didn’t want a bot providing wrong answers. If we were somewhere else, maybe we’d have the bot handle more than that, but that comes with more risk. Focusing on low-hanging fruit ensures we only program the bot with high-confidence responses. That means we’re able to provide assistance for basic inquiries even outside our client success working hours. For example, the bot can be available 24/7 to send a client their county help line number. If client success volume is particularly high and the backlog is overflowing, it is easy to dial back the precision and let the bot handle more conversations.
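For readers who want to picture the tradeoff Vince describes, here is a minimal sketch of a confidence cutoff. It is not the GetCalFresh implementation: the `Prediction` class, the `choose_reply` function, the macro names, and the threshold values are all hypothetical, and we assume only that some model scores each incoming message against a templated reply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    macro: str         # hypothetical name of a templated reply
    confidence: float  # hypothetical model score between 0 and 1


def choose_reply(prediction: Prediction, threshold: float = 0.95) -> Optional[str]:
    """Return the macro to send automatically, or None to leave the message for a person."""
    if prediction.confidence >= threshold:
        return prediction.macro
    return None


# Two made-up messages: one the model is very sure about, one it is less sure about.
county_line = Prediction(macro="county_help_line", confidence=0.97)
unclear = Prediction(macro="cancel_application", confidence=0.80)

# With the conservative default cutoff, only the high-confidence message is automated.
normal = [choose_reply(p) for p in (county_line, unclear)]

# Lowering the cutoff during a backlog lets the bot handle more, at more risk of a wrong answer.
backlog = [choose_reply(p, threshold=0.75) for p in (county_line, unclear)]
```

The single `threshold` parameter is the dial in this sketch: a higher value means fewer automated replies and fewer mistakes, a lower value means the opposite.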
Jennifer: We rely on the expertise of the people on our client success team to make sure the client experience with our tools is a positive one, and we think that there should always be someone mediating between these models and the people using them. One of our principal values at Code for America is that we put people first—and how that shows up here is that the chatbot does not engage in a back-and-forth. The more back-and-forth there is, the more things can go wrong. So we have it set so that if the client responds to the bot, they immediately get routed to a human.
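The no back-and-forth rule Jennifer describes can be sketched in a few lines. This is an illustrative assumption about how such a rule might look, not the team's actual code: the `Conversation` class, the `handle_incoming` function, and the macro name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Conversation:
    messages: List[str] = field(default_factory=list)
    bot_has_replied: bool = False
    assigned_to_human: bool = False


def handle_incoming(convo: Conversation, text: str, auto_reply: Optional[str]) -> str:
    """Decide who handles a client message: the bot at most once, a person otherwise."""
    convo.messages.append(text)
    # Any client message after a bot reply, or any message with no safe
    # automated answer, goes straight to a person.
    if convo.bot_has_replied or auto_reply is None:
        convo.assigned_to_human = True
        return "human"
    convo.bot_has_replied = True
    return "bot"


convo = Conversation()
first = handle_incoming(convo, "What is my case number?", "case_number_macro")
second = handle_incoming(convo, "That didn't answer my question", "case_number_macro")
```

In this sketch the bot never gets a second turn, which keeps the number of ways a conversation can go wrong small.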
Pik Ting: What Vince and Jenn said is spot on. Rather than rushing into adopting AI technologies, we’ve been trying to strike a careful balance between using the bot and using people. So for every step that we implement, we aim to “think like our clients” and envision if and how this might affect their experience. We also have a dashboard where we can review all bot actions and intervene to make adjustments or correct them if needed. The important thing is that clients always have an option for human interaction. If the bot can’t get them an answer, we’re here.
What does collaboration between your three departments look like when you work on a project like this?
Jennifer: The qualitative research team got to work really closely with the client success team on this project so that we could analyze common client messages and understand how the client success team handles those queries. They’re working on the ground and we’re seeing big picture data—and by working together and combining our micro and macro approaches, we get a really holistic understanding of how people experience the process of applying for benefits. We also got to work with the data science team, so they can track large scale patterns while the qualitative researchers dive deep into understanding behaviors and motivations. Collaborations like this happen between teams all the time. I’m relatively new to Code for America, so this project helped me understand how different disciplines have been contributing to human-centered AI projects—on this and other projects, it’s never just one group or team or decision maker. Our knowledge and skill sets combine and enhance each other’s ability to create products that improve the client experience. The future of these kinds of collaborations is interdisciplinary.
Pik Ting: Vince has been instrumental in helping me learn about the process of maintaining a chatbot, even though I don’t know a lot of the technical terms around AI and machine learning. I always really enjoy our conversations about how we might improve the bot—I’ll notice something and make a suggestion based on review of the bot actions and client messages, and then Vince can talk me through how complex a change would be to make as well as the risk involved. I also regularly meet with Vince and Gwen Rino, another data scientist on our team, to talk about patterns, synthesize perspectives, and integrate these two to find solutions or make improvements. Each of the disciplines brings unique knowledge to the table and this makes the collaboration mutually enriching. The whole becomes larger than the sum of its parts!
Vince: Having a free flow between our teams like this has been really rewarding. Pik Ting raises such good questions, and when I can provide an insight, it’s a moment where I can say “that’s why I’m here.”
What learnings will we take away from this collaboration?
Vince: When we first started this project, people were very skeptical of AI. Now everyone is rushing in as fast as they can—but we need to do it cautiously. I think this project has always been time-limited work; I’m not sure that the state is going to adopt this tool more broadly. The important thing here is that we learned a lot of things along the way.
Jennifer: At Code for America, we’re always trying to learn from the things we’ve done before. We’re using this collaboration as a case study for other human-centered practices we engage with both internally and in our work with other government partners. There’s so much to share here with regard to taking the time to deeply understand client needs, being thoughtful about training AI models, using responsible data practices, and iteratively evaluating these experiences. It’s not just something you deploy and forget about. This was a good exploratory case to think about how we would use similar technologies in other government services scenarios.
Pik Ting: Leveraging the strengths of each discipline amplifies the overall impact we want to make, and I think this collaboration shows that really well. I’m excited about a future in which we have opportunities to share our learnings on this project so that we can continue to explore how to responsibly adopt AI strategies while staying human-centered in government.