Behind the scenes with OverflowAI Search

In late July, we announced OverflowAI, a set of exciting new features that includes OverflowAI Search. This post takes you through the research, activities, and milestone decisions that led to the alpha we recently launched.

The emergence of new generative AI tools and technology has shifted how users search for information. It is now easier to find answers instantly and with little friction, through a natural, conversational experience for refining queries. Access to this new way of finding solutions has changed users' expectations.

These AI tools are still relatively new, and they come with some known issues. For example, LLMs are known to hallucinate to fill knowledge gaps, which raises concerns about accuracy. Meanwhile, users still trust Stack Overflow over LLMs because of its content quality and the large number of human technical subject matter experts on the site.

These changes in user expectations and behaviors introduced an interesting problem to solve. What could we do to reduce friction in finding an answer to a question while grounding that answer in trusted content?

Expectations for finding an answer quickly and effectively are higher than ever, and there is less patience for slower methods. We had an opportunity to meet those expectations and to improve the way users search for answers on Stack Overflow.

We put together a strategy focused on what we could change in the product to help our users more effectively and reduce the frustration of trying to solve a problem. Our objective with this strategy was to drive user retention and engagement in a way that was useful and meaningful for our users. To accomplish this, we kicked off a design sprint.

A design sprint allows a cross-functional team made up of product managers, designers, engineers, researchers, and community managers to quickly work through challenges, ideate on solutions, and build and test prototypes with users, all in five days.

During the sprint, our goal was “reducing friction in finding an answer to a question.” Sprinters used the following metric to signify that the goal was being met:

Decreased time to get or find an answer

The group explored the current user journey, mapped “how might we” questions to that journey, and then identified the best areas to focus on. Those areas were two loops that we believe cause the most friction for the average user:

Reviewing each piece of content to decide if it could be helpful/relevant > looking at related questions
Testing solutions > Refining a question/query

We knew from past research that these loops and points of friction are real, and that the size of the problem dictates how much time users spend in a loop. We also knew that the ability to articulate a problem is a skill technologists develop through practice. So the group asked themselves:

How might we help users identify the content they really need, more quickly?
How might we make the back and forth of refining a question and getting an answer smoother?

Knowing that our problem involved helping users find content more quickly, we dug deeper to better understand the current state of the search feature and its top limitations:

Complexity and confusion: Users often struggle with Stack Overflow's search interface, and some even need guides on how to use it effectively. Results can be imprecise, and looking through them can be cumbersome.
Duplicate questions: Because of poor search results and relevance, users who cannot find existing answers ask duplicate questions on Stack Overflow. These duplicates are then closed, which leads to a poor user experience.
Dependency on external tools: The current search experience on Stack Overflow often falls short of users' expectations for search precision and relevance, prompting them to turn to external search engines.
Changing user expectations of content discovery: The rise of AI tools like ChatGPT is changing how users prefer to obtain information. More users are turning to AI for quick answers, and their patience for sifting through search results may be diminishing.

If you are familiar with design sprints, you know that a group works through a lot of ideas. After much brainstorming, iteration, and refinement, we narrowed our focus to the following set of problems, goals, and solutions. You can read more about them in the OverflowAI Search announcement post.

Problem: Users have difficulty finding relevant answers on Stack Overflow. Goal: Deliver answers that more closely align with the user’s intent and increase the relevance of search results. Solution: Improved search results via a hybrid Elasticsearch and semantic search solution.
Problem: Users find it time-consuming to navigate through various questions and answers. Goal: Reduce the time it takes to find a relevant answer while leaning into our community's expertise. Solution: An AI-powered search summary of the most relevant answers.
Problem: Users sometimes struggle to articulate or identify their problem. Goal: Unique answers, resulting in less time to get an answer and fewer duplicate questions. Solution: Conversational search refinement.

We dedicated the final day of the sprint to research, taking our preliminary designs and proposals to users. These were the key takeaways:

Speed and immediacy of an answer are critical. This confirmed our previous research on the friction users were experiencing. Users are trying to get an answer to their problem as fast as possible. They would rather ask a co-worker or an AI tool than ask a question on the site, because of the time it takes to craft a question and then wait for an answer.
Good search refinement and signals can help speed up the process. Many users know which elements of their question they might want to refine by (tag, version, recency, etc.), but we still have more to learn here. Users valued being able to copy a code snippet directly. Seeing votes and other helpful indicators was also important to them. Having AI pick out the critical information or provide an accurate summary was helpful and desirable, but users also wanted to see the search results alongside it, again so they could move faster.

These findings confirmed that our problem was valid and that the solutions we were exploring were a promising way to bridge the gap between finding solutions for a problem you can’t articulate and finding solutions for a problem that has already been asked. In addition, users were excited that Stack Overflow, the world’s largest source of developer knowledge, was trying to improve how quickly users could find answers.

Following the design sprint, we moved into weekly design and research sprints in which we presented mockups and prototypes of the solutions we had brainstormed to Stack Overflow users (a mix of tenured and newer users). This allowed us to directly gauge user reactions, assess the perceived value of these solutions, and understand users' expectations in a more concrete way.

The feedback we gathered from these sessions directly shaped the development of the search solutions. Each week, we iterated on the designs, and what we learned guided us in refining and adjusting the features to better meet user needs. From conversations with our users, we knew that the following principles and insights were important to the success of the feature:

AI as a flexible and seamless option: While our research participants were generally excited about Stack Overflow’s early exploration into AI, we still wanted the introduction of AI to the platform to be a seamless and flexible experience. This led us to enhance the existing search experience by adding an AI summary of the most relevant questions and answers alongside the search results. Users could always choose to browse the improved search results instead of delving deeper into the summary. We also extended the experience by allowing users to engage in a conversation if they needed additional help refining their question. Alternatively, users who wanted to jump straight into the conversational search experience could do so as well.
Highlighting sources and recognizing our community: While many research participants use AI tools, they remain somewhat skeptical of AI capabilities. They still view Stack Overflow as an indispensable source of information, especially for complex problems that require human expertise. With this in mind, we wanted our solution to highlight the trusted and validated content from our community by prominently showing the sources used. Participants liked being able to see citations of where the AI content came from and to dig deeper into those sources. They also expressed concerns about how voting and reputation would work for the sources. We kept the design simple, with voting arrows next to the sources, communicating that users can vote on individual sources just as they do on answers.
Measuring confidence: Early on, we tested the idea of showing confidence indicators of answer quality with users. We found that users valued answer quality indicators based on human feedback, such as the number of upvotes on a source or the reputation of the source's author. This reinforced how important it was to highlight the human interaction that exists within the community. As a result, we display these indicators alongside our sources to give users a better understanding of the quality of the answer.
Challenges in giving credit: Participants had diverse opinions on how to appropriately acknowledge the sources that inform the AI responses. Some advocated awarding votes and reputation to all sources, while others felt that credit should reflect each source’s actual contribution to the summary. Notably, participants raised concerns about awarding credit to lower-quality sources, or to sources that did not contribute enough to the answer. This insight also contributed to the decision to break down voting to the individual source level. However, this is an issue we have yet to resolve. In the Alpha, we are allowing users to vote on sources but not awarding reputation, as an interim step while we learn how best to strike a balance.
Expectations on accuracy, usefulness, and relevancy: Our research shows that users hold Stack Overflow in high regard for consistently delivering dependable, high-quality information, which sets the bar high for accuracy. We hope that implementing hybrid Elasticsearch and semantic search will yield search results that better match your question. A primary focus for the Alpha will be to measure and improve the quality of insights provided by AI responses.
The importance of user feedback: User feedback is critical for improving the AI. In early concepts tested with users, we tried letting users simply upvote the whole AI summary as a proxy for feedback. However, users were confused about whether they were upvoting the AI, all the sources the AI used, or just giving feedback to the AI. This led us to clearly separate AI feedback from upvoting sources.

We are excited to launch our search improvements in Alpha. We want to thank those who have already contributed to this process and recognize the effort users have made in sharing their opinions with us in our weekly sprint sessions. That feedback has already influenced and shaped the feature, and we have learned a lot over the past few months.

With the Alpha, our goal is to continue this process of learning and building with the community. Search and its improvements are not set in stone; they are still in active development. As we open up the Alpha to more users, please keep in mind that your feedback during this time can still influence the final product.

Ultimately, we hope to accomplish our shared objective of helping users find answers to their questions more quickly and efficiently, and to remove much of the friction they currently experience.

If you’re interested in reading about the technical details of our semantic search implementation, check out this deep dive.