Research Update: A/B Testing the New Question Form

Welcome to November’s installment of Stack Overflow research updates! This month marks one year since my colleagues in UX research and I started sharing bite-size updates about the quantitative and qualitative research we use to understand our communities and make decisions.

In recent months, we have invested time and energy in improving the question-asking experience on Stack Overflow, one of the most fundamental interactions on our site. In August, I outlined what we learned from the question wizard, our first major change to the question-asking workflow in a decade. In September, Lisa shared the results of her qualitative research that has informed our next steps. Today, I want to present the results of A/B testing for the changes currently live on the site.

Wizards and a unified experience

The question wizard represented a move in the right direction in terms of question quality and interactions via comments, but some of the decisions made for the wizard turned out to be brittle and inappropriate for our scale. For both technical and design reasons, we have chosen to pursue a single question design with modals specific to different kinds of users, not a two-mode workflow based on reputation.

To measure the impact of changes to the question workflow, we use A/B testing. People in the baseline arm of the test had the old version of the question workflow as it already existed. People in the experiment arm experienced a new workflow, which we shipped iteratively because the changes we wanted to test against the old workflow were so extensive and had complex dependencies. For simplicity, we can summarize the changes in two “steps”:

  • Step 1: The first group of changes launched in September and included pretty dramatic UI changes, along with a welcome modal and what-to-expect modal for new users.
  • Step 2: The next group of changes launched in October and focused mostly on a review interface, consolidating and organizing validation warnings.

People in the baseline arm did not see any of these changes but had the old question workflow only.

Posting your question

One of the most important metrics for us when we work with the question workflow is the conversion from clicking the “Ask Question” button to finally posting a question. Compared to the old workflow, the new one allows users to be more successful in this task, with a 3% increase in this conversion that held throughout the entire process (both Step 1 and Step 2). Adding the review interface did not impact the ease of use of the new question form, as measured by this conversion from initial click to final post.

The gray error bars in this graph show the uncertainty in our measurement of this proportion, although they are so small that they may be difficult to see.
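For readers curious how a conversion proportion and its error bar can be computed, here is a minimal sketch. It assumes a simple normal-approximation interval, which may not be the method used for the graph, and every count below is made up for illustration; none of them are Stack Overflow's actual numbers. The hypothetical counts are chosen so the relative lift comes out at 3%; whether the 3% reported above is relative or in percentage points is not stated.

```python
import math


def conversion_with_error(posted: int, clicked: int, z: float = 1.96):
    """Return (proportion, half-width of an approximate 95% interval).

    Uses the normal approximation to the binomial; this is an assumed
    method for illustration, not necessarily the one behind the graph.
    """
    if clicked <= 0:
        raise ValueError("clicked must be positive")
    p = posted / clicked
    half_width = z * math.sqrt(p * (1 - p) / clicked)
    return p, half_width


# Hypothetical counts, invented for this example only.
baseline = conversion_with_error(posted=20_000, clicked=100_000)
experiment = conversion_with_error(posted=20_600, clicked=100_000)

# Relative lift of the experiment arm over the baseline arm.
relative_lift = experiment[0] / baseline[0] - 1

print(f"baseline:   {baseline[0]:.4f} +/- {baseline[1]:.4f}")
print(f"experiment: {experiment[0]:.4f} +/- {experiment[1]:.4f}")
print(f"relative lift: {relative_lift:.1%}")
```

With samples this large, the half-width is a fraction of a percentage point, which is why error bars on a bar chart of this kind can be hard to see.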

Another important metric for the question workflow is question quality, which we define and explain here. More questions are being asked with the new workflow, but what are these questions like?

During Step 1 (the major UI changes plus modals for new question askers), we saw a 1.5% decrease in good-quality questions. Not great news! During that part of this major revamp of the question-asking experience, we had increased the number of questions (and the overall number of good questions), but the proportion of questions that were good was down slightly.
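It can seem odd that the number of good questions goes up while the proportion of good questions goes down. The short sketch below works through the arithmetic with invented counts (they are not the real test data and do not reproduce the 1.5% figure): when total questions grow faster than good questions, the share of good questions falls even though there are more of them.

```python
def good_share(good: int, total: int) -> float:
    """Proportion of questions classified as good quality."""
    if total <= 0:
        raise ValueError("total must be positive")
    return good / total


# Hypothetical counts, invented for this example only.
baseline_total, baseline_good = 10_000, 5_000
experiment_total, experiment_good = 10_300, 5_050

baseline_share = good_share(baseline_good, baseline_total)
experiment_share = good_share(experiment_good, experiment_total)

print(f"good questions: {baseline_good} -> {experiment_good}")
print(f"good share:     {baseline_share:.3f} -> {experiment_share:.3f}")
```

In this made-up case the experiment arm has 50 more good questions, yet its good share is lower because it also has 300 more questions overall.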

Fortunately, one of the main reasons we are redesigning the question workflow is that our new approach is more flexible and easier to iterate on. In fact, that’s exactly what we did next. Step 2 of our rollout focused on consolidating and organizing validation warnings, and during this step, the quality gap between the baseline and experiment groups decreased to virtually zero. We fixed the regression in question quality by iterating in this more flexible framework. We see similar results during the test if we measure bad quality instead of good quality.

Next steps

As of today, the new question workflow performs better in terms of task success (people who intend to ask a question successfully posting their question) and the same in terms of question quality. From a technical perspective, the new workflow is easier to maintain and build on moving forward. Our next steps will include more iteration to continue improving question quality, along with other concerns of all kinds of users, from the most to the least experienced. We have graduated this new question workflow and in the future, we’ll be testing any further changes against this new baseline. The next time you ask a question on Stack Overflow, look for the results of these carefully planned and tested changes!

Comments

  1. > task success: people who intend to ask a question successfully posting their question

    Don’t forget that the real success is finding a solution for their problem, not posting a question. (Having more content might be a success for the site though). People might click the “Ask a question” button, but then get good duplicate suggestions or be encouraged to formulate their problem clearly, which leads them to the solution.

    It might be useful to collect data on that as well, e.g. by tracking whether the user visited similar questions instead of posting a question, or even having a dialog pop up when they leave the page, asking for the reason.

  2. Just a student says:

    Good to see this work being done. I have two questions:

    1. When users find that their question is answered already in the process of asking it, they won’t successfully post their question. I would still consider that a success though. How do you take that into account? It seems to me that optimizing that percentage might be even more important than optimizing the percentage of successful posts.

    2. I’m confused by the second graph. Question quality seems to decline both in step 1 and step 2. Did you iterate more after step 2, and eventually improve the question quality above the baseline value of step 1? Or should I understand your explanation to read that the decline in quality is negligible?

    Thank you.

  3. It’s a shame that the wizard was scrapped. It introduced a structure to asking questions that’s completely missing from the new dialog. By contrast, the new dialog is pretty much the same as the old one, and all the “advice” is easily ignorable. With the wizard it was in your face, and that’s a good thing.

    As an example of what I mean, there was a time when, using the wizard, I ended up not asking a question because the structure of the wizard led me to answer my own question. I’m confident that the same wouldn’t happen with the current question UI since it’s missing crucial guidance. If this is true even for experienced users it’s probably even more so for new users.

  4. So, TL;DR: more people who started writing a question went on to post it, but the quality of the question is worse.

    Yes, I’d noticed!

  5. Stephen Boesch says:

    What’s going on with making it so that experienced users do not see the pre-school-ish robot and question balloons?

  6. “As of today, the new question workflow performs better in terms […] of question quality.”
    Maybe I just don’t understand the graph (and its description) but to me it seems like question quality went down (about 2%). Could you clarify where the better question quality can be seen?

    1. Now I read “[…]and the same in terms of question quality” – was that always there and I just missed it or did you edit the article?

  7. Thank you for sharing useful information

  8. Are the results actually statistically significant? Your bar graphs aren’t very convincing.
