Research Update: Impact of the Ask Question Wizard

Welcome to this month’s installment of Stack Overflow research updates! Since November of last year, my colleague Donna Choi and I have been posting (over on Meta) bite-size updates about the quantitative and qualitative research we use to make decisions at Stack Overflow. As of this month, we are posting these updates here on the blog. 🙌 In March of this year, we launched the Ask Question Wizard, the biggest change to the question asking experience on Stack Overflow in our ten years’ existence. People have been using the wizard for about six months now, so let’s explore how many people have used it, what its impact has been, and where we’re going from here. We have seen some encouraging improvements in question quality, and we can use what we’ve learned to improve question asking more broadly.

What is this wizard?

The wizard presents an alternate question-asking mode that we call guided mode, offering askers detailed instructions and direction about how to ask a question and what to include. We kept the original question-asking page intact, calling it traditional mode. Before launching the wizard, we tested a final version of it in an A/B test. We found that question quality, which we define and explain in detail here, improved in an absolute sense by modest amounts for askers with reputation less than 111 (the asker population we included in that test). There was a 5.12% decrease in bad quality questions, and a 1.12% increase in good quality questions. When the wizard launched more broadly, it was the default option for users with reputation less than 111, but such users could opt out if they wanted. Also, users with reputation above that threshold could opt in.

Who has used the wizard?

Between when the wizard was launched and the beginning of this week, 777,644 questions have been asked using guided mode. What level of opting in/out do we see, specifically on people’s first questions? A few percent of higher rep question askers opted in, and about 15% of users with reputation below 111 opted out. In this time period, 99.1% of first questions were asked by users with reputation less than 111, so using the wizard has been the main new question-asking experience over the past several months. Users are largely not opting out.

Confounders and models

It is quite challenging to measure what effect an intervention like this wizard has because users can opt in/out of it. We expect that the very characteristics that lead a user to have a hard time writing a high-quality question may make them more likely to opt in. A difference we see between using the wizard vs. not may be caused by a factor that predicts intervention (the wizard) rather than the intervention itself. This is an example of confounding, and is exactly why we typically use A/B tests when planning changes for our site. However, we would still like to see what we can learn from the last six months of data, which means we need to use methods appropriate for observational data. My academic background is astronomy, so this is a pretty familiar situation for me! (There are not a lot of randomized controlled trials in space.) I’ll focus on a few very important factors in this blog post, but predictors we explored included reputation, account age, user location, and more. For this analysis, I am going to share results using Bayesian generalized linear multilevel modeling to understand the impact of the question wizard. Why use a Bayesian approach for this data? The main reasons are that only a very few users with higher rep ever used the wizard, and we’d like to explore whether the wizard only helps lower rep users, among other questions about when and for whom the wizard is helpful. Bayesian modeling provides a framework well-suited to those kinds of questions.

Question quality

First, let’s look at model results for question quality.
This plot shows results for a straightforward classification model predicting whether a question is good or bad trained using brms and Stan. I chose the reputation threshold at 11 (rather than, say, at 111) because the median reputation of a first time question asker is 8.1. A “new account” is one created within the past day. The size of these effects does change with the thresholds but these predictors having an impact at all is robust to such changes. How can you interpret this plot? Reputation above the threshold, older accounts, and using the question wizard are all associated with positive improvements in question quality. Specifically, because of the modeling approach used here, we see that, for example, people with accounts older than one day write better first questions, controlling for other factors like reputation and using the wizard.
Quality?FactorRisk ratio95% confidence interval (low)95% confidence interval (high)
BadCreated with wizard0.820.790.85
GoodCreated with wizard1.061.031.10
BadNew account1.191.161.22
GoodNew account0.890.860.91
BadRep <= 112.051.872.24
GoodRep <= 110.590.550.64
Let’s walk through how to interpret these risk ratios, which are relative changes.
  • Someone with rep <= 11 is about 0.6x (~40% less) likely to ask a good quality first question, and about 2x (100% more) likely to ask a bad quality first question, than someone with higher reputation, controlling for other factors.
  • A first time question asker who has a new account (has created their account within the past day) is about 0.9x (10% less) likely to ask a good quality first question and about 1.2x (20% more) likely to ask a bad quality first question than someone with an older account.
Controlling for these confounders, when a first time question asker uses the wizard, their question is about 6% more likely to be good quality and 20% less likely to be bad quality. We consider this a real success of the wizard, because when people ask better quality questions, they are more likely to get answers and have an overall more positive experience. We also know people who ask good quality questions are more likely to ask a question again. Confirming what we found during the A/B test, the wizard has a bigger impact on bad question quality than good question quality. I explored whether we see evidence that the wizard only helps users with lower reputation or first-time question askers, and we don’t see evidence for that. We don’t have as strong evidence that the wizard can help users with higher reputation (we don’t have much data on this), but we can put some limits on how large the difference for higher and lower rep question askers would need to be for us to see a difference with the data we have.

Comments galore

We know from multiple sources of feedback and research that the comment section of Stack Overflow can be a pain point in using our site. Putting aside the issue of unfriendly or otherwise problematic comments, they are intended to be a venue for clarifying questions to improve the quality of a question. This means that from our perspective, in general and on average, the fewer comments, the better. How has the question wizard affected the number of comments on first-time questions, and the number of unfriendly comments? Instead of a classification model, this uses a Poisson regression model.
People use several different categories to flag comments that are inappropriate for our site, such as rude/abusive and unfriendly/unwelcoming. Recently, my team, especially my colleague Jason Punyon, has used the human-generated flags to build a deep learning model to automatically detect unfriendly comments on Stack Overflow. We’ll share more soon about this model, how it works, and how we’re using it on our site to make Stack Overflow a more safe and inclusive community. For this analysis, I used the unfriendly comment model (not comments flagged by human beings, which we know are dramatically underflagged) to understand what impact the wizard has on unfriendly comments.
Comments?FactorRate ratio95% confidence interval (low)95% confidence interval (high)
AllGood quality0.850.850.86
UnfriendlyGood quality0.280.250.30
AllCreated with wizard0.950.940.96
UnfriendlyCreated with wizard0.780.710.85
AllNew account1.051.041.06
UnfriendlyNew account1.181.101.27
AllRep <= 11 1.061.031.08
UnfriendlyRep <= 11 1.421.111.82
What do these coefficients mean? They are multiplicative factors, because this was Poisson regression.
  • Having rep <= 11 is associated with receiving 6% more comments and about 40% more unfriendly comments.
  • Having a new account (less than one day) is associated with receiving 5% more comments on a first question and about 20% more unfriendly comments.
  • Writing a good quality first question is associated with about a 15% reduction in comments on that question and over 70% reduction in unfriendly comments on that question.
Using the wizard is associated with 5% fewer comments and over 20% fewer unfriendly comments. These factors and their impact come from models controlling for the other factors, so think about the wizard reducing unfriendly comments by over 20%, controlling for other factors, including reputation and quality of the question. We consider this another huge success of the wizard.

Next steps

We are really happy to see that the question wizard has been having a positive influence on Stack Overflow, both in terms of question quality and interactions via comments. 🎉 This was the first time we had shipped changes of this magnitude for the question-asking workflow in a decade, and it is great to be able to celebrate its impact. The question wizard has proven successful enough that we want to iterate on its design and see how we can better scaffold people with coding problems to either write a successful question or realize that they don’t need to ask a question at all because the answer is already available. We also need to revisit some of the technical decisions made when launching a two-mode question workflow, as some aspects of the current system are brittle and inappropriate for a growing system, especially when we consider the network beyond Stack Overflow. What can you expect in the near future if you ask a question soon? We will be using A/B testing, so not every user will experience changes at the same time, but some of the changes we are working on so far include:
  • More upfront guidance for first-time question-askers
  • Setting expectations about what happens after asking a question
  • Improved “how-to-ask” guidance while drafting a question
  • Consolidating the many dozens of validation warnings into a single review interface
For both technical and design reasons, we plan for the question workflow to change for all users, not only those with lower reputation. As we move forward, we can use both data and feedback from users to assess how successful changes are. How do we know when we are successful? We use both qualitative and quantitative research in making decisions about our site. This blog post is an example of the kind of quantitative analysis that we use, involving large samples of our users broadly. For more details on what’s coming to Stack Overflow soon, check out my colleague Meg’s blog post from earlier this week!

Related Articles

Comments

  1. “we want to iterate on its design and see how we can better scaffold people with coding problems to either write a successful question”

    Step 1: Put the “title” step after the “description” step.

    Step 2: Profit

    1. This is something I would agree with. When I start with a title, then I find myself re-writing the title several times until I find something I am happy with.

      It’s much easier to first get the question body into shape and then find a title that is meaningful

  2. When I first saw the announcement, I thought the idea was brilliant, and couldn’t wait for it to come to Super User. I expected it to make a huge difference. The improvements you measured were much smaller than what I expected. My theory is that the people who aren’t helped by it don’t understand the nature of the site or why they need to spend time on the “silly details”. They arrive expecting the site to be a personal help line, and if the “experts” know their stuff, they’ll recognize the problem. Their intention is to post what’s in their head, and the wizard is just silly hoops to jump through. I wonder if it would help to start off with some context about why it’s important to follow the posting guidelines.

    1. Keep mind that while Superuser is one of the bigger sites, it’s still *dwarfed* by Stack Overflow in question volume. You need to translate these percentages in absolute terms of 6k questions per day (SO) to <200 questions per day (SU). Even if it's something small like 10 questions a day, personally I'd appreciate that on a site at the scale of SU.

      As for the people who fundamentally misunderstand the concept of Stack Exchange sites, you can't fix that until you seize control of the education systems/mechanisms of every household in the world and institute changes in their curriculum… and even then it'd likely fail unless you control for external factors like background, geolocation, parental involvement, affluence, etc. …in other words, good luck!

  3. Jan Doggen says:

    “We are really happy to see that the question wizard has been having a positive influence on Stack Overflow,….”

    But what do you plan to do about the negative influence it has brought on other SE sites? E.g. Software Recreations gets inundated with off-topic questions since the introduction of the wizard.

    1. Good point, but are such questions on Software Recs off-topic by their very nature, or are the off-topic by some metric in your Help Center? For example, there’s no way you can edit a software recommendation question on SO such that it’s on-topic, but I’m guessing most of the questions that are now (rightfully) asked on SoftwareRecs.SE originally are only off-topic by some specific metric (in other words, these are questions that *can* be salvaged).

  4. It is answered in the provided link, but to summarise:

    Bad question: Score 0 or (answer count > 0 and score = 0))

    Neutral question: Score = 0 and not closed and answer count = 0

  5. Meta: Does this comment interface not escape “less than” (so it is not seen as the start of an HTML tag)?

    The preview (“Your comment is awaiting moderation.”) indicates it doesn’t escape “less than”.

  6. Could you please run the analysis again, with a distinction between coding-problem questions and non-coding problems? (Distinguished by something like <25% of characters of the post in code sections)
    Although they often are too broad or too opinionated, non-coding problem sometimes make great questions with very insightful answers.

    Also it would be nice to see how the question wizard affected the used close reasons on bad questions. Did fewer of them get closed for lacking an MCVE or being unclear?

    And finally, did you get any data on how many people using the wizard vs. not using it started to write a question, but then did not post it (presumably having solved their problem)? Not sure whether it's OK to process this data without user consent, though.

  7. NotgivingNameAnymore says:

    re: “Using the wizard is associated with 5% fewer comments and over 20% fewer unfriendly comments.”

    Does this account, within the controls, for the fact that many users have left or, of those who remain, that there has been a trend for users (at least the ones that use chat) to remark that they no longer leave any comments due to the welcoming initiative itself?

    I mean, this is definitely better than nothing (which is what we had) but is the 5/20% due to the wizard or just veterans saying “I’m not leaving comments anymore period” (good/bad/other – reasons abound for why this is so on chat & meta).

    This is not including the review strike many users participated/participate in until new tooling is available (also in meta several times).

  8. Michael Chen says:

    I think the Ask Question Wizard is a great idea, and it seems to be working for Stack Overflow, but with the current wording, it is leading far too many users to believe that their question belongs on Software Recommendations. Software Recommendations has had a large influx of off-topic questions since the start of the Ask Question Wizard, and I’m sure that the new users who post there walk away feeling unsatisfied, when their question might actually have been on-topic for Stack Overflow. See my Stack Overflow Meta question https://meta.stackoverflow.com/questions/388250/choices-for-what-type-of-question-do-you-have-are-too-vague for my suggestions.

  9. Could you change the charts to make it clearer that there are two charts per image rather than just one? The first time I looked at them I didn’t see the thin white gap in the horizontal light grey lines.

  10. Super Swamper says:

    That Ask Question Wizard seems very useful for new sites to get people know what the company is all about and it’s products or services. Sounds very tricky though to come up with great questions and answers that could make people engage.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.