Dr. Strangedupe: Or, How I Learned to Stop Worrying And Love Duplication
As Stack Overflow grows (or any other Q&A site in the Stack Exchange network, really), there’s a natural pressure to discover and link duplicate questions. The more questions you have, the higher the possibility a given new question isn’t in fact a new question, but a duplicate of an older existing question. Because of this, we’ve continually enhanced the tools for finding, linking, and merging duplicate questions.
One thing I want to be clear about, though, is that duplication is not necessarily bad. Quite the contrary — some duplication is desirable. There’s often benefit to having multiple subtle variants of a question around, as people tend to ask and search using completely different words, and the better our coverage, the better odds people can find the answer they’re looking for. And isn’t that, really, the whole point of this exercise?
Furthermore, it’s OK for duplicate questions to have duplicate answers. While you could argue that the duplicate questions could all be merged into one question with a “master” set of answers, this is kind of irritating from the perspective of the user looking for an answer. Put yourself in their shoes. Instead of finding …
Duplicate Question
Duplicate Answer
They have to deal with finding:
Duplicate Question
[closed as duplicate of Question] click here to see answers
Now, what other site requires users to do some sort of weird scroll-down, click-here-first to see the answer nonsense on the search results before they will reveal the answer? Oh yes, our old hyphenated pals. Do we really want our site to work like theirs?
Furthermore, I’ve found that the perfect duplicate question is a … bit of a mythical beast. There are similar questions, yes, and so-called “exact” duplicates do happen, but they are kind of rare in my experience. It’s far more common to have many subtle variations of a question. I think that’s OK, because that’s how the world works. Trying to shoehorn a bunch of semi-related things into one arbitrary container in service of some Highlander-ish “there can be only one” rule is ultimately harmful. Remember: while there are aspects of wiki to our system, we are not Wikipedia. There is not one canonical question about every possible subject. Rather, there are many.
In other words, over time, I have learned to stop worrying and love (some) duplication. And you should too.
Here are my official guidelines on question duplication:
- Having one “perfect” form of a question that contains every possible answer to every slight variation of that question is a myth at best and actively harmful at worst.
- Having dozens and dozens of variations of the same question is clearly bad.
- What we want is on the order of 4 or 5 similar-but-not-quite-the-same duplicates to cover all possible search terms and common permutations of the question. It is also OK for these duplicates to have their own answers so people who find them don’t have to click yet again to get to a good answer.
Let me be clear — too much question duplication is bad. Absolutely. You’ll get no argument whatsoever from me on that. But not enough question duplication is also bad. I know this does not sit well with programmers who love to think in binary black and white and cannot abide a single atom of duplicated content in the entire omniverse. But the honest, realistic answer to how much question duplication there should be is … “enough”. Question duplicates aren’t necessarily our enemy. They’re more like our, y’know, frenemies.
So, as always, use your good judgment and please continue to close and merge duplicates as you see fit. However, bear in mind that cultivating and supporting a moderate amount of natural duplication actively helps the community. I wasn’t kidding when I said learn to stop worrying and love (some) duplication. Use the above guidelines and try to find a happy, reasonable medium somewhere in the middle there.
7 Comments
There’s a raging argument going on in Meta right now. It was started by a bunch of arrogant, pedantic badge seekers who want even more badges for closing dupes.
It’s insane how stupid really smart people can be.
I guess this article has gathered too much dust. It needs a revival.
This is an interesting and helpful post but it’s missing a topic that I’ve not seen addressed at SO. While we may debate whether question A is a duplicate of question B, there are many cases where question A has been closed as a claimed duplicate of B, yet A is by no stretch of the imagination a “duplicate” of B. I can think of many such situations where B has not even been a question, but a community FAQ that summarized basic elements of a programming language. When this happens, the message being conveyed to the OP is “study this and you’ll be able to answer your question yourself”. Ask yourself what your reaction would be if you had asked a question that was closed for that reason. (Pause reading until you’ve thought about that.) If the OP is new to SO they may just say “adieu”. Defining what constitutes a “duplicate question” obviously is a challenge, but I think SO should make a stab at it. At a minimum it would be helpful to give examples of situations where question A is *not* a duplicate of question B.
Amen to that! I recently posted a comment on Meta about SO questions that were being closed as duplicates but clearly were not duplicates. I felt lucky to have escaped from that den with all my body parts.
Hi all,
I have written a research paper about duplicates in Stack Overflow: Same-Same But Different: On Understanding Duplicates in Stack Overflow.
Just take a look here: https://gi.de/fileadmin/GI/Mitgliederbereich/Informatik-Spektrum/287_42_4_web.pdf#page=38
You can also share the paper if you like.
Thank you and best regards,
Mathias Ellmann
Here is another link to the paper about duplicates in Stack Overflow, Same-Same But Different: On Understanding Duplicates in Stack Overflow. This is the official link from the publisher, Springer, in case the earlier link (https://gi.de/fileadmin/GI/Mitgliederbereich/Informatik-Spektrum/287_42_4_web.pdf#page=38) does not work for some reason: https://rdcu.be/bKNrJ
Hi all,
I have written another research paper about duplicates in Stack Overflow that are compared with FAQs of different software vendors: A Comparative Study of FAQs for Software Development.
You can take a look at the presentation of the paper here: https://www.researchgate.net/publication/335455291_A_Comporative_Study_of_FAQs_for_Software_Development_presented_in_the_2nd_International_Workshop_on_Software_Qualities_and_Their_Dependencies_SQUADE'19_in_Tallinn_Estonia.
You can also share the presentation if you like.
Thank you and best regards,
Mathias Ellmann
Updated version of the link with text “our old hyphenated pals”: https://blog.codinghorror.com/whos-your-arch-enemy/