Defending Attribution Required

All content contributed to the Stack Exchange network is licensed under cc-wiki (aka cc-by-sa).

What does this mean? In short, it's a way of guaranteeing that we can't ever do anything nefarious with the questions and answers the community have so generously shared with us. It's not unheard of for some companies to arbitrarily decide that giving content back to the community is, er ... well, let's just say ... not in their best commercial interests. Then they suddenly pull the rug out from under the very people that contributed the content that made them viable in the first place.

We wouldn't want that done to us. And there's no way we're doing it to our community. To prove it, we adopted a licensing scheme that makes it impossible for us to do anything even partially-quasi-evil with our community's content. Namely, cc-by-sa (aka cc-wiki), which gives everyone the following rights to all Stack Exchange data:

You are free:
  • to Share — to copy, distribute and transmit the work
  • to Remix — to adapt the work

Under the following conditions:

  • Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
  • Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

This isn't news, of course; it's explained on the footer of every web page we serve. And note that we explicitly allow commercial usage -- after all, we're a commercial entity, so it felt only sporting to allow others the same rights we enjoyed.

What is news is this: lately we're getting a lot of reports of sites reposting our content (which is totally cool, and explicitly allowed), but not attributing it correctly ... which is most decidedly not cool.

What are our attribution requirements?

Let me clarify what we mean by attribution. If you republish this content, we require that you:
  1. Visually indicate that the content is from Stack Overflow, Meta Stack Overflow, Server Fault, or Super User in some way. It doesn’t have to be obnoxious; a discreet text blurb is fine.
  2. Hyperlink directly to the original question on the source site (e.g., http://stackoverflow.com/questions/12345)
  3. Show the author names for every question and answer
  4. Hyperlink each author name directly back to their user profile page on the source site (e.g., http://stackoverflow.com/users/12345/username)

By “directly”, I mean each hyperlink must point directly to our domain in standard HTML visible even with JavaScript disabled, and not use a tinyurl or any other form of obfuscation or redirection. Furthermore, the links must not be nofollowed.

They're not complicated, nor are these attribution requirements particularly hard to find: they're linked from the footer of every web page we serve, and included as a plaintext file in every public data dump we share.
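For a republisher, these four requirements translate fairly mechanically into markup. The following Python sketch is illustrative only: the Post class, the render_attribution and links_are_direct helpers, and the wording of the blurb are hypothetical, not an official Stack Exchange tool. It emits plain HTML (so the links work with JavaScript disabled), links straight to the question and profile URLs in the formats given above, and never adds rel="nofollow". The checker rejects any link whose host is not the source site (a URL shortener, for example) or that is marked nofollow.

```python
from dataclasses import dataclass
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse


@dataclass
class Post:
    """One republished question or answer (hypothetical structure)."""
    author_name: str
    author_user_id: int
    author_slug: str  # profile URL slug, e.g. "username"


def render_attribution(site_name: str, site_domain: str, question_id: int,
                       posts: list[Post]) -> str:
    """Return a plain-HTML attribution blurb covering the four requirements."""
    base = f"http://{site_domain}"
    question_url = f"{base}/questions/{question_id}"
    lines = [
        # Requirements 1 and 2: name the source site, link directly to the question.
        f'<p>Originally posted on {escape(site_name)}: '
        f'<a href="{escape(question_url)}">{escape(question_url)}</a></p>',
        "<ul>",
    ]
    for post in posts:
        profile_url = f"{base}/users/{post.author_user_id}/{post.author_slug}"
        # Requirements 3 and 4: show each author name, linked to their profile.
        # Deliberately no rel="nofollow" on any link.
        lines.append(
            f'<li>by <a href="{escape(profile_url)}">{escape(post.author_name)}</a></li>'
        )
    lines.append("</ul>")
    return "\n".join(lines)


class _LinkCollector(HTMLParser):
    """Collect (href, rel) pairs for every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            self.links.append((attributes.get("href") or "", attributes.get("rel") or ""))


def links_are_direct(html: str, site_domain: str) -> bool:
    """True if the HTML has links, all pointing at site_domain, none nofollowed."""
    parser = _LinkCollector()
    parser.feed(html)
    for href, rel in parser.links:
        if urlparse(href).hostname != site_domain:
            return False  # redirector, URL shortener, or some other host
        if "nofollow" in rel.lower().split():
            return False
    return bool(parser.links)


if __name__ == "__main__":
    blurb = render_attribution("Stack Overflow", "stackoverflow.com", 12345,
                               [Post("username", 12345, "username")])
    print(blurb)
    print(links_are_direct(blurb, "stackoverflow.com"))
```

The checker compares exact hostnames, so content from a different site in the network (say, serverfault.com) needs to be checked against that site's domain.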

We've been collecting a list of sites that are reposting our data without attributing it correctly -- but it's becoming something of an epidemic lately. Every other day now I get an email or meta report about a real live web search where someone found content that is clearly ripped off, has zero useful attribution, and a bucket of greasy, slimy ads slathered all over it to boot.

I'm starting to get fed up with these sites. Not because they're abusing our website, but because they're abusing you guys, our community -- by reposting your questions and your answers with no attribution! The whole point of Stack Overflow, Server Fault, Super User, and every other Stack Exchange site is to give credit directly to the talented people providing all these fantastic answers. When a scraper site rips a great answer, removes all attribution and context, plasters it with cheap ads -- and it shows up in a public web search result, as they increasingly do -- everyone loses.

I'm not going to stand for this, at least not without a fight. We're starting to email these sites and ask them very politely to please follow our simple attribution guidelines.

And if they don't follow our simple attribution requirements when we've asked them nicely, well -- we're going to start asking them not so nicely. Namely, we will hit them where it hurts, in the pocketbook. Our pal How-to-Geek explains:

For the quickest results, you can send the DMCA to their web host, which you can generally figure out with whoishostingthis.com. Every single legit hosting center will have a "legal" or "copyright" page, and they will have a specific way to send in DMCA requests. Some of them require fax, though many are starting to accept email instead... and they will often have the content removed almost instantly. Wordpress.com will instantly cancel their entire account, and other hosts tend to take very swift action, often disabling their whole site until they comply. If you really want to cause them some pain, however, you can send the DMCA to their advertisers. Adsense is usually the first target for this, since so many of the jerks are using it. The only problem with Adsense is they require a DMCA fax (http://www.google.com/adsense_dmca.html). There's been once or twice where I've found a site that was hosted somewhere that doesn't care about copyright... but every single ad network of any value is based in the US, and the jerk website owner isn't going to mess around with their income stream.

Please help us defend your right to have your name and source attached to the content you've so generously contributed to our sites. We will absolutely do our part, but many hands make light work:

  1. Whenever you find a new site that is using our data without proper attribution, check this meta question and make sure it's listed.
  2. If you have contact information for the site that is inappropriately using our content, forward it to us at team@stackoverflow.com for action.
  3. If you're feeling a bit miffed about the whole situation, don't hesitate to forward a link to our attribution guidelines to the site operators, or their ISP, and briefly indicate specifically where they are not following them. Squeaky wheel gets the grease, and all that.
  4. If the site is wrapping the content in invasive ads that attempt to redirect the user or compromise their web experience in some way, I encourage you to report it at http://www.google.com/safebrowsing/report_badware/ ; I'm only adding this because it happened recently (!).

I'm always happy for our content to get remixed and reused, but at some point we have to start defending our attribution guidelines, or we are failing the community who trusted us with their content in the first place.

After all, if we don't stick up for what's right, and what's fair -- who will?
