We decided early on that all user-generated content on Stack Overflow would be under a Creative Commons license. All those great Stack Overflow questions, answers, and comments, so generously contributed by all of you, are licensed under cc-wiki (also known as cc-by-sa):
You are free:
- to Share: to copy, distribute, and transmit the work
- to Remix: to adapt the work

Under the following conditions:
- Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
- Share Alike: If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
The community has selflessly provided all this content in the spirit of sharing and helping each other. In that very same spirit, we are happy to return the favor by providing a database dump of public data [Ed. note: this location has changed since the original posting]. We always intended to give the contributed content back to the community as a whole. Our primary concern was making sure we didn't have an AOL-style "incident" where we accidentally released personally identifying information in so-called "sanitized" data. Stack Overflow user Greg Hewgill was kind enough to help us beta test several iterations of the data dump, ensuring that we didn't release anything except content that is visible on the public website. He also suggested several ways to improve the data dump, so that it contains as much useful public information as possible.
Cheers, Greg! Also, thanks to Stack Overflow Valued Associate #00003, Geoff Dalgas, who patiently worked through many iterations of this to get it together on our end. The data dump now covers all public Stack Exchange sites, including Stack Overflow, Server Fault, Super User, and others.
Note that if you republish this data, we require attribution as described in this blog post. Most importantly, there should be hyperlinks back to the original question and to the profiles of all participants.

Our plan is to create a new data dump every two months, reflecting all data in the system up to that date. We will seed the latest and greatest dump (at a low bitrate) for as long as we can, ideally permanently. And yes, it's still fun to say "data dump". We look forward to seeing what the community can do with this data!

Update: per this message from Cameron Parkins of Creative Commons, cc-wiki is now an alias for cc-by-sa:
Hi Stack Overflow-ers,

My name is Cameron Parkins – I do community outreach at Creative Commons and recently stumbled across your latest CC data dump. Very cool that you all are using CC! I wanted to give you a heads up that the license you’ve chosen, the “CC Wiki-License”, isn’t really around any more. It is in the sense that it links directly to our CC BY-SA license, but our attempt to brand it as a separate license for wikis never got off the ground. We don’t use or promote it anymore and when we see it, we try and reach out to whoever is using it to let them know. Part of the problem is that the Wiki License doesn’t carry any value, while our BY-SA license (which is what the wiki license is) has widespread community support around it. Would you all consider switching your indication as such?

Let me know if you have any questions – would like to promote the project through our networks.

Best,
C

Cameron Parkins
Cultural Program Assistant
Creative Commons
[aim] cam3ran
[work] www.creativecommons.org
[linkedin] http://www.linkedin.com/in/cameronparkins
[cc newsletter] http://creativecommons.org/about/newsletter