Everything contributed to the Stack Exchange network of websites is licensed under Creative Commons Attribution – Share Alike. This means it belongs to everyone, and can be freely reused (even commercially!), so long as the reuse follows our simple rules of attribution. That’s our contract with the community: it’s your generously contributed content that makes these websites worth visiting in the first place!
Thus, we provide dumps of all the public data in the current Stack Exchange network (Stack Overflow, Server Fault, Super User, and Meta) every month, like clockwork.
But if you just want to play with the data, it’s kind of tedious: you have to download the entire 700 plus megabyte archive, import it into some kind of database system — and only then can you even begin thinking about how to query out the results you’re looking for.
Well, I’m pleased to announce that we now officially support a web tool for sharing, querying, and analyzing the Creative Commons data from every website in the Stack Exchange network — the Stack Exchange Data Explorer.
The Stack Exchange Data Explorer, or SEDE:
- provides easy web-based access to the latest and greatest monthly Stack Exchange website data dumps*
- gives us an Open Data Protocol (OData) endpoint
- allows testing, running, editing and permalinking to public queries against our corpus of data with a simple, syntax-highlighted web UI
- can be used as a permanently linkable tool for teaching general SQL and relational database concepts — we can be our own Northwind or Pubs database, when answering questions tagged [sql]!
- is hosted on Windows Azure so it’s speedy, scalable, and always available (and did I mention, generously sponsored by Microsoft?)
- is built from the same ASP.NET MVC software stack as our own engine, and will be open-sourced so others can learn from the code
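To give a sense of the kind of query the list above has in mind, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in, since SEDE itself runs against SQL Server. The trimmed-down `Users` and `Posts` tables, their columns, the sample rows, and the helper names `build_sample_db` and `top_answerers` are all assumptions made for illustration, not the actual dump schema.

```python
import sqlite3


def build_sample_db():
    """Create a tiny in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Users (Id INTEGER PRIMARY KEY, DisplayName TEXT);
        CREATE TABLE Posts (
            Id INTEGER PRIMARY KEY,
            PostTypeId INTEGER,   -- assumption: 1 = question, 2 = answer
            OwnerUserId INTEGER
        );
        """
    )
    # Made-up sample rows, purely for demonstration.
    conn.executemany(
        "INSERT INTO Users VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    conn.executemany(
        "INSERT INTO Posts VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 2, 2), (3, 2, 2), (4, 2, 3), (5, 1, 3), (6, 2, 2)],
    )
    conn.commit()
    return conn


def top_answerers(conn, limit=3):
    """Return (display_name, answer_count) rows, most answers first."""
    query = """
        SELECT u.DisplayName, COUNT(*) AS Answers
        FROM Posts AS p
        JOIN Users AS u ON u.Id = p.OwnerUserId
        WHERE p.PostTypeId = 2
        GROUP BY u.Id, u.DisplayName
        ORDER BY Answers DESC, u.DisplayName ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    for name, answers in top_answerers(conn):
        print(name, answers)
```

On SEDE itself the same idea would be written in T-SQL, where `TOP` takes the place of `LIMIT`.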
We’ve been working with Sam Saffron to build this out, and even though this is only a public beta, it’s already amazing! But don’t take my word for it — check out the Stack Exchange Data Explorer yourself at …
… and leave any beta feedback in the [data-explorer] tag on meta.
The ultimate goal of all of our sites is learning, and making the Internet a slightly better place. I believe the SEDE achieves both of these goals in a rather serendipitous way — it helps us teach SQL and relational databases by querying the very posts we’re creating as we teach! Yes, maybe it’s a little geeky, but it is magical to me.
* We are looking at eventually making special weekly or biweekly dumps for SEDE.
If you love exploring data, or making data exploration easier for others, you may be interested in a career in this field. Check out the data science job openings on Stack Overflow Jobs.