Academic Papers Using Stack Overflow Data

One unanticipated benefit of releasing our data under a Creative Commons license is that the Stack Overflow dataset has already been the subject of several academic papers:

Evolution of Two Sided Markets
Ravi Kumar, Yury Lifshits (Yahoo! Research), and Andrew Tomkins (Google)

Presented at WSDM 2010, Session 7: Temporal Interaction

View Slides
Download paper (pdf)

Causal Discovery in Social Media Using Quasi-Experimental Designs
Hüseyin Oktay, Brian J. Taylor, David D. Jensen (Knowledge Discovery Laboratory, Department of Computer Science, University of Massachusetts Amherst)

To be presented at the 2010 ACM/SIGKDD conference

Download paper (pdf)

There's also a third study starting up with Lena Mamykina, a researcher in Human-Centered Computing at Columbia University, who is working in conjunction with Björn Hartmann, a professor from UC Berkeley:

"The success of stackoverflow.com is making all my research community wonder what is it that makes it work so well for the users. Would you be interested in participating in a research study to answer some of these questions? The study would probably involve things like interviews (phone) with your development team, moderators and selected users. The results will be submitted for publication at one of the ACM (Association of Computing and Machinery) conferences (for example a conference on human factors in computing systems, CHI or a conference on computer-supported cooperative work, CSCW). Of course you will have a chance to review and provide your feedback on all the materials before they are published."

We'll of course be contributing to the interviews, as well as introducing Lena to selected community members who indicate that they are willing to be interviewed for ... science!

It's exciting to be a part of this research, which lets everyone benefit from the slices of time that we've all collectively contributed to not just Stack Overflow, but every site in our network. If there's anything else we can do to assist research using the public Creative Commons data we expose, just contact us.

Papers referenced in this post:

Evolution of Two Sided Markets

Causal Discovery in Social Media Using Quasi-Experimental Designs
