Jessica Hicklin was 16 when she entered a juvenile correctional facility. Earlier this year, after 26 years, she left the justice system a free woman and set to work in her new role as CTO of Unlocked Labs. The organization supports coding education programs in three prisons across Missouri and runs a development shop at one prison, empowering currently incarcerated individuals to build software that improves access to education in prisons more broadly. Key to its success, says Hicklin, is the knowledge contributed by the Stack Overflow community over the years. “It was the difference in making our project work. Unlocked Labs would not exist without Stack Overflow.”
Many coders would say they rely on Stack Overflow to get work done, but Hicklin’s situation was different: she had no access to the internet while incarcerated. “My experience with the internet ended with America Online and dial-up modems.” At first she relied on textbooks, but they were often out of date. Eventually, a friend who worked in tech sent her an XML file containing Stack Overflow’s quarterly data dump. With that raw material, Hicklin and her co-founder built a curriculum and learning management system that is now used by individuals studying through Unlocked Labs.
Today, we’re happy to announce a new initiative called Overflow Offline. We’re working with Kiwix, a non-profit, to ensure that an up-to-date version of our dataset is easily available to those who need it, and we will work to improve its readability and reduce its size so there is less friction for end users. Unlocked Labs is one of the organizations we’ll be working with, but there are many others, spanning the justice system, scientific research, and university education in areas where internet access is scarce.
We first set out to create this project in the fall of 2019. We had heard that some communities would benefit from an offline version of Stack Overflow and wanted to understand what it would take to create one. We learned that Kiwix had already done great work on this and had been distributing Stack Overflow to many users. In fact, it was the second most popular dataset in their entire library, behind only Wikipedia.
Since 2018, however, Kiwix had been unable to update its offline version of Stack Overflow. Over the last two years, we provided financial and technical support to resolve the issues blocking these updates, along with resources for improving the usability of the data on their platform.
“Over the years, demand for Stack Overflow’s dataset has only continued to grow. With their help, we are now able to offer this resource again, and have seen it widely adopted by those coding in situations where there is little to no internet access,” says Stéphane Coillet-Matillon, co-founder of Kiwix. “We built the Sotoki (Stack Overflow to Kiwix) scraper in such a way that it can capture each and every one of the 180 Stack Exchange websites.”
Because in-person activity was limited during the pandemic, we were unable to move ahead with new deployments. We have now reached a point where the work we’ve done with Kiwix can be released to the world and, we hope, continue to be improved as an open-source project.
Along with Unlocked Labs, groups like The Last Mile, Bard Prison Initiative, and Code4000 have also found that the questions and answers our community has generated, now over 20 million and counting, can be an invaluable resource.
“When you have a problem, you cannot simply go to the internet and search for help,” says Neil Barnby, an instructional officer with Code4000. “This is a huge disadvantage for the Code4000 students, but Stack Overflow offline allows them to search the resource for possible solutions. This means they become more independent in finding their own solutions, and it gets them familiar with the resources they will eventually use when they are released. It also means that they can work more efficiently on the commercial work they undertake to develop their portfolios.”
In the world of science, Stack Overflow is used offline in locations like the IceCube Lab, a remote research station at the South Pole that studies the universe by observing neutrinos.
“We constantly work on scripts, a lot of Python code, for instance. We use Puppet down there. So every time we need to make a major change on the cluster, there’s always something that doesn’t work. That’s when Stack Overflow comes in handy,” says Ralf Auer, the IceCube Data Center Manager.
There are two people onsite year-round to keep the experiments running and there are hundreds of scientists around the world that depend on the data the lab produces. Keeping all the tech running is tough when internet access is spotty to non-existent. “Right now we update once a year during the summer season when I can go down there with a drive. There is no way for us to download 135 gigabytes over the satellite,” says Auer. “Of course it would be nice to get updates more frequently than what we have right now. When I look there and I have a problem with Python, the knowledge on Stack Overflow is just incredible. Just the amount of knowledge that is already out there is very handy for us.”
A third use case for Overflow Offline is students at schools where internet access is inconsistent, or students who lack internet access at home for homework after class. “Cameroon has a population of 27 million. The Internet coverage rate in Cameroon is 34%, or 9.15 million people have access to the Internet,” says Yannick Nkengne, an ed-tech entrepreneur and founder of the company EduAirBox. “In the education sector, offering free Wi-Fi on university campuses in Cameroon is expensive for administrations. Almost no campuses in Cameroon have free Wi-Fi for students. Students pay for internet packages via their smartphones to connect to their laptops and the price of effective packages for research is expensive for the majority of students. With our solution, students can download content from the EduAirBox to read at home off our network. Their browser uses Web Storage (IndexedDB) technology to store data.”
As a former computer science student, Nkengne is familiar with Stack Overflow. “The fact that Kiwix is able to distribute the Stack Overflow platform was good news for my team because 30% of students in Cameroon are in digital courses and this rate is growing. We are planning to install Stack Overflow in all the networks of our universities. By the end of this year, about 50,000 students will be able to access Stack Overflow without an internet connection.”
Stack Overflow sees today’s launch as the jumping-off point for Overflow Offline. As we learn from our partners and users, we’ll endeavor to improve the dataset so that it’s accessible to more organizations and provides a more powerful resource to those learning to code or building with software.
If you want to learn more about how you can support Kiwix and their millions of users without reliable internet access, visit the Kiwix website. If you’re interested in contributing to their open-source project, check out their list of onboarding tickets, or jump straight to the Sotoki repository on GitHub.
This is a really wonderful initiative. Good work.
This is an amazing innovation that I am sure will help the tech sector. 🤝👍💪😊
Nice to know about Kiwix. Thanks for sharing this initiative.
It's going to be fantastic 😊🥳🎉🎉🎉
I found this blog post short on details. In general, how much worse or different is the experience using this offline version compared to the public one on the Internet?
Are the pages rendered the same way as on the public Stack Overflow, as web pages? For example, are there the “Linked” and “Related” sections in the right-hand column, or is every question an isolated island? Are the pages served from a local web server, or directly from files? Does it require a web server at all? Can it run stand-alone on a single computer, or is a local network required? Or is it rendered in some other format?
What about the images on Imgur? Are they included in the offline version? Are they scraped from that site? Or is there an agreement for a more organised/efficient exchange of data?
Are all posts included? Does any kind of filtering of posts take place? Are unanswered questions included?
What about comments? Are they included? Tag wikis?
Perhaps less relevant, but are Help Center pages, meta sites, user profile pages, edit summaries/revisions, and post timelines included?
What about external resources? E.g., link-only answers.
What kind of search facility is available?
How is the information actually exchanged? Sending USB sticks by mail? Data DVDs? SSDs? Some kind of organised sneakernet?
Awesome! Is this just for Stack Overflow or does it cover the other Stack Exchange sites as well?