When we published our data analysis of the security questions on Stack Overflow and the Information Security Stack Exchange back in October, we found that every time there was a major vulnerability or breach, there was a corresponding uptick in the number of questions asked. It made intuitive sense, and the data confirmed it. We just didn't expect new data to confirm the trend so soon.
On December 10th (or 9th, depending on who you ask), 2021, the maintainers of Log4j disclosed that the very popular Java logging library had a serious security vulnerability, dubbed Log4Shell (CVE-2021-44228). The bad news is that the vulnerability could allow attackers to gain control of any system running an affected version of Log4j; potentially hundreds of millions of systems and projects were at risk. The good news is that only some versions are vulnerable, and the fix is fairly simple: update Log4j to version 2.17.1 or later.
To get a sense of the scale of the problem, we asked our followers on Twitter about the vulnerability. User Bakhtiyar Garashov said, “If someone says [they weren’t affected] it is because either they don't use Java or they don't log at all.” The issue has risen to the level of national security, with big tech companies talking to the federal government about how to better secure open source projects that become widespread dependencies.
Suddenly, the version of Log4j in your software became very important. Right on cue, thousands of developers checked which version their systems were using, and for those unsure how to do that, the Stack Overflow community was there to help.
A sudden increase in traffic
Over five years ago, markthegrea asked, “How can I find out what version of Log4j I am using?” At the time, this was a pretty innocuous question; now, it was vital.
Thanks to the sudden importance of the answer, this question received 17 times more views (207K) in the last 30 days than it did in the previous five years combined. It also received a new top answer for developers who need to find out whether any of their projects could be running a vulnerable version: some projects bundle Log4j automatically, which can leave multiple versions on a single server. That answer came with a program that scans all .jar files on a server and highlights vulnerable Log4j versions.
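To make the idea of a jar scan concrete, here is a minimal sketch in Java. It is not the program from that answer; it is an independent illustration that rests on a few assumptions. It only looks for log4j-core 2.x (where CVE-2021-44228 lives), it assumes each jar carries Maven's standard `pom.properties` metadata, and it treats 2.17.1 as the minimum safe version. The class name `Log4jJarScanner` and its helper methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class Log4jJarScanner {
    // Where Maven-built log4j-core jars record their own version
    // (assumption: the jar was packaged with standard Maven metadata).
    static final String POM_PROPERTIES =
        "META-INF/maven/org.apache.logging.log4j/log4j-core/pom.properties";

    // Assumed minimum safe release; check current Apache guidance before relying on it.
    static final int[] MIN_SAFE = {2, 17, 1};

    /** Returns the log4j-core version recorded inside the jar, or null if it isn't there. */
    static String log4jCoreVersion(Path jar) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            ZipEntry entry = zip.getEntry(POM_PROPERTIES);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                Properties props = new Properties();
                props.load(in);
                return props.getProperty("version");
            }
        }
    }

    /** True if a dotted version such as "2.14.1" sorts below MIN_SAFE; non-numeric parts count as 0. */
    static boolean isBelowMinimum(String version) {
        String[] parts = version.split("[.-]");
        for (int i = 0; i < MIN_SAFE.length; i++) {
            int n = 0;
            if (i < parts.length) {
                try {
                    n = Integer.parseInt(parts[i]);
                } catch (NumberFormatException e) {
                    n = 0; // e.g. "beta9" in "2.0-beta9"
                }
            }
            if (n != MIN_SAFE[i]) {
                return n < MIN_SAFE[i];
            }
        }
        return false; // equal to MIN_SAFE on every compared component
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> jars;
        try (Stream<Path> paths = Files.walk(root)) {
            jars = paths.filter(Files::isRegularFile)
                        .filter(p -> p.toString().endsWith(".jar"))
                        .collect(Collectors.toList());
        }
        for (Path jar : jars) {
            try {
                String version = log4jCoreVersion(jar);
                if (version != null && isBelowMinimum(version)) {
                    System.out.println("Possibly vulnerable: " + jar + " (log4j-core " + version + ")");
                }
            } catch (IOException e) {
                // Corrupt or unreadable archives are reported and skipped, not fatal.
                System.err.println("Skipped unreadable jar: " + jar);
            }
        }
    }
}
```

On Java 11 or later you could run it directly with `java Log4jJarScanner.java /path/to/app`. Treat its output as leads, not verdicts: it doesn't open jars nested inside fat jars or WARs, it misses shaded copies that lack Maven metadata, and it would flag the backported fixes for older Java versions (such as 2.12.4 and 2.3.2) even though those are patched.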
At Stack Overflow, we call it knowledge reuse: when questions and answers are shared, they can be referenced, reused, and updated. This was a perfect example of an existing question that proved relevant to what developers needed to know in 2021.
Besides the heavy activity on this question, the tag itself saw a jump in views. Web traffic to log4j questions totaled 766K views in the first seven days after the vulnerability was announced, averaging 110K views per day. That’s a 1,122% increase over the 9K average views per day before the announcement.
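The 1,122% figure is a relative increase: it measures how far the new daily average rose above the old one. Working it through with the rounded numbers from the paragraph above:

```latex
\begin{aligned}
\text{post-disclosure average} &= \frac{766\text{K views}}{7\ \text{days}} \approx 109.4\text{K} \approx 110\text{K views/day} \\
\text{relative increase} &= \frac{110\text{K} - 9\text{K}}{9\text{K}} = \frac{101}{9} \approx 11.22 = 1{,}122\%
\end{aligned}
```

Put another way, daily views ran at roughly 12 times their pre-disclosure level.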
Obviously, when something changes with a piece of technology, users will ask new questions, doubly so when the change is the discovery of a mission-critical security vulnerability. Users have asked 325 new log4j questions in the first 30 days after the vulnerability was announced, nearly double the total number of questions asked previously. For the first seven days after the announcement, the tag averaged 20 new questions a day. Compare that with the volume before the announcement: an average of one lonely question per day.
The vulnerability itself got a new tag: log4shell. Thirteen questions were asked under this tag, and together they received over 25K views. The most popular of these new questions, “How can I mitigate the log4shell vulnerability in version 1.2 of Log4j?”, garnered 22K views. It's no surprise that this popular question was marked a duplicate: people in a crisis generally have the same questions.
Changes in what viewers wanted
The top ten most viewed questions since the announcement have all been related to the vulnerability in some form, with eight of them asked after the announcement and explicitly mentioning the vulnerability. The remaining two are the question discussed above about finding the version and another five-year-old question, “Migrating from Log4j to Log4j2 - properties file configuration,” which has received 19K views since the disclosure.
After the disclosure, questions clearly shifted to directly reference the vulnerability. The words vulnerability, version, and cve were among the top five words mentioned after the disclosure, whereas these words were rarely mentioned before it.
The pre-vulnerability period includes the previous 325 questions asked. The post-vulnerability period runs from December 10, 2021, to January 10, 2022.
When looking at the top 100 words used before and after the announcement, it becomes even clearer that the majority of new questions are directly related to the vulnerability. Specific Log4j version numbers are referenced in the titles (1.2.17, 1.2, 2.17), and we even see Logback, a successor to Log4j, start to appear.
We broadened our analysis beyond the top five words to reveal what users were trying to learn before and after the vulnerability. Words like logs, file, and logging were frequently used prior to the announcement, which suggests that these questions were about Log4j's core functionality. After the disclosure, there was a clear shift toward questions prompted directly by the vulnerability. Not only did the words vulnerability, vulnerable, and security begin to appear, but we also see specific versions being referenced, along with the vulnerability's own identifier, CVE-2021-44228.
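The post doesn't describe the exact pipeline behind these word lists. As a rough illustration of the general technique (count how often each word appears across question titles, skip common stop words, keep the most frequent), here is a minimal Java sketch. The stop-word list, the tokenizing rule, and the `TitleWordCounts` class are all assumptions made for illustration; running `topWords` separately over pre- and post-disclosure titles would give two lists to compare.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TitleWordCounts {
    // Hypothetical stop-word list; a real analysis would use a fuller one.
    // "log4j" is dropped too (an assumption), since nearly every title in the tag mentions it.
    private static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "the", "in", "of", "to", "how", "is", "i", "my",
        "for", "with", "and", "can", "log4j");

    /** Returns the n most frequent non-stop-words across the titles, most frequent first. */
    static List<String> topWords(List<String> titles, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String title : titles) {
            // Lowercase, then split on anything that isn't a letter, digit, dot, or hyphen,
            // so tokens like "2.17.1" and "cve-2021-44228" stay intact.
            for (String token : title.toLowerCase(Locale.ROOT).split("[^a-z0-9.\\-]+")) {
                String word = token.replaceAll("^[.\\-]+|[.\\-]+$", ""); // trim stray dots/hyphens
                if (word.isEmpty() || STOP_WORDS.contains(word)) {
                    continue;
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        // Highest count first; ties broken alphabetically so the output is deterministic.
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample input: titles of questions quoted in this post.
        List<String> titles = List.of(
            "How can I find out what version of Log4j I am using?",
            "How can I mitigate the log4shell vulnerability in version 1.2 of Log4j?",
            "Migrating from Log4j to Log4j2 - properties file configuration");
        System.out.println(topWords(titles, 5));
    }
}
```

Keeping dots and hyphens inside tokens is the design choice that lets version numbers and CVE identifiers show up as words in their own right, which is exactly the kind of signal described above.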
Questions in a crisis
Any security vulnerability in a software dependency creates a whole lot of uncertainty for its users. Does this affect me? How can I tell? And what do I do if I'm affected? As a site where technologists, specifically those who create software, go to gain and share knowledge, we have a window into the uncertainties the software community is facing. Getting them the answers they need, when they need them, is essential.
This vulnerability may have been fixed in an update, but the challenge with open source is that updates don't automatically reach every project that depends on the affected code. Vulnerabilities like Log4Shell will live on in the affected versions. Studies have found that over 80% of projects still use outdated dependencies.
“There are many fantastic, free tools available to software developers, things we use every day that we don't even think twice about using,” said Matt Kiernander, technical advocate here at Stack Overflow. “The Log4j vulnerability is a prime example of what could go wrong when we trust too casually. Log4j was built by Apache, a well known and trusted entity that's provided much value to the open source community over the years. If this can happen with Apache, what about that third party library you downloaded from npm that had 3.5 stars but 'did the trick'? Many devs will download things just because they work without considering the potential security impacts it could have in an application.”
There's a huge number of free, open source libraries available to make your development life easier, but these dependencies are out of your control, as are the security issues they face. When developers realize that their projects are vulnerable to system takeovers and data exfiltration, Stack Overflow will be here to help them locate and mitigate these issues.