Preventing the Top Security Weaknesses Found in Stack Overflow Code Snippets

Last week, we told you about research that found a number of security vulnerabilities in C++ code snippets in Stack Overflow answers, and how some of those flaws had migrated into actual, real-live Github projects. Today, we’re following up on the top eight error types that research highlighted and suggesting ways to avoid making the same mistakes.

To get these answers, we checked the common weakness enumeration database for the numbers that Verdi et al posted in their research. We’re only addressing the CWE numbers that referred to specific weaknesses, not to categories, as those require a little more information before you can begin to address them.

While the research specifically found these vulnerabilities in C++ snippets, the CWEs mentioned below can happen in a number of programming languages. Other programming languages may be more prone to other CWEs, so don't take this as a security checklist.

Without further ado, if you’ve got problems, yo, I’ll solve them.

This vulnerability occurs based on the assumption that events or specific circumstances never happen, such as low memory conditions, lack of access to resources.

In a given program, you may be able to control most of the flow of data and therefore assume that all function return values will be within expected values. But rare events can still happen. Even worse, sometimes a clever attacker can trigger those events and cause havoc on your system.

You may expose this weakness by assuming that memory will always be successfully allocated when needed, that files or strings will be of a specific size, or that database or input calls will return something other than NULL. In many cases, unexpected data will crash the process. Maybe your code can handle this gracefully. Or maybe the system will continue in an unexpected state, leaving the system vulnerable to an attacker’s machinations.

Recently, an ethernet/serial RTU module exposed this weakness. While the vulnerability has likely been patched, previously, an attacker could initiate a denial of service attack by sending truncated SNMP packets to a specific UDP port or disconnect all connections with an unexpectedly high number of packets on another port.

Some languages may be able to catch these exceptions before they cause harm. But for the lower-level languages that don’t, you can do the following:

Check your return values every time
Assume all inputs are malicious and only accept certain values
Plan for low-resource conditions
Architect for failure

Exception and failure management aren’t a cure all here. Overly verbose and specific exceptions can give attackers clues as to how to find other holes in your code.

When software does not validate input properly, an attacker is able to craft the input in a form that is not expected by the rest of the application.

As the old saying goes, “garbage in, garbage out.” If your input is bad, it’s probably going to lead to some bad results. It’s sort of like that prank where you bite into what you think is one of those Lindor truffles but it’s actually a chocolate-covered brussel sprout. Malicious attackers are going to try to slip you some input that’s way worse than brussel sprouts.

This weakness shares a commonality with the previous one, in that a developer has assumed that nobody can access the inputs in their functions. The list of developers who were wrong about this is pretty long.

Some versions of MS Excel (all of which are ten years old or more) allowed arbitrary code to execute on a user’s system once they loaded a spreadsheet. Various objects embedded within that malicious spreadsheet accessed the VBA Performance Cache, which was not sanitized at runtime, so it behaved in unexpected ways.

Especially in networked applications with database-driven backends, software has a ton of vectors for untrusted input. You might think that you could catch this weakness every time just by sanitizing your inputs, but there’s a whole raft of other safety measures to take:

Check input values on both the client and server
For any input values combined from multiple sources, validate the combined value as well as the inputs
Be careful about code that crosses languages, as happens in an interpreted language running on a native virtual machine
Explicitly convert your input into the expected data type
Ensure all data is in the canonical character coding scheme before validating
Use static analysis

The return value is not checked by a method or function, which may create an unexpected state.

You may have noticed a theme by now: check yourself before you wreck yourself. Or, more plainly, validate all data to make sure it is what you think it is. If you look at a function and think either that it can never fail or that it doesn’t matter if it fails, you’re exposing yourself to this vulnerability.

Similarly to the previous weaknesses, this one could leave a system in an unexpected state, causing crashes or other weirdness, including executing arbitrary code. That’s just what happened in a number of Linux implementations in 2007; an attacker could sneak crafted TLV packets into a BGP packet that would let them execute what they wanted. This was especially dangerous because BGP packets are usually passed by autonomous systems to share routing information.

Mitigating this is simple: check your return values, make sure they are within expected ranges, and throw exceptions if they aren’t.

The code uses deprecated or obsolete functions, which suggests that the code has not been actively reviewed or maintained.

As exploits are discovered and software practice changes, languages and software may deprecate some functionality as outdated. While you may be able to continue using deprecated functions, there can be security issues when you do. And some functions, like rand(), might be plain obsolete for serious code.

A more serious problem that this weakness can indicate is that the software in question isn’t being maintained or updated. Obsolete functions can linger if nobody keeps up with the changes in software and systems. In an unmaintained system, security flaws can snowball as the software ages. Questions sometimes get new answers, but for those code snippets posted on Stack Overflow years ago, nobody’s maintaining them (nor should they), so it’s on you to make sure everything there is still safe.

In the NUUO CMS in 2018, outdated software components left users open to arbitrary code execution. In complex software with lots of dependencies, it doesn’t even need to be your code that has the security weakness in order to open it up to threats.

Mitigation for this weakness is simple, even if the implementation is not: refactor your code to avoid deprecated or otherwise obsolete functions. If you need to use these functions, make sure that you understand the security implications and plan accordingly.

Memory is allocated based on invalid size, allowing arbitrary amounts of memory to be allocated.

Memory allocation is a pretty common function, especially within lower-level languages. But if code allocates a massive amount of memory, it can lead to system slowdowns or crashes. While a system failing is bad enough, on flexible cloud systems, overallocation can be extremely expensive.

This is cousin to CWE-20, Improper input validation, as the input that needs to be validated is being supplied to memory allocation functions. Memory may be increasingly cheap, but it is still finite. If an attacker can tie up all the memory on your hardware, it can not only crash your program, but any other programs running on that system.

In 2008, the IBM solidDB didn’t validate the field that set the amount of memory to allocate. So any incoming packets could crash the system just by supplying a large value in this field. As solidDB was embedded in a lot of networking and telecommunications equipment, this weakness could have wreaked a lot of havoc with the underlying infrastructure of the internet.

Mitigating this is the same as CWE-20 above, plus ensuring that you are not exceeding the memory limits of your system.

The input is received from an upstream component, but it does not neutralize or incorrectly neutralizes when null bytes are sent to a downstream component.

In C++, Strings are terminated by null characters. Most of the time, this allows a lot of flexibility, as Strings can be of any length. But a user can enter any number of characters as input, including null characters in the middle of Strings. In this case, the String will be truncated to that null.

Most of the time, this doesn’t matter. But if that String is combined as part of a longer String, say a URL or username, then they may be able to get a program to give up more than it’s supposed to. While this question is about PHP, the technique is the same; a null character in input can cancel out anything appended to that String, including the file extension that would have made this a file name. With a little clever scripting, malicious agents could traverse the web directory and get whatever files they wanted.

This is what happened in a lot of early web servers. Attackers could tack a null at the end of a URL and list directories, read source code, and generally bypass all access restrictions. These source files would then give away a lot of extra information about the system that could be exploited.

In trying to catch these attacks, your code should assume that somebody’s going to slip a null in input. Here’s a few ways to protect your code:

Remove null characters within input
Explicitly encode input with the canonical method
Use a combination of whitelists and blacklists to designate valid values and block invalid ones

The software uses a function that accepts a format string as an argument, but the format string originates from an external source.

Format functions, like printf(), take a String with format functions and convert it to a regular String. The format parameters look like %x and will process an argument in the function, adding it to the return value. However, if the String is externally controlled—that is, taken as input or read from a file/data location—and not properly validated, an attacker could slip their own parameters into the function.

Some of the parameters allow reading and writing from the memory stack or process memory. Allowing that sort of data from arbitrary user input is a deeply bad idea. Attackers could read information from memory or execute arbitrary code, depending on what else is in the format string.

In 2006, Dia, a GTK+ based diagramming tool, allowed denial of service attacks (and possibly code execution) when it tried to open a bmp file named with a bunch of %s in a row. The software would try to display a warning message with the name of the file, but because it was processing the format parameters, it displayed sensitive information instead.

The solution is pretty simple: make sure all Strings passed into format functions are static and within your control. Sometimes compiler warnings can indicate problematic areas, so don’t ignore them.

A NULL pointer dereference occurs when the application dereferences a pointer that it expects to be valid, but is NULL, typically causing a crash or exit.

I’ve gone through release notes when half of the issues being fixed were because of null pointer exceptions (or dereferences). While many null pointers can crash a system or cause unexpected program exits, sometimes they can leave the system in an unexpected state—strange, unreproducible quirks or pulling data from other memory addresses. It becomes a security issue when an attacker reliably finds an avenue to induce a null in a system.

While a lot of null pointers happen because of sloppy programming, in asynchronous systems, a null pointer can arise from a race condition. That’s where two async processes read and write the same variable. These can happen out of order, so when one process moves faster and reads the variable before the other process has a chance to set it, boom! Null pointer dereference.

In 2004, OpenSSL had a null pointer dereference that allowed attackers to craft a specific SSL/TLS handshake that crashed the service, effectively creating a denial of service attack.

By checking for these common errors in the code you copy from Stack Overflow, you can avoid a lot of serious security risks. Share some knowledge, check out some answers, and be safe out there.

Preventing the Top Security Weaknesses Found in Stack Overflow Code Snippets

CWE-754 Improper Check for Unusual or Exceptional Conditions

CWE-20 Improper input validation

CWE-252 Unchecked return value

CWE-477 Use of obsolete function

CWE-789 Uncontrolled memory allocation

CWE-158 Improper neutralization of null byte or null character

CWE-134 Use of externally controlled format string

CWE-476 Null pointer deference

Add to the discussion