code-for-a-living May 20, 2020

Good coders borrow, great coders steal

Copying and pasting can be dangerous, but then again, so can many aspects of software development when done incautiously. In this post, I’ll take a look at what code copying actually means for software development, what good code theft means, and the pitfalls of copying badly.

It’s an open secret among coders that some of the example code that gets posted as part of answers here at Stack Overflow ends up in production code. Maybe you asked a question and got the perfect for loop in exchange. Maybe you found a great answer that already had the exact async await implementation to suit your application.

So when I ran across a tweet promoting the benefits of stealing code, I got to wondering: could copying and pasting code actually be beneficial?

For the record, I’m not advocating that you copy and paste code from our public Q&A site willy-nilly; there are instances where it can get you in trouble. But, as our podcast guest Anna Lytical showed us, it can be done well to quickly produce functioning prototypes.

If you do copy code examples, please remember to provide attribution. Depending on when code was last edited on Stack Overflow, it is licensed under a version of the Creative Commons license, the most recent being CC BY-SA 4.0, which requires attribution.
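As a rough illustration of what attribution can look like in practice, here is a minimal sketch of a comment header on a copied snippet. The URL, author name, and the `chunked` function are hypothetical placeholders, not a real answer, and the layout of the header is one reasonable convention rather than a required format.

```python
# Adapted from a Stack Overflow answer (all values below are placeholders):
#   Source:  https://stackoverflow.com/a/<answer-id>
#   Author:  <answer author's display name>
#   License: CC BY-SA 4.0 (the exact version depends on the last edit date)
#   Changes: renamed variables, added input validation
def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunked([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```

Noting what you changed is part of the point: it shows where the original ends and your own work begins.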

Code once, run millions of times

Copying code from Stack Overflow is a form of code cloning; that is, duplicating code from within a project or between projects and reusing it. Depending on who you ask, as little as 5-10% or as much as 7-23% of code is cloned from somewhere else. Whether these clones are good or bad is up for debate.

Regardless of the exact amount, code cloning is extremely common. Boilerplate code is essentially code repeated regularly throughout a project. Chances are pretty good that the coders writing it aren’t typing each instance by hand. Tools like Lombok try to reduce the need for boilerplate, but the fact remains:

  • There will be some pieces of code that are going to show up in a project over and over 

AND

  • Because they may need a small modification, these code snippets can’t be shunted into a separate function or dependency.
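To make the boilerplate point concrete, here is a minimal sketch in Python. Lombok itself is a Java tool; the standard-library `dataclasses` module is used here only as a rough analogue, and the `PointManual` and `Point` classes are made up for illustration.

```python
from dataclasses import dataclass


# Hand-written version: these same three methods get repeated, with small
# variations, for every simple data-holding class in a project.
class PointManual:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"PointManual(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        if not isinstance(other, PointManual):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


# Generated version: the decorator writes __init__, __repr__, and __eq__.
@dataclass
class Point:
    x: int
    y: int


print(PointManual(1, 2) == PointManual(1, 2))  # True
print(Point(1, 2) == Point(1, 2))              # True
```

The generated version works until a class needs one of those methods to behave slightly differently, which is exactly the "small modification" case where the repeated code tends to get copied and tweaked by hand instead.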

Speaking of which, libraries and external dependencies are an efficient way to reuse functionality without reusing code. It’s almost like copying code, except that you aren’t responsible for the maintenance of it. Heck, most of the web today operates on a variety of frameworks and plugin libraries that simplify development. Reusing code in the form of libraries is incredibly efficient and allows each focused library to be very good at what it does and only that. And unlike in academia, many libraries don’t even require anything to indicate you’re building with or on top of someone else’s code. 

The JavaScript package manager npm takes this to the extreme. You can install tiny, single function libraries—some as small as a single line of code—into your project via the command line. You can grab any of over one million open source packages and start building their functionality into your app. 

Of course, as with every approach to work, there’s a downside to this method. By installing a package, you give up some control over the code. Some malicious coders have created legitimately useful packages, waited until they had a decent adoption rate, then updated the code to steal bitcoin wallets. To their credit, the npm staff manages to head these sorts of attacks off pretty quickly, but the more external dependencies you have, the greater the attack surface you present.
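One way to picture the risk of a package changing underneath you is a digest check: record a hash of the dependency when you first review it, then compare later downloads against it. The sketch below is only an illustration of that idea, not something this post or npm prescribes; the function name and the package contents are hypothetical stand-ins.

```python
import hashlib


def matches_pinned_digest(package_bytes: bytes, pinned_sha256: str) -> bool:
    """Return True if the package contents hash to the digest recorded earlier."""
    actual = hashlib.sha256(package_bytes).hexdigest()
    return actual == pinned_sha256


# Hypothetical package contents, standing in for a downloaded archive.
reviewed_release = b"export const leftPad = (s, n) => s.padStart(n);"
tampered_release = reviewed_release + b" /* unexpected extra code */"

# Digest recorded when the dependency was first reviewed.
pinned = hashlib.sha256(reviewed_release).hexdigest()

print(matches_pinned_digest(reviewed_release, pinned))  # True
print(matches_pinned_digest(tampered_release, pinned))  # False
```

A check like this only tells you that the bytes changed. It does not tell you whether a new version you deliberately upgrade to is trustworthy, so it narrows the attack surface rather than removing it.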

Even Stack Overflow answers themselves are not immune to code cloning. An independent researcher found several instances where Stack Overflow answers had code copied from other places. One Java snippet found its way into over 40 answers. 

Good artists borrow, great artists steal

In the art world, theft is part of how great works come into being. I remember going to the Van Gogh museum in Amsterdam and seeing some of his early works. I was blown away, but what I didn’t know at the time was that these were actually studies of Japanese art and woodcuts. What I had thought was a style unique to its time was actually part of a continuum, and not the continuum I had thought. Instead of just progressing along the path that the Dutch masters had laid in front of him, he took ideas from Japanese prints he found in Paris—ideas of composition, brushwork—and merged them.

Picasso has a saying credited to him: “Good artists borrow, great artists steal.” Picasso himself lifted many of his ideas from African and Polynesian art and combined them with his own study. Stealing sounds wrong, and in fact, claiming someone else’s work as your own is plagiarism. But the quote is using the word “steal” to say something a little different. A borrowed object still belongs to someone else; you copy a style and it still belongs to someone else. To steal, however, is to make that idea your own. Taking credit for someone else’s idea is borrowing; understanding an idea and weaving it into your own work, that’s what he meant by theft. Steve Jobs was a fan of this quote, and Apple became successful under him because they stole, incorporated, and refined.

When you clone code, you risk merely borrowing it. Borrowed code goes into the project wholesale, so long as it compiles or throws no errors, but it may have bugs or malicious intent baked in that you aren’t aware of. The risks of badly copied code (or code copied with modification) are legion. In fact, most of the complaints about cloned code can be traced to borrowers. If you don’t understand the code, you’re liable to leave security holes in what was meant to be only a demonstration of a concept. Even the most copied snippet from Stack Overflow has a bug in it.

On the other hand, when you steal code, you know exactly what it does. The core of it, the quirks, all of it becomes a seamless part of your own code. If you could write it again from memory, that’s a sign of a good theft: a reworking that has left you with something more than a clone, with something original.

So yes, steal code. Take it, understand it, and implement it in your own projects. Make it yours. You can be more efficient, improve your projects, and maybe even improve your resume (aka your ctrl+C ctrl+V). But if you copy without fully understanding your newly acquired code and what it does, you risk making your code worse. 
