Good coders borrow, great coders steal
It’s an open secret among coders that some of the example code that gets posted as part of answers here at Stack Overflow ends up in production code. Maybe you asked a question and got the perfect for loop in exchange. Maybe you found a great answer that already had the exact async await implementation to suit your application.
So when I ran across this tweet promoting the benefits of stealing code, I got to wondering: could copying and pasting code actually be beneficial?
Copying and pasting can be dangerous, but then again, so can many aspects of software development when done incautiously. In this post, I’ll take a look at what code copying actually means for software development, what good code theft means, and the pitfalls of copying badly.
For the record, I’m not advocating that you copy and paste code from our public Q&A site willy-nilly; there are instances where it can get you in trouble. But, as our podcast guest Anna Lytical showed us, it can be done well to quickly produce functioning prototypes.
If you do copy code examples, please remember to provide attribution. Depending on when code was last edited on Stack Overflow, it is licensed under a version of the Creative Commons license, the most recent being CC BY-SA 4.0, which requires attribution.
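As a rough sketch of what attribution can look like in practice, here is one way to credit a copied snippet in a source comment. The URL, author name, and the chunked helper are placeholders invented for illustration, not a real answer; the fields shown (source link, author, license, and a note on changes) follow what CC BY-SA 4.0 asks for.

```python
# Adapted from a Stack Overflow answer (placeholder details, for illustration only):
#   Source:  https://stackoverflow.com/a/ANSWER_ID   <- hypothetical link
#   Author:  AUTHOR_NAME                              <- hypothetical author
#   License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
#   Changes: renamed variables, added input validation.
def chunked(items, size):
    """Yield successive slices of `items`, each at most `size` elements long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    print(list(chunked([1, 2, 3, 4, 5], 2)))
```

Keeping the note next to the code means the credit travels with the snippet if it gets moved or copied again.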
Code once, run millions of times
Copying code from Stack Overflow is a form of code cloning; that is, duplicating code from within a project or between projects and reusing it. Depending on who you ask, as little as 5-10% or as much as 7-23% of code is cloned from somewhere else. Whether these clones are good or bad is up for debate.
Regardless of the exact amount, code cloning is extremely common. Boilerplate code is essentially code repeated regularly throughout a project. Chances are pretty good that those coders aren’t typing each of those by hand. Tools like Lombok try to reduce the need for boilerplate, but the fact remains:
- There will be some pieces of code that are going to show up in a project over and over
AND
- Because they may need a small modification, these code snippets can’t be shunted into a separate function or dependency.
Speaking of which, libraries and external dependencies are an efficient way to reuse functionality without reusing code. It’s almost like copying code, except that you aren’t responsible for the maintenance of it. Heck, most of the web today operates on a variety of frameworks and plugin libraries that simplify development. Reusing code in the form of libraries is incredibly efficient and allows each focused library to be very good at what it does and only that. And unlike in academia, many libraries don’t even require anything to indicate you’re building with or on top of someone else’s code.
The JavaScript package manager npm takes this to the extreme. You can install tiny, single function libraries—some as small as a single line of code—into your project via the command line. You can grab any of over one million open source packages and start building their functionality into your app.
Of course, as with every approach to work, there’s a downside to this method. By installing a package, you give up some control over the code. Some malicious coders have created legitimately useful packages, waited until they had a decent adoption rate, then updated the code to steal bitcoin wallets. To their credit, the npm staff manages to head these sorts of attacks off pretty quickly, but the more external dependencies you have, the greater the attack surface you present.
Even Stack Overflow answers themselves are not immune to code cloning. An independent researcher found several instances where Stack Overflow answers had code copied from other places. One Java snippet found its way into over 40 answers.
Good artists borrow, great artists steal
In the art world, theft is part of how great works come into being. I remember going to the Van Gogh museum in Amsterdam and seeing some of his early works. I was blown away, but what I didn’t know at the time was that these were actually studies of Japanese art and woodcuts. What I had thought was a style unique to its time was actually part of a continuum, and not the continuum I had thought. Instead of just progressing along the path that the Dutch masters had laid in front of him, he took ideas from Japanese prints he found in Paris—ideas of composition, brushwork—and merged them.
Picasso has a saying credited to him: “Good artists borrow, great artists steal.” Picasso himself lifted a lot of his ideas from African and Polynesian art and combined them with his own study. Stealing sounds wrong, and in fact, claiming someone else’s work as your own is plagiarism. But the quote is using the word “steal” to say something a little different. A borrowed object still belongs to someone else; you copy a style and it still belongs to someone else. To steal, however, is to make that idea your own. Taking credit for someone else’s idea is borrowing; understanding an idea and weaving it into your own work, that’s what he meant by theft. Steve Jobs was a fan of this quote, and Apple became successful under him because they stole, incorporated, and refined.
When you clone code, you risk merely borrowing it. Borrowed code goes into the project wholesale, so long as it compiles or throws no errors, but it may have bugs or malicious intent baked in that you aren’t aware of. The risks of badly copied code—or code copied with modification—are legion. In fact, most of the complaints about cloned code can be traced to borrowers. If you don’t understand the code, you’re liable to leave security holes in what was meant to be only a demonstration of a concept. Even the most copied snippet from Stack Overflow has a bug in it.
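To make the risk concrete, here is a hypothetical example (not the snippet mentioned above) of the kind of edge-case bug that can hide in code that looks finished. Both function names are invented for illustration. The first formatter picks the unit before rounding, so a value just under a boundary prints in the wrong unit; the second checks the rounded value before settling on a unit.

```python
import math

UNITS = "kMGTPE"


def human_readable_naive(n):
    """Plausible-looking formatter: chooses the unit first, rounds afterwards."""
    if n < 1000:
        return f"{n} B"
    exp = int(math.log(n) / math.log(1000))
    return f"{n / 1000 ** exp:.1f} {UNITS[exp - 1]}B"


def human_readable_checked(n):
    """Moves to the next unit whenever the rounded value would reach 1000."""
    if n < 1000:
        return f"{n} B"
    value = float(n)
    for unit in UNITS:
        value /= 1000
        # Stop at the first unit whose rounded value stays below 1000,
        # or at the largest unit this sketch knows about.
        if round(value, 1) < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}B"


if __name__ == "__main__":
    print(human_readable_naive(999_999))    # 1000.0 kB
    print(human_readable_checked(999_999))  # 1.0 MB
```

Someone who only borrowed the first version would likely never try 999,999 as an input; someone who understood it well enough to rewrite it has a much better chance of spotting the boundary.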
On the other hand, when you steal code, you know exactly what it does. The core of it, the quirks, all of it becomes a seamless part of your own code. If you could write it again from memory, that’s a sign of a good theft: a reworking that has left you with something more than a clone, with something original.
So yes, steal code. Take it, understand it, and implement it in your own projects. Make it yours. You can be more efficient, improve your projects, and maybe even improve your resume (aka your ctrl+C ctrl+V). But if you copy without fully understanding your newly acquired code and what it does, you risk making your code worse.
Tags: bulletin, copying code, security, stackoverflow
19 Comments
Here is a good thread on Twitter about what can happen if you copy from stackoverflow without double checking the code…
https://twitter.com/Foone/status/1229641258370355200?s=20
How do I become a Java developer? I have been self-tutoring, but I get stuck most of the time and it takes days (sometimes months) to summon the courage to continue.
Hey Joe,
Why you want to become a java developer is more important.
Most Java developers have a computer science degree in hand. It is not required, but if you are looking for a job it will certainly matter.
Make java fundamentals strong and work on real world project.
If you can spend some money go for pluralsight.com.
Great article!
This literally makes no sense: “Taking credit for someone else’s idea is borrowing;” That is also stealing. Lol. Otherwise gr8 article.
If it takes you months to work up the courage to write code, then programming might not be the right vocation for you. The way you become a developer (regardless of the particular language) is to accept that you’re going to make mistakes, make them, solve them, learn from them, and keep going. Getting stuck and then unsticking yourself is part of the process.
I hope no one takes this post seriously. The difference between an engineer and a technician is understanding the system. There is nothing wrong with reusing ideas, but I think that the solution is rarely to just copy and paste code. That’s a great way to introduce bugs.
This is exactly what he wrote. Read through again.
Hi, you are saying exactly what he said. The author said “steal” here refers to programmers that fully understand (intricately) the code that they are copying, then go ahead to modify and implement it for their purpose, making something original of its own. We are all inspired by someone or something, you don’t always have to learn by studying basic principles to a build-up. That’s def beneficial, but time consuming, especially when someone has done it already. Learn from them and improve!
This was discussed on the meta site https://meta.stackexchange.com/questions/334811/stack-overflow-made-the-bbc-news-copycat-coders-create-vulnerable-apps
The point you make is valid, but can be distilled down simply to “Integrity”, both professional and technical. Professional integrity, especially in the open-source world, doesn’t preclude using code that is publicly available (that’s what open-source is about), but requires fair-attribution and implies that in-turn, you will give something back to the community. Technical integrity speaks to the larger caution about Cargo-Cult programming or using code you don’t fully understand. Technical integrity (and just plain competence for that matter) requires you fully understand any code you copy and use — including not just an understanding of the immediate code itself, but of any potential side-effects on the rest of the system the code is incorporated in. No one can force integrity on another — it is something that comes from within.
Write a program in C (without using any standard tools or library functions) to print the top N
or/and bottom N lines of a given set of files.
Print the first/last N lines of each FILE to standard output. With more than one FILE, precede
each with a header giving the file name.
The way to invoke the program is:
a.out -q -h N -t N File1 File2 … Filen
-h N: print the top N lines from each file
-t N: print the bottom N lines from each file
-q: do not print the header giving the file names
if u find the code please help me
Came here expecting good examples on how to give proper attribution under different licenses but found this ramble about arts and ethics, disappointed.
I know, right!? Insight based on an analogy to broader concepts through arts, culture and ethics? How dare you, Stack Overflow. How dare you.
Stop teaching us stuff!
There’s nothing wrong with cutting and pasting code from SO and other online resources. I do it all the time. I would go out on a limb and say that the most productive programmers are those who do it frequently. Writing an application touches on so many areas that it’s nearly impossible to be an expert on all of them. The .NET framework, for instance, is huge and complex. It would be a waste of time and effort to figure out how to call WCF or how to build a XAML-based form by only referencing the technical documentation. I would immediately reprimand or fire a developer who implements QuickSort from scratch instead of using what’s already supplied by standard libraries, unless its performance is very critical or they have a novel implementation that’s significantly faster and worthy of a patent. Today’s application development is mainly setting up an architecture and tying together the components according to a recipe. Even the best chefs in the world use recipes that they didn’t create themselves.
All that said, you should know the language and the technology you’re working with. While I do borrow other people’s code, I make sure I understand a good deal of it before trusting the code. If I copy code that opens and references a .NET assembly, I make sure I understand how loading an assembly works. The code is only valuable to me in that it saves me from implementing all the minor details. Blindly copying code can introduce bugs and security issues, but this can happen even when a developer writes their own code from scratch. Copying large parts of code and shoehorning them into your project as an unmodified black box, however, is an issue. This is quite different from copying and pasting snippets of code, modifying them, and integrating them into your code base.
OFF TOPIC:
While many use the Jobs quote as an admission of intellectual property theft, Apple didn’t steal the GUI from Xerox. They licensed the technology from them and poached their top scientists working on it. They stole it in the sense that they didn’t just copy it, but made significant improvements, rendering it far superior to the original.
Hi Sir, I have read this article you wrote, a really cool piece of work that can potentially open up new ways of thinking about development! May I ask you, as an editor of InfoQ China, would you mind if I translate it into Chinese and reach our readers? With creat!
Unfortunately, we cannot grant permission to translate our articles.
Of course, borrowing an idea (e.g., code) from another source is necessary. We need to be efficient with time because we are in the business of solving problems that haven’t been solved before. If we had to solve every old problem over and over, there would be no progress in the world. But the solution is not to simply “understand the code”. If we try to do that, we may waste as much time as if we had tried to solve the problem from scratch. Our brains are not fast enough, and what we focus on depends on recent experiences. In addition, requirements for the solution change constantly. The problem isn’t understanding the code, which you should be able to do, but having systems and tools in place that can properly deal with code: enumerate and check requirements, insert assertions, analyze the code and make recommendations of tradeoffs, perform refactorings on the code, generate unit tests. The list is endless. But, we are still on first base with regard to these tools in practice.
This is an excellent read! Really enjoyed it.