The robots are coming for (the boring parts of) your job
Are the robots coming for your job? You’ve heard this question before (we’ve even asked it before). But in 2022, with AI increasingly ubiquitous in the lives of most coders, the issue feels more pressing.
Given the explosive progress AI has made over the last few years, it might seem like only a question of time (or data) until its mastery over complex, nuanced problems clearly outstrips our own. From Go to poker to StarCraft II, AI has bested humans in plenty of arenas where we were once uncontested champions. Is the same true of coding?
Programs like GitHub Copilot have already won widespread adoption, and organization-wide investment in AI has exploded since 2020, expanding developers’ access to and understanding of intelligent automation tools. In this environment, will code written by AI replace code written by humans?
New numbers indicate it already is happening. Since the program’s launch in June 2021, more than 35% of newly written Java and Python code on GitHub has been suggested by its Copilot AI. To put this in perspective, GitHub is the largest source code host on the planet, with over 73 million developers and more than 200 million repositories (including some 28 million public repositories).
Coding a tool or service, of course, is fundamentally different from playing a game. Games unfold in accordance with fixed rulesets, while codebases are dynamic: they must evolve as new technologies emerge and adapt to meet new business needs. And it’s not as if Copilot has led to a 35% drop in demand for human programmers: demand for software developers remains high after doubling in 2021.
Still, if AI is writing more than a third of the fresh code for some of the most popular languages on the world’s largest development platform, the AI coding revolution isn’t imminent; it’s already here. In this piece, we’ll explore what AI programs are out there and how developers are using them. We’ll look at their current limitations and future potential. And we’ll try to unpack the impact of these programs on developers and the software industry as a whole.
What do AI coding tools look like?
Based on functionality, there are three species of AI coding tools currently on the market:
- Tools that automatically identify bugs, like CodeGuru or GitGuardian
- Tools that produce basic code by themselves or can autocomplete code for programmers, like Copilot or AlphaCode
- Tools that can extract the “meaning” of a piece of code, known as machine inferred code similarity (MISIM) tools, like the new machine programming (MP) system developed by Intel, MIT, and Georgia Tech
Bug-hunting tools and AI pair programmers like Copilot are steadily becoming more popular and more powerful, while emergent technologies like MISIM still have a way to go before they become a seamless part of most developers’ working lives. Let’s break these tools down.
Tools that automatically identify bugs
Tools that automatically identify bugs represent one of the most successful applications of AI to programming. These programs not only enhance code safety and quality; they allow developers to focus more time and energy on writing business logic that improves the end product, rather than scanning their code for possible errors and vulnerabilities. Amazon CodeGuru, for example, helps AWS BugBust participants “find [their] most expensive lines of code”—the bugs that drain resources and allow tech debt to flourish.
DeepCode, acquired by Snyk in 2020, is an AI-based code review tool that analyzes and improves code in Python, JavaScript, and Java. Guided by 250,000 rules, DeepCode reads your private and public GitHub repositories and tells you precisely what to do to fix problems, maintain compatibility, and improve performance. Cofounder Boris Paskalev calls DeepCode a Grammarly for programmers: “We have a unique platform that understands software code the same way Grammarly understands written language,” he told TechCrunch.
Other programs focus on scanning code for potential security risks. GitGuardian scans source code to detect sensitive data like passwords, encryption keys, and API keys in real time. Software failures rooted in relatively simple mistakes like these are estimated to cost over $2 trillion annually in the US alone.
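To make the idea concrete, here is a minimal sketch of how a naive secret scanner might flag hard-coded credentials with regular expressions. It is an illustration only, not how GitGuardian actually works (real detectors combine hundreds of patterns with entropy and context checks), and the patterns and sample strings are invented for the example.

import re

# Illustrative patterns only; production scanners use hundreds of detectors
# plus entropy and context analysis rather than a handful of regexes.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic API key": re.compile(r"(?i)api[_-]?key\s*=\s*['\"][A-Za-z0-9_\-]{16,}['\"]"),
    "hard-coded password": re.compile(r"(?i)password\s*=\s*['\"][^'\"]+['\"]"),
}

def scan_source(text):
    """Return (line_number, label) pairs for lines that look like embedded secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings

sample = 'db_password = "hunter2"\napi_key = "abcd1234efgh5678ijkl"\n'
for lineno, label in scan_source(sample):
    print(f"line {lineno}: possible {label}")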
Tools that produce basic code by themselves or can autocomplete code for programmers
Automatic code generators and AI pair programmers fall into another category: tools that can produce code independently or autocomplete a human programmer’s code. For example, Facebook’s Aroma is an AI-powered code-to-code search and recommendation tool that saves developers time by making it easier to draw insights from huge codebases.
Meanwhile, a new open-source AI code generator called PolyCoder was trained not only on code files, but also on questions and answers from Stack Overflow. The creators describe this corpus as a rich source of natural language information that reveals how real people use, troubleshoot, and optimize software.
AlphaCode
At the cutting edge of more research-oriented projects is DeepMind’s AlphaCode, which uses transformer-based language models to generate code. AlphaCode performs about as well as the average human competitor in coding competitions, ranking among the top 54% of participants “by solving new problems that require a combination of critical thinking, logic, algorithms, coding, and natural language understanding,” according to the company. DeepMind principal research scientist Oriol Vinyals told The Verge that AlphaCode is the latest product of the company’s goal to create a flexible, autonomous AI capable of solving coding problems that only humans are currently able to address.
AlphaCode has achieved impressive results, but there’s no need to start watching your back just yet: “AlphaCode’s current skill set is only currently applicable within the domain of competitive programming,” reports The Verge, although “its abilities open the door to creating future tools that make programming more accessible and one day fully automated.”
GPT-3
OpenAI’s GPT-3 is one of the largest language models yet created. With 175 billion parameters, it can generate astonishingly human-like text on demand, from words to guitar tabs to computer code. The API is designed to be straightforward enough for almost anyone to use, but also flexible and powerful enough to increase productivity for AI/ML teams. More than 300 applications were using GPT-3 only nine months after its launch, with the program generating 4.5 billion words every day, per OpenAI.
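As a rough illustration of that simplicity, here is what a text-completion request looked like with OpenAI’s Python client around the time of writing. The model name, prompt, and parameters are placeholders, the key is fake, and the client library’s interface has changed since, so treat this as a sketch rather than current reference code.

import openai  # the pre-2023 OpenAI Python client

openai.api_key = "YOUR_API_KEY"  # placeholder; never hard-code real keys

# Ask the model to complete a prompt; engine name and parameters are illustrative.
response = openai.Completion.create(
    engine="text-davinci-002",
    prompt="Write a one-line summary of what a code linter does:",
    max_tokens=60,
    temperature=0.7,
)

print(response.choices[0].text.strip())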
In 2020, OpenAI and its end users noticed that GPT-3 could autocomplete code as well as sentences. GPT-3 had been trained on billions of documents scraped from the web, including pages where programmers had posted their code, so it had learned patterns not just in English but also in Python, Java, C++, R, HTML, and so on. That realization sparked OpenAI’s interest in creating a code-writing AI: Copilot, built with GitHub on top of Codex, a descendant of GPT-3 fine-tuned on code, and first released in the summer of 2021.
Copilot
Ask most developers for the gold standard in AI pair programming, and they’ll mention Copilot. Trained on public code, Copilot makes suggestions for lines of code or entire functions directly in the editor. Users can explore alternative suggestions, accept or reject Copilot’s input, and edit suggested code manually when required. Importantly, Copilot adapts to users’ edits to match their coding style, increasing the value and relevance of the program’s suggestions over time. Since the program’s launch in June 2021, more than 35% of newly written Java and Python code on GitHub has been suggested by Copilot.
Copilot, writes Clive Thompson in Wired, offers “a first peek at a world where AI predicts increasingly complex forms of thinking.” Despite errors “ranging from boneheaded to distressingly subtle,” Copilot has earned the wide-eyed approval of plenty of developers. “GitHub Copilot works shockingly well,” says Lars Gyrup Brink Nielsen, an open-source software developer and GitHub Star. “I will never develop software without it again.”
Mike Krieger, cofounder and former CTO of Instagram, calls Copilot “the single most mind-blowing application of ML I’ve ever seen,” comparing the program to “a team member who fits right in from the first time you hit Tab.”
Copilot is also an invaluable resource for people who want to expand and deepen their coding knowledge (and who doesn’t, really?). “I’m learning TypeScript by hacking through another extension,” says GitHub Star Chrissy LeMaire. “When my previous development experience fails me, I now use GitHub Copilot to learn how to do what I need!” Thompson, the Wired journalist, experimented with asking Copilot to write a program to scan PDFs, starting with a plain-text comment:
# write a function that opens a pdf document and returns the text
In response, Copilot wrote:
def pdf_to_text(filename):
    pdf = PyPDF2.PdfFileReader(open(filename, "rb"))
    text = ""
    for i in range(pdf.getNumPages()):
        text += pdf.getPage(i).extractText()
    return text
This code not only fulfilled the request exactly; it made use of an open-source Python package, PyPDF2, that Thompson had never even heard of: “When I Googled it, I learned that PyPDF was, indeed, designed specifically to read PDF files. It was a strange feeling. I, the human, was learning new techniques from the AI.”
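If you want to try the snippet yourself, a minimal usage sketch follows. It assumes the legacy PyPDF2 1.x API that Copilot suggested (newer releases of the library renamed these methods), and the file name is a placeholder.

# Assumes: pip install "PyPDF2<2" and a local sample.pdf (placeholder name)
import PyPDF2  # required by pdf_to_text; Copilot's suggestion omitted the import

text = pdf_to_text("sample.pdf")  # the function generated above
print(text[:500])  # show the first 500 extracted characters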
Copilot’s reception hasn’t been universally glowing. Some developers have raised concerns that Copilot could “effectively launder open-source code into commercial uses without proper licensing,” violate copyrights, and regurgitate developers’ personal details, according to Fast Company. But more developers see Copilot as “the next step in an evolution that started with abstracting assembly languages.” Says Kelsey Hightower: “Developers should be as afraid of GitHub Copilot as mathematicians are of calculators.”
Tools that can extract the “meaning” of a piece of code
OK, so AI can write code, spitting out patterns or producing tools and solutions it’s seen before. But it doesn’t really know what that code means, right?
Well, a consortium of researchers from Intel, MIT, and Georgia Tech has developed a new machine programming system called machine inferred code similarity (MISIM). Much as natural language processing (NLP) can recognize the meaning of text or spoken words, MISIM can learn what a piece of software is supposed to do by examining code structure and the syntactic differences between that software and other code that behaves similarly.
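A simple way to picture “semantically similar but syntactically different” code: the two Python functions below compute the same result in different ways, and a MISIM-style system aims to recognize that they share the same intent. This is an illustration of the concept, not output from Intel’s system.

def sum_of_squares_loop(values):
    """Accumulate the sum of squares with an explicit loop."""
    total = 0
    for v in values:
        total += v * v
    return total

def sum_of_squares_expression(values):
    """Compute the same quantity with a generator expression."""
    return sum(v ** 2 for v in values)

# Structurally different code, identical behavior: a code-similarity system
# should map both functions to the same "meaning."
assert sum_of_squares_loop([1, 2, 3]) == sum_of_squares_expression([1, 2, 3]) == 14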
Language-independent MISIM has revolutionary potential: it can read code as it’s written and automatically generate modules to check off common, time-consuming tasks. The code that automates cloud backups, for instance, is often the same across programs, as is the code used in compliance processes. Conceivably, MISIM could shoulder responsibility for processes like these, leaving developers free to focus on other work.
Intel’s goal is to build MISIM into a code recommendation engine to help developers working across Intel’s various architectures: “This type of system would be able to recognize the intent behind a simple algorithm input by a developer and offer candidate codes that are semantically similar but with improved performance,” said Intel in a press release.
How AI coding tools can make developers’ lives easier
From improving code quality to tuning out distractions, programs like AlphaCode and Copilot make developers more productive, happier in their work, and more available for higher-order tasks.
Keep developers in the flow and focused on higher-order work
Developers are keenly aware that context-switching and distractions like chat notifications and email pings are highly disruptive to their workflows. As much as 20% of developers’ time is spent on web searches, for example.
One of the primary benefits of AI coding tools is that they can keep developers focused, issuing suggestions and recommendations without jerking people out of their flow states. AI tools that minimize distraction help developers carve out uninterrupted time, making them more productive but also happier and less stressed by their jobs. An internal GitHub investigation found that developers stood an 82% chance of having a good day when interruptions were minimal or nonexistent, but only a 7% chance of having a good day when they were interrupted frequently. In helping developers carve out more uninterrupted time, AI tools also increase coders’ availability for complex, creative problem-solving.
These AI programs don’t replace humans; they increase our productivity and allow us to devote more resources to the kind of work AI is less able to tackle. Which brings us to our next question: What are the limitations of these AI tools?
What are the limitations of these tools?
As we’ve previously explored on our blog, AI coding tools still have plenty of limitations. Broadly speaking, their ability to create new solutions is limited, as is their capacity for understanding the complexities of modern coding—at least for now.
They produce false positives and security vulnerabilities
As many developers are already painfully aware, AI programs designed to catch bugs in code written by humans tend to produce a huge volume of false positives: that is, things the AI identifies as bugs when they’re not. You might argue that, from the perspective of information security, it’s better to produce a ton of false positives than a few potentially devastating false negatives. But a high number of false positives can negate the AI’s value by obscuring the signal in the noise. Plus, security teams become “overwhelmed and desensitized” in the face of too many false positives.
Consider npm audit, a built-in security feature of the Node package manager (npm) intended to scan projects for security vulnerabilities and produce reports detailing anomalies, potential remediations, and other insights. That sounds great—but a “deluge” of security alerts that overwhelms developers with distractions has made npm audit a classic example of what’s been called “infosec theater,” with some npm users saying 99% of the possible vulnerabilities flagged are “false alarms in common usage scenarios.” The prevalence of false positives underscores the fact that AI still struggles to grasp the complexity of contemporary software.
In addition to a high volume of false positives, AI programs can also produce security vulnerabilities. According to Wired, an NYU team assessing how Copilot performed in writing code for high-security scenarios found that 40% of the time, Copilot wrote software prone to security vulnerabilities, especially SQL injections: malicious code inserted by attackers.
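To see what an injection-prone suggestion looks like in practice, here is a simplified contrast between string-built SQL, the kind of pattern the NYU study flagged, and a parameterized query. The table, columns, and data are invented for the example; this is not code taken from Copilot itself.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

def find_user_unsafe(name):
    # Vulnerable: the input is spliced into the SQL string, so an attacker-supplied
    # value like "' OR '1'='1" changes the meaning of the query.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # Safer: a parameterized query keeps the input as data, not executable SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_unsafe("' OR '1'='1"))  # returns every row: the injection succeeds
print(find_user_safe("' OR '1'='1"))    # returns nothing: the input is treated as a literal name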
They still require human input and direction
As things stand, tools like Aroma and GPT-3 can produce straightforward pieces of code—but only when directed by humans. As Technology Review puts it, “GPT-3’s human-like output and striking versatility are the results of excellent engineering, not genuine smarts.”
Given a tightly controlled problem, these programs can produce impressive solutions, but they’re not yet at the point where, like a skilled human developer, they can examine a design brief and work out the best approach from there. Even Copilot is still “more a hint of the future than the future itself,” writes Thompson in Wired.
Aesthetics is another arena where AI tools still fall short of human capabilities: in the AI/ML lifecycle, front-end polish is often neglected in favor of back-end functionality.
They absorb and spread harmful biases
AI programs are tools made by humans, prone to the same constraints and flaws as humans ourselves. When the single word “women” was used to prompt GPT-3 to write a tweet, the program generated gems like, “The best female startup founders are named…Girl.” (Nice.) “GPT-3 is still prone to spewing hateful sexist and racist language,” sighed Technology Review. DALL-E, which lets users generate images by entering a text description, has raised similar concerns. And who could forget Microsoft’s ill-starred AI chatbot Tay, turned into a racist, misogynistic caricature almost literally overnight on a rich diet of 2016 Twitter content?
These revealing episodes underscore the importance of prioritizing responsible AI: not to keep the robots from taking our jobs, but to keep them from making the world less inclusive, less equitable, and less safe. As the metaverse takes shape, there are growing calls to develop AI with a greater degree of ethical oversight, since AI-powered language technology can reinforce and perpetuate bias.
But for plenty of companies, responsible AI isn’t a priority. A recent SAS study of 277 data scientists and managers found that “43% do not conduct specific reviews of their analytical processes with respect to bias and discrimination,” while “only 26% of respondents indicated that unfair bias is used as a measure of model success in their organization” (Forbes). By these numbers, the industry has yet to reckon with Uncle Ben’s evergreen advice: “With great power comes great responsibility.”
A matter of trust
A common thread runs through all the limitations we’ve mentioned: developers’ trust, or lack thereof, in a tool. Research (and more research) shows that trust impacts the adoption of software engineering tools. In short, developers are more likely to use tools whose technology and results they trust, and intelligent automation tools are still earning that trust.
David Widder, a doctoral student at Carnegie Mellon studying developer experiences, conducted a 10-week case study of NASA engineers collaborating with an autonomous tool to write control software for high-stakes missions (“Trust in Collaborative Automation in High Stakes Software Engineering Work: A Case Study at NASA,” 2021). The study was designed to examine which factors influence software engineers to trust—or not trust—autonomous tools.
The bottom line, says Widder, is that “developers may embrace tools that automate part of their job, to ensure that high-stakes code is written correctly, but only if they can learn to trust the tool, and this trust is hard-won. We found that many factors complicated trust in the autocoding tool, and that may also complicate a tool’s ability to automate a developer’s job.”
The study found that engineers’ level of trust in autonomous tools was determined by four main factors:
- Transparency of the tool: A developer’s ability to understand how the tool works and confirm it works correctly.
- Usability of the tool: How easy developers find the tool to use.
- The social context of the tool: How people are using the tool and checking it for accurate performance, including the trustworthiness of the person or people who built the tool, the people and organizations that have endorsed the tool, and whether the tool has “betrayed” users by introducing errors.
- The organization’s associated processes: To what degree the company or organization is invested in the tool, has thoroughly tested it, and has proven its effectiveness in real-world contexts.
The study results suggest that training and documentation in how to use a tool are not enough to build engineers’ trust: engineers also expect to understand the why, including not just the rationale for what they are told to do, but also why certain design decisions were made. This suggests, according to the study, that “not only should automated systems provide explanations for their behavior to incur trust, but that their human creators must too.”
Collaboration, not competition
Instead of checking over our shoulders for a robot army, we should focus on identifying which tasks are best performed by AI and which by humans. A collaborative approach to coding that draws on the strengths of humans and AI programs allows companies to automate and streamline developers’ workflows while giving developers the chance to learn from the AI. Organizations can realize this approach by using AI to:
- Train human developers: AI coding tools can help teach human developers in an efficient, targeted way—like using Copilot to learn additional languages.
- Track human developers’ work and make recommendations to improve efficiency and code quality: Imagine if every human coder had an AI pair programmer that would learn how they worked, anticipate their next line of code, and make recommendations based on prior solutions. Those coders would get a lot more done, a lot more quickly—and learn more while doing it.
- Rewrite legacy systems: Systems like MISIM may not be able to fully automate coding, but they can be of enormous assistance in rewriting legacy systems. These programs are platform-independent, so they have the potential to teach themselves elderly or obscure programming languages like COBOL, on which the US government—not to mention plenty of finance and insurance companies—still relies. MISIM-type programs can rewrite the COBOL programs in a modern language like Python so that fewer devs need to brush up on their COBOL skills to keep these services up and running.
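As a purely hypothetical illustration of the kind of translation such a tool might produce, here is a small COBOL-style interest calculation rendered as Python, with the original logic sketched in comments. The field names and the program itself are invented for the example; this is not output from MISIM.

# Hypothetical COBOL being translated (a sketch, not a real legacy program):
#   COMPUTE WS-INTEREST = WS-PRINCIPAL * WS-RATE / 100.
#   ADD WS-INTEREST TO WS-PRINCIPAL.
#   DISPLAY "NEW BALANCE: " WS-PRINCIPAL.

from decimal import Decimal

def apply_annual_interest(principal, rate_percent):
    """Python equivalent of the COBOL paragraph above: add one year of simple interest."""
    # Decimal mirrors COBOL's fixed-point arithmetic more faithfully than float.
    interest = principal * rate_percent / Decimal(100)
    return principal + interest

new_balance = apply_annual_interest(Decimal("1000.00"), Decimal("2.5"))
print(f"NEW BALANCE: {new_balance}")  # e.g. NEW BALANCE: 1025.00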
As with most workplace relationships, collaboration, not competition, is the way to approach our relationship with AI. The robots aren’t coming for your job—at least not yet—but they are well on their way to making your job easier, your work life happier, and your code better.
Edited by Ben Popper.
Tags: ai, ai coding, copilot, responsible ai
6 Comments
I’m all for robots taking over the boring aspects of my job. I mean, building forms & basic reports then testing them 1000 times is not my idea of a good time. Automated testing is another thing I’d like to see, well, automated. When those things no longer have to be done by me, I’ll celebrate.
Taking over the boring aspects of your job means eventually taking over your job, or at least someone’s job.
Suppose 10% of your work can be automated. That means that, where 10 people were needed, now only 9 people are needed. This wasn’t a problem when we had a productivity deficit. When the steam shovel could do the work of 100 men, we had plenty of other work for those 99 extra men (one operated the steam shovel). But we have already reached the point where everything that the market can clear, all the goods and services that can be bought and sold, can be produced without engaging the complete workforce. That’s one big reason why we have the proliferation of gig jobs.
Jetbrains IntelliJ has been doing a lot of this kind of thing over the last few years. As with most “AI” applications (I hate the phrase, actually) it’s great when it works and really bad news when it doesn’t – for example I had a subtle production bug caused by IntelliJ rewriting code into something it thought was equivalent, but actually handled negative zero differently.
I find this remark on the Copilot site rather worrying “We also believe that GitHub Copilot has the potential to lower barriers to entry, enabling more people to explore software development and join the next generation of developers.” My experience from StackOverflow the last few years is that the barriers to entry are already far too low – there are too many people writing code without the training or aptitude to do the job properly. The Copilot site also shows automatic generation of test cases – tests which, of course, are guaranteed to succeed because they contain the same bugs as the code they are testing. So these new people “exploring software development” are going to think their code is tested when it is anything but.
A heck of a lot of code writing is boilerplate depending on the language syntax, etc. This is mind numbingly boring. It seems CoPilot is more auto-complete with intelligence and quite a bit of “best practices” thrown in. That’s a good thing. Newer opinionated languages like Rust try to help avoid common pitfalls like buffer overrun security bugs, or other common mistakes that would crash software. The Rust compiler is much pickier and forces the developer to comply. Is it perfect? Absolutely not. But plain old C/C++ will gladly let you shoot yourself in the foot, both knees and the head and will compile whatever you tell it to compile despite it containing many logic flaws. Just because it compiles doesn’t make it “correct”. But in the case of Rust it makes it a bit “more correct”. But still fallible. A.I. can certainly make mistakes because it was built by humans. What it suggests might not always be correct for every scenario but it can help you avoid the mundane and avoid silly mistakes. But you still need to make informed decisions.
I once had a senior Unix architect hand me 5 pages of C++ code and ask if I could identify the problem with it. Hmm, let’s see, I just completed some courses and recently finished one on Object-Oriented programming, sure let me take a look-see. Constructor, constructor, constructor…. Hey where’s the destructor? This code is never destroying the objects it allocates? BINGO! This application would run for about 23 days before it crashed the Solaris box when it ran out of memory. The architect was banging his head against the wall for months about this constantly crashing application and he was at wits’ end when he finally found the flaw. Of course he handed me the precise bit of code printed across five pages. He had to pore through more than 150,000 lines of code to find the flaw so he did make it easy for me. I really only had to examine a few classes. So a little A.I. on boilerplate code doesn’t sound so bad to me. Or if your code could be audited by machine pointing out things like an obvious memory leak on some of your classes. It would have saved a metric ton of time and effort.
How many developers have been stealing snippets of code from Stack Exchange over the years? Quite a bit more than we would suspect. There is a severe shortage of quality programmers but plenty of inexperienced programmers without the deep comp sci knowledge.
What makes a good developer? Well someone with years of experience with multiple languages who can choose the correct tool for the job. Someone who can tune a code base for efficiency and performance obtained through in depth collection of accurate metrics. Someone who can identify the bottlenecks and knows enough about the hardware and the operating system to optimize the code to increase performance. I think anyone working with code or IT departments in general has experience with nightmare systems that are so fragile you dread having to make a change where you may break something else in the spaghetti of endless dependencies and prerequisites. So a bit of A.I. that reduces the amount of boilerplate coding as well as offering best practices is not a bad thing. It’s merely auto-complete on steroids. Yes, this may improve efficiency and reduce the number of programmers but wasn’t that the issue with the Mythical Man Month? You reach a point where throwing more humans at the problem doesn’t pay off. Developers will need to become more engineer than code monkey in this future where you are piecing together machine generated code along with your own code. Will it one day be possible for programs to write programs? Absolutely! Someday that will be the reality. But there will always be a need for programmers, though their skills might need to expand and the work they perform will be different.
Look at what early developers had to do with punch cards, then teletype and terminals. Now you can flip open a laptop anywhere in the world and do the job. The industry constantly evolves and changes. The education never ever stops; you must continue learning and adapting to the new way of things until you retire and can sip margaritas in some tropical location till your end of days. If you ever stop learning and become complacent you will soon be replaced. This happens much faster in the computer industry than in any other field. It’s happened many times in the past, and it is quite rare to be brought back in due to something like the Y2K bug, where old-school knowledge was in high demand. How many CORBA developers do you know? Fads come and go. Things have been constantly evolving and endless flame wars have raged. Stay frosty and continue to learn, never ever stop.
Soon robots will start doing code if we feed requirement documents to them 🙂 They will never be limited to doing ‘Boring Part of Job’ only!