code-for-a-living April 7, 2022

You should be reading academic computer science papers

You read documentation and tutorials to become a better programmer, but if you really want to be cutting-edge, academic research is where it's at.

As a working programmer, you need to keep learning all the time. You check out tutorials, documentation, Stack Overflow questions, anything you can find that will help you write code and keep your skills current. But how often do you find yourself digging into academic computer science papers to improve your programming chops?

While tutorials can help you write code right now, it’s the academic papers that can help you understand where programming came from and where it’s going. Every programming feature, from the null pointer (aka the billion dollar mistake) to objects (via Smalltalk), has been built on a foundation of research that stretches back to the 1960s (and earlier). Future innovations will be built on the research of today.

We spoke to three members of the team behind Papers We Love, an online repository of their favorite computer science scholarship.

Zeeshan Lakhani, an engineering director at BlockFi, Darren Newton, an engineering team lead at Datadog, and David Ashby, a staff engineer at SageSure, all met while working at a company called Arc90. They found that none of them had formal training in computer science, but they all wanted to learn more. All three came from humanities and arts disciplines: Ashby has an English degree with a history minor, Newton went to art school twice, and Lakhani went to film school for undergrad before getting a master’s degree in music and audio engineering. All of those fields of study rely heavily on reading the texts that built the foundation of the discipline in order to understand the theory that underlies all practice.

Like any good student of the humanities, they went looking for answers in the archives. “I had a latent librarian inside,” said Newton. “So I’m always interested in the historical source material for the things that I do.”

Surveying history

As part of learning more about the history of programming, Ashby was reading Tracy Kidder’s The Soul of a New Machine, about the race to design a 32-bit minicomputer in the late 70s. It covered both the engineering culture at the time and the problems and concepts those engineers wrestled with. This was before the time of mass-market CPUs and standard motherboard components, so a lot of what we take for granted today was still being worked out.

In Kidder’s book, Lakhani, Newton, and Ashby saw a whole history of computer science that they had no connection with, so they decided to try reading a foundational paper: Tony Hoare’s “Communicating Sequential Processes” from 1978. They were working on Clojure and ClojureScript at the time, so this seemed relevant. When they sat down to discuss the paper, they realized they didn’t even know how to approach understanding it. “It was like, I can’t understand half of this formalism, but maybe the intro is pretty good,” said Lakhani. “But we need someone like David Nolen to explain this to us.”

Nolen was an acquaintance who worked for The New York Times. He gave a talk there about Clojure and other Lisp-like languages, referencing a lot of John McCarthy’s early papers. Hearing this explanation with the academic context started turning a few gears in their minds. That’s when the idea of Papers We Love was born. 

Knowing the history of the computing concepts that you use every day gives you a lot of insight into how they work at a practical level. The tools that you use, from databases to programming languages, are built on a foundation of academic research. “Understanding the roots of the things you’re working on unlocks a lot of knowledge that you’re not going to get purely just by using every day because you don’t understand the paths that they didn’t go down,” said Ashby.

There’s a talk they love that Bret Victor gave in 2013 called “The Future of Programming.” He’s dressed like an engineer from the 70s, white button-up, khakis, pocket protector. He starts giving his talk using an overhead projector that has the name of the talk. He adjusts the slide and it reveals that the date is 1973. He goes on to talk about all the great things coming out of research, all the things that are going to shake up computer science. And they’re all things that the audience is still dealing with, like the move from sequential execution to concurrent models. 

“The top theme was that it takes a long time,” said Lakhani. “There’s a lot of things that are old that are new again, over and over and over.” The same problems are still relevant, whether because the problems are harder than once thought or because the research into those problems has not been widely shared.

The trio behind Papers We Love aren’t alone in discovering a love for computing’s history. There is an increased interest in retrocomputing, with engineers looking at the systems of the past to learn more about the practice of technology. It’s the flipside of looking at older papers; you take the old hardware and software that programmers used and work on it with a present-day mindset. “A lot of people are spinning up these ancient operating systems on Raspberry Pis and working with them,” said Newton. “Like spinning up an old Smalltalk VM on a Raspberry Pi or recreating a PDP-10.”

When you see these issues in their initial contexts, like reading the research papers that tried to address them, you can get a better perspective on where you are now. That can lead to all sorts of epiphanies. “Oh, objects do the things they do because of Smalltalk back in the 80s,” said Ashby. “And that’s why big systems look like that. And that’s why Java looks like that.”

That new understanding can help you solve the problems that you face now. 

The future of programming (today)

There’s more to reading research papers than understanding history; you can find new ways to solve problems by reading current research. “The idea of Stack Overflow is: someone else has had your problem before,” said Ashby. “Academic papers are: someone else has thought about this problem before.” 

If your work involves building variations of the same old CRUD app in new spaces, then maybe research papers won’t help you. But if you are trying to solve the unique problems of your industry, then some of the research in those problem spaces may help you overcome them. “I find papers to expand the idea of what’s possible with the work you do,” said Ashby. “They can help you appreciate that there are other ways to solve these problems.”

For Newton and his colleagues at Datadog, academic papers are an integral part of their work. Their monitoring software has to process a lot of information in real time to give engineers a view of their applications and the stack they run on. “We are very concerned with performance algorithms and better ways to do statistics on large volumes of data,” said Newton. “We need to rely on academic research for some of that.”
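As one small illustration of the kind of technique that comes out of this literature, here is a sketch of Welford's online algorithm (published in 1962) for computing a running mean and variance in a single pass, without storing the data. This is not Datadog's implementation, which the article does not describe; the class name, method names, and sample values are hypothetical and chosen only for this example.

```python
import math


class RunningStats:
    """Running mean and variance via Welford's online algorithm.

    Hypothetical illustration only; all names are invented for this sketch.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the deviation from the old mean and from the updated mean.
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 with fewer than two values."""
        if self.count < 2:
            return 0.0
        return self._m2 / (self.count - 1)

    def stddev(self):
        return math.sqrt(self.variance())


# Made-up latency samples, processed one at a time as a stream would be.
stats = RunningStats()
for latency_ms in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(latency_ms)
```

Each update takes constant time and constant memory, which is what makes an approach like this attractive when the data arrives faster than it can be stored.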

Of course, the fact that research exists doesn’t mean your problems are automatically solved. Sometimes a single paper only gets you part of the solution. “I was at Comcast where we wanted to leverage load balancing work that we do in terms of routing,” said Lakhani. “We ended up applying three different kinds of papers that didn’t know each other. We put semantics into network packets, routed them based on another paper via a specific protocol, and implemented a bunch of IETF specs. Part of this work now lives in a Rust library people can run today.” It’s a matter of finding threads in academic work and braiding them together to solve the problems at hand.

Without reading those papers, Lakhani’s team wouldn’t have been able to design such an effective solution. Perhaps they would have gotten there on their own. But imagine the amount of work it would take to research those three concepts from scratch; there’s no need to redo the researchers’ work if it’s already been done. It’s standing on the shoulders of giants, as the saying goes, and if you’re on top of the research in your field, you know exactly which giants to stand on.

A map of the giants’ shoulders

Naturally, being a graduate of the humanities myself, I wanted to know which were the giants of computer science, those papers that would be on the syllabus if you were to construct a humanities-style curriculum for a class. Think of it as a map of which giant shoulders you could stand on to get ahead.

It turns out, I’m not the first to wonder what’s in the computer science canon. In 1996, Phillip Laplante wrote Great Papers in Computer Science, which might be a bit outdated at this point. For a more recent take on the same thing, the trio recommend Ideas That Created the Future, published last year. Lakhani, who is now doing a PhD in computer science at Carnegie Mellon University (my alma mater), points out that there was a course when he arrived that covered the important papers of the field. 

In a way, this canon is exactly what the Papers We Love repo aims to create. It contains papers and links to papers organized by topic. The group welcomes new pull requests with academic papers that you all love and want to see spotlighted. 

Here are a few papers (and talks) that they recommended to anyone wanting to get started reading the research:

Of course, there are many more. 

If you’re intimidated by starting on a paper, then check out some of Papers We Love’s presentations, which offer a primer on how to understand a paper. The whole idea of these talks is born out of that first frustration with a paper, then finding a path through it with someone else’s help. “They’ve gotten the CliffsNotes,” says Lakhani. “Now they can attack the paper and really understand it.”

The Papers We Love community continues to try to build a bridge between industry and academia. Everyone benefits—the industry gets access to new solutions without having to wait for someone else to implement and open-source them, and academics get to see their ideas tested and implemented in real situations. 

“One of the goals of Papers We Love is to make it where you find out about stuff a little bit faster,” said Lakhani. “Maybe that changes things.”
