code-for-a-living November 17, 2020

The complexities—and rewards—of open sourcing corporate software products

Two engineers at Salesforce talk about how they decoupled a complex library from old spaghetti logic, then open sourced that library by creating a new internal process where none existed before.

The year was 2011, and we at Toopher were building a complex authentication API. Our small startup was based on a graduate school project by our founder, Evan Grim, and the authentication API was our sole product. As a startup, we were focused on delivering value, so we leveraged existing tools rather than building everything from scratch. 

We wrote the whole product in Django, a mature, batteries-included open source Python web framework geared towards rapid application development (exactly what we were doing at the time!). We used the Piston framework to handle API creation. It was a decent framework, and there wasn’t much else to meet our needs. But Piston was eventually abandoned as a project, forcing us to come up with our own solution, which we did, and which we called django-declarative-apis.

By this time, our little startup had been acquired by Salesforce. The Salesforce Authenticator app, in fact, was built as the front end to our complex authentication API. We realized that the django-declarative-apis framework filled a gap in the Django ecosystem and wanted to open source it, but faced a couple of problems.

First, the new framework was deeply embedded in the automated two-factor authentication API that we had created, and teasing it out to a separate package that we could release publicly would be a challenge.

Second, while Salesforce had policies that supported employees working on open-source projects, at the time, it didn’t have a process for turning previously proprietary code into open source. 

We want to share the two big things we learned. First, we learned how to architect a library like this for Django and how to decouple it from the old spaghetti logic already baked into the app so that we could do a public open source release. Second, we learned how to push an open source release like this out through Salesforce, which is a dynamic place but also a big one: you can’t just take proprietary code, throw it on GitHub, and call it a day. Let’s start with the technical challenges and how we managed them.

Against spaghetti logic

The original API framework was a very procedural solution. You start at the top with a request, and then you extract the parameters and do what you need to do, forming a response all in one method. We had to bolt on new features continuously without ever breaking anything, and we ended up with a bunch of spaghetti logic handlers that we were pretty ashamed of. There was a pretty hairy 500-line method in there for a single handler.
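To make that procedural shape concrete, here is a deliberately tiny sketch of the style. It is not code from the original API; the function name, the dict-based request, and the `welcome` parameter are all invented for illustration.

```python
import json


def handle_create_user(request, sent_emails):
    """Parse, validate, branch, perform side effects, and respond in one place."""
    params = request.get("params", {})

    # HTTP-level parsing is mixed in with everything else.
    name = params.get("name")
    email = params.get("email")
    if not name or not email:
        body = json.dumps({"error": "name and email required"})
        return {"status": 400, "body": body}

    # Behavior switches on an input parameter; each new feature adds a branch.
    user = {"name": name, "email": email}
    if params.get("welcome") == "true":
        sent_emails.append(email)  # side effect in the middle of the handler
        user["welcomed"] = True

    return {"status": 201, "body": json.dumps(user)}
```

Even at this size, request parsing, validation, branching, and side effects all share one function body. Multiply that by years of bolted-on features and you get the 500-line handler.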

It was getting to be very brittle and made us sad every time we had to go in and modify it. Different sets of behaviors would run based on input parameters, and there were a lot of interdependencies. But we needed to continue to support all of that behavior, because we had customers who were dependent on every single aspect of it. 

The idea for a version two had been kicking around for a while. The big goal was to have all the API handlers be coded in a declarative way; that is, the handler behavior could be determined from the code. You could run static analysis on the code and know everything you needed to know about it: how to call the handler, what kind of response you’d get, and what it does in the middle. 
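As a rough illustration of what "declarative" means here, the sketch below declares a handler's inputs and output as class attributes and then summarizes them without ever calling the handler. The names `Param`, `CreateUserHandler`, and `describe` are hypothetical and are not the django-declarative-apis API; runtime introspection stands in for the static analysis described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """Declares one input a handler accepts (hypothetical helper)."""
    kind: type
    required: bool = True


class CreateUserHandler:
    """Everything a caller needs to know is declared as class attributes."""
    http_method = "POST"
    response_type = dict

    name = Param(str)
    email = Param(str)
    nickname = Param(str, required=False)


def describe(handler_cls):
    """Summarize a handler from its declarations alone, without calling it."""
    params = {
        attr: {"type": value.kind.__name__, "required": value.required}
        for attr, value in vars(handler_cls).items()
        if isinstance(value, Param)
    }
    return {
        "method": handler_cls.http_method,
        "params": params,
        "returns": handler_cls.response_type.__name__,
    }
```

Calling `describe(CreateUserHandler)` reports the HTTP method, each parameter with its type and whether it is required, and the response type, all read from the declarations rather than from running any handler logic.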

We also wanted to abstract away the HTTP request-response cycle from the business logic in the handler. The existing options were pretty rigid on this point; for any handler, you had to muck around with the request object and do whatever data transformations you needed to do on it within the handler method, polluting it with logic that wasn’t directly tied to the business function of that code. 

We came up with wish list interfaces. Evan thought up a more aggressively forward-thinking way to write our APIs that inspired us to immediately ditch all the other methods we had considered and go all in on it. It was unlike any way we’d seen API handlers written before. 

What made it special was that the handler code was completely unaware of the HTTP interface: headers, encoding, URL parameters, and all that stuff were handled by a class that automatically mapped request parameters to fields on the class. Based on the request, that class could extract the necessary data, gather any dependencies, and prepare the response object. It could also contain methods to transform request data into data usable by the handler method, which ensured that the handler code contained only business logic. 
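Here is a minimal sketch of that idea in plain Python, assuming a simplified request object. `FakeRequest`, `RequestField`, `EndpointBase`, and `TransferEndpoint` are hypothetical names made up for this example; the real library's classes and signatures may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Stand-in for an HTTP request; only parameters matter in this sketch."""
    params: dict
    headers: dict = field(default_factory=dict)


class RequestField:
    """Declares a request parameter and how to convert it (hypothetical)."""

    def __init__(self, convert=str, required=True):
        self.convert = convert
        self.required = required


class EndpointBase:
    """Copies each declared RequestField from the request onto the instance."""

    def __init__(self, request):
        for name, decl in vars(type(self)).items():
            if not isinstance(decl, RequestField):
                continue
            raw = request.params.get(name)
            if raw is None:
                if decl.required:
                    raise ValueError(f"missing required parameter: {name}")
                setattr(self, name, None)
            else:
                setattr(self, name, decl.convert(raw))


class TransferEndpoint(EndpointBase):
    account_id = RequestField()
    amount_cents = RequestField(convert=int)
    memo = RequestField(required=False)

    def amount_dollars(self):
        """Transform request data into the shape the business logic wants."""
        return self.amount_cents / 100

    def handle(self):
        """Business logic only: no request object, headers, or parsing."""
        return {"account": self.account_id, "charged": self.amount_dollars()}
```

By the time `handle` runs, the request has already been parsed and validated into plain attributes, so the method can be read and tested without any knowledge of HTTP.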

If the request caused any side effects, like sending an email, you could isolate those into a separate task method while keeping the handler method clean. With an additional decorator, you could have that task method automatically deferred until after the handler returned a response. By separating everything like this, we made our handlers much more maintainable. 
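The separation of side effects can be sketched as follows. The `deferred_task` decorator and `dispatch` function are hypothetical stand-ins written for this example, not the library's actual decorators, and a real deployment would presumably hand deferred work to a task queue rather than a Python list.

```python
def deferred_task(method):
    """Mark a method to run only after the response has been produced."""
    method._is_deferred_task = True
    return method


class SignupEndpoint:
    def __init__(self, email, outbox):
        self.email = email
        self.outbox = outbox  # stand-in for a mail service

    def handle(self):
        """Business logic only; no side effects here."""
        return {"status": "created", "email": self.email}

    @deferred_task
    def send_welcome_email(self):
        """Side effect, kept out of the handler."""
        self.outbox.append(f"Welcome, {self.email}!")


def dispatch(endpoint):
    """Run the handler, then return the response plus the deferred work."""
    response = endpoint.handle()
    pending = [
        getattr(endpoint, name)
        for name, member in vars(type(endpoint)).items()
        if getattr(member, "_is_deferred_task", False)
    ]
    return response, pending


outbox = []
response, pending = dispatch(SignupEndpoint("ada@example.com", outbox))
# At this point the response exists but no email has been sent yet.
for run_task in pending:
    run_task()
```

The handler stays a pure function of its inputs, and the decision about when to run the side effect lives in the dispatcher instead of in every handler.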

Once we had the framework implemented, we needed to prove that it was feature complete. To do so, we reimplemented that super hairy 500-line handler in it: if we could get that beast to work in this framework, we could be pretty confident that it had all the capabilities we needed to port over the rest of our API. We completed half the API (and that unwieldy beast) in a quick burst; the complete port took about a year, all told. 

While the initial project was created as part of the authenticator API and app—you can see it in the Apple and Google app stores as the Salesforce Authenticator—the framework was so useful, we decided to pull it apart and make it a separate library. 

The first step was to just copy and paste it into a clean module within the main project, its own top-level directory. Messy, but moving this code to its own module exposed the coupling and showed us what kinds of knots we needed to untangle. We refactored all the broken dependencies in this new module, then went back to the original API and refactored it to work with the new fixes. At this point, we had identical modules in both projects, so we could remove the framework code from the original API. The last step was to make the new framework installable for projects within Salesforce—a true library, not just a sub-module. 

Share what you’re proud of

We always had ambitions to open source the framework that we had created. When you create a novel solution that you’re proud of, letting others use and contribute to it can only help the software community as a whole. Plus, by going open source, we’d have access to some best-in-class tooling, like Travis CI, and have an easier time incorporating this framework into projects here at Salesforce. 

But it remained a low-priority item on our to-do list. At least, it did until Demian Brecht joined the team. He’s our open source advocate who believes not only in participating in the open source community as an individual, but also in the tangible and intangible benefits for the companies giving back to the open source community. He’s active on a lot of open-source projects, both within Salesforce and outside of it. At this point, django-declarative-apis was already a sub-module that was being used heavily across a lot of projects at Salesforce. It was natural that we’d open source it. 

A common theme when we talked to people on the team, though, was that they liked open source software but had never contributed to it and were a little shy about doing so. We wanted to give them that exposure and help them start contributing in a meaningful way to an open source project. They’d be developing this library publicly, right alongside our customers, casual hobbyists, and anyone else who wanted to use and shape it. 

There’s a huge benefit to your team developing code out in the open like this; because anybody can see your code, add to the project, and comment on your pull requests, there’s a bit more onus on you to consider what it is that you’re changing. You get better commit messages about more thoughtful changes, which makes your development process self-documenting. 

The engineering team was on board with taking the framework open source, and next we needed to work with the legal department to make it happen. This was a brand new situation for them. They had a matrix set up for starting a new project as open source or working on existing open-source projects, but nothing about taking a framework pulled from a security-related product and publicly releasing the code. 

Because we were breaking new ground, we had to make sure our i’s were dotted and t’s were crossed everywhere. We needed to scrub through all of the code to make sure there wasn’t something patentable that we would be giving away. We had our product security folks make sure that we weren’t inadvertently revealing some easy way to hack us. It was the first project of its kind, so it took about six months, but because we went through it and worked out the kinks, open sourcing existing projects at Salesforce has now become much more streamlined (and commonplace). These days, we have a formal process for open sourcing that is sometimes instantaneous, but rarely takes more than 4-6 weeks! As such, we’re now open sourcing dozens of projects across the company every year.

There are practical benefits to open sourcing it, independent of anything community-related or external. The intrinsic benefit revolves around our deployment pipeline. We deploy our API on Heroku, which makes it easy to install a dependency that is open source and available on the public repositories. While django-declarative-apis was a closed library internal to Salesforce, we had to jump through a lot of hoops to add it to projects. One such hoop is vendoring a dependency: mirroring the entire source tree of a library into the dependent project instead of simply specifying the dependency and version in a configuration file. When the library went open source, we no longer had that hoop, or any other related to consuming closed source dependencies, to jump through. 

Besides making our workflows easier, making the project open source also gave us access to some best-of-breed tools, like Travis CI. These tools have huge communities around them because they’re open source themselves—being able to participate and contribute to these communities is fantastic. If going open source means we’re able to get some community around django-declarative-apis, that would be cool. But it’s worth it without any of that.

We’re proud of this library and are excited to be able to share it with the community. When you implement a new endpoint with a decent level of complexity in django-declarative-apis, you really appreciate how much it protects you from writing bad code. We accepted complexity as our tradeoff: we’re willing to accept more complexity within the framework if it allows us to express more simplicity in the handlers that we’re implementing every day.

In the end, both parts were a learning experience. We’d created a lot of technical debt, and starting over turned out to be a better fix than trying to do piecemeal repairs. We also found that our peers at Salesforce were glad to explain why the legal rules were there, which helped a lot. The open sourcing process took longer than we first expected, but not that long, and in the end we were able to give something important back to the commons, something that we’re proud of and that Salesforce can get behind. We’d like to invite you to use and contribute to the django-declarative-apis project; we feel it’s a great framework and can only get better with your input. It would be amazing to have someone from outside our little group contribute to it. But more than that, we’d like to encourage you to consider open sourcing the libraries you work on every day, whether those are at your job or in personal projects. Open sourcing something is good for your team and the projects that you’re producing, and it gives your teammates a way to contribute in public to the larger programming community.
