Scaling your VPN overnight
Starting on Monday of this week, Stack Overflow as a company went 100% remote, meaning all of our employees are now working from home. We put together tips from our staff about how to make remote work efficient, enjoyable, and sustainable.
One of the critical tools for enabling robust remote work in the software industry is a good virtual private network—VPN for short. Normally, when you work from outside of the office, data sent from your computer travels over the same public network as data from a consumer service, like a video streaming platform, online game, or shopping site. A VPN creates what’s known as a tunnel: an encrypted link between your device and your work network, allowing your data to move in a secure manner, as if you were on a wired connection at your office.
With the current health crisis created by COVID-19, many countries and companies are asking people to work from home. This has created a sudden and massive surge in the number of employees working remotely each day. Many companies' VPN infrastructure was not built to handle the entire organization working remotely, and the need to scale quickly can prove challenging. So we sat down with some of the technical experts at Stack Overflow who built our VPN to get their advice on pitfalls to avoid and best practices to follow.
“Over the years, we’ve used numerous vendors for our VPN system,” says Brian Artschwager, the internal support engineer who leads up work on Stack Overflow’s network. “One thing we ran into a lot was issues with dependability. Developers would be uploading a large file, the connection would drop, and they would have to start all over again.”
Why we chose open source
In 2019, Stack Overflow switched to OpenVPN, an open source system written in C that was originally authored by James Yonan and released in 2001. “A lot of people, especially outside IT, still sometimes hesitate when it comes to open source, because there is a notion that it might somehow be less secure,” says Artschwager. “But when a project has a deep history and a large group of people actively contributing to it, the reality is that it’s likely to be the most robust and up to date software available.”
There are a number of features that make OpenVPN the obvious choice for us. Since switching, dropped connections have become very rare. The service works well across a wide variety of operating systems and device types. Critically for Stack Overflow, it’s SOC2 compliant, offers two-factor authentication, and as a result is approved for use when dealing with work for our enterprise level clients.
Don’t mix business and pleasure
If your company is about to set up its first VPN or needs to dramatically scale up the number of users working through a VPN, Artschwager recommends going with a “split tunnel” approach. When using a VPN, data from a user appears to be coming from a specific, pre-set IP address. That is why VPNs are sometimes used to avoid geographic restrictions on internet traffic. A user with a VPN can communicate with a server and make it appear as if they are based in whatever region or country the VPN server exists within.
“We have developers in other countries and other states and they are connecting directly into our data center. The software on the user’s computer gets an internal IP address that’s in the same network or subnet as if it were a client on that physical network,” says Artschwager. “The tunnel that you’re generating is actually encapsulated and encrypted. Your traffic looks like one big encrypted stream of data, but on the other end of the connection it’s just like you’re directly connected.”
With a split tunnel approach, only sensitive, work-related data is sent through the secure VPN tunnel to your work network. If you’re at home watching cat videos on YouTube, that data will travel over your ordinary network. This can significantly reduce the load being put on your work’s VPN servers and systems, ensuring everything stays up and running with minimal latency. “I know that we have developers that have gigabit internet connections. We don’t want all of their traffic going to the data centers because it’s not relevant for us and it just takes up bandwidth on our internet circuits. We have hundreds of employees using five VPN servers in different locations and see barely any traffic because the only things that go over the connection are traffic that is destined for our internal systems.”
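As a rough illustration of the split tunnel decision described above, here is a minimal Python sketch. The subnets are hypothetical placeholders, not Stack Overflow's actual internal ranges, and the function name is made up for this example. A real VPN client makes this choice through routes installed in the operating system's routing table, not in application code.

```python
import ipaddress

# Hypothetical internal ranges, for illustration only.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.50.0/24"),
]


def uses_vpn_tunnel(destination: str) -> bool:
    """Return True if traffic to `destination` should go through the VPN."""
    address = ipaddress.ip_address(destination)
    # Only destinations inside an internal network are sent through the tunnel.
    return any(address in network for network in INTERNAL_NETWORKS)


if __name__ == "__main__":
    for dest in ["10.1.2.3", "142.250.72.14"]:
        route = "VPN tunnel" if uses_vpn_tunnel(dest) else "regular internet connection"
        print(f"{dest} -> {route}")
```

In OpenVPN, this behavior is usually set up by pushing routes for the internal subnets to clients instead of redirecting each client's default gateway through the VPN.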
Build extra capacity into your system
Over the years, Artschwager and his colleagues found that whether it’s a hardware constraint or a licensing constraint, there’s always a limit on how many people can connect to the same VPN. OpenVPN offers us thousands of connections and gigabytes of traffic.
We did have to buy “concurrently connected device” licenses for OpenVPN, but luckily we bought twice as many as we had users. That means folks working remotely now have options if they need to be on a laptop, tablet, and phone. “I’m reading reddit’s /r/sysadmin/ subreddit and seeing the conversations in my peer group. People are talking about how they aren’t sure how they are going to take a thousand people completely remote. Their VPN was only meant to support a few hundred people at once because it was built for remote sales people, for example, or their people at conferences. So far we haven’t had that problem because we spec’d it for twice as many people as we had at the time.”
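The license math is simple enough to check before you buy. The sketch below is a hedged example: the headcount is a hypothetical number, not Stack Overflow's, and it assumes each concurrently connected device consumes exactly one license.

```python
def license_headroom(licenses: int, users: int, devices_per_user: int) -> int:
    """Licenses left over if every user connects `devices_per_user` devices at once.

    A negative result means some connections would be refused.
    """
    return licenses - users * devices_per_user


if __name__ == "__main__":
    users = 300           # hypothetical headcount
    licenses = 2 * users  # "twice as many as we had users"
    for devices in (1, 2, 3):
        spare = license_headroom(licenses, users, devices)
        print(f"{devices} device(s) per user: {spare} licenses to spare")
```

Under these assumptions, buying twice as many licenses as users lets everyone connect two devices at the same time. A third device only works while other people leave some of their licenses unused, so it is worth estimating how many devices your users really keep connected.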
Avoid routing through local offices
The majority of our VPN endpoints are actually in the data center, not regional offices, which frees up bandwidth for our employees. “Our data centers have access to really high bandwidth connections. By comparison, small regional offices or folks who work out of co-working spaces may be sharing internet access on a low throughput connection with dozens of employees or even other companies.”
When considering how to build out your VPN, take stock of the data centers your company has access to, and try to find ways to maximize the throughput to locations with powerful, high bandwidth connections. Most data centers plan their operations around potential disruptions, building redundancy into their power and cooling systems, and will offer commitments to their customers to keep their technology operations running around the clock, so your VPN connection has less of a chance of going down for extended periods.
A checklist to help as you work to scale
Working remotely should be as secure as working in the office. If your organization is suddenly finding itself in need of a new or upgraded VPN solution, you’ll need to consider a few things:
- Bandwidth: Bandwidth utilization will increase with each additional client and residential internet speeds are constantly increasing. Users expect a fast connection and don’t differentiate between what comes from the VPN connection and the public internet, so make sure your VPN solution can accommodate everyone’s traffic.
- Stability: Remote users depend on the connection and it should be as stable as being in the office. We performed 24 hour stress tests when choosing a vendor—we recommend everyone do the same.
- Price per user: Licensing can be different for each vendor, but generally, the more licenses you buy, the lower the price per license. Buying extra provides room for growth should user count increase. With employees potentially having multiple devices, the number of connections may be higher than you expect.
- Security: Remote users will be connecting from unsecured internet connections. Strong encryption is needed to secure the traffic to and from your corporate network. Multifactor authentication for your VPN can prevent unauthorized connections.
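For the bandwidth item in the checklist above, a back-of-the-envelope estimate can flag an undersized circuit early. This is a minimal sketch, not a sizing method from our team: the function names, the per-user throughput, the peak factor, and the 70% utilization ceiling are all assumptions you should replace with your own measurements.

```python
def required_mbps(concurrent_users: int, avg_mbps_per_user: float,
                  peak_factor: float = 1.0) -> float:
    """Estimated aggregate VPN throughput at peak, in Mbps."""
    return concurrent_users * avg_mbps_per_user * peak_factor


def fits_capacity(required: float, circuit_mbps: float,
                  max_utilization: float = 0.7) -> bool:
    """True if the estimate stays under the chosen share of the circuit."""
    return required <= circuit_mbps * max_utilization


if __name__ == "__main__":
    # Hypothetical inputs: 500 concurrent users averaging 2 Mbps each,
    # with peaks 1.5 times the average, on a 1 Gbps circuit.
    need = required_mbps(500, 2.0, peak_factor=1.5)
    print(f"Estimated peak demand: {need:.0f} Mbps")
    print("Fits on a 1 Gbps circuit:", fits_capacity(need, 1000.0))
```

An estimate like this is only a starting point. It does not replace the stress tests recommended under Stability.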
If you’re looking to do further research, check out our questions on the tags openvpn and vpn. You can also leave a question in the comments—please keep it respectful and on topic—and we’ll try to find time to answer them over the coming weeks.
You should try out WireGuard.
There’s a key thing you didn’t mention: you absolutely should test your VPN at-scale *before* you need it. My company too routes its VPN endpoints to data centers, but we tested this week to see if they could support all 11,000 US employees working remotely. (Spoiler alert: they couldn’t, but just barely.) Given that amount of lead time, we were able to address the issue ahead of being asked to be full-remote “for real” starting Monday.
Roddy, what did you use to load test your VPN? And what VPN vendor are you using? Thanks.
Use ZPA from Zscaler. It auto scales
Definitely Zscaler. A large number of customers are coming to us because of COVID-19 and said their traditional VPN couldn’t handle all of their employees working from home securely, with performance that scales. Each of Zscaler’s ZPA connectors has 500 Mbps throughput, and adding more connectors to accommodate additional user traffic is super easy.
Also, it’s worth noting that Zscaler’s technology is drastically different from a traditional network-to-network VPN, and focuses on ZTNA principles, i.e. zero-trust, user-to-application security. Our users are never placed on the network and our connectors are not internet facing, which reduces attack surface area immensely. Look at all the vulnerabilities that continue to be exploited in internet-facing VPNs, with 2019 being particularly bad, and the attacks aren’t going to be stopping anytime soon…
“But when a project has a deep history and a large group of people actively contributing to it, the reality is that it’s likely to be the most robust and up to date software available.” -> Is that really so? Linux had some critical bugs take around four years or more to be fixed.
Linux... Which distro? Which version? Did you attempt to fix it?
If you were using a paid-for distro, I’d bet it was fixed pretty quickly.
If you were using a free distro, I’d bet open source developers closed that loophole pretty quickly.
Be wary of pulling the “but Linux” straw man argument.
I think bandwidth and security are the two key parameters when choosing a VPN from a customer’s perspective. I have seen free VPN users complaining about internet speed issues due to limited bandwidth.