WebSockets for fun and profit

Seamless communication is a must on the modern web. As internet speeds increase, we expect our data in real time. To address this need, WebSocket, a popular communication protocol finalized in 2011, enables websites to send and receive data without delay. With WebSockets, you can build multiplayer games, chat apps, and collaboration software that work on the open web.

I built several projects with WebSockets before I started to wonder what exactly was happening under the hood. That question took me down a research rabbit hole, and I am excited to share what I’ve learned with you. In this article, we will:

explore the problem WebSockets solve and look at alternatives
look under the hood to understand how WebSockets work
review some simple code for a WebSocket-powered app
talk through some real-world implementations

By the end of this piece, you should feel comfortable discussing how WebSockets work, and maybe even inspired to use it in your next project.

Let’s start with the basics: WebSocket is a technology that allows a client to establish two-way (“full-duplex”) communication with the server. (A quick review: the client is the application on a user’s computer, and the server is the remote computer that stores the website and associated data).

The key word in that definition is two-way: with WebSocket, both the client and the server can trigger communication with one another, and both can send messages, at the same time. Why is this a big deal? To fully appreciate the power of WebSocket, let’s take a step back and look at a few common ways that computers can fetch data from the server.

In a traditional HTTP system, which is used by the majority of websites today, a web server is designed to receive and respond to requests from clients via HTTP messages. This traditional communication can only be initiated in one direction: from the client to the server. Server code defines what type of requests the server should expect and how to respond to each of them. A common metaphor for this type of communication is a restaurant kitchen. It goes something like this:

You (the client) places an order (an HTTP request) that a waiter takes to the kitchen (the server).
The kitchen receives the order and checks if they know how to make it (the server processes the request).
If the kitchen knows how to make the dish, they prepare the order (the server fetches data from a database or assets from the server).
If the kitchen doesn’t recognize the order or isn’t allowed to serve it, they send the waiter back with bad news (if the server doesn’t know how to or isn’t allowed to respond to the request, it sends back an error code, like a 404).
Either way, the waiter returns back to you (you get an HTTP response with an associated code, like 200 OK or 403 Forbidden).

The important thing to note here is that the kitchen has no idea who the order is coming from. The technical way to say this is that “HTTP is stateless”: it treats each new request as completely independent. We do have ways around that—for example, clients can send along cookies that help the server identify the client, but the HTTP messages themselves are distinct and are read and fulfilled independently.

Here’s the problem: the kitchen can’t send a waiter to you; it can only give the waiter a dish, or bad news, when you send the waiter over. The kitchen has no concept of you—only the orders that come in. In server-speak, the only way for clients to get updated information from the server is to send requests.

Imagine a chat app where you’re talking to a friend. You send a message to the server, as a request with some text as a payload. The server receives your request and stores the message. But, it has no way to reach out to your friend’s computer. Your friend’s computer also needs to send a request to check for new messages; only then can the server send over your message.

As it stands, you and your friend—both clients—need to constantly check the server for updates, introducing awkward delays between every message. That’s silly, right? When you send a message, you want the server to ping your friend immediately to say “Hey, you got a message! Here it is!” HTTP request-response works just fine when you need to load a static page, but it’s insufficient when your communication is time-sensitive.

One dead simple solution to this problem is a technique called short polling. Just have the client ping the server repeatedly, say, every 500ms (or over some fixed delay). That way, you get new data every 500ms. There are a few obvious downsides to this: there’s a 500ms delay, it consumes server resources with a barrage of requests, and most requests will return empty if the data isn’t frequently updated.

Another workaround to the delay in receiving data is a technique called long polling. In this method, the server receives a request, but doesn’t respond to it until it gets new data from another request. Long polling is more efficient than pinging the server repeatedly since it saves the hassle of parsing request headers, querying for new data, and sending often-empty responses. However, the server must now keep track of multiple requests and their order. Also, requests can time out, and new requests need to be issued periodically.

Another technique for sending messages is theServer-Sent Events API, which allows the server to push updates to the client by leveraging the JavaScript EventSource interface. EventSource opens a persistent, one-directional connection with the server over HTTP using a special text/event-stream header and listens for messages, which are treated like JavaScript events by your code.

This is almost what we’re looking for—we can now receive updates from the server! Because they’re one-directional, Server-Sent Events (SSE) are great for apps where you don’t need to send the server any data—for example, a Twitter-style news feed or a real-time dashboard of stock quotes. Another pro is that Server-Sent Events work over HTTP and the API is relatively easy to use. However, SSE is not supported by older browsers, and most browsers limit the number of SSE connections you can make at the same time. But, that’s still not quite enough for our chat app: it’s great to receive real-time updates, but we’d like to be able to send them too.

So, we need a way to send information to the server, and receive updates from the server when updates come in. This brings us back to the two-way (“full-duplex”) communication we mentioned earlier. Enter WebSocket! Supported by almost all modern browsers, the WebSocket API allows us to open exactly that kind of two-way connection with the server. Moreover, the server can keep track of each client and push messages to a subset of clients. Great! With this capability we can invite all of our friends to our chat app and send messages to all of them, some of them, or only your best friend.

So, how exactly does this magic work? Don’t be intimidated by the setup—modern WebSocket libraries like socket.io abstract away much of the setup, but it’s still helpful to understand how the technology works. If, at the end of this section, you’re interested in even more detail, check out the surprisingly readable WebSocket RFC.

In the last section, we mentioned HTTP several times. HTTP is a protocol, a set of rules for how computers communicate on the web. It’s made up of the requests and responses, each of which contains a request line (“GET /assets/icon.png”), headers, and an optional message body (used in, for example, POST requests to send some data to the server).

WebSocket is another protocol for sending and receiving messages. Both HTTP and WebSockets send messages over a TCP (Transmission Control Protocol) connection, which is a transport-layer standard that ensures streams of bytes, sent over in packets, are delivered reliably and predictably from one computer to another. So, HTTP and WebSockets use the same delivery mechanism at the packet/byte level, but the protocols for structuring the messages are different.

In order to establish a WebSocket connection with the server, the client first sends an HTTP “handshake” request with an upgrade header, specifying that the client wishes to establish a WebSocket connection. The request is sent to a ws: or wss:: URI (analogous to http or https). If the server is capable of establishing a WebSocket connection and the connection is allowed (for example, if the request comes from an authenticated or whitelisted client), the server sends a successful handshake response, indicated by HTTP code 101 Switching Protocols.

Once the connection is upgraded, the protocol switches from HTTP to WebSocket, and while packets are still sent over TCP, the communication now conforms to the WebSocket message format. Since TCP, the underlying protocol that transmits data packets, is a full-duplex protocol, both the client and the server can send messages at the same time. Messages can be fragmented, so it’s possible to send a huge message without declaring the size beforehand. In that case, WebSockets breaks it up into frames. Each frame contains a small header that indicates the length and type of payload and whether this is the final frame.

A server can open WebSocket connections with multiple clients—even multiple connections with the same client. It can then message one, some, or all of these clients. Practically, this means multiple people can connect to our chat app, and we can message some of them at a time.

Finally, when it’s ready to close the connection, either the client or the server can send over a “close” message.

Whew—nice job following along! Let’s take a quick break to stretch, and then look at some sample code.

Now that you’re all loose from stretching (I wasn’t kidding before!), let’s switch gears from a technical overview to looking over some code. This will be much less complex—as I mentioned before, modern tooling saves us from having to worry about handshaking or frame OpCodes.

On the front end, we’ll use JavaScript to establish a connection to a WebSockets-enabled server, and then listen for messages as JavaScript events—the same way you’d listen to user-generated events like clicks and keypresses. In vanilla JavaScript, we do this by creating a new WebSocket object and adding event listeners for open, message, and close events. We also use the send method to send data to the server.

const socket = new WebSocket('ws://localhost:8080'); 
socket.addEventListener('open', function (event) { 
  socket.send('Hello Server!'); 
}); 

socket.addEventListener('message', function (event) { 
  console.log('Message from server ', event.data); 
});

socket.addEventListener('close', function (event) { 
  console.log('The connection has been closed'); 
});

On the server, we similarly need to listen for WebSocket requests. In Node.js, for example, we can use the popular ws package to open a connection and listen for messages:

const WebSocket = require('ws');
const ws = new WebSocket.Server({ port: 8080 });

ws.on('connection', function connection(wsConnection) {
  wsConnection.on('message', function incoming(message) {
    console.log(`server received: ${message}`);
  });

  wsConnection.send('got your message!');
});

Although in this example, we’re sending strings, a common use case of WebSockets is to send stringified JSON data or even binary data, allowing you to structure your messages in the format convenient to you.

For a more complete example, Socket.io, a popular front-end framework for making and managing WebSocket connections, has a fantastic walkthrough for building a Node/JavaScript chat app. This library automatically switches between WebSockets and long polling, and also simplifies broadcasting messages to groups of connected users.

Although you can manually write WebSocket server code, WebSockets is conveniently already built into popular frameworks. We’ve already touched on the Socket.io library on the front end. A few other implementations of WebSockets in larger projects are:

ActionCable in Ruby on Rails, a full-stack Ruby framework
Channels in Django, a full-stack Python framework
Gorilla in the Go language
Meteor, a full-stack JavaScript framework based on WebSockets instead of HTTP
Apollo, a GraphQL server that helps fetch data in real time using WebSockets

So, what types of projects can benefit from WebSockets? Any project that requires real-time, two-way communication is a great use case. Here are just a few kinds of applications often powered by WebSockets:

Chat Apps: a conceptually simple implementation WebSockets; users send messages to a server, which instantly pushes these messages to the recipient. No delay! The server can also store groups of connections in channels, allowing you to message multiple people at once, or see messages from multiple people in a room, like a Slack channel.
Multiplayer Games: A common pattern for multiplayer games is to have a server store a game state that serves as the source of truth. Players will take actions or make moves that are sent to the server, which updates the game state, and pushes it out to all players. With HTTP, each player needs to regularly request the game state. With WebSockets, each move is instantly relayed to all players.
Collaboration Apps: Need to work on a shared document or canvas? You can follow the pattern above to allow multiple users to draw or type in a document and instantly update it for everyone connected.
Developer Tools: Continuous Integration tools like CircleCI use WebSockets to instantly notify you when a build has finished. Need to send your client metrics around site performance and visitor count? Open a WebSocket connection and send updates as soon as the server receives them.
Location-dependent apps: Update your server when the user’s GPS coordinates have changed, then send the user new data based on their current coordinates.

Hopefully by now you’re sold on WebSockets! We’ve covered a lot of ground: we traced the problem WebSockets solves and the alternate solutions, explored how WebSockets operates under the hood, and even looked over some sample code and common use cases.

The WebSocket protocol is a wonderful tool to build real-time communication apps, but like all tools, it’s not a silver bullet. A WebSocket connection is meant to be persisted, so can be overkill for simpler apps. For a one-directional news feed, metrics feed, or any app where you need to update the client but not receive information in return, Server Sent Events or plain old HTTP calls are quicker and simpler to set up. However, for multiplayer games and collaborative apps, WebSockets opens up a world of possibilities. The real-time web is evolving, and the WebSocket protocol is a crucial cog in its evolution. Go forth and use it for good!

WebSockets for fun and profit

Messaging on the internet

Request-response

Short polling

Long polling

Server-Sent Events (SSE) / EventSource

WebSockets

WebSockets under the hood

A short example

In the wild

A real-time recap

Add to the discussion