23 January, 2020
As a developer, you probably know the difference between vertical and horizontal scaling. But if you don’t have much experience with the WebSocket protocol, you might not realize that scaling it horizontally is not nearly as straightforward as with a typical REST API. In this tutorial, we learn how to scale WebSocket servers horizontally using simple, practical examples.
When we start thinking about the development of an application, we usually focus first on an MVP and the most crucial features. That’s fine, as long as we are aware that at some point we will need to focus on scalability. For most REST APIs, it’s rather easy. However, it’s a whole different story when it comes to WebSockets.
Vertical vs horizontal scaling – what’s the difference
We all know what scaling is, but do we know that there are two types of scaling – horizontal and vertical scaling?
The first one is the vertical scaling. It is by far the easiest way to scale your app, but at the same time, it has its limitations.
Vertical scaling is all about resources: adding more power to an existing machine. We keep a single instance of our application and just improve the hardware: a better CPU, more memory, faster I/O, etc. Vertical scaling requires almost no changes to the application, but it isn’t the most effective scaling option.
First of all, performance does not improve linearly with better hardware. What’s more, we’re limited by the hardware that is available: obviously, we cannot increase CPU speed indefinitely.
So, what about an alternative?
Instead of adding more resources to existing instances, we might think about creating additional ones. This is called horizontal scaling.
The horizontal approach allows us to scale almost without limit. Nowadays it’s even possible to scale dynamically: instances are added and removed depending on the current load. This is largely thanks to cloud platforms, which can provision instances on demand.
On the other hand, it requires a bit more configuration. You need at least one additional piece, a load balancer, which distributes requests across instances, and some systems also need additional services such as messaging.
Horizontal scaling with WebSocket Issue #1: State
OK, let’s say we have two simple apps.
One is a simple REST API:
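A minimal sketch of such an API, using only Python’s standard library. The names `handle`, `HelloHandler`, and `run`, the response body, and the port are assumptions made for illustration. The point is that the response depends only on the request, so nothing about the caller is kept on the instance.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle(path):
    """Pure request logic: the result depends only on the request itself."""
    if path == "/":
        return 200, {"message": "Hello from REST"}
    return 404, {"error": "not found"}


class HelloHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around `handle`; it stores nothing between requests."""

    def do_GET(self):
        status, payload = handle(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=3000):
    """Serve the API until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), HelloHandler).serve_forever()
```

Calling `run()` starts the server. Because every request is self-contained, any instance of this app can answer any request.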
The other one is a simple WebSocket API:
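Python’s standard library has no WebSocket implementation, so the sketch below leaves out the network layer and models only the part that matters for scaling: a server that keeps every open connection in its own memory. `Connection` and `WebSocketServer` are hypothetical names, not part of any library.

```python
class Connection:
    """Stand-in for one open WebSocket; `outbox` records what the client was sent."""

    def __init__(self, user):
        self.user = user
        self.outbox = []

    def send(self, message):
        self.outbox.append(message)


class WebSocketServer:
    """Keeps every open connection in this process's memory."""

    def __init__(self, name):
        self.name = name
        self.clients = set()

    def on_connect(self, user):
        # The connection object lives here for as long as the socket is open.
        connection = Connection(user)
        self.clients.add(connection)
        return connection

    def on_message(self, connection, message):
        # Reply on the same long-lived connection the message arrived on.
        connection.send(f"{self.name} received: {message}")

    def on_close(self, connection):
        self.clients.discard(connection)
```

The request-handling logic is about as short as in the REST case. The difference is the `clients` set, which exists only inside this one process.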
Even though both APIs use a different way to communicate, the code base is fairly similar. What’s more, there will be no difference between those two when it comes to vertical scaling.
The problem arises with horizontal scaling.
To handle multiple instances, we need to introduce a load balancer. It is a dedicated service that distributes traffic between instances according to a selected strategy.
HAProxy is an example of a load balancer. All we need is to provide a simple configuration.
As you can see, we’re defining frontend and backends.
The frontend will be public (this is the address used for communication with our backends).
We need to specify an address for it and also the name of the backend that will be used for it.
After that, we need to define our backends. In our case, we have two of them, both using the same IP, but different ports.
By default, HAProxy uses a round-robin strategy: each request is forwarded to the next backend on the list, and after the last one we start again from the beginning.
We also configure so-called health checks, so we make sure that the requests won’t be forwarded to an inactive backend.
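To make round-robin selection and health checks concrete, here is a small Python model of the selection logic. It is not HAProxy’s implementation, only a sketch of the idea. The `Backend` and `RoundRobinBalancer` classes and the two port numbers are assumptions made for the example.

```python
from itertools import cycle


class Backend:
    def __init__(self, address):
        self.address = address
        self.healthy = True  # a real balancer updates this from periodic health checks


class RoundRobinBalancer:
    def __init__(self, backends):
        self.backends = backends
        self._order = cycle(backends)  # endless iterator: first, second, first, ...

    def pick(self):
        """Return the next healthy backend, trying each backend at most once."""
        for _ in range(len(self.backends)):
            backend = next(self._order)
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backend available")
```

With two healthy backends, consecutive calls to `pick()` alternate between them. Once one is marked unhealthy, every request goes to the other.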
So where is the problem?
Most REST APIs are stateless. It means that nothing related to a single user making a request is saved on the instance itself. That is not the case with WebSockets.
Each socket connection is bound to a specific instance, so we need to make sure that all the requests from specific users are forwarded to a particular backend.
How to fix it?
Solution #1: Sticky session
What we are looking for is a sticky session. Thankfully, we are using HAProxy, so the only thing that needs to be done is some configuration tweaks.
First of all, we’ve changed the balancing strategy. Instead of using round-robin, we decided to go with leastconn. This will make sure that a new user is connected to the instance with the lowest overall number of connections.
The second change is to mark every request from a single user with a cookie. It contains the name of the backend to be used.
After that, the only thing that is left is to tell which backend should be used for a given cookie value.
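The three changes can be modeled in a few lines of Python. This is a sketch of the routing idea rather than of HAProxy itself: the cookie name `serverid`, the backend names, and the `StickyBalancer` class are all assumptions made for illustration.

```python
class Backend:
    def __init__(self, name):
        self.name = name
        self.connections = 0  # number of currently open connections


class StickyBalancer:
    COOKIE = "serverid"  # hypothetical cookie name

    def __init__(self, backends):
        self.by_name = {backend.name: backend for backend in backends}

    def route(self, cookies):
        """Return (backend, cookies_to_set) for a request carrying `cookies`."""
        backend = self.by_name.get(cookies.get(self.COOKIE))
        if backend is not None:
            # Returning user: the cookie value decides, regardless of load.
            return backend, {}
        # New user: choose the backend with the fewest open connections...
        backend = min(self.by_name.values(), key=lambda b: b.connections)
        # ...and record the choice in a cookie for all later requests.
        return backend, {self.COOKIE: backend.name}
```

A request without the cookie goes to the least-loaded backend and receives the cookie. Every later request that carries the cookie goes to the same backend, even if it has since become the busiest one.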
At this point we should be fine with handling the single user’s messages. But what about broadcasting?
Horizontal scaling with WebSocket Issue #2: Broadcasting
Let’s start with adding a new function to our WebSocket server, so we can send a message to all connected clients at once.
It looks fine. So where is the catch?
The WebSocket Server knows only about clients connected to this specific instance. This means we’re sending a message only to a set of clients, not all of them.
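The limitation is easy to reproduce with a small in-memory model. The `Instance` class below is a hypothetical stand-in for one WebSocket server process, with each client represented by the list of messages it has received.

```python
class Instance:
    """One server process; it only knows the clients connected to it."""

    def __init__(self):
        self.clients = {}  # user name -> messages delivered to that user

    def connect(self, user):
        self.clients[user] = []

    def broadcast(self, message):
        # Iterates over local connections only; other instances are invisible here.
        for inbox in self.clients.values():
            inbox.append(message)


first, second = Instance(), Instance()
first.connect("alice")
second.connect("bob")

first.broadcast("hello everyone")
# alice, connected to `first`, receives the message;
# bob, connected to `second`, receives nothing.
```

After the call, `first.clients["alice"]` contains the message while `second.clients["bob"]` is still empty.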
Solution #2: Pub/Sub
The easiest option is to introduce communication between different instances. For example, all of them could be subscribed to a specific channel and handle incoming messages.
This is what we call publish-subscribe, or pub/sub. There are many ready-to-go solutions, like Redis, Kafka, or NATS.
Let’s start with the channel subscription method.
First of all, we need to separate clients for the subscriber and the publisher. That’s because the client in the subscriber mode is allowed to perform only commands related to the subscription, so we cannot use the publish command on that client.
However, we can use a duplicate method to create a copy of a specific Redis client.
After that, we subscribe to the message event. By doing this, we receive every message published on the channels we are subscribed to, together with the name of the channel it was published on.
The last step is to run a publish method to send a message to a specific channel.
And now, onto the last part. Let’s connect it with our WebSocket code.
As you can see, instead of sending messages to the WebSocket clients directly, we publish them on a channel and handle them separately. By doing this, we’re sure that the message reaches every instance and is then sent to its users.
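The same flow can be sketched in Python with an in-memory broker standing in for Redis. `Broker`, `Instance`, and the channel name `broadcast` are assumptions made for the example. A real deployment would use a Redis client, and the broker would run as a separate service.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a pub/sub service such as Redis."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver the message, with its channel name, to every subscriber.
        for callback in self.subscribers[channel]:
            callback(channel, message)


class Instance:
    CHANNEL = "broadcast"  # hypothetical channel name

    def __init__(self, broker):
        self.broker = broker
        self.clients = {}  # user name -> messages delivered to that user
        broker.subscribe(self.CHANNEL, self._on_message)

    def connect(self, user):
        self.clients[user] = []

    def broadcast(self, message):
        # Publish instead of sending directly, so every instance sees it.
        self.broker.publish(self.CHANNEL, message)

    def _on_message(self, channel, message):
        # Each instance forwards the message to its own local clients.
        for inbox in self.clients.values():
            inbox.append(message)


broker = Broker()
first, second = Instance(broker), Instance(broker)
first.connect("alice")
second.connect("bob")
first.broadcast("hello everyone")
```

This time both `alice` and `bob` receive the message, even though the broadcast was triggered on only one instance.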
Scaling WebSocket servers – summary
WebSocket scaling is not a trivial task. You cannot just increase the number of instances, because it won’t work right away. However, with the help of a few tools, we are able to build a fully scalable architecture. All we need is a load balancer (such as HAProxy or even Nginx) configured with sticky sessions, plus a messaging system.