07 October 2019
Serverless architecture in Node.js: Case study of an open-source app
This is the continuation of the article posted some time ago – Serverless in Node.js: Beginner’s guide. If you haven’t read it yet, I strongly encourage you to do so. It can help you gain some knowledge about basic concepts regarding serverless architecture in Node.js. In the article below, I’ll take a deeper look under the hood of this technology. Moreover, you’ll find out how we used it in the translation management system developed at The Software House.
I’m going to show you serverless architecture using the example of one of The Software House’s most successful open-source projects: BabelSheet. Let’s get started!
BabelSheet – our translation management system
Before digging into the details of serverless architecture in Node.js, it’s important to know the concepts and business requirements behind the translation management system we’ve developed and use on a daily basis.
BabelSheet allows you to translate an application’s user interface and content into a specific language. We decided to use Google Spreadsheets as the user interface. It gives us a lot of features out of the box, e.g. multiple users working on translations at the same time and automatic translations to other languages. BabelSheet itself also contains a scheduler to synchronize translations, a web server allowing you to fetch them, and CLI tools which can generate translations in various formats in your local environment. What’s more, there is a cache layer for better performance.
The best part of it is that BabelSheet is completely free. We created an open-source project developed under the MIT license. If you would like to check how it’s done, feel free to visit its GitHub page.
Before trying to migrate the translation service to serverless architecture, we needed to know what was inside. BabelSheet consists of two services: the producer and the API. The producer fetches the Google Spreadsheet files containing translations every few minutes, transforms them into JSON, and stores the result in a Redis cache. The API service, on the other hand, is a standard Express.js-based web server that handles requests for translations in various formats and serves the output. It reads the cached data stored in Redis for better performance.
Let’s go serverless!
First, it’s important to know the steps necessary to migrate almost any kind of Node.js application to a serverless architecture.
The first and most important thing is to make the application stateless. Functions are not executed in long-running processes, so any data kept in memory between invocations can be lost. It’s up to you where that data will be stored. Functions can communicate with external storage such as DynamoDB or Redis; the latter is a good choice for caching data due to its speed. It’s worth remembering that every function has access to 512 MB of ephemeral disk space by default (the /tmp directory) within its execution context. We can store some cached data there and reuse it in later executions, but functions always need to check whether the data exists before trying to access it. There’s no guarantee of how long it will be available.
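The "check before you read" rule for ephemeral disk space can be sketched as follows. This is a minimal illustration, not BabelSheet's code: the cache file name and the `fetchFresh` callback (a stand-in for a call to external storage) are assumptions.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical cache file; on AWS Lambda the temp directory is typically /tmp.
const CACHE_FILE = path.join(os.tmpdir(), 'translations-cache.json');

// Returns the cached data, or null when the file is missing or unreadable.
function readCache(file = CACHE_FILE) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch (err) {
    return null; // a fresh execution context starts with an empty disk
  }
}

function writeCache(data, file = CACHE_FILE) {
  fs.writeFileSync(file, JSON.stringify(data));
}

// Uses the local cache when it exists, otherwise falls back to fetchFresh
// (hypothetical: e.g. a read from DynamoDB or Redis) and caches the result.
function getTranslations(fetchFresh, file = CACHE_FILE) {
  const cached = readCache(file);
  if (cached !== null) {
    return cached;
  }
  const fresh = fetchFresh();
  writeCache(fresh, file);
  return fresh;
}
```

The function never assumes the file is there: a missing or corrupted cache is treated exactly like a cold start.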
Once you get rid of the state, the next step is to parametrize the application, if you haven’t done it yet. The best way is to use environment variables, which can be securely provided to AWS Lambda functions through AWS Systems Manager Parameter Store.
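Parametrization through the environment might look like the sketch below. All variable names (`SPREADSHEET_ID`, `TRANSLATIONS_TABLE`, `CACHE_TTL_SECONDS`) are hypothetical and chosen only for illustration.

```javascript
// Builds a configuration object from environment variables.
// The variable names are hypothetical, not BabelSheet's real settings.
function loadConfig(env = process.env) {
  const required = (name) => {
    const value = env[name];
    if (value === undefined || value === '') {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  return {
    spreadsheetId: required('SPREADSHEET_ID'),
    tableName: required('TRANSLATIONS_TABLE'),
    // Optional setting with a default of 300 seconds.
    cacheTtlSeconds: Number(env.CACHE_TTL_SECONDS || 300),
  };
}
```

Failing fast on a missing variable makes a misconfigured deployment visible on the first invocation instead of somewhere deep in the business logic.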
The next step is optional, but it can make the application much easier to work with. It is all about storage abstraction, which makes it possible to inject the proper storage implementation through a dependency injection container, depending on the type of environment. For local development and testing purposes, in-memory storage can be used to simplify the process. In production, the storage implementation is switched to a real database. Unfortunately, this approach has two main drawbacks. By developing software this way you can miss errors that only show up with the real storage and then encounter them in production. Moreover, it is not always possible to write an in-memory equivalent of real storage for more sophisticated databases and all their mechanisms (for example transactions).
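A storage abstraction of this kind can be sketched without any framework. The class and function names below are hypothetical, and `createStorage` is a deliberately simple stand-in for a real dependency injection container.

```javascript
// In-memory implementation for local development and tests.
class InMemoryStorage {
  constructor() {
    this.data = new Map();
  }

  async set(key, value) {
    this.data.set(key, value);
  }

  async get(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  }
}

// Hypothetical service that depends only on the get/set contract,
// so it does not care which storage implementation it receives.
class TranslationsService {
  constructor(storage) {
    this.storage = storage;
  }

  async save(translations) {
    await this.storage.set('translations', translations);
  }

  async load() {
    return this.storage.get('translations');
  }
}

// Picks a storage factory by environment name.
function createStorage(environment, factories) {
  const factory = factories[environment];
  if (!factory) {
    throw new Error(`No storage registered for "${environment}"`);
  }
  return factory();
}
```

In production, the `factories` map would also contain an entry that builds a class with the same `get`/`set` methods backed by a real database.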
Proper application architecture is crucial to achieve quick results during migration to another platform.
The next part of porting the application is to write function handlers containing your application’s logic.
Writing function handlers for producer and API services
Thankfully, an Express.js-based web server can be easily exported as a function handler, because there is a ready-made solution that simplifies it. You just need to install the “serverless-http” package from npm and wrap your exported server instance. Please note that you must not start this server manually by listening on a port.
With this approach, the whole application (even if there are multiple routes and endpoints) is covered by just one function handler. This is not the best solution in every case, and you may consider dividing your application into smaller parts and wrapping every endpoint in a separate function handler. Both approaches have advantages and disadvantages when it comes to performance and complexity.
Writing a function handler for the producer service is different because it is just a plain Node.js script without any kind of web server. In this case, we need to write the handler manually, invoke our business logic inside, and return a proper response depending on whether everything succeeded.
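A hand-written handler of this kind might be sketched as below. The `syncTranslations` function and its `fetchSpreadsheet` and `store` dependencies are hypothetical placeholders for the producer's real business logic.

```javascript
// Hypothetical business logic: fetch rows, store them, report how many were synced.
async function syncTranslations({ fetchSpreadsheet, store }) {
  const rows = await fetchSpreadsheet();
  await store(rows);
  return rows.length;
}

// Creates the function handler; dependencies are passed in so they can be
// replaced with stubs in tests.
function createProducerHandler(deps) {
  return async () => {
    try {
      const count = await syncTranslations(deps);
      return { statusCode: 200, body: JSON.stringify({ synced: count }) };
    } catch (err) {
      console.error('Translation sync failed', err);
      return { statusCode: 500, body: JSON.stringify({ error: err.message }) };
    }
  };
}
```

The result of `createProducerHandler(...)` is what would be exported as the function handler. Depending on your needs, rethrowing the error instead of returning a 500 response may be preferable, because the platform then records the invocation itself as failed.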
The last step is to prepare your provisioning stack. It can be achieved with the use of various frameworks. For the sake of simplicity, we chose Serverless Framework, but it is always worth knowing some alternatives.
A few notes about provisioning frameworks
The main reason to use provisioning frameworks is to make it easier to prepare your technology stack and deploy the application. This is achieved with the use of a special configuration file which is then translated into a set of commands executed on the provider’s infrastructure, resulting in a ready-to-use application with all its resources.
Such frameworks usually offer more tools for other operations such as updating the whole infrastructure, making rollbacks, and previewing various logs and stats. There are two very popular frameworks that I would like to compare – Serverless Framework and AWS Serverless Application Model.
Serverless Framework
Its main feature is that it is provider agnostic. With some small changes in a configuration file, you can switch from one provider to another and the application should work the same way. Things may get more complicated if some parts of a configuration are tied to services unique to a specific provider. For example, if you configure DynamoDB for your AWS Lambda based application, it might be harder to find a similar configuration for, let’s say, Google Cloud Platform. As a result, some changes in the application’s code might be necessary as well.
AWS Serverless Application Model
It’s important to remember that in contrast to Serverless Framework, SAM works only with AWS infrastructure – you can’t switch to another provider later. On the other hand, this solution has some benefits.
Being bound to a specific provider allows you to utilize its services even more efficiently because the framework has access to most or even all of its unique features. Moreover, it might get updated more frequently since there is no need to support a lot of other providers’ specific solutions.
The next big advantage of AWS SAM is the fact that it has better function emulation for a local development environment. Instead of having a plugin that has only some basic functionality for function invocation, SAM tries to emulate a lot of other AWS services, trying to reflect AWS infrastructure as precisely as possible. It is achieved with the use of Docker images. Thanks to that approach, some major bugs may be found faster and more frequently, allowing you to deliver a product of better quality.
If you are creating a solution based on AWS, it is worth knowing that both AWS SAM and Serverless Framework use CloudFormation under the hood. CloudFormation is another AWS service, responsible for the proper provisioning of the whole infrastructure. You can think of these frameworks as a layer between your app and a provider: a layer that simplifies things to make you more productive.
Which one should I choose?
If you are exploring various FaaS possibilities and would like to get started as soon as possible, then the Serverless Framework might be better. On the other hand, if you know that your application will be hosted on AWS for sure, then their Serverless Application Model would be a better choice.
You should invest some time in choosing a provider and a provisioning framework. As an application grows larger, it might get harder to migrate to another one.
Configuring Serverless Framework for BabelSheet
Although Serverless Framework hides some complexity around provisioning details, we still need to know what kind of services are under the hood and how to use them. Take a look at the provisioning configuration for the translations service.
As you can see, there are a lot of details we need to provide to create a DynamoDB table and assign proper security roles to it. But don’t worry! You don’t need to understand all these details because such configurations are available as examples on the Serverless Framework or AWS websites. On the other hand, replacing a standard Node.js scheduler with the CloudWatch scheduler is as simple as creating a scheduled event with a defined rate of execution.
At this point, you are ready to deploy the whole application just like we did with the “Hello World” example in the previous article. Once deployed, the CloudWatch scheduler will trigger the producer function every 5 minutes. As a result, cached translations will be stored in a DynamoDB table, to be later fetched by the API function and served through the AWS API Gateway service to the end-user.
Serverless architecture in Node.js – summary
Before deciding to go serverless, it’s important to know all the components of your app.
If it’s developed the right way, with stateless services and abstractions that allow you to provide different configurations and storage implementations, then migration shouldn’t be that difficult to achieve. This is especially true if your app is Express.js based, which makes defining a function handler a piece of cake.
Last, but not least, you should choose the provisioning framework that best suits your needs and learn which of the provider’s services are behind it and how to use them. It will give you the knowledge you need to turn your Node.js application into a serverless architecture.