
How to deal with AWS API throttling? We protected our text chat app from it – learn how!

Łukasz Brzeziński

Node.js developer

“Too many requests! You have made too many requests!! No more requests!!!” – How many times do you have to hear that before you get really angry? Amazon Chime API’s request throttling tested our patience like this. But all we ever wanted was to make a simple text chat app work! In this article, you’ll find out why Chime was so unkind to us, what we did to turn things around, and how you too can follow the path we forged.

Request throttling is not always at the forefront of discussions about Amazon Web Services’ tools and services, as it is rarely a major cause for concern when developing a cloud-based system. Usually, both business and development teams focus more on the amount of memory and storage they can use.

However, in a recent project of mine, the issue of request throttling on AWS became one of our most severe bottlenecks, turning the seemingly simple task of adding a text chat feature into quite a complex solution full of creative workarounds. 

How did my team and I find a way around request throttling to save a high-traffic text chat application? 

You’ll find the answer in this practical case study.

Background – what kind of text chat did the client need?

To make the matter clear, I need to tell you a thing or two about the client I got to work for.

The client

The COVID pandemic caused all kinds of troubles and thwarted a lot of plans of both individuals and companies. Lots of people missed out on important gatherings, holiday trips, and family time, not to mention long-awaited concerts, workshops, art shows, and live theatre events. 

One of my clients came up with a solution – an online streaming platform that enables artists to organize and conduct their events remotely for the enjoyment of their paying fans. Essentially, it made it possible to bring a big part of the live experience to the web. It also opened up new opportunities in a post-COVID world, making art more accessible to many people, anywhere they might be.

The client noticed that performing without an audience to look at and interact with could get kind of awkward for artists. To provide a way for performers and fans to interact, we decided to equip the platform with a text chat.

The theory – what is throttling all about?

Before we get to the app, let’s review the theory behind the request throttling limit. 

As you probably know, throttling is the process of limiting the number of requests you (or your authorized developer) can submit to a given operation in a given amount of time.

A request can be anything from submitting an inventory feed to asking for an order report.

The term “API Request Throttling” used throughout this article refers to a quota placed by an API owner on the number of requests a particular user can make in a certain period (throttling limits).

That way, you keep the amount of workload put on your services in check, ensuring that more users can enjoy the system at the same time. After all, it is important to share, isn’t it?

What about request throttling on the AWS cloud?

In my opinion, the documentation of EC2, one of the most popular AWS services, has quite a satisfying answer:

Amazon EC2 throttles EC2 API requests per account on a per-Region basis. The goal here is to improve the performance of the service and ensure fair usage for all Amazon EC2 customers. Throttling makes sure that calls to the Amazon EC2 API do not exceed the maximum allowed API request limits.

Each AWS service has its own rules defining the number of requests one can make.

Challenge – not your regular text chat

When you read the title, you might have thought to yourself: “A text chat? What’s so difficult about that?”.

Well, as any senior developer will tell you, just about anything can be difficult, depending on outside factors.

And it just so happens that this outside factor was special.

Not your regular, everyday text chat

For the longest time, the AWS-based infrastructure of the platform had no issues with throttling. They only started showing up when we implemented the text chat using the Amazon Chime communications service.

In a typical chat app, you give users the ability to register and choose chat channels they wish to join. Ours worked a bit differently.

In this one, each chat participant was to be distinguished by the very same ticket code they used to gain access to the online event stream.

Unfortunately, the ticket data downloaded from our client’s ticketing system did not allow us to tell if a particular ticket belonged to a returning user. Because of that, we couldn’t assign previous chat user accounts to the ticket. As a result, we had to create them from scratch for each event.

Slow down there, cowboy!

Right after downloading over a thousand tickets worth of data from our client’s system, we tried to create Chime resources for all of them. 

Almost immediately, we were cut off by a throttling error message: our load far exceeded the allowed maximum of 10 requests per second. We encountered the same problem when trying to delete resources once they were no longer needed.

All of our problems stemmed from the fact that we did not immediately verify the number of requests we could make to the Amazon Chime API.
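If you do hit those limits anyway, a common safeguard (not something we took from this project, but a standard pattern) is to retry throttled calls with exponential backoff. A minimal sketch, assuming the AWS SDK reports throttling errors under names such as ThrottlingException or TooManyRequestsException:

```javascript
// Hypothetical helper: retry an operation that fails with a throttling
// error, waiting exponentially longer before each new attempt.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withBackoff(operation, { retries = 5, baseDelayMs = 200 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      const throttled =
        err.name === 'ThrottlingException' ||
        err.name === 'TooManyRequestsException';
      if (!throttled || attempt >= retries) throw err;
      // 200 ms, 400 ms, 800 ms, ... before trying again
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Backoff alone would not have solved our problem, though: it smooths out occasional spikes, but it cannot push a sustained workload past a hard per-second quota.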

Slow and steady wins the race?

“What about using bulk methods?”, you might ask. Well, at the time of writing this article, there weren’t many of those available.

Out of the 8 batch operations listed in the Amazon Chime API reference, we ended up using only one – the ability to create multiple channel memberships (with a limit of 100 memberships created at a time).
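Because that batch endpoint accepts at most 100 memberships per call, the member list has to be split into chunks of that size first. A small helper (a sketch, not the project’s actual code) could look like this:

```javascript
// Split an array into consecutive chunks of at most `size` items,
// e.g. 250 user IDs become three batches of 100 + 100 + 50.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```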

Even list operations are limited to fetching 50 items at a time, which made it hard to keep up with the number of users our chat solution had to handle.

At this point, we decided to try to create each and every user separately.

The solution to request throttling for Amazon Chime

Obviously, we had to adjust our expectations regarding the speed at which this process would operate. 

At 10 requests per second, we could create a little under 10 users per second (including brief breaks for creating channels, moderators, batch memberships, and so on). Luckily, our client was fine with waiting a bit longer.

Every event was set up in advance, giving our solution enough time to create everything before the event even started.

We also had to be sure that for each ticket only one user would be created since Chime restricts the number of those as well.

Our solution was to set up a separate EC2 instance, which created Chime resources asynchronously to the process of setting up an event and downloading tickets, limiting the client’s call count.

AWS API throttling solution – implementation

First of all, the setup includes a CRON job scheduled to run every minute. 

This job has built-in protection from running twice if the previous run didn’t finish in time. Another layer of protection is having the job run on a single EC2 instance that does not scale out.
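That first layer of protection boils down to a simple lock. As an illustrative sketch (the names here are assumptions, not the project’s code):

```javascript
// Skip a scheduled run when the previous one is still in progress.
let jobRunning = false;

async function runGuarded(job) {
  if (jobRunning) return false; // previous run still in flight, skip this one
  jobRunning = true;
  try {
    await job();
    return true;
  } finally {
    jobRunning = false; // always release the lock, even on failure
  }
}
```

An in-memory flag like this only works because the job runs on exactly one instance; with several instances, you would need a shared lock (e.g. in a database) instead.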

Running the createChimeUsers method retrieves a set number of tickets from a private database (150-200 worked best for us).

Then, for each ticket retrieved from our database, we create a chat user.
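A rough sketch of that loop (the function and field names are assumptions, with the actual user-creation call injected so it can be throttled or stubbed) might be:

```javascript
// Create one chat user per unique ticket code; `createUser` stands in
// for the actual (throttled) Amazon Chime call.
async function createChimeUsers(tickets, createUser) {
  const created = new Map();
  for (const ticket of tickets) {
    if (created.has(ticket.code)) continue; // only one user per ticket code
    const user = await createUser(ticket.code);
    created.set(ticket.code, user);
  }
  return created;
}
```

Deduplicating by ticket code here also covers the requirement mentioned below: only one user may ever be created per ticket.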

We used a package called throttled-queue as our throttling handler.

The throttledQueue parameters are responsible for the following configuration of Chime API calls:

  • the maximum number of requests we would like to perform in a given amount of time,
  • the time window (in milliseconds) during which up to that maximum number of requests will be performed,
  • a boolean flag determining whether requests should be spread out evenly in time (true) or run as soon as possible (false).
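The project used the throttled-queue package itself; purely to illustrate those three parameters, here is a simplified, self-contained sketch with the same signature (the even-spacing behaviour is omitted for brevity):

```javascript
// Simplified re-implementation for illustration only; the real
// throttled-queue package also spaces calls evenly when the flag is true.
function throttledQueue(maxPerInterval, intervalMs, evenlySpaced = false) {
  const queue = [];
  let inFlight = 0;

  function drain() {
    while (inFlight < maxPerInterval && queue.length > 0) {
      const task = queue.shift();
      inFlight += 1;
      const timer = setTimeout(() => {
        inFlight -= 1; // free the slot once the interval has elapsed
        drain();
      }, intervalMs);
      if (timer.unref) timer.unref(); // don't keep the process alive
      task();
    }
  }

  return (task) => {
    queue.push(task);
    drain();
  };
}

// Matching Amazon Chime's limit: at most 10 calls per second, spread evenly.
const throttle = throttledQueue(10, 1000, true);
throttle(() => { /* e.g. a single Chime API call goes here */ });
```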

In the given example, adhering to Amazon Chime limitations, we are making 10 calls per second, evenly spread out throughout this time.

Want more AWS insights? Find out about our developers’ experience with API Gateway, including the API Gateway console, REST API, auto scaling, burst limit, and more.

Project deliverables

So what did we achieve as a result of doing all that?


As far as tangible deliverables go, we achieved the following:

  1. Service responsibilities have been split. As a result, the separation of concerns for text chat and ticket retrieval has been achieved.
  2. The ticket retrieval system does not concern itself with text chat anymore. Consequently, throttling or any other AWS-related issues do not impact this function anymore.
  3. The new text chat service handles all of the requests to Amazon Chime. Extracting this from a scalable service to a non-scalable one ensures that no duplicates of Chime resources will be created.

What’s even better, our success made a noticeable difference for the product itself!


Now, that the request throttling issue has been resolved:

  1. A smooth chatting experience for end users has been achieved. We detected no issues such as missing chat users or an inaccessible chat.
  2. Performers can interact with the audience undisturbed by technical limitations.
  3. With the current configuration, we are able to create 9000 (!) chat users per hour.

Solving the throttling issue brought business benefits to the client

As you can see, finding a custom way around Chime’s request throttling resulted in many tangible benefits for the client, without having to look for an entirely new text chat solution.

Text chat & request throttling issues on AWS – lessons learned

And that’s how my team and I managed to work around the issue of AWS API throttling in Amazon Chime, scoring a couple of wins for the client in the process.

The challenge taught us, or should I say – reminded us of a few lessons:

  • Despite offering a lot of benefits for the efficiency and stability of your infrastructure, AWS is not just a gift that keeps on giving. It’s vital to remember that all AWS users have the responsibility to follow all the rules so that resources are shared fairly.
  • Before you choose a particular AWS service, you should pay attention to all the limits regarding efficiency, security, request handling, and others set by the provider.
  • In the case of cloud-based architectures, increasing the efficiency (that is, the number of requests per second) may be either impossible or very costly. You need to be aware of what you can do with your cloud service at the design stage in order to avoid problems later on.

If you follow these guidelines, you shouldn’t come across any surprises during the implementation of a text chat or any other app or functionality.

Do you want to work on AWS cases like this with us?

The Software House provides plenty of such opportunities for ambitious developers who love challenging cloud projects.
