Reducing Concurrency

Concurrent connections are effective number of connections a cluster can handle before it returns an HTTP 429 response. Bonsai meters on concurrency as one level of defense against Denial of Service (DoS) and noisy neighbor situations, which helps ensure users are not taking more than their fair share of resources.

This guide will cover concurrent connection limits and some strategies for avoiding them:

  1. What are concurrent connection limits?
  2. Upgrade to a plan with a higher concurrency allowance
  3. Reduce the request overhead
  4. Reduce the traffic rate

If reading over these suggestions doesn’t provide enough ideas for resolving the issue, you can always email our support team at to discuss options.

What Are Concurrent Connection Limits?

Bonsai distinguishes between types of connections for concurrency metering purposes:

  • Searches. Search requests. The concurrency allowance for search traffic is generally much higher than for updates.
  • Updates. Adding, modifying or deleting a given document. Generally has the lowest concurrency allowance due to the high overhead of single document updates.This includes _bulk requests.

Bonsai also uses connection pools to account for short-lived bursts of traffic.

The effective number of connections the cluster can handle is the pool size, plus the number of "active" connections. Active connections are being served by the cluster and not sitting in the queue.

If all active connections are in use, subsequent connections are put into a FIFO queue and served when a connection becomes available. If all connections are in use  and the request pool is exhausted, then Bonsai will return an immediate HTTP 429.

Concurrency is NOT Throughput!

Concurrency allowances are a limit on total active connections, not throughput (requests per second). A cluster with a search concurrency allowance of 10 connections can easily serve hundreds of requests per second. A typical p50 query time on Bonsai is ~10ms. At this latency, a cluster could serve a sustained 100 requests per second,  per connection. That would give the hypothetical cluster a maximum throughput of around 1,000 requests per second before the queue would even be needed. In practice, throughput may be closer to 500 rps (to account for bursts, network effects and longer-running requests).

To put these numbers in perspective, a sustained rate of 500 rps is over 43M requests per day. StackExchange – the parent of sites like StackOverflow, ServerFault and SuperUser (and many more) is performing 34M daily Elasticsearch queries. By the time your application demands exceed StackExchange by 25%, you will likely already be on a single tenant configuration with no concurrency limits.

Queued Connections Have a 60s TTL

If the connection queue begins to fill up with connection requests, those requests will only be held for up to 60 seconds. After that time, Bonsai will return a HTTP 504 response. If you are seeing HTTP 429 and 504 errors in your logs, that is an indicator that your cluster has a high volume of long-running queries.


Concurrency allowances on Bonsai scale with the plan level. Often the fastest way to resolve concurrency issues is to simply upgrade the plan. Upgrades take effect instantly.

Upgrading Direct Bonsai cluster

Upgrading a Heroku Bonsai cluster

Upgrading a Manifold Bonsai cluster

Reduce Overhead

The more time a request takes to process, the longer the connection remains open and active. Too many of these requests in a given amount of time will exhaust the connection pool and request queue, resulting in HTTP 429 responses. Reducing the amount of time a request takes to process adds overhead for more traffic.

Some examples of requests which tend to have a lot of overhead and take longer to process:

  • Aggregations on large numbers of results
  • Geospatial sorting on large numbers of results
  • Highlighting
  • Custom scripting
  • Wildcard matching

If you have requests that perform a lot of processing, then finding ways to optimize them with filter caches and better scoping can improve throughput quite a bit.

Reduce Traffic Volume

Sometimes applications are written without considering scalability or impact on Elasticsearch. These often result in far more queries being made to the cluster than is really necessary. Some minor changes are all that is needed to reduce the volume and avoid HTTP 429 responses in a way that still makes the application usable.

A common example is autocomplete / typeahead scripts. Often the frontend is sending a query to the Elasticsearch cluster each time a user presses a key. The average computer user types around 3-4 characters per second, and (depending on the query and typist speed) a search launched by one keypress may not even be returned before the next keypress is made. This results in a piling up of requests. More users searching the app exacerbate the problem. Initiating a search every ~500 milliseconds instead of every keypress will be much more scalable without impacting the user experience.

Another example might be a site that boosts relevancy scores by page view. Whenever a page is requested, the application updates the corresponding document in Elasticsearch by incrementing a counter for the number of times the page has been viewed.

This strategy will boost a document’s position in the results in real time, but it also means the cluster is updated every time a user visits any page, potentially resulting in a high volume of expensive single document updates. It would be better to write the page counts to the local database and use a queue and worker system (like Resque) to push out updated page counts using the Bulk API every minute or so. This would be a much cheaper and more scalable approach, and would be just as effective.