Metering on Bonsai

The term “metering” in Bonsai refers to the limits imposed on the resources allocated to a cluster. A cluster’s limits are determined by its subscription level; higher-priced plans yield higher limits. There are several metered resources that can result in an overage. This document explains what those resources are and how to resolve related overages.

Additionally, Bonsai meters on Concurrent Connections. Exceeding the connection limit will not result in an overage per se, but rather in HTTP 429 responses.

Checking on Cluster Status

All Bonsai clusters have a dashboard that indicates the cluster’s resource usage.

If your cluster is over its subscription limits, the overage will be flagged in red on the cluster dashboard.

Bonsai uses “soft limits” for metering. This approach does not immediately penalize or disable clusters that exceed the limits of their subscription. It is a gentler way of treating users who may not even be aware that they are over their limits, or why that might be an issue.

When an overage is detected, it triggers a state machine that follows a specially-designed process of increasingly firm actions. This process has several steps:

1. Initial notification (immediately). The cluster’s owner and their team members are notified of the overage and provided with information about how to address it.

2. Second notification (5 days). A reminder is sent, warning that the cluster is about to be put into read-only mode. Clusters in read-only mode receive an HTTP 403: Cluster Read Only error on write requests.

3. Read-only mode (10 days). The cluster is put into read-only mode. Updates will fail with a 403 error.

4. Disabled (15 days). All access to the cluster is disabled. Both searches and updates will fail with an HTTP 403: Cluster Disabled error.

Extreme Overages Skip the Process

Particularly aggressive overages may be disabled immediately. Bonsai uses a fairly generous algorithm to determine whether an overage is severe enough for immediate cut-off. This step is uncommon, but it is a definite possibility.

Stale Data May Be Lost

Free clusters that have been disabled for a period of time may be purged to free up resources for other users.

The fastest way to deal with an overage is to simply upgrade your cluster’s plan. Upgrades take effect instantly and will unlock any cluster set to read-only or disabled.

If upgrading is not possible for some reason, then the next best option is to address the issue directly. This document contains information about how to address an overage in each metered resource.

Give it a minute!

The resource usage indicators are not real-time displays; cluster stats are calculated every 10 minutes or so. If you address an overage and the change isn’t immediately reflected on the display, don’t worry: the changes will be detected, the dashboard will be updated, and any sanction in place on the cluster will be lifted automatically. Please wait 5-10 minutes before emailing support.


Concurrent Connections

Concurrent connections are simultaneous active connections to a cluster. Bonsai meters on concurrency as one level of defense against Denial of Service (DoS) and noisy neighbor situations, and to help ensure users are not taking more than their fair share of resources.

Bonsai distinguishes between types of connections for concurrency metering purposes:

  • Searches. Search requests. The concurrency allowance for search traffic is generally much higher than for updates and bulk.
  • Updates. Adding, modifying or deleting a given document. Generally has the lowest concurrency allowance due to the high overhead of single document updates.
  • Bulk. Adding, modifying or deleting a batch of documents using the Bulk API. Bulk requests are the preferred way to change data for efficiency reasons.

Bonsai also uses connection pools to account for short-lived bursts of traffic. If all available connections are in use, subsequent connections are put into a FIFO queue and served when a connection becomes available. If all connections are in use and the queue is full, Bonsai will return an immediate HTTP 429.
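The pool-plus-queue behavior described above can be modeled with a short sketch. This is a simplified toy model of the mechanism, not Bonsai’s actual implementation; the class and method names are hypothetical.

```python
from collections import deque

class PoolModel:
    """Toy model of a connection pool with a FIFO overflow queue."""

    def __init__(self, pool_size, queue_size):
        self.pool_size = pool_size
        self.queue_size = queue_size
        self.queue = deque()
        self.active = 0

    def request(self):
        if self.active < self.pool_size:
            self.active += 1
            return "accepted"          # served immediately
        if len(self.queue) < self.queue_size:
            self.queue.append("waiting")
            return "queued"            # held until a connection frees up
        return "429"                   # pool and queue both exhausted

    def release(self):
        if self.queue:
            self.queue.popleft()       # next queued request takes the slot
        else:
            self.active -= 1
```

The key takeaway: a burst larger than the pool is absorbed by the queue, and only once both are full does a request fail fast with a 429.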

Concurrency is NOT Throughput!

Concurrency allowances are a limit on total active connections, not throughput (requests per second). A cluster with a search concurrency allowance of 10 connections can easily serve hundreds of requests per second. A typical p50 query time on Bonsai is ~10ms. At this latency, a cluster could serve a sustained 100 requests per second, per connection. That would give the hypothetical cluster a maximum throughput of around 1,000 requests per second before the queue would even be needed. In practice, throughput may be closer to 500 rps (to account for bursts, network effects and longer-running requests).
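The arithmetic above can be made explicit: throughput is bounded by concurrency divided by per-request latency, not by concurrency alone.

```python
def max_throughput(connections, latency_seconds):
    """Requests per second a fixed pool of connections can sustain,
    assuming each request occupies a connection for latency_seconds."""
    return connections / latency_seconds
```

With 10 connections and ~10ms queries, `max_throughput(10, 0.010)` gives the 1,000 requests per second ceiling mentioned above.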

To put these numbers in perspective, a sustained rate of 500 rps is over 43M requests per day. StackExchange – the parent of sites like StackOverflow, ServerFault and SuperUser (and many more) – performs around 34M Elasticsearch queries per day. By the time your application’s demands exceed StackExchange’s by 25%, you will likely already be on a single-tenant configuration with no concurrency limits.

Queued Connections Have a 60s TTL

If the connection queue begins to fill up with connection requests, those requests will only be held for up to 60 seconds. After that time, Bonsai will return an HTTP 504 response. If you are seeing HTTP 429 and 504 errors in your logs, that is an indicator that your cluster has a high volume of long-running queries.
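On the client side, a sensible way to cope with occasional 429 and 504 responses is to retry with exponential backoff. Here is a hedged sketch; `send` stands in for whatever hypothetical function issues the request and returns an HTTP status code.

```python
import time

def send_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry a request on HTTP 429/504, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        status = send()
        if status not in (429, 504):
            return status
        sleep(base_delay * (2 ** attempt))  # back off before retrying
    return status
```

Backing off gives queued connections time to drain rather than piling more requests onto an already saturated pool.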

Fixing Concurrency Issues

An HTTP 429 error indicates that traffic to the cluster is not being cleared fast enough given the concurrency allowance. This leaves three possibilities for a resolution:

  1. Upgrade to a plan with a higher concurrency allowance
  2. Reduce the request overhead
  3. Reduce the traffic rate

If reading over these suggestions doesn’t provide enough ideas for resolving the issue, you can always email our support team to discuss options.

1. Upgrade

Concurrency allowances on Bonsai scale with the plan level. Often the fastest way to resolve concurrency issues is to simply upgrade the plan. Upgrades take effect instantly.

2. Reduce Overhead

The more time a request takes to process, the longer the connection remains open and active. Too many of these requests in a given amount of time will exhaust the connection pool and request queue, resulting in HTTP 429 responses. Reducing the amount of time a request takes to process frees up capacity for more traffic.

Some examples of requests which tend to have a lot of overhead and take longer to process:

  • Aggregations on large numbers of results
  • Geospatial sorting on large numbers of results
  • Highlighting
  • Custom scripting
  • Wildcard matching

If you have requests that perform a lot of processing, then finding ways to optimize them with filter caches and better scoping can improve throughput quite a bit.

3. Reduce Traffic Volume

Sometimes applications are written without considering scalability or their impact on Elasticsearch, resulting in far more queries being made to the cluster than is really necessary. Some minor changes are often all that is needed to reduce the volume and avoid HTTP 429 responses while keeping the application usable.

A common example is autocomplete / typeahead scripts. Often the frontend is sending a query to the Elasticsearch cluster each time a user presses a key. The average computer user types around 3-4 characters per second, and (depending on the query and typist speed) a search launched by one keypress may not even be returned before the next keypress is made. This results in a piling up of requests. More users searching the app exacerbate the problem. Initiating a search every ~500 milliseconds instead of every keypress will be much more scalable without impacting the user experience.
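The keypress-throttling idea above can be sketched as follows. In practice a frontend would implement this in JavaScript, but the logic is the same; the class name and the injected `clock` are illustrative, not from any particular library.

```python
import time

class Throttle:
    """Allow at most one search per interval; extra calls are dropped."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self._last = float("-inf")

    def allow(self):
        """Return True if enough time has passed to fire another search."""
        now = self.clock()
        if now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

With a 0.5-second interval, a user typing 3-4 characters per second triggers at most 2 searches per second instead of 3-4.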

Another example might be a site that boosts relevancy scores by page view. Whenever a page is requested, the application updates the corresponding document in Elasticsearch by incrementing a counter for the number of times the page has been viewed.

This strategy will boost a document’s position in the results in real time, but it also means the cluster is updated every time a user visits any page, potentially resulting in a high volume of expensive single document updates. It would be better to write the page counts to the local database and use a queue and worker system (like Resque) to push out updated page counts using the Bulk API every minute or so. This would be a much cheaper and more scalable approach, and would be just as effective.
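The queue-and-flush approach above might build its Bulk API body like this. This is a sketch: the index name `pages`, the `views` field, and the document IDs are all hypothetical, and the script-based increment is one of several valid Bulk API update forms.

```python
import json

def bulk_view_counts(index, counts):
    """Build a Bulk API body that increments a views counter per document.

    counts: dict mapping document ID -> number of new page views."""
    lines = []
    for doc_id, views in sorted(counts.items()):
        # action line, then the update payload for that action
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"script": {
            "source": "ctx._source.views += params.n",
            "params": {"n": views},
        }}))
    # Bulk API bodies are newline-delimited JSON, ending with a newline
    return "\n".join(lines) + "\n"
```

A worker would POST this body to `/_bulk` every minute or so, turning potentially thousands of single-document updates into one request.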


Shards

A shard is the basic unit of work in Elasticsearch. If you haven’t read the Core Concept on Shards, that would be a good place to start. Bonsai meters on the total number of shards in a cluster. That means both primary and replica shards count towards the limit.

The relationship between your sharding scheme and your cluster’s usage is not always readily apparent. For example, an index with a 3x2 sharding scheme (3 primaries, 2 replicas) has 9 shards, not 5 or 6: each replica is a full copy of all 3 primaries. If this is confusing, read our Shards and Replicas documentation for some nice illustrations.
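The shard math works out like this:

```python
def total_shards(primaries, replicas):
    """Total shards for one index: each replica duplicates every primary."""
    return primaries * (1 + replicas)
```

So a 3x2 scheme yields `total_shards(3, 2)` = 9 shards, all of which count against the metered limit.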

If you have a shard overage, that can mean one of three things:

  1. Extraneous indices.
  2. A suboptimal sharding scheme.
  3. Replication set too high.

It is also possible that the cluster and its data are already configured according to best practices. In that case, you may need to get creative with aliases and data collocation in order to remain on your current subscription. The Reducing Shard Usage document has information about all of these possibilities.

If you find that you’re unable to reduce shards through the options discussed on the Reducing Shard Usage page, then you will need to upgrade to the next plan.


Documents

Bonsai meters on the number of documents in a cluster. There are several types of “documents” in Elasticsearch, but Bonsai only counts the live Lucene documents in primary shards towards the document limit. Documents that have been marked as deleted, but have not yet been merged out of the segment files, do not count towards the limit.

The phrase “live Lucene documents” may be a little confusing, but this is due to how Elasticsearch counts nested documents.

The Index Stats API is used to determine how many documents are in a cluster, and it counts nested documents by including all associated hidden documents. In other words, a document with 2 nested documents is reported as 3 total documents.

Elasticsearch has several different articles on how nested documents work, but the simplest answer is that Elasticsearch creates the illusion of a complex object by quietly creating multiple hidden documents.
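The counting rule is simple enough to write down:

```python
def lucene_doc_count(nested_objects_per_field):
    """Lucene documents stored for one indexed document.

    nested_objects_per_field: number of nested objects in each nested field.
    Each nested object is a hidden Lucene document, plus one for the parent."""
    return 1 + sum(nested_objects_per_field)
```

A document with one nested field containing 2 objects is stored as `lucene_doc_count([2])` = 3 Lucene documents, which is what the Index Stats API reports.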

How Do I Have a Document Overage? I’m Way Under the Limit!

A common point of confusion is that the /_cat/indices endpoint will show one set of documents, while the /_stats endpoint shows a much larger count. This is because the Cat API is counting the “visible” documents, while the Index Stats API is counting all documents. The _stats endpoint is a truer representation of a cluster’s document usage, and is the most fair to all users for metering purposes.

There are several strategies for resolving a document overage:

Remove Old Data

If your index has a lot of old data, or “stale” data (documents which rarely show up in searches), then you could simply delete those documents. Deleted documents do not count against your limits.

Remove an Index

Occasionally users are indexing time-series data, or database tables that are not actually being searched by the application. Audit your usage by using the Interactive Console to check the /_cat/indices endpoint. If you find that there are old or unnecessary indices with data, then delete those.

Compact Your Mappings

Changing your mappings to nest less information can greatly reduce your document usage. Consider this sample document:

  {
    "title": "Spiderman saves child from well",
    "body":  "Move over, Lassie! New York has a new hero. But is he also a menace?",
    "authors": [
      { "name": "Jonah Jameson", "title": "Sr. Editor" },
      { "name": "Peter Parker",  "title": "Photos" }
    ],
    "comments": [
      { "username": "captain_usa", "comment": "I understood that reference!" },
      { "username": "man_of_iron", "comment": "Congrats on being slightly more useful than a ladder." }
    ],
    "photos": [
      { "url": "", "caption": "Spiderman delivering Timmy back to his mother" }
    ]
  }

Note that it nests data for authors, comments and photos. The document above would actually be stored as 6 documents: the parent, 2 authors, 2 comments, and 1 photo. Removing the comments and photos (which usually don’t need to be indexed anyway) would reduce the footprint by 50%.

If you’re using nested objects, review whether any of the nested information could stand to be left out, and then reindex with a smaller mapping.

Upgrade the Subscription

If you find that you’re unable to remove any documents or indices, or change your mappings, then you will simply need to upgrade to the next subscription level.


Disk

Bonsai meters on the total amount of disk space a cluster can consume. This is for capacity planning purposes, and to ensure multitenant customers have their fair share of resources. Bonsai calculates a cluster’s disk usage by looking at the total data store size in bytes. This information can be found in the Index Stats API.

Disk overages can be resolved in a couple of different ways:

Remove Stale Data / Indices

There are some cases where one or more indices are created on a cluster for testing purposes, and are not actually being used for anything. These will count towards the data limits; if you’re getting overage notifications, then you should delete these indices.

GET /_cat/indices
green open prod20180101    1 1 1015123 0  32M  64M
green open prod20180201    1 1 1016456 0  35M  70M
green open prod20180301    1 1 1017123 0  39M  78M
green open prod20180401    1 1 1018456 0  45M  90M
green open prod20180501    1 1 1019123 0  47M  94M
green open prod20180601    1 1 1020456 0  51M  102M

Removing the old and unneeded indices in the example above would free up 396MB. A single command could do it:

# Delete a group of indices:
DELETE /prod20180101,prod20180201,prod20180301,prod20180401,prod20180501

Purge Deleted Documents

Data in Elasticsearch is spread across lots of files called segments. Segments each contain some number of documents. An index could have dozens, hundreds or even thousands of segment files, and Elasticsearch will periodically merge some segment files into others.

When a document is deleted in Elasticsearch, its segment file is simply updated to mark the document as deleted. The data is not actually removed until that segment file is merged with another. Elasticsearch normally handles segment merging automatically, but forcing a merge will reduce the overall disk footprint of the cluster by eliminating deleted documents.

Normally this is done through the Optimize / Forcemerge API, but that is unfortunately one of Bonsai’s Unsupported API Endpoints. The same effect can be accomplished, however, by simply reindexing. Reindexing refreshes the data, leaving no deleted documents behind, which reduces disk usage.

To check whether this will work for you, look at the /_cat/indices data. There is a column called docs.deleted, which shows how many documents are sitting on the disk and are marked as deleted. This should give a sense of how much data could be freed up by reindexing. For example:

health status index     pri rep docs.count docs.deleted store.size pri.store.size
green  open   my_index  3   2   15678948   6895795      47.1G      15.7G

In this case, docs.deleted is around 30% of the documents, or around 4.8G of the 15.7G primary store. With replication (two replicas, so three full copies), this works out to roughly 14.4G of total disk marked for deletion. Reindexing would reduce the cluster’s disk footprint by this much. The result would look like this:

health status index     pri rep docs.count docs.deleted store.size pri.store.size
green  open   my_index  3   2   15678948   0            32.7G      10.9G
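The savings arithmetic above can be sketched as a small helper. This is a back-of-the-envelope estimate: it assumes deleted documents occupy disk roughly in proportion to their share of total documents, which is not exact in practice.

```python
def reclaimable_disk(docs_count, docs_deleted, primary_store_gb, replicas):
    """Estimated disk (GB) freed by purging deleted docs via reindexing."""
    # fraction of all Lucene docs on disk that are tombstones
    deleted_fraction = docs_deleted / (docs_count + docs_deleted)
    primary_savings = primary_store_gb * deleted_fraction
    # every replica carries a full copy of the same tombstones
    return primary_savings * (1 + replicas)
```

Plugging in the figures from the table (15,678,948 live docs, 6,895,795 deleted, 15.7G primary store, 2 replicas) reproduces the ~14.4G estimate.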

Protip: Queue Writes and Reindex in the Background for Minimal Impact

Your app’s search could be down or degraded during a reindex. If reindexing will take a long time, that may make this option infeasible. However, you can minimize the impact by using something like Kafka to queue writes while reindexing to a new index.

Search traffic can continue to be served from the old index until its replacement is ready. Flush the queued updates from Kafka into the new index, then destroy the old index and use an alias to promote the new index.
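The promotion step can be done atomically with the `_aliases` API, which applies its actions as a single operation. A sketch of the request body (the index and alias names here are hypothetical):

```python
import json

def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases: repoint an alias to the new index atomically."""
    return json.dumps({"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add":    {"index": new_index, "alias": alias}},
    ]})
```

Because both actions land in one request, there is no window where the alias points at no index; the old index can be deleted afterwards.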

The tradeoff of this solution is that you minimize the impact on your traffic and users, but you need to set up and manage the queue software. You will also have a lot of duplicated data for a short period, so your footprint could be well above the subscription limit temporarily.

To prevent the state machine from disabling your cluster, you might want to consider temporarily upgrading to perform the operation, then downgrading when you’re done. Billing is prorated, so this would not add much to your invoice. You can always email us to discuss options before settling on a decision.

Reindex with Smaller Mappings

Mappings define how data is stored and indexed in an Elasticsearch cluster. Some settings can cause the disk footprint to grow far faster than the raw data would suggest.

For example, synonym expansion can lead to lots of extra tokens being generated per input token (if you’re using WordNet, see our documentation article on it, specifically Why Wouldn’t Everyone Want WordNet?). If you’re using lots of index-time synonym expansion, then you’re essentially inflating the document sizes with lots of data, with the tradeoff (hopefully) being improved relevancy.

Another example would be Ngrams. Ngrams are tokens generated from parts of other tokens. A token like “hello” can be broken into 2-grams: “he”, “el”, “ll”, and “lo”. Its 3-grams are “hel”, “ell”, and “llo”. And so on. The Elasticsearch Guide has more examples.

It’s possible to generate multiple gram sizes for a single field, starting with sizes as low as 1. Some developers use this to maximize substring matching, but the number of grams generated for a single token grows quadratically with the token’s length.

For a token of length n and a minimum gram size of 1, this relationship is:

  grams = n * (n + 1) / 2

In other words, a token with a length of 5 would result in (1/2) * 5 * (5 + 1) = 15 grams, and a token with a length of 10 would result in 55 grams. The grams are generated for every token, which leads to an explosion in the number of terms for a document.
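A short generator makes the growth easy to verify:

```python
def ngrams(token, min_gram, max_gram):
    """All substrings of token with lengths min_gram through max_gram."""
    return [token[i:i + size]
            for size in range(min_gram, max_gram + 1)
            for i in range(len(token) - size + 1)]
```

For a token of length n with min_gram=1 and max_gram=n, the list has n * (n + 1) / 2 entries: 15 grams for “hello”, 55 for a 10-character token.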

As a sample calculation: if a typical document in your corpus has a field with ~1,000 tokens and a Rayleigh distribution of length with an average of ~5, you could plausibly see something like a 1,100-1,200% inflation in disk footprint using Ngrams of minimum size 1. In other words, if the non-grammed document would need 100KB on disk, the Ngrammed version would need over 1MB. Virtually none of this overhead would improve relevancy, and would probably even hurt it.

Nested documents are another feature that can increase your data footprint without necessarily improving relevancy.

The point is that there are plenty of features available that lead to higher disk usage than one might think at first glance. Check on your mappings carefully: look for large synonym expansions, make sure you’re using Ngrams with a minimum gram size of 3 or more (also look into EdgeNGrams if you’re attempting autocomplete), and see if you can get away with fewer nested objects. Reindex your data with the updated mappings, and you should see a definite improvement.

Upgrade the Subscription

If you find that you’re unable to remove data, reindex, or update your mappings – or that these changes don’t yield a stable resolution – then you will simply need to upgrade to the next subscription level.