
Reducing Shard Usage


A shard is the basic unit of work in Elasticsearch. If you haven’t read the Shards section in Elasticsearch Core Concepts, that would be a good place to start. Bonsai meters on the total number of shards in a cluster. That means both primary and replica shards count towards the limit.

The relationship between an index’s sharding scheme and your cluster’s shard usage is not always obvious. For example, an index with a 3x2 sharding scheme (3 primaries, 2 replicas) uses not 5 or 6 shards, but 9: the 3 primaries plus 3 × 2 = 6 replicas. If this is confusing, read our Shard Primer documentation for some nice illustrations.

Managing shards is a basic skill for any Elasticsearch user. Shards carry system overhead and potentially stale data. Keeping your cluster clean by pruning old shards can both improve performance and reduce your server costs.

In this guide we cover a few different ways of reducing shard usage:

  1. What is a Shard Overage?
  2. Delete Unneeded Indices
  3. Use a Different Sharding Scheme
  4. Reduce replication
  5. Data Collocation
  6. Upgrade the Subscription

This guide will make frequent references to the Elasticsearch API using the command line tool `curl`. Interacting with your own cluster can also be done via `curl`, or via a web browser. You can also use the interactive Bonsai Console in your cluster dashboard.
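For example, checking your indices from a terminal might look like the sketch below. The host and credentials are placeholders, not real values; substitute the full URL shown in your own cluster dashboard:

```sh
# Placeholder URL -- substitute your own cluster's full URL, including credentials:
export BONSAI_URL="https://key:secret@my-cluster-1234567890.us-east-1.bonsaisearch.net"

# List all indices, with column headers:
curl -s "$BONSAI_URL/_cat/indices?v"
```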

What is a Shard Overage?

A shard overage occurs when your cluster has more shards than the subscription allows. In practice, this usually means one of three things:

  1. Extraneous indices
  2. A suboptimal sharding scheme
  3. Replication set too high

It is also possible that the cluster and its data are already configured according to best practices. In that case, you may need to get creative with aliases and data collocation in order to remain on your current subscription.

Delete Unneeded Indices

There are some cases where one or more indices are created on a cluster for testing purposes, and are not actually being used for anything. These will count towards the shard limits; if you’re getting overage notifications, then you should delete these indices.

There are also some clients that will use aliases to roll out changes to mappings. This is a really nice feature that allows for zero-downtime reindexing and immediate rollbacks if there’s a problem, but it can also result in lots of stale indices and data accumulating in the cluster.

To determine if you have extraneous indices, use the `/_cat/indices` endpoint to get a list of all the indices you have in your cluster:

<div class="code-snippet w-richtext">
<pre><code fs-codehighlight-element="code" class="hljs language-javascript">&lt;script> console.log('hello'); &lt;/script>GET /_cat/indices
green open test1           1 1 0 0  318b  159b
green open test2           1 1 0 0  318b  159b
green open test3           1 1 0 0  318b  159b
green open test4           1 1 0 0  318b  159b
green open test5           1 1 0 0  318b  159b
green open test6           1 1 0 0  318b  159b</code></pre>
</div>

If you see indices that you don’t need, you can simply delete them:

<div class="code-snippet-container">
<a fs-copyclip-element="click-2" href="#" class="btn w-button code-copy-button" title="Copy">
<img class="copy-image" src="https://global-uploads.webflow.com/63c81e4decde60c281417feb/6483934eeefb356710a1d2e9_icon-copy.svg" loading="lazy" alt="">
<img class="copied-image" src="https://assets-global.website-files.com/63c81e4decde60c281417feb/64839e207c2860eb9e6aa572_icon-copied.svg" loading="lazy" alt="">
</a>
<div class="code-snippet">
<pre><code fs-codehighlight-element="code" fs-copyclip-element="copy-this-2" class="hljs language-javascript"># Delete a single index:
DELETE /test1

# Delete a group of indices:
DELETE /test2,test3,test4,test5</code></pre>
</div>
</div>
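The same deletes can be issued from the command line with `curl`. This is just a sketch, reusing the placeholder `$BONSAI_URL` variable from earlier:

```sh
# Delete a single index:
curl -s -XDELETE "$BONSAI_URL/test1"

# Delete a group of indices in one request:
curl -s -XDELETE "$BONSAI_URL/test2,test3,test4,test5"
```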

Use a Different Sharding Scheme

It’s possible that for some reason one or more indices were created with far more shards than necessary. For example, a check of `/_cat/indices` shows something like this:

<div class="code-snippet w-richtext">
<pre><code fs-codehighlight-element="code" class="hljs language-javascript">GET /_cat/indices
green open test1           5 2 0 0  318b  159b
green open test2           5 2 0 0  318b  159b</code></pre>
</div>

That 5x2 shard scheme results in 15 shards per index, or 30 total for this cluster. This is probably really overprovisioned for these indices. Choosing a more conservative shard scheme of 1x1 would reduce this cluster’s usage from 30 shards down to 4.

Unfortunately, the number of primary shards cannot be changed once an index has been created. To fix this, you will need to manually create a new index with the desired shard scheme and reindex the data into it. If you have not read The Ideal Elasticsearch Index, it has some really nice information on capacity planning and sizing. Check out the sections on Intelligent Sharding and Benchmarking for some tips on what scheme would make more sense for your particular use case.
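As a rough sketch of that process with `curl` (the index names, the `_v2` suffix, and the 1x1 scheme are illustrative; choose whatever scheme your benchmarking suggests):

```sh
# 1. Create a new index with the desired shard scheme (here, 1 primary and 1 replica):
curl -s -XPUT "$BONSAI_URL/test1_v2" \
  -H 'Content-Type: application/json' \
  -d '{"settings":{"index":{"number_of_shards":1,"number_of_replicas":1}}}'

# 2. Copy the documents from the old index into the new one:
curl -s -XPOST "$BONSAI_URL/_reindex" \
  -H 'Content-Type: application/json' \
  -d '{"source":{"index":"test1"},"dest":{"index":"test1_v2"}}'

# 3. After verifying the new index, delete the old one to free its shards:
curl -s -XDELETE "$BONSAI_URL/test1"
```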

Reduce replication

Multitenant subscription plans (Hobby and Standard tiers) are required to have at least 1 replica shard in use per index. Indices on these plans cannot have a replica count of 0.

For most use-cases, a single replica is perfectly sufficient for redundancy and load capacity. If any of your indices have been created with more than one replica, you can reduce it to free up shards. An index with more than one replica might look like this:

<div class="code-snippet w-richtext">
<pre><code fs-codehighlight-element="code" class="hljs language-javascript">&lt;script> console.log('hello'); &lt;/script>GET /_cat/indices?v
health status index    pri rep docs.count docs.deleted store.size pri.store.size
green  open   test1    5   2   0          0            318b       159b
green  open   test2    5   2   0          0            318b       159b</code></pre>
</div>

Note that the `rep` column shows a 2. That means there are actually three copies of the data: one primary copy and two replicas. Replicas are a multiplier against primary shards, so if an index has a 5x2 configuration (5 primary shards with 2 replicas), reducing replication to 1 will free up five shards, not just one. See the Shard Primer for more details.

Fortunately, reducing the replica count for an index only requires a small JSON body sent to the `_settings` endpoint:

<div class="code-snippet-container">
<a fs-copyclip-element="click-3" href="#" class="btn w-button code-copy-button" title="Copy">
<img class="copy-image" src="https://global-uploads.webflow.com/63c81e4decde60c281417feb/6483934eeefb356710a1d2e9_icon-copy.svg" loading="lazy" alt="">
<img class="copied-image" src="https://assets-global.website-files.com/63c81e4decde60c281417feb/64839e207c2860eb9e6aa572_icon-copied.svg" loading="lazy" alt="">
</a>
<div class="code-snippet">
<pre><code fs-codehighlight-element="code" fs-copyclip-element="copy-this-3" class="hljs language-javascript">PUT /test1,test2/_settings -d '{"index":{"number_of_replicas":1}}'
{"acknowledged":true}

GET /_cat/indices
green open test1           5 1 0 0  318b  159b
green open test2           5 1 0 0  318b  159b</code></pre>
</div>
</div>

That simple request shaved 10 shards off of this cluster’s usage.
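The same change can be made from the command line with `curl`, again using the placeholder `$BONSAI_URL` from earlier:

```sh
curl -s -XPUT "$BONSAI_URL/test1,test2/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index":{"number_of_replicas":1}}'
```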

Replication == Availability and Redundancy

It might seem like a good money-saving idea to simply set all replicas to 0 so as to fit as many indices into your cluster as possible. However, this is not advisable. This means that your primary data has no live backup. If a node in your cluster goes offline, data loss is basically guaranteed.

Data can be restored from a snapshot, but this is messy and not a great failover plan. The snapshot could be an hour or more old, and any updates to your data since then either need to be reindexed or are lost for good. Additionally, the outage will last much longer.

Keeping replication at 1 or higher mitigates all of these problems.

Data Collocation

Another solution for reducing shard usage involves using aliases and custom routing rules to collocate different data models onto the same group of shards.

What is data collocation? Many Elasticsearch clients use an index per model paradigm as the default for data organization. This is analogous to, say, a Postgres database with a table for each type of data being indexed.

Sharding this way makes sense most of the time, but in some rare cases users may benefit from putting all of the data into a single namespace. In the Postgres analogy, this would be like putting all of the data into a single table instead of a table for each model. An attribute (i.e., table column) is then used to filter searches by class.

For example, you might have a cluster that has three indices: `videos`, `images` and `notes`. If each of these has a conservative 1x1 sharding scheme, it would require 6 shards. But this data could potentially be compacted down into a single index, `production`, where the mapping has a `type` field of some kind to indicate whether the document belongs to the `video`, `image` or `note` class.

The latter configuration with the same 1x1 scheme would only require two shards (one primary, one replica) instead of six.
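As a minimal sketch of what that could look like with `curl`, assuming Elasticsearch 7.x or later (typeless mappings), that the old per-model indices have already been reindexed into the new index and deleted, and that the names `production` and `model_type` are illustrative:

```sh
# Create a single collocated index (1 primary, 1 replica) with a keyword field
# recording which model each document belongs to:
curl -s -XPUT "$BONSAI_URL/production" \
  -H 'Content-Type: application/json' \
  -d '{
        "settings": {"index": {"number_of_shards": 1, "number_of_replicas": 1}},
        "mappings": {"properties": {"model_type": {"type": "keyword"}}}
      }'

# Optionally, add filtered aliases so searches against "videos", "images" and
# "notes" only see documents of the matching type. (An alias cannot share a
# name with an existing index, so the old indices must already be gone.)
curl -s -XPOST "$BONSAI_URL/_aliases" \
  -H 'Content-Type: application/json' \
  -d '{
        "actions": [
          {"add": {"index": "production", "alias": "videos", "filter": {"term": {"model_type": "video"}}}},
          {"add": {"index": "production", "alias": "images", "filter": {"term": {"model_type": "image"}}}},
          {"add": {"index": "production", "alias": "notes",  "filter": {"term": {"model_type": "note"}}}}
        ]
      }'
```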

There are several major downsides to this approach. One is that field name collisions become an issue. For example, if there is a field called `published` in two models, but one is defined as a boolean and the other is a datetime, it will create a conflict in the mapping. One will need to be renamed.
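As a hypothetical illustration (the `published_at` name is just one possible rename), a single collocated mapping can hold only one definition per field name:

```sh
# A single mapping cannot define "published" as both a boolean and a date,
# so one model's field is renamed (here, to "published_at"):
curl -s -XPUT "$BONSAI_URL/production/_mapping" \
  -H 'Content-Type: application/json' \
  -d '{
        "properties": {
          "published":    {"type": "boolean"},
          "published_at": {"type": "date"}
        }
      }'
```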

Another downside is that it is a pretty large refactor for most users, and may be more trouble than simply upgrading the plan. Overriding the default behavior in the application’s Elasticsearch client may require forking the code and relying on other hacks/workarounds.

There are other downsides as well. Data collocation is mentioned here as a possibility, and one that only works for certain users. It is by no means a recommendation.

Upgrade the Subscription

If you find that you’re unable to reduce shards through the options discussed here, then you will need to upgrade to the next plan.
