
Reducing Document Usage

Bonsai meters on the number of documents in an index.
Last updated: July 7, 2023

Bonsai meters on the number of documents in an index. There are several types of “documents” in Elasticsearch, but Bonsai counts only the live Lucene documents in primary shards toward the document limit. Documents that have been marked as deleted but have not yet been merged out of the segment files do not count toward the limit.

This guide explains how document overages occur and covers a few ways of reducing document usage:

  1. How Document Overages Occur
  2. Remove Old Data
  3. Remove an Index
  4. Compact Your Mappings
  5. Upgrade the Subscription

How Document Overages Occur

Users frequently report having "only" a few hundred documents while the dashboard registers several thousand. This is due to how Elasticsearch counts nested documents. Bonsai uses the Index Stats API to determine how many documents are in a cluster, and that API counts every nested document alongside its parent. In other words, a document with 2 nested documents is reported as 3 total documents.

Elasticsearch has several articles on how nested documents work, but the simplest answer is that it creates the illusion of a complex object by quietly creating multiple hidden documents.

A common point of confusion is that the /_cat/indices endpoint shows one document count, while the /_stats endpoint shows a much larger one. This is because the Cat API counts only the “visible” documents, while the Index Stats API counts all documents, including the hidden nested ones. The /_stats endpoint is a more accurate representation of a cluster’s document usage, and is the fairest to all users for metering purposes.
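To see the gap on a particular cluster, the two responses can be compared side by side. The following is a minimal offline sketch: it takes already-parsed JSON from `GET /_cat/indices?format=json` and `GET /_stats` and reports the difference. The helper names and the sample payloads are hypothetical, and the field names assume the usual Elasticsearch response shape, so check them against the version you run.

```python
def cat_doc_count(cat_rows):
    """Sum the 'docs.count' column from GET /_cat/indices?format=json."""
    # The Cat API returns numbers as strings; rows without a count are skipped.
    return sum(int(row["docs.count"]) for row in cat_rows if row.get("docs.count"))


def stats_doc_count(stats):
    """Read the primary-shard document count from GET /_stats."""
    return stats["_all"]["primaries"]["docs"]["count"]


def hidden_documents(cat_rows, stats):
    """Documents counted by /_stats but not shown by /_cat/indices."""
    return stats_doc_count(stats) - cat_doc_count(cat_rows)


# Illustrative payloads only, not real cluster output.
sample_cat = [{"index": "articles", "docs.count": "200"}]
sample_stats = {"_all": {"primaries": {"docs": {"count": 1200, "deleted": 0}}}}

print(hidden_documents(sample_cat, sample_stats))  # 1000
```

A large positive difference usually points at nested fields as the source of the extra documents.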

Remove Old Data

If your index has a lot of old or “stale” data (documents which rarely show up in searches), then you could simply delete those documents. Deleted documents do not count against your limits.
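One way to remove stale documents in bulk is the Elasticsearch Delete By Query API. The sketch below only builds the request body; it assumes the documents carry a date field, and the field name `created_at` and the 90-day cutoff are placeholders chosen for illustration, not values from this guide.

```python
import json


def stale_documents_query(date_field, older_than):
    """Build a Delete By Query body matching documents older than a cutoff."""
    return {"query": {"range": {date_field: {"lt": older_than}}}}


# 'created_at' and 'now-90d' are assumptions; substitute your own field and cutoff.
body = stale_documents_query("created_at", "now-90d")

# Send as: POST /<your-index>/_delete_by_query
print(json.dumps(body, indent=2))
```

Sending the same body to `POST /<your-index>/_count` first is a cheap way to preview how many documents would be removed.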

Remove an Index

Occasionally users index time series data or database tables that the application never actually searches. Audit usage by using the Interactive Console to check the /_cat/indices endpoint. If you find old or unnecessary indices with data, delete them.
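When a cluster holds many indices, sorting the audit output by document count makes the biggest candidates easy to spot. This is a small sketch that works on the parsed output of `GET /_cat/indices?format=json`; the function name and the sample rows are hypothetical.

```python
def largest_indices(cat_rows, limit=5):
    """Return (index, docs) pairs sorted by document count, largest first."""
    counts = [
        # Counts arrive as strings; a missing or empty count is treated as 0.
        (row["index"], int(row.get("docs.count") or 0))
        for row in cat_rows
    ]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)[:limit]


# Illustrative rows only, not real cluster output.
sample_rows = [
    {"index": "logs-2021.01", "docs.count": "52000"},
    {"index": "products", "docs.count": "1800"},
    {"index": "logs-2021.02", "docs.count": "47000"},
]

for name, docs in largest_indices(sample_rows):
    print(f"{name}: {docs}")
```

An index that turns out to be unnecessary can then be removed with `DELETE /<index-name>`.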

You may also want to check out the Index Trimmer.

Compact Your Mappings

Changing your mappings to nest less information can greatly reduce your document usage. Consider this sample document:

<div class="code-snippet w-richtext">
<pre><code fs-codehighlight-element="code" class="hljs language-javascript">{
  "title": "Spiderman saves child from well",
  "body": "Move over, Lassie! New York has a new hero. But is he also a menace?",
  "authors": [
    {
      "name": "Jonah Jameson",
      "title": "Sr. Editor"
    },
    {
      "name": "Peter Parker",
      "title": "Photos"
    }
  ],
  "comments": [
    {
      "username": "captain_usa",
      "comment": "I understood that reference!"
    },
    {
      "username": "man_of_iron",
      "comment": "Congrats on being slightly more useful than a ladder."
    }
  ],
  "photos": [
    {
      "url": "https://assets.dailybugle.com/12345",
      "caption": "Spiderman delivering Timmy back to his mother"
    }
  ]
}</code></pre>
</div>

Note that it nests data for authors, comments and photos. If all three fields are mapped as nested, indexing this one document actually creates 6 documents: the article itself, 2 authors, 2 comments and 1 photo. Removing the comments and photos (which usually don’t need to be indexed anyway) would leave 3 documents, reducing the footprint by 50%.
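The arithmetic can be checked with a short sketch. It assumes a single level of nesting and that each listed field is mapped as nested; the function name is hypothetical and the article is abbreviated from the sample above.

```python
def lucene_document_count(document, nested_fields):
    """Count the root document plus one hidden document per nested object."""
    total = 1  # the root document itself
    for field in nested_fields:
        value = document.get(field, [])
        # A list contributes one document per item; a single object contributes one.
        total += len(value) if isinstance(value, list) else 1
    return total


article = {
    "title": "Spiderman saves child from well",
    "authors": [{"name": "Jonah Jameson"}, {"name": "Peter Parker"}],
    "comments": [{"username": "captain_usa"}, {"username": "man_of_iron"}],
    "photos": [{"url": "https://assets.dailybugle.com/12345"}],
}

full = lucene_document_count(article, ["authors", "comments", "photos"])
trimmed = lucene_document_count(article, ["authors"])
print(full, trimmed)  # 6 3
```

Multiplying the per-document count by the number of articles gives a rough estimate of what the dashboard will report.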

If you’re using nested objects, review whether any of the nested information could stand to be left out, and then reindex with a smaller mapping.
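A reindex into a smaller mapping might look roughly like the sketch below. It builds two request bodies: a mapping that keeps only `authors` as a nested field, and a Reindex API body that copies only the fields being kept. The index names `articles_v1` and `articles_v2` are hypothetical, the field types are guesses for the sample document, and the mapping uses the typeless format of Elasticsearch 7 and later, so adjust it for older versions.

```python
import json


def compact_mapping():
    """Mapping for the destination index; only 'authors' stays nested."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "body": {"type": "text"},
                "authors": {
                    "type": "nested",
                    "properties": {
                        "name": {"type": "text"},
                        "title": {"type": "keyword"},
                    },
                },
            }
        }
    }


def compact_reindex_body(source_index, dest_index, keep_fields):
    """Reindex body that copies only the listed fields from each document."""
    return {
        "source": {"index": source_index, "_source": keep_fields},
        "dest": {"index": dest_index},
    }


# Hypothetical index names; comments and photos are deliberately left out.
reindex_body = compact_reindex_body(
    "articles_v1", "articles_v2", ["title", "body", "authors"]
)

# 1. Create the destination: PUT /articles_v2 with compact_mapping()
# 2. Copy the data:          POST /_reindex with reindex_body
print(json.dumps(compact_mapping(), indent=2))
print(json.dumps(reindex_body, indent=2))
```

Creating the destination index with its mapping before reindexing matters here: otherwise Elasticsearch infers field types dynamically and `authors` would not be mapped as nested.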

Upgrade the Subscription

If you find that you’re unable to reduce your document count through the options discussed here, then you will need to upgrade to the next plan.

Upgrading Direct Bonsai cluster

Upgrading a Heroku Bonsai cluster
