Reducing Document Usage
Bonsai meters usage by the number of documents in an index. There are several types of “documents” in Elasticsearch, but Bonsai counts only the live Lucene documents in primary shards toward the document limit. Documents that have been marked as deleted, but have not yet been merged out of the segment files, do not count toward the limit.
In this guide, we cover how document overages occur and a few ways of reducing document usage:
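You can check this figure for your own cluster by requesting `GET /_stats/docs` (for example, from the Interactive Console) and reading the `primaries` section of the response. The sketch below assumes the metered figure corresponds to `_all.primaries.docs.count`. The response excerpt and its numbers are invented for illustration, and `metered_doc_count` and `pending_deletes` are hypothetical helpers, not part of any client library.

```python
import json

# Illustrative excerpt of a GET /_stats/docs response; the numbers are invented.
SAMPLE_STATS = json.loads("""
{
  "_all": {
    "primaries": {"docs": {"count": 3000, "deleted": 250}},
    "total": {"docs": {"count": 6000, "deleted": 500}}
  }
}
""")


def metered_doc_count(stats: dict) -> int:
    """Live Lucene documents in primary shards (assumed to be the metered figure).

    "total" would also include replica copies, and "deleted" holds documents
    that are marked as deleted but not yet merged away; neither is used here.
    """
    return stats["_all"]["primaries"]["docs"]["count"]


def pending_deletes(stats: dict) -> int:
    """Documents marked as deleted in primaries; these do not count toward the limit."""
    return stats["_all"]["primaries"]["docs"]["deleted"]


print(metered_doc_count(SAMPLE_STATS))  # 3000
print(pending_deletes(SAMPLE_STATS))    # 250
```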
- How Document Overages Occur
- Remove Old Data
- Remove an Index
- Compact Your Mappings
- Upgrade the Subscription
How Document Overages Occur
Users frequently report having "only" a few hundred documents, while the dashboard registers several thousand. This is due to how Elasticsearch counts nested documents. The Index Stats API is used to determine how many documents are in a cluster, and it counts nested documents by including all associated documents. In other words, if you have a document with 2 nested documents, it is reported as 3 total documents.
Elasticsearch has several articles on how nested documents work, but the simplest explanation is that it creates the illusion of a complex object by quietly creating multiple hidden documents.
A common point of confusion is that the /_cat/indices endpoint shows one document count, while the /_stats endpoint shows a much larger one. This is because the Cat API counts the “visible” documents, while the Index Stats API counts all documents. The _stats endpoint is a more accurate representation of a cluster’s document usage, and it is the fairest basis for metering all users.
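The counting rule above can be sketched as a small estimator: a source document occupies one Lucene document for itself, plus one hidden document for every object stored under a field mapped as `nested`. This is a minimal sketch assuming you already know which field paths your mapping declares as `nested`; `lucene_doc_count` and the sample `post` are hypothetical and not part of any Elasticsearch client.

```python
from typing import Any, Set


def _hidden_docs(node: Any, path: str, nested_paths: Set[str]) -> int:
    """Count the hidden nested documents inside `node`, which sits at `path`."""
    items = node if isinstance(node, list) else [node]
    total = 0
    for item in items:
        if not isinstance(item, dict):
            continue  # plain values never become separate documents
        if path in nested_paths:
            total += 1  # each object under a nested field is its own Lucene document
        for key, child in item.items():
            child_path = f"{path}.{key}" if path else key
            total += _hidden_docs(child, child_path, nested_paths)
    return total


def lucene_doc_count(doc: dict, nested_paths: Set[str]) -> int:
    """Estimate the Lucene documents one source document occupies: itself plus hidden ones."""
    return 1 + _hidden_docs(doc, "", nested_paths)


# Hypothetical document with 2 objects under a field mapped as "nested".
post = {
    "title": "Hello",
    "comments": [{"text": "First!"}, {"text": "Nice post"}],
}

print(lucene_doc_count(post, nested_paths={"comments"}))  # 3
print(lucene_doc_count(post, nested_paths=set()))         # 1 (plain "object" mapping)
```

The second call shows why the mapping matters: fields mapped as a plain `object` are flattened into the parent document and add nothing to the count.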
Remove Old Data
If your index has a lot of old or “stale” data (documents that rarely show up in searches), you can simply delete those documents. Deleted documents do not count against your limit.
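One way to remove stale documents in bulk is the delete by query API, `POST /<index>/_delete_by_query`, with a `range` query on a date field. The sketch below only builds the request body; the `published_at` field and `articles` index are hypothetical placeholders for your own names. Sending the same body to `GET /<index>/_count` first is a cheap way to preview how many documents would be removed.

```python
import json


def stale_docs_query(date_field: str, older_than_days: int) -> dict:
    """Request body matching documents whose date is older than the cutoff.

    Uses Elasticsearch date math ("now-<N>d/d"), so the cutoff is computed by
    the cluster and rounded down to the start of that day.
    """
    if older_than_days <= 0:
        raise ValueError("older_than_days must be positive")
    return {"query": {"range": {date_field: {"lt": f"now-{older_than_days}d/d"}}}}


# "published_at" and "articles" are hypothetical; use your own field and index.
body = stale_docs_query("published_at", 365)
print("GET /articles/_count")              # preview how many documents match
print("POST /articles/_delete_by_query")   # then delete them
print(json.dumps(body, indent=2))
```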
Remove an Index
Occasionally users index time series data, or database tables that the application never actually searches. Audit usage by checking the /_cat/indices endpoint in the Interactive Console. If you find old or unnecessary indices with data, delete them.
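When a cluster has many indices, it can help to request `GET /_cat/indices?format=json` and sort the result by document count to see which indices are worth reviewing first. This is a minimal sketch: the sample rows, index names, and counts are made up, and `largest_indices` is a hypothetical helper. Note that the Cat API returns numbers as strings, and `docs.count` may be empty for closed indices. Once you have confirmed an index is unused, `DELETE /<index>` removes it.

```python
import json
from typing import Dict, List, Tuple

# Illustrative output of GET /_cat/indices?format=json (names and counts are made up).
CAT_OUTPUT = json.loads("""
[
  {"index": "products",        "docs.count": "1200",  "store.size": "3.1mb"},
  {"index": "logs-2019.01.01", "docs.count": "48000", "store.size": "20mb"},
  {"index": "users_old",       "docs.count": "9500",  "store.size": "4mb"}
]
""")


def largest_indices(cat_rows: List[Dict], top: int = 10) -> List[Tuple[str, int]]:
    """Return (index, docs.count) pairs sorted from most to fewest documents."""
    rows = [(r["index"], int(r.get("docs.count") or 0)) for r in cat_rows]
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:top]


for name, count in largest_indices(CAT_OUTPUT):
    print(f"{name:20} {count:>8}")
```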
Compact Your Mappings
Changing your mappings to nest less information can greatly reduce your document usage. Consider this sample document:
{
  "title": "Spiderman saves child from well",
  "body": "Move over, Lassie! New York has a new hero. But is he also a menace?",
  "authors": [
    {
      "name": "Jonah Jameson",
      "title": "Sr. Editor"
    },
    {
      "name": "Peter Parker"
    }
  ],
  "comments": [
    {
      "comment": "I understood that reference!"
    },
    {
      "comment": "Congrats on being slightly more useful than a ladder."
    }
  ],
  "photos": [
    {
      "caption": "Spiderman delivering Timmy back to his mother"
    }
  ]
}
Note that it nests data for authors, comments, and photos. If those fields are mapped as nested, this single document is actually stored as 6 documents: the parent, 2 authors, 2 comments, and 1 photo. Removing the comments and photos (which usually don’t need to be indexed anyway) would cut that to 3 documents, reducing the footprint by 50%.
If you’re using nested objects, review whether any of the nested information could be left out, then reindex with a smaller mapping.
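For the sample article, a smaller mapping might keep only `authors` as a nested field, and the Reindex API can copy the existing documents while a Painless script drops the removed fields. This is a sketch under stated assumptions: the index names `articles_v1` and `articles_v2` are hypothetical, the typeless mapping format assumes Elasticsearch 7 or later, and the code only prints the two requests (`PUT /<new index>` with the mapping, then `POST /_reindex`) for you to run, for example, in the Interactive Console.

```python
import json

# Hypothetical index names; typeless mappings assume Elasticsearch 7 or later.
OLD_INDEX = "articles_v1"
NEW_INDEX = "articles_v2"

# Only "authors" stays nested; comments and photos are no longer indexed.
COMPACT_INDEX_BODY = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "authors": {
                "type": "nested",
                "properties": {
                    "name": {"type": "text"},
                    "title": {"type": "text"},
                },
            },
        }
    }
}

# Body for POST /_reindex: copy every document and strip the dropped fields.
REINDEX_BODY = {
    "source": {"index": OLD_INDEX},
    "dest": {"index": NEW_INDEX},
    "script": {
        "lang": "painless",
        "source": "ctx._source.remove('comments'); ctx._source.remove('photos');",
    },
}

print(f"PUT /{NEW_INDEX}")
print(json.dumps(COMPACT_INDEX_BODY, indent=2))
print("POST /_reindex")
print(json.dumps(REINDEX_BODY, indent=2))
```

Stripping the fields in the reindex script, rather than only omitting them from the mapping, keeps dynamic mapping from adding them back to the new index.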