Index Partitioning and Naming


Here are a few tips on data partitioning and index naming that may help you get started and avoid common pitfalls as you scale.

Creating an index: Partitioning your data

An index in Elasticsearch is loosely analogous to a table in a SQL database. Smaller apps can often get away with a single monolithic index per environment and save on the extra overhead. Much larger apps can use multiple indices as a natural place to segment and partition dissimilar data. (For example, a table with 10 million events may benefit from being indexed and configured separately from a table with 10 thousand blog articles.)

Not only can you partition your data across multiple indices, you can also partition each index using Elasticsearch shards. An index is made up of primary shards, which subdivide your data across multiple servers. Indices also have replica shards, which serve as extra copies of the primaries, both for redundancy and to support higher search volumes. The smallest possible index is a single primary shard.

Where’s the threshold for these decisions? If you’re under a million documents or a gigabyte of data, it’s probably not worth worrying about. Beyond that, you may want to consult our more detailed article on data partitioning in Elasticsearch. Conversely, if you have a very small set of data and are working on a tight budget, you may want to do a bit of extra work to consolidate your shard usage.

Otherwise, for sensible defaults, many Elasticsearch clients will create an index per model. And unless otherwise specified, new indices on your Bonsai cluster default to a single primary shard and a single replica shard. These defaults are a useful compromise between the two ends of the usage continuum, though very large and very small applications will eventually want to tune them.

Creating an index: What’s in a name?

If the first section has you scratching your head a bit, not to worry. This next tip benefits all Elasticsearch deployments, giving you the freedom to get started today and the flexibility to adapt your index configuration down the road.

To wit: we recommend using a versioned naming scheme for your indices. For example:

In a development cluster:
development-v1
development-v2
etc.

In a production cluster:
production-v1
production-v2
etc.

Whether you need to tweak the number of shards or overhaul your mappings and analysis settings, some day you’re going to need to reindex your data. Search-focused use cases benefit particularly from regular reindexing as part of tuning search result relevancy.

A simple timestamp or manually incremented version number lets you change index settings that are otherwise immutable: you create a new index with the new settings and reindex into it. Plus, you can use aliases to maintain a canonical index name for your application to reference. That lets you create a new index in the background and hot swap your application over to it without any client reconfiguration.

Consider this story. Say you get started with a quick proof of concept and index all your data with Elasticsearch’s quite sensible defaults. You ship a new search integration, and people are thrilled to be getting quick and useful results. Over time, as users get used to the search tools, they start asking for different ways to match against the things they’re searching for, such as finding projects by their owners’ names. Or you’d like to make matches fuzzier for fast autocompletion-style search.
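To make the versioned-name and alias advice above concrete, here is a minimal sketch in Python. It only builds the JSON bodies you would send to Elasticsearch's create-index endpoint (`PUT /<index>`) and alias endpoint (`POST /_aliases`); it does not send any request. The helper names (`next_index_name`, `create_index_body`, `swap_alias_body`) and the alias name `production` are assumptions made for illustration, not part of Bonsai's or Elasticsearch's tooling.

```python
import json
import re


def next_index_name(current: str) -> str:
    """Return the next versioned name, e.g. 'production-v1' -> 'production-v2'."""
    match = re.fullmatch(r"(?P<prefix>.+)-v(?P<version>\d+)", current)
    if match is None:
        raise ValueError(f"not a versioned index name: {current!r}")
    return f"{match['prefix']}-v{int(match['version']) + 1}"


def create_index_body(primaries: int = 1, replicas: int = 1) -> dict:
    """Body for `PUT /<index>`; the defaults are one primary and one replica."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }


def swap_alias_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for `POST /_aliases` that moves the alias in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    old = "production-v1"
    new = next_index_name(old)

    # Step 1: create the new index, here with a hypothetical two primaries.
    print(f"PUT /{new}")
    print(json.dumps(create_index_body(primaries=2), indent=2))

    # Step 2 (not shown): reindex your data into the new index.

    # Step 3: repoint the alias the application reads from.
    print("POST /_aliases")
    print(json.dumps(swap_alias_body("production", old, new), indent=2))
```

Because both alias actions travel in one `_aliases` request, Elasticsearch applies them atomically, so the application never sees a moment where the alias points at no index. The application itself only ever references the alias name, which is what makes the swap invisible to clients.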