Creating Your First Index

Creating an index is the first step toward putting Elasticsearch to work. There is a wealth of resources online on the subject, but if you’re new to Elasticsearch, start with the definition of an index in our docs.

If you’re already familiar with the basics, we have a blog post / white paper on The Ideal Elasticsearch Index, which has a ton of information and things to think about when creating an index.

Note that many Elasticsearch clients will take care of creating an index for you. You should review your client’s documentation for more information on its index usage conventions. If you don’t know how many indices your application needs, we recommend creating one index per model or database table.

Important Note on Index Auto-Creation

By default, Elasticsearch creates indices automatically: simply pushing data into a nonexistent index will cause that index to be created with mappings inferred from the data. In accordance with Elasticsearch best practices for production applications, we’ve disabled this feature on Bonsai.

However, some popular tools such as Kibana and Logstash do not support explicit index creation, and rely on auto-creation being available. To accommodate these tools, we’ve whitelisted popular time-series index names such as logstash*, requests*, events*, .kibana* and kibana-int*.

For the purposes of this discussion, we’ll assume that you don’t have an Elasticsearch client that can create the index for you. This guide will proceed with the manual steps for creating an index, changing settings, populating it with data, and finally destroying your index.

Manually Creating an Index

There are two main ways to manually create an index in your Bonsai cluster. The first is with a command line tool like curl or httpie. Curl is a standard tool that is bundled with many Unix-like operating systems; macOS and most Linux distributions include it, and it can be installed on Windows as well. If you do not have curl, and don’t have a package manager capable of installing it, you can download it here.

The second is through the Interactive Console. The Interactive Console is a feature provided by Bonsai and found in your cluster dashboard.

Let’s create an example index called acme-production from the command line with curl.

$ curl -X PUT http://user:password@redwood-12345.us-east-1.bonsai.io/acme-production
{"acknowledged":true}

Authentication required

All Bonsai clusters have a randomly generated username and password. By default, these credentials need to be included with all requests in order to be processed. If you’re seeing an HTTP 401 error like this:

  HTTP 401: Authorization required

then your credentials were not supplied. You can view the fully-qualified URL in your cluster dashboard. It will look like this: http://user:password@redwood-12345.us-east-1.bonsai.io
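
If you’d rather not paste credentials into every command, one option is to keep the cluster URL in an environment variable and build request URLs from it. The sketch below assumes a variable named BONSAI_URL and a helper named bonsai_endpoint; both names are made up for this example and are not provided by Bonsai or curl.

```shell
# Hypothetical names: BONSAI_URL and bonsai_endpoint are illustrative only.
# BONSAI_URL holds the fully-qualified cluster URL copied from the dashboard;
# the sample URL from this guide is used as a fallback.
BONSAI_URL="${BONSAI_URL:-http://user:password@redwood-12345.us-east-1.bonsai.io}"

# Join the cluster URL and a request path without doubling the slash.
bonsai_endpoint() {
  # ${BONSAI_URL%/} drops one trailing slash; ${1#/} drops one leading slash.
  printf '%s/%s\n' "${BONSAI_URL%/}" "${1#/}"
}

# Example usage (needs network access and a real cluster):
#   curl -X PUT "$(bonsai_endpoint acme-production)"
```

With the variable exported in your shell, a missing or mistyped credential only needs to be fixed in one place.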

Updating the Index Settings

We can inspect the index settings with a GET call to /_cat/indices like so:

$ curl -XGET 'http://user:password@redwood-12345.us-east-1.bonsai.io/_cat/indices?v'
health status index             pri rep docs.count docs.deleted store.size pri.store.size
green  open   acme-production   1   1          0            0        260b 130b

The ?v at the end of the URL tells Elasticsearch to include column headers in its response. It’s not required, but it makes the output easier to read. The URL is quoted so that the shell does not treat the ? as a wildcard.

In the example above, Elasticsearch shows that the acme-production index was created with one primary shard and one replica shard. It doesn’t have any documents yet, and is only a few bytes in size.

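Let’s add a replica to the index. The Content-Type header is required by Elasticsearch 6.0 and later, and is harmless on earlier versions:

$ curl -XPUT http://user:password@redwood-12345.us-east-1.bonsai.io/acme-production/_settings -H 'Content-Type: application/json' -d '{"index":{"number_of_replicas":2}}'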
{"acknowledged":true}

When we re-query the /_cat/indices endpoint, we can see that there are now two replicas, where before there was only one:

$ curl -XGET 'http://user:password@redwood-12345.us-east-1.bonsai.io/_cat/indices?v'
health status index             pri rep docs.count docs.deleted store.size pri.store.size
green  open   acme-production   1   2          0            0        390b 130b

Similarly, if we wanted to remove all the replicas, we could simply modify the JSON payload like so:

$ curl -XPUT http://user:password@redwood-12345.us-east-1.bonsai.io/acme-production/_settings -H 'Content-Type: application/json' -d '{"index":{"number_of_replicas":0}}'
{"acknowledged":true}

$ curl -XGET 'http://user:password@redwood-12345.us-east-1.bonsai.io/_cat/indices?v'
health status index             pri rep docs.count docs.deleted store.size pri.store.size
green  open   acme-production   1   0          0            0        318b 159b
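
Since the only thing that changes between these calls is the replica count, the payload is easy to generate. Here is a minimal sketch; replica_settings is a hypothetical helper name, and the check simply guards against typos before anything is sent to the cluster.

```shell
# Hypothetical helper: replica_settings is an illustrative name.
# Print the JSON settings payload for the given replica count.
replica_settings() {
  case "$1" in
    ''|*[!0-9]*)
      # Empty input or anything containing a non-digit is rejected.
      echo "replica count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"index":{"number_of_replicas":%s}}\n' "$1"
}

# Example usage (needs network access and a real cluster):
#   curl -XPUT "http://user:password@redwood-12345.us-east-1.bonsai.io/acme-production/_settings" \
#     -H 'Content-Type: application/json' -d "$(replica_settings 2)"
```

Generating the body this way keeps the quoting in one place, which is where hand-edited JSON payloads most often go wrong.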

Adding Data to Your Index

Let’s insert a “Hello, world” test document to verify that your new index is available, and to highlight some basic Elasticsearch concepts.

In Elasticsearch versions prior to 7.x, every document should specify a type, and preferably an id. You may specify these values with the _type and _id keys, or Elasticsearch will infer them from the URL. If you don’t explicitly provide an id, Elasticsearch will generate a random one for you.

In the following example, we use POST to add a simple document to the index which specifies a _type of test and an _id of 1. You should replace the sample URL in this document with your own index URL to follow along:

$ curl -XPOST http://user:password@redwood-12345.us-east-1.bonsai.io/acme-production/test/1 -H 'Content-Type: application/json' -d '{"title":"Hello world"}'
{
 "_index" : "acme-production",
 "_type" : "test",
 "_id" : "1",
 "_version" : 1,
 "_shards" : {
   "total" : 3,
   "successful" : 3,
   "failed" : 0
 },
 "created" : true
}
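
The request path in this example follows the pattern /index/type/id. As a rough sketch for pre-7.x clusters, the hypothetical document_path helper below builds that path, and drops the id segment when none is given so that Elasticsearch generates one on POST.

```shell
# Hypothetical helper: document_path is an illustrative name.
#   $1: index name   $2: document type   $3: document id (optional)
# Prints /index/type/id, or /index/type when no id is supplied.
document_path() {
  if [ -n "${3:-}" ]; then
    printf '/%s/%s/%s\n' "$1" "$2" "$3"
  else
    printf '/%s/%s\n' "$1" "$2"
  fi
}

# Example usage (needs network access and a real cluster):
#   curl -XPOST "http://user:password@redwood-12345.us-east-1.bonsai.io$(document_path acme-production test)" \
#     -H 'Content-Type: application/json' -d '{"title":"Hello world"}'
```

Supplying your own id makes repeated runs overwrite the same document, while generated ids create a new document each time.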

Because we haven’t explicitly defined a mapping (schema) for the acme-production index, Elasticsearch will come up with one for us. It will inspect the contents of each field it receives and attempt to infer a data structure for the content. It will then use that for subsequent documents.

Dynamic Mapping Can Be Dangerous

Elasticsearch’s ability to generate mappings on the fly is a really nice feature, but it has some drawbacks. One is that the first value it sees for a field determines how it will interpret every later value of that field.

For example, there have been cases where users attempt to index geospatial data, and Elasticsearch interprets the field as being a float type. Certain documents then fail later in the indexing process with an HTTP 400 error. Or everything succeeds, but geospatial filtering is broken.

It’s a best practice to explicitly create your mappings before indexing into a new index, if you’re planning to power a production application. Today, most clients and frameworks are pretty good about handling this automatically, but it’s a subtle “gotcha” that has made its way into the support queues from time to time.
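
As an illustration of an explicit mapping, the sketch below builds a create-index body that declares a geo_point field up front, so Elasticsearch never has to guess. The geo_mapping helper and the location field name are assumptions made for this example. The second argument is the mapping type, which applies to pre-7.x clusters only.

```shell
# Hypothetical helper: geo_mapping is an illustrative name.
# Print a create-index request body that maps one field as geo_point.
#   $1: field name (for example "location", an assumed name)
#   $2: mapping type for pre-7.x clusters; omit it on 7.x and later,
#       where mapping types were removed
geo_mapping() {
  if [ -n "${2:-}" ]; then
    printf '{"mappings":{"%s":{"properties":{"%s":{"type":"geo_point"}}}}}\n' "$2" "$1"
  else
    printf '{"mappings":{"properties":{"%s":{"type":"geo_point"}}}}\n' "$1"
  fi
}

# Example usage when creating a NEW index (needs network access and a real cluster):
#   curl -XPUT "<cluster URL>/<new index name>" \
#     -H 'Content-Type: application/json' -d "$(geo_mapping location test)"
```

The body is sent with the request that creates the index, before any documents are indexed, which is what prevents the float misinterpretation described above.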

We can see the mapping that Elasticsearch generated by using the _mapping API:

$ curl -XGET http://user:password@redwood-12345.us-east-1.bonsai.io/acme-production/_mapping
{"acme-production":{"mappings":{"test":{"properties":{"title":{"type":"string"}}}}}}

The inspection of the index mapping shows that Elasticsearch has generated a schema from our sample JSON, and that it has decided that documents in the “acme-production” index of type “test” will have a string body in the “title” field. This is reasonable, so we’ll leave it alone.

Next, you may view this document by accessing it directly. In the example below, note the ?pretty parameter at the end of the URL. This tells Elasticsearch to pretty print the results, making them more legible:

$ curl -XGET 'http://user:password@redwood-12345.us-east-1.bonsai.io/acme-production/test/1?pretty'
{
  "_index" : "acme-production",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title" : "Hello world"
  }
}

Alternatively, you can see it in the search results with the _search endpoint:

$ curl -XGET 'http://user:password@redwood-12345.us-east-1.bonsai.io/acme-production/_search?pretty'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "acme-production",
      "_type" : "test",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "title" : "Hello world"
      }
    } ]
  }
}

Check the _source

Note the _source key, which contains a copy of your original document. Elasticsearch makes an excellent general-purpose document store, although it should never be used as a primary data store. Use something ACID-compliant for that.

The _source field also adds some overhead. It can be disabled in the mappings. See the Elasticsearch documentation for more details.

To learn more about the operations supported by your index, you should read the Elasticsearch Index API documentation. Note that some operations mentioned in the documentation (such as “Automatic Index Creation”) are restricted on Bonsai for technical reasons.

Destroying Your Index

When you have decided you no longer need the “acme-production” index, you can destroy it with a one-liner:

$ curl -XDELETE http://user:password@redwood-12345.us-east-1.bonsai.io/acme-production
{"acknowledged":true}

The DELETE verb will delete one or more indices in your cluster. If you have several indices to delete, you can still perform the action in one line by concatenating the indices with a comma, like so:

$ curl -XDELETE http://user:password@redwood-12345.us-east-1.bonsai.io/acme-production-1,acme-production-2,acme-production-3

There’s No ‘Undo’ Button

Destroying an index cannot be undone, unless you restore it from a snapshot (if one exists). Do not delete indices without fully understanding the consequences. If there is a chance that your cluster is supporting a production application, be very careful before taking this kind of action. Accidental deletes are a major reason Bonsai doesn’t support _all or wildcard (*) destructive actions.
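
When deleting several indices at once, the comma-separated list can be assembled from individual names. The following is only a sketch: join_indices is a hypothetical helper, and it refuses _all and wildcard names up front, since those are not supported for destructive actions on Bonsai anyway.

```shell
# Hypothetical helper: join_indices is an illustrative name.
# Join index names with commas, rejecting empty names, _all, and wildcards.
join_indices() {
  if [ "$#" -eq 0 ]; then
    echo "at least one index name is required" >&2
    return 1
  fi
  joined=""
  for name in "$@"; do
    case "$name" in
      ''|_all|*\**)
        echo "refusing index name: '$name'" >&2
        return 1
        ;;
    esac
    # Prepend a comma only when something has already been collected.
    joined="${joined:+$joined,}$name"
  done
  printf '%s\n' "$joined"
}

# Example usage (needs network access and a real cluster; cannot be undone):
#   curl -XDELETE "http://user:password@redwood-12345.us-east-1.bonsai.io/$(join_indices acme-production-1 acme-production-2)"
```

Printing the joined list and reading it back before running the DELETE is a cheap last check against removing the wrong index.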