=========
Reference
=========

Elasticsearch is a highly scalable open-source full-text search and
analytics engine. It allows you to store, search, and analyze big
volumes of data quickly and in near real time. It is generally used as
the underlying engine/technology that powers applications that have
complex search features and requirements.

Here are a few sample use-cases that Elasticsearch could be used for:

-  You run an online web store where you allow your customers to search
   for products that you sell. In this case, you can use Elasticsearch
   to store your entire product catalog and inventory and provide search
   and autocomplete suggestions for them.

-  You want to collect log or transaction data and you want to analyze
   and mine this data to look for trends, statistics, summarizations, or
   anomalies. In this case, you can use Logstash (part of the
   Elasticsearch/Logstash/Kibana stack) to collect, aggregate, and parse
   your data, and then have Logstash feed this data into Elasticsearch.
   Once the data is in Elasticsearch, you can run searches and
   aggregations to mine any information that is of interest to you.

-  You run a price alerting platform which allows price-savvy customers
   to specify a rule like "I am interested in buying a specific
   electronic gadget and I want to be notified if the price of gadget
   falls below $X from any vendor within the next month". In this case
   you can scrape vendor prices, push them into Elasticsearch and use
   its reverse-search (Percolator) capability to match price movements
   against customer queries and eventually push the alerts out to the
   customer once matches are found.

-  You have analytics/business-intelligence needs and want to quickly
   investigate, analyze, visualize, and ask ad-hoc questions on a lot of
   data (think millions or billions of records). In this case, you can
   use Elasticsearch to store your data and then use Kibana (part of the
   Elasticsearch/Logstash/Kibana stack) to build custom dashboards that
   can visualize aspects of your data that are important to you.
   Additionally, you can use the Elasticsearch aggregations
   functionality to perform complex business intelligence queries
   against your data.

For the rest of this tutorial, I will guide you through the process of
getting Elasticsearch up and running, taking a peek inside it, and
performing basic operations like indexing, searching, and modifying your
data. At the end of this tutorial, you should have a good idea of what
Elasticsearch is, how it works, and hopefully be inspired to see how you
can use it to either build sophisticated search applications or to mine
intelligence from your data.

Basic Concepts
==============

There are a few concepts that are core to Elasticsearch. Understanding
these concepts from the outset will tremendously help ease the learning
process.

**Near Realtime (NRT)**

Elasticsearch is a near real time search platform. What this means is
there is a slight latency (normally one second) from the time you index
a document until the time it becomes searchable.
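
If you need a freshly indexed document to become searchable immediately
(for example in a test), you can trigger a refresh by hand. A minimal
sketch using the refresh API (curl usage is covered later in this
tutorial):

.. code:: sh

    curl -XPOST 'localhost:9200/_refresh?pretty'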

**Cluster**

A cluster is a collection of one or more nodes (servers) that together
hold your entire data and provide federated indexing and search
capabilities across all nodes. A cluster is identified by a unique name
which by default is "elasticsearch". This name is important because a
node can only be part of a cluster if the node is set up to join the
cluster by its name. It is good practice to explicitly set the cluster
name in production, but it is fine to use the default for
testing/development purposes.

Note that it is valid and perfectly fine to have a cluster with only a
single node in it. Furthermore, you may also have multiple independent
clusters each with its own unique cluster name.

**Node**

A node is a single server that is part of your cluster, stores your
data, and participates in the cluster’s indexing and search
capabilities. Just like a cluster, a node is identified by a name which
by default is a random Marvel character name that is assigned to the
node at startup. You can define any node name you want if you do not
want the default. This name is important for administration purposes
where you want to identify which servers in your network correspond to
which nodes in your Elasticsearch cluster.

A node can be configured to join a specific cluster by the cluster name.
By default, each node is set up to join a cluster named
``elasticsearch``, which means that if you start up a number of nodes on
your network and they can discover each other, they will all
automatically form and join a single cluster named ``elasticsearch``.

In a single cluster, you can have as many nodes as you want.
Furthermore, if there are no other Elasticsearch nodes currently running
on your network, starting a single node will by default form a new
single-node cluster named ``elasticsearch``.

**Index**

An index is a collection of documents that have somewhat similar
characteristics. For example, you can have an index for customer data,
another index for a product catalog, and yet another index for order
data. An index is identified by a name (that must be all lowercase) and
this name is used to refer to the index when performing indexing,
search, update, and delete operations against the documents in it.

In a single cluster, you can define as many indexes as you want.

**Type**

Within an index, you can define one or more types. A type is a logical
category/partition of your index whose semantics is completely up to
you. In general, a type is defined for documents that have a set of
common fields. For example, let’s assume you run a blogging platform and
store all your data in a single index. In this index, you may define a
type for user data, another type for blog data, and yet another type for
comments data.

**Document**

A document is a basic unit of information that can be indexed. For
example, you can have a document for a single customer, another document
for a single product, and yet another for a single order. This document
is expressed in `JSON <http://json.org/>`__ (JavaScript Object Notation)
which is a ubiquitous internet data interchange format.

Within an index/type, you can store as many documents as you want. Note
that although a document physically resides in an index, a document
actually must be indexed/assigned to a type inside an index.

**Shards & Replicas**

An index can potentially store a large amount of data that can exceed
the hardware limits of a single node. For example, a single index of a
billion documents taking up 1TB of disk space may not fit on the disk of
a single node or may be too slow to serve search requests from a single
node alone.

To solve this problem, Elasticsearch provides the ability to subdivide
your index into multiple pieces called shards. When you create an index,
you can simply define the number of shards that you want. Each shard is
in itself a fully-functional and independent "index" that can be hosted
on any node in the cluster.

Sharding is important for two primary reasons:

-  It allows you to horizontally split/scale your content volume

-  It allows you to distribute and parallelize operations across shards
   (potentially on multiple nodes) thus increasing
   performance/throughput

The mechanics of how a shard is distributed and also how its documents
are aggregated back into search requests are completely managed by
Elasticsearch and are transparent to you as the user.

In a network/cloud environment where failures can be expected anytime,
it is very useful and highly recommended to have a failover mechanism in
case a shard/node somehow goes offline or disappears for whatever
reason. To this end, Elasticsearch allows you to make one or more copies
of your index’s shards into what are called replica shards, or replicas
for short.

Replication is important for two primary reasons:

-  It provides high availability in case a shard/node fails. For this
   reason, it is important to note that a replica shard is never
   allocated on the same node as the original/primary shard that it was
   copied from.

-  It allows you to scale out your search volume/throughput since
   searches can be executed on all replicas in parallel.

To summarize, each index can be split into multiple shards. An index can
also be replicated zero (meaning no replicas) or more times. Once
replicated, each index will have primary shards (the original shards
that were replicated from) and replica shards (the copies of the primary
shards). The number of shards and replicas can be defined per index at
the time the index is created. After the index is created, you may
change the number of replicas dynamically anytime but you cannot change
the number of shards after the fact.

By default, each index in Elasticsearch is allocated 5 primary shards
and 1 replica which means that if you have at least two nodes in your
cluster, your index will have 5 primary shards and another 5 replica
shards (1 complete replica) for a total of 10 shards per index.
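
As a quick sketch (the index name and values below are illustrative),
the shard and replica counts can be set when an index is created, and
the replica count can later be changed through the index settings API:

.. code:: sh

    # create an index with 3 primary shards and 2 replicas per primary
    curl -XPUT 'localhost:9200/my_index?pretty' -d '
    {
      "settings": { "number_of_shards": 3, "number_of_replicas": 2 }
    }'

    # dynamically change the number of replicas later
    curl -XPUT 'localhost:9200/my_index/_settings?pretty' -d '
    {
      "number_of_replicas": 1
    }'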

With that out of the way, let’s get started with the fun part…

Installation
============

Elasticsearch requires Java 7. Specifically as of this writing, it is
recommended that you use the Oracle JDK version 1.8.0\_25. Java
installation varies from platform to platform so we won’t go into those
details here. Suffice to say, before you install Elasticsearch, please
check your Java version first by running (and then install/upgrade
accordingly if needed):

.. code:: sh

    java -version
    echo $JAVA_HOME

Once we have Java set up, we can then download and run Elasticsearch.
The binaries are available from
`www.elasticsearch.org/download <http://www.elasticsearch.org/download>`__
along with all the releases that have been made in the past. For each
release, you have a choice among a ``zip`` or ``tar`` archive, or a
``DEB`` or ``RPM`` package. For simplicity, let’s use the tar file.

Let’s download the Elasticsearch 1.4.0 tar as follows (Windows users
should download the zip package):

.. code:: sh

    curl -L -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.0.tar.gz

Then extract it as follows (Windows users should unzip the zip package):

.. code:: sh

    tar -xvf elasticsearch-1.4.0.tar.gz

It will then create a bunch of files and folders in your current
directory. We then go into the bin directory as follows:

.. code:: sh

    cd elasticsearch-1.4.0/bin

And now we are ready to start our node and single cluster (Windows users
should run the elasticsearch.bat file):

.. code:: sh

    ./elasticsearch

If everything goes well, you should see a bunch of messages that look
like below:

.. code:: sh

    ./elasticsearch
    [2014-03-13 13:42:17,218][INFO ][node           ] [New Goblin] version[1.4.0], pid[2085], build[5c03844/2014-02-25T15:52:53Z]
    [2014-03-13 13:42:17,219][INFO ][node           ] [New Goblin] initializing ...
    [2014-03-13 13:42:17,223][INFO ][plugins        ] [New Goblin] loaded [], sites []
    [2014-03-13 13:42:19,831][INFO ][node           ] [New Goblin] initialized
    [2014-03-13 13:42:19,832][INFO ][node           ] [New Goblin] starting ...
    [2014-03-13 13:42:19,958][INFO ][transport      ] [New Goblin] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.8.112:9300]}
    [2014-03-13 13:42:23,030][INFO ][cluster.service] [New Goblin] new_master [New Goblin][rWMtGj3dQouz2r6ZFL9v4g][mwubuntu1][inet[/192.168.8.112:9300]], reason: zen-disco-join (elected_as_master)
    [2014-03-13 13:42:23,100][INFO ][discovery      ] [New Goblin] elasticsearch/rWMtGj3dQouz2r6ZFL9v4g
    [2014-03-13 13:42:23,125][INFO ][http           ] [New Goblin] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.8.112:9200]}
    [2014-03-13 13:42:23,629][INFO ][gateway        ] [New Goblin] recovered [1] indices into cluster_state
    [2014-03-13 13:42:23,630][INFO ][node           ] [New Goblin] started

Without going too much into detail, we can see that our node named "New
Goblin" (which will be a different Marvel character in your case) has
started and elected itself as master in a single cluster. Don’t worry
about what master means just yet. The main thing that is important
here is that we have started one node within one cluster.

As mentioned previously, we can override either the cluster or node
name. This can be done from the command line when starting Elasticsearch
as follows:

.. code:: sh

    ./elasticsearch --cluster.name my_cluster_name --node.name my_node_name

Also note the line marked http with information about the HTTP address
(``192.168.8.112``) and port (``9200``) that our node is reachable from.
By default, Elasticsearch uses port ``9200`` to provide access to its
REST API. This port is configurable if necessary.
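
As a quick sanity check (purely optional), hitting the root endpoint on
that port returns basic information about the node and its version:

.. code:: sh

    curl 'localhost:9200/?pretty'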

Exploring Your Cluster
======================

**The REST API**

Now that we have our node (and cluster) up and running, the next step is
to understand how to communicate with it. Fortunately, Elasticsearch
provides a very comprehensive and powerful REST API that you can use to
interact with your cluster. A few of the things that can be done with
the API are as follows:

-  Check your cluster, node, and index health, status, and statistics

-  Administer your cluster, node, and index data and metadata

-  Perform CRUD (Create, Read, Update, and Delete) and search operations
   against your indexes

-  Execute advanced search operations such as paging, sorting,
   filtering, scripting, aggregations, and many others

Cluster Health
--------------

Let’s start with a basic health check, which we can use to see how our
cluster is doing. We’ll be using curl to do this but you can use any
tool that allows you to make HTTP/REST calls. Let’s assume that we are
still on the same node where we started Elasticsearch and that we have
opened another command shell window.

To check the cluster health, we will be using the `_cat
API <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cat.html>`__.
Remember previously that our node HTTP endpoint is available at port
``9200``:

.. code:: sh

    curl 'localhost:9200/_cat/health?v'

And the response:

.. code:: sh

    epoch      timestamp cluster       status node.total node.data shards pri relo init unassign
    1394735289 14:28:09  elasticsearch green           1         1      0   0    0    0        0

We can see that our cluster named "elasticsearch" is up with a green
status.

Whenever we ask for the cluster health, we either get green, yellow, or
red. Green means everything is good (cluster is fully functional),
yellow means all data is available but some replicas are not yet
allocated (cluster is fully functional), and red means some data is not
available for whatever reason. Note that even if a cluster is red, it
still is partially functional (i.e. it will continue to serve search
requests from the available shards) but you will likely need to fix it
ASAP since you have missing data.

Also from the above response, we can see a total of 1 node and that we
have 0 shards since we have no data in it yet. Note that since we are
using the default cluster name (elasticsearch) and since Elasticsearch
uses multicast network discovery by default to find other nodes, it is
possible that you could accidentally start up more than one node in your
network and have them all join a single cluster. In this scenario, you
may see more than 1 node in the above response.

We can also get a list of nodes in our cluster as follows:

.. code:: sh

    curl 'localhost:9200/_cat/nodes?v'

And the response:

.. code:: sh

    curl 'localhost:9200/_cat/nodes?v'
    host         ip        heap.percent ram.percent load node.role master name
    mwubuntu1    127.0.1.1            8           4 0.00 d         *      New Goblin

Here, we can see our one node named "New Goblin", which is the single
node that is currently in our cluster.

List All Indexes
----------------

Now let’s take a peek at our indexes:

.. code:: sh

    curl 'localhost:9200/_cat/indices?v'

And the response:

.. code:: sh

    curl 'localhost:9200/_cat/indices?v'
    health index pri rep docs.count docs.deleted store.size pri.store.size

Which simply means we have no indexes yet in the cluster.

Create an Index
---------------

Now let’s create an index named "customer" and then list all the indexes
again:

.. code:: sh

    curl -XPUT 'localhost:9200/customer?pretty'
    curl 'localhost:9200/_cat/indices?v'

The first command creates the index named "customer" using the PUT verb.
We simply append ``pretty`` to the end of the call to tell it to
pretty-print the JSON response (if any).

And the response:

.. code:: sh

    curl -XPUT 'localhost:9200/customer?pretty'
    {
      "acknowledged" : true
    }

    curl 'localhost:9200/_cat/indices?v'
    health index    pri rep docs.count docs.deleted store.size pri.store.size
    yellow customer   5   1          0            0       495b           495b

The result of the second command tells us that we now have 1 index
named customer, that it has 5 primary shards and 1 replica (the
defaults), and that it contains 0 documents.

You might also notice that the customer index has a yellow health tagged
to it. Recall from our previous discussion that yellow means that some
replicas are not (yet) allocated. The reason this happens for this index
is because Elasticsearch by default created one replica for this index.
Since we only have one node running at the moment, that one replica
cannot yet be allocated (for high availability) until a later point in
time when another node joins the cluster. Once that replica gets
allocated onto a second node, the health status for this index will turn
to green.

Index and Query a Document
--------------------------

Let’s now put something into our customer index. Remember previously
that in order to index a document, we must tell Elasticsearch which type
in the index it should go to.

Let’s index a simple customer document into the customer index,
"external" type, with an ID of 1 as follows:

Our JSON document: { "name": "John Doe" }

.. code:: sh

    curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
    {
      "name": "John Doe"
    }'

And the response:

.. code:: sh

    curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
    {
      "name": "John Doe"
    }'
    {
      "_index" : "customer",
      "_type" : "external",
      "_id" : "1",
      "_version" : 1,
      "created" : true
    }

From the above, we can see that a new customer document was successfully
created inside the customer index and the external type. The document
also has an internal id of 1 which we specified at index time.

It is important to note that Elasticsearch does not require you to
explicitly create an index first before you can index documents into it.
In the previous example, Elasticsearch will automatically create the
customer index if it didn’t already exist beforehand.

Let’s now retrieve that document that we just indexed:

.. code:: sh

    curl -XGET 'localhost:9200/customer/external/1?pretty'

And the response:

.. code:: sh

    curl -XGET 'localhost:9200/customer/external/1?pretty'
    {
      "_index" : "customer",
      "_type" : "external",
      "_id" : "1",
      "_version" : 1,
      "found" : true, "_source" : { "name": "John Doe" }
    }

Nothing out of the ordinary here other than a field, ``found``, stating
that we found a document with the requested ID 1 and another field,
``_source``, which returns the full JSON document that we indexed from
the previous step.

Delete an Index
---------------

Now let’s delete the index that we just created and then list all the
indexes again:

.. code:: sh

    curl -XDELETE 'localhost:9200/customer?pretty'
    curl 'localhost:9200/_cat/indices?v'

And the response:

.. code:: sh

    curl -XDELETE 'localhost:9200/customer?pretty'
    {
      "acknowledged" : true
    }
    curl 'localhost:9200/_cat/indices?v'
    health index pri rep docs.count docs.deleted store.size pri.store.size

Which means that the index was deleted successfully and we are now back
to where we started with nothing in our cluster.

Before we move on, let’s take a closer look again at some of the API
commands that we have learned so far:

.. code:: sh

    curl -XPUT 'localhost:9200/customer'
    curl -XPUT 'localhost:9200/customer/external/1' -d '
    {
      "name": "John Doe"
    }'
    curl 'localhost:9200/customer/external/1'
    curl -XDELETE 'localhost:9200/customer'

If we study the above commands carefully, we can actually see a pattern
of how we access data in Elasticsearch. That pattern can be summarized
as follows:

.. code:: sh

    curl -<REST Verb> <Node>:<Port>/<Index>/<Type>/<ID>

This REST access pattern is so pervasive throughout all the API commands
that if you can simply remember it, you will have a good head start at
mastering Elasticsearch.

Modifying Your Data
===================

Elasticsearch provides data manipulation and search capabilities in near
real time. By default, you can expect a one second delay (refresh
interval) from the time you index/update/delete your data until the time
that it appears in your search results. This is an important distinction
from other platforms like SQL wherein data is immediately available
after a transaction is completed.

**Indexing/Replacing Documents**

We’ve previously seen how we can index a single document. Let’s recall
that command again:

.. code:: sh

    curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
    {
      "name": "John Doe"
    }'

Again, the above will index the specified document into the customer
index, external type, with the ID of 1. If we then execute the above
command again with a different (or the same) document, Elasticsearch
will replace (i.e. reindex) the existing document that has the ID of 1
with the new one:

.. code:: sh

    curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
    {
      "name": "Jane Doe"
    }'

The above changes the name of the document with the ID of 1 from "John
Doe" to "Jane Doe". If, on the other hand, we use a different ID, a new
document will be indexed and the existing document(s) already in the
index remain untouched.

.. code:: sh

    curl -XPUT 'localhost:9200/customer/external/2?pretty' -d '
    {
      "name": "Jane Doe"
    }'

The above indexes a new document with an ID of 2.

When indexing, the ID part is optional. If not specified, Elasticsearch
will generate a random ID and then use it to index the document. The
actual ID Elasticsearch generates (or whatever we specified explicitly
in the previous examples) is returned as part of the index API call.

This example shows how to index a document without an explicit ID:

.. code:: sh

    curl -XPOST 'localhost:9200/customer/external?pretty' -d '
    {
      "name": "Jane Doe"
    }'

Note that in the above case, we are using the POST verb instead of PUT
since we didn’t specify an ID.

Updating Documents
------------------

In addition to being able to index and replace documents, we can also
update documents. Note though that Elasticsearch does not actually do
in-place updates under the hood. Whenever we do an update, Elasticsearch
deletes the old document and then indexes a new document with the update
applied to it in one shot.

This example shows how to update our previous document (ID of 1) by
changing the name field to "Jane Doe":

.. code:: sh

    curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
    {
      "doc": { "name": "Jane Doe" }
    }'

This example shows how to update our previous document (ID of 1) by
changing the name field to "Jane Doe" and at the same time adding an
age field to it:

.. code:: sh

    curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
    {
      "doc": { "name": "Jane Doe", "age": 20 }
    }'

Updates can also be performed by using simple scripts. This example uses
a script to increment the age by 5:

.. code:: sh

    curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
    {
      "script" : "ctx._source.age += 5"
    }'

In the above example, ``ctx._source`` refers to the current source
document that is about to be updated.
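
Scripts can also change the structure of the document. As a sketch
(assuming dynamic scripting is enabled, as in the example above), this
removes the ``age`` field from the same document:

.. code:: sh

    curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
    {
      "script" : "ctx._source.remove(\"age\")"
    }'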

Note that as of this writing, updates can only be performed on a single
document at a time. In the future, Elasticsearch will provide the
ability to update multiple documents given a query condition (like an
``SQL UPDATE-WHERE`` statement).

Deleting Documents
------------------

Deleting a document is fairly straightforward. This example shows how to
delete our previous customer with the ID of 2:

.. code:: sh

    curl -XDELETE 'localhost:9200/customer/external/2?pretty'

We also have the ability to delete multiple documents that match a query
condition. This example shows how to delete all customers whose names
contain "John":

.. code:: sh

    curl -XDELETE 'localhost:9200/customer/external/_query?pretty' -d '
    {
      "query": { "match": { "name": "John" } }
    }'

Note above that the URI has changed to ``/_query`` to signify a
delete-by-query API with the delete query criteria in the body, but we
are still using the DELETE verb. Don’t worry yet about the query syntax
as we will cover that later in this tutorial.

Batch Processing
----------------

In addition to being able to index, update, and delete individual
documents, Elasticsearch also provides the ability to perform any of the
above operations in batches using the `_bulk
API <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html>`__.
This functionality is important in that it provides a very efficient
mechanism to do multiple operations as fast as possible with as few
network roundtrips as possible.

As a quick example, the following call indexes two documents (ID 1 -
John Doe and ID 2 - Jane Doe) in one bulk operation:

.. code:: sh

    curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
    {"index":{"_id":"1"}}
    {"name": "John Doe" }
    {"index":{"_id":"2"}}
    {"name": "Jane Doe" }
    '

This example updates the first document (ID of 1) and then deletes the
second document (ID of 2) in one bulk operation:

.. code:: sh

    curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
    {"update":{"_id":"1"}}
    {"doc": { "name": "John Doe becomes Jane Doe" } }
    {"delete":{"_id":"2"}}
    '

Note above that for the delete action, there is no corresponding source
document after it since deletes only require the ID of the document to
be deleted.

The bulk API executes all the actions sequentially and in order. If a
single action fails for whatever reason, it will continue to process the
remainder of the actions after it. When the bulk API returns, it will
provide a status for each action (in the same order it was sent in) so
that you can check if a specific action failed or not.

Exploring Your Data
===================

**Sample Dataset**

Now that we’ve gotten a glimpse of the basics, let’s try to work on a
more realistic dataset. I’ve prepared a sample of fictitious JSON
documents of customer bank account information. Each document has the
following schema:

.. code:: sh

    {
        "account_number": 0,
        "balance": 16623,
        "firstname": "Bradshaw",
        "lastname": "Mckenzie",
        "age": 29,
        "gender": "F",
        "address": "244 Columbus Place",
        "employer": "Euron",
        "email": "bradshawmckenzie@euron.com",
        "city": "Hobucken",
        "state": "CO"
    }

For the curious, I generated this data from
`www.json-generator.com <http://www.json-generator.com/>`__, so
please ignore the actual values and semantics of the data as these are
all randomly generated.

**Loading the Sample Dataset**

You can download the sample dataset (accounts.json) from
`here <https://github.com/bly2k/files/blob/master/accounts.zip?raw=true>`__.
Extract it to our current directory and let’s load it into our cluster
as follows:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
    curl 'localhost:9200/_cat/indices?v'

And the response:

.. code:: sh

    curl 'localhost:9200/_cat/indices?v'
    health index pri rep docs.count docs.deleted store.size pri.store.size
    yellow bank    5   1       1000            0    424.4kb        424.4kb

Which means that we just successfully bulk indexed 1000 documents into
the bank index (under the account type).

The Search API
--------------

Now let’s start with some simple searches. There are two basic ways to
run searches: one is by sending search parameters through the `REST
request
URI <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-uri-request.html>`__
and the other by sending them through the `REST request
body <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-body.html>`__.
The request body method allows you to be more expressive and also to
define your searches in a more readable JSON format. We’ll try one
example of the request URI method but for the remainder of this
tutorial, we will exclusively be using the request body method.

The REST API for search is accessible from the ``_search`` endpoint.
This example returns all documents in the bank index:

.. code:: sh

    curl 'localhost:9200/bank/_search?q=*&pretty'

Let’s first dissect the search call. We are searching (``_search``
endpoint) in the bank index, and the ``q=*`` parameter instructs
Elasticsearch to match all documents in the index. The ``pretty``
parameter, again, just tells Elasticsearch to return pretty-printed JSON
results.

And the response (partially shown):

.. code:: sh

    curl 'localhost:9200/bank/_search?q=*&pretty'
    {
      "took" : 63,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1000,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "1",
          "_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "6",
          "_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
        }, {
          "_index" : "bank",
          "_type" : "account",

As for the response, we see the following parts:

-  ``took`` – time in milliseconds for Elasticsearch to execute the
   search

-  ``timed_out`` – tells us if the search timed out or not

-  ``_shards`` – tells us how many shards were searched, as well as a
   count of the successful/failed searched shards

-  ``hits`` – search results

-  ``hits.total`` – total number of documents matching our search
   criteria

-  ``hits.hits`` – actual array of search results (defaults to first 10
   documents)

-  ``_score`` and ``max_score`` - ignore these fields for now

Here is the same exact search above using the alternative request body
method:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} }
    }'

The difference here is that instead of passing ``q=*`` in the URI, we
POST a JSON-style query request body to the ``_search`` API. We’ll
discuss this JSON query in the next section.

And the response (partially shown):

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} }
    }'
    {
      "took" : 26,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1000,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "1",
          "_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "6",
          "_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "13",

It is important to understand that once you get your search results
back, Elasticsearch is completely done with the request and does not
maintain any kind of server-side resources or open cursors into your
results. This is in stark contrast to many other platforms such as SQL
wherein you may initially get a partial subset of your query results
up-front and then you have to continuously go back to the server if you
want to fetch (or page through) the rest of the results using some kind
of stateful server-side cursor.

Introducing the Query Language
------------------------------

Elasticsearch provides a JSON-style domain-specific language that you
can use to execute queries. This is referred to as the `Query
DSL <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html>`__.
The query language is quite comprehensive and can be intimidating at
first glance but the best way to actually learn it is to start with a
few basic examples.

Going back to our last example, we executed this query:

.. code:: sh

    {
      "query": { "match_all": {} }
    }

Dissecting the above, the ``query`` part tells us what our query
definition is and the ``match_all`` part is simply the type of query
that we want to run. The ``match_all`` query is simply a search for all
documents in the specified index.

In addition to the ``query`` parameter, we also can pass other
parameters to influence the search results. For example, the following
does a ``match_all`` and returns only the first document:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} },
      "size": 1
    }'

Note that if ``size`` is not specified, it defaults to 10.

This example does a ``match_all`` and returns documents 11 through 20:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} },
      "from": 10,
      "size": 10
    }'

The ``from`` parameter (0-based) specifies which document index to start
from and the ``size`` parameter specifies how many documents to return
starting at the from parameter. This feature is useful when implementing
paging of search results. Note that if ``from`` is not specified, it
defaults to 0.

This example does a ``match_all`` and sorts the results by account
balance in descending order and returns the top 10 (default size)
documents.

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} },
      "sort": { "balance": { "order": "desc" } }
    }'

Executing Searches
------------------

Now that we have seen a few of the basic search parameters, let’s dig in
some more into the Query DSL. Let’s first take a look at the returned
document fields. By default, the full JSON document is returned as part
of all searches. This is referred to as the source (``_source`` field in
the search hits). If we don’t want the entire source document returned,
we have the ability to request only a few fields from within source to
be returned.

This example shows how to return two fields, ``account_number`` and
``balance`` (inside of ``_source``), from the search:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} },
      "_source": ["account_number", "balance"]
    }'

Note that the above example simply reduces the ``_source`` field. It
will still only return one field named ``_source`` but within it, only
the fields ``account_number`` and ``balance`` are included.

If you come from a SQL background, the above is somewhat similar in
concept to the ``SQL SELECT FROM`` field list.

Now let’s move on to the query part. Previously, we’ve seen how the
``match_all`` query is used to match all documents. Let’s now introduce
a new query called the `match
query <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html>`__,
which can be thought of as a basic fielded search query (i.e. a search
done against a specific field or set of fields).

This example returns the account numbered 20:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match": { "account_number": 20 } }
    }'

This example returns all accounts containing the term "mill" in the
address:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match": { "address": "mill" } }
    }'

This example returns all accounts containing the term "mill" or "lane"
in the address:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match": { "address": "mill lane" } }
    }'

This example is a variant of ``match`` (``match_phrase``) that returns
all accounts containing the phrase "mill lane" in the address:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_phrase": { "address": "mill lane" } }
    }'

Let’s now introduce the `bool(ean)
query <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html>`__.
The ``bool`` query allows us to compose smaller queries into bigger
queries using boolean logic.

This example composes two ``match`` queries and returns all accounts
containing "mill" and "lane" in the address:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }'

In the above example, the ``bool must`` clause specifies all the queries
that must be true for a document to be considered a match.

In contrast, this example composes two ``match`` queries and returns all
accounts containing "mill" or "lane" in the address:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "bool": {
          "should": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }'

In the above example, the ``bool should`` clause specifies a list of
queries, at least one of which must be true for a document to be
considered a match.

This example composes two ``match`` queries and returns all accounts
that contain neither "mill" nor "lane" in the address:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "bool": {
          "must_not": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }'

In the above example, the ``bool must_not`` clause specifies a list of
queries none of which must be true for a document to be considered a
match.

We can combine ``must``, ``should``, and ``must_not`` clauses
simultaneously inside a ``bool`` query. Furthermore, we can compose
``bool`` queries inside any of these ``bool`` clauses to mimic any
complex multi-level boolean logic.

This example returns all accounts of anybody who is 40 years old but
doesn’t live in ID(aho):

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "age": "40" } }
          ],
          "must_not": [
            { "match": { "state": "ID" } }
          ]
        }
      }
    }'
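
Because ``bool`` queries can be nested, we can also express "aged 40 and
the address contains either mill or lane". The following is a sketch of
that nesting (the field values are only illustrative):

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "age": "40" } },
            {
              "bool": {
                "should": [
                  { "match": { "address": "mill" } },
                  { "match": { "address": "lane" } }
                ]
              }
            }
          ]
        }
      }
    }'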

Executing Filters
-----------------

In the previous section, we skipped over a little detail called the
document score (``_score`` field in the search results). The score is a
numeric value that is a relative measure of how well the document
matches the search query that we specified. The higher the score, the
more relevant the document is; the lower the score, the less relevant
the document is.

All queries in Elasticsearch trigger computation of the relevance
scores. In cases where we do not need the relevance scores,
Elasticsearch provides another query capability in the form of
`filters <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html>`__.
Filters are similar in concept to queries except that they are optimized
for much faster execution speeds for two primary reasons:

-  Filters do not score so they are faster to execute than queries

-  Filters can be `cached in
   memory <http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/>`__
   allowing repeated search executions to be significantly faster than
   queries

To understand filters, let’s first introduce the `filtered
query <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html>`__,
which allows you to combine a query (like ``match_all``, ``match``,
``bool``, etc.) together with a filter. As an example, let’s introduce
the `range
filter <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-filter.html>`__,
which allows us to filter documents by a range of values. This is
generally used for numeric or date filtering.

This example uses a filtered query to return all accounts with balances
between 20000 and 30000, inclusive. In other words, we want to find
accounts with a balance that is greater than or equal to 20000 and less
than or equal to 30000.

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "range": {
              "balance": {
                "gte": 20000,
                "lte": 30000
              }
            }
          }
        }
      }
    }'

Dissecting the above, the filtered query contains a ``match_all`` query
(the query part) and a ``range`` filter (the filter part). We can
substitute any other query into the query part as well as any other
filter into the filter part. In the above case, the range filter makes
perfect sense since documents falling into the range all match
"equally", i.e., no document is more relevant than another.

In general, the easiest way to decide whether you want a filter or a
query is to ask yourself if you care about the relevance score or not.
If relevance is not important, use filters, otherwise, use queries. If
you come from a SQL background, queries and filters are similar in
concept to the ``SELECT WHERE`` clause, although more so for filters
than queries.

In addition to the ``match_all``, ``match``, ``bool``, ``filtered``, and
``range`` queries, there are a lot of other query/filter types that are
available and we won’t go into them here. Since we already have a basic
understanding of how they work, it shouldn’t be too difficult to apply
this knowledge in learning and experimenting with the other query/filter
types.

Executing Aggregations
----------------------

Aggregations provide the ability to group and extract statistics from
your data. The easiest way to think about aggregations is by roughly
equating it to the SQL GROUP BY and the SQL aggregate functions. In
Elasticsearch, you have the ability to execute searches returning hits
and at the same time return aggregated results separate from the hits
all in one response. This is very powerful and efficient in the sense
that you can run queries and multiple aggregations and get the results
of both (or either) operation back in one shot, avoiding network
roundtrips, using a concise and simplified API.

To start with, this example groups all the accounts by state, and then
returns the top 10 (default) states sorted by count descending (also
default):

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state"
          }
        }
      }
    }'

In SQL, the above aggregation is similar in concept to:

.. code:: sql

    SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC

And the response (partially shown):

.. code:: sh

      "hits" : {
        "total" : 1000,
        "max_score" : 0.0,
        "hits" : [ ]
      },
      "aggregations" : {
        "group_by_state" : {
          "buckets" : [ {
            "key" : "al",
            "doc_count" : 21
          }, {
            "key" : "tx",
            "doc_count" : 17
          }, {
            "key" : "id",
            "doc_count" : 15
          }, {
            "key" : "ma",
            "doc_count" : 15
          }, {
            "key" : "md",
            "doc_count" : 15
          }, {
            "key" : "pa",
            "doc_count" : 15
          }, {
            "key" : "dc",
            "doc_count" : 14
          }, {
            "key" : "me",
            "doc_count" : 14
          }, {
            "key" : "mo",
            "doc_count" : 14
          }, {
            "key" : "nd",
            "doc_count" : 14
          } ]
        }
      }
    }

We can see that there are 21 accounts in AL(abama), followed by 17
accounts in TX, followed by 15 accounts in ID(aho), and so forth.

Note that we set ``size=0`` to not show search hits because we only want
to see the aggregation results in the response.

Building on the previous aggregation, this example calculates the
average account balance by state (again only for the top 10 states
sorted by count in descending order):

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }'

Notice how we nested the ``average_balance`` aggregation inside the
``group_by_state`` aggregation. This is a common pattern for all the
aggregations. You can nest aggregations inside aggregations arbitrarily
to extract pivoted summarizations that you require from your data.

Building on the previous aggregation, let’s now sort on the average
balance in descending order:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state",
            "order": {
              "average_balance": "desc"
            }
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }'

This example demonstrates how we can group by age brackets (ages 20-29,
30-39, and 40-49), then by gender, and then finally get the average
account balance, per age bracket, per gender:

.. code:: sh

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "size": 0,
      "aggs": {
        "group_by_age": {
          "range": {
            "field": "age",
            "ranges": [
              {
                "from": 20,
                "to": 30
              },
              {
                "from": 30,
                "to": 40
              },
              {
                "from": 40,
                "to": 50
              }
            ]
          },
          "aggs": {
            "group_by_gender": {
              "terms": {
                "field": "gender"
              },
              "aggs": {
                "average_balance": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            }
          }
        }
      }
    }'

There are many other aggregation capabilities that we won’t go into
detail here. The `aggregations reference
guide <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html>`__
is a great starting point if you want to do further experimentation.

Conclusion
==========

Elasticsearch is both a simple and complex product. We’ve so far learned
the basics of what it is, how to look inside of it, and how to work with
it using some of the REST APIs. I hope that this tutorial has given you
a better understanding of what Elasticsearch is and more importantly,
inspired you to further experiment with the rest of its great features!

This section includes information on how to set up **elasticsearch** and
get it running. If you haven’t already,
`download <http://www.elasticsearch.org/download>`__ it, and then check
the `installation <#setup-installation>`__ docs.

    **Note**

    Elasticsearch can also be installed from our repositories using
    ``apt`` or ``yum``.

**Installation**

After `downloading </download>`__ the latest release and extracting it,
**elasticsearch** can be started using:

.. code:: sh

    $ bin/elasticsearch

On \*nix systems, the command will start the process in the
foreground. To run it in the background, add the ``-d`` switch to it:

.. code:: sh

    $ bin/elasticsearch -d

There are added features when using the ``elasticsearch`` shell script.
The first, which was explained earlier, is the ability to easily run the
process either in the foreground or the background.

Another feature is the ability to pass ``-X`` and ``-D`` or getopt long
style configuration parameters directly to the script. When set, they
override anything set using either ``JAVA_OPTS`` or ``ES_JAVA_OPTS``.
For example:

.. code:: sh

    $ bin/elasticsearch -Xmx2g -Xms2g -Des.index.store.type=memory --node.name=my-node

**Java version**

Elasticsearch is built using Java, and requires at least `Java
7 <http://www.oracle.com/technetwork/java/javase/downloads/index.html>`__
in order to run. Only Oracle’s Java and the OpenJDK are supported.

We recommend installing **Java 8 update 20 or later**, or **Java 7
update 55 or later**. Previous versions of Java 7 are known to have bugs
that can cause index corruption and data loss.

The version of Java to use can be configured by setting the
``JAVA_HOME`` environment variable.
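
For example (the JDK path below is only an illustration and depends on
where the JDK is installed on your system):

.. code:: sh

    export JAVA_HOME=/usr/lib/jvm/java-8-oracle
    ./bin/elasticsearch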

Configuration
=============

**Environment Variables**

Within its startup scripts, Elasticsearch comes with built-in ``JAVA_OPTS``
that are passed to the JVM when it starts. The most important settings are
``-Xmx`` to control the maximum allowed memory for the process, and
``-Xms`` to control the minimum allocated memory for the process (*in
general, the more memory allocated to the process, the better*).

Most times it is better to leave the default ``JAVA_OPTS`` as they are,
and use the ``ES_JAVA_OPTS`` environment variable in order to set /
change JVM settings or arguments.

The ``ES_HEAP_SIZE`` environment variable allows you to set the heap
memory that will be allocated to the Elasticsearch Java process. It will allocate
the same value to both min and max values, though those can be set
explicitly (not recommended) by setting ``ES_MIN_MEM`` (defaults to
``256m``), and ``ES_MAX_MEM`` (defaults to ``1g``).

It is recommended to set the min and max memory to the same value, and
enable `mlockall <#setup-configuration-memory>`__.
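
A minimal sketch (the heap size is an illustrative value; size it for
your own hardware):

.. code:: sh

    export ES_HEAP_SIZE=4g
    ./bin/elasticsearch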

**System Configuration**

**File Descriptors**

Make sure to increase the number of open file descriptors on the
machine (or for the user running elasticsearch). Setting it to 32k or
even 64k is recommended.
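
As a sketch, on Linux you can raise the limit for the current shell
before starting Elasticsearch (the value is illustrative; use
``/etc/security/limits.conf`` for a permanent change):

.. code:: sh

    ulimit -n 65535
    ./bin/elasticsearch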

In order to test how many open files the process can open, start it with
``-Des.max-open-files`` set to ``true``. This will print the number of
open files the process can open on startup.

Alternatively, you can retrieve the ``max_file_descriptors`` for each
node using the nodes info API, with:

.. code:: sh

    curl localhost:9200/_nodes/process?pretty

**Virtual memory**

Elasticsearch uses a `hybrid mmapfs / niofs <#default_fs>`__
directory by default to store its indices. The default operating system
limits on mmap counts are likely to be too low, which may result in out
of memory exceptions. On Linux, you can increase the limits by running
the following command as ``root``:

.. code:: bash

    sysctl -w vm.max_map_count=262144

To set this value permanently, update the ``vm.max_map_count`` setting
in ``/etc/sysctl.conf``.
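
For example, adding this line to ``/etc/sysctl.conf`` keeps the setting
across reboots:

.. code:: sh

    # in /etc/sysctl.conf
    vm.max_map_count=262144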

**Memory Settings**

The Linux kernel tries to use as much memory as possible for file system
caches and eagerly swaps out unused application memory, possibly
resulting in the elasticsearch process being swapped. Swapping is very
bad for performance and for node stability, so it should be avoided at
all costs.

There are three options:

-  **Disable swap**

   The simplest option is to completely disable swap. Usually
   Elasticsearch is the only service running on a box, and its memory
   usage is controlled by the ``ES_HEAP_SIZE`` environment variable.
   There should be no need to have swap enabled. On Linux systems, you
   can disable swap temporarily by running: ``sudo swapoff -a``. To
   disable it permanently, you will need to edit the ``/etc/fstab`` file
   and comment out any lines that contain the word ``swap``.

-  **Configure ``swappiness``**

   The second option is to ensure that the sysctl value
   ``vm.swappiness`` is set to ``0``. This reduces the kernel’s tendency
   to swap and should not lead to swapping under normal circumstances,
   while still allowing the whole system to swap in emergency
   conditions.

       **Note**

       From kernel version 3.5-rc1 and above, a ``swappiness`` of ``0``
       will cause the OOM killer to kill the process instead of allowing
       swapping. You will need to set ``swappiness`` to ``1`` to still
       allow swapping in emergencies.

-  **``mlockall``**

   The third option on Linux/Unix systems only, is to use
   `mlockall <http://opengroup.org/onlinepubs/007908799/xsh/mlockall.html>`__
   to try to lock the process address space into RAM, preventing any
   Elasticsearch memory from being swapped out. This can be done, by
   adding this line to the ``config/elasticsearch.yml`` file:

   .. code:: yaml

       bootstrap.mlockall: true

   After starting Elasticsearch, you can see whether this setting was
   applied successfully by checking the value of ``mlockall`` in the
   output from this request:

   .. code:: sh

       curl http://localhost:9200/_nodes/process?pretty

   If you see that ``mlockall`` is ``false``, then it means that the
   ``mlockall`` request has failed. The most probable reason is that the
   user running Elasticsearch doesn’t have permission to lock memory.
   This can be granted by running ``ulimit -l unlimited`` as ``root``
   before starting Elasticsearch.

   Another possible reason why ``mlockall`` can fail is that the
   temporary directory (usually ``/tmp``) is mounted with the ``noexec``
   option. This can be solved by specifying a new temp directory, by
   starting Elasticsearch with:

   .. code:: sh

       ./bin/elasticsearch -Djna.tmpdir=/path/to/new/dir

       **Warning**

       ``mlockall`` might cause the JVM or shell session to exit if it
       tries to allocate more memory than is available!

**Elasticsearch Settings**

**elasticsearch** configuration files can be found under the
``ES_HOME/config`` folder. The folder comes with two files, the
``elasticsearch.yml`` for configuring Elasticsearch different
`modules <#modules>`__, and ``logging.yml`` for configuring the
Elasticsearch logging.

The configuration format is `YAML <http://www.yaml.org/>`__. Here is an
example of changing the address all network based modules will use to
bind and publish to:

.. code:: yaml

    network :
        host : 10.0.0.4

**Paths**

In production use, you will almost certainly want to change paths for
data and log files:

.. code:: yaml

    path:
      logs: /var/log/elasticsearch
      data: /var/data/elasticsearch

**Cluster name**

Also, don’t forget to give your production cluster a name, which is used
to discover and auto-join other nodes:

.. code:: yaml

    cluster:
      name: <NAME OF YOUR CLUSTER>

**Node name**

You may also want to change the default node name for each node to
something like the display hostname. By default Elasticsearch will
randomly pick a Marvel character name from a list of around 3000 names
when your node starts up.

.. code:: yaml

    node:
      name: <NAME OF YOUR NODE>

**Configuration styles**

Internally, all settings are collapsed into "namespaced" settings. For
example, the above gets collapsed into ``node.name``. This means that
it is easy to support other configuration formats, for example,
`JSON <http://www.json.org>`__. If JSON is your preferred configuration
format, simply rename the ``elasticsearch.yml`` file to
``elasticsearch.json`` and add:

.. code:: js

    {
        "network" : {
            "host" : "10.0.0.4"
        }
    }

It also means that it is easy to provide the settings externally,
either using ``ES_JAVA_OPTS`` or as parameters to the ``elasticsearch``
command, for example:

.. code:: sh

    $ elasticsearch -Des.network.host=10.0.0.4

Another option is to use the ``es.default.`` prefix instead of the
``es.`` prefix, in which case the setting is only used as a default,
i.e. it applies only if the setting is not explicitly set in the
configuration file.
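
For example, a sketch of the default-prefix form (the setting name is
only an illustration):

.. code:: sh

    # node.name from elasticsearch.yml, if present, wins over this default
    $ elasticsearch -Des.default.node.name=node-1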

Another option is to use the ``${...}`` notation within the
configuration file which will resolve to an environment setting, for
example:

.. code:: js

    {
        "network" : {
            "host" : "${ES_NET_HOST}"
        }
    }

The location of the configuration file can be set externally using a
system property:

.. code:: sh

    $ elasticsearch -Des.config=/path/to/config/file

**Index Settings**

Indices created within the cluster can provide their own settings. For
example, the following creates an index with memory based storage
instead of the default file system based one (the format can be either
YAML or JSON):

.. code:: sh

    $ curl -XPUT http://localhost:9200/kimchy/ -d \
    '
    index :
        store:
            type: memory
    '

Index level settings can be set on the node level as well, for example,
within the ``elasticsearch.yml`` file, the following can be set:

.. code:: yaml

    index :
        store:
            type: memory

This means that every index that gets created on the specific node
started with the mentioned configuration will store the index in memory
**unless the index explicitly sets it**. In other words, any index level
settings override what is set in the node configuration. Of course, the
above can also be set as a "collapsed" setting, for example:

.. code:: sh

    $ elasticsearch -Des.index.store.type=memory

All of the index level configuration can be found within each `index
module <#index-modules>`__.

**Logging**

Elasticsearch uses an internal logging abstraction and comes, out of the
box, with `log4j <http://logging.apache.org/log4j/>`__. It tries to
simplify log4j configuration by using `YAML <http://www.yaml.org/>`__ to
configure it. The logging configuration file is
``config/logging.yml``.
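
For example, log levels for individual loggers can be adjusted directly
in ``config/logging.yml``; a minimal sketch (the logger names shown are
illustrative):

.. code:: yaml

    logger:
      # log action (request handling) events at DEBUG
      action: DEBUG
      # quieten a chatty third-party library
      com.amazonaws: WARN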

Running As a Service on Linux
=============================

To run Elasticsearch as a service on your operating system, the
provided packages aim to make it as easy as possible to start and stop
Elasticsearch across reboots and upgrades.

**Linux**

Currently our build automatically creates a Debian package and an RPM
package, which are available on the download page. The packages
themselves do not have any dependencies, but you have to make sure that
a JDK is installed.

Each package features a configuration file, which allows you to set the
following parameters:

``ES_USER``
    The user to run as, defaults to ``elasticsearch``.

``ES_GROUP``
    The group to run as, defaults to ``elasticsearch``.

``ES_HEAP_SIZE``
    The heap size to start with.

``ES_HEAP_NEWSIZE``
    The size of the new generation heap.

``ES_DIRECT_SIZE``
    The maximum size of the direct memory.

``MAX_OPEN_FILES``
    Maximum number of open files, defaults to ``65535``.

``MAX_LOCKED_MEMORY``
    Maximum locked memory size. Set to ``unlimited`` if you use the
    ``bootstrap.mlockall`` option in ``elasticsearch.yml``. You must
    also set ``ES_HEAP_SIZE``.

``MAX_MAP_COUNT``
    Maximum number of memory map areas a process may have. If you use
    ``mmapfs`` as index store type, make sure this is set to a high
    value. For more information, check the `linux kernel
    documentation <https://github.com/torvalds/linux/blob/master/Documentation/sysctl/vm.txt>`__
    about ``max_map_count``. This is set via ``sysctl`` before starting
    Elasticsearch. Defaults to ``65535``.

``LOG_DIR``
    Log directory, defaults to ``/var/log/elasticsearch``.

``DATA_DIR``
    Data directory, defaults to ``/var/lib/elasticsearch``.

``WORK_DIR``
    Work directory, defaults to ``/tmp/elasticsearch``.

``CONF_DIR``
    Configuration file directory (which needs to include the
    ``elasticsearch.yml`` and ``logging.yml`` files), defaults to
    ``/etc/elasticsearch``.

``CONF_FILE``
    Path to the configuration file, defaults to
    ``/etc/elasticsearch/elasticsearch.yml``.

``ES_JAVA_OPTS``
    Any additional Java options you may want to apply. This may be
    useful if you need to set the ``node.name`` property but do not
    want to change the ``elasticsearch.yml`` configuration file because
    it is distributed via a provisioning system like Puppet or Chef.
    Example: ``ES_JAVA_OPTS="-Des.node.name=search-01"``

``RESTART_ON_UPGRADE``
    Configure restart on package upgrade, defaults to ``false``. This
    means you will have to restart your Elasticsearch instance after
    installing a package manually. The reason for this is to ensure
    that upgrades in a cluster do not result in continuous shard
    reallocation, resulting in high network traffic and reduced cluster
    response times.
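
For example, a common production tweak is to pin the heap size and
allow memory locking in the package's environment file (its
per-distribution location is described below); the values here are
illustrative:

.. code:: sh

    # /etc/default/elasticsearch (Debian/Ubuntu) or /etc/sysconfig/elasticsearch (RPM)
    ES_HEAP_SIZE=2g
    MAX_LOCKED_MEMORY=unlimited
    MAX_OPEN_FILES=65535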

**Debian/Ubuntu**

The Debian package ships with everything you need as it uses standard
Debian tools like ``update-rc.d`` to define the runlevels it runs on.
The init script is placed at ``/etc/init.d/elasticsearch``, as you
would expect. The configuration file is placed at
``/etc/default/elasticsearch``.

The Debian package does not start up the service by default. The reason
for this is to prevent the instance from accidentally joining a cluster
without being configured appropriately. After installing using
``dpkg -i`` you can use the following commands to ensure that
Elasticsearch starts when the system is booted, and then start it up:

.. code:: sh

    sudo update-rc.d elasticsearch defaults 95 10
    sudo /etc/init.d/elasticsearch start

**Installing the oracle JDK**

The usual recommendation is to run the Oracle JDK with Elasticsearch.
However, Ubuntu and Debian only ship the OpenJDK due to license issues.
You can easily install the Oracle installer package though. In case you
are missing the ``add-apt-repository`` command under Debian GNU/Linux,
make sure you have at least Debian Wheezy and the package
``python-software-properties`` installed:

.. code:: sh

    sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java7-installer
    java -version

The last command should verify a successful installation of the Oracle
JDK.

**RPM based distributions**

**Using chkconfig**

Some RPM based distributions are using ``chkconfig`` to enable and
disable services. The init script is located at
``/etc/init.d/elasticsearch``, whereas the configuration file is placed
at ``/etc/sysconfig/elasticsearch``. Like the Debian package, the RPM
package is not started by default after installation; you have to do
this manually by entering the following commands:

.. code:: sh

    sudo /sbin/chkconfig --add elasticsearch
    sudo service elasticsearch start

**Using systemd**

Distributions like SUSE do not use the ``chkconfig`` tool to register
services, but rather ``systemd`` and its command ``/bin/systemctl`` to
start and stop services (at least in newer versions, otherwise use the
``chkconfig`` commands above). The configuration file is also placed at
``/etc/sysconfig/elasticsearch``. After installing the RPM, you have to
reload the systemd configuration, enable the service, and then start up
Elasticsearch:

.. code:: sh

    sudo /bin/systemctl daemon-reload
    sudo /bin/systemctl enable elasticsearch.service
    sudo /bin/systemctl start elasticsearch.service

Also note that changing the ``MAX_MAP_COUNT`` setting in
``/etc/sysconfig/elasticsearch`` does not have any effect; you will have
to change it in ``/usr/lib/sysctl.d/elasticsearch.conf`` in order to
have it applied at startup.
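
For example, to raise the limit to the value the RPM and Debian
packages use by default (``262144``), the file could contain something
like:

.. code:: sh

    # /usr/lib/sysctl.d/elasticsearch.conf
    vm.max_map_count=262144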

Running as a Service on Windows
===============================

Windows users can configure Elasticsearch to run as a service, so that
it runs in the background or starts automatically at boot without any
user interaction. This can be achieved through the ``service.bat``
script under the ``bin/`` folder, which allows one to install, remove,
manage or configure the service and potentially start and stop the
service, all from the command line.

.. code:: sh

    c:\elasticsearch-1.4.0\bin>service

    Usage: service.bat install|remove|start|stop|manager [SERVICE_ID]

The script requires one parameter (the command to execute) followed by
an optional one indicating the service id (useful when installing
multiple Elasticsearch services).

The commands available are:

``install``
    Install Elasticsearch as a service.

``remove``
    Remove the installed Elasticsearch service (and stop the service if
    started).

``start``
    Start the Elasticsearch service (if installed).

``stop``
    Stop the Elasticsearch service (if started).

``manager``
    Start a GUI for managing the installed service.

Note that the environment configuration options available during the
installation are copied and will be used during the service lifecycle.
This means any changes made to them after the installation will not be
picked up unless the service is reinstalled.
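
For example, to pick up a changed ``JAVA_HOME`` or other environment
setting, a remove followed by a fresh install is enough (a sketch using
the default service id):

.. code:: sh

    c:\elasticsearch-1.4.0\bin>service remove
    c:\elasticsearch-1.4.0\bin>service install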

Based on the architecture of the available JDK/JRE (set through
``JAVA_HOME``), the appropriate 64-bit (x64) or 32-bit (x86) service
will be installed. This information is made available during install:

.. code:: sh

    c:\elasticsearch-{version}\bin>service install
    Installing service      :  "elasticsearch-service-x64"
    Using JAVA_HOME (64-bit):  "c:\jvm\jdk1.7"
    The service 'elasticsearch-service-x64' has been installed.

**Note**

While a JRE can be used for the Elasticsearch service, due to its use
of a client VM (as opposed to a server JVM which offers better
performance for long-running applications) its usage is discouraged and
a warning will be issued.

**Customizing service settings**

There are two ways to customize the service settings:

Manager GUI
    accessible through the ``manager`` command, the GUI offers insight
    into the installed service, including its status, startup type,
    JVM, and start and stop settings, among other things. Simply
    invoking ``service.bat`` from the command line with the
    aforementioned option will open up the manager window:

|Windows Service Manager GUI|

Customizing ``service.bat``
    at its core, ``service.bat`` relies on the `Apache Commons
    Daemon <http://commons.apache.org/proper/commons-daemon/>`__ project
    to install the service. For full flexibility, such as customizing
    the user under which the service runs, one can modify the
    installation parameters accordingly. Do note that this requires
    reinstalling the service for the new settings to be applied.

    **Note**

    There is also a community supported customizable MSI installer
    available: https://github.com/salyh/elasticsearch-msi-installer (by
    Hendrik Saly).

Directory Layout
================

The directory layout of an installation is as follows:

**home**
    Home of the Elasticsearch installation.

    Setting: ``path.home``.

**bin**
    Binary scripts including ``elasticsearch`` to start a node.

    Default location: ``{path.home}/bin``.

**conf**
    Configuration files including ``elasticsearch.yml``.

    Default location: ``{path.home}/config``. Setting: ``path.conf``.

**data**
    The location of the data files of each index / shard allocated on
    the node. Can hold multiple locations.

    Default location: ``{path.home}/data``. Setting: ``path.data``.

**logs**
    Log files location.

    Default location: ``{path.home}/logs``. Setting: ``path.logs``.

**plugins**
    Plugin files location. Each plugin will be contained in a
    subdirectory.

    Default location: ``{path.home}/plugins``. Setting: ``path.plugins``.

Multiple data locations allow the data to be striped across them. The
striping is simple: whole files are placed in one of the locations, and
the location for each file is chosen based on the value of the
``index.store.distributor`` setting:

-  ``least_used`` (default) always selects the directory with the most
   available space

-  ``random`` selects directories at random. The probability of
   selecting a particular directory is proportional to the amount of
   available space in this directory.

Note that there are no multiple copies of the same data; in that sense,
it is similar to RAID 0. Though simple, it should provide a good
solution for people who don’t want to mess with RAID. Here is how it is
configured:

::

    path.data: /mnt/first,/mnt/second

Or in an array format:

::

    path.data: ["/mnt/first", "/mnt/second"]

**Default Paths**

Below are the default paths that elasticsearch will use, if not
explicitly changed.

**deb and rpm**

**home**
    Home of the Elasticsearch installation.

    Debian/Ubuntu: ``/usr/share/elasticsearch``.
    RHEL/CentOS: ``/usr/share/elasticsearch``.

**bin**
    Binary scripts including ``elasticsearch`` to start a node.

    Debian/Ubuntu: ``/usr/share/elasticsearch/bin``.
    RHEL/CentOS: ``/usr/share/elasticsearch/bin``.

**conf**
    Configuration files ``elasticsearch.yml`` and ``logging.yml``.

    Debian/Ubuntu: ``/etc/elasticsearch``.
    RHEL/CentOS: ``/etc/elasticsearch``.

**conf**
    Environment variables including heap size, file descriptors.

    Debian/Ubuntu: ``/etc/default/elasticsearch``.
    RHEL/CentOS: ``/etc/sysconfig/elasticsearch``.

**data**
    The location of the data files of each index / shard allocated on
    the node.

    Debian/Ubuntu: ``/var/lib/elasticsearch/data``.
    RHEL/CentOS: ``/var/lib/elasticsearch``.

**logs**
    Log files location.

    Debian/Ubuntu: ``/var/log/elasticsearch``.
    RHEL/CentOS: ``/var/log/elasticsearch``.

**plugins**
    Plugin files location. Each plugin will be contained in a
    subdirectory.

    Debian/Ubuntu: ``/usr/share/elasticsearch/plugins``.
    RHEL/CentOS: ``/usr/share/elasticsearch/plugins``.

**zip and tar.gz**

**home**
    Home of the Elasticsearch installation.

    Location: ``{extract.path}``.

**bin**
    Binary scripts including ``elasticsearch`` to start a node.

    Location: ``{extract.path}/bin``.

**conf**
    Configuration files ``elasticsearch.yml`` and ``logging.yml``.

    Location: ``{extract.path}/config``.

**conf**
    Environment variables including heap size, file descriptors.

    Location: ``{extract.path}/config``.

**data**
    The location of the data files of each index / shard allocated on
    the node.

    Location: ``{extract.path}/data``.

**logs**
    Log files location.

    Location: ``{extract.path}/logs``.

**plugins**
    Plugin files location. Each plugin will be contained in a
    subdirectory.

    Location: ``{extract.path}/plugins``.

Repositories
============

We also have repositories available for APT and YUM based distributions.

We have split the major versions into separate URLs to avoid accidental
upgrades across major versions. For all 0.90.x releases use 0.90 as the
version number, for 1.0.x use 1.0, for 1.1.x use 1.1, etc.

**APT**

Download and install the Public Signing Key:

.. code:: sh

    wget -qO - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -

Add the following to your ``/etc/apt/sources.list`` to enable the
repository:

.. code:: sh

    deb http://packages.elasticsearch.org/elasticsearch/1.4/debian stable main

Run ``apt-get update`` and the repository is ready for use. You can
install Elasticsearch with:

.. code:: sh

    apt-get update && apt-get install elasticsearch

**YUM**

Download and install the Public Signing Key:

.. code:: sh

    rpm --import http://packages.elasticsearch.org/GPG-KEY-elasticsearch

Add the following in your ``/etc/yum.repos.d/`` directory, in a file
named (for example) ``elasticsearch.repo``:

.. code:: sh

    [elasticsearch-1.4]
    name=Elasticsearch repository for 1.4.x packages
    baseurl=http://packages.elasticsearch.org/elasticsearch/1.4/centos
    gpgcheck=1
    gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
    enabled=1

And your repository is ready for use. You can install Elasticsearch
with:

.. code:: sh

    yum install elasticsearch

Upgrading
=========

Elasticsearch can usually be upgraded using a rolling upgrade process,
resulting in no interruption of service. This section details how to
perform both rolling and restart upgrades. To determine whether a
rolling upgrade is supported for your release, please consult this
table:

+-------------+--------------------------+--------------------------------------+
| Upgrade     | Upgrade To               | Supported Upgrade Type               |
| From        |                          |                                      |
+=============+==========================+======================================+
| 0.90.x      | 1.x                      | Restart Upgrade                      |
+-------------+--------------------------+--------------------------------------+
| < 0.90.7    | 0.90.x                   | Restart Upgrade                      |
+-------------+--------------------------+--------------------------------------+
| >= 0.90.7   | 0.90.x                   | Rolling Upgrade                      |
+-------------+--------------------------+--------------------------------------+
| 1.x         | 1.x                      | Rolling Upgrade                      |
+-------------+--------------------------+--------------------------------------+

    **Tip**

    Before upgrading Elasticsearch, it is a good idea to consult the
    `breaking changes <#breaking-changes>`__ docs.

**Back Up Your Data!**

Before performing an upgrade, it’s a good idea to back up the data on
your system. This will allow you to roll back in the event of a problem
with the upgrade. The upgrades sometimes include upgrades to the Lucene
libraries used by Elasticsearch to access the index files, and after an
index file has been updated to work with a new version of Lucene, it may
not be accessible to the versions of Lucene present in earlier
Elasticsearch releases.

**0.90.x and earlier**

To back up a running 0.90.x system, first disable index flushing. This
will prevent indices from being flushed to disk while the backup is in
process:

.. code:: sh

    $ curl -XPUT 'http://localhost:9200/_all/_settings' -d '{
        "index": {
            "translog.disable_flush": "true"
        }
    }'

Then disable reallocation. This will prevent the cluster from moving
data files from one node to another while the backup is in process:

.. code:: sh

    $ curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
        "transient" : {
            "cluster.routing.allocation.disable_allocation": "true"
        }
    }'

After reallocation and index flushing are disabled, initiate a backup of
Elasticsearch’s data path using your favorite backup method (tar,
storage array snapshots, backup software). When the backup is complete
and data no longer needs to be read from the Elasticsearch data path,
reallocation and index flushing must be re-enabled:

.. code:: sh

    $ curl -XPUT 'http://localhost:9200/_all/_settings' -d '{
        "index": {
            "translog.disable_flush": "false"
        }
    }'

    $ curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
        "transient" : {
            "cluster.routing.allocation.disable_allocation": "false"
        }
    }'

**1.0 and later**

To back up a running 1.0 or later system, it is simplest to use the
snapshot feature. Complete instructions for backup and restore with
snapshots are available
`here <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html>`__.
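
As an illustration only (the repository name and filesystem path are
examples, and the location must be reachable from every node), a
snapshot-based backup looks roughly like this:

.. code:: sh

    # Register a shared filesystem repository
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
        "type": "fs",
        "settings": {
            "location": "/mnt/backups/my_backup"
        }
    }'

    # Snapshot all indices and wait for completion
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'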

**Rolling upgrade process**

A rolling upgrade allows the ES cluster to be upgraded one node at a
time, with no observable downtime for end users. Running multiple
versions of Elasticsearch in the same cluster for any length of time
beyond that required for an upgrade is not supported, as shard
replication from the more recent version to the previous versions will
not work.

Within minor or maintenance releases after release 1.0, rolling upgrades
are supported. To perform a rolling upgrade:

-  Disable shard reallocation (optional). This is done to allow for a
   faster startup after cluster shutdown. If this step is not performed,
   the nodes will immediately start trying to replicate shards to each
   other on startup and will spend a lot of time on wasted I/O. With
   shard reallocation disabled, the nodes will join the cluster with
   their indices intact, without attempting to rebalance. After startup
   is complete, reallocation will be turned back on.

This syntax applies to Elasticsearch 1.0 and later:

.. code:: sh

            curl -XPUT localhost:9200/_cluster/settings -d '{
                    "transient" : {
                        "cluster.routing.allocation.enable" : "none"
                    }
            }'

-  Shut down a single node within the cluster.

.. code:: sh

    curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'

-  Confirm that all shards are correctly reallocated to the remaining
   running nodes.

-  Upgrade the stopped node. To upgrade using a zip or compressed
   tarball from elasticsearch.org:

   -  Extract the zip or tarball to a new directory, usually in the same
      volume as the current Elasticsearch installation. Do not overwrite
      the existing installation, as the downloaded archive will contain
      a default elasticsearch.yml file and will overwrite your existing
      configuration.

   -  Copy the configuration files from the old Elasticsearch
      installation’s config directory to the new Elasticsearch
      installation’s config directory. Move data files from the old
      Elasticsearch installation’s data directory if necessary. If data
      files are not located within the tarball’s extraction directory,
      they will not have to be moved.

   -  The simplest solution for moving from one version to another is to
      have a symbolic link for *elasticsearch* that points to the
      currently running version. This link can be easily updated and
      will provide a stable access point to the most recent version.
      Update this symbolic link if it is being used.

-  To upgrade using a ``.deb`` or ``.rpm`` package:

   -  Use ``rpm`` or ``dpkg`` to install the new package. All files
      should be placed in their proper locations, and config files
      should not be overwritten.

-  Start the now upgraded node. Confirm that it joins the cluster.

-  Re-enable shard reallocation:

.. code:: sh

            curl -XPUT localhost:9200/_cluster/settings -d '{
                    "transient" : {
                        "cluster.routing.allocation.enable" : "all"
                    }
            }'

-  Observe that all shards are properly allocated on all nodes.
   Balancing may take some time (see the health-check sketch below).

-  Repeat this process for all remaining nodes.
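
A quick way to watch node membership and shard allocation during these
steps is the cluster health API, for example:

.. code:: sh

    # Reports status (green/yellow/red), node count and unassigned shards
    curl -XGET 'http://localhost:9200/_cluster/health?pretty'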

It may be possible to perform the upgrade by installing the new software
while the service is running. This would reduce downtime by ensuring the
service was ready to run on the new version as soon as it is stopped on
the node being upgraded. This can be done by installing the new version
in its own directory and using the symbolic link method outlined above.
It is important to test this procedure first to be sure that
site-specific configuration data and production indices will not be
overwritten during the upgrade process.
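
A minimal sketch of the symbolic link approach (version numbers and
directories are illustrative):

.. code:: sh

    # Install the new version next to the old one
    tar -xzf elasticsearch-1.4.1.tar.gz -C /opt

    # Repoint the stable path, then restart the node
    ln -sfn /opt/elasticsearch-1.4.1 /opt/elasticsearch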

**Cluster restart upgrade process**

Elasticsearch releases prior to 1.0 and releases after 1.0 are not
compatible with each other, so a rolling upgrade is not possible. In
order to upgrade a pre-1.0 system to 1.0 or later, a full cluster stop
and start is required. In order to perform this upgrade:

-  Disable shard reallocation (optional). This is done to allow for a
   faster startup after cluster shutdown. If this step is not performed,
   the nodes will immediately start trying to replicate shards to each
   other on startup and will spend a lot of time on wasted I/O. With
   shard reallocation disabled, the nodes will join the cluster with
   their indices intact, without attempting to rebalance. After startup
   is complete, reallocation will be turned back on.

This syntax is from versions prior to 1.0:

.. code:: sh

        curl -XPUT localhost:9200/_cluster/settings -d '{
            "persistent" : {
                "cluster.routing.allocation.disable_allocation" : true
            }
        }'

-  Stop all Elasticsearch services on all nodes in the cluster.

.. code:: sh

        curl -XPOST 'http://localhost:9200/_shutdown'

-  On the first node to be upgraded, extract the archive or install the
   new package as described above in the Rolling Upgrades section.
   Repeat for all nodes.

-  After upgrading Elasticsearch on all nodes is complete, the cluster
   can be started by starting each node individually.

   -  Start master-eligible nodes first, one at a time. Verify that a
      quorum has been reached and a master has been elected before
      proceeding.

   -  Start data nodes and then client nodes one at a time, verifying
      that they successfully join the cluster.

-  When the cluster is running and reaches a yellow state, shard
   reallocation can be enabled.

This syntax is from release 1.0 and later:

.. code:: sh

        curl -XPUT localhost:9200/_cluster/settings -d '{
            "persistent" : {
                "cluster.routing.allocation.disable_allocation": false,
                "cluster.routing.allocation.enable" : "all"
            }
        }'

The cluster upgrade can be streamlined by installing the software before
stopping cluster services. If this is done, testing must be performed to
ensure that no production data or configuration files are overwritten
prior to restart.

This section discusses the changes that you need to be aware of when
migrating your application from one version of Elasticsearch to another.

As a general rule:

-  Migration between major versions — e.g. ``1.x`` to ``2.x`` — 
   requires a `full cluster restart <#restart-upgrade>`__.

-  Migration between minor versions — e.g. ``1.x`` to ``1.y`` — can be
   performed by `upgrading one node at a time <#rolling-upgrades>`__.

See the breaking changes sections below for more info.

Breaking changes in 2.0
=======================

This section discusses the changes that you need to be aware of when
migrating your application to Elasticsearch 2.0.

Indices API
-----------

The `get alias api <#alias-retrieving>`__ will, by default, produce an
error response if a requested index does not exist. This change brings
the defaults for this API in line with the other Indices APIs. The
multi-index options (see `Multiple Indices <#multi-index>`__) can be
used on a request to change this behavior.

Partial fields
--------------

Partial fields were deprecated since 1.0.0beta1 in favor of `source
filtering <#search-request-source-filtering>`__.

More Like This Field
--------------------

The More Like This Field query has been removed in favor of the `More
Like This Query <#query-dsl-mlt-query>`__ restricted to a specific
``field``.

Routing
-------

The default hash function that is used for routing has been changed from
djb2 to murmur3. This change should be transparent unless you relied on
very specific properties of djb2. This will help ensure a better balance
of the document counts between shards.

In addition, the following node settings related to routing have been
deprecated:

``cluster.routing.operation.hash.type``
    This was an undocumented setting that allowed configuring which
    hash function to use for routing. ``murmur3`` is now enforced on
    new indices.

``cluster.routing.operation.use_type``
    This was an undocumented setting that allowed taking the ``_type``
    of the document into account when computing its shard (default:
    ``false``). ``false`` is now enforced on new indices.

Breaking changes in 1.x
=======================

This section discusses the changes that you need to be aware of when
migrating your application from Elasticsearch 1.x to Elasticsearch 1.y.

**Facets**

Facets are deprecated and will be removed in a future release. You are
encouraged to migrate to `aggregations <#search-aggregations>`__
instead.

1.4
---

Percolator
~~~~~~~~~~

In indices created with version ``1.4.0`` or later, percolation queries
can only refer to fields that already exist in the mappings in that
index. There are two ways to make sure that a field mapping exists:

-  Add or update a mapping via the `create
   index <#indices-create-index>`__ or `put
   mapping <#indices-put-mapping>`__ apis.

-  Percolate a document before registering a query. Percolating a
   document can add field mappings dynamically, in the same way as
   happens when indexing a document.

Aliases
~~~~~~~

`Aliases <#indices-aliases>`__ can include
`filters <#query-dsl-filters>`__ which are automatically applied to any
search performed via the alias. `Filtered aliases <#filtered>`__ created
with version ``1.4.0`` or later can only refer to field names which
exist in the mappings of the index (or indices) pointed to by the alias.

Add or update a mapping via the `create index <#indices-create-index>`__
or `put mapping <#indices-put-mapping>`__ apis.

Indices APIs
~~~~~~~~~~~~

The `get warmer api <#warmer-retrieving>`__ will return a section for
``warmers`` even if there are no warmers. This ensures that the
following two examples are equivalent:

.. code:: js

    curl -XGET 'http://localhost:9200/_all/_warmers'

    curl -XGET 'http://localhost:9200/_warmers'

The `get alias api <#alias-retrieving>`__ will return a section for
``aliases`` even if there are no aliases. This ensures that the
following two examples are equivalent:

.. code:: js

    curl -XGET 'http://localhost:9200/_all/_aliases'

    curl -XGET 'http://localhost:9200/_aliases'

The `get mapping api <#indices-get-mapping>`__ will return a section for
``mappings`` even if there are no mappings. This ensures that the
following two examples are equivalent:

.. code:: js

    curl -XGET 'http://localhost:9200/_all/_mappings'

    curl -XGET 'http://localhost:9200/_mappings'

Zen discovery
~~~~~~~~~~~~~

Each cluster must have an elected master node in order to be fully
operational. Once a node loses its elected master node it will reject
some or all operations.

On versions before ``1.4.0.Beta1`` all operations are rejected when a
node loses its elected master. From ``1.4.0.Beta1`` only write
operations will be rejected by default. Read operations will still be
served based on the information available to the node, which may result
in partial and possibly stale results. If the default is undesired then
the pre ``1.4.0.Beta1`` behaviour can be enabled; see
`no-master-block <#modules-discovery-zen>`__.

Breaking changes in 1.0
=======================

This section discusses the changes that you need to be aware of when
migrating your application to Elasticsearch 1.0.

System and settings
-------------------

-  Elasticsearch now runs in the foreground by default. There is no more
   ``-f`` flag on the command line. Instead, to run elasticsearch as a
   daemon, use the ``-d`` flag:

.. code:: sh

    ./bin/elasticsearch -d

-  Command line settings can now be passed without the ``-Des.`` prefix,
   for instance:

.. code:: sh

    ./bin/elasticsearch --node.name=search_1 --cluster.name=production

-  Elasticsearch on 64 bit Linux now uses ```mmapfs`` <#mmapfs>`__ by
   default. Make sure that you set
   ```MAX_MAP_COUNT`` <#setup-service>`__ to a sufficiently high number.
   The RPM and Debian packages default this value to ``262144``.

-  The RPM and Debian packages no longer start Elasticsearch by default.

-  The ``cluster.routing.allocation`` settings (``disable_allocation``,
   ``disable_new_allocation`` and ``disable_replica_location``) have
   been `replaced by the single setting <#modules-cluster>`__:

   .. code:: yaml

       cluster.routing.allocation.enable: all|primaries|new_primaries|none

Stats and Info APIs
-------------------

The ```cluster_state`` <#cluster-state>`__,
```nodes_info`` <#cluster-nodes-info>`__,
```nodes_stats`` <#cluster-nodes-stats>`__ and
```indices_stats`` <#indices-stats>`__ APIs have all been changed to
make their format more RESTful and less clumsy.

For instance, if you just want the ``nodes`` section of the
``cluster_state``, instead of:

.. code:: sh

    GET /_cluster/state?filter_metadata&filter_routing_table&filter_blocks

you now use:

.. code:: sh

    GET /_cluster/state/nodes

Similarly, for the ``nodes_stats`` API, if you want the ``transport``
and ``http`` metrics only, instead of:

.. code:: sh

    GET /_nodes/stats?clear&transport&http

you now use:

.. code:: sh

    GET /_nodes/stats/transport,http

See the links above for full details.

Indices APIs
------------

The ``mapping``, ``alias``, ``settings``, and ``warmer`` index APIs are
all similar but there are subtle differences in the order of the URL and
the response body. For instance, adding a mapping and a warmer look
slightly different:

.. code:: sh

    PUT /{index}/{type}/_mapping
    PUT /{index}/_warmer/{name}

These URLs have been unified as:

.. code:: sh

    PUT /{indices}/_mapping/{type}
    PUT /{indices}/_alias/{name}
    PUT /{indices}/_warmer/{name}

    GET /{indices}/_mapping/{types}
    GET /{indices}/_alias/{names}
    GET /{indices}/_settings/{names}
    GET /{indices}/_warmer/{names}

    DELETE /{indices}/_mapping/{types}
    DELETE /{indices}/_alias/{names}
    DELETE /{indices}/_warmer/{names}

All of the ``{indices}``, ``{types}`` and ``{names}`` parameters can be
replaced by:

-  ``_all``, ``*`` or blank (ie left out altogether), all of which mean
   “all”

-  wildcards like ``test*``

-  comma-separated lists: ``index_1,test_*``

The only exception is ``DELETE`` which doesn’t accept blank (missing)
parameters. If you want to delete something, you should be specific.

Similarly, the return values for ``GET`` have been unified with the
following rules:

-  Only return values that exist. If you try to ``GET`` a mapping which
   doesn’t exist, then the result will be an empty object: ``{}``. We no
   longer throw a ``404`` if the requested mapping/warmer/alias/setting
   doesn’t exist.

-  The response format always has the index name, then the section, then
   the element name, for instance:

   .. code:: json

       {
           "my_index": {
               "mappings": {
                   "my_type": {...}
               }
           }
       }

   This is a breaking change for the ``get_mapping`` API.

In the future we will also provide plural versions to allow putting
multiple mappings etc in a single request.

See ```put-mapping`` <#indices-put-mapping>`__,
```get-mapping`` <#indices-get-mapping>`__,
```get-field-mapping`` <#indices-get-field-mapping>`__,
```delete-mapping`` <#indices-delete-mapping>`__,
```update-settings`` <#indices-update-settings>`__,
```get-settings`` <#indices-get-settings>`__,
```warmers`` <#indices-warmers>`__, and
```aliases`` <#indices-aliases>`__ for more details.

Index request
-------------

Previously a document could be indexed as itself, or wrapped in an outer
object which specified the ``type`` name:

.. code:: json

    PUT /my_index/my_type/1
    {
      "my_type": {
         ... doc fields ...
      }
    }

This led to some ambiguity when a document also included a field with
the same name as the ``type``. We no longer accept the outer ``type``
wrapper, but this behaviour can be reenabled on an index-by-index basis
with the setting: ``index.mapping.allow_type_wrapper``.
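
For example, the old behaviour could be re-enabled when creating an
index (the index name is illustrative):

.. code:: sh

    # "my_index" is an illustrative index name
    curl -XPUT 'http://localhost:9200/my_index' -d '{
        "settings": {
            "index.mapping.allow_type_wrapper": true
        }
    }'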

Search requests
---------------

While the ``search`` API takes a top-level ``query`` parameter, the
```count`` <#search-count>`__,
```delete-by-query`` <#docs-delete-by-query>`__ and
```validate-query`` <#search-validate>`__ requests expected the whole
body to be a query. These now *require* a top-level ``query`` parameter:

.. code:: json

    GET /_count
    {
        "query": {
            "match": {
                "title": "Interesting stuff"
            }
        }
    }

Also, the top-level ``filter`` parameter in search has been renamed to
```post_filter`` <#search-request-post-filter>`__, to indicate that it
should not be used as the primary way to filter search results (use a
```filtered`` query <#query-dsl-filtered-query>`__ instead), but only to
filter results AFTER aggregations have been calculated.

This example counts the top colors in all matching docs, but only
returns docs with color ``red``:

.. code:: json

    GET /_search
    {
        "query": {
            "match_all": {}
        },
        "aggs": {
            "colors": {
                "terms": { "field": "color" }
            }
        },
        "post_filter": {
            "term": {
                "color": "red"
            }
        }
    }

Multi-fields
------------

Multi-fields are dead! Long live multi-fields! Well, the field type
``multi_field`` has been removed. Instead, any of the core field types
(excluding ``object`` and ``nested``) now accept a ``fields`` parameter.
It’s the same thing, but nicer. Instead of:

.. code:: json

    "title": {
        "type": "multi_field",
        "fields": {
            "title": { "type": "string" },
            "raw":   { "type": "string", "index": "not_analyzed" }
        }
    }

you can now write:

.. code:: json

    "title": {
        "type": "string",
        "fields": {
            "raw":   { "type": "string", "index": "not_analyzed" }
        }
    }

Existing multi-fields will be upgraded to the new format automatically.

Also, instead of having to use the arcane ``path`` and ``index_name``
parameters in order to index multiple fields into a single “custom
``_all`` field”, you can now use the ```copy_to``
parameter <#copy-to>`__.

Stopwords
---------

Previously, the ```standard`` <#analysis-standard-analyzer>`__ and
```pattern`` <#analysis-pattern-analyzer>`__ analyzers used the list of
English stopwords by default, which caused some hard to debug indexing
issues. Now they are set to use the empty stopwords list (ie ``_none_``)
instead.
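
If you relied on the old behaviour, an English stopword list can be
configured per index; a minimal sketch (the index name is
illustrative):

.. code:: sh

    # "my_index" is an illustrative index name; "_english_" restores the old list
    curl -XPUT 'http://localhost:9200/my_index' -d '{
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {
                        "type": "standard",
                        "stopwords": "_english_"
                    }
                }
            }
        }
    }'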

Dates without years
-------------------

When dates are specified without a year, for example:
``Dec 15 10:00:00`` they are treated as dates in 2000 during indexing
and range searches… except for the upper included bound ``lte`` where
they were treated as dates in 1970! Now, all `dates without
years <https://github.com/elasticsearch/elasticsearch/issues/4451>`__
use ``1970`` as the default.

Parameters
----------

-  Geo queries used to use ``miles`` as the default unit. And we `all
   know what happened at
   NASA <http://en.wikipedia.org/wiki/Mars_Climate_Orbiter>`__ because
   of that decision. The new default unit is
   ```meters`` <https://github.com/elasticsearch/elasticsearch/issues/4515>`__.

-  For all queries that support *fuzziness*, the ``min_similarity``,
   ``fuzziness`` and ``edit_distance`` parameters have been unified as
   the single parameter ``fuzziness``. See the ``fuzziness``
   documentation for details of accepted values.

-  The ``ignore_missing`` parameter has been replaced by the
   ``expand_wildcards``, ``ignore_unavailable`` and ``allow_no_indices``
   parameters, all of which have sensible defaults. See `the multi-index
   docs <#multi-index>`__ for more.

-  An index name (or pattern) is now required for destructive operations
   like deleting indices:

   .. code:: sh

       # v0.90 - delete all indices:
       DELETE /

       # v1.0 - delete all indices:
       DELETE /_all
       DELETE /*

   Setting ``action.destructive_requires_name`` to ``true`` provides
   further safety by disabling wildcard expansion on destructive
   actions.

Return values
-------------

-  The ``ok`` return value has been removed from all response bodies as
   it added no useful information.

-  The ``found``, ``not_found`` and ``exists`` return values have been
   unified as ``found`` on all relevant APIs.

-  Field values, in response to the
   ```fields`` <#search-request-fields>`__ parameter, are now always
   returned as arrays. A field could have single or multiple values,
   which meant that sometimes they were returned as scalars and
   sometimes as arrays. By always returning arrays, this simplifies user
   code. The only exception to this rule is when ``fields`` is used to
   retrieve metadata like the ``routing`` value, which are always
   singular. Metadata fields are always returned as scalars.

   The ``fields`` parameter is intended to be used for retrieving stored
   fields, rather than for fields extracted from the ``_source``. That
   means that it can no longer be used to return whole objects and it no
   longer accepts the ``_source.fieldname`` format. For these you should
   use the ```_source`` ``_source_include`` and
   ``_source_exclude`` <#search-request-source-filtering>`__ parameters
   instead.

-  Settings, like ``index.analysis.analyzer.default`` are now returned
   as proper nested JSON objects, which makes them easier to work with
   programmatically:

   .. code:: json

       {
           "index": {
               "analysis": {
                   "analyzer": {
                       "default": xxx
                   }
               }
           }
       }

   You can choose to return them in flattened format by passing
   ``?flat_settings`` in the query string.

-  The ```analyze`` <#indices-analyze>`__ API no longer supports the
   text response format, but does support JSON and YAML.

Deprecations
------------

-  The ``text`` query has been removed. Use the
   ```match`` <#query-dsl-match-query>`__ query instead.

-  The ``field`` query has been removed. Use the
   ```query_string`` <#query-dsl-query-string-query>`__ query instead.

-  Per-document boosting with the ``_boost`` field has been removed. You
   can use the ```function_score`` <#query-dsl-function-score-query>`__
   instead.

-  The ``path`` parameter in mappings has been deprecated. Use the
   ```copy_to`` <#copy-to>`__ parameter instead.

-  The ``custom_score`` and ``custom_boost_score`` queries are no
   longer supported. You can use
   ```function_score`` <#query-dsl-function-score-query>`__ instead.

Percolator
----------

The percolator has been redesigned and because of this the dedicated
``_percolator`` index is no longer used by the percolator, but instead
the percolator works with a dedicated ``.percolator`` type. Read the
`redesigned
percolator <http://www.elasticsearch.org/blog/percolator-redesign-blog-post/>`__
blog post for the reasons why the percolator has been redesigned.

Elasticsearch will **not** delete the ``_percolator`` index when
upgrading; however, the percolate api will no longer use the queries
stored in the ``_percolator`` index. In order to use the already stored
queries, you can just re-index the queries from the ``_percolator``
index into any index under the reserved ``.percolator`` type. The
format in which the percolate queries were stored has **not** been
changed, so a simple script that does a scan search to retrieve all the
percolator queries and then does a bulk request into another index
should be sufficient.

The **elasticsearch** REST APIs are exposed using:

-  `JSON over HTTP <#modules-http>`__,

-  `thrift <#modules-thrift>`__,

-  `memcached <#modules-memcached>`__.

The conventions listed in this chapter can be applied throughout the
REST API, unless otherwise specified.


Multiple Indices
================

Most APIs that refer to an ``index`` parameter support execution across
multiple indices, using simple ``test1,test2,test3`` notation (or
``_all`` for all indices). They also support wildcards, for example
``test*``, and the ability to "add" (``+``) and "remove" (``-``), for
example: ``+test*,-test3``.

All multi-index APIs support the following URL query string parameters:

``ignore_unavailable``
    Controls whether to ignore any specified indices that are
    unavailable; this includes indices that don’t exist or closed
    indices. Either ``true`` or ``false`` can be specified.

``allow_no_indices``
    Controls whether to fail if a wildcard indices expression resolves
    into no concrete indices. Either ``true`` or ``false`` can be
    specified. For example, if the wildcard expression ``foo*`` is
    specified and no indices are available that start with ``foo``,
    then depending on this setting the request will fail. This setting
    is also applicable when ``_all``, ``*`` or no index has been
    specified.

``expand_wildcards``
    Controls what kind of concrete indices wildcard indices expressions
    expand to. If ``open`` is specified then the wildcard expression is
    expanded to only open indices, and if ``closed`` is specified then
    the wildcard expression is expanded only to closed indices. Both
    values (``open,closed``) can be specified to expand to all indices.

    If ``none`` is specified then wildcard expansion will be disabled
    and if ``all`` is specified, wildcard expressions will expand to all
    indices (this is equivalent to specifying ``open,closed``).

The default settings for the above parameters depend on the API being
used.

    **Note**

    Single index APIs such as the Document APIs and the `single-index
    ``alias`` APIs <#indices-aliases>`__ do not support multiple
    indices.

Common options
==============

The following options can be applied to all of the REST APIs.

**Pretty Results**

When appending ``?pretty=true`` to any request made, the JSON returned
will be pretty formatted (use it for debugging only!). Another option is
to set ``format=yaml`` which will cause the result to be returned in the
(sometimes) more readable yaml format.

**Human readable output**

Statistics are returned in a format suitable for humans (eg
``"exists_time": "1h"`` or ``"size": "1kb"``) and for computers (eg
``"exists_time_in_millis": 3600000`` or ``"size_in_bytes": 1024``). The
human readable values can be turned off by adding ``?human=false`` to
the query string. This makes sense when the stats results are being
consumed by a monitoring tool, rather than intended for human
consumption. The default for the ``human`` flag is ``false``.

**Flat Settings**

The ``flat_settings`` flag affects rendering of the lists of settings.
When the ``flat_settings`` flag is ``true``, settings are returned in a
flat format:

.. code:: js

    {
      "persistent" : { },
      "transient" : {
        "discovery.zen.minimum_master_nodes" : "1"
      }
    }

When the ``flat_settings`` flag is ``false`` settings are returned in a
more human readable structured format:

.. code:: js

    {
      "persistent" : { },
      "transient" : {
        "discovery" : {
          "zen" : {
            "minimum_master_nodes" : "1"
          }
        }
      }
    }

By default, ``flat_settings`` is set to ``false``.
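
For example, the cluster settings shown above could be retrieved in the
flat form using the cluster settings API:

.. code:: js

    curl -XGET 'http://localhost:9200/_cluster/settings?flat_settings=true&pretty=true'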

**Parameters**

REST parameters (when using HTTP, they map to HTTP URL parameters)
follow the convention of using underscore casing.

**Boolean Values**

All REST API parameters (both request parameters and JSON body) support
providing a boolean "false" as the values ``false``, ``0``, ``no`` and
``off``. All other values are considered "true". Note, this is not
related to whether fields within an indexed document are treated as
boolean fields.

**Number Values**

All REST APIs support providing numeric parameters as a ``string``, in
addition to supporting the native JSON number types.

**Time units**

Whenever durations need to be specified, eg for a ``timeout`` parameter,
the duration can be specified as a whole number representing time in
milliseconds, or as a time value like ``2d`` for 2 days. The supported
units are:

+------------+---------------------------------------------------------------+
| ``y``      | Year                                                          |
+------------+---------------------------------------------------------------+
| ``M``      | Month                                                         |
+------------+---------------------------------------------------------------+
| ``w``      | Week                                                          |
+------------+---------------------------------------------------------------+
| ``d``      | Day                                                           |
+------------+---------------------------------------------------------------+
| ``h``      | Hour                                                          |
+------------+---------------------------------------------------------------+
| ``m``      | Minute                                                        |
+------------+---------------------------------------------------------------+
| ``s``      | Second                                                        |
+------------+---------------------------------------------------------------+

**Distance Units**

Wherever distances need to be specified, such as the ``distance``
parameter in the geo distance filter or the ``precision`` parameter in
the geohash cell filter, the default unit if none is specified is the
meter. Distances can be specified in other units, such as ``"1km"`` or
``"2mi"`` (2 miles).

The full list of units is listed below:

+------------+---------------------------------------------------------------+
| Mile       | ``mi`` or ``miles``                                           |
+------------+---------------------------------------------------------------+
| Yard       | ``yd`` or ``yards``                                           |
+------------+---------------------------------------------------------------+
| Feet       | ``ft`` or ``feet``                                            |
+------------+---------------------------------------------------------------+
| Inch       | ``in`` or ``inch``                                            |
+------------+---------------------------------------------------------------+
| Kilometer  | ``km`` or ``kilometers``                                      |
+------------+---------------------------------------------------------------+
| Meter      | ``m`` or ``meters``                                           |
+------------+---------------------------------------------------------------+
| Centimeter | ``cm`` or ``centimeters``                                     |
+------------+---------------------------------------------------------------+
| Millimeter | ``mm`` or ``millimeters``                                     |
+------------+---------------------------------------------------------------+
| Nautical   | ``NM``, ``nmi`` or ``nauticalmiles``                          |
| mile       |                                                               |
+------------+---------------------------------------------------------------+
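
As a sketch, assuming an index with a ``pin.location`` field mapped as
``geo_point`` (index, type and field names here are illustrative), a
distance expressed in kilometers could be used in a ``geo_distance``
filter like this:

.. code:: js

    curl -XGET 'http://localhost:9200/my_locations/location/_search' -d '{
        "query": {
            "filtered": {
                "query": { "match_all": {} },
                "filter": {
                    "geo_distance": {
                        "distance": "2km",
                        "pin.location": {
                            "lat": 40.73,
                            "lon": -74.1
                        }
                    }
                }
            }
        }
    }'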

**Fuzziness**

Some queries and APIs support parameters to allow inexact *fuzzy*
matching, using the ``fuzziness`` parameter. The ``fuzziness`` parameter
is context sensitive which means that it depends on the type of the
field being queried:

**Numeric, date and IPv4 fields**

When querying numeric, date and IPv4 fields, ``fuzziness`` is
interpreted as a ``+/-`` margin. It behaves like a range query where:

::

    -fuzziness <= field value <= +fuzziness

The ``fuzziness`` parameter should be set to a numeric value, eg ``2``
or ``2.0``. A ``date`` field interprets a long as milliseconds, but also
accepts a string containing a time value — \ ``"1h"`` — as explained in
the time units section above. An ``ip`` field accepts a long or another
IPv4 address (which will be converted into a long).

**String fields**

When querying ``string`` fields, ``fuzziness`` is interpreted as a
`Levenshtein Edit
Distance <http://en.wikipedia.org/wiki/Levenshtein_distance>`__ — the
number of one character changes that need to be made to one string to
make it the same as another string.

The ``fuzziness`` parameter can be specified as:

``0``, ``1``, ``2``
    the maximum allowed Levenshtein Edit Distance (or number of edits)

``AUTO``
    generates an edit distance based on the length of the term. For
    lengths:

    ``0..1``
        must match exactly

    ``1..4``
        one edit allowed

    ``>4``
        two edits allowed

    ``AUTO`` should generally be the preferred value for ``fuzziness``.

``0.0..1.0``
    converted into an edit distance using the formula:
    ``length(term) * (1.0 -
    fuzziness)``, eg a ``fuzziness`` of ``0.6`` with a term of length 10
    would result in an edit distance of ``4``. Note: in all APIs except
    for the completion suggester, the maximum allowed edit distance is
    ``2``.
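
For example, a ``match`` query could use ``AUTO`` fuzziness to tolerate
misspellings (the ``message`` field is taken from the index API
examples; the misspelled query text is purely illustrative):

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '{
        "query": {
            "match": {
                "message": {
                    "query": "tryng out Elasticsaerch",
                    "fuzziness": "AUTO"
                }
            }
        }
    }'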

**Result Casing**

All REST APIs accept the ``case`` parameter. When set to ``camelCase``,
all field names in the result will be returned in camel casing,
otherwise, underscore casing will be used. Note, this does not apply to
the source document indexed.
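
For example, assuming the ``twitter/tweet/1`` document from the index
API examples exists:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1?case=camelCase&pretty=true'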

**JSONP**

By default JSONP responses are disabled.

When enabled, all REST APIs accept a ``callback`` parameter resulting in
a `JSONP <http://en.wikipedia.org/wiki/JSONP>`__ result. You can enable
this behavior by adding the following to ``config.yaml``:

::

    http.jsonp.enable: true

Please note, when enabled, due to the architecture of Elasticsearch,
this may pose a security risk. Under some circumstances, an attacker may
be able to exfiltrate data from your Elasticsearch server if they’re
able to force your browser to make a JSONP request on your behalf (e.g.
by including a ``<script>`` tag on an untrusted site with a legitimate
query against a local Elasticsearch server).

**Request body in query string**

For libraries that don’t accept a request body for non-POST requests,
you can pass the request body as the ``source`` query string parameter
instead.
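
For example, a simple search body could be passed in the ``source``
parameter. Note the ``-g`` flag, which disables curl's URL globbing so
the literal braces are sent; depending on the client, the JSON may need
to be URL-encoded instead:

.. code:: js

    curl -XGET -g 'http://localhost:9200/_search?source={"query":{"match_all":{}}}'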

URL-based access control
========================

Many users use a proxy with URL-based access control to secure access to
Elasticsearch indices. For `multi-search <#search-multi-search>`__,
`multi-get <#docs-multi-get>`__ and `bulk <#docs-bulk>`__ requests, the
user has the choice of specifying an index in the URL and on each
individual request within the request body. This can make URL-based
access control challenging.

To prevent the user from overriding the index which has been specified
in the URL, add this setting to the ``config.yml`` file:

::

    rest.action.multi.allow_explicit_index: false

The default value is ``true``, but when set to ``false``, Elasticsearch
will reject requests that have an explicit index specified in the
request body.

This section describes the following CRUD APIs:

-  Index API

-  Get API

-  Delete API

-  Update API

-  Multi Get API

-  Bulk API

-  Delete By Query API

    **Note**

    All CRUD APIs are single-index APIs. The ``index`` parameter accepts
    a single index name, or an ``alias`` which points to a single index.

Index API
=========

The index API adds or updates a typed JSON document in a specific index,
making it searchable. The following example inserts the JSON document
into the "twitter" index, under a type called "tweet" with an id of 1:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'

The result of the above index operation is:

.. code:: js

    {
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "1",
        "_version" : 1,
        "created" : true
    }

**Automatic Index Creation**

The index operation automatically creates an index if it has not been
created before (check out the `create index
API <#indices-create-index>`__ for manually creating an index), and also
automatically creates a dynamic type mapping for the specific type if
one has not yet been created (check out the `put
mapping <#indices-put-mapping>`__ API for manually creating a type
mapping).

The mapping itself is very flexible and is schema-free. New fields and
objects will automatically be added to the mapping definition of the
type specified. Check out the `mapping <#mapping>`__ section for more
information on mapping definitions.

Note that the format of the JSON document can also include the type
(very handy when using JSON mappers) if the
``index.mapping.allow_type_wrapper`` setting is set to true, for
example:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter' -d '{
      "settings": {
        "index": {
          "mapping.allow_type_wrapper": true
        }
      }
    }'
    {"acknowledged":true}

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
        "tweet" : {
            "user" : "kimchy",
            "post_date" : "2009-11-15T14:12:12",
            "message" : "trying out Elasticsearch"
        }
    }'

Automatic index creation can be disabled by setting
``action.auto_create_index`` to ``false`` in the config file of all
nodes. Automatic mapping creation can be disabled by setting
``index.mapper.dynamic`` to ``false`` in the config files of all nodes
(or on the specific index settings).

Automatic index creation can include a pattern based white/black list,
for example, set ``action.auto_create_index`` to
``+aaa*,-bbb*,+ccc*,-*`` (+ meaning allowed, and - meaning disallowed).
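
For example, the corresponding entries in the node configuration file
might look like this (the patterns are the ones quoted above):

::

    action.auto_create_index: +aaa*,-bbb*,+ccc*,-*
    index.mapper.dynamic: false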

**Versioning**

Each indexed document is given a version number. The associated
``version`` number is returned as part of the response to the index API
request. The index API optionally allows for `optimistic concurrency
control <http://en.wikipedia.org/wiki/Optimistic_concurrency_control>`__
when the ``version`` parameter is specified. This will control the
version of the document the operation is intended to be executed
against. A good example of a use case for versioning is performing a
transactional read-then-update. Specifying a ``version`` from the
document initially read ensures no changes have happened in the meantime
(when reading in order to update, it is recommended to set
``preference`` to ``_primary``). For example:

.. code:: js

    curl -XPUT 'localhost:9200/twitter/tweet/1?version=2' -d '{
        "message" : "elasticsearch now has versioning support, double cool!"
    }'

**NOTE:** versioning is completely real time, and is not affected by the
near real time aspects of search operations. If no version is provided,
then the operation is executed without any version checks.

By default, internal versioning is used that starts at 1 and increments
with each update, deletes included. Optionally, the version number can
be supplemented with an external value (for example, if maintained in a
database). To enable this functionality, ``version_type`` should be set
to ``external``. The value provided must be a numeric, long value
greater than or equal to 0, and less than around 9.2e+18. When using the
external version type, instead of checking for a matching version
number, the system checks to see if the version number passed to the
index request is greater than the version of the currently stored
document. If true, the document will be indexed and the new version
number used. If the value provided is less than or equal to the stored
document’s version number, a version conflict will occur and the index
operation will fail.

A nice side effect is that there is no need to maintain strict ordering
of async indexing operations executed as a result of changes to a source
database, as long as version numbers from the source database are used.
Even the simple case of updating the elasticsearch index using data from
a database is simplified if external versioning is used, as only the
latest version will be used if the index operations are out of order for
whatever reason.

**Version types**

Next to the ``internal`` & ``external`` version types explained above,
Elasticsearch also supports other types for specific use cases. Here is
an overview of the different version types and their semantics.

``internal``
    only index the document if the given version is identical to the
    version of the stored document.

``external`` or ``external_gt``
    only index the document if the given version is strictly higher than
    the version of the stored document **or** if there is no existing
    document. The given version will be used as the new version and will
    be stored with the new document. The supplied version must be a
    non-negative long number.

``external_gte``
    only index the document if the given version is **equal** or higher
    than the version of the stored document. If there is no existing
    document the operation will succeed as well. The given version will
    be used as the new version and will be stored with the new document.
    The supplied version must be a non-negative long number.

``force``
    the document will be indexed regardless of the version of the stored
    document or if there is no existing document. The given version will
    be used as the new version and will be stored with the new document.
    This version type is typically used for correcting errors.

**NOTE**: The ``external_gte`` & ``force`` version types are meant for
special use cases and should be used with care. If used incorrectly,
they can result in loss of data.

**Operation Type**

The index operation also accepts an ``op_type`` that can be used to
force a ``create`` operation, allowing for "put-if-absent" behavior.
When ``create`` is used, the index operation will fail if a document by
that id already exists in the index.

Here is an example of using the ``op_type`` parameter:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'

Another option to specify ``create`` is to use the following URI:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'

**Automatic ID Generation**

The index operation can be executed without specifying the id. In such a
case, an id will be generated automatically. In addition, the
``op_type`` will automatically be set to ``create``. Here is an example
(note the **POST** used instead of **PUT**):

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/tweet/' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'

The result of the above index operation is:

.. code:: js

    {
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "6a8ca01c-7896-48e9-81cc-9f70661fcb32",
        "_version" : 1,
        "created" : true
    }

**Routing**

By default, shard placement — or ``routing`` — is controlled by using a
hash of the document’s id value. For more explicit control, the value
fed into the hash function used by the router can be directly specified
on a per-operation basis using the ``routing`` parameter. For example:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/tweet?routing=kimchy' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'

In the example above, the "tweet" document is routed to a shard based on
the ``routing`` parameter provided: "kimchy".

When setting up explicit mapping, the ``_routing`` field can be
optionally used to direct the index operation to extract the routing
value from the document itself. This does come at the (very minimal)
cost of an additional document parsing pass. If the ``_routing`` mapping
is defined, and set to be ``required``, the index operation will fail if
no routing value is provided or extracted.
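
As a sketch, a mapping that requires routing and extracts it from the
``user`` field of the document (the field name is illustrative) might
look like this:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d '{
        "tweet" : {
            "_routing" : {
                "required" : true,
                "path" : "user"
            }
        }
    }'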

**Parents & Children**

A child document can be indexed by specifying its parent when indexing.
For example:

.. code:: js

    $ curl -XPUT localhost:9200/blogs/blog_tag/1122?parent=1111 -d '{
        "tag" : "something"
    }'

When indexing a child document, the routing value is automatically set
to be the same as its parent, unless the routing value is explicitly
specified using the ``routing`` parameter.

**Timestamp**

A document can be indexed with a ``timestamp`` associated with it. The
``timestamp`` value of a document can be set using the ``timestamp``
parameter. For example:

.. code:: js

    $ curl -XPUT localhost:9200/twitter/tweet/1?timestamp=2009-11-15T14%3A12%3A12 -d '{
        "user" : "kimchy",
        "message" : "trying out Elasticsearch"
    }'

If the ``timestamp`` value is not provided externally or in the
``_source``, the ``timestamp`` will be automatically set to the date the
document was processed by the indexing chain. More information can be
found on the `\_timestamp mapping page <#mapping-timestamp-field>`__.

**TTL**

A document can be indexed with a ``ttl`` (time to live) associated with
it. Expired documents will be expunged automatically. The expiration
date that will be set for a document with a provided ``ttl`` is relative
to the ``timestamp`` of the document, meaning it can be based on the
time of indexing or on any time provided. The provided ``ttl`` must be
strictly positive and can be a number (in milliseconds) or any valid
time value as shown in the following examples:

.. code:: js

    curl -XPUT 'http://localhost:9200/twitter/tweet/1?ttl=86400000' -d '{
        "user": "kimchy",
        "message": "Trying out elasticsearch, so far so good?"
    }'

.. code:: js

    curl -XPUT 'http://localhost:9200/twitter/tweet/1?ttl=1d' -d '{
        "user": "kimchy",
        "message": "Trying out elasticsearch, so far so good?"
    }'

.. code:: js

    curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
        "_ttl": "1d",
        "user": "kimchy",
        "message": "Trying out elasticsearch, so far so good?"
    }'

More information can be found on the `\_ttl mapping
page <#mapping-ttl-field>`__.

**Distributed**

The index operation is directed to the primary shard based on its route
(see the Routing section above) and performed on the actual node
containing this shard. After the primary shard completes the operation,
if needed, the update is distributed to applicable replicas.

**Write Consistency**

To prevent writes from taking place on the "wrong" side of a network
partition, by default, index operations only succeed if a quorum
(>replicas/2+1) of active shards are available. This default can be
overridden on a node-by-node basis using the
``action.write_consistency`` setting. To alter this behavior
per-operation, the ``consistency`` request parameter can be used.

Valid write consistency values are ``one``, ``quorum``, and ``all``.

Note, for the case where the number of replicas is 1 (a total of 2
copies of the data), the default behavior is to succeed if 1 copy (the
primary) can perform the write.
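
For example, to require all copies of the shard to be available before
the index operation is acknowledged:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/1?consistency=all' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'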

**Asynchronous Replication**

By default, the index operation only returns after all shards within the
replication group have indexed the document (sync replication). To
enable asynchronous replication, causing the replication process to take
place in the background, set the ``replication`` parameter to ``async``.
When asynchronous replication is used, the index operation will return
as soon as the operation succeeds on the primary shard.

The default value for the ``replication`` setting is ``sync`` and this
default can be overridden on a node-by-node basis using the
``action.replication_type`` setting. Valid values for replication type
are ``sync`` and ``async``. To alter this behavior per-operation, the
``replication`` request parameter can be used.

**Refresh**

To refresh the shard (not the whole index) immediately after the
operation occurs, so that the document appears in search results
immediately, the ``refresh`` parameter can be set to ``true``. Setting
this option to ``true`` should **ONLY** be done after careful thought
and verification that it does not lead to poor performance, both from an
indexing and a search standpoint. Note, getting a document using the get
API is completely realtime.

**Timeout**

The primary shard assigned to perform the index operation might not be
available when the index operation is executed. Some reasons for this
might be that the primary shard is currently recovering from a gateway
or undergoing relocation. By default, the index operation will wait on
the primary shard to become available for up to 1 minute before failing
and responding with an error. The ``timeout`` parameter can be used to
explicitly specify how long it waits. Here is an example of setting it
to 5 minutes:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/1?timeout=5m' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'

Get API
=======

The get API allows you to get a typed JSON document from the index based
on its id. The following example gets a JSON document from an index
called twitter, under a type called tweet, with an id of 1:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1'

The result of the above get operation is:

.. code:: js

    {
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "1",
        "_version" : 1,
        "found": true,
        "_source" : {
            "user" : "kimchy",
            "postDate" : "2009-11-15T14:12:12",
            "message" : "trying out Elasticsearch"
        }
    }

The above result includes the ``_index``, ``_type``, ``_id`` and
``_version`` of the document we wish to retrieve, including the actual
``_source`` of the document if it could be found (as indicated by the
``found`` field in the response).

The API also allows you to check for the existence of a document using
``HEAD``, for example:

.. code:: js

    curl -XHEAD -i 'http://localhost:9200/twitter/tweet/1'

**Realtime**

By default, the get API is realtime, and is not affected by the refresh
rate of the index (when data will become visible for search).

In order to disable realtime GET, the ``realtime`` parameter can be set
to ``false``, or it can be globally defaulted to ``false`` by setting
``action.get.realtime`` to ``false`` in the node configuration.
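
For example:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1?realtime=false'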

When getting a document, one can specify ``fields`` to fetch from it.
They will, when possible, be fetched as stored fields (fields mapped as
stored in the mapping). When using realtime GET, there is no notion of
stored fields (at least for a period of time, basically, until the next
flush), so they will be extracted from the source itself (note, even if
source is not enabled). It is a good practice to assume that the fields
will be loaded from source when using realtime GET, even if the fields
are stored.

**Optional Type**

The get API allows for ``_type`` to be optional. Set it to ``_all`` in
order to fetch the first document matching the id across all types.
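
For example:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/_all/1'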

**Source filtering**

By default, the get operation returns the contents of the ``_source``
field unless you have used the ``fields`` parameter or if the
``_source`` field is disabled. You can turn off ``_source`` retrieval by
using the ``_source`` parameter:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1?_source=false'

If you only need one or two fields from the complete ``_source``, you
can use the ``_source_include`` & ``_source_exclude`` parameters to
include or filter out the parts you need. This can be especially
helpful with large documents where partial retrieval can save on network
overhead. Both parameters take a comma separated list of fields or
wildcard expressions. Example:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1?_source_include=*.id&_source_exclude=entities'

If you only want to specify includes, you can use a shorter notation:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1?_source=*.id,retweeted'

**Fields**

The get operation allows specifying a set of stored fields that will be
returned by passing the ``fields`` parameter. For example:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1?fields=title,content'

For backward compatibility, if the requested fields are not stored, they
will be fetched from the ``_source`` (parsed and extracted). This
functionality has been replaced by the `source
filtering <#get-source-filtering>`__ parameter.

Field values fetched from the document itself are always returned as an
array. Metadata fields like the ``_routing`` and ``_parent`` fields are
never returned as an array.

Also, only leaf fields can be returned via the ``fields`` option. So
object fields can’t be returned and such requests will fail.

**Generated fields**

If no refresh has occurred between indexing and the GET request, GET
will access the transaction log to fetch the document. However, some
fields are generated only when indexing. If you try to access a field
that is only generated when indexing, you will get an exception
(default). You can choose to ignore fields that are generated if the
transaction log is accessed by setting
``ignore_errors_on_generated_fields=true``.
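
For example (the requested field name is illustrative):

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1?fields=title&ignore_errors_on_generated_fields=true'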

**Getting the \_source directly**

Use the ``/{index}/{type}/{id}/_source`` endpoint to get just the
``_source`` field of the document, without any additional content around
it. For example:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1/_source'

You can also use the same source filtering parameters to control which
parts of the ``_source`` will be returned:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1/_source?_source_include=*.id&_source_exclude=entities'

Note, there is also a HEAD variant for the \_source endpoint to
efficiently test for document existence. Curl example:

.. code:: js

    curl -XHEAD -i 'http://localhost:9200/twitter/tweet/1/_source'

**Routing**

When indexing with routing control (as described in the index API), the
routing value should also be provided in order to get a document. For
example:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1?routing=kimchy'

The above will get a tweet with id 1, but will be routed based on the
user. Note, issuing a get without the correct routing will cause the
document not to be fetched.

**Preference**

Controls a ``preference`` of which shard replicas to execute the get
request on. By default, the operation is randomized between the shard
replicas.

The ``preference`` can be set to:

``_primary``
    The operation will go and be executed only on the primary shards.

``_local``
    The operation will prefer to be executed on a local allocated shard
    if possible.

Custom (string) value
    A custom value will be used to guarantee that the same shards will
    be used for the same custom value. This can help with "jumping
    values" when hitting different shards in different refresh states. A
    sample value can be something like the web session id, or the user
    name.

**Refresh**

The ``refresh`` parameter can be set to ``true`` in order to refresh the
relevant shard before the get operation and make it searchable. Setting
it to ``true`` should be done after careful thought and verification
that this does not cause a heavy load on the system (and slows down
indexing).

**Distributed**

The get operation gets hashed into a specific shard id. It then gets
redirected to one of the replicas within that shard id and returns the
result. The replicas are the primary shard and its replicas within that
shard id group. This means that the more replicas we have, the better
GET scaling we will have.

**Versioning support**

You can use the ``version`` parameter to retrieve the document only if
its current version is equal to the specified one. This behavior is the
same for all version types with the exception of version type ``FORCE``
which always retrieves the document.

Note that Elasticsearch does not store older versions of documents. Only
the current version can be retrieved.
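
For example:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1?version=1'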

Delete API
==========

The delete API allows you to delete a typed JSON document from a
specific index based on its id. The following example deletes the JSON
document from an index called twitter, under a type called tweet, with
an id of 1:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/1'

The result of the above delete operation is:

.. code:: js

    {
        "found" : true,
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "1",
        "_version" : 2
    }

**Versioning**

Each document indexed is versioned. When deleting a document, the
``version`` can be specified to make sure the relevant document we are
trying to delete is actually being deleted and it has not changed in the
meantime. Every write operation executed on a document, deletes
included, causes its version to be incremented.
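
For example:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/1?version=2'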

**Routing**

When indexing with routing control (as described above), the routing
value should also be provided in order to delete a document. For
example:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/1?routing=kimchy'

The above will delete a tweet with id 1, but will be routed based on the
user. Note, issuing a delete without the correct routing will cause the
document not to be deleted.

Many times, the routing value is not known when deleting a document. For
those cases, when specifying the ``_routing`` mapping as ``required``,
and no routing value is specified, the delete will be broadcasted
automatically to all shards.

**Parent**

The ``parent`` parameter can be set, which will basically be the same as
setting the routing parameter.

Note that deleting a parent document does not automatically delete its
children. One way of deleting all child documents given a parent’s id is
to perform a `delete by query <#docs-delete-by-query>`__ on the child
index with the automatically generated (and indexed) field \_parent,
which is in the format parent\_type#parent\_id.
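
As a sketch, assuming the parent type of ``blog_tag`` documents is
``blog`` (an assumption; the parent/child example above only shows the
child type and a parent id of ``1111``), the children could be removed
like this:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/blogs/blog_tag/_query' -d '{
        "query" : {
            "term" : { "_parent" : "blog#1111" }
        }
    }'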

**Automatic index creation**

The delete operation automatically creates an index if it has not been
created before (check out the `create index
API <#indices-create-index>`__ for manually creating an index), and also
automatically creates a dynamic type mapping for the specific type if it
has not been created before (check out the `put
mapping <#indices-put-mapping>`__ API for manually creating type
mapping).

**Distributed**

The delete operation gets hashed into a specific shard id. It then gets
redirected into the primary shard within that id group, and replicated
(if needed) to shard replicas within that id group.

**Replication Type**

The replication of the operation can be done in an asynchronous manner
to the replicas (the operation will return once it has been executed on
the primary shard). The ``replication`` parameter can be set to
``async`` (defaults to ``sync``) in order to enable it.

**Write Consistency**

Control if the operation will be allowed to execute based on the number
of active shards within that partition (replication group). The values
allowed are ``one``, ``quorum``, and ``all``. The parameter to set it is
``consistency``, and it defaults to the node level setting of
``action.write_consistency`` which in turn defaults to ``quorum``.

For example, in an index with N shards and 2 replicas, there will have
to be at least 2 active shards within the relevant partition
(``quorum``) for the operation to succeed. With N shards and 1 replica,
a single active shard is enough (in this case, ``one`` and ``quorum``
are the same).

**Refresh**

The ``refresh`` parameter can be set to ``true`` in order to refresh the
relevant primary and replica shards after the delete operation has
occurred and make it searchable. Setting it to ``true`` should be done
after careful thought and verification that this does not cause a heavy
load on the system (and slows down indexing).

**Timeout**

The primary shard assigned to perform the delete operation might not be
available when the delete operation is executed. Some reasons for this
might be that the primary shard is currently recovering from a gateway
or undergoing relocation. By default, the delete operation will wait on
the primary shard to become available for up to 1 minute before failing
and responding with an error. The ``timeout`` parameter can be used to
explicitly specify how long it waits. Here is an example of setting it
to 5 minutes:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/1?timeout=5m'

Update API
==========

The update API allows you to update a document based on a script
provided. The operation gets the document (collocated with the shard)
from the index, runs the script (with optional script language and
parameters), and indexes back the result (it also allows the document
to be deleted, or the operation to be ignored). It uses versioning to
make sure no updates have happened during the "get" and "reindex".

Note, this operation still means a full reindex of the document, it just
removes some network roundtrips and reduces the chances of version
conflicts between the get and the index. The ``_source`` field needs to
be enabled for this feature to work.

For example, let's index a simple doc:

.. code:: js

    curl -XPUT localhost:9200/test/type1/1 -d '{
        "counter" : 1,
        "tags" : ["red"]
    }'

Now, we can execute a script that would increment the counter:

.. code:: js

    curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
        "script" : "ctx._source.counter += count",
        "params" : {
            "count" : 4
        }
    }'

We can also add a tag to the list of tags (note, if the tag already
exists, it will still be added, since it is a list):

.. code:: js

    curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
        "script" : "ctx._source.tags += tag",
        "params" : {
            "tag" : "blue"
        }
    }'

We can also add a new field to the document:

.. code:: js

    curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
        "script" : "ctx._source.name_of_new_field = \"value_of_new_field\""
    }'

We can also remove a field from the document:

.. code:: js

    curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
        "script" : "ctx._source.remove(\"name_of_field\")"
    }'

And, we can delete the doc if the tags contain blue, or ignore (noop):

.. code:: js

    curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
        "script" : "ctx._source.tags.contains(tag) ? ctx.op = \"delete\" : ctx.op = \"none\"",
        "params" : {
            "tag" : "blue"
        }
    }'

**Note**: Be aware of MVEL and handling of ternary operators and
assignments. Assignment operations have lower precedence than the
ternary operator. Compare the following statements:

.. code:: js

    // Will NOT update the tags array
    ctx._source.tags.contains(tag) ? ctx.op = \"none\" : ctx._source.tags += tag
    // Will update
    ctx._source.tags.contains(tag) ? (ctx.op = \"none\") : ctx._source.tags += tag
    // Also works
    if (ctx._source.tags.contains(tag)) { ctx.op = \"none\" } else { ctx._source.tags += tag }

The update API also supports passing a partial document, which will be
merged into the existing document (simple recursive merge, inner merging
of objects, replacing core "keys/values" and arrays). For example:

.. code:: js

    curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
        "doc" : {
            "name" : "new_name"
        }
    }'

If both ``doc`` and ``script`` are specified, then ``doc`` is ignored.
It is best to put the fields of your partial document in the script
itself.

By default, if ``doc`` is specified then the document is always updated
even if the merging process doesn’t cause any changes. Specifying
``detect_noop`` as ``true`` will cause Elasticsearch to check if there
are changes and, if there aren’t any, turn the update request into a
noop.
For example:

.. code:: js

    curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
        "doc" : {
            "name" : "new_name"
        },
        "detect_noop": true
    }'

If ``name`` was ``new_name`` before the request was sent then the entire
update request is ignored.

**Upserts**

There is also support for ``upsert``. If the document does not already
exist, the content of the ``upsert`` element will be used to index the
fresh doc:

.. code:: js

    curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
        "script" : "ctx._source.counter += count",
        "params" : {
            "count" : 4
        },
        "upsert" : {
            "counter" : 1
        }
    }'

If the document does not exist you may want your update script to run
anyway in order to initialize the document contents using business logic
unknown to the client. In this case pass the new ``scripted_upsert``
parameter with the value ``true``.

.. code:: js

    curl -XPOST 'localhost:9200/sessions/session/dh3sgudg8gsrgl/_update' -d '{
        "script_id" : "my_web_session_summariser",
        "scripted_upsert":true,
        "params" : {
            "pageViewEvent" : {
                "url":"foo.com/bar",
                "response":404,
                "time":"2014-01-01 12:32"
            }
        },
        "upsert" : {
        }
    }'

The default ``scripted_upsert`` setting is ``false`` meaning the script
is not executed for inserts. However, in scenarios like the one above we
may be using a non-trivial script stored using the new "indexed scripts"
feature. The script may be deriving properties like the duration of our
web session based on observing multiple page view events so the client
can supply a blank "upsert" document and allow the script to fill in
most of the details using the events passed in the ``params`` element.

Last, the upsert facility also supports ``doc_as_upsert``, so that the
provided document will be inserted if the document does not already
exist. This reduces the amount of data that needs to be sent to
Elasticsearch.

.. code:: js

    curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
        "doc" : {
            "name" : "new_name"
        },
        "doc_as_upsert" : true
    }'

**Parameters**

The update operation supports similar parameters as the index API,
including:

``routing``
    Sets the routing that will be used to route the document to the
    relevant shard.

``parent``
    Simply sets the routing.

``timeout``
    Timeout waiting for a shard to become available.

``replication``
    The replication type for the delete/index operation (sync or
    async).

``consistency``
    The write consistency of the index/delete operation.

``refresh``
    Refresh the relevant primary and replica shards (not the whole
    index) immediately after the operation occurs, so that the updated
    document appears in search results immediately.

``fields``
    Return the relevant fields from the updated document. Supports
    ``_source`` to return the full updated source.

``version`` & ``version_type``
    The update API uses Elasticsearch’s versioning support internally
    to make sure the document doesn’t change during the update. You can
    use the ``version`` parameter to specify that the document should
    only be updated if its version matches the one specified. By
    setting the version type to ``force`` you can force the new version
    of the document after the update (use with care! with ``force``
    there is no guarantee the document didn’t change). Version types
    ``external`` & ``external_gte`` are not supported.

It also supports ``retry_on_conflict``, which controls how many times to
retry if there is a version conflict between getting the document and
indexing / deleting it. Defaults to ``0``.
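
For example, to retry the counter update shown earlier up to 5 times on
version conflicts:

.. code:: js

    curl -XPOST 'localhost:9200/test/type1/1/_update?retry_on_conflict=5' -d '{
        "script" : "ctx._source.counter += count",
        "params" : {
            "count" : 4
        }
    }'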

It also allows you to update the ``ttl`` of a document using
``ctx._ttl`` and the timestamp using ``ctx._timestamp``. Note that if
the timestamp is not updated and not extracted from the ``_source`` it
will be set to the update date.

Multi Get API
=============

The multi get API allows you to get multiple documents based on an
index, type (optional) and id (and possibly routing). The response
includes a ``docs`` array with all the fetched documents, each element
similar in structure to a document provided by the `get <#docs-get>`__
API. Here is an example:

.. code:: js

    curl 'localhost:9200/_mget' -d '{
        "docs" : [
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "1"
            },
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "2"
            }
        ]
    }'

The ``mget`` endpoint can also be used against an index (in which case
it is not required in the body):

.. code:: js

    curl 'localhost:9200/test/_mget' -d '{
        "docs" : [
            {
                "_type" : "type",
                "_id" : "1"
            },
            {
                "_type" : "type",
                "_id" : "2"
            }
        ]
    }'

And type:

.. code:: js

    curl 'localhost:9200/test/type/_mget' -d '{
        "docs" : [
            {
                "_id" : "1"
            },
            {
                "_id" : "2"
            }
        ]
    }'

In which case, the ``ids`` element can directly be used to simplify the
request:

.. code:: js

    curl 'localhost:9200/test/type/_mget' -d '{
        "ids" : ["1", "2"]
    }'

**Optional Type**

The mget API allows for ``_type`` to be optional. Set it to ``_all`` or
leave it empty in order to fetch the first document matching the id
across all types.

If you don’t set the type and have many documents sharing the same
``_id``, you will end up getting only the first matching document.

For example, if you have a document 1 within typeA and typeB then the
following request will give you back the same document twice:

.. code:: js

    curl 'localhost:9200/test/_mget' -d '{
        "ids" : ["1", "1"]
    }'

In that case you need to explicitly set the ``_type``:

.. code:: js

    GET /test/_mget/
    {
      "docs" : [
            {
                "_type":"typeA",
                "_id" : "1"
            },
            {
                "_type":"typeB",
                "_id" : "1"
            }
        ]
    }

**Source filtering**

By default, the ``_source`` field will be returned for every document
(if stored). Similar to the `get <#get-source-filtering>`__ API, you can
retrieve only parts of the ``_source`` (or not at all) by using the
``_source`` parameter. You can also use the url parameters
``_source``,\ ``_source_include`` & ``_source_exclude`` to specify
defaults, which will be used when there are no per-document
instructions.

For example:

.. code:: js

    curl 'localhost:9200/_mget' -d '{
        "docs" : [
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "1",
                "_source" : false
            },
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "2",
                "_source" : ["field3", "field4"]
            },
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "3",
                "_source" : {
                    "include": ["user"],
                    "exclude": ["user.location"]
                }
            }
        ]
    }'

**Fields**

Specific stored fields can be specified to be retrieved per document,
similar to the `fields <#get-fields>`__ parameter of the Get API.
For example:

.. code:: js

    curl 'localhost:9200/_mget' -d '{
        "docs" : [
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "1",
                "fields" : ["field1", "field2"]
            },
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "2",
                "fields" : ["field3", "field4"]
            }
        ]
    }'

Alternatively, you can specify the ``fields`` parameter in the query
string as a default to be applied to all documents.

.. code:: js

    curl 'localhost:9200/test/type/_mget?fields=field1,field2' -d '{
        "docs" : [
            {
                "_id" : "1" 
            },
            {
                "_id" : "2",
                "fields" : ["field3", "field4"] 
            }
        ]
    }'

In the above example, the first document returns ``field1`` and
``field2`` (the query string default), while the second document
returns ``field3`` and ``field4`` as it overrides the default with its
own ``fields`` parameter.

**Generated fields**

See the Generated fields section of the `get <#docs-get>`__ API for
fields that are generated only when indexing.

**Routing**

You can also specify routing value as a parameter:

.. code:: js

    curl 'localhost:9200/_mget?routing=key1' -d '{
        "docs" : [
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "1",
                "_routing" : "key2"
            },
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "2"
            }
        ]
    }'

In this example, document ``test/type/2`` will be fetched from the shard
corresponding to routing key ``key1``, but document ``test/type/1`` will
be fetched from the shard corresponding to routing key ``key2``.

**Security**

See the URL-based access control section above.

Bulk API
========

The bulk API makes it possible to perform many index/delete operations
in a single API call. This can greatly increase the indexing speed.

Some of the officially supported clients provide helpers to assist with
bulk requests and reindexing of documents from one index to another:

Perl
    See
    `Search::Elasticsearch::Bulk <https://metacpan.org/pod/Search::Elasticsearch::Bulk>`__
    and
    `Search::Elasticsearch::Scroll <https://metacpan.org/pod/Search::Elasticsearch::Scroll>`__

Python
    See
    `elasticsearch.helpers.\* <http://elasticsearch-py.readthedocs.org/en/master/helpers.html>`__

The REST API endpoint is ``/_bulk``, and it expects the following JSON
structure:

.. code:: js

    action_and_meta_data\n
    optional_source\n
    action_and_meta_data\n
    optional_source\n
    ....
    action_and_meta_data\n
    optional_source\n

**NOTE**: the final line of data must end with a newline character
``\n``.

The possible actions are ``index``, ``create``, ``delete`` and
``update``. ``index`` and ``create`` expect a source on the next line,
and have the same semantics as the ``op_type`` parameter to the standard
index API (i.e. create will fail if a document with the same index and
type exists already, whereas index will add or replace a document as
necessary). ``delete`` does not expect a source on the following line,
and has the same semantics as the standard delete API. ``update``
expects that the partial doc, upsert and script and its options are
specified on the next line.

If you’re providing text file input to ``curl``, you **must** use the
``--data-binary`` flag instead of plain ``-d``. The latter doesn’t
preserve newlines. Example:

.. code:: js

    $ cat requests
    { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
    { "field1" : "value1" }
    $ curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
    {"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1}}]}

Because this format uses literal ``\n``'s as delimiters, please be sure
that the JSON actions and sources are not pretty printed. Here is an
example of a correct sequence of bulk commands:

.. code:: js

    { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
    { "field1" : "value1" }
    { "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
    { "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
    { "field1" : "value3" }
    { "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
    { "doc" : {"field2" : "value2"} }

In the above example ``doc`` for the ``update`` action is a partial
document, which will be merged with the already stored document.

The endpoints are ``/_bulk``, ``/{index}/_bulk``, and
``{index}/{type}/_bulk``. When the index or the index/type are provided,
they will be used by default on bulk items that don’t provide them
explicitly.

A note on the format. The idea here is to make processing of this as
fast as possible. As some of the actions will be redirected to other
shards on other nodes, only the ``action_and_meta_data`` line is parsed
on the receiving node side.

Client libraries using this protocol should try and strive to do
something similar on the client side, and reduce buffering as much as
possible.

The response to a bulk action is a large JSON structure with the
individual results of each action that was performed. The failure of a
single action does not affect the remaining actions.

There is no "correct" number of actions to perform in a single bulk
call. You should experiment with different settings to find the optimum
size for your particular workload.

If using the HTTP API, make sure that the client does not send HTTP
chunks, as this will slow things down.

**Versioning**

Each bulk item can include the version value using the
``_version``/``version`` field. It automatically follows the behavior of
the index / delete operation based on the ``_version`` mapping. It also
supports the ``version_type``/``_version_type`` parameter (see
`versioning <#index-versioning>`__).

**Routing**

Each bulk item can include the routing value using the
``_routing``/``routing`` field. It automatically follows the behavior of
the index / delete operation based on the ``_routing`` mapping.

**Parent**

Each bulk item can include the parent value using the
``_parent``/``parent`` field. It automatically follows the behavior of
the index / delete operation based on the ``_parent`` / ``_routing``
mapping.

**Timestamp**

Each bulk item can include the timestamp value using the
``_timestamp``/``timestamp`` field. It automatically follows the
behavior of the index operation based on the ``_timestamp`` mapping.

**TTL**

Each bulk item can include the ttl value using the ``_ttl``/``ttl``
field. It automatically follows the behavior of the index operation
based on the ``_ttl`` mapping.

**Write Consistency**

When making bulk calls, you can require a minimum number of active
shards in the partition through the ``consistency`` parameter. The
values allowed are ``one``, ``quorum``, and ``all``. It defaults to the
node level setting of ``action.write_consistency``, which in turn
defaults to ``quorum``.

For example, in an index with N shards and 2 replicas, there will have
to be at least 2 active shards within the relevant partition
(``quorum``) for the operation to succeed. With N shards and 1 replica,
a single active shard is enough (in this case, ``one`` and ``quorum``
are the same).

**Refresh**

The ``refresh`` parameter can be set to ``true`` in order to refresh the
relevant primary and replica shards immediately after the bulk operation
has occurred and make it searchable, instead of waiting for the normal
refresh interval to expire. Setting it to ``true`` can trigger
additional load, and may slow down indexing.

**Update**

When using the ``update`` action, ``_retry_on_conflict`` can be used as
a field in the action itself (not in the extra payload line) to specify
how many times an update should be retried in the case of a version
conflict.

The ``update`` action payload supports the following options: ``doc``
(partial document), ``upsert``, ``doc_as_upsert``, ``script``,
``params`` (for script), ``lang`` (for script). See the update
documentation for details on the options. Curl example with update
actions:

.. code:: js

    { "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
    { "doc" : {"field" : "value"} }
    { "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
    { "script" : "ctx._source.counter += param1", "lang" : "js", "params" : {"param1" : 1}, "upsert" : {"counter" : 1}}
    { "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
    { "doc" : {"field" : "value"}, "doc_as_upsert" : true }

**Security**

See the URL-based access control section above.

Delete By Query API
===================

The delete by query API allows you to delete documents from one or more
indices and one or more types based on a query. The query can either be
provided using a simple query string as a parameter, or using the `Query
DSL <#query-dsl>`__ defined within the request body. Here is an example:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query?q=user:kimchy'

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }
    '

    **Note**

    The query being sent in the body must be nested in a ``query`` key,
    same as the `search api <#search-search>`__ works

Both of the above examples end up doing the same thing, which is to
delete all tweets from the twitter index for a certain user. The result
of the commands is:

.. code:: js

    {
        "_indices" : {
            "twitter" : {
                "_shards" : {
                    "total" : 5,
                    "successful" : 5,
                    "failed" : 0
                }
            }
        }
    }

Note, delete by query bypasses versioning support. Also, it is not
recommended to delete large chunks of the data in an index; many times
it’s better to simply reindex into a new index.

**Multiple Indices and Types**

The delete by query API can be applied to multiple types within an
index, and across multiple indices. For example, we can delete all
documents across all types within the twitter index:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/_query?q=user:kimchy'

We can also delete within specific types:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet,user/_query?q=user:kimchy'

We can also delete all tweets with a certain tag across several indices
(for example, when each user has his own index):

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/kimchy,elasticsearch/_query?q=tag:wow'

Or even delete across all indices:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/_all/_query?q=tag:wow'

**Request Parameters**

When executing a delete by query using the query parameter ``q``, the
query passed is a query string using Lucene query parser. There are
additional parameters that can be passed:

+--------------------------------------+--------------------------------------+
| Name                                 | Description                          |
+======================================+======================================+
| df                                   | The default field to use when no     |
|                                      | field prefix is defined within the   |
|                                      | query.                               |
+--------------------------------------+--------------------------------------+
| analyzer                             | The analyzer name to be used when    |
|                                      | analyzing the query string.          |
+--------------------------------------+--------------------------------------+
| default\_operator                    | The default operator to be used, can |
|                                      | be ``AND`` or ``OR``. Defaults to    |
|                                      | ``OR``.                              |
+--------------------------------------+--------------------------------------+

**Request Body**

The delete by query can use the `Query DSL <#query-dsl>`__ within its
body in order to express the query that should be executed and delete
all matching documents. The body content can also be passed as a REST
parameter named ``source``.
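
For example, a sketch of passing the same term query through the
``source`` parameter instead of the body (depending on the client, the
JSON may need to be URL encoded):

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query?source={"query":{"term":{"user":"kimchy"}}}'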

**Distributed**

The delete by query API is broadcast across all primary shards, and from
there, replicated across all shard replicas.

**Routing**

The routing value (a comma separated list of the routing values) can be
specified to control which shards the delete by query request will be
executed on.
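
For example, a sketch that restricts the delete to the shards that the
routing value ``kimchy`` maps to (assuming the documents were indexed
with that routing value):

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query?routing=kimchy&q=user:kimchy'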

**Replication Type**

The replication of the operation can be done in an asynchronous manner
to the replicas (the operation will return once it has been executed on
the primary shard). The ``replication`` parameter can be set to
``async`` (defaults to ``sync``) in order to enable it.
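
For example, a sketch of the same delete by query returning as soon as
the primary shards have executed it:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query?replication=async&q=user:kimchy'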

**Write Consistency**

Controls whether the operation will be allowed to execute based on the
number of active shards within that partition (replication group). The
values allowed are ``one``, ``quorum``, and ``all``. The parameter to set
it is ``consistency``, and it defaults to the node level setting of
``action.write_consistency``, which in turn defaults to ``quorum``.

For example, in an index with N shards and 2 replicas, there will have
to be at least 2 active shards within the relevant partition (``quorum``)
for the operation to succeed. With N shards and 1 replica, only a single
shard needs to be active (in this case, ``one`` and ``quorum`` are the
same).
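
For example, a sketch that only requires a single active shard per
replication group:

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query?consistency=one&q=user:kimchy'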

**Limitations**

The delete by query does not support the following queries and filters:
``has_child``, ``has_parent`` and ``top_children``.

Term Vectors
============

Returns information and statistics on terms in the fields of a
particular document. The document could be stored in the index or
artificially provided by the user. Term vectors are
`realtime <#realtime>`__ by default, not near realtime. This can be
changed by setting the ``realtime`` parameter to ``false``.

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true'

Optionally, you can specify the fields for which the information is
retrieved either with a parameter in the url

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?fields=text,...'

or by adding the requested fields in the request body (see example
below). Fields can also be specified with wildcards, in a similar way to
the `multi match query <#query-dsl-multi-match-query>`__.

**Return values**

Three types of values can be requested: *term information*, *term
statistics* and *field statistics*. By default, all term information and
field statistics are returned for all fields but no term statistics.

**Term information**

-  term frequency in the field (always returned)

-  term positions (``positions`` : true)

-  start and end offsets (``offsets`` : true)

-  term payloads (``payloads`` : true), as base64 encoded bytes

If the requested information wasn’t stored in the index, it will be
computed on the fly if possible. Additionally, term vectors can be
computed for documents that do not exist in the index at all, but are
instead provided by the user.

    **Warning**

    Start and end offsets assume UTF-16 encoding is being used. If you
    want to use these offsets in order to get the original text that
    produced this token, you should make sure that the string you are
    taking a sub-string of is also encoded using UTF-16.

**Term statistics**

Setting ``term_statistics`` to ``true`` (default is ``false``) will
return

-  total term frequency (how often a term occurs in all documents)

-  document frequency (the number of documents containing the current
   term)

By default these values are not returned since term statistics can have
a serious performance impact.

**Field statistics**

Setting ``field_statistics`` to ``false`` (default is ``true``) will
omit:

-  document count (how many documents contain this field)

-  sum of document frequencies (the sum of document frequencies for all
   terms in this field)

-  sum of total term frequencies (the sum of total term frequencies of
   each term in this field)

**Distributed frequencies (coming in 2.0)**

Setting ``dfs`` to ``true`` (default is ``false``) will return the term
statistics or the field statistics of the entire index, and not just
those of the shard. Use it with caution, as distributed frequencies can
have a serious performance impact.

**Behaviour**

The term and field statistics are not accurate. Deleted documents are
not taken into account. The information is only retrieved for the shard
the requested document resides in, unless ``dfs`` is set to ``true``.
The term and field statistics are therefore only useful as relative
measures whereas the absolute numbers have no meaning in this context.
By default, when requesting term vectors of artificial documents, a
shard to get the statistics from is randomly selected. Use ``routing``
only to hit a particular shard.
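
For example, a sketch of requesting term vectors for an artificial
document while pinning the statistics to the shard that the routing
value ``kimchy`` maps to (the routing value is purely illustrative):

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/_termvector?routing=kimchy' -d '{
      "doc" : {
        "fullname" : "John Doe",
        "text" : "twitter test test test"
      }
    }'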

**Example 1**

First, we create an index that stores term vectors, payloads etc. :

.. code:: js

    curl -s -XPUT 'http://localhost:9200/twitter/' -d '{
      "mappings": {
        "tweet": {
          "properties": {
            "text": {
              "type": "string",
              "term_vector": "with_positions_offsets_payloads",
              "store" : true,
              "index_analyzer" : "fulltext_analyzer"
             },
             "fullname": {
              "type": "string",
              "term_vector": "with_positions_offsets_payloads",
              "index_analyzer" : "fulltext_analyzer"
            }
          }
        }
      },
      "settings" : {
        "index" : {
          "number_of_shards" : 1,
          "number_of_replicas" : 0
        },
        "analysis": {
          "analyzer": {
            "fulltext_analyzer": {
              "type": "custom",
              "tokenizer": "whitespace",
              "filter": [
                "lowercase",
                "type_as_payload"
              ]
            }
          }
        }
      }
    }'

Second, we add some documents:

.. code:: js

    curl -XPUT 'http://localhost:9200/twitter/tweet/1?pretty=true' -d '{
      "fullname" : "John Doe",
      "text" : "twitter test test test "
    }'

    curl -XPUT 'http://localhost:9200/twitter/tweet/2?pretty=true' -d '{
      "fullname" : "Jane Doe",
      "text" : "Another twitter test ..."
    }'

The following request returns all information and statistics for field
``text`` in document ``1`` (John Doe):

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true' -d '{
      "fields" : ["text"],
      "offsets" : true,
      "payloads" : true,
      "positions" : true,
      "term_statistics" : true,
      "field_statistics" : true
    }'

Response:

.. code:: js

    {
        "_id": "1",
        "_index": "twitter",
        "_type": "tweet",
        "_version": 1,
        "found": true,
        "term_vectors": {
            "text": {
                "field_statistics": {
                    "doc_count": 2,
                    "sum_doc_freq": 6,
                    "sum_ttf": 8
                },
                "terms": {
                    "test": {
                        "doc_freq": 2,
                        "term_freq": 3,
                        "tokens": [
                            {
                                "end_offset": 12,
                                "payload": "d29yZA==",
                                "position": 1,
                                "start_offset": 8
                            },
                            {
                                "end_offset": 17,
                                "payload": "d29yZA==",
                                "position": 2,
                                "start_offset": 13
                            },
                            {
                                "end_offset": 22,
                                "payload": "d29yZA==",
                                "position": 3,
                                "start_offset": 18
                            }
                        ],
                        "ttf": 4
                    },
                    "twitter": {
                        "doc_freq": 2,
                        "term_freq": 1,
                        "tokens": [
                            {
                                "end_offset": 7,
                                "payload": "d29yZA==",
                                "position": 0,
                                "start_offset": 0
                            }
                        ],
                        "ttf": 2
                    }
                }
            }
        }
    }

**Example 2**

Term vectors which are not explicitly stored in the index are
automatically computed on the fly. The following request returns all
information and statistics for the fields in document ``1``, even though
the terms haven’t been explicitly stored in the index. Note that for the
field ``text``, the terms are not re-generated.

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true' -d '{
      "fields" : ["text", "some_field_without_term_vectors"],
      "offsets" : true,
      "positions" : true,
      "term_statistics" : true,
      "field_statistics" : true
    }'

**Example 3**

Term vectors can also be generated for artificial documents, that is for
documents not present in the index. The syntax is similar to the
`percolator <#search-percolate>`__ API. For example, the following
request would return the same results as in example 1. The mapping used
is determined by the ``index`` and ``type``.

    **Warning**

    If dynamic mapping is turned on (default), the document fields not
    in the original mapping will be dynamically created.

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/_termvector' -d '{
      "doc" : {
        "fullname" : "John Doe",
        "text" : "twitter test test test"
      }
    }'

**Example 4**

Additionally, a different analyzer than the one configured for the field
may be provided by using the ``per_field_analyzer`` parameter. This is
useful in order to generate term vectors in any fashion, especially when
using artificial documents. When providing an analyzer for a field that
already stores term vectors, the term vectors will be re-generated.

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/_termvector' -d '{
      "doc" : {
        "fullname" : "John Doe",
        "text" : "twitter test test test"
      },
      "fields": ["fullname"],
      "per_field_analyzer" : {
        "fullname": "keyword"
      }
    }'

Response:

.. code:: js

    {
      "_index": "twitter",
      "_type": "tweet",
      "_version": 0,
      "found": true,
      "term_vectors": {
        "fullname": {
           "field_statistics": {
              "sum_doc_freq": 1,
              "doc_count": 1,
              "sum_ttf": 1
           },
           "terms": {
              "John Doe": {
                 "term_freq": 1,
                 "tokens": [
                    {
                       "position": 0,
                       "start_offset": 0,
                       "end_offset": 8
                    }
                 ]
              }
           }
        }
      }
    }

Multi termvectors API
=====================

The multi termvectors API allows you to get multiple termvectors at once.
The documents from which to retrieve the term vectors are specified by an
index, type and id. But the documents could also be artificially provided
in the request body. The response includes a ``docs`` array with all the
fetched termvectors, each element having the structure provided by the
`termvectors <#docs-termvectors>`__ API. Here is an example:

.. code:: js

    curl 'localhost:9200/_mtermvectors' -d '{
       "docs": [
          {
             "_index": "testidx",
             "_type": "test",
             "_id": "2",
             "term_statistics": true
          },
          {
             "_index": "testidx",
             "_type": "test",
             "_id": "1",
             "fields": [
                "text"
             ]
          }
       ]
    }'

See the `termvectors <#docs-termvectors>`__ API for a description of
possible parameters.

The ``_mtermvectors`` endpoint can also be used against an index (in
which case it is not required in the body):

.. code:: js

    curl 'localhost:9200/testidx/_mtermvectors' -d '{
       "docs": [
          {
             "_type": "test",
             "_id": "2",
             "fields": [
                "text"
             ],
             "term_statistics": true
          },
          {
             "_type": "test",
             "_id": "1"
          }
       ]
    }'

And against a type:

.. code:: js

    curl 'localhost:9200/testidx/test/_mtermvectors' -d '{
       "docs": [
          {
             "_id": "2",
             "fields": [
                "text"
             ],
             "term_statistics": true
          },
          {
             "_id": "1"
          }
       ]
    }'

If all requested documents are in the same index, have the same type,
and the parameters are the same, the request can be simplified:

.. code:: js

    curl 'localhost:9200/testidx/test/_mtermvectors' -d '{
        "ids" : ["1", "2"],
        "parameters": {
            "fields": [
                "text"
            ],
            "term_statistics": true,
            …
        }
    }'

Additionally, just like for the `termvectors <#docs-termvectors>`__ API,
term vectors could be generated for user provided documents. The syntax
is similar to the `percolator <#search-percolate>`__ API. The mapping
used is determined by ``_index`` and ``_type``.

.. code:: js

    curl 'localhost:9200/_mtermvectors' -d '{
       "docs": [
          {
             "_index": "testidx",
             "_type": "test",
             "doc" : {
                "fullname" : "John Doe",
                "text" : "twitter test test test"
             }
          },
          {
             "_index": "testidx",
             "_type": "test",
             "doc" : {
               "fullname" : "Jane Doe",
               "text" : "Another twitter test ..."
             }
          }
       ]
    }'

Most search APIs are `multi-index,
multi-type <#search-multi-index-type>`__, with the exception of the ?
endpoints.

**Routing**

When executing a search, it will be broadcast to all the index/indices
shards (round robin between replicas). Which shards will be searched on
can be controlled by providing the ``routing`` parameter. For example,
when indexing tweets, the routing value can be the user name:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/tweet?routing=kimchy' -d '{
        "user" : "kimchy",
        "postDate" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }
    '

In such a case, if we want to search only on the tweets for a specific
user, we can specify it as the routing, resulting in the search hitting
only the relevant shard:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/tweet/_search?routing=kimchy' -d '{
        "query": {
            "filtered" : {
                "query" : {
                    "query_string" : {
                        "query" : "some query string here"
                    }
                },
                "filter" : {
                    "term" : { "user" : "kimchy" }
                }
            }
        }
    }
    '

The routing parameter can be multi-valued, represented as a
comma-separated string. This will result in the search hitting only the
relevant shards that the routing values map to.

**Stats Groups**

A search can be associated with stats groups, which maintain a
statistics aggregation per group. The statistics can later be retrieved
using the `indices stats <#indices-stats>`__ API. For example, here is a
search body request that associates the request with two different
groups:

.. code:: js

    {
        "query" : {
            "match_all" : {}
        },
        "stats" : ["group1", "group2"]
    }

Search
======

The search API allows you to execute a search query and get back search
hits that match the query. The query can either be provided using a
simple `query string as a parameter <#search-uri-request>`__, or using a
`request body <#search-request-body>`__.

**Multi-Index, Multi-Type**

All search APIs can be applied across multiple types within an index,
and across multiple indices with support for the `multi index
syntax <#multi-index>`__. For example, we can search on all documents
across all types within the twitter index:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/_search?q=user:kimchy'

We can also search within specific types:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/tweet,user/_search?q=user:kimchy'

We can also search all tweets with a certain tag across several indices
(for example, when each user has his own index):

.. code:: js

    $ curl -XGET 'http://localhost:9200/kimchy,elasticsearch/tweet/_search?q=tag:wow'

Or we can search all tweets across all available indices using the
``_all`` placeholder:

.. code:: js

    $ curl -XGET 'http://localhost:9200/_all/tweet/_search?q=tag:wow'

Or even search across all indices and all types:

.. code:: js

    $ curl -XGET 'http://localhost:9200/_search?q=tag:wow'

URI Search
==========

A search request can be executed purely using a URI by providing request
parameters. Not all search options are exposed when executing a search
using this mode, but it can be handy for quick "curl tests". Here is an
example:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/tweet/_search?q=user:kimchy'

And here is a sample response:

.. code:: js

    {
        "_shards":{
            "total" : 5,
            "successful" : 5,
            "failed" : 0
        },
        "hits":{
            "total" : 1,
            "hits" : [
                {
                    "_index" : "twitter",
                    "_type" : "tweet",
                    "_id" : "1",
                    "_source" : {
                        "user" : "kimchy",
                        "postDate" : "2009-11-15T14:12:12",
                        "message" : "trying out Elasticsearch"
                    }
                }
            ]
        }
    }

**Parameters**

The parameters allowed in the URI are:

+--------------------------------------+--------------------------------------+
| Name                                 | Description                          |
+======================================+======================================+
| ``q``                                | The query string (maps to the        |
|                                      | ``query_string`` query, see `*Query  |
|                                      | String                               |
|                                      | Query* <#query-dsl-query-string-quer |
|                                      | y>`__                                |
|                                      | for more details).                   |
+--------------------------------------+--------------------------------------+
| ``df``                               | The default field to use when no     |
|                                      | field prefix is defined within the   |
|                                      | query.                               |
+--------------------------------------+--------------------------------------+
| ``analyzer``                         | The analyzer name to be used when    |
|                                      | analyzing the query string.          |
+--------------------------------------+--------------------------------------+
| ``default_operator``                 | The default operator to be used, can |
|                                      | be ``AND`` or ``OR``. Defaults to    |
|                                      | ``OR``.                              |
+--------------------------------------+--------------------------------------+
| ``explain``                          | For each hit, contain an explanation |
|                                      | of how scoring of the hits was       |
|                                      | computed.                            |
+--------------------------------------+--------------------------------------+
| ``_source``                          | Set to ``false`` to disable          |
|                                      | retrieval of the ``_source`` field.  |
|                                      | You can also retrieve part of the    |
|                                      | document by using                    |
|                                      | ``_source_include`` &                |
|                                      | ``_source_exclude`` (see the         |
|                                      | `request                             |
|                                      | body <#search-request-source-filteri |
|                                      | ng>`__                               |
|                                      | documentation for more details)      |
+--------------------------------------+--------------------------------------+
| ``fields``                           | The selective stored fields of the   |
|                                      | document to return for each hit,     |
|                                      | comma delimited. Not specifying any  |
|                                      | value will cause no fields to        |
|                                      | return.                              |
+--------------------------------------+--------------------------------------+
| ``sort``                             | Sorting to perform. Can either be in |
|                                      | the form of ``fieldName``, or        |
|                                      | ``fieldName:asc``/``fieldName:desc`` |
|                                      | .                                    |
|                                      | The fieldName can either be an       |
|                                      | actual field within the document, or |
|                                      | the special ``_score`` name to       |
|                                      | indicate sorting based on scores.    |
|                                      | There can be several ``sort``        |
|                                      | parameters (order is important).     |
+--------------------------------------+--------------------------------------+
| ``track_scores``                     | When sorting, set to ``true`` in     |
|                                      | order to still track scores and      |
|                                      | return them as part of each hit.     |
+--------------------------------------+--------------------------------------+
| ``timeout``                          | A search timeout, bounding the       |
|                                      | search request to be executed within |
|                                      | the specified time value and bail    |
|                                      | with the hits accumulated up to that |
|                                      | point when expired. Defaults to no   |
|                                      | timeout.                             |
+--------------------------------------+--------------------------------------+
| ``terminate_after``                  | The maximum number of documents to   |
|                                      | collect for each shard, upon         |
|                                      | reaching which the query execution   |
|                                      | will terminate early. If set, the    |
|                                      | response will have a boolean field   |
|                                      | ``terminated_early`` to indicate     |
|                                      | whether the query execution has      |
|                                      | actually terminated\_early. Defaults |
|                                      | to no terminate\_after.              |
+--------------------------------------+--------------------------------------+
| ``from``                             | The starting from index of the hits  |
|                                      | to return. Defaults to ``0``.        |
+--------------------------------------+--------------------------------------+
| ``size``                             | The number of hits to return.        |
|                                      | Defaults to ``10``.                  |
+--------------------------------------+--------------------------------------+
| ``search_type``                      | The type of the search operation to  |
|                                      | perform. Can be                      |
|                                      | ``dfs_query_then_fetch``,            |
|                                      | ``dfs_query_and_fetch``,             |
|                                      | ``query_then_fetch``,                |
|                                      | ``query_and_fetch``, ``count``,      |
|                                      | ``scan``. Defaults to                |
|                                      | ``query_then_fetch``. See `*Search   |
|                                      | Type* <#search-request-search-type>` |
|                                      | __                                   |
|                                      | for more details on the different    |
|                                      | types of search that can be          |
|                                      | performed.                           |
+--------------------------------------+--------------------------------------+
| ``lowercase_expanded_terms``         | Should terms be automatically        |
|                                      | lowercased or not. Defaults to       |
|                                      | ``true``.                            |
+--------------------------------------+--------------------------------------+
| ``analyze_wildcard``                 | Should wildcard and prefix queries   |
|                                      | be analyzed or not. Defaults to      |
|                                      | ``false``.                           |
+--------------------------------------+--------------------------------------+

Request Body Search
===================

The search request can be executed with a search DSL, which includes the
`Query DSL <#query-dsl>`__, within its body. Here is an example:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '{
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }
    '

And here is a sample response:

.. code:: js

    {
        "_shards":{
            "total" : 5,
            "successful" : 5,
            "failed" : 0
        },
        "hits":{
            "total" : 1,
            "hits" : [
                {
                    "_index" : "twitter",
                    "_type" : "tweet",
                    "_id" : "1",
                    "_source" : {
                        "user" : "kimchy",
                        "postDate" : "2009-11-15T14:12:12",
                        "message" : "trying out Elasticsearch"
                    }
                }
            ]
        }
    }

**Parameters**

``timeout``
    A search timeout, bounding the search request to be executed within
    the specified time value and bail with the hits accumulated up to
    that point when expired. Defaults to no timeout. See ?.

``from``
    The starting from index of the hits to return. Defaults to ``0``.

``size``
    The number of hits to return. Defaults to ``10``.

``search_type``
    The type of the search operation to perform. Can be
    ``dfs_query_then_fetch``, ``dfs_query_and_fetch``,
    ``query_then_fetch``, ``query_and_fetch``. Defaults to
    ``query_then_fetch``. See
    `*Search Type* <#search-request-search-type>`__ for more.

``query_cache``
    Set to ``true`` or ``false`` to enable or disable the caching of
    search results for requests where ``?search_type=count``, i.e.
    aggregations and suggestions. See ?.

``terminate_after``
    The maximum number of documents to collect for each shard, upon
    reaching which the query execution will terminate early. If set,
    the response will have a boolean field ``terminated_early`` to
    indicate whether the query execution has actually terminated early.
    Defaults to no ``terminate_after``.

Out of the above, the ``search_type`` and the ``query_cache`` must be
passed as query-string parameters. The rest of the search request should
be passed within the body itself. The body content can also be passed as
a REST parameter named ``source``.

Both HTTP GET and HTTP POST can be used to execute search with body.
Since not all clients support GET with body, POST is allowed as well.
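
As a sketch of combining the two, the ``search_type`` travels in the
query string while the query itself is sent in the body (POST is used
here since some clients do not support GET with a body):

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/tweet/_search?search_type=dfs_query_then_fetch' -d '{
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }
    '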

Query
-----

The query element within the search request body allows you to define a
query using the `Query DSL <#query-dsl>`__.

.. code:: js

    {
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

From / Size
-----------

Pagination of results can be done by using the ``from`` and ``size``
parameters. The ``from`` parameter defines the offset from the first
result you want to fetch. The ``size`` parameter allows you to configure
the maximum amount of hits to be returned.

Though ``from`` and ``size`` can be set as request parameters, they can
also be set within the search body. ``from`` defaults to ``0``, and
``size`` defaults to ``10``.

.. code:: js

    {
        "from" : 0, "size" : 10,
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Sort
----

Allows you to add one or more sorts on specific fields. Each sort can be
reversed as well. The sort is defined on a per field level, with a
special field name ``_score`` to sort by score.

.. code:: js

    {
        "sort" : [
            { "post_date" : {"order" : "asc"}},
            "user",
            { "name" : "desc" },
            { "age" : "desc" },
            "_score"
        ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Sort Values
~~~~~~~~~~~

The sort values for each document returned are also returned as part of
the response.
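
For example, with the sort above each hit carries a ``sort`` array next
to its usual metadata; a sketch of a single hit (the values are purely
illustrative):

.. code:: js

    {
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "1",
        "sort" : [ 1258294332000, "kimchy", ... ],
        "_source" : { ... }
    }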

Sort mode option
~~~~~~~~~~~~~~~~

Elasticsearch supports sorting by array or multi-valued fields. The
``mode`` option controls what array value is picked for sorting the
document it belongs to. The ``mode`` option can have the following
values:

+------------+---------------------------------------------------------------+
| ``min``    | Pick the lowest value.                                        |
+------------+---------------------------------------------------------------+
| ``max``    | Pick the highest value.                                       |
+------------+---------------------------------------------------------------+
| ``sum``    | Use the sum of all values as sort value. Only applicable for  |
|            | number based array fields.                                    |
+------------+---------------------------------------------------------------+
| ``avg``    | Use the average of all values as sort value. Only applicable  |
|            | for number based array fields.                                |
+------------+---------------------------------------------------------------+

Sort mode example usage
^^^^^^^^^^^^^^^^^^^^^^^

In the example below, the field ``price`` has multiple prices per
document. In this case the result hits will be sorted by price in
ascending order, based on the average price per document.

.. code:: js

    curl -XPOST 'localhost:9200/_search' -d '{
       "query" : {
        ...
       },
       "sort" : [
          {"price" : {"order" : "asc", "mode" : "avg"}}
       ]
    }'

Sorting within nested objects.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Elasticsearch also supports sorting by fields that are inside one or
more nested objects. The sorting by nested field support has the
following parameters on top of the already existing sort options:

``nested_path``
    Defines on what nested object to sort. The actual sort field
    must be a direct field inside this nested object. The default is to
    use the most immediate inherited nested object from the sort field.

``nested_filter``
    A filter that the inner objects inside the nested path should match
    with in order for their field values to be taken into account by
    sorting. A common case is to repeat the query / filter inside the
    nested filter or query. By default no ``nested_filter`` is active.

Nested sorting example
^^^^^^^^^^^^^^^^^^^^^^

In the below example ``offer`` is a field of type ``nested``. Because
``offer`` is the closest inherited nested field, it is picked as
``nested_path``. Only the inner objects that have color blue will
participate in sorting.

.. code:: js

    curl -XPOST 'localhost:9200/_search' -d '{
       "query" : {
        ...
       },
       "sort" : [
           {
              "offer.price" : {
                 "mode" :  "avg",
                 "order" : "asc",
                 "nested_filter" : {
                    "term" : { "offer.color" : "blue" }
                 }
              }
           }
        ]
    }'

Nested sorting is also supported when sorting by scripts and sorting by
geo distance.

Missing Values
~~~~~~~~~~~~~~

The ``missing`` parameter specifies how docs which are missing the field
should be treated. The ``missing`` value can be set to ``_last``,
``_first``, or a custom value (that will be used for missing docs as the
sort value). For example:

.. code:: js

    {
        "sort" : [
            { "price" : {"missing" : "_last"} },
        ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

    **Note**

    If a nested inner object doesn’t match with the ``nested_filter``
    then a missing value is used.

Ignoring Unmapped Fields
~~~~~~~~~~~~~~~~~~~~~~~~

By default, the search request will fail if there is no mapping
associated with a field. The ``unmapped_type`` option allows you to
ignore fields that have no mapping and not sort by them. The value of this
parameter is used to determine what sort values to emit. Here is an
example of how it can be used:

.. code:: js

    {
        "sort" : [
            { "price" : {"unmapped_type" : "long"} },
        ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

If any of the queried indices does not have a mapping for ``price``,
then Elasticsearch will handle it as if there were a mapping of type
``long``, with all documents in this index having no value for this
field.

Geo Distance Sorting
~~~~~~~~~~~~~~~~~~~~

Allow to sort by ``_geo_distance``. Here is an example:

.. code:: js

    {
        "sort" : [
            {
                "_geo_distance" : {
                    "pin.location" : [-70, 40],
                    "order" : "asc",
                    "unit" : "km",
            "mode" : "min",
            "distance_type" : "sloppy_arc"
                }
            }
        ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

``distance_type``
    How to compute the distance. Can either be ``sloppy_arc`` (default),
    ``arc`` (slightly more precise but significantly slower) or ``plane``
    (faster, but inaccurate on long distances and close to the poles).

Note: the geo distance sorting supports ``sort_mode`` options: ``min``,
``max`` and ``avg``.

The following formats are supported in providing the coordinates:

Lat Lon as Properties
^^^^^^^^^^^^^^^^^^^^^

.. code:: js

    {
        "sort" : [
            {
                "_geo_distance" : {
                    "pin.location" : {
                        "lat" : 40,
                        "lon" : -70
                    },
                    "order" : "asc",
                    "unit" : "km"
                }
            }
        ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Lat Lon as String
^^^^^^^^^^^^^^^^^

Format in ``lat,lon``.

.. code:: js

    {
        "sort" : [
            {
                "_geo_distance" : {
                    "pin.location" : "-70,40",
                    "order" : "asc",
                    "unit" : "km"
                }
            }
        ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Geohash
^^^^^^^

.. code:: js

    {
        "sort" : [
            {
                "_geo_distance" : {
                    "pin.location" : "drm3btev3e86",
                    "order" : "asc",
                    "unit" : "km"
                }
            }
        ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Lat Lon as Array
^^^^^^^^^^^^^^^^

Format in ``[lon, lat]``. Note the order of lon/lat here, in order to
conform with `GeoJSON <http://geojson.org/>`__.

.. code:: js

    {
        "sort" : [
            {
                "_geo_distance" : {
                    "pin.location" : [-70, 40],
                    "order" : "asc",
                    "unit" : "km"
                }
            }
        ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Multiple reference points
~~~~~~~~~~~~~~~~~~~~~~~~~

Multiple geo points can be passed as an array containing any
``geo_point`` format, for example

.. code:: js

    "pin.location" : [[-70, 40], [-71, 42]]
    "pin.location" : [{"lat": -70, "lon": 40}, {"lat": -71, "lon": 42}]

and so forth.

The final distance for a document will then be ``min``/``max``/``avg``
(defined via ``mode``) distance of all points contained in the document
to all points given in the sort request.
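
A sketch of such a sort, reusing the ``pin.location`` field from the
examples above and taking the minimum distance to the two reference
points as the sort value:

.. code:: js

    {
        "sort" : [
            {
                "_geo_distance" : {
                    "pin.location" : [[-70, 40], [-71, 42]],
                    "order" : "asc",
                    "unit" : "km",
                    "mode" : "min"
                }
            }
        ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }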

Script Based Sorting
~~~~~~~~~~~~~~~~~~~~

Allows sorting based on custom scripts; here is an example:

.. code:: js

    {
        "query" : {
            ....
        },
        "sort" : {
            "_script" : {
                "script" : "doc['field_name'].value * factor",
                "type" : "number",
                "params" : {
                    "factor" : 1.1
                },
                "order" : "asc"
            }
        }
    }

Note that when sorting by a single custom script, it is recommended to
use the ``function_score`` query instead, as sorting based on score is
faster.

Track Scores
~~~~~~~~~~~~

When sorting on a field, scores are not computed. By setting
``track_scores`` to true, scores will still be computed and tracked.

.. code:: js

    {
        "track_scores": true,
        "sort" : [
            { "post_date" : {"reverse" : true} },
            { "name" : "desc" },
            { "age" : "desc" }
        ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Memory Considerations
~~~~~~~~~~~~~~~~~~~~~

When sorting, the relevant sorted field values are loaded into memory.
This means that per shard, there should be enough memory to contain
them. For string based types, the field sorted on should not be analyzed
/ tokenized. For numeric types, if possible, it is recommended to
explicitly set the type to narrower types (like ``short``, ``integer``
and ``float``).
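
A sketch of a sort friendly mapping along these lines (the index, type
and field names are only illustrative) could be:

.. code:: js

    curl -XPUT 'http://localhost:9200/products/' -d '{
      "mappings": {
        "product": {
          "properties": {
            "name": {
              "type": "string",
              "index": "not_analyzed"
            },
            "price": {
              "type": "integer"
            }
          }
        }
      }
    }'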

Source filtering
----------------

Allows you to control how the ``_source`` field is returned with every
hit.

By default operations return the contents of the ``_source`` field
unless you have used the ``fields`` parameter or if the ``_source``
field is disabled.

To disable ``_source`` retrieval entirely, set the ``_source`` parameter
to ``false``:

.. code:: js

    {
        "_source": false,
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

The ``_source`` parameter also accepts one or more wildcard patterns to
control what parts of the ``_source`` should be returned.

For example:

.. code:: js

    {
        "_source": "obj.*",
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Or

.. code:: js

    {
        "_source": [ "obj1.*", "obj2.*" ],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Finally, for complete control, you can specify both include and exclude
patterns:

.. code:: js

    {
        "_source": {
            "include": [ "obj1.*", "obj2.*" ],
            "exclude": [ "*.description" ]
        },
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Fields
------

Allows you to selectively load specific stored fields for each document
represented by a search hit.

.. code:: js

    {
        "fields" : ["user", "postDate"],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

``*`` can be used to load all stored fields from the document.

An empty array will cause only the ``_id`` and ``_type`` for each hit to
be returned, for example:

.. code:: js

    {
        "fields" : [],
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

For backwards compatibility, if the fields parameter specifies fields
which are not stored (``store`` mapping set to ``false``), it will load
the ``_source`` and extract the field values from it. This functionality
has been replaced by the `source
filtering <#search-request-source-filtering>`__ parameter.

Field values fetched from the document itself are always returned as an
array. Metadata fields like ``_routing`` and ``_parent`` fields are
never returned as an array.

Also, only leaf fields can be returned via the ``fields`` option, so
object fields can’t be returned and such requests will fail.

Script fields can also be automatically detected and used as fields, so
things like ``_source.obj1.field1`` can be used, though not recommended,
as ``obj1.field1`` will work as well.

Script Fields
-------------

Allows you to return a `script evaluation <#modules-scripting>`__ (based
on different fields) for each hit, for example:

.. code:: js

    {
        "query" : {
            ...
        },
        "script_fields" : {
            "test1" : {
                "script" : "doc['my_field_name'].value * 2"
            },
            "test2" : {
                "script" : "doc['my_field_name'].value * factor",
                "params" : {
                    "factor"  : 2.0
                }
            }
        }
    }

Script fields can work on fields that are not stored (``my_field_name``
in the above case), and allow custom values to be returned (the
evaluated value of the script).

Script fields can also access the actual ``_source`` document indexed
and extract specific elements to be returned from it (can be an "object"
type). Here is an example:

.. code:: js

        {
            "query" : {
                ...
            },
            "script_fields" : {
                "test1" : {
                    "script" : "_source.obj1.obj2"
                }
            }
        }

Note the ``_source`` keyword here to navigate the json-like model.

It’s important to understand the difference between
``doc['my_field'].value`` and ``_source.my_field``. The first, using the
doc keyword, will cause the terms for that field to be loaded to memory
(cached), which will result in faster execution, but more memory
consumption. Also, the ``doc[...]`` notation only allows for simple
valued fields (you can’t return a json object from it) and makes sense
only on non-analyzed or single term based fields.

The ``_source`` on the other hand causes the source to be loaded,
parsed, and then only the relevant part of the json is returned.

Field Data Fields
-----------------

Allows you to return the field data representation of a field for each
hit, for example:

.. code:: js

    {
        "query" : {
            ...
        },
        "fielddata_fields" : ["test1", "test2"]
    }

Field data fields can work on fields that are not stored.

It’s important to understand that using the ``fielddata_fields``
parameter will cause the terms for that field to be loaded to memory
(cached), which will result in more memory consumption.

Post filter
-----------

The ``post_filter`` is applied to the search ``hits`` at the very end of
a search request, after aggregations have already been calculated. Its
purpose is best explained by example:

Imagine that you are selling shirts, and the user has specified two
filters: ``color:red`` and ``brand:gucci``. You only want to show them
red shirts made by Gucci in the search results. Normally you would do
this with a ```filtered`` query <#query-dsl-filtered-query>`__:

.. code:: json

    curl -XGET localhost:9200/shirts/_search -d '
    {
      "query": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [
                { "term": { "color": "red"   }},
                { "term": { "brand": "gucci" }}
              ]
            }
          }
        }
      }
    }
    '

However, you would also like to use *faceted navigation* to display a
list of other options that the user could click on. Perhaps you have a
``model`` field that would allow the user to limit their search results
to red Gucci ``t-shirts`` or ``dress-shirts``.

This can be done with a ```terms``
aggregation <#search-aggregations-bucket-terms-aggregation>`__:

.. code:: json

    curl -XGET localhost:9200/shirts/_search -d '
    {
      "query": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [
                { "term": { "color": "red"   }},
                { "term": { "brand": "gucci" }}
              ]
            }
          }
        }
      },
      "aggs": {
        "models": {
          "terms": { "field": "model" } 
        }
      }
    }
    '

Returns the most popular models of red shirts by Gucci.

But perhaps you would also like to tell the user how many Gucci shirts
are available in **other colors**. If you just add a ``terms``
aggregation on the ``color`` field, you will only get back the color
``red``, because your query returns only red shirts by Gucci.

Instead, you want to include shirts of all colors during aggregation,
then apply the ``colors`` filter only to the search results. This is the
purpose of the ``post_filter``:

.. code:: json

    curl -XGET localhost:9200/shirts/_search -d '
    {
      "query": {
        "filtered": {
          "filter": {
            { "term": { "brand": "gucci" }} 
          }
        }
      },
      "aggs": {
        "colors": {
          "terms": { "field": "color" }, 
        },
        "color_red": {
          "filter": {
            "term": { "color": "red" } 
          },
          "aggs": {
            "models": {
              "terms": { "field": "model" } 
            }
          }
        }
      },
      "post_filter": { 
        "term": { "color": "red" },
      }
    }
    '

The main query now finds all shirts by Gucci, regardless of color.

The ``colors`` agg returns popular colors for shirts by Gucci.

The ``color_red`` agg limits the ``models`` sub-aggregation to **red**
Gucci shirts.

Finally, the ``post_filter`` removes colors other than red from the
search ``hits``.

Highlighting
------------

Allows you to highlight search results on one or more fields. The
implementation uses either the Lucene ``highlighter``,
``fast-vector-highlighter`` or ``postings-highlighter``. The following
is an example of the search request body:

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "fields" : {
                "content" : {}
            }
        }
    }

In the above case, the ``content`` field will be highlighted for each
search hit (there will be another element in each search hit, called
``highlight``, which includes the highlighted fields and the highlighted
fragments).

    **Note**

    In order to perform highlighting, the actual content of the field is
    required. If the field in question is stored (has ``store`` set to
    ``true`` in the mapping) it will be used, otherwise, the actual
    ``_source`` will be loaded and the relevant field will be extracted
    from it.

    The ``_all`` field cannot be extracted from ``_source``, so it can
    only be used for highlighting if it is mapped to have ``store`` set
    to ``true``.

The field name supports wildcard notation. For example, using
``comment_*`` will cause all fields that match the expression to be
highlighted.
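
A sketch of such a request:

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "fields" : {
                "comment_*" : {}
            }
        }
    }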

Postings highlighter
~~~~~~~~~~~~~~~~~~~~

If ``index_options`` is set to ``offsets`` in the mapping the postings
highlighter will be used instead of the plain highlighter. The postings
highlighter:

-  Is faster since it doesn’t need to reanalyze the text to be
   highlighted: the larger the documents the better the performance
   gain should be

-  Requires less disk space than term\_vectors, needed for the fast
   vector highlighter

-  Breaks the text into sentences and highlights them. Plays really well
   with natural languages, not as well with fields containing for
   instance html markup

-  Treats the document as the whole corpus, and scores individual
   sentences as if they were documents in this corpus, using the BM25
   algorithm

Here is an example of setting the ``content`` field to allow for
highlighting using the postings highlighter on it:

.. code:: js

    {
        "type_name" : {
            "content" : {"index_options" : "offsets"}
        }
    }

    **Note**

    Note that the postings highlighter is meant to perform simple query
    terms highlighting, regardless of their positions. That means that
    when used for instance in combination with a phrase query, it will
    highlight all the terms that the query is composed of, regardless of
    whether they are actually part of a query match, effectively
    ignoring their positions.

    **Warning**

    The postings highlighter does support highlighting of multi term
    queries, like prefix queries, wildcard queries and so on. On the
    other hand, this requires the queries to be rewritten using a proper
    `rewrite method <#query-dsl-multi-term-rewrite>`__ that supports
    multi term extraction, which is a potentially expensive operation.

Fast vector highlighter
~~~~~~~~~~~~~~~~~~~~~~~

If ``term_vector`` information is provided by setting ``term_vector`` to
``with_positions_offsets`` in the mapping then the fast vector
highlighter will be used instead of the plain highlighter. The fast
vector highlighter:

-  Is faster especially for large fields (> ``1MB``)

-  Can be customized with ``boundary_chars``, ``boundary_max_scan``, and
   ``fragment_offset`` (see `below <#boundary-characters>`__)

-  Requires setting ``term_vector`` to ``with_positions_offsets`` which
   increases the size of the index

-  Can combine matches from multiple fields into one result. See
   ``matched_fields``

-  Can assign different weights to matches at different positions
   allowing for things like phrase matches being sorted above term
   matches when highlighting a Boosting Query that boosts phrase matches
   over term matches

Here is an example of setting the ``content`` field to allow for
highlighting using the fast vector highlighter on it (this will cause
the index to be bigger):

.. code:: js

    {
        "type_name" : {
            "content" : {"term_vector" : "with_positions_offsets"}
        }
    }

Force highlighter type
~~~~~~~~~~~~~~~~~~~~~~

The ``type`` field allows to force a specific highlighter type. This is
useful for instance when needing to use the plain highlighter on a field
that has ``term_vectors`` enabled. The allowed values are: ``plain``,
``postings`` and ``fvh``. The following is an example that forces the
use of the plain highlighter:

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "fields" : {
                "content" : {"type" : "plain"}
            }
        }
    }

Force highlighting on source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Forces the highlighting to highlight fields based on the source even if
fields are stored separately. Defaults to ``false``.

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "fields" : {
                "content" : {"force_source" : true}
            }
        }
    }

Highlighting Tags
~~~~~~~~~~~~~~~~~

By default, the highlighting will wrap highlighted text in ``<em>`` and
``</em>``. This can be controlled by setting ``pre_tags`` and
``post_tags``, for example:

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "pre_tags" : ["<tag1>"],
            "post_tags" : ["</tag1>"],
            "fields" : {
                "_all" : {}
            }
        }
    }

When using the fast vector highlighter, there can be more tags, and the
"importance" is ordered.

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "pre_tags" : ["<tag1>", "<tag2>"],
            "post_tags" : ["</tag1>", "</tag2>"],
            "fields" : {
                "_all" : {}
            }
        }
    }

There are also built in "tag" schemas, with currently a single schema
called ``styled`` with the following ``pre_tags``:

.. code:: js

    <em class="hlt1">, <em class="hlt2">, <em class="hlt3">,
    <em class="hlt4">, <em class="hlt5">, <em class="hlt6">,
    <em class="hlt7">, <em class="hlt8">, <em class="hlt9">,
    <em class="hlt10">

and ``</em>`` as ``post_tags``. If you think of other nice-to-have
built-in tag schemas, just send an email to the mailing list or open an
issue. Here is an example of switching tag schemas:

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "tags_schema" : "styled",
            "fields" : {
                "content" : {}
            }
        }
    }

Encoder
~~~~~~~

An ``encoder`` parameter can be used to define how highlighted text will
be encoded. It can be either ``default`` (no encoding) or ``html`` (will
escape html, if you use html highlighting tags).
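
For example, a sketch that HTML-escapes the highlighted fragments of the
``content`` field:

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "encoder" : "html",
            "fields" : {
                "content" : {}
            }
        }
    }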

Highlighted Fragments
~~~~~~~~~~~~~~~~~~~~~

Each field highlighted can control the size of the highlighted fragment
in characters (defaults to ``100``), and the maximum number of fragments
to return (defaults to ``5``). For example:

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "fields" : {
                "content" : {"fragment_size" : 150, "number_of_fragments" : 3}
            }
        }
    }

The ``fragment_size`` is ignored when using the postings highlighter, as
it outputs sentences regardless of their length.

On top of this it is possible to specify that highlighted fragments need
to be sorted by score:

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "order" : "score",
            "fields" : {
                "content" : {"fragment_size" : 150, "number_of_fragments" : 3}
            }
        }
    }

If the ``number_of_fragments`` value is set to ``0`` then no fragments
are produced; instead the whole content of the field is returned, and of
course it is highlighted. This can be very handy if short texts (like
document title or address) need to be highlighted but no fragmentation
is required. Note that ``fragment_size`` is ignored in this case.

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "fields" : {
                "_all" : {},
                "bio.title" : {"number_of_fragments" : 0}
            }
        }
    }

When using the ``fast-vector-highlighter``, the ``fragment_offset``
parameter can be used to control the margin from which highlighting starts.
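
For example, a sketch of a per-field request using ``fragment_offset``
(the value ``10`` is just an illustration):

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "fields" : {
                "content" : {
                    "type" : "fvh",
                    "fragment_offset" : 10,
                    "fragment_size" : 150,
                    "number_of_fragments" : 3
                }
            }
        }
    }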

In the case where there is no matching fragment to highlight, the
default is to not return anything. Instead, we can return a snippet of
text from the beginning of the field by setting ``no_match_size``
(default ``0``) to the length of the text that you want returned. The
actual length may be shorter than specified as it tries to break on a
word boundary. When using the postings highlighter it is not possible to
control the actual size of the snippet, therefore the first sentence
gets returned whenever ``no_match_size`` is greater than ``0``.

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "fields" : {
                "content" : {
                    "fragment_size" : 150,
                    "number_of_fragments" : 3,
                    "no_match_size": 150
                }
            }
        }
    }

Highlight query
~~~~~~~~~~~~~~~

It is also possible to highlight against a query other than the search
query by setting ``highlight_query``. This is especially useful if you
use a rescore query because those are not taken into account by
highlighting by default. Elasticsearch does not validate that
``highlight_query`` contains the search query in any way so it is
possible to define it so legitimate query results aren’t highlighted at
all. Generally it is better to include the search query in the
``highlight_query``. Here is an example of including both the search
query and the rescore query in ``highlight_query``.

.. code:: js

    {
        "fields": [ "_id" ],
        "query" : {
            "match": {
                "content": {
                    "query": "foo bar"
                }
            }
        },
        "rescore": {
            "window_size": 50,
            "query": {
                "rescore_query" : {
                    "match_phrase": {
                        "content": {
                            "query": "foo bar",
                            "phrase_slop": 1
                        }
                    }
                },
                "rescore_query_weight" : 10
            }
        },
        "highlight" : {
            "order" : "score",
            "fields" : {
                "content" : {
                    "fragment_size" : 150,
                    "number_of_fragments" : 3,
                    "highlight_query": {
                        "bool": {
                            "must": {
                                "match": {
                                    "content": {
                                        "query": "foo bar"
                                    }
                                }
                            },
                            "should": {
                                "match_phrase": {
                                    "content": {
                                        "query": "foo bar",
                                        "phrase_slop": 1,
                                        "boost": 10.0
                                    }
                                }
                            },
                            "minimum_should_match": 0
                        }
                    }
                }
            }
        }
    }

Note that the score of a text fragment in this case is calculated by the
Lucene highlighting framework. For implementation details you can check
the ``ScoreOrderFragmentsBuilder.java`` class. On the other hand when
using the postings highlighter the fragments are scored using, as
mentioned above, the BM25 algorithm.

Global Settings
~~~~~~~~~~~~~~~

Highlighting settings can be set on a global level and then overridden
at the field level.

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "number_of_fragments" : 3,
            "fragment_size" : 150,
            "tag_schema" : "styled",
            "fields" : {
                "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
                "bio.title" : { "number_of_fragments" : 0 },
                "bio.author" : { "number_of_fragments" : 0 },
                "bio.content" : { "number_of_fragments" : 5, "order" : "score" }
            }
        }
    }

Require Field Match
~~~~~~~~~~~~~~~~~~~

``require_field_match`` can be set to ``true``, which will cause a field
to be highlighted only if a query matched that field. ``false`` means
that terms are highlighted on all requested fields regardless of whether
the query matches them specifically.
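
For example, a minimal sketch of setting it per field:

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "fields" : {
                "content" : {"require_field_match" : true}
            }
        }
    }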

Boundary Characters
~~~~~~~~~~~~~~~~~~~

When highlighting a field using the fast vector highlighter,
``boundary_chars`` can be configured to define what constitutes a
boundary for highlighting. It’s a single string with each boundary
character defined in it. It defaults to ``.,!? \t\n``.

The ``boundary_max_scan`` parameter controls how far to look for boundary
characters, and defaults to ``20``.
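
A sketch combining both settings on a field highlighted with the fast
vector highlighter (the values shown are simply the defaults):

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "fields" : {
                "content" : {
                    "type" : "fvh",
                    "boundary_chars" : ".,!? \t\n",
                    "boundary_max_scan" : 20
                }
            }
        }
    }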

Matched Fields
~~~~~~~~~~~~~~

The Fast Vector Highlighter can combine matches on multiple fields to
highlight a single field using ``matched_fields``. This is most
intuitive for multifields that analyze the same string in different
ways. All ``matched_fields`` must have ``term_vector`` set to
``with_positions_offsets``, but only the field to which the matches are
combined is loaded, so only that field benefits from having ``store``
set to ``yes``.

In the following examples ``content`` is analyzed by the ``english``
analyzer and ``content.plain`` is analyzed by the ``standard`` analyzer.
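
A mapping along those lines might look like the following sketch (the
multi-field layout and analyzer choices here are illustrative
assumptions, not taken from the examples themselves):

.. code:: js

    {
        "type_name" : {
            "properties" : {
                "content" : {
                    "type" : "string",
                    "analyzer" : "english",
                    "term_vector" : "with_positions_offsets",
                    "fields" : {
                        "plain" : {
                            "type" : "string",
                            "analyzer" : "standard",
                            "term_vector" : "with_positions_offsets"
                        }
                    }
                }
            }
        }
    }

The query examples that follow assume a mapping of this shape.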

.. code:: js

    {
        "query": {
            "query_string": {
                "query": "content.plain:running scissors",
                "fields": ["content"]
            }
        },
        "highlight": {
            "order": "score",
            "fields": {
                "content": {
                    "matched_fields": ["content", "content.plain"],
                    "type" : "fvh"
                }
            }
        }
    }

The above matches both "run with scissors" and "running with scissors"
and would highlight "running" and "scissors" but not "run". If both
phrases appear in a large document then "running with scissors" is
sorted above "run with scissors" in the fragments list because there are
more matches in that fragment.

.. code:: js

    {
        "query": {
            "query_string": {
                "query": "running scissors",
                "fields": ["content", "content.plain^10"]
            }
        },
        "highlight": {
            "order": "score",
            "fields": {
                "content": {
                    "matched_fields": ["content", "content.plain"],
                    "type" : "fvh"
                }
            }
        }
    }

The above highlights "run" as well as "running" and "scissors" but still
sorts "running with scissors" above "run with scissors" because the
plain match ("running") is boosted.

.. code:: js

    {
        "query": {
            "query_string": {
                "query": "running scissors",
                "fields": ["content", "content.plain^10"]
            }
        },
        "highlight": {
            "order": "score",
            "fields": {
                "content": {
                    "matched_fields": ["content.plain"],
                    "type" : "fvh"
                }
            }
        }
    }

The above query wouldn’t highlight "run" or "scissor" but shows that it
is just fine not to list the field to which the matches are combined
(``content``) in the matched fields.

    **Note**

    Technically it is also fine to add fields to ``matched_fields`` that
    don’t share the same underlying string as the field to which the
    matches are combined. The results might not make much sense and if
    one of the matches is off the end of the text then the whole query
    will fail.

    **Note**

    There is a small amount of overhead involved with setting
    ``matched_fields`` to a non-empty array so always prefer

    .. code:: js

            "highlight": {
                "fields": {
                    "content": {}
                }
            }

    to

    .. code:: js

            "highlight": {
                "fields": {
                    "content": {
                        "matched_fields": ["content"],
                        "type" : "fvh"
                    }
                }
            }

Phrase Limit
~~~~~~~~~~~~

The ``fast-vector-highlighter`` has a ``phrase_limit`` parameter that
prevents it from analyzing too many phrases and eating tons of memory.
It defaults to 256, so only the first 256 matching phrases in the
document are considered. You can raise the limit with the
``phrase_limit`` parameter, but keep in mind that scoring more phrases
consumes more time and memory.

If using ``matched_fields`` keep in mind that ``phrase_limit`` phrases
per matched field are considered.
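
A sketch of raising the limit (this assumes ``phrase_limit`` sits at the
top ``highlight`` level, like the other global settings):

.. code:: js

    {
        "query" : {...},
        "highlight" : {
            "phrase_limit" : 512,
            "fields" : {
                "content" : {"type" : "fvh"}
            }
        }
    }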

Field Highlight Order
---------------------

Elasticsearch highlights the fields in the order that they are sent. Per
the JSON spec, objects are unordered, but if you need to be explicit
about the order in which fields are highlighted you can use an array for
``fields`` like this:

.. code:: js

        "highlight": {
            "fields": [
                {"title":{ /*params*/ }},
                {"text":{ /*params*/ }}
            ]
        }

None of the highlighters built into Elasticsearch care about the order
that the fields are highlighted but a plugin may.

Rescoring
---------

Rescoring can help to improve precision by reordering just the top (e.g.
100 - 500) documents returned by the
```query`` <#search-request-query>`__ and
```post_filter`` <#search-request-post-filter>`__ phases, using a
secondary (usually more costly) algorithm, instead of applying the
costly algorithm to all documents in the index.

A ``rescore`` request is executed on each shard before it returns its
results to be sorted by the node handling the overall search request.

Currently the rescore API has only one implementation: the query
rescorer, which uses a query to tweak the scoring. In the future,
alternative rescorers may be made available, for example, a pair-wise
rescorer.

    **Note**

    the ``rescore`` phase is not executed when
    ```search_type`` <#search-request-search-type>`__ is set to ``scan``
    or ``count``.

    **Note**

    when exposing pagination to your users, you should not change
    ``window_size`` as you step through each page (by passing different
    ``from`` values) since that can alter the top hits causing results
    to confusingly shift as the user steps through pages.

Query rescorer
~~~~~~~~~~~~~~

The query rescorer executes a second query only on the Top-K results
returned by the ```query`` <#search-request-query>`__ and
```post_filter`` <#search-request-post-filter>`__ phases. The number of
docs which will be examined on each shard can be controlled by the
``window_size`` parameter, which defaults to ```from`` and
``size`` <#search-request-from-size>`__.

By default the scores from the original query and the rescore query are
combined linearly to produce the final ``_score`` for each document. The
relative importance of the original query and of the rescore query can
be controlled with the ``query_weight`` and ``rescore_query_weight``
respectively. Both default to ``1``.

For example:

.. code:: js

    curl -s -XPOST 'localhost:9200/_search' -d '{
       "query" : {
          "match" : {
             "field1" : {
                "operator" : "or",
                "query" : "the quick brown",
                "type" : "boolean"
             }
          }
       },
       "rescore" : {
          "window_size" : 50,
          "query" : {
             "rescore_query" : {
                "match" : {
                   "field1" : {
                      "query" : "the quick brown",
                      "type" : "phrase",
                      "slop" : 2
                   }
                }
             },
             "query_weight" : 0.7,
             "rescore_query_weight" : 1.2
          }
       }
    }
    '

The way the scores are combined can be controlled with the
``score_mode``:

``total``
    Add the original score and the rescore query score. The default.

``multiply``
    Multiply the original score by the rescore query score. Useful for
    ```function query`` <#query-dsl-function-score-query>`__ rescores.

``avg``
    Average the original score and the rescore query score.

``max``
    Take the max of original score and the rescore query score.

``min``
    Take the min of the original score and the rescore query score.

Multiple Rescores
~~~~~~~~~~~~~~~~~

It is also possible to execute multiple rescores in sequence:

.. code:: js

    curl -s -XPOST 'localhost:9200/_search' -d '{
       "query" : {
          "match" : {
             "field1" : {
                "operator" : "or",
                "query" : "the quick brown",
                "type" : "boolean"
             }
          }
       },
       "rescore" : [ {
          "window_size" : 100,
          "query" : {
             "rescore_query" : {
                "match" : {
                   "field1" : {
                      "query" : "the quick brown",
                      "type" : "phrase",
                      "slop" : 2
                   }
                }
             },
             "query_weight" : 0.7,
             "rescore_query_weight" : 1.2
          }
       }, {
          "window_size" : 10,
          "query" : {
             "score_mode": "multiply",
             "rescore_query" : {
                "function_score" : {
                   "script_score": {
                      "script": "log10(doc['numeric'].value + 2)"
                   }
                }
             }
          }
       } ]
    }
    '

The first one gets the results of the query then the second one gets the
results of the first, etc. The second rescore will "see" the sorting
done by the first rescore so it is possible to use a large window on the
first rescore to pull documents into a smaller window for the second
rescore.

Search Type
-----------

There are different execution paths that can be taken when executing a
distributed search. The distributed search operation needs to be
scattered to all the relevant shards and then all the results are
gathered back. When doing scatter/gather type execution, there are
several ways to do that, specifically with search engines.

One of the questions when executing a distributed search is how many
results to retrieve from each shard. For example, if we have 10 shards,
the 1st shard might hold the most relevant results from 0 till 10, with
the results of other shards ranking below it. For this reason, when
executing a request, we will need to get results from 0 till 10 from all
shards, sort them, and then return the results if we want to ensure
correct results.

Another question, which relates to the search engine, is the fact that
each shard stands on its own. When a query is executed on a specific
shard, it does not take into account term frequencies and other search
engine information from the other shards. If we want to support accurate
ranking, we would need to first gather the term frequencies from all
shards to calculate global term frequencies, and then execute the query
on each shard using these global frequencies.

Also, because of the need to sort the results, getting back a large
document set, or even scrolling it, while maintaining the correct sorting
behavior can be a very expensive operation. For large result set
scrolling without sorting, the ``scan`` search type (explained below) is
also available.

Elasticsearch is very flexible and allows you to control the type of search
to execute on a **per search request** basis. The type can be configured
by setting the **search\_type** parameter in the query string. The types
are:

Query And Fetch
~~~~~~~~~~~~~~~

Parameter value: **query\_and\_fetch**.

The most naive (and possibly fastest) implementation is to simply
execute the query on all relevant shards and return the results. Each
shard returns ``size`` results. Since each shard already returns
``size`` hits, this type actually returns ``size`` times
``number of shards`` results back to the caller.
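
For example (reusing the ``twitter`` index and ``tweet`` type from the
scroll examples below, purely for illustration):

.. code:: js

    curl -XGET 'localhost:9200/twitter/tweet/_search?search_type=query_and_fetch' -d '
    {
        "query": {
            "match" : {
                "title" : "elasticsearch"
            }
        }
    }
    '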

Query Then Fetch
~~~~~~~~~~~~~~~~

Parameter value: **query\_then\_fetch**.

The query is executed against all shards, but only enough information is
returned (**not the document content**). The results are then sorted and
ranked, and based on that ranking, **only the relevant shards** are asked
for the actual document content. The returned number of hits is exactly
as specified in ``size``, since these are the only documents that are fetched.
This is very handy when the index has a lot of shards (not replicas,
shard id groups).

    **Note**

    This is the default setting, if you do not specify a ``search_type``
    in your request.

Dfs, Query And Fetch
~~~~~~~~~~~~~~~~~~~~

Parameter value: **dfs\_query\_and\_fetch**.

Same as "Query And Fetch", except for an initial scatter phase which
goes and computes the distributed term frequencies for more accurate
scoring.

Dfs, Query Then Fetch
~~~~~~~~~~~~~~~~~~~~~

Parameter value: **dfs\_query\_then\_fetch**.

Same as "Query Then Fetch", except for an initial scatter phase which
goes and computes the distributed term frequencies for more accurate
scoring.

Count
~~~~~

Parameter value: **count**.

A special search type that returns the count of documents that matched
the search request (represented in ``total_hits``) without returning any
docs, and possibly including aggregations as well. In general, this is preferable to the
``count`` API as it provides more options.

Scan
~~~~

Parameter value: **scan**.

The ``scan`` search type disables sorting in order to allow very
efficient scrolling through large result sets. See ? for more.

Scroll
------

While a ``search`` request returns a single “page” of results, the
``scroll`` API can be used to retrieve large numbers of results (or even
all results) from a single search request, in much the same way as you
would use a cursor on a traditional database.

Scrolling is not intended for real time user requests, but rather for
processing large amounts of data, e.g. in order to reindex the contents
of one index into a new index with a different configuration.

Some of the officially supported clients provide helpers to assist with
scrolled searches and reindexing of documents from one index to another:

Perl
    See
    `Search::Elasticsearch::Bulk <https://metacpan.org/pod/Search::Elasticsearch::Bulk>`__
    and
    `Search::Elasticsearch::Scroll <https://metacpan.org/pod/Search::Elasticsearch::Scroll>`__

Python
    See
    `elasticsearch.helpers.\* <http://elasticsearch-py.readthedocs.org/en/master/helpers.html>`__

    **Note**

    The results that are returned from a scroll request reflect the
    state of the index at the time that the initial ``search`` request
    was made, like a snapshot in time. Subsequent changes to documents
    (index, update or delete) will only affect later search requests.

In order to use scrolling, the initial search request should specify the
``scroll`` parameter in the query string, which tells Elasticsearch how
long it should keep the “search context” alive (see ?), e.g.
``?scroll=1m``.

.. code:: js

    curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '
    {
        "query": {
            "match" : {
                "title" : "elasticsearch"
            }
        }
    }
    '

The result from the above request includes a ``scroll_id``, which should
be passed to the ``scroll`` API in order to retrieve the next batch of
results.

.. code:: js

    curl -XGET  'localhost:9200/_search/scroll?scroll=1m'   \
         -d       'c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1' 

``GET`` or ``POST`` can be used.

The URL should not include the ``index`` or ``type`` name — these are
specified in the original ``search`` request instead.

The ``scroll`` parameter tells Elasticsearch to keep the search context
open for another ``1m``.

The ``scroll_id`` can be passed in the request body or in the query
string as ``?scroll_id=....``
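
For example, the same scroll request as above, passing the ID in the
query string instead of the body (the ID shown is just the sample from
the earlier response):

.. code:: js

    curl -XGET 'localhost:9200/_search/scroll?scroll=1m&scroll_id=c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1'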

Each call to the ``scroll`` API returns the next batch of results until
there are no more results left to return, i.e. the ``hits`` array is
empty.

    **Important**

    The initial search request and each subsequent scroll request
    returns a new ``scroll_id`` — only the most recent ``scroll_id``
    should be used.

    **Note**

    If the request specifies aggregations, only the initial search
    response will contain the aggregations results.

Efficient scrolling with Scroll-Scan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Deep pagination with ```from`` and
``size`` <#search-request-from-size>`__ — e.g.
``?size=10&from=10000`` — is very inefficient as (in this example)
10,010 sorted results have to be retrieved from each shard and re-sorted
in order to return just 10 results. This process has to be repeated for
every page requested.

The ``scroll`` API keeps track of which results have already been
returned and so is able to return sorted results more efficiently than
with deep pagination. However, sorting results (which happens by
default) still has a cost.

Normally, you just want to retrieve all results and the order doesn’t
matter. Scrolling can be combined with the ```scan`` <#scan>`__ search
type to disable sorting and to return results in the most efficient way
possible. All that is needed is to add ``search_type=scan`` to the query
string of the initial search request:

.. code:: js

    curl 'localhost:9200/twitter/tweet/_search?scroll=1m&search_type=scan'  -d '
    {
        "query": {
            "match" : {
                "title" : "elasticsearch"
            }
        }
    }
    '

Setting ``search_type`` to ``scan`` disables sorting and makes scrolling
very efficient.

A scanning scroll request differs from a standard scroll request in
four ways:

-  Sorting is disabled. Results are returned in the order they appear in
   the index.

-  Aggregations are not supported.

-  The response of the initial ``search`` request will not contain any
   results in the ``hits`` array. The first results will be returned by
   the first ``scroll`` request.

-  The ```size`` parameter <#search-request-from-size>`__ controls the
   number of results **per shard**, not per request, so a ``size`` of
   ``10`` which hits 5 shards will return a maximum of 50 results per
   ``scroll`` request.

Keeping the search context alive
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``scroll`` parameter (passed to the ``search`` request and to every
``scroll`` request) tells Elasticsearch how long it should keep the
search context alive. Its value (e.g. ``1m``, see ?) does not need to be
long enough to process all data — it just needs to be long enough to
process the previous batch of results. Each ``scroll`` request (with the
``scroll`` parameter) sets a new expiry time.

Normally, the `background merge process <#index-modules-merge>`__
optimizes the index by merging together smaller segments to create new
bigger segments, at which time the smaller segments are deleted. This
process continues during scrolling, but an open search context prevents
the old segments from being deleted while they are still in use. This is
how Elasticsearch is able to return the results of the initial search
request, regardless of subsequent changes to documents.

    **Tip**

    Keeping older segments alive means that more file handles are
    needed. Ensure that you have configured your nodes to have ample
    free file handles. See ?.

You can check how many search contexts are open with the `nodes stats
API <#cluster-nodes-stats>`__:

.. code:: js

    curl -XGET localhost:9200/_nodes/stats/indices/search?pretty

Clear scroll API
~~~~~~~~~~~~~~~~

Search contexts are removed automatically either when all results have
been retrieved or when the ``scroll`` timeout has been exceeded.
However, you can clear a search context manually with the
``clear-scroll`` API:

.. code:: js

    curl -XDELETE localhost:9200/_search/scroll \
         -d 'c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1' 

The ``scroll_id`` can be passed in the request body or in the query
string.

Multiple scroll IDs can be passed as comma separated values:

.. code:: js

    curl -XDELETE localhost:9200/_search/scroll \
         -d 'c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1,aGVuRmV0Y2g7NTsxOnkxaDZ' 

All search contexts can be cleared with the ``_all`` parameter:

.. code:: js

    curl -XDELETE localhost:9200/_search/scroll/_all

Preference
----------

Controls a ``preference`` of which shard replicas to execute the search
request on. By default, the operation is randomized between the shard
replicas.

The ``preference`` is a query string parameter which can be set to:

``_primary``
    The operation will go and be executed only on the primary shards.

``_primary_first``
    The operation will go and be executed on the primary shard, and if
    not available (failover), will execute on other shards.

``_local``
    The operation will prefer to be executed on a local allocated shard
    if possible.

``_only_node:xyz``
    Restricts the search to execute only on a node with the provided
    node id (``xyz`` in this case).

``_prefer_node:xyz``
    Prefers execution on the node with the provided node id (``xyz`` in
    this case) if applicable.

``_shards:2,3``
    Restricts the operation to the specified shards (``2`` and ``3`` in
    this case). This preference can be combined with other preferences
    but it has to appear first: ``_shards:2,3;_primary``

Custom (string) value
    A custom value will be used to guarantee that the same shards will
    be used for the same custom value. This can help with "jumping
    values" when hitting different shards in different refresh states. A
    sample value can be something like the web session id, or the user
    name.

For instance, use the user’s session ID to ensure consistent ordering of
results for the user:

.. code:: js

    curl localhost:9200/_search?preference=xyzabc123 -d '
    {
        "query": {
            "match": {
                "title": "elasticsearch"
            }
        }
    }
    '

Explain
-------

Enables explanation for each hit on how its score was computed.

.. code:: js

    {
        "explain": true,
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Version
-------

Returns a version for each search hit.

.. code:: js

    {
        "version": true,
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Index Boost
-----------

Allows you to configure a different boost level per index when searching
across more than one index. This is very handy when hits coming from
one index matter more than hits coming from another index (think social
graph where each user has an index).

.. code:: js

    {
        "indices_boost" : {
            "index1" : 1.4,
            "index2" : 1.3
        }
    }

min\_score
----------

Exclude documents which have a ``_score`` less than the minimum
specified in ``min_score``:

.. code:: js

    {
        "min_score": 0.5,
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

Note, most times, this does not make much sense, but is provided for
advanced use cases.

Named Queries and Filters
-------------------------

Each filter and query can accept a ``_name`` in its top level
definition.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "bool" : {
                    "should" : [
                        {"match" : { "name.first" : {"query" : "shay", "_name" : "first"} }},
                        {"match" : { "name.last" : {"query" : "banon", "_name" : "last"} }}
                    ]
                }
            },
            "filter" : {
                "terms" : {
                    "name.last" : ["banon", "kimchy"],
                    "_name" : "test"
                }
            }
        }
    }

The search response will include for each hit the ``matched_queries`` it
matched on. The tagging of queries and filters only makes sense for
compound queries and filters (such as ``bool`` query and filter, ``or``
and ``and`` filter, ``filtered`` query etc.).

Note, the query filter had to be enhanced in order to support this. In
order to set a name, the ``fquery`` filter should be used, which wraps a
query (just so there will be a place to set a name for it), for example:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "term" : { "name.first" : "shay" }
            },
            "filter" : {
                "fquery" : {
                    "query" : {
                        "term" : { "name.last" : "banon" }
                    },
                    "_name" : "test"
                }
            }
        }
    }

Named queries
~~~~~~~~~~~~~

Support for the ``_name`` option on queries is available from version
``0.90.4``; support on filters is also available in versions before
``0.90.4``.

Search Template
===============

The ``/_search/template`` endpoint allows you to use the mustache
language to pre-render search requests by filling existing templates
with template parameters before they are executed.

.. code:: js

    GET /_search/template
    {
        "template" : {
          "query": { "match" : { "{{my_field}}" : "{{my_value}}" } },
          "size" : "{{my_size}}"
        },
        "params" : {
            "my_field" : "foo",
            "my_value" : "bar",
            "my_size" : 5
        }
    }

For more information on Mustache templating and what kind of
templating you can do with it, check out the `online documentation of the
mustache project <http://mustache.github.io/mustache.5.html>`__.

**More template examples**

**Filling in a query string with a single value**

.. code:: js

    GET /_search/template
    {
        "template": {
            "query": {
                "match": {
                    "title": "{{query_string}}"
                }
            }
        },
        "params": {
            "query_string": "search for these words"
        }
    }

**Passing an array of strings**

.. code:: js

    GET /_search/template
    {
      "template": {
        "query": {
          "terms": {
            "status": [
              "{{#status}}",
              "{{.}}",
              "{{/status}}"
            ]
          }
        }
      },
      "params": {
        "status": [ "pending", "published" ]
      }
    }

which is rendered as:

.. code:: js

    {
        "query": {
            "terms": {
                "status": [ "pending", "published" ]
            }
        }
    }
**Default values**

A default value is written as ``{{var}}{{^var}}default{{/var}}`` for
instance:

.. code:: js

    {
      "template": {
        "query": {
          "range": {
            "line_no": {
              "gte": "{{start}}",
              "lte": "{{end}}{{^end}}20{{/end}}"
            }
          }
        }
      },
      "params": { ... }
    }

When ``params`` is ``{ "start": 10, "end": 15 }`` this query would be
rendered as:

.. code:: js

    {
        "range": {
            "line_no": {
                "gte": "10",
                "lte": "15"
            }
        }
    }

But when ``params`` is ``{ "start": 10 }`` this query would use the
default value for ``end``:

.. code:: js

    {
        "range": {
            "line_no": {
                "gte": "10",
                "lte": "20"
            }
        }
    }

**Conditional clauses**

Conditional clauses cannot be expressed using the JSON form of the
template. Instead, the template **must** be passed as a string. For
instance, let’s say we wanted to run a ``match`` query on the ``line``
field, and optionally wanted to filter by line numbers, where ``start``
and ``end`` are optional.

The ``params`` would look like:

.. code:: js

    {
        "params": {
            "text":      "words to search for",
            "line_no": { 
                "start": 10, 
                "end":   20  
            }
        }
    }

All three of these elements are optional.

We could write the query as:

.. code:: js

    {
      "query": {
        "filtered": {
          "query": {
            "match": {
              "line": "{{text}}" 
            }
          },
          "filter": {
            {{#line_no}} 
              "range": {
                "line_no": {
                  {{#start}} 
                    "gte": "{{start}}" 
                    {{#end}},{{/end}} 
                  {{/start}} 
                  {{#end}} 
                    "lte": "{{end}}" 
                  {{/end}} 
                }
              }
            {{/line_no}} 
          }
        }
      }
    }

In the template above:

-  ``{{text}}`` fills in the value of the ``text`` param.

-  The ``{{#line_no}}...{{/line_no}}`` section includes the ``range``
   filter only if ``line_no`` is specified.

-  The ``{{#start}}...{{/start}}`` section includes the ``gte`` clause
   only if ``line_no.start`` is specified, and fills in its value.

-  ``{{#end}},{{/end}}`` adds a comma after the ``gte`` clause only if
   both ``line_no.start`` and ``line_no.end`` are specified.

-  The ``{{#end}}...{{/end}}`` section includes the ``lte`` clause only
   if ``line_no.end`` is specified, and fills in its value.

    **Note**

    As written above, this template is not valid JSON because it
    includes the *section* markers like ``{{#line_no}}``. For this
    reason, the template should either be stored in a file (see ?) or,
    when used via the REST API, should be written as a string:

    .. code:: json

        "template": "{\"query\":{\"filtered\":{\"query\":{\"match\":{\"line\":\"{{text}}\"}},\"filter\":{{{#line_no}}\"range\":{\"line_no\":{{{#start}}\"gte\":\"{{start}}\"{{#end}},{{/end}}{{/start}}{{#end}}\"lte\":\"{{end}}\"{{/end}}}}{{/line_no}}}}}}"

**Pre-registered template**

You can register search templates by storing them in the
``config/scripts`` directory, in a file using the ``.mustache``
extension. In order to execute the stored template, reference it by its
name under the ``template`` key:

.. code:: js

    GET /_search/template
    {
        "template": {
            "file": "storedTemplate" 
        },
        "params": {
            "query_string": "search for these words"
        }
    }

Name of the query template in ``config/scripts/``, i.e.,
``storedTemplate.mustache``.

You can also register search templates by storing them in the
Elasticsearch cluster, in a special index named ``.scripts``. There are
REST APIs to manage these indexed templates.

.. code:: js

    POST /_search/template/<templatename>
    {
        "template": {
            "query": {
                "match": {
                    "title": "{{query_string}}"
                }
            }
        }
    }

This template can be retrieved by

.. code:: js

    GET /_search/template/<templatename>

which returns:

.. code:: js

    {
        "template": {
            "query": {
                "match": {
                    "title": "{{query_string}}"
                }
            }
        }
    }

This template can be deleted by

.. code:: js

    DELETE /_search/template/<templatename>

To use an indexed template at search time use:

.. code:: js

    GET /_search/template
    {
        "template": {
            "id": "templateName" 
        },
        "params": {
            "query_string": "search for these words"
        }
    }

Name of the query template stored in the ``.scripts`` index.

Search Shards API
=================

The search shards API returns the indices and shards that a search
request would be executed against. This can give useful feedback for
working out issues or planning optimizations with routing and shard
preferences.

The ``index`` and ``type`` parameters may be single values, or
comma-separated.

**Usage**

Full example:

.. code:: js

    curl -XGET 'localhost:9200/twitter/_search_shards'

This will yield the following result:

.. code:: js

    {
      "nodes": {
        "JklnKbD7Tyqi9TP3_Q_tBg": {
          "name": "Rl'nnd",
          "transport_address": "inet[/192.168.1.113:9300]"
        }
      },
      "shards": [
        [
          {
            "index": "twitter",
            "node": "JklnKbD7Tyqi9TP3_Q_tBg",
            "primary": true,
            "relocating_node": null,
            "shard": 3,
            "state": "STARTED"
          }
        ],
        [
          {
            "index": "twitter",
            "node": "JklnKbD7Tyqi9TP3_Q_tBg",
            "primary": true,
            "relocating_node": null,
            "shard": 4,
            "state": "STARTED"
          }
        ],
        [
          {
            "index": "twitter",
            "node": "JklnKbD7Tyqi9TP3_Q_tBg",
            "primary": true,
            "relocating_node": null,
            "shard": 0,
            "state": "STARTED"
          }
        ],
        [
          {
            "index": "twitter",
            "node": "JklnKbD7Tyqi9TP3_Q_tBg",
            "primary": true,
            "relocating_node": null,
            "shard": 2,
            "state": "STARTED"
          }
        ],
        [
          {
            "index": "twitter",
            "node": "JklnKbD7Tyqi9TP3_Q_tBg",
            "primary": true,
            "relocating_node": null,
            "shard": 1,
            "state": "STARTED"
          }
        ]
      ]
    }

And specifying the same request, this time with a routing value:

.. code:: js

    curl -XGET 'localhost:9200/twitter/_search_shards?routing=foo,baz'

This will yield the following result:

.. code:: js

    {
      "nodes": {
        "JklnKbD7Tyqi9TP3_Q_tBg": {
          "name": "Rl'nnd",
          "transport_address": "inet[/192.168.1.113:9300]"
        }
      },
      "shards": [
        [
          {
            "index": "twitter",
            "node": "JklnKbD7Tyqi9TP3_Q_tBg",
            "primary": true,
            "relocating_node": null,
            "shard": 2,
            "state": "STARTED"
          }
        ],
        [
          {
            "index": "twitter",
            "node": "JklnKbD7Tyqi9TP3_Q_tBg",
            "primary": true,
            "relocating_node": null,
            "shard": 4,
            "state": "STARTED"
          }
        ]
      ]
    }

This time the search will only be executed against two of the shards,
because routing values have been specified.

**All parameters:**

``routing``
    A comma-separated list of routing values to take into account when
    determining which shards a request would be executed against.

``preference``
    Controls a ``preference`` of which shard replicas to execute the
    search request on. By default, the operation is randomized between
    the shard replicas. See the
    `preference <search-request-preference.html>`__ documentation for a
    list of all acceptable values.

``local``
    A boolean value whether to read the cluster state locally in order
    to determine where shards are allocated instead of using the Master
    node’s cluster state.

Aggregations
============

The aggregations framework helps provide aggregated data based on a
search query. It is based on simple building blocks called aggregations,
which can be composed in order to build complex summaries of the data.

An aggregation can be seen as a *unit-of-work* that builds analytic
information over a set of documents. The context of the execution
defines what this document set is (e.g. a top-level aggregation executes
within the context of the executed query/filters of the search request).

There are many different types of aggregations, each with its own
purpose and output. To better understand these types, it is often easier
to break them into two main families:

*Bucketing*
    A family of aggregations that build buckets, where each bucket is
    associated with a *key* and a document criterion. When the
    aggregation is executed, all the buckets criteria are evaluated on
    every document in the context and when a criterion matches, the
    document is considered to "fall in" the relevant bucket. By the end
    of the aggregation process, we’ll end up with a list of buckets -
    each one with a set of documents that "belong" to it.

*Metric*
    Aggregations that keep track and compute metrics over a set of
    documents.

The interesting part comes next. Since each bucket effectively defines a
document set (all documents belonging to the bucket), one can
potentially associate aggregations on the bucket level, and those will
execute within the context of that bucket. This is where the real power
of aggregations kicks in: **aggregations can be nested!**

    **Note**

    Bucketing aggregations can have sub-aggregations (bucketing or
    metric). The sub-aggregations will be computed for the buckets which
    their parent aggregation generates. There is no hard limit on the
    level/depth of nested aggregations (one can nest an aggregation
    under a "parent" aggregation, which is itself a sub-aggregation of
    another higher-level aggregation).

**Structuring Aggregations**

The following snippet captures the basic structure of aggregations:

.. code:: js

    "aggregations" : {
        "<aggregation_name>" : {
            "<aggregation_type>" : {
                <aggregation_body>
            }
            [,"meta" : {  [<meta_data_body>] } ]?
            [,"aggregations" : { [<sub_aggregation>]+ } ]?
        }
        [,"<aggregation_name_2>" : { ... } ]*
    }

The ``aggregations`` object (the key ``aggs`` can also be used) in the
JSON holds the aggregations to be computed. Each aggregation is
associated with a logical name that the user defines (e.g. if the
aggregation computes the average price, then it would make sense to name
it ``avg_price``). These logical names will also be used to uniquely
identify the aggregations in the response. Each aggregation has a
specific type (``<aggregation_type>`` in the above snippet) and is
typically the first key within the named aggregation body. Each type of
aggregation defines its own body, depending on the nature of the
aggregation (e.g. an ``avg`` aggregation on a specific field will define
the field on which the average will be calculated). At the same level of
the aggregation type definition, one can optionally define a set of
additional aggregations, though this only makes sense if the aggregation
you defined is of a bucketing nature. In this scenario, the
sub-aggregations you define on the bucketing aggregation level will be
computed for all the buckets built by the bucketing aggregation. For
example, if you define a set of aggregations under the ``range``
aggregation, the sub-aggregations will be computed for the range buckets
that are defined.
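
For example, a sketch of a ``range`` bucketing aggregation over a
numeric ``price`` field with an ``avg`` sub-aggregation computed for
every bucket (the field name and the ranges are illustrative):

.. code:: js

    {
        "aggregations" : {
            "price_ranges" : {
                "range" : {
                    "field" : "price",
                    "ranges" : [
                        { "to" : 50 },
                        { "from" : 50, "to" : 100 },
                        { "from" : 100 }
                    ]
                },
                "aggregations" : {
                    "avg_price" : { "avg" : { "field" : "price" } }
                }
            }
        }
    }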

**Values Source**

Some aggregations work on values extracted from the aggregated
documents. Typically, the values will be extracted from a specific
document field which is set using the ``field`` key for the
aggregations. It is also possible to define a
```script`` <#modules-scripting>`__ which will generate the values (per
document).

When both ``field`` and ``script`` settings are configured for the
aggregation, the script will be treated as a ``value script``. While
normal scripts are evaluated on a document level (i.e. the script has
access to all the data associated with the document), value scripts are
evaluated on the **value** level. In this mode, the values are extracted
from the configured ``field`` and the ``script`` is used to apply a
"transformation" over these value/s.

    **Note**

    When working with scripts, the ``lang`` and ``params`` settings can
    also be defined. The former defines the scripting language which is
    used (assuming the proper language is available in Elasticsearch,
    either by default or as a plugin). The latter enables defining all
    the "dynamic" expressions in the script as parameters, which enables
    the script to keep itself static between calls (this will ensure the
    use of the cached compiled scripts in Elasticsearch).

Scripts can generate a single value or multiple values per document.
When generating multiple values, one can use the
``script_values_sorted`` settings to indicate whether these values are
sorted or not. Internally, Elasticsearch can perform optimizations when
dealing with sorted values (for example, with the ``min`` aggregations,
knowing the values are sorted, Elasticsearch will skip the iterations
over all the values and rely on the first value in the list to be the
minimum value among all other values associated with the same document).
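
A sketch of where such a setting might sit, assuming a script that emits
multiple values per document (the ``prices`` field and the script itself
are hypothetical):

.. code:: js

    {
        "aggs" : {
            "min_price" : {
                "min" : {
                    "script" : "doc['prices'].values",
                    "script_values_sorted" : true
                }
            }
        }
    }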

**Metrics Aggregations**

The aggregations in this family compute metrics based on values
extracted in one way or another from the documents that are being
aggregated. The values are typically extracted from the fields of the
document (using the field data), but can also be generated using
scripts.

Numeric metrics aggregations are a special type of metrics aggregation
which output numeric values. Some aggregations output a single numeric
metric (e.g. ``avg``) and are called
``single-value numeric metrics aggregation``, others generate multiple
metrics (e.g. ``stats``) and are called
``multi-value numeric metrics aggregation``. The distinction between
single-value and multi-value numeric metrics aggregations plays a role
when these aggregations serve as direct sub-aggregations of some bucket
aggregations (some bucket aggregations enable you to sort the returned
buckets based on the numeric metrics in each bucket).
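
For example, a ``terms`` bucketing aggregation can order its buckets by a
single-value metrics sub-aggregation; the ``category`` field below is
hypothetical, while the ``avg`` on ``price`` mirrors the metrics examples
later in this section:

.. code:: js

    {
        "aggs" : {
            "categories" : {
                "terms" : {
                    "field" : "category",
                    "order" : { "avg_price" : "desc" }
                },
                "aggs" : {
                    "avg_price" : { "avg" : { "field" : "price" } }
                }
            }
        }
    }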

**Bucket Aggregations**

Bucket aggregations don’t calculate metrics over fields like the metrics
aggregations do, but instead, they create buckets of documents. Each
bucket is associated with a criterion (depending on the aggregation
type) which determines whether or not a document in the current context
"falls" into it. In other words, the buckets effectively define document
sets. In addition to the buckets themselves, the ``bucket`` aggregations
also compute and return the number of documents that "fell in" to each
bucket.

Bucket aggregations, as opposed to ``metrics`` aggregations, can hold
sub-aggregations. These sub-aggregations will be aggregated for the
buckets created by their "parent" bucket aggregation.

There are different bucket aggregators, each with a different
"bucketing" strategy. Some define a single bucket, some define fixed
number of multiple buckets, and others dynamically create the buckets
during the aggregation process.

**Caching heavy aggregations**

Frequently used aggregations (e.g. for display on the home page of a
website) can be cached for faster responses. These cached results are
the same results that would be returned by an uncached aggregation — you
will never get stale results.

See ? for more details.

**Returning only aggregation results**

There are many occasions when aggregations are required but search hits
are not. For these cases the hits can be ignored by adding
``search_type=count`` to the request URL parameters. For example:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/tweet/_search?search_type=count' -d '{
      "aggregations": {
        "my_agg": {
          "terms": {
            "field": "text"
          }
        }
      }
    }
    '

Setting ``search_type`` to ``count`` avoids executing the fetch phase of
the search making the request more efficient. See ? for more information
on the ``search_type`` parameter.

**Metadata**

You can associate a piece of metadata with individual aggregations at
request time that will be returned in place at response time.

Consider this example where we want to associate the color blue with our
``terms`` aggregation.

.. code:: js

    {
        ...
        aggs": {
            "titles": {
                "terms": {
                    "field": "title"
                },
                "meta": {
                    "color": "blue"
                }
            }
        }
    }

Then that piece of metadata will be returned in place for our ``titles``
terms aggregation:

.. code:: js

    {
        ...
        "aggregations": {
            "titles": {
                "meta": {
                    "color" : "blue"
                },
                "buckets": [
                ]
            }
        }
    }

Min Aggregation
---------------

A ``single-value`` metrics aggregation that keeps track and returns the
minimum value among numeric values extracted from the aggregated
documents. These values can be extracted either from specific numeric
fields in the documents, or be generated by a provided script.

Computing the min price value across all documents:

.. code:: js

    {
        "aggs" : {
            "min_price" : { "min" : { "field" : "price" } }
        }
    }

Response:

.. code:: js

    {
        ...

        "aggregations": {
            "min_price": {
                "value": 10
            }
        }
    }

As can be seen, the name of the aggregation (``min_price`` above) also
serves as the key by which the aggregation result can be retrieved from
the returned response.

Script
~~~~~~

Computing the min price value across all documents, this time using a
script:

.. code:: js

    {
        "aggs" : {
            "min_price" : { "min" : { "script" : "doc['price'].value" } }
        }
    }

Value Script
~~~~~~~~~~~~

Let’s say that the prices of the documents in our index are in USD, but
we would like to compute the min in euros (and for the sake of this
example, let’s say the conversion rate is 1.2). We can use a value script
to apply the conversion rate to every value before it is aggregated:

.. code:: js

    {
        "aggs" : {
            "min_price_in_euros" : {
                "min" : {
                    "field" : "price",
                    "script" : "_value * conversion_rate",
                    "params" : {
                        "conversion_rate" : 1.2
                    }
                }
            }
        }
    }

Max Aggregation
---------------

A ``single-value`` metrics aggregation that keeps track and returns the
maximum value among the numeric values extracted from the aggregated
documents. These values can be extracted either from specific numeric
fields in the documents, or be generated by a provided script.

Computing the max price value across all documents:

.. code:: js

    {
        "aggs" : {
            "max_price" : { "max" : { "field" : "price" } }
        }
    }

Response:

.. code:: js

    {
        ...

        "aggregations": {
            "max_price": {
                "value": 35
            }
        }
    }

As can be seen, the name of the aggregation (``max_price`` above) also
serves as the key by which the aggregation result can be retrieved from
the returned response.

Script
~~~~~~

Computing the max price value across all documents, this time using a
script:

.. code:: js

    {
        "aggs" : {
            "max_price" : { "max" : { "script" : "doc['price'].value" } }
        }
    }

Value Script
~~~~~~~~~~~~

Let’s say that the prices of the documents in our index are in USD, but
we would like to compute the max in euros (and for the sake of this
example, let’s say the conversion rate is 1.2). We can use a value script
to apply the conversion rate to every value before it is aggregated:

.. code:: js

    {
        "aggs" : {
            "max_price_in_euros" : {
                "max" : {
                    "field" : "price",
                    "script" : "_value * conversion_rate",
                    "params" : {
                        "conversion_rate" : 1.2
                    }
                }
            }
        }
    }

Sum Aggregation
---------------

A ``single-value`` metrics aggregation that sums up numeric values that
are extracted from the aggregated documents. These values can be
extracted either from specific numeric fields in the documents, or be
generated by a provided script.

Assuming the data consists of documents representing stock ticks, where
each tick holds the change in the stock price from the previous tick.

.. code:: js

    {
        "query" : {
            "filtered" : {
                "query" : { "match_all" : {}},
                "filter" : {
                    "range" : { "timestamp" : { "from" : "now/1d+9.5h", "to" : "now/1d+16h" }}
                }
            }
        },
        "aggs" : {
            "intraday_return" : { "sum" : { "field" : "change" } }
        }
    }

The above aggregation sums up all changes in today’s trading stock
ticks, which accounts for the intraday return. The aggregation type is
``sum`` and the ``field`` setting defines the numeric field of the
documents whose values will be summed up. The above will return the
following:

.. code:: js

    {
        ...

        "aggregations": {
            "intraday_return": {
               "value": 2.18
            }
        }
    }

The name of the aggregation (``intraday_return`` above) also serves as
the key by which the aggregation result can be retrieved from the
returned response.

Script
~~~~~~

Computing the intraday return based on a script:

.. code:: js

    {
        ...,

        "aggs" : {
            "intraday_return" : { "sum" : { "script" : "doc['change'].value" } }
        }
    }

Value Script
^^^^^^^^^^^^

Computing the sum of squares over all stock tick changes:

.. code:: js

    {
        "aggs" : {
            ...

            "aggs" : {
                "daytime_return" : {
                    "sum" : {
                        "field" : "change",
                        "script" : "_value * _value" }
                }
            }
        }
    }

Avg Aggregation
---------------

A ``single-value`` metrics aggregation that computes the average of
numeric values that are extracted from the aggregated documents. These
values can be extracted either from specific numeric fields in the
documents, or be generated by a provided script.

Assuming the data consists of documents representing exam grades
(between 0 and 100) of students:

.. code:: js

    {
        "aggs" : {
            "avg_grade" : { "avg" : { "field" : "grade" } }
        }
    }

The above aggregation computes the average grade over all documents. The
aggregation type is ``avg`` and the ``field`` setting defines the
numeric field of the documents the average will be computed on. The
above will return the following:

.. code:: js

    {
        ...

        "aggregations": {
            "avg_grade": {
                "value": 75
            }
        }
    }

The name of the aggregation (``avg_grade`` above) also serves as the key
by which the aggregation result can be retrieved from the returned
response.

Script
~~~~~~

Computing the average grade based on a script:

.. code:: js

    {
        ...,

        "aggs" : {
            "avg_grade" : { "avg" : { "script" : "doc['grade'].value" } }
        }
    }

Value Script
^^^^^^^^^^^^

It turned out that the exam was way above the level of the students and
a grade correction needs to be applied. We can use a value script to get
the new average:

.. code:: js

    {
        "aggs" : {
            ...

            "aggs" : {
                "avg_corrected_grade" : {
                    "avg" : {
                        "field" : "grade",
                        "script" : "_value * correction",
                        "params" : {
                            "correction" : 1.2
                        }
                    }
                }
            }
        }
    }

Stats Aggregation
-----------------

A ``multi-value`` metrics aggregation that computes stats over numeric
values extracted from the aggregated documents. These values can be
extracted either from specific numeric fields in the documents, or be
generated by a provided script.

The stats that are returned consist of: ``min``, ``max``, ``sum``,
``count`` and ``avg``.

Assuming the data consists of documents representing exam grades
(between 0 and 100) of students:

.. code:: js

    {
        "aggs" : {
            "grades_stats" : { "stats" : { "field" : "grade" } }
        }
    }

The above aggregation computes the grades statistics over all documents.
The aggregation type is ``stats`` and the ``field`` setting defines the
numeric field of the documents the stats will be computed on. The above
will return the following:

.. code:: js

    {
        ...

        "aggregations": {
            "grades_stats": {
                "count": 6,
                "min": 60,
                "max": 98,
                "avg": 78.5,
                "sum": 471
            }
        }
    }

The name of the aggregation (``grades_stats`` above) also serves as the
key by which the aggregation result can be retrieved from the returned
response.

Script
~~~~~~

Computing the grades stats based on a script:

.. code:: js

    {
        ...,

        "aggs" : {
            "grades_stats" : { "stats" : { "script" : "doc['grade'].value" } }
        }
    }

Value Script
^^^^^^^^^^^^

It turned out that the exam was way above the level of the students and
a grade correction needs to be applied. We can use a value script to get
the new stats:

.. code:: js

    {
        "aggs" : {
            ...

            "aggs" : {
                "grades_stats" : {
                    "stats" : {
                        "field" : "grade",
                        "script" : "_value * correction",
                        "params" : {
                            "correction" : 1.2
                        }
                    }
                }
            }
        }
    }

Extended Stats Aggregation
--------------------------

A ``multi-value`` metrics aggregation that computes stats over numeric
values extracted from the aggregated documents. These values can be
extracted either from specific numeric fields in the documents, or be
generated by a provided script.

The ``extended_stats`` aggregation is an extended version of the
`stats <#search-aggregations-metrics-stats-aggregation>`__
aggregation, where additional metrics are added such as
``sum_of_squares``, ``variance`` and ``std_deviation``.

Assuming the data consists of documents representing exam grades
(between 0 and 100) of students:

.. code:: js

    {
        "aggs" : {
            "grades_stats" : { "extended_stats" : { "field" : "grade" } }
        }
    }

The above aggregation computes the grades statistics over all documents.
The aggregation type is ``extended_stats`` and the ``field`` setting
defines the numeric field of the documents the stats will be computed
on. The above will return the following:

.. code:: js

    {
        ...

        "aggregations": {
            "grades_stats": {
                "count": 6,
                "min": 72,
                "max": 117.6,
                "avg": 94.2,
                "sum": 565.2,
                "sum_of_squares": 54551.51999999999,
                "variance": 218.2799999999976,
                "std_deviation": 14.774302013969987
            }
        }
    }

The name of the aggregation (``grades_stats`` above) also serves as the
key by which the aggregation result can be retrieved from the returned
response.

Script
~~~~~~

Computing the grades stats based on a script:

.. code:: js

    {
        ...,

        "aggs" : {
            "grades_stats" : { "extended_stats" : { "script" : "doc['grade'].value" } }
        }
    }

Value Script
^^^^^^^^^^^^

It turned out that the exam was way above the level of the students and
a grade correction needs to be applied. We can use a value script to get
the new stats:

.. code:: js

    {
        "aggs" : {
            ...

            "aggs" : {
                "grades_stats" : {
                    "extended_stats" : {
                        "field" : "grade",
                        "script" : "_value * correction",
                        "params" : {
                            "correction" : 1.2
                        }
                    }
                }
            }
        }
    }

Value Count Aggregation
-----------------------

A ``single-value`` metrics aggregation that counts the number of values
that are extracted from the aggregated documents. These values can be
extracted either from specific fields in the documents, or be generated
by a provided script. Typically, this aggregator will be used in
conjunction with other single-value aggregations. For example, when
computing the ``avg`` one might be interested in the number of values
the average is computed over.

.. code:: js

    {
        "aggs" : {
            "grades_count" : { "value_count" : { "field" : "grade" } }
        }
    }

Response:

.. code:: js

    {
        ...

        "aggregations": {
            "grades_count": {
                "value": 10
            }
        }
    }

The name of the aggregation (``grades_count`` above) also serves as the
key by which the aggregation result can be retrieved from the returned
response.
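
Since the value count is most useful next to another metric, a single
request might ask for both the average grade and the number of values it
was computed over. The following is a sketch reusing the ``grade`` field
from the examples above:

.. code:: js

    {
        "aggs" : {
            "avg_grade" : { "avg" : { "field" : "grade" } },
            "grades_count" : { "value_count" : { "field" : "grade" } }
        }
    }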

Script
~~~~~~

Counting the values generated by a script:

.. code:: js

    {
        ...,

        "aggs" : {
            "grades_count" : { "value_count" : { "script" : "doc['grade'].value" } }
        }
    }

Percentiles Aggregation
-----------------------

A ``multi-value`` metrics aggregation that calculates one or more
percentiles over numeric values extracted from the aggregated documents.
These values can be extracted either from specific numeric fields in the
documents, or be generated by a provided script.

Percentiles show the point at which a certain percentage of observed
values occur. For example, the 95th percentile is the value which is
greater than 95% of the observed values.

Percentiles are often used to find outliers. In normal distributions,
the 0.13th and 99.87th percentiles represent three standard deviations
from the mean. Any data which falls outside three standard deviations is
often considered an anomaly.

When a range of percentiles is retrieved, the values can be used to
estimate the data distribution and determine if the data is skewed,
bimodal, etc.

Assume your data consists of website load times. The average and median
load times are not overly useful to an administrator. The max may be
interesting, but it can be easily skewed by a single slow response.

Let’s look at a range of percentiles representing load time:

.. code:: js

    {
        "aggs" : {
            "load_time_outlier" : {
                "percentiles" : {
                    "field" : "load_time" 
                }
            }
        }
    }

The field ``load_time`` must be a numeric field

By default, the ``percentile`` metric will generate a range of
percentiles: ``[ 1, 5, 25, 50, 75, 95, 99 ]``. The response will look
like this:

.. code:: js

    {
        ...

       "aggregations": {
          "load_time_outlier": {
             "values" : {
                "1.0": 15,
                "5.0": 20,
                "25.0": 23,
                "50.0": 25,
                "75.0": 29,
                "95.0": 60,
                "99.0": 150
             }
          }
       }
    }

As you can see, the aggregation will return a calculated value for each
percentile in the default range. If we assume response times are in
milliseconds, it is immediately obvious that the webpage normally loads
in 15-30ms, but occasionally spikes to 60-150ms.

Often, administrators are only interested in outliers — the extreme
percentiles. We can specify just the percents we are interested in
(requested percentiles must be values between 0 and 100 inclusive):

.. code:: js

    {
        "aggs" : {
            "load_time_outlier" : {
                "percentiles" : {
                    "field" : "load_time",
                    "percents" : [95, 99, 99.9] 
                }
            }
        }
    }

Use the ``percents`` parameter to specify particular percentiles to
calculate

Script
~~~~~~

The percentile metric supports scripting. For example, if our load times
are in milliseconds but we want percentiles calculated in seconds, we
could use a script to convert them on-the-fly:

.. code:: js

    {
        "aggs" : {
            "load_time_outlier" : {
                "percentiles" : {
                    "script" : "doc['load_time'].value / timeUnit", 
                    "params" : {
                        "timeUnit" : 1000   
                    }
                }
            }
        }
    }

The ``field`` parameter is replaced with a ``script`` parameter, which
uses the script to generate values on which the percentiles are
calculated.

Scripting supports parameterized input just like any other script

Percentiles are (usually) approximate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are many different algorithms to calculate percentiles. The naive
implementation simply stores all the values in a sorted array. To find
the 50th percentile, you simply find the value that is at
``my_array[count(my_array) * 0.5]``.
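
As an illustration only (this is not how Elasticsearch computes
percentiles), the naive approach could be sketched in a few lines of
JavaScript:

.. code:: js

    // Naive, exact percentile: keep every value, sort, and index into the array.
    function naivePercentile(values, q) {               // q between 0 and 1
        var sorted = values.slice().sort(function (a, b) { return a - b; });
        return sorted[Math.floor((sorted.length - 1) * q)];
    }

    naivePercentile([15, 20, 23, 25, 29, 60, 150], 0.5); // => 25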

Clearly, the naive implementation does not scale — the sorted array
grows linearly with the number of values in your dataset. To calculate
percentiles across potentially billions of values in an Elasticsearch
cluster, *approximate* percentiles are calculated.

The algorithm used by the ``percentile`` metric is called TDigest
(introduced by Ted Dunning in `Computing Accurate Quantiles using
T-Digests <https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf>`__).

When using this metric, there are a few guidelines to keep in mind:

-  Accuracy is proportional to ``q(1-q)``. This means that extreme
   percentiles (e.g. 99%) are more accurate than less extreme
   percentiles, such as the median

-  For small sets of values, percentiles are highly accurate (and
   potentially 100% accurate if the data is small enough).

-  As the quantity of values in a bucket grows, the algorithm begins to
   approximate the percentiles. It is effectively trading accuracy for
   memory savings. The exact level of inaccuracy is difficult to
   generalize, since it depends on your data distribution and volume of
   data being aggregated

The following chart shows the relative error on a uniform distribution
depending on the number of collected values and the requested
percentile:

|images/percentiles\_error.png|

It shows how precision is better for extreme percentiles. The reason why
the error diminishes for large numbers of values is that the law of large
numbers makes the distribution of values more and more uniform and the
t-digest tree can do a better job at summarizing it. This would not be
the case on more skewed distributions.

Compression
~~~~~~~~~~~

Approximate algorithms must balance memory utilization with estimation
accuracy. This balance can be controlled using a ``compression``
parameter:

.. code:: js

    {
        "aggs" : {
            "load_time_outlier" : {
                "percentiles" : {
                    "field" : "load_time",
                    "compression" : 200 
                }
            }
        }
    }

Compression controls memory usage and approximation error

The TDigest algorithm uses a number of "nodes" to approximate
percentiles — the more nodes available, the higher the accuracy (and the
larger the memory footprint) proportional to the volume of data. The
``compression`` parameter limits the maximum number of nodes to
``20 * compression``.

Therefore, by increasing the compression value, you can increase the
accuracy of your percentiles at the cost of more memory. Larger
compression values also make the algorithm slower since the underlying
tree data structure grows in size, resulting in more expensive
operations. The default compression value is ``100``.

A "node" uses roughly 32 bytes of memory, so under worst-case scenarios
(large amount of data which arrives sorted and in-order) the default
settings will produce a TDigest roughly 64KB in size. In practice data
tends to be more random and the TDigest will use less memory.
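
As a rough back-of-the-envelope check of the numbers above (an
illustration of the arithmetic, not an API):

.. code:: js

    var compression = 100;               // the default
    var maxNodes    = 20 * compression;  // at most 2000 nodes
    var worstCase   = maxNodes * 32;     // ~64000 bytes, i.e. roughly 64KB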

Percentile Ranks Aggregation
----------------------------

A ``multi-value`` metrics aggregation that calculates one or more
percentile ranks over numeric values extracted from the aggregated
documents. These values can be extracted either from specific numeric
fields in the documents, or be generated by a provided script.

    **Important**

    This feature is marked as experimental, and may be subject to change
    in the future. If you use this feature, please let us know your
    experience with it!

    **Note**

    Please see the *Percentiles are (usually) approximate* and
    *Compression* sections above for advice regarding approximation and
    memory use of the percentile ranks aggregation.

Percentile ranks show the percentage of observed values which are below
a certain value. For example, if a value is greater than or equal to 95%
of the observed values it is said to be at the 95th percentile rank.

Assume your data consists of website load times. You may have a service
agreement that 95% of page loads complete within 15ms and 99% of page
loads complete within 30ms.

Let’s look at a range of percentiles representing load time:

.. code:: js

    {
        "aggs" : {
            "load_time_outlier" : {
                "percentile_ranks" : {
                    "field" : "load_time" 
                    "values" : [15, 30]
                }
            }
        }
    }

The field ``load_time`` must be a numeric field

The response will look like this:

.. code:: js

    {
        ...

       "aggregations": {
          "load_time_outlier": {
             "values" : {
                "15": 92,
                "30": 100
             }
          }
       }
    }

From this information you can determine you are hitting the 99% load
time target but not quite hitting the 95% load time target

Script
~~~~~~

The percentile rank metric supports scripting. For example, if our load
times are in milliseconds but we want to specify values in seconds, we
could use a script to convert them on-the-fly:

.. code:: js

    {
        "aggs" : {
            "load_time_outlier" : {
                "percentile_ranks" : {
                    "values" : [3, 5],
                    "script" : "doc['load_time'].value / timeUnit", 
                    "params" : {
                        "timeUnit" : 1000   
                    }
                }
            }
        }
    }

The ``field`` parameter is replaced with a ``script`` parameter, which
uses the script to generate values on which the percentile ranks are
calculated.

Scripting supports parameterized input just like any other script

Cardinality Aggregation
-----------------------

A ``single-value`` metrics aggregation that calculates an approximate
count of distinct values. Values can be extracted either from specific
fields in the document or generated by a script.

    **Important**

    This feature is marked as experimental, and may be subject to change
    in the future. If you use this feature, please let us know your
    experience with it!

Assume you are indexing books and would like to count the unique authors
that match a query:

.. code:: js

    {
        "aggs" : {
            "author_count" : {
                "cardinality" : {
                    "field" : "author"
                }
            }
        }
    }

This aggregation also supports the ``precision_threshold`` and
``rehash`` options:

.. code:: js

    {
        "aggs" : {
            "author_count" : {
                "cardinality" : {
                    "field" : "author_hash",
                    "precision_threshold": 100, 
                    "rehash": false 
                }
            }
        }
    }

The ``precision_threshold`` option allows trading memory for accuracy,
and defines a unique count below which counts are expected to be close
to accurate. Above this value, counts might become a bit more fuzzy. The
maximum supported value is 40000; thresholds above this number will have
the same effect as a threshold of 40000. The default value depends on
the number of parent aggregations that create multiple buckets (such as
terms or histograms).

If you have computed a hash on the client side, stored it into your
documents and want Elasticsearch to use these hashes to compute counts
without rehashing the values, it is possible to specify
``rehash: false``. The default value is ``true``. Please note that the
hash must be indexed as a long when ``rehash`` is false.

Counts are approximate
~~~~~~~~~~~~~~~~~~~~~~

Computing exact counts requires loading values into a hash set and
returning its size. This doesn’t scale when working on high-cardinality
sets and/or large values as the required memory usage and the need to
communicate those per-shard sets between nodes would utilize too many
resources of the cluster.

This ``cardinality`` aggregation is based on the
`HyperLogLog++ <http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf>`__
algorithm, which counts based on the hashes of the values with some
interesting properties:

-  configurable precision, which decides on how to trade memory for
   accuracy,

-  excellent accuracy on low-cardinality sets,

-  fixed memory usage: no matter if there are tens or billions of unique
   values, memory usage only depends on the configured precision.

For a precision threshold of ``c``, the implementation that we are using
requires about ``c * 8`` bytes.
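
For example, at the maximum supported ``precision_threshold`` of 40000,
the formula above works out to roughly (an illustration of the
arithmetic, not an API):

.. code:: js

    var precisionThreshold = 40000;
    var bytes = precisionThreshold * 8;  // 320000 bytes, i.e. roughly 320KB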

The following chart shows how the error varies before and after the
threshold:

|images/cardinality\_error.png|

For all 3 thresholds, counts have been accurate up to the configured
threshold (although not guaranteed, this is likely to be the case).
Please also note that even with a threshold as low as 100, the error
remains under 5%, even when counting millions of items.

Pre-computed hashes
~~~~~~~~~~~~~~~~~~~

If you don’t want Elasticsearch to re-compute hashes on every run of
this aggregation, it is possible to use pre-computed hashes, either by
computing a hash on client-side, indexing it and specifying
``rehash: false``, or by using the special ``murmur3`` field mapper,
typically in the context of a ``multi-field`` in the mapping:

.. code:: js

    {
        "author": {
            "type": "string",
            "fields": {
                "hash": {
                    "type": "murmur3"
                }
            }
        }
    }

With such a mapping, Elasticsearch is going to compute hashes of the
``author`` field at indexing time and store them in the ``author.hash``
field. This way, unique counts can be computed using the cardinality
aggregation by only loading the hashes into memory, not the values of
the ``author`` field, and without computing hashes on the fly:

.. code:: js

    {
        "aggs" : {
            "author_count" : {
                "cardinality" : {
                    "field" : "author.hash"
                }
            }
        }
    }

    **Note**

    ``rehash`` is automatically set to ``false`` when computing unique
    counts on a ``murmur3`` field.

    **Note**

    Pre-computing hashes is usually only useful on very large and/or
    high-cardinality fields as it saves CPU and memory. However, on
    numeric fields, hashing is very fast and storing the original values
    requires as much or less memory than storing the hashes. This is
    also true on low-cardinality string fields, especially given that
    those have an optimization in order to make sure that hashes are
    computed at most once per unique value per segment.

Script
~~~~~~

The ``cardinality`` metric supports scripting, although with a
noticeable performance hit since hashes need to be computed on the fly.

.. code:: js

    {
        "aggs" : {
            "author_count" : {
                "cardinality" : {
                    "script": "doc['author.first_name'].value + ' ' + doc['author.last_name'].value"
                }
            }
        }
    }

Geo Bounds Aggregation
----------------------

A metric aggregation that computes the bounding box containing all
geo\_point values for a field.

    **Important**

    This feature is marked as experimental, and may be subject to change
    in the future. If you use this feature, please let us know your
    experience with it!

Example:

.. code:: js

    {
        "query" : {
            "match" : { "business_type" : "shop" }
        },
        "aggs" : {
            "viewport" : {
                "geo_bounds" : {
                    "field" : "location" 
                    "wrap_longitude" : "true" 
                }
            }
        }
    }

The ``geo_bounds`` aggregation specifies the field to use to obtain the
bounds

``wrap_longitude`` is an optional parameter which specifies whether the
bounding box should be allowed to overlap the international date line.
The default value is ``true``

The above aggregation demonstrates how one would compute the bounding
box of the location field for all documents with a business type of shop

The response for the above aggregation:

.. code:: js

    {
        ...

        "aggregations": {
            "viewport": {
                "bounds": {
                    "top_left": {
                        "lat": 80.45,
                        "lon": -160.22
                    },
                    "bottom_right": {
                        "lat": 40.65,
                        "lon": 42.57
                    }
                }
            }
        }
    }

Top hits Aggregation
--------------------

A ``top_hits`` metric aggregator keeps track of the most relevant
document being aggregated. This aggregator is intended to be used as a
sub aggregator, so that the top matching documents can be aggregated per
bucket.

The ``top_hits`` aggregator can effectively be used to group result sets
by certain fields via a bucket aggregator. One or more bucket
aggregators determine the properties by which a result set gets sliced
into buckets.

Options
~~~~~~~

-  ``from`` - The offset from the first result you want to fetch.

-  ``size`` - The maximum number of top matching hits to return per
   bucket. By default the top three matching hits are returned.

-  ``sort`` - How the top matching hits should be sorted. By default the
   hits are sorted by the score of the main query.

Supported per hit features
~~~~~~~~~~~~~~~~~~~~~~~~~~

The top\_hits aggregation returns regular search hits, and because of
this many per-hit features can be supported:

-  `Highlighting <#search-request-highlighting>`__

-  `Explain <#search-request-explain>`__

-  `Named filters and
   queries <#search-request-named-queries-and-filters>`__

-  `Source filtering <#search-request-source-filtering>`__

-  `Script fields <#search-request-script-fields>`__

-  `Fielddata fields <#search-request-fielddata-fields>`__

-  `Include versions <#search-request-version>`__

Example
~~~~~~~

In the following example we group the questions by tag, and per tag we
show the last active question. For each question, only the title field
is included in the source.

.. code:: js

    {
        "aggs": {
            "top-tags": {
                "terms": {
                    "field": "tags",
                    "size": 3
                },
                "aggs": {
                    "top_tag_hits": {
                        "top_hits": {
                            "sort": [
                                {
                                    "last_activity_date": {
                                        "order": "desc"
                                    }
                                }
                            ],
                            "_source": {
                                "include": [
                                    "title"
                                ]
                            },
                            "size" : 1
                        }
                    }
                }
            }
        }
    }

Possible response snippet:

.. code:: js

    "aggregations": {
      "top-tags": {
         "buckets": [
            {
               "key": "windows-7",
               "doc_count": 25365,
               "top_tags_hits": {
                  "hits": {
                     "total": 25365,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602679",
                           "_score": 1,
                           "_source": {
                              "title": "Windows port opening"
                           },
                           "sort": [
                              1370143231177
                           ]
                        }
                     ]
                  }
               }
            },
            {
               "key": "linux",
               "doc_count": 18342,
               "top_tags_hits": {
                  "hits": {
                     "total": 18342,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602672",
                           "_score": 1,
                           "_source": {
                              "title": "Ubuntu RFID Screensaver lock-unlock"
                           },
                           "sort": [
                              1370143379747
                           ]
                        }
                     ]
                  }
               }
            },
            {
               "key": "windows",
               "doc_count": 18119,
               "top_tags_hits": {
                  "hits": {
                     "total": 18119,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602678",
                           "_score": 1,
                           "_source": {
                              "title": "If I change my computers date / time, what could be affected?"
                           },
                           "sort": [
                              1370142868283
                           ]
                        }
                     ]
                  }
               }
            }
         ]
      }
    }

Field collapse example
~~~~~~~~~~~~~~~~~~~~~~

Field collapsing or result grouping is a feature that logically groups a
result set into groups and per group returns top documents. The ordering
of the groups is determined by the relevancy of the first document in a
group. In Elasticsearch this can be implemented via a bucket aggregator
that wraps a ``top_hits`` aggregator as sub-aggregator.

In the example below we search across crawled webpages. For each webpage
we store the body and the domain the webpage belongs to. By defining a
``terms`` aggregator on the ``domain`` field we group the result set of
webpages by domain. A ``top_hits`` aggregator is then defined as
sub-aggregator, so that the top matching hits are collected per bucket.

Also a ``max`` aggregator is defined which is used by the ``terms``
aggregator’s order feature to return the buckets ordered by the
relevancy of the most relevant document in each bucket.

.. code:: js

    {
      "query": {
        "match": {
          "body": "elections"
        }
      },
      "aggs": {
        "top-sites": {
          "terms": {
            "field": "domain",
            "order": {
              "top_hit": "desc"
            }
          },
          "aggs": {
            "top_tags_hits": {
              "top_hits": {}
            },
            "top_hit" : {
              "max": {
                "script": "_score"
              }
            }
          }
        }
      }
    }

At the moment the ``max`` (or ``min``) aggregator is needed to make sure
the buckets from the ``terms`` aggregator are ordered according to the
score of the most relevant webpage per domain. The ``top_hits``
aggregator isn’t a metric aggregator and therefore can’t be used in the
``order`` option of the ``terms`` aggregator.

top\_hits support in a nested or reverse\_nested aggregator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the ``top_hits`` aggregator is wrapped in a ``nested`` or
``reverse_nested`` aggregator then nested hits are returned. Nested hits
are in a sense hidden mini documents that are part of a regular
document, on which a nested field type has been configured in the
mapping. The ``top_hits`` aggregator has the ability to un-hide these
documents if it is wrapped in a ``nested`` or ``reverse_nested``
aggregator. Read more about nested in the `nested type
mapping <#mapping-nested-type>`__.

If a nested type has been configured, a single document is actually
indexed as multiple Lucene documents which share the same id. In order
to determine the identity of a nested hit, more is needed than just the
id, which is why nested hits also include their nested identity. The
nested identity is kept under the ``_nested`` field in the search hit
and includes the array field and the offset in the array field the
nested hit belongs to. The offset is zero based.

Top hits response snippet with a nested hit, which resides in the third
slot of array field ``nested_field1`` in document with id ``1``:

.. code:: js

    ...
    "hits": {
     "total": 25365,
     "max_score": 1,
     "hits": [
       {
         "_index": "a",
         "_type": "b",
         "_id": "1",
         "_score": 1,
         "_nested" : {
           "field" : "nested_field1",
           "offset" : 2
         },
         "_source": ...
       },
       ...
     ]
    }
    ...

If ``_source`` is requested then just the part of the source of the
nested object is returned, not the entire source of the document. Also
stored fields on the **nested** inner object level are accessible via
the ``top_hits`` aggregator residing in a ``nested`` or
``reverse_nested`` aggregator.

Only nested hits will have a ``_nested`` field in the hit, non nested
(regular) hits will not have a ``_nested`` field.

The information in ``_nested`` can also be used to parse the original
source somewhere else if ``_source`` isn’t enabled.

If there are multiple levels of nested object types defined in mappings
then the ``_nested`` information can also be hierarchical in order to
express the identity of nested hits that are two layers deep or more.

In the example below a nested hit resides in the first slot of the field
``nested_grand_child_field`` which then resides in the second slot of
the ``nested_child_field`` field:

.. code:: js

    ...
    "hits": {
     "total": 2565,
     "max_score": 1,
     "hits": [
       {
         "_index": "a",
         "_type": "b",
         "_id": "1",
         "_score": 1,
         "_nested" : {
           "field" : "nested_child_field",
           "offset" : 1,
           "_nested" : {
             "field" : "nested_grand_child_field",
             "offset" : 0
           }
         },
         "_source": ...
       },
       ...
     ]
    }
    ...

Scripted Metric Aggregation
---------------------------

A metric aggregation that executes using scripts to provide a metric
output.

    **Important**

    This feature is marked as experimental, and may be subject to change
    in the future. If you use this feature, please let us know your
    experience with it!

Example:

.. code:: js

    {
        "query" : {
            "match_all" : {}
        },
        "aggs": {
            "profit": {
                "scripted_metric": {
                    "init_script" : "_agg['transactions'] = []",
                    "map_script" : "if (doc['type'].value == \"sale\") { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }", 
                    "combine_script" : "profit = 0; for (t in _agg.transactions) { profit += t }; return profit",
                    "reduce_script" : "profit = 0; for (a in _aggs) { profit += a }; return profit"
                }
            }
        }
    }

``map_script`` is the only required parameter

The above aggregation demonstrates how one would use the scripted metric
aggregation to compute the total profit from sale and cost transactions.

The response for the above aggregation:

.. code:: js

    {
        ...

        "aggregations": {
            "profit": {
                "value": 170
            }
       }
    }

Scope of scripts
~~~~~~~~~~~~~~~~

The scripted metric aggregation uses scripts at 4 stages of its
execution:

init\_script
    Executed prior to any collection of documents. Allows the
    aggregation to set up any initial state.

    In the above example, the ``init_script`` creates an array
    ``transactions`` in the ``_agg`` object.

map\_script
    Executed once per document collected. This is the only required
    script. If no combine\_script is specified, the resulting state
    needs to be stored in an object named ``_agg``.

    In the above example, the ``map_script`` checks the value of the
    type field. If the value is *sale*, the value of the amount field is
    added to the transactions array. If the value of the type field is
    not *sale*, the negated value of the amount field is added to
    transactions.

combine\_script
    Executed once on each shard after document collection is complete.
    Allows the aggregation to consolidate the state returned from each
    shard. If a combine\_script is not provided the combine phase will
    return the aggregation variable.

    In the above example, the ``combine_script`` iterates through all
    the stored transactions, summing the values in the ``profit``
    variable and finally returns ``profit``.

reduce\_script
    Executed once on the coordinating node after all shards have
    returned their results. The script is provided with access to a
    variable ``_aggs`` which is an array of the result of the
    combine\_script on each shard. If a reduce\_script is not provided
    the reduce phase will return the ``_aggs`` variable.

    In the above example, the ``reduce_script`` iterates through the
    ``profit`` returned by each shard summing the values before
    returning the final combined profit which will be returned in the
    response of the aggregation.

Worked Example
~~~~~~~~~~~~~~

Imagine a situation where you index the following documents into an
index with 2 shards:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/transactions/stock/1' -d '{
        "type": "sale",
        "amount": 80
    }'

    $ curl -XPUT 'http://localhost:9200/transactions/stock/2' -d '{
        "type": "cost",
        "amount": 10
    }'

    $ curl -XPUT 'http://localhost:9200/transactions/stock/3' -d '{
        "type": "cost",
        "amount": 30
    }'

    $ curl -XPUT 'http://localhost:9200/transactions/stock/4' -d '{
        "type": "sale",
        "amount": 130
    }'

Let's say that documents 1 and 3 end up on shard A and documents 2 and 4
end up on shard B. The following is a breakdown of what the aggregation
result is at each stage of the example above.

Before init\_script
^^^^^^^^^^^^^^^^^^^

No params object was specified so the default params object is used:

.. code:: js

    "params" : {
        "_agg" : {}
    }

After init\_script
^^^^^^^^^^^^^^^^^^

This is run once on each shard before any document collection is
performed, and so we will have a copy on each shard:

Shard A
    .. code:: js

        "params" : {
            "_agg" : {
                "transactions" : []
            }
        }

Shard B
    .. code:: js

        "params" : {
            "_agg" : {
                "transactions" : []
            }
        }

After map\_script
^^^^^^^^^^^^^^^^^

Each shard collects its documents and runs the map\_script on each
document that is collected:

Shard A
    .. code:: js

        "params" : {
            "_agg" : {
                "transactions" : [ 80, -30 ]
            }
        }

Shard B
    .. code:: js

        "params" : {
            "_agg" : {
                "transactions" : [ -10, 130 ]
            }
        }

After combine\_script
^^^^^^^^^^^^^^^^^^^^^

The combine\_script is executed on each shard after document collection
is complete and reduces all the transactions down to a single profit
figure for each shard (by summing the values in the transactions array)
which is passed back to the coordinating node:

Shard A
    50

Shard B
    120

After reduce\_script
^^^^^^^^^^^^^^^^^^^^

The reduce\_script receives an ``_aggs`` array containing the result of
the combine script for each shard:

.. code:: js

    "_aggs" : [
        50,
        120
    ]

It reduces the responses for the shards down to a final overall profit
figure (by summing the values) and returns this as the result of the
aggregation to produce the response:

.. code:: js

    {
        ...

        "aggregations": {
            "profit": {
                "value": 170
            }
       }
    }

Other Parameters
~~~~~~~~~~~~~~~~

params
    Optional. An object whose contents will be passed as variables to
    the ``init_script``, ``map_script`` and ``combine_script``. This can
    be useful to allow the user to control the behavior of the
    aggregation and for storing state between the scripts. If this is
    not specified, the default is the equivalent of providing:

    .. code:: js

        "params" : {
            "_agg" : {}
        }

reduce\_params
    Optional. An object whose contents will be passed as variables to
    the ``reduce_script``. This can be useful to allow the user to
    control the behavior of the reduce phase. If this is not specified
    the variable will be undefined in the reduce\_script execution.

lang
    Optional. The script language used for the scripts. If this is not
    specified the default scripting language is used.

init\_script\_file
    Optional. Can be used in place of the ``init_script`` parameter to
    provide the script using a file.

init\_script\_id
    Optional. Can be used in place of the ``init_script`` parameter to
    provide the script using an indexed script.

map\_script\_file
    Optional. Can be used in place of the ``map_script`` parameter to
    provide the script using a file.

map\_script\_id
    Optional. Can be used in place of the ``map_script`` parameter to
    provide the script using an indexed script.

combine\_script\_file
    Optional. Can be used in place of the ``combine_script`` parameter
    to provide the script using a file.

combine\_script\_id
    Optional. Can be used in place of the ``combine_script`` parameter
    to provide the script using an indexed script.

reduce\_script\_file
    Optional. Can be used in place of the ``reduce_script`` parameter to
    provide the script using a file.

reduce\_script\_id
    Optional. Can be used in place of the ``reduce_script`` parameter to
    provide the script using an indexed script.
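
For instance, a user-defined variable could be passed to the scripts
through ``params``. The following is a sketch based on the profit
example above; the ``multiplier`` variable is made up for illustration,
and ``_agg`` is included explicitly on the assumption that providing
``params`` replaces the default shown above:

.. code:: js

    {
        "aggs": {
            "profit": {
                "scripted_metric": {
                    "params": {
                        "_agg": {},
                        "multiplier": 2
                    },
                    "init_script": "_agg['transactions'] = []",
                    "map_script": "_agg.transactions.add(doc['amount'].value * multiplier)",
                    "combine_script": "profit = 0; for (t in _agg.transactions) { profit += t }; return profit",
                    "reduce_script": "profit = 0; for (a in _aggs) { profit += a }; return profit"
                }
            }
        }
    }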

Global Aggregation
------------------

Defines a single bucket of all the documents within the search execution
context. This context is defined by the indices and the document types
you’re searching on, but is **not** influenced by the search query
itself.

    **Note**

    Global aggregators can only be placed as top level aggregators (it
    makes no sense to embed a global aggregator within another bucket
    aggregator)

Example:

.. code:: js

    {
        "query" : {
            "match" : { "title" : "shirt" }
        },
        "aggs" : {
            "all_products" : {
                "global" : {}, 
                "aggs" : { 
                    "avg_price" : { "avg" : { "field" : "price" } }
                }
            }
        }
    }

The ``global`` aggregation has an empty body

The sub-aggregations that are registered for this ``global`` aggregation

The above aggregation demonstrates how one would compute aggregations
(``avg_price`` in this example) on all the documents in the search
context, regardless of the query (in our example, it will compute the
average price over all products in our catalog, not just on the
"shirts").

The response for the above aggregation:

.. code:: js

    {
        ...

        "aggregations" : {
            "all_products" : {
                "doc_count" : 100, 
                "avg_price" : {
                    "value" : 56.3
                }
            }
        }
    }

The number of documents that were aggregated (in our case, all documents
within the search context)

Filter Aggregation
------------------

Defines a single bucket of all the documents in the current document set
context that match a specified filter. Often this will be used to narrow
down the current aggregation context to a specific set of documents.

Example:

.. code:: js

    {
        "aggs" : {
            "in_stock_products" : {
                "filter" : { "range" : { "stock" : { "gt" : 0 } } },
                "aggs" : {
                    "avg_price" : { "avg" : { "field" : "price" } }
                }
            }
        }
    }

In the above example, we calculate the average price of all the products
that are currently in-stock.

Response:

.. code:: js

    {
        ...

        "aggs" : {
            "in_stock_products" : {
                "doc_count" : 100,
                "avg_price" : { "value" : 56.3 }
            }
        }
    }

Filters Aggregation
-------------------

Defines a multi-bucket aggregation where each bucket is associated with
a filter. Each bucket will collect all documents that match its
associated filter.

Example:

.. code:: js

    {
      "aggs" : {
        "messages" : {
          "filters" : {
            "filters" : {
              "errors" :   { "term" : { "body" : "error"   }},
              "warnings" : { "term" : { "body" : "warning" }}
            }
          },
          "aggs" : {
            "monthly" : {
              "histogram" : {
                "field" : "timestamp",
                "interval" : "1M"
              }
            }
          }
        }
      }
    }

In the above example, we analyze log messages. The aggregation will
build two collections (buckets) of log messages - one for all those
containing an error, and another for all those containing a warning. For
each of these buckets it will then break the messages down by month.

Response:

.. code:: js

    ...
      "aggs" : {
        "messages" : {
          "buckets" : {
            "errors" : {
              "doc_count" : 34,
                "monthly" : {
                  "buckets : [
                    ... // the histogram monthly breakdown
                  ]
                }
              },
              "warnings" : {
                "doc_count" : 439,
                "monthly" : {
                  "buckets : [
                     ... // the histogram monthly breakdown
                  ]
                }
              }
            }
          }
        }
      }
    ...

Anonymous filters
~~~~~~~~~~~~~~~~~

The filters field can also be provided as an array of filters, as in the
following request:

.. code:: js

    {
      "aggs" : {
        "messages" : {
          "filters" : {
            "filters" : [
              { "term" : { "body" : "error"   }},
              { "term" : { "body" : "warning" }}
            ]
          },
          "aggs" : {
            "monthly" : {
              "histogram" : {
                "field" : "timestamp",
                "interval" : "1M"
              }
            }
          }
        }
      }
    }

The filtered buckets are returned in the same order as provided in the
request. The response for this example would be:

.. code:: js

    ...
      "aggs" : {
        "messages" : {
          "buckets" : [
            {
              "doc_count" : 34,
              "monthly" : {
                "buckets : [
                  ... // the histogram monthly breakdown
                ]
              }
            },
            {
              "doc_count" : 439,
              "monthly" : {
                "buckets : [
                  ... // the histogram monthly breakdown
                ]
              }
            }
          ]
        }
      }
    ...

Missing Aggregation
-------------------

A field data based single bucket aggregation that creates a bucket of
all documents in the current document set context that are missing a
field value (effectively, missing a field or having the configured NULL
value set). This aggregator will often be used in conjunction with other
field data bucket aggregators (such as ranges) to return information for
all the documents that could not be placed in any of the other buckets
due to missing field data values.

Example:

.. code:: js

    {
        "aggs" : {
            "products_without_a_price" : {
                "missing" : { "field" : "price" }
            }
        }
    }

In the above example, we get the total number of products that do not
have a price.

Response:

.. code:: js

    {
        ...

        "aggs" : {
            "products_without_a_price" : {
                "doc_count" : 10
            }
        }
    }
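
As noted above, the ``missing`` aggregation is typically placed next to
other bucketing aggregations over the same field. A sketch combining it
with a ``range`` aggregation on the ``price`` field might look like
this:

.. code:: js

    {
        "aggs" : {
            "price_ranges" : {
                "range" : {
                    "field" : "price",
                    "ranges" : [
                        { "to" : 50 },
                        { "from" : 50 }
                    ]
                }
            },
            "products_without_a_price" : {
                "missing" : { "field" : "price" }
            }
        }
    }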

Nested Aggregation
------------------

A special single bucket aggregation that enables aggregating nested
documents.

For example, let's say we have an index of products, and each product
holds the list of resellers - each having its own price for the product.
The mapping could look like:

.. code:: js

    {
        ...

        "product" : {
            "properties" : {
                "resellers" : { 
                    "type" : "nested"
                    "properties" : {
                        "name" : { "type" : "string" },
                        "price" : { "type" : "double" }
                    }
                }
            }
        }
    }

The ``resellers`` field is an array that holds nested documents under
the ``product`` object.

The following aggregation will return the minimum price at which
products can be purchased:

.. code:: js

    {
        "query" : {
            "match" : { "name" : "led tv" }
        },
        "aggs" : {
            "resellers" : {
                "nested" : {
                    "path" : "resellers"
                },
                "aggs" : {
                    "min_price" : { "min" : { "field" : "resellers.price" } }
                }
            }
        }
    }

As you can see above, the nested aggregation requires the ``path`` of
the nested documents within the top level documents. Then one can define
any type of aggregation over these nested documents.

Response:

.. code:: js

    {
        "aggregations": {
            "resellers": {
                "min_price": {
                    "value" : 350
                }
            }
        }
    }

Reverse nested Aggregation
--------------------------

A special single bucket aggregation that enables aggregating on parent
docs from nested documents. Effectively this aggregation can break out
of the nested block structure and link to other nested structures or the
root document, which allows nesting other aggregations that aren’t part
of the nested object in a nested aggregation.

The ``reverse_nested`` aggregation must be defined inside a ``nested``
aggregation.

-  ``path`` - Defines to what nested object field the aggregation should
   join back. The default is empty, which means that it joins back to
   the root / main document level. The path cannot contain a reference
   to a nested object field that falls outside the nested structure of
   the ``nested`` aggregation the ``reverse_nested`` is in.

For example, let's say we have an index for a ticket system with issues
and comments. The comments are inlined into the issue documents as
nested documents. The mapping could look like:

.. code:: js

    {
        ...

        "issue" : {
            "properties" : {
                "tags" : { "type" : "string" }
                "comments" : { 
                    "type" : "nested"
                    "properties" : {
                        "username" : { "type" : "string", "index" : "not_analyzed" },
                        "comment" : { "type" : "string" }
                    }
                }
            }
        }
    }

The ``comments`` field is an array that holds nested documents under the
``issue`` object.

The following aggregation will return the usernames of the top
commenters and, per top commenter, the top tags of the issues the user
has commented on:

.. code:: js

    {
      "query": {
        "match": {
          "name": "led tv"
        }
      },
      "aggs": {
        "comments": {
          "nested": {
            "path": "comments"
          },
          "aggs": {
            "top_usernames": {
              "terms": {
                "field": "comments.username"
              },
              "aggs": {
                "comment_to_issue": {
                  "reverse_nested": {}, 
                  "aggs": {
                    "top_tags_per_comment": {
                      "terms": {
                        "field": "tags"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }

As you can see above, the ``reverse_nested`` aggregation is put inside a
``nested`` aggregation, as this is the only place in the DSL where the
``reverse_nested`` aggregation can be used. Its sole purpose is to
join back to a parent doc higher up in the nested structure.

In the example above, the ``reverse_nested`` aggregation joins back to
the root / main document level, because no ``path`` has been defined.
Via the ``path`` option the ``reverse_nested`` aggregation can join back
to a different level, if multiple layered nested object types have been
defined in the mapping.

Possible response snippet:

.. code:: js

    {
      "aggregations": {
        "comments": {
          "top_usernames": {
            "buckets": [
              {
                "key": "username_1",
                "doc_count": 12,
                "comment_to_issue": {
                  "top_tags_per_comment": {
                    "buckets": [
                      {
                        "key": "tag1",
                        "doc_count": 9
                      },
                      ...
                    ]
                  }
                }
              },
              ...
            ]
          }
        }
      }
    }

Children Aggregation
--------------------

A special single bucket aggregation that enables aggregating from
buckets on parent document types to buckets on child documents.

This aggregation relies on the `\_parent
field <#mapping-parent-field>`__ in the mapping. This aggregation has a
single option:

-  ``type`` - The child type that the buckets in the parent space should
   be mapped to.

For example, let’s say we have an index of questions and answers. The
answer type has the following ``_parent`` field in the mapping:

.. code:: js

    {
        "answer" : {
            "_parent" : {
                "type" : "question"
            }
        }
    }

The question typed documents contain a tags field and the answer typed
documents contain an owner field. With the ``children`` aggregation the
tag buckets can be mapped to the owner buckets in a single request even
though the two fields exist in two different kinds of documents.

An example of a question typed document:

.. code:: js

    {
        "body": "<p>I have Windows 2003 server and i bought a new Windows 2008 server...",
        "title": "Whats the best way to file transfer my site from server to a newer one?",
        "tags": [
            "windows-server-2003",
            "windows-server-2008",
            "file-transfer"
        ]
    }

An example of an answer typed document:

.. code:: js

    {
        "owner": {
            "location": "Norfolk, United Kingdom",
            "display_name": "Sam",
            "id": 48
        },
        "body": "<p>Unfortunately your pretty much limited to FTP...",
        "creation_date": "2009-05-04T13:45:37.030"
    }

The following request can be built that connects the two together:

.. code:: js

    {
      "aggs": {
        "top-tags": {
          "terms": {
            "field": "tags",
            "size": 10
          },
          "aggs": {
            "to-answers": {
              "children": {
                "type" : "answer" 
              },
              "aggs": {
                "top-names": {
                  "terms": {
                    "field": "owner.display_name",
                    "size": 10
                  }
                }
              }
            }
          }
        }
      }
    }

The ``type`` option points to the type / mapping with the name ``answer``.

The above example returns the top question tags and per tag the top
answer owners.

Possible response:

.. code:: js

    {
      "aggregations": {
        "top-tags": {
          "buckets": [
            {
              "key": "windows-server-2003",
              "doc_count": 25365, 
              "to-answers": {
                "doc_count": 36004, 
                "top-names": {
                  "buckets": [
                    {
                      "key": "Sam",
                      "doc_count": 274
                    },
                    {
                      "key": "chris",
                      "doc_count": 19
                    },
                    {
                      "key": "david",
                      "doc_count": 14
                    },
                    ...
                  ]
                }
              }
            },
            {
              "key": "linux",
              "doc_count": 18342,
              "to-answers": {
                "doc_count": 6655,
                "top-names": {
                  "buckets": [
                    {
                      "key": "abrams",
                      "doc_count": 25
                    },
                    {
                      "key": "ignacio",
                      "doc_count": 25
                    },
                    {
                      "key": "vazquez",
                      "doc_count": 25
                    },
                    ...
                  ]
                }
              }
            },
            {
              "key": "windows",
              "doc_count": 18119,
              "to-answers": {
                "doc_count": 24051,
                "top-names": {
                  "buckets": [
                    {
                      "key": "molly7244",
                      "doc_count": 265
                    },
                    {
                      "key": "david",
                      "doc_count": 27
                    },
                    {
                      "key": "chris",
                      "doc_count": 26
                    },
                    ...
                  ]
                }
              }
            },
            {
              "key": "osx",
              "doc_count": 10971,
              "to-answers": {
                "doc_count": 5902,
                "top-names": {
                  "buckets": [
                    {
                      "key": "diago",
                      "doc_count": 4
                    },
                    {
                      "key": "albert",
                      "doc_count": 3
                    },
                    {
                      "key": "asmus",
                      "doc_count": 3
                    },
                    ...
                  ]
                }
              }
            },
            {
              "key": "ubuntu",
              "doc_count": 8743,
              "to-answers": {
                "doc_count": 8784,
                "top-names": {
                  "buckets": [
                    {
                      "key": "ignacio",
                      "doc_count": 9
                    },
                    {
                      "key": "abrams",
                      "doc_count": 8
                    },
                    {
                      "key": "molly7244",
                      "doc_count": 8
                    },
                    ...
                  ]
                }
              }
            },
            {
              "key": "windows-xp",
              "doc_count": 7517,
              "to-answers": {
                "doc_count": 13610,
                "top-names": {
                  "buckets": [
                    {
                      "key": "molly7244",
                      "doc_count": 232
                    },
                    {
                      "key": "chris",
                      "doc_count": 9
                    },
                    {
                      "key": "john",
                      "doc_count": 9
                    },
                    ...
                  ]
                }
              }
            },
            {
              "key": "networking",
              "doc_count": 6739,
              "to-answers": {
                "doc_count": 2076,
                "top-names": {
                  "buckets": [
                    {
                      "key": "molly7244",
                      "doc_count": 6
                    },
                    {
                      "key": "alnitak",
                      "doc_count": 5
                    },
                    {
                      "key": "chris",
                      "doc_count": 3
                    },
                    ...
                  ]
                }
              }
            },
            {
              "key": "mac",
              "doc_count": 5590,
              "to-answers": {
                "doc_count": 999,
                "top-names": {
                  "buckets": [
                    {
                      "key": "abrams",
                      "doc_count": 2
                    },
                    {
                      "key": "ignacio",
                      "doc_count": 2
                    },
                    {
                      "key": "vazquez",
                      "doc_count": 2
                    },
                    ...
                  ]
                }
              }
            },
            {
              "key": "wireless-networking",
              "doc_count": 4409,
              "to-answers": {
                "doc_count": 6497,
                "top-names": {
                  "buckets": [
                    {
                      "key": "molly7244",
                      "doc_count": 61
                    },
                    {
                      "key": "chris",
                      "doc_count": 5
                    },
                    {
                      "key": "mike",
                      "doc_count": 5
                    },
                    ...
                  ]
                }
              }
            },
            {
              "key": "windows-8",
              "doc_count": 3601,
              "to-answers": {
                "doc_count": 4263,
                "top-names": {
                  "buckets": [
                    {
                      "key": "molly7244",
                      "doc_count": 3
                    },
                    {
                      "key": "msft",
                      "doc_count": 2
                    },
                    {
                      "key": "user172132",
                      "doc_count": 2
                    },
                    ...
                  ]
                }
              }
            }
          ]
        }
      }
    }

In the response above, ``25365`` is the number of question documents
with the tag ``windows-server-2003``, and ``36004`` is the number of
answer documents that are related to question documents with that tag.

Terms Aggregation
-----------------

A multi-bucket value source based aggregation where buckets are
dynamically built - one per unique value.

Example:

.. code:: js

    {
        "aggs" : {
            "genders" : {
                "terms" : { "field" : "gender" }
            }
        }
    }

Response:

.. code:: js

    {
        ...

        "aggregations" : {
            "genders" : {
                "doc_count_error_upper_bound": 0, 
                "sum_other_doc_count": 0, 
                "buckets" : [ 
                    {
                        "key" : "male",
                        "doc_count" : 10
                    },
                    {
                        "key" : "female",
                        "doc_count" : 10
                    }
                ]
            }
        }
    }

In the response above:

-  ``doc_count_error_upper_bound`` is an upper bound of the error on the
   document counts for each term, see
   `below <#search-aggregations-bucket-terms-aggregation-approximate-counts>`__.

-  ``sum_other_doc_count`` is the sum of the document counts for all
   buckets that are not part of the response; when there are lots of
   unique terms, Elasticsearch only returns the top terms.

-  ``buckets`` is the list of the top buckets, the meaning of ``top``
   being defined by the
   `order <#search-aggregations-bucket-terms-aggregation-order>`__.

By default, the ``terms`` aggregation will return the buckets for the
top ten terms ordered by the ``doc_count``. One can change this default
behaviour by setting the ``size`` parameter.

Size
~~~~

The ``size`` parameter can be set to define how many term buckets should
be returned out of the overall terms list. By default, the node
coordinating the search process will request each shard to provide its
own top ``size`` term buckets and once all shards respond, it will
reduce the results to the final list that will then be returned to the
client. This means that if the number of unique terms is greater than
``size``, the returned list is slightly off and not accurate (it could
be that the term counts are slightly off and it could even be that a
term that should have been in the top size buckets was not returned). If
set to ``0``, the ``size`` will be set to ``Integer.MAX_VALUE``.
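
For example, the request below (the ``tags`` field is used purely for
illustration) returns the top 20 term buckets instead of the default 10:

.. code:: js

    {
        "aggs" : {
            "tags" : {
                "terms" : {
                    "field" : "tags",
                    "size" : 20
                }
            }
        }
    }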

Document counts are approximate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As described above, the document counts (and the results of any sub
aggregations) in the terms aggregation are not always accurate. This is
because each shard provides its own view of what the ordered list of
terms should be and these are combined to give a final view. Consider
the following scenario:

A request is made to obtain the top 5 terms in the field product,
ordered by descending document count from an index with 3 shards. In
this case each shard is asked to give its top 5 terms.

.. code:: js

    {
        "aggs" : {
            "products" : {
                "terms" : {
                    "field" : "product",
                    "size" : 5
                }
            }
        }
    }

The terms for each of the three shards are shown below with their
respective document counts in brackets:

+--------------------+--------------------+--------------------+--------------------+
|                    | Shard A            | Shard B            | Shard C            |
+====================+====================+====================+====================+
| 1                  | Product A (25)     | Product A (30)     | Product A (45)     |
+--------------------+--------------------+--------------------+--------------------+
| 2                  | Product B (18)     | Product B (25)     | Product C (44)     |
+--------------------+--------------------+--------------------+--------------------+
| 3                  | Product C (6)      | Product F (17)     | Product Z (36)     |
+--------------------+--------------------+--------------------+--------------------+
| 4                  | Product D (3)      | Product Z (16)     | Product G (30)     |
+--------------------+--------------------+--------------------+--------------------+
| 5                  | Product E (2)      | Product G (15)     | Product E (29)     |
+--------------------+--------------------+--------------------+--------------------+
| 6                  | Product F (2)      | Product H (14)     | Product H (28)     |
+--------------------+--------------------+--------------------+--------------------+
| 7                  | Product G (2)      | Product I (10)     | Product Q (2)      |
+--------------------+--------------------+--------------------+--------------------+
| 8                  | Product H (2)      | Product Q (6)      | Product D (1)      |
+--------------------+--------------------+--------------------+--------------------+
| 9                  | Product I (1)      | Product J (8)      |                    |
+--------------------+--------------------+--------------------+--------------------+
| 10                 | Product J (1)      | Product C (4)      |                    |
+--------------------+--------------------+--------------------+--------------------+

The shards will return their top 5 terms so the results from the shards
will be:

+--------------------+--------------------+--------------------+--------------------+
|                    | Shard A            | Shard B            | Shard C            |
+====================+====================+====================+====================+
| 1                  | Product A (25)     | Product A (30)     | Product A (45)     |
+--------------------+--------------------+--------------------+--------------------+
| 2                  | Product B (18)     | Product B (25)     | Product C (44)     |
+--------------------+--------------------+--------------------+--------------------+
| 3                  | Product C (6)      | Product F (17)     | Product Z (36)     |
+--------------------+--------------------+--------------------+--------------------+
| 4                  | Product D (3)      | Product Z (16)     | Product G (30)     |
+--------------------+--------------------+--------------------+--------------------+
| 5                  | Product E (2)      | Product G (15)     | Product E (29)     |
+--------------------+--------------------+--------------------+--------------------+

Taking the top 5 results from each of the shards (as requested) and
combining them to make a final top 5 list produces the following:

+--------------------------------------+--------------------------------------+
| 1                                    | Product A (100)                      |
+--------------------------------------+--------------------------------------+
| 2                                    | Product Z (52)                       |
+--------------------------------------+--------------------------------------+
| 3                                    | Product C (50)                       |
+--------------------------------------+--------------------------------------+
| 4                                    | Product G (45)                       |
+--------------------------------------+--------------------------------------+
| 5                                    | Product B (43)                       |
+--------------------------------------+--------------------------------------+

Because Product A was returned from all shards we know that its document
count value is accurate. Product C was only returned by shards A and C
so its document count is shown as 50 but this is not an accurate count.
Product C exists on shard B, but its count of 4 was not high enough to
put Product C into the top 5 list for that shard. Product Z was also
returned only by 2 shards but the third shard does not contain the term.
There is no way of knowing, at the point of combining the results to
produce the final list of terms, that there is an error in the document
count for Product C and not for Product Z. Product H has a document
count of 44 across all 3 shards but was not included in the final list
of terms because it did not make it into the top five terms on any of
the shards.

Shard Size
~~~~~~~~~~

The higher the requested ``size`` is, the more accurate the results will
be, but also, the more expensive it will be to compute the final results
(both due to bigger priority queues that are managed on a shard level
and due to bigger data transfers between the nodes and the client).

The ``shard_size`` parameter can be used to minimize the extra work that
comes with bigger requested ``size``. When defined, it will determine
how many terms the coordinating node will request from each shard. Once
all the shards responded, the coordinating node will then reduce them to
a final result which will be based on the ``size`` parameter - this way,
one can increase the accuracy of the returned terms and avoid the
overhead of streaming a big list of buckets back to the client. If set
to ``0``, the ``shard_size`` will be set to ``Integer.MAX_VALUE``.

    **Note**

    ``shard_size`` cannot be smaller than ``size`` (as it doesn’t make
    much sense). When it is, elasticsearch will override it and reset it
    to be equal to ``size``.

It is possible to not limit the number of terms that are returned by
setting ``size`` to ``0``. Don’t use this on high-cardinality fields as
this will kill both your CPU, since terms need to be returned sorted,
and your network.
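
A minimal sketch combining the two parameters, reusing the ``product``
field from the scenario above (the values are illustrative only): each
shard is asked for its top 30 terms, while only the final top 5 are
returned to the client:

.. code:: js

    {
        "aggs" : {
            "products" : {
                "terms" : {
                    "field" : "product",
                    "size" : 5,
                    "shard_size" : 30
                }
            }
        }
    }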

Calculating Document Count Error
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two error values which can be shown on the terms aggregation.
The first gives a value for the aggregation as a whole which represents
the maximum potential document count for a term which did not make it
into the final list of terms. This is calculated as the sum of the
document count from the last term returned from each shard. For the
example given above the value would be 46 (2 + 15 + 29). This means that
in the worst case scenario a term which was not returned could have the
4th highest document count.

.. code:: js

    {
        ...

        "aggregations" : {
            "products" : {
                "doc_count_error_upper_bound" : 46,
                "buckets" : [
                    {
                        "key" : "Product A",
                        "doc_count" : 100
                    },
                    {
                        "key" : "Product Z",
                        "doc_count" : 52
                    },
                    ...
                ]
            }
        }
    }

The second error value can be enabled by setting the
``show_term_doc_count_error`` parameter to true. This shows an error
value for each term returned by the aggregation which represents the
*worst case* error in the document count and can be useful when deciding
on a value for the ``shard_size`` parameter. This is calculated by
summing the document counts for the last term returned by all shards
which did not return the term. In the example above the error in the
document count for Product C would be 15 as Shard B was the only shard
not to return the term and the document count of the last term it did
return was 15. The actual document count of Product C was 54 so the
document count was only actually off by 4 even though the worst case was
that it would be off by 15. Product A, however, has an error of 0 for
its document count; since every shard returned it, we can be confident
that the count returned is accurate.
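
For example, a request such as the following (again reusing the
``product`` field from the scenario above) enables this per-term error
reporting:

.. code:: js

    {
        "aggs" : {
            "products" : {
                "terms" : {
                    "field" : "product",
                    "size" : 5,
                    "show_term_doc_count_error" : true
                }
            }
        }
    }

The response then carries a ``doc_count_error_upper_bound`` value for
each bucket: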

.. code:: js

    {
        ...

        "aggregations" : {
            "products" : {
                "doc_count_error_upper_bound" : 46,
                "buckets" : [
                    {
                        "key" : "Product A",
                        "doc_count" : 100,
                        "doc_count_error_upper_bound" : 0
                    },
                    {
                        "key" : "Product Z",
                        "doc_count" : 52,
                        "doc_count_error_upper_bound" : 2
                    },
                    ...
                ]
            }
        }
    }

These errors can only be calculated in this way when the terms are
ordered by descending document count. When the aggregation is ordered by
the terms values themselves (either ascending or descending) there is no
error in the document count since if a shard does not return a
particular term which appears in the results from another shard, it must
not have that term in its index. When the aggregation is either sorted
by a sub aggregation or in order of ascending document count, the error
in the document counts cannot be determined and is given a value of -1
to indicate this.

Order
~~~~~

The order of the buckets can be customized by setting the ``order``
parameter. By default, the buckets are ordered by their ``doc_count``
descending. It is also possible to change this behaviour as follows:

Ordering the buckets by their ``doc_count`` in an ascending manner:

.. code:: js

    {
        "aggs" : {
            "genders" : {
                "terms" : {
                    "field" : "gender",
                    "order" : { "_count" : "asc" }
                }
            }
        }
    }

Ordering the buckets alphabetically by their terms in an ascending
manner:

.. code:: js

    {
        "aggs" : {
            "genders" : {
                "terms" : {
                    "field" : "gender",
                    "order" : { "_term" : "asc" }
                }
            }
        }
    }

Ordering the buckets by single value metrics sub-aggregation (identified
by the aggregation name):

.. code:: js

    {
        "aggs" : {
            "genders" : {
                "terms" : {
                    "field" : "gender",
                    "order" : { "avg_height" : "desc" }
                },
                "aggs" : {
                    "avg_height" : { "avg" : { "field" : "height" } }
                }
            }
        }
    }

Ordering the buckets by multi value metrics sub-aggregation (identified
by the aggregation name):

.. code:: js

    {
        "aggs" : {
            "genders" : {
                "terms" : {
                    "field" : "gender",
                    "order" : { "height_stats.avg" : "desc" }
                },
                "aggs" : {
                    "height_stats" : { "stats" : { "field" : "height" } }
                }
            }
        }
    }

It is also possible to order the buckets based on a "deeper" aggregation
in the hierarchy. This is supported as long as the aggregations path are
of a single-bucket type, where the last aggregation in the path may
either be a single-bucket one or a metrics one. If it’s a single-bucket
type, the order will be defined by the number of docs in the bucket
(i.e. ``doc_count``), in case it’s a metrics one, the same rules as
above apply (where the path must indicate the metric name to sort by in
case of a multi-value metrics aggregation, and in case of a single-value
metrics aggregation the sort will be applied on that value).

The path must be defined in the following form:

::

    AGG_SEPARATOR       :=  '>'
    METRIC_SEPARATOR    :=  '.'
    AGG_NAME            :=  <the name of the aggregation>
    METRIC              :=  <the name of the metric (in case of multi-value metrics aggregation)>
    PATH                :=  <AGG_NAME>[<AGG_SEPARATOR><AGG_NAME>]*[<METRIC_SEPARATOR><METRIC>]

.. code:: js

    {
        "aggs" : {
            "countries" : {
                "terms" : {
                    "field" : "address.country",
                    "order" : { "females>height_stats.avg" : "desc" }
                },
                "aggs" : {
                    "females" : {
                        "filter" : { "term" : { "gender" : { "female" }}},
                        "aggs" : {
                            "height_stats" : { "stats" : { "field" : "height" }}
                        }
                    }
                }
            }
        }
    }

The above will sort the countries buckets based on the average height
among the female population.

Multiple criteria can be used to order the buckets by providing an array
of order criteria such as the following:

.. code:: js

    {
        "aggs" : {
            "countries" : {
                "terms" : {
                    "field" : "address.country",
                    "order" : [ { "females>height_stats.avg" : "desc" }, { "_count" : "desc" } ]
                },
                "aggs" : {
                    "females" : {
                        "filter" : { "term" : { "gender" : { "female" }}},
                        "aggs" : {
                            "height_stats" : { "stats" : { "field" : "height" }}
                        }
                    }
                }
            }
        }
    }

The above will sort the countries buckets based on the average height
among the female population and then by their ``doc_count`` in
descending order.

    **Note**

    In the event that two buckets share the same values for all order
    criteria the bucket’s term value is used as a tie-breaker in
    ascending alphabetical order to prevent non-deterministic ordering
    of buckets.

Minimum document count
~~~~~~~~~~~~~~~~~~~~~~

It is possible to only return terms that match more than a configured
number of hits using the ``min_doc_count`` option:

.. code:: js

    {
        "aggs" : {
            "tags" : {
                "terms" : {
                    "field" : "tag",
                    "min_doc_count": 10
                }
            }
        }
    }

The above aggregation would only return tags which have been found in 10
hits or more. Default value is ``1``.

Terms are collected and ordered on a shard level and merged with the
terms collected from other shards in a second step. However, the shard
does not have the information about the global document count available.
The decision whether a term is added to a candidate list depends only on
the order computed on the shard using local shard frequencies. The
``min_doc_count`` criterion is only applied after merging the local term
statistics of all shards. In a way the decision to add the term as a
candidate is made without being very *certain* about whether the term
will actually reach the required ``min_doc_count``. This might cause
many (globally) high-frequency terms to be missing in the final result
if low-frequency terms populated the candidate lists. To avoid this, the
``shard_size`` parameter can be increased to allow more candidate terms
on the shards. However, this increases memory consumption and network
traffic.

``shard_min_doc_count`` parameter

The parameter ``shard_min_doc_count`` regulates the *certainty* a shard
has, with respect to the ``min_doc_count``, about whether a term should
actually be added to the candidate list. Terms will only be considered
if their local shard frequency within the set is higher than the
``shard_min_doc_count``. If your dictionary contains many low-frequency
terms and you are not interested in those (for example misspellings),
then you can set the ``shard_min_doc_count`` parameter to filter out
candidate terms on a shard level that will, with reasonable certainty,
not reach the required ``min_doc_count`` even after merging the local
counts. ``shard_min_doc_count`` is set to ``0`` per default and has no
effect unless you explicitly set it.
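
A sketch combining the two thresholds on the ``tag`` field from the
earlier example (the values are illustrative only):

.. code:: js

    {
        "aggs" : {
            "tags" : {
                "terms" : {
                    "field" : "tag",
                    "min_doc_count": 10,
                    "shard_min_doc_count": 2
                }
            }
        }
    }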

    **Note**

    Setting ``min_doc_count``\ =\ ``0`` will also return buckets for
    terms that didn’t match any hit. However, some of the returned terms
    which have a document count of zero might only belong to deleted
    documents, so there is no guarantee that a ``match_all`` query would
    find a positive document count for those terms.

    **Warning**

    When NOT sorting on ``doc_count`` descending, high values of
    ``min_doc_count`` may return a number of buckets which is less than
    ``size`` because not enough data was gathered from the shards.
    Missing buckets can be brought back by increasing ``shard_size``.
    Setting ``shard_min_doc_count`` too high will cause terms to be
    filtered out on a shard level. This value should be set much lower
    than ``min_doc_count/#shards``.

Script
~~~~~~

Generating the terms using a script:

.. code:: js

    {
        "aggs" : {
            "genders" : {
                "terms" : {
                    "script" : "doc['gender'].value"
                }
            }
        }
    }

Value Script
~~~~~~~~~~~~

.. code:: js

    {
        "aggs" : {
            "genders" : {
                "terms" : {
                    "field" : "gender",
                    "script" : "'Gender: ' +_value"
                }
            }
        }
    }

Filtering Values
~~~~~~~~~~~~~~~~

It is possible to filter the values for which buckets will be created.
This can be done using the ``include`` and ``exclude`` parameters which
are based on regular expression strings or arrays of exact values.

.. code:: js

    {
        "aggs" : {
            "tags" : {
                "terms" : {
                    "field" : "tags",
                    "include" : ".*sport.*",
                    "exclude" : "water_.*"
                }
            }
        }
    }

In the above example, buckets will be created for all the tags that have
the word ``sport`` in them, except those starting with ``water_`` (so
the tag ``water_sports`` will not be aggregated). The ``include`` regular
expression will determine what values are "allowed" to be aggregated,
while the ``exclude`` determines the values that should not be
aggregated. When both are defined, the ``exclude`` has precedence,
meaning, the ``include`` is evaluated first and only then the
``exclude``.

The regular expressions are based on the Java™
`Pattern <http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html>`__,
and as such, it is also possible to pass in flags that determine how the
compiled regular expression will work:

.. code:: js

    {
        "aggs" : {
            "tags" : {
                 "terms" : {
                     "field" : "tags",
                     "include" : {
                         "pattern" : ".*sport.*",
                         "flags" : "CANON_EQ|CASE_INSENSITIVE" 
                     },
                     "exclude" : {
                         "pattern" : "water_.*",
                         "flags" : "CANON_EQ|CASE_INSENSITIVE"
                     }
                 }
             }
        }
    }

The flags are concatenated using the ``|`` character as a separator.

The possible flags that can be used are:
```CANON_EQ`` <http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#CANON_EQ>`__,
```CASE_INSENSITIVE`` <http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#CASE_INSENSITIVE>`__,
```COMMENTS`` <http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#COMMENTS>`__,
```DOTALL`` <http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#DOTALL>`__,
```LITERAL`` <http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#LITERAL>`__,
```MULTILINE`` <http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#MULTILINE>`__,
```UNICODE_CASE`` <http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CASE>`__,
```UNICODE_CHARACTER_CLASS`` <http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS>`__
and
```UNIX_LINES`` <http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNIX_LINES>`__

For matching based on exact values the ``include`` and ``exclude``
parameters can simply take an array of strings that represent the terms
as they are found in the index:

.. code:: js

    {
        "aggs" : {
            "JapaneseCars" : {
                 "terms" : {
                     "field" : "make",
                     "include" : ["mazda", "honda"]
                 }
             },
            "ActiveCarManufacturers" : {
                 "terms" : {
                     "field" : "make",
                     "exclude" : ["rover", "jensen"]
                 }
             }
        }
    }

Multi-field terms aggregation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``terms`` aggregation does not support collecting terms from
multiple fields in the same document. The reason is that the ``terms``
agg doesn’t collect the string term values themselves, but rather uses
`global
ordinals <#search-aggregations-bucket-terms-aggregation-execution-hint>`__
to produce a list of all of the unique values in the field. Using global
ordinals results in an important performance boost which would not be
possible across multiple fields.

There are two approaches that you can use to perform a ``terms`` agg
across multiple fields:

`Script <#search-aggregations-bucket-terms-aggregation-script>`__
    Use a script to retrieve terms from multiple fields. This disables
    the global ordinals optimization and will be slower than collecting
    terms from a single field, but it gives you the flexibility to
    implement this option at search time.

```copy_to`` field <#copy-to>`__
    If you know ahead of time that you want to collect the terms from
    two or more fields, then use ``copy_to`` in your mapping to create a
    new dedicated field at index time which contains the values from
    both fields. You can aggregate on this single field, which will
    benefit from the global ordinals optimization.
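
As a sketch of the second approach, the mapping below (the ``car`` type
and the ``make``, ``model`` and ``make_and_model`` field names are
hypothetical) copies two fields into one dedicated field that a single
``terms`` aggregation can then target:

.. code:: js

    {
        "mappings" : {
            "car" : {
                "properties" : {
                    "make" : {
                        "type" : "string",
                        "copy_to" : "make_and_model"
                    },
                    "model" : {
                        "type" : "string",
                        "copy_to" : "make_and_model"
                    },
                    "make_and_model" : {
                        "type" : "string"
                    }
                }
            }
        }
    }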

Collect mode
~~~~~~~~~~~~

Deferring calculation of child aggregations

For fields with many unique terms and a small number of required results
it can be more efficient to delay the calculation of child aggregations
until the top parent-level aggs have been pruned. Ordinarily, all
branches of the aggregation tree are expanded in one depth-first pass
and only then any pruning occurs. In some rare scenarios this can be
very wasteful and can hit memory constraints. An example problem
scenario is querying a movie database for the 10 most popular actors and
their 5 most common co-stars:

.. code:: js

    {
        "aggs" : {
            "actors" : {
                 "terms" : {
                     "field" : "actors",
                     "size" : 10
                 },
                "aggs" : {
                    "costars" : {
                         "terms" : {
                             "field" : "actors",
                             "size" : 5
                         }
                     }
                }
             }
        }
    }

Even though the number of movies may be comparatively small and we want
only 50 result buckets there is a combinatorial explosion of buckets
during calculation - a single movie will produce n² buckets where n is
the number of actors. The sane option would be to first determine the 10
most popular actors and only then examine the top co-stars for these 10
actors. This alternative strategy is what we call the ``breadth_first``
collection mode as opposed to the default ``depth_first`` mode:

.. code:: js

    {
        "aggs" : {
            "actors" : {
                 "terms" : {
                     "field" : "actors",
                     "size" : 10,
                     "collect_mode" : "breadth_first"
                 },
                "aggs" : {
                    "costars" : {
                         "terms" : {
                             "field" : "actors",
                             "size" : 5
                         }
                     }
                }
             }
        }
    }

When using ``breadth_first`` mode the set of documents that fall into
the uppermost buckets are cached for subsequent replay so there is a
memory overhead in doing this which is linear with the number of
matching documents. In most requests the volume of buckets generated is
smaller than the number of documents that fall into them so the default
``depth_first`` collection mode is normally the best bet but
occasionally the ``breadth_first`` strategy can be significantly more
efficient. Currently elasticsearch will always use the ``depth_first``
collect\_mode unless explicitly instructed to use ``breadth_first`` as
in the above example. Note that the ``order`` parameter can still be
used to refer to data from a child aggregation when using the
``breadth_first`` setting - the parent aggregation understands that this
child aggregation will need to be called first before any of the other
child aggregations.

    **Warning**

    It is not possible to nest aggregations such as ``top_hits`` which
    require access to match score information under an aggregation that
    uses the ``breadth_first`` collection mode. This is because this
    would require a RAM buffer to hold the float score value for every
    document and this would typically be too costly in terms of RAM.

Execution hint
~~~~~~~~~~~~~~

There are different mechanisms by which terms aggregations can be
executed:

-  by using field values directly in order to aggregate data per-bucket
   (``map``)

-  by using ordinals of the field and preemptively allocating one bucket
   per ordinal value (``global_ordinals``)

-  by using ordinals of the field and dynamically allocating one bucket
   per ordinal value (``global_ordinals_hash``)

-  by using per-segment ordinals to compute counts and remap these
   counts to global counts using global ordinals
   (``global_ordinals_low_cardinality``)

Elasticsearch tries to have sensible defaults so this is something that
generally doesn’t need to be configured.

``map`` should only be considered when very few documents match a query.
Otherwise the ordinals-based execution modes are significantly faster.
By default, ``map`` is only used when running an aggregation on scripts,
since they don’t have ordinals.

``global_ordinals_low_cardinality`` only works for leaf terms
aggregations but is usually the fastest execution mode. Memory usage is
linear with the number of unique values in the field, so it is only
enabled by default on low-cardinality fields.

``global_ordinals`` is the second fastest option, but the fact that it
preemptively allocates buckets can be memory-intensive, especially if
you have one or more sub aggregations. It is used by default on
top-level terms aggregations.

``global_ordinals_hash``, in contrast to ``global_ordinals`` and
``global_ordinals_low_cardinality``, allocates buckets dynamically, so
memory usage is linear in the number of values of the documents that are
part of the aggregation scope. It is used by default in inner
aggregations.

.. code:: js

    {
        "aggs" : {
            "tags" : {
                 "terms" : {
                     "field" : "tags",
                     "execution_hint": "map" 
                 }
             }
        }
    }

The possible values are ``map``, ``global_ordinals``,
``global_ordinals_hash`` and ``global_ordinals_low_cardinality``.

Please note that Elasticsearch will ignore this execution hint if it is
not applicable and that there is no backward compatibility guarantee on
these hints.

Significant Terms Aggregation
-----------------------------

An aggregation that returns interesting or unusual occurrences of terms
in a set.

    **Important**

    This feature is marked as experimental, and may be subject to change
    in the future. If you use this feature, please let us know your
    experience with it!

-  Suggesting "H5N1" when users search for "bird flu" in text

-  Identifying the merchant that is the "common point of compromise"
   from the transaction history of credit card owners reporting loss

-  Suggesting keywords relating to stock symbol $ATI for an automated
   news classifier

-  Spotting the fraudulent doctor who is diagnosing more than his fair
   share of whiplash injuries

-  Spotting the tire manufacturer who has a disproportionate number of
   blow-outs

In all these cases the terms being selected are not simply the most
popular terms in a set. They are the terms that have undergone a
significant change in popularity measured between a *foreground* and
*background* set. If the term "H5N1" only exists in 5 documents in a 10
million document index and yet is found in 4 of the 100 documents that
make up a user’s search results that is significant and probably very
relevant to their search. 5/10,000,000 vs 4/100 is a big swing in
frequency.

Single-set analysis
~~~~~~~~~~~~~~~~~~~

In the simplest case, the *foreground* set of interest is the search
results matched by a query and the *background* set used for statistical
comparisons is the index or indices from which the results were
gathered.

Example:

.. code:: js

    {
        "query" : {
            "terms" : {"force" : [ "British Transport Police" ]}
        },
        "aggregations" : {
            "significantCrimeTypes" : {
                "significant_terms" : { "field" : "crime_type" }
            }
        }
    }

Response:

.. code:: js

    {
        ...

        "aggregations" : {
            "significantCrimeTypes" : {
                "doc_count": 47347,
                "buckets" : [
                    {
                        "key": "Bicycle theft",
                        "doc_count": 3640,
                        "score": 0.371235374214817,
                        "bg_count": 66799
                    }
                    ...
                ]
            }
        }
    }

When querying an index of all crimes from all police forces, what these
results show is that the British Transport Police force stand out as a
force dealing with a disproportionately large number of bicycle thefts.
Ordinarily, bicycle thefts represent only 1% of crimes (66799/5064554)
but for the British Transport Police, who handle crime on railways and
stations, 7% of crimes (3640/47347) are bicycle thefts. This is a
significant seven-fold increase in frequency and so this anomaly was
highlighted as the top crime type.

The problem with using a query to spot anomalies is it only gives us one
subset to use for comparisons. To discover all the other police forces'
anomalies we would have to repeat the query for each of the different
forces.

This can be a tedious way to look for unusual patterns in an index.

Multi-set analysis
~~~~~~~~~~~~~~~~~~

A simpler way to perform analysis across multiple categories is to use a
parent-level aggregation to segment the data ready for analysis.

Example using a parent aggregation for segmentation:

.. code:: js

    {
        "aggregations": {
            "forces": {
                "terms": {"field": "force"},
                "aggregations": {
                    "significantCrimeTypes": {
                        "significant_terms": {"field": "crime_type"}
                    }
                }
            }
        }
    }

Response:

.. code:: js

    {
     ...

     "aggregations": {
        "forces": {
            "buckets": [
                {
                    "key": "Metropolitan Police Service",
                    "doc_count": 894038,
                    "significantCrimeTypes": {
                        "doc_count": 894038,
                        "buckets": [
                            {
                                "key": "Robbery",
                                "doc_count": 27617,
                                "score": 0.0599,
                                "bg_count": 53182
                            },
                            ...
                        ]
                    }
                },
                {
                    "key": "British Transport Police",
                    "doc_count": 47347,
                    "significantCrimeTypes": {
                        "doc_count": 47347,
                        "buckets": [
                            {
                                "key": "Bicycle theft",
                                "doc_count": 3640,
                                "score": 0.371,
                                "bg_count": 66799
                            },
                            ...
                        ]
                    }
                }
            ]
        }
     }
    }

Now we have anomaly detection for each of the police forces using a
single request.

We can use other forms of top-level aggregations to segment our data,
for example segmenting by geographic area to identify unusual hot-spots
of a particular crime type:

.. code:: js

    {
        "aggs": {
            "hotspots": {
                "geohash_grid" : {
                    "field":"location",
                    "precision":5,
                },
                "aggs": {
                    "significantCrimeTypes": {
                        "significant_terms": {"field": "crime_type"}
                    }
                }
            }
        }
    }

This example uses the ``geohash_grid`` aggregation to create result
buckets that represent geographic areas, and inside each bucket we can
identify anomalous levels of a crime type in these tightly-focused areas
e.g.

-  Airports exhibit unusual numbers of weapon confiscations

-  Universities show uplifts of bicycle thefts

At a higher geohash\_grid zoom-level with larger coverage areas we would
start to see where an entire police-force may be tackling an unusual
volume of a particular crime type.

Obviously a time-based top-level segmentation would help identify
current trends for each point in time where a simple ``terms``
aggregation would typically show the very popular "constants" that
persist across all time slots.
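
A minimal sketch of such a time-based segmentation, assuming a
hypothetical ``date`` field on the crime documents and monthly buckets,
could use a ``date_histogram`` as the parent aggregation:

.. code:: js

    {
        "aggs": {
            "crimes_over_time": {
                "date_histogram": {
                    "field": "date",
                    "interval": "month"
                },
                "aggs": {
                    "significantCrimeTypes": {
                        "significant_terms": {"field": "crime_type"}
                    }
                }
            }
        }
    }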

The numbers returned for scores are primarily intended for ranking
different suggestions sensibly rather than something easily understood
by end users. The scores are derived from the doc frequencies in
*foreground* and *background* sets. In brief, a term is considered
significant if there is a noticeable difference in the frequency in
which a term appears in the subset and in the background. The way the
terms are ranked can be configured, see "Parameters" section.

Use on free-text fields
~~~~~~~~~~~~~~~~~~~~~~~

The significant\_terms aggregation can be used effectively on tokenized
free-text fields to suggest:

-  keywords for refining end-user searches

-  keywords for use in percolator queries

    **Warning**

    Picking a free-text field as the subject of a significant terms
    analysis can be expensive! It will attempt to load every unique word
    into RAM. It is recommended to only use this on smaller indices.

You can spot mis-categorized content by first searching a structured
field e.g. ``category:adultMovie`` and use significant\_terms on the
free-text "movie\_description" field. Take the suggested words (I’ll
leave them to your imagination) and then search for all movies NOT
marked as category:adultMovie but containing these keywords. You now
have a ranked list of badly-categorized movies that you should
reclassify or at least remove from the "familyFriendly" category.

The significance score from each term can also provide a useful
``boost`` setting to sort matches. Using the ``minimum_should_match``
setting of the ``terms`` query with the keywords will help control the
balance of precision/recall in the result set, i.e. a high setting would
have a small number of relevant results packed full of keywords and a
setting of "1" would produce a more exhaustive results set with all
documents containing *any* keyword.
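
A hedged sketch of such a query, with placeholder keywords standing in
for the suggested significant terms:

.. code:: js

    {
        "query" : {
            "terms" : {
                "movie_description" : [ "keyword_1", "keyword_2", "keyword_3", "keyword_4" ],
                "minimum_should_match" : 2
            }
        }
    }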

    **Tip**

    | *Show significant\_terms in context*.
    | Free-text significant\_terms are much more easily understood when
    viewed in context. Take the results of ``significant_terms``
    suggestions from a free-text field and use them in a ``terms`` query
    on the same field with a ``highlight`` clause to present users with
    example snippets of documents. When the terms are presented
    unstemmed, highlighted, with the right case, in the right order and
    with some context, their significance/meaning is more readily
    apparent.

Custom background sets
~~~~~~~~~~~~~~~~~~~~~~

Ordinarily, the foreground set of documents is "diffed" against a
background set of all the documents in your index. However, sometimes it
may prove useful to use a narrower background set as the basis for
comparisons. For example, a query on documents relating to "Madrid" in
an index with content from all over the world might reveal that
"Spanish" was a significant term. This may be true but if you want some
more focused terms you could use a ``background_filter`` on the term
*spain* to establish a narrower set of documents as context. With this
as a background "Spanish" would now be seen as commonplace and therefore
not as significant as words like "capital" that relate more strongly
with Madrid. Note that using a background filter will slow things down -
each term’s background frequency must now be derived on-the-fly from
filtering posting lists rather than reading the index’s pre-computed
count for a term.

Limitations
~~~~~~~~~~~

Significant terms must be indexed values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Unlike the terms aggregation it is currently not possible to use
script-generated terms for counting purposes. Because of the way the
significant\_terms aggregation must consider both *foreground* and
*background* frequencies it would be prohibitively expensive to use a
script on the entire index to obtain background frequencies for
comparisons. Also DocValues are not supported as sources of term data
for similar reasons.

No analysis of floating point fields
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Floating point fields are currently not supported as the subject of
significant\_terms analysis. While integer or long fields can be used to
represent concepts like bank account numbers or category numbers which
can be interesting to track, floating point fields are usually used to
represent quantities of something. As such, individual floating point
terms are not useful for this form of frequency analysis.

Use as a parent aggregation
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If there is the equivalent of a ``match_all`` query or no query criteria
providing a subset of the index, the significant\_terms aggregation
should not be used as the top-most aggregation - in this scenario the
*foreground* set is exactly the same as the *background* set and so
there is no difference in document frequencies to observe and from which
to make sensible suggestions.

Another consideration is that the significant\_terms aggregation
produces many candidate results at shard level that are only later
pruned on the reducing node once all statistics from all shards are
merged. As a result, it can be inefficient and costly in terms of RAM to
embed large child aggregations under a significant\_terms aggregation
that later discards many candidate terms. It is advisable in these cases
to perform two searches - the first to provide a rationalized list of
significant\_terms and then add this shortlist of terms to a second
query to go back and fetch the required child aggregations.

Approximate counts
^^^^^^^^^^^^^^^^^^

The counts of how many documents contain a term provided in results are
based on summing the samples returned from each shard and as such may
be:

-  low if certain shards did not provide figures for a given term in
   their top sample

-  high when considering the background frequency as it may count
   occurrences found in deleted documents

Like most design decisions, this is the basis of a trade-off in which we
have chosen to provide fast performance at the cost of some (typically
small) inaccuracies. However, the ``size`` and ``shard_size`` settings
covered in the next section provide tools to help control the accuracy
levels.

Parameters
~~~~~~~~~~

JLH score
^^^^^^^^^

The scores are derived from the doc frequencies in *foreground* and
*background* sets. The *absolute* change in popularity
(foregroundPercent - backgroundPercent) would favor common terms whereas
the *relative* change in popularity (foregroundPercent/
backgroundPercent) would favor rare terms. Rare vs common is essentially
a precision vs recall balance and so the absolute and relative changes
are multiplied to provide a sweet spot between precision and recall.
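
A rough worked example using the "H5N1" numbers quoted earlier, treating
the percentages as plain fractions (the exact scaling used internally by
Elasticsearch may differ):

::

    foregroundPercent = 4 / 100          = 0.04
    backgroundPercent = 5 / 10,000,000   = 0.0000005

    absolute change   = 0.04 - 0.0000005 ≈ 0.04
    relative change   = 0.04 / 0.0000005 = 80,000

    score = absolute change * relative change ≈ 3,200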

mutual information
^^^^^^^^^^^^^^^^^^

Mutual information as described in "Information Retrieval", Manning et
al., Chapter 13.5.1 can be used as significance score by adding the
parameter

.. code:: js

         "mutual_information": {
              "include_negatives": true
         }

Mutual information does not differentiate between terms that are
descriptive for the subset or for documents outside the subset. The
significant terms therefore can contain terms that appear more or less
frequently in the subset than outside the subset. To filter out the terms
that appear less often in the subset than in documents outside the
subset, ``include_negatives`` can be set to ``false``.

Per default, the assumption is that the documents in the bucket are also
contained in the background. If instead you defined a custom background
filter that represents a different set of documents that you want to
compare to, set

.. code:: js

    "background_is_superset": false

Chi square
^^^^^^^^^^

Chi square as described in "Information Retrieval", Manning et al.,
Chapter 13.5.2 can be used as significance score by adding the parameter

.. code:: js

         "chi_square": {
         }

Chi square behaves like mutual information and can be configured with
the same parameters ``include_negatives`` and
``background_is_superset``.

google normalized distance
^^^^^^^^^^^^^^^^^^^^^^^^^^

Google normalized distance as described in "The Google Similarity
Distance", Cilibrasi and Vitanyi, 2007
(http://arxiv.org/pdf/cs/0412098v3.pdf) can be used as significance
score by adding the parameter

.. code:: js

         "gnd": {
         }

``gnd`` also accepts the ``background_is_superset`` parameter.

Which one is best?
^^^^^^^^^^^^^^^^^^

Roughly, ``mutual_information`` prefers high-frequency terms even if
they also occur frequently in the background. For example, in an
analysis of natural language text this might lead to selection of stop
words. ``mutual_information`` is unlikely to select very rare terms like
misspellings. ``gnd`` prefers terms with a high co-occurrence and avoids
selection of stopwords. It might be better suited for synonym detection.
However, ``gnd`` has a tendency to select very rare terms that are, for
example, a result of misspelling. ``chi_square`` and ``jlh`` are
somewhat in-between.

It is hard to say which one of the different heuristics will be the best
choice as it depends on what the significant terms are used for (see for
example `Yang and Pedersen, "A Comparative Study on Feature Selection in
Text Categorization",
1997 <http://courses.ischool.berkeley.edu/i256/f06/papers/yang97comparative.pdf>`__
for a study on using significant terms for feature selection for text
classification).

Size & Shard Size
^^^^^^^^^^^^^^^^^

The ``size`` parameter can be set to define how many term buckets should
be returned out of the overall terms list. By default, the node
coordinating the search process will request each shard to provide its
own top term buckets and once all shards respond, it will reduce the
results to the final list that will then be returned to the client. If
the number of unique terms is greater than ``size``, the returned list
can be slightly off and not accurate (it could be that the term counts
are slightly off and it could even be that a term that should have been
in the top size buckets was not returned).

If set to ``0``, the ``size`` will be set to ``Integer.MAX_VALUE``.

To ensure better accuracy a multiple of the final ``size`` is used as
the number of terms to request from each shard using a heuristic based
on the number of shards. To take manual control of this setting the
``shard_size`` parameter can be used to control the volumes of candidate
terms produced by each shard.

Low-frequency terms can turn out to be the most interesting ones once
all results are combined so the significant\_terms aggregation can
produce higher-quality results when the ``shard_size`` parameter is set
to values significantly higher than the ``size`` setting. This ensures
that a bigger volume of promising candidate terms are given a
consolidated review by the reducing node before the final selection.
Obviously large candidate term lists will cause extra network traffic
and RAM usage, so this is a quality/cost trade-off that needs to be
balanced. If ``shard_size`` is set to -1 (the default) then
``shard_size`` will be automatically estimated based on the number of
shards and the ``size`` parameter.

If set to ``0``, the ``shard_size`` will be set to
``Integer.MAX_VALUE``.
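
A sketch reusing the crime data from above, asking for only the top 3
significant crime types while letting each shard propose up to 50
candidate terms (the values are illustrative only):

.. code:: js

    {
        "query" : {
            "terms" : {"force" : [ "British Transport Police" ]}
        },
        "aggregations" : {
            "significantCrimeTypes" : {
                "significant_terms" : {
                    "field" : "crime_type",
                    "size" : 3,
                    "shard_size" : 50
                }
            }
        }
    }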

    **Note**

    ``shard_size`` cannot be smaller than ``size`` (as it doesn’t make
    much sense). When it is, elasticsearch will override it and reset it
    to be equal to ``size``.

Minimum document count
^^^^^^^^^^^^^^^^^^^^^^

It is possible to only return terms that match more than a configured
number of hits using the ``min_doc_count`` option:

.. code:: js

    {
        "aggs" : {
            "tags" : {
                "significant_terms" : {
                    "field" : "tag",
                    "min_doc_count": 10
                }
            }
        }
    }

The above aggregation would only return tags which have been found in 10
hits or more. Default value is ``3``.

Terms that score highly will be collected on a shard level and merged
with the terms collected from other shards in a second step. However,
the shard does not have the information about the global term
frequencies available. The decision whether a term is added to a
candidate list depends only on the score computed on the shard using
local shard frequencies, not the global frequencies of the word. The
``min_doc_count`` criterion is only applied after merging the local term
statistics of all shards. In a way the decision to add the term as a
candidate is made without being very *certain* whether the term will
actually reach the required ``min_doc_count``. This might cause many
(globally) high-frequency terms to be missing in the final result if
low-frequency but high-scoring terms populated the candidate lists. To
avoid this, the ``shard_size`` parameter can be increased to allow more
candidate terms on the shards. However, this increases memory
consumption and network traffic.

``shard_min_doc_count`` parameter

The parameter ``shard_min_doc_count`` regulates how *certain* a shard
needs to be that a term will actually reach the required
``min_doc_count`` before adding it to the candidate list. Terms will
only be considered if their local shard frequency within the set is
higher than ``shard_min_doc_count``. If your dictionary contains many
low-frequency words that you are not interested in (for example
misspellings), then you can set the ``shard_min_doc_count`` parameter to
filter out, on a shard level, candidate terms that with reasonable
certainty will not reach the required ``min_doc_count`` even after
merging the local frequencies. ``shard_min_doc_count`` defaults to ``1``
and has no effect unless you explicitly set it.
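
As a sketch, assuming an index with roughly five shards and the same ``tag`` field, the two thresholds could be combined like this (per the warning below, the shard-level threshold is kept well below ``min_doc_count`` divided by the number of shards):

.. code:: js

    {
        "aggs" : {
            "tags" : {
                "significant_terms" : {
                    "field" : "tag",
                    "min_doc_count" : 10,
                    "shard_min_doc_count" : 2
                }
            }
        }
    }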

    **Warning**

    Setting ``min_doc_count`` to ``1`` is generally not advised as it
    tends to return terms that are typos or other bizarre curiosities.
    Finding more than one instance of a term helps reinforce that, while
    still rare, the term was not the result of a one-off accident. The
    default value of 3 is used to provide a minimum weight-of-evidence.
    Setting ``shard_min_doc_count`` too high will cause significant
    candidate terms to be filtered out on a shard level. This value
    should be set much lower than ``min_doc_count/#shards``.

Custom background context
^^^^^^^^^^^^^^^^^^^^^^^^^

The default source of statistical information for background term
frequencies is the entire index and this scope can be narrowed through
the use of a ``background_filter`` to focus in on significant terms
within a narrower context:

.. code:: js

    {
        "query" : {
            "match" : "madrid"
        },
        "aggs" : {
            "tags" : {
                "significant_terms" : {
                    "field" : "tag",
                    "background_filter": {
                        "term" : { "text" : "spain"}
                    }
                }
            }
        }
    }

The above filter would help focus in on terms that were peculiar to the
city of Madrid rather than revealing terms like "Spanish" that are
unusual in the full index’s worldwide context but commonplace in the
subset of documents containing the word "Spain".

    **Warning**

    Use of background filters will slow the query as each term’s
    postings must be filtered to determine a frequency.

Filtering Values
^^^^^^^^^^^^^^^^

It is possible (although rarely required) to filter the values for which
buckets will be created. This can be done using the ``include`` and
``exclude`` parameters which are based on a regular expression string or
arrays of exact terms. This functionality mirrors the features described
in the `terms
aggregation <#search-aggregations-bucket-terms-aggregation>`__
documentation.
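
As a sketch (assuming the same ``tag`` field), ``include`` and ``exclude`` could be used with regular expression strings to only build buckets for tags containing "sport" while dropping those starting with "water\_":

.. code:: js

    {
        "aggs" : {
            "tags" : {
                "significant_terms" : {
                    "field" : "tag",
                    "include" : ".*sport.*",
                    "exclude" : "water_.*"
                }
            }
        }
    }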

Execution hint
^^^^^^^^^^^^^^

There are two mechanisms by which terms aggregations can be executed:
either by using field values directly in order to aggregate data
per-bucket (``map``), or by using ordinals of the field values instead
of the values themselves (``ordinals``). Although the latter execution
mode can be expected to be slightly faster, it is only available for use
when the underlying data source exposes those terms ordinals. Moreover,
it may actually be slower if most field values are unique. Elasticsearch
tries to have sensible defaults when it comes to the execution mode that
should be used, but in case you know that an execution mode may perform
better than the other one, you have the ability to provide Elasticsearch
with a hint:

.. code:: js

    {
        "aggs" : {
            "tags" : {
                 "significant_terms" : {
                     "field" : "tags",
                     "execution_hint": "map" 
                 }
             }
        }
    }

The possible values are ``map`` and ``ordinals``.

Please note that Elasticsearch will ignore this execution hint if it is
not applicable.

Range Aggregation
-----------------

A multi-bucket value source based aggregation that enables the user to
define a set of ranges - each representing a bucket. During the
aggregation process, the values extracted from each document will be
checked against each bucket range and the relevant/matching documents
will be "bucketed". Note that this aggregation includes the ``from``
value and excludes the ``to`` value for each range.

Example:

.. code:: js

    {
        "aggs" : {
            "price_ranges" : {
                "range" : {
                    "field" : "price",
                    "ranges" : [
                        { "to" : 50 },
                        { "from" : 50, "to" : 100 },
                        { "from" : 100 }
                    ]
                }
            }
        }
    }

Response:

.. code:: js

    {
        ...

        "aggregations": {
            "price_ranges" : {
                "buckets": [
                    {
                        "to": 50,
                        "doc_count": 2
                    },
                    {
                        "from": 50,
                        "to": 100,
                        "doc_count": 4
                    },
                    {
                        "from": 100,
                        "doc_count": 4
                    }
                ]
            }
        }
    }

Keyed Response
~~~~~~~~~~~~~~

Setting the ``keyed`` flag to ``true`` will associate a unique string
key with each bucket and return the ranges as a hash rather than an
array:

.. code:: js

    {
        "aggs" : {
            "price_ranges" : {
                "range" : {
                    "field" : "price",
                    "keyed" : true,
                    "ranges" : [
                        { "to" : 50 },
                        { "from" : 50, "to" : 100 },
                        { "from" : 100 }
                    ]
                }
            }
        }
    }

Response:

.. code:: js

    {
        ...

        "aggregations": {
            "price_ranges" : {
                "buckets": {
                    "*-50.0": {
                        "to": 50,
                        "doc_count": 2
                    },
                    "50.0-100.0": {
                        "from": 50,
                        "to": 100,
                        "doc_count": 4
                    },
                    "100.0-*": {
                        "from": 100,
                        "doc_count": 4
                    }
                }
            }
        }
    }

It is also possible to customize the key for each range:

.. code:: js

    {
        "aggs" : {
            "price_ranges" : {
                "range" : {
                    "field" : "price",
                    "keyed" : true,
                    "ranges" : [
                        { "key" : "cheap", "to" : 50 },
                        { "key" : "average", "from" : 50, "to" : 100 },
                        { "key" : "expensive", "from" : 100 }
                    ]
                }
            }
        }
    }

Script
~~~~~~

.. code:: js

    {
        "aggs" : {
            "price_ranges" : {
                "range" : {
                    "script" : "doc['price'].value",
                    "ranges" : [
                        { "to" : 50 },
                        { "from" : 50, "to" : 100 },
                        { "from" : 100 }
                    ]
                }
            }
        }
    }

Value Script
~~~~~~~~~~~~

Let's say the product prices are in USD but we would like to get the
price ranges in EUR. We can use a value script to convert the prices
prior to the aggregation (assuming a conversion rate of 0.8):

.. code:: js

    {
        "aggs" : {
            "price_ranges" : {
                "range" : {
                    "field" : "price",
                    "script" : "_value * conversion_rate",
                    "params" : {
                        "conversion_rate" : 0.8
                    },
                    "ranges" : [
                        { "to" : 35 },
                        { "from" : 35, "to" : 70 },
                        { "from" : 70 }
                    ]
                }
            }
        }
    }

Sub Aggregations
~~~~~~~~~~~~~~~~

The following example not only "buckets" the documents into the
different price ranges but also computes statistics over the prices in
each range:

.. code:: js

    {
        "aggs" : {
            "price_ranges" : {
                "range" : {
                    "field" : "price",
                    "ranges" : [
                        { "to" : 50 },
                        { "from" : 50, "to" : 100 },
                        { "from" : 100 }
                    ]
                },
                "aggs" : {
                    "price_stats" : {
                        "stats" : { "field" : "price" }
                    }
                }
            }
        }
    }

Response:

.. code:: js

    {
        "aggregations": {
            "price_ranges" : {
                "buckets": [
                    {
                        "to": 50,
                        "doc_count": 2,
                        "price_stats": {
                            "count": 2,
                            "min": 20,
                            "max": 47,
                            "avg": 33.5,
                            "sum": 67
                        }
                    },
                    {
                        "from": 50,
                        "to": 100,
                        "doc_count": 4,
                        "price_stats": {
                            "count": 4,
                            "min": 60,
                            "max": 98,
                            "avg": 82.5,
                            "sum": 330
                        }
                    },
                    {
                        "from": 100,
                        "doc_count": 4,
                        "price_stats": {
                            "count": 4,
                            "min": 134,
                            "max": 367,
                            "avg": 216,
                            "sum": 864
                        }
                    }
                ]
            }
        }
    }

If a sub aggregation is also based on the same value source as the range
aggregation (like the ``stats`` aggregation in the example above) it is
possible to leave out the value source definition for it. The following
will return the same response as above:

.. code:: js

    {
        "aggs" : {
            "price_ranges" : {
                "range" : {
                    "field" : "price",
                    "ranges" : [
                        { "to" : 50 },
                        { "from" : 50, "to" : 100 },
                        { "from" : 100 }
                    ]
                },
                "aggs" : {
                    "price_stats" : {
                        "stats" : {} 
                    }
                }
            }
        }
    }

We don’t need to specify the ``price`` field as it is "inherited" by
default from the parent ``range`` aggregation.

Date Range Aggregation
----------------------

A range aggregation that is dedicated to date values. The main
difference between this aggregation and the normal
`range <#search-aggregations-bucket-range-aggregation>`__ aggregation is
that the ``from`` and ``to`` values can be expressed in Date Math
expressions, and it is also possible to specify a date format by which
the ``from`` and ``to`` response fields will be returned. Note that this
aggregation includes the ``from`` value and excludes the ``to`` value
for each range.

Example:

.. code:: js

    {
        "aggs": {
            "range": {
                "date_range": {
                    "field": "date",
                    "format": "MM-yyy",
                    "ranges": [
                        { "to": "now-10M/M" },
                        { "from": "now-10M/M" }
                    ]
                }
            }
        }
    }

In the example above, we created two range buckets: the first will
"bucket" all documents dated prior to 10 months ago and the second will
"bucket" all documents dated since 10 months ago.

Response:

.. code:: js

    {
        ...

        "aggregations": {
            "range": {
                "buckets": [
                    {
                        "to": 1.3437792E+12,
                        "to_as_string": "08-2012",
                        "doc_count": 7
                    },
                    {
                        "from": 1.3437792E+12,
                        "from_as_string": "08-2012",
                        "doc_count": 2
                    }
                ]
            }
        }
    }

Date Format/Pattern
~~~~~~~~~~~~~~~~~~~

    **Note**

    this information was copied from
    `JodaDate <http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html>`__

All ASCII letters are reserved as format pattern letters, which are
defined as follows:

+--------------------+--------------------+--------------------+--------------------+
| Symbol             | Meaning            | Presentation       | Examples           |
+====================+====================+====================+====================+
| G                  | era                | text               | AD                 |
+--------------------+--------------------+--------------------+--------------------+
| C                  | century of era     | number             | 20                 |
|                    | (>=0)              |                    |                    |
+--------------------+--------------------+--------------------+--------------------+
| Y                  | year of era (>=0)  | year               | 1996               |
+--------------------+--------------------+--------------------+--------------------+
| x                  | weekyear           | year               | 1996               |
+--------------------+--------------------+--------------------+--------------------+
| w                  | week of weekyear   | number             | 27                 |
+--------------------+--------------------+--------------------+--------------------+
| e                  | day of week        | number             | 2                  |
+--------------------+--------------------+--------------------+--------------------+
| E                  | day of week        | text               | Tuesday; Tue       |
+--------------------+--------------------+--------------------+--------------------+
| y                  | year               | year               | 1996               |
+--------------------+--------------------+--------------------+--------------------+
| D                  | day of year        | number             | 189                |
+--------------------+--------------------+--------------------+--------------------+
| M                  | month of year      | month              | July; Jul; 07      |
+--------------------+--------------------+--------------------+--------------------+
| d                  | day of month       | number             | 10                 |
+--------------------+--------------------+--------------------+--------------------+
| a                  | halfday of day     | text               | PM                 |
+--------------------+--------------------+--------------------+--------------------+
| K                  | hour of halfday    | number             | 0                  |
|                    | (0~11)             |                    |                    |
+--------------------+--------------------+--------------------+--------------------+
| h                  | clockhour of       | number             | 12                 |
|                    | halfday (1~12)     |                    |                    |
+--------------------+--------------------+--------------------+--------------------+
| H                  | hour of day (0~23) | number             | 0                  |
+--------------------+--------------------+--------------------+--------------------+
| k                  | clockhour of day   | number             | 24                 |
|                    | (1~24)             |                    |                    |
+--------------------+--------------------+--------------------+--------------------+
| m                  | minute of hour     | number             | 30                 |
+--------------------+--------------------+--------------------+--------------------+
| s                  | second of minute   | number             | 55                 |
+--------------------+--------------------+--------------------+--------------------+
| S                  | fraction of second | number             | 978                |
+--------------------+--------------------+--------------------+--------------------+
| z                  | time zone          | text               | Pacific Standard   |
|                    |                    |                    | Time; PST          |
+--------------------+--------------------+--------------------+--------------------+
| Z                  | time zone          | zone               | -0800; -08:00;     |
|                    | offset/id          |                    | America/Los\_Angel |
|                    |                    |                    | es                 |
+--------------------+--------------------+--------------------+--------------------+
| '                  | escape for text    | delimiter          | ''                 |
+--------------------+--------------------+--------------------+--------------------+

The count of pattern letters determines the format.

Text
    If the number of pattern letters is 4 or more, the full form is
    used; otherwise a short or abbreviated form is used if available.

Number
    The minimum number of digits. Shorter numbers are zero-padded to
    this amount.

Year
    Numeric presentation for year and weekyear fields are handled
    specially. For example, if the count of *y* is 2, the year will be
    displayed as the zero-based year of the century, which is two
    digits.

Month
    3 or over, use text, otherwise use number.

Zone
    *Z* outputs offset without a colon, *ZZ* outputs the offset with a
    colon, *ZZZ* or more outputs the zone id.

Zone names
    Time zone names (*z*) cannot be parsed.

Any characters in the pattern that are not in the ranges of [*a*..\ *z*]
and [*A*..\ *Z*] will be treated as quoted text. For instance,
characters like ``:``, ``.``, ``' '`` (space), ``#`` and ``?`` will
appear in the resulting time text even though they are not embraced
within single quotes.
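
To illustrate how the count of pattern letters plays out, here is a sketch of the earlier date range request with a more verbose format: ``dd`` zero-pads the day to two digits, ``MMM`` (three letters) uses the abbreviated month text, and ``yyyy`` prints the full year:

.. code:: js

    {
        "aggs": {
            "range": {
                "date_range": {
                    "field": "date",
                    "format": "dd-MMM-yyyy",
                    "ranges": [
                        { "to": "now-10M/M" },
                        { "from": "now-10M/M" }
                    ]
                }
            }
        }
    }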

IPv4 Range Aggregation
----------------------

Just like the dedicated
`date <#search-aggregations-bucket-daterange-aggregation>`__ range
aggregation, there is also a dedicated range aggregation for IPv4 typed
fields:

Example:

.. code:: js

    {
        "aggs" : {
            "ip_ranges" : {
                "ip_range" : {
                    "field" : "ip",
                    "ranges" : [
                        { "to" : "10.0.0.5" },
                        { "from" : "10.0.0.5" }
                    ]
                }
            }
        }
    }

Response:

.. code:: js

    {
        ...

        "aggregations": {
            "ip_ranges":
                "buckets" : [
                    {
                        "to": 167772165,
                        "to_as_string": "10.0.0.5",
                        "doc_count": 4
                    },
                    {
                        "from": 167772165,
                        "from_as_string": "10.0.0.5",
                        "doc_count": 6
                    }
                ]
            }
        }
    }

IP ranges can also be defined as CIDR masks:

.. code:: js

    {
        "aggs" : {
            "ip_ranges" : {
                "ip_range" : {
                    "field" : "ip",
                    "ranges" : [
                        { "mask" : "10.0.0.0/25" },
                        { "mask" : "10.0.0.127/25" }
                    ]
                }
            }
        }
    }

Response:

.. code:: js

    {
        "aggregations": {
            "ip_ranges": {
                "buckets": [
                    {
                        "key": "10.0.0.0/25",
                        "from": 1.6777216E+8,
                        "from_as_string": "10.0.0.0",
                        "to": 167772287,
                        "to_as_string": "10.0.0.127",
                        "doc_count": 127
                    },
                    {
                        "key": "10.0.0.127/25",
                        "from": 1.6777216E+8,
                        "from_as_string": "10.0.0.0",
                        "to": 167772287,
                        "to_as_string": "10.0.0.127",
                        "doc_count": 127
                    }
                ]
            }
        }
    }

Histogram Aggregation
---------------------

A multi-bucket values source based aggregation that can be applied on
numeric values extracted from the documents. It dynamically builds fixed
size (a.k.a. interval) buckets over the values. For example, if the
documents have a field that holds a price (numeric), we can configure
this aggregation to dynamically build buckets with interval ``5`` (in
case of price it may represent $5). When the aggregation executes, the
price field of every document will be evaluated and will be rounded down
to its closest bucket - for example, if the price is ``32`` and the
bucket size is ``5`` then the rounding will yield ``30`` and thus the
document will "fall" into the bucket that is associated withe the key
``30``. To make this more formal, here is the rounding function that is
used:

.. code:: java

    rem = value % interval
    if (rem < 0) {
        rem += interval
    }
    bucket_key = value - rem

The following snippet "buckets" the products based on their ``price`` by
interval of ``50``:

.. code:: js

    {
        "aggs" : {
            "prices" : {
                "histogram" : {
                    "field" : "price",
                    "interval" : 50
                }
            }
        }
    }

And the following may be the response:

.. code:: js

    {
        "aggregations": {
            "prices" : {
                "buckets": [
                    {
                        "key": 0,
                        "doc_count": 2
                    },
                    {
                        "key": 50,
                        "doc_count": 4
                    },
                    {
                        "key": 150,
                        "doc_count": 3
                    }
                ]
            }
        }
    }

The response above shows that none of the aggregated products has a
price that falls within the range of ``[100 - 150)``. By default, the
response will only contain those buckets with a ``doc_count`` greater
than 0. It is possible to change that and request buckets with either a
higher minimum count or even 0 (in which case elasticsearch will "fill
in the gaps" and create buckets with zero documents). This can be
configured using the ``min_doc_count`` setting:

.. code:: js

    {
        "aggs" : {
            "prices" : {
                "histogram" : {
                    "field" : "price",
                    "interval" : 50,
                    "min_doc_count" : 0
                }
            }
        }
    }

Response:

.. code:: js

    {
        "aggregations": {
            "prices" : {
                "buckets": [
                    {
                        "key": 0,
                        "doc_count": 2
                    },
                    {
                        "key": 50,
                        "doc_count": 4
                    },
                    {
                        "key" : 100,
                        "doc_count" : 0 
                    },
                    {
                        "key": 150,
                        "doc_count": 3
                    }
                ]
            }
        }
    }

No documents were found that belong in this bucket, yet it is still
returned with zero ``doc_count``.

By default the histogram returns all the buckets within the range of the
data itself, that is, the documents with the smallest values (on which
the histogram is run) will determine the min bucket (the bucket with the
smallest key) and the documents with the highest values will determine
the max bucket (the bucket with the highest key). Often, when requesting
empty buckets (``"min_doc_count" : 0``), this causes confusion,
specifically when the data is also filtered.

To understand why, let’s look at an example:

Let's say you're filtering your request to get all docs with values
between ``0`` and ``500``, and in addition you'd like to slice the data
per price using a histogram with an interval of ``50``. You also specify
``"min_doc_count" : 0`` as you'd like to get all buckets, even the empty
ones. If it happens that all products (documents) have prices higher
than ``100``, the first bucket you'll get will be the one with ``100``
as its key. This is confusing, as many times you'd also like to get the
buckets between ``0`` and ``100``.

With the ``extended_bounds`` setting, you can now "force" the histogram
aggregation to start building buckets at a specific ``min`` value and
also keep on building buckets up to a ``max`` value (even if there are
no documents anymore). Using ``extended_bounds`` only makes sense when
``min_doc_count`` is 0 (the empty buckets will never be returned if
``min_doc_count`` is greater than 0).

Note that (as the name suggests) ``extended_bounds`` is **not** filtering
buckets. Meaning, if the ``extended_bounds.min`` is higher than the
values extracted from the documents, the documents will still dictate
what the first bucket will be (and the same goes for the
``extended_bounds.max`` and the last bucket). For filtering buckets, one
should nest the histogram aggregation under a range ``filter``
aggregation with the appropriate ``from``/``to`` settings.

Example:

.. code:: js

    {
        "query" : {
            "filtered" : { "filter": { "range" : { "price" : { "to" : "500" } } } }
        },
        "aggs" : {
            "prices" : {
                "histogram" : {
                    "field" : "price",
                    "interval" : 50,
                    "min_doc_count" : 0,
                    "extended_bounds" : {
                        "min" : 0,
                        "max" : 500
                    }
                }
            }
        }
    }
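
By contrast, if the intent is to actually restrict which documents are bucketed rather than extend the bounds, a sketch of nesting the histogram under a ``filter`` aggregation with a range filter might look like this (the aggregation name ``prices_up_to_500`` is hypothetical; the ``price`` field is the same as above):

.. code:: js

    {
        "aggs" : {
            "prices_up_to_500" : {
                "filter" : {
                    "range" : { "price" : { "to" : 500 } }
                },
                "aggs" : {
                    "prices" : {
                        "histogram" : {
                            "field" : "price",
                            "interval" : 50,
                            "min_doc_count" : 0
                        }
                    }
                }
            }
        }
    }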

Order
~~~~~

By default the returned buckets are sorted by their ``key`` ascending,
though the order behaviour can be controlled using the ``order`` setting.

Ordering the buckets by their key - descending:

.. code:: js

    {
        "aggs" : {
            "prices" : {
                "histogram" : {
                    "field" : "price",
                    "interval" : 50,
                    "order" : { "_key" : "desc" }
                }
            }
        }
    }

Ordering the buckets by their ``doc_count`` - ascending:

.. code:: js

    {
        "aggs" : {
            "prices" : {
                "histogram" : {
                    "field" : "price",
                    "interval" : 50,
                    "order" : { "_count" : "asc" }
                }
            }
        }
    }

If the histogram aggregation has a direct metrics sub-aggregation, the
latter can determine the order of the buckets:

.. code:: js

    {
        "aggs" : {
            "prices" : {
                "histogram" : {
                    "field" : "price",
                    "interval" : 50,
                    "order" : { "price_stats.min" : "asc" } 
                },
                "aggs" : {
                    "price_stats" : { "stats" : {} } 
                }
            }
        }
    }

The ``{ "price_stats.min" : asc" }`` will sort the buckets based on
``min`` value of their ``price_stats`` sub-aggregation.

There is no need to configure the ``price`` field for the
``price_stats`` aggregation as it will inherit it by default from its
parent histogram aggregation.

It is also possible to order the buckets based on a "deeper" aggregation
in the hierarchy. This is supported as long as the aggregations in the
path are of a single-bucket type, where the last aggregation in the path
may either be a single-bucket one or a metrics one. If it's a
single-bucket type, the order will be defined by the number of docs in
the bucket (i.e. ``doc_count``); in case it's a metrics one, the same
rules as above apply (the path must indicate the metric name to sort by
in case of a multi-value metrics aggregation, and in case of a
single-value metrics aggregation the sort will be applied to that
value).

The path must be defined in the following form:

::

    AGG_SEPARATOR       :=  '>'
    METRIC_SEPARATOR    :=  '.'
    AGG_NAME            :=  <the name of the aggregation>
    METRIC              :=  <the name of the metric (in case of multi-value metrics aggregation)>
    PATH                :=  <AGG_NAME>[<AGG_SEPARATOR><AGG_NAME>]*[<METRIC_SEPARATOR><METRIC>]

.. code:: js

    {
        "aggs" : {
            "prices" : {
                "histogram" : {
                    "field" : "price",
                    "interval" : 50,
                    "order" : { "promoted_products>rating_stats.avg" : "desc" } 
                },
                "aggs" : {
                    "promoted_products" : {
                        "filter" : { "term" : { "promoted" : true }},
                        "aggs" : {
                            "rating_stats" : { "stats" : { "field" : "rating" }}
                        }
                    }
                }
            }
        }
    }

The above will sort the buckets based on the average rating among the
promoted products.

Minimum document count
~~~~~~~~~~~~~~~~~~~~~~

It is possible to only return buckets that have a document count that is
greater than or equal to a configured limit through the
``min_doc_count`` option.

.. code:: js

    {
        "aggs" : {
            "prices" : {
                "histogram" : {
                    "field" : "price",
                    "interval" : 50,
                    "min_doc_count": 10
                }
            }
        }
    }

The above aggregation would only return buckets that contain 10
documents or more. Default value is ``1``.

    **Note**

    The special value ``0`` can be used to add empty buckets to the
    response between the minimum and the maximum buckets. Here is an
    example of what the response could look like:

.. code:: js

    {
        "aggregations": {
            "prices": {
                "buckets": {
                    "0": {
                        "key": 0,
                        "doc_count": 2
                    },
                    "50": {
                        "key": 50,
                        "doc_count": 0
                    },
                    "150": {
                        "key": 150,
                        "doc_count": 3
                    },
                    "200": {
                        "key": 150,
                        "doc_count": 0
                    },
                    "250": {
                        "key": 150,
                        "doc_count": 0
                    },
                    "300": {
                        "key": 150,
                        "doc_count": 1
                    }
                }
            }
       }
    }

Response Format
~~~~~~~~~~~~~~~

By default, the buckets are returned as an ordered array. It is also
possible to request the response as a hash keyed by the bucket keys
instead:

.. code:: js

    {
        "aggs" : {
            "prices" : {
                "histogram" : {
                    "field" : "price",
                    "interval" : 50,
                    "keyed" : true
                }
            }
        }
    }

Response:

.. code:: js

    {
        "aggregations": {
            "prices": {
                "buckets": {
                    "0": {
                        "key": 0,
                        "doc_count": 2
                    },
                    "50": {
                        "key": 50,
                        "doc_count": 4
                    },
                    "150": {
                        "key": 150,
                        "doc_count": 3
                    }
                }
            }
        }
    }

Date Histogram Aggregation
--------------------------

A multi-bucket aggregation similar to the
`histogram <#search-aggregations-bucket-histogram-aggregation>`__ except
it can only be applied on date values. Since dates are represented in
elasticsearch internally as long values, it is possible to use the
normal ``histogram`` on dates as well, though accuracy will be
compromised. The reason for this is that time based
intervals are not fixed (think of leap years and the number of days
in a month). For this reason, we need special support for time based
data. From a functionality perspective, this histogram supports the same
features as the normal
`histogram <#search-aggregations-bucket-histogram-aggregation>`__. The
main difference is that the interval can be specified by date/time
expressions.

Requesting bucket intervals of a month.

.. code:: js

    {
        "aggs" : {
            "articles_over_time" : {
                "date_histogram" : {
                    "field" : "date",
                    "interval" : "month"
                }
            }
        }
    }

Available expressions for interval: ``year``, ``quarter``, ``month``,
``week``, ``day``, ``hour``, ``minute``, ``second``

Fractional values are allowed for seconds, minutes, hours, days and
weeks. For example 1.5 hours:

.. code:: js

    {
        "aggs" : {
            "articles_over_time" : {
                "date_histogram" : {
                    "field" : "date",
                    "interval" : "1.5h"
                }
            }
        }
    }

See the time units section of the reference for accepted abbreviations.

Time Zone
~~~~~~~~~

By default, times are stored as UTC milliseconds since the epoch. Thus,
all computation and "bucketing" / "rounding" is done on UTC. It is
possible to provide a time zone (both pre rounding, and post rounding)
value, which will cause all computations to take the relevant zone into
account. The time returned for each bucket/entry is milliseconds since
the epoch of the provided time zone.

The parameters are ``pre_zone`` (pre rounding based on interval) and
``post_zone`` (post rounding based on interval). The ``time_zone``
parameter simply sets the ``pre_zone`` parameter. By default, those are
set to ``UTC``.

The zone value accepts a numeric value for the hours offset, for
example: ``"time_zone" : -2``. It also accepts a format of hours and
minutes, like ``"time_zone" : "-02:30"``. Another option is to provide a
named time zone identifier accepted by Joda-Time, for example
``America/Los_Angeles``.

Let's take an example. For ``2012-04-01T04:15:30Z``, with a ``pre_zone``
of ``-08:00``. For day interval, the actual time by applying the time
zone and rounding falls under ``2012-03-31``, so the returned value will
be (in millis) of ``2012-03-31T00:00:00Z`` (UTC). For hour interval,
applying the time zone results in ``2012-03-31T20:15:30``, rounding it
results in ``2012-03-31T20:00:00``, but, we want to return it in UTC
(``post_zone`` is not set), so we convert it back to UTC:
``2012-04-01T04:00:00Z``. Note, we are consistent in the results,
returning the rounded value in UTC.
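
In request form, the day-interval case from that example might look like the following (a minimal sketch, reusing the ``date`` field from the earlier examples):

.. code:: js

    {
        "aggs" : {
            "articles_over_time" : {
                "date_histogram" : {
                    "field" : "date",
                    "interval" : "day",
                    "pre_zone" : "-08:00"
                }
            }
        }
    }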

``post_zone`` simply takes the result, and adds the relevant offset.

Sometimes, we want to apply the same conversion to UTC we did above for
hour also for day (and up) intervals. We can set
``pre_zone_adjust_large_interval`` to ``true``, which will apply the
same conversion done for hour interval in the example, to day and above
intervals (it can be set regardless of the interval, but only kick in
when using day and higher intervals).

Pre/Post Offset
~~~~~~~~~~~~~~~

Specific offsets can be provided for pre rounding and post rounding:
``pre_offset`` for pre rounding, and ``post_offset`` for post rounding.
The offsets are expressed as time values (``1h``, ``1d``, etc…).
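
For instance, a sketch that shifts every date value by one hour before it is rounded into day buckets (same ``date`` field as above):

.. code:: js

    {
        "aggs" : {
            "articles_over_time" : {
                "date_histogram" : {
                    "field" : "date",
                    "interval" : "day",
                    "pre_offset" : "1h"
                }
            }
        }
    }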

Keys
~~~~

Since internally, dates are represented as 64bit numbers, these numbers
are returned as the bucket keys (each key representing a date -
milliseconds since the epoch). It is also possible to define a date
format, which will result in returning the dates as formatted strings
next to the numeric key values:

.. code:: js

    {
        "aggs" : {
            "articles_over_time" : {
                "date_histogram" : {
                    "field" : "date",
                    "interval" : "1M",
                    "format" : "yyyy-MM-dd" 
                }
            }
        }
    }

Supports expressive date `format pattern <#date-format-pattern>`__

Response:

.. code:: js

    {
        "aggregations": {
            "articles_over_time": {
                "buckets": [
                    {
                        "key_as_string": "2013-02-02",
                        "key": 1328140800000,
                        "doc_count": 1
                    },
                    {
                        "key_as_string": "2013-03-02",
                        "key": 1330646400000,
                        "doc_count": 2
                    },
                    ...
                ]
            }
        }
    }

Like with the normal
`histogram <#search-aggregations-bucket-histogram-aggregation>`__, both
document level scripts and value level scripts are supported. It is also
possible to control the order of the returned buckets using the
``order`` settings and to filter the returned buckets based on a
``min_doc_count`` setting (by default, all buckets with a ``doc_count``
greater than 0 will be returned). This histogram also supports the
``extended_bounds`` setting, which enables extending the bounds of the
histogram beyond the data itself (to read more on why you’d want to do
that, please refer to the explanation
`here <#search-aggregations-bucket-histogram-aggregation-extended-bounds>`__).
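
Putting a few of these options together, a sketch of a monthly histogram that keeps empty months and extends the bounds over a whole year might look like the following (the bound values are illustrative and assume the bounds may be expressed as dates matching the given ``format``):

.. code:: js

    {
        "aggs" : {
            "articles_over_time" : {
                "date_histogram" : {
                    "field" : "date",
                    "interval" : "month",
                    "format" : "yyyy-MM-dd",
                    "min_doc_count" : 0,
                    "extended_bounds" : {
                        "min" : "2013-01-01",
                        "max" : "2013-12-31"
                    }
                }
            }
        }
    }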

Geo Distance Aggregation
------------------------

A multi-bucket aggregation that works on ``geo_point`` fields and
conceptually works very similarly to the
`range <#search-aggregations-bucket-range-aggregation>`__ aggregation.
The user can define a point of origin and a set of distance range
buckets. The aggregation evaluates the distance of each document value
from the origin point and determines the bucket it belongs to based on
the ranges (a document belongs to a bucket if the distance between the
document and the origin falls within the distance range of the bucket).

.. code:: js

    {
        "aggs" : {
            "rings_around_amsterdam" : {
                "geo_distance" : {
                    "field" : "location",
                    "origin" : "52.3760, 4.894",
                    "ranges" : [
                        { "to" : 100 },
                        { "from" : 100, "to" : 300 },
                        { "from" : 300 }
                    ]
                }
            }
        }
    }

Response:

.. code:: js

    {
        "aggregations": {
            "rings" : {
                "buckets": [
                    {
                        "unit": "km",
                        "to": 100.0,
                        "doc_count": 3
                    },
                    {
                        "unit": "km",
                        "from": 100.0,
                        "to": 300.0,
                        "doc_count": 1
                    },
                    {
                        "unit": "km",
                        "from": 300.0,
                        "doc_count": 7
                    }
                ]
            }
        }
    }

The specified field must be of type ``geo_point`` (which can only be set
explicitly in the mappings) and it can also hold an array of
``geo_point`` fields, in which case all will be taken into account
during aggregation. The origin point can accept all formats supported by
the ``geo_point`` `type <#mapping-geo-point-type>`__:

-  Object format: ``{ "lat" : 52.3760, "lon" : 4.894 }`` - this is the
   safest format as it is the most explicit about the ``lat`` & ``lon``
   values

-  String format: ``"52.3760, 4.894"`` - where the first number is the
   ``lat`` and the second is the ``lon``

-  Array format: ``[4.894, 52.3760]`` - which is based on the
   ``GeoJson`` standard and where the first number is the ``lon`` and
   the second one is the ``lat``
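
For example, the first request in this section could equally specify the origin using the object format (a sketch with the same ``location`` field and ranges):

.. code:: js

    {
        "aggs" : {
            "rings_around_amsterdam" : {
                "geo_distance" : {
                    "field" : "location",
                    "origin" : { "lat" : 52.3760, "lon" : 4.894 },
                    "ranges" : [
                        { "to" : 100 },
                        { "from" : 100, "to" : 300 },
                        { "from" : 300 }
                    ]
                }
            }
        }
    }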

By default, the distance unit is ``km`` but it can also accept: ``mi``
(miles), ``in`` (inch), ``yd`` (yards), ``m`` (meters), ``cm``
(centimeters), ``mm`` (millimeters).

.. code:: js

    {
        "aggs" : {
            "rings" : {
                "geo_distance" : {
                    "field" : "location",
                    "origin" : "52.3760, 4.894",
                    "unit" : "mi", 
                    "ranges" : [
                        { "to" : 100 },
                        { "from" : 100, "to" : 300 },
                        { "from" : 300 }
                    ]
                }
            }
        }
    }

The distances will be computed in miles.

There are three distance calculation modes: ``sloppy_arc`` (the
default), ``arc`` (most accurate) and ``plane`` (fastest). The ``arc``
calculation is the most accurate one but also the most expensive one in
terms of performance. The ``sloppy_arc`` is faster but less accurate.
The ``plane`` is the fastest but least accurate distance function.
Consider using ``plane`` when your search context is "narrow" and spans
smaller geographical areas (like cities or even countries). ``plane``
may return higher error margins for searches across very large areas
(e.g. cross continent search). The distance calculation type can be set
using the ``distance_type`` parameter:

.. code:: js

    {
        "aggs" : {
            "rings" : {
                "geo_distance" : {
                    "field" : "location",
                    "origin" : "52.3760, 4.894",
                    "distance_type" : "plane",
                    "ranges" : [
                        { "to" : 100 },
                        { "from" : 100, "to" : 300 },
                        { "from" : 300 }
                    ]
                }
            }
        }
    }

GeoHash grid Aggregation
------------------------

A multi-bucket aggregation that works on ``geo_point`` fields and groups
points into buckets that represent cells in a grid. The resulting grid
can be sparse and only contains cells that have matching data. Each cell
is labeled using a `geohash <http://en.wikipedia.org/wiki/Geohash>`__
which is of user-definable precision.

-  High precision geohashes have a long string length and represent
   cells that cover only a small area.

-  Low precision geohashes have a short string length and represent
   cells that each cover a large area.

Geohashes used in this aggregation can have a choice of precision
between 1 and 12.

    **Warning**

    The highest-precision geohash of length 12 produces cells that cover
    less than a square metre of land and so high-precision requests can
    be very costly in terms of RAM and result sizes. Please see the
    example below on how to first filter the aggregation to a smaller
    geographic area before requesting high-levels of detail.

The specified field must be of type ``geo_point`` (which can only be set
explicitly in the mappings) and it can also hold an array of
``geo_point`` fields, in which case all points will be taken into
account during aggregation.

Simple low-precision request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: js

    {
        "aggregations" : {
            "myLarge-GrainGeoHashGrid" : {
                "geohash_grid" : {
                    "field" : "location",
                    "precision" : 3
                }
            }
        }
    }

Response:

.. code:: js

    {
        "aggregations": {
            "myLarge-GrainGeoHashGrid": {
                "buckets": [
                    {
                        "key": "svz",
                        "doc_count": 10964
                    },
                    {
                        "key": "sv8",
                        "doc_count": 3198
                    }
                ]
            }
        }
    }

High-precision requests
~~~~~~~~~~~~~~~~~~~~~~~

When requesting detailed buckets (typically for displaying a "zoomed in"
map) a filter like
`geo\_bounding\_box <#query-dsl-geo-bounding-box-filter>`__ should be
applied to narrow the subject area; otherwise, potentially millions of
buckets will be created and returned.

.. code:: js

    {
        "aggregations" : {
            "zoomedInView" : {
                "filter" : {
                    "geo_bounding_box" : {
                        "location" : {
                            "top_left" : "51.73, 0.9",
                            "bottom_right" : "51.55, 1.1"
                        }
                    }
                },
                "aggregations":{
                    "zoom1":{
                        "geohash_grid" : {
                            "field":"location",
                            "precision":8,
                        }
                    }
                }
            }
        }
     }

Cell dimensions at the equator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The table below shows the metric dimensions for cells covered by various
string lengths of geohash. Cell dimensions vary with latitude and so the
table is for the worst-case scenario at the equator.

+------------+---------------------------------------------------------------+
| **GeoHash  | **Area width x height**                                       |
| length**   |                                                               |
+------------+---------------------------------------------------------------+
| 1          | 5,009.4km x 4,992.6km                                         |
+------------+---------------------------------------------------------------+
| 2          | 1,252.3km x 624.1km                                           |
+------------+---------------------------------------------------------------+
| 3          | 156.5km x 156km                                               |
+------------+---------------------------------------------------------------+
| 4          | 39.1km x 19.5km                                               |
+------------+---------------------------------------------------------------+
| 5          | 4.9km x 4.9km                                                 |
+------------+---------------------------------------------------------------+
| 6          | 1.2km x 609.4m                                                |
+------------+---------------------------------------------------------------+
| 7          | 152.9m x 152.4m                                               |
+------------+---------------------------------------------------------------+
| 8          | 38.2m x 19m                                                   |
+------------+---------------------------------------------------------------+
| 9          | 4.8m x 4.8m                                                   |
+------------+---------------------------------------------------------------+
| 10         | 1.2m x 59.5cm                                                 |
+------------+---------------------------------------------------------------+
| 11         | 14.9cm x 14.9cm                                               |
+------------+---------------------------------------------------------------+
| 12         | 3.7cm x 1.9cm                                                 |
+------------+---------------------------------------------------------------+

Options
~~~~~~~

+------------+---------------------------------------------------------------+
| field      | Mandatory. The name of the field indexed with GeoPoints.      |
+------------+---------------------------------------------------------------+
| precision  | Optional. The string length of the geohashes used to define   |
|            | cells/buckets in the results. Defaults to 5.                  |
+------------+---------------------------------------------------------------+
| size       | Optional. The maximum number of geohash buckets to return     |
|            | (defaults to 10,000). When results are trimmed, buckets are   |
|            | prioritised based on the volumes of documents they contain. A |
|            | value of ``0`` will return all buckets that contain a hit,    |
|            | use with caution as this could use a lot of CPU and network   |
|            | bandwidth if there are many buckets.                          |
+------------+---------------------------------------------------------------+
| shard\_siz | Optional. To allow for more accurate counting of the top      |
| e          | cells returned in the final result the aggregation defaults   |
|            | to returning ``max(10,(size x number-of-shards))`` buckets    |
|            | from each shard. If this heuristic is undesirable, the number |
|            | considered from each shard can be over-ridden using this      |
|            | parameter. A value of ``0`` makes the shard size unlimited.   |
+------------+---------------------------------------------------------------+

Facets
======

Faceted search refers to a way to explore large amounts of data by
displaying summaries about various partitions of the data and later
allowing the user to narrow the navigation to a specific partition.

In Elasticsearch, ``facets`` is also the name of a feature that allowed
you to compute these summaries. ``facets`` have been replaced by
`aggregations <#search-aggregations>`__ in Elasticsearch 1.0, which are
a superset of facets.

Suggesters
==========

The suggest feature suggests similar looking terms based on a provided
text by using a suggester. Parts of the suggest feature are still under
development.

The suggest request part is either defined alongside the query part in a
``_search`` request or via the REST ``_suggest`` endpoint.

.. code:: js

    curl -s -XPOST 'localhost:9200/_search' -d '{
      "query" : {
        ...
      },
      "suggest" : {
        ...
      }
    }'

Suggest requests executed against the ``_suggest`` endpoint should omit
the surrounding ``suggest`` element which is only used if the suggest
request is part of a search.

.. code:: js

    curl -XPOST 'localhost:9200/_suggest' -d '{
      "my-suggestion" : {
        "text" : "the amsterdma meetpu",
        "term" : {
          "field" : "body"
        }
      }
    }'

Several suggestions can be specified per request. Each suggestion is
identified with an arbitrary name. In the example below two suggestions
are requested. Both ``my-suggest-1`` and ``my-suggest-2`` suggestions
use the ``term`` suggester, but have a different ``text``.

.. code:: js

    "suggest" : {
      "my-suggest-1" : {
        "text" : "the amsterdma meetpu",
        "term" : {
          "field" : "body"
        }
      },
      "my-suggest-2" : {
        "text" : "the rottredam meetpu",
        "term" : {
          "field" : "title"
        }
      }
    }

The below suggest response example includes the suggestion response for
``my-suggest-1`` and ``my-suggest-2``. Each suggestion part contains
entries. Each entry is effectively a token from the suggest text and
contains the suggestion entry text, the original start offset and length
in the suggest text, and, if found, an arbitrary number of options.

.. code:: js

    {
      ...
      "suggest": {
        "my-suggest-1": [
          {
            "text" : "amsterdma",
            "offset": 4,
            "length": 9,
            "options": [
               ...
            ]
          },
          ...
        ],
        "my-suggest-2" : [
          ...
        ]
      }
      ...
    }

Each options array contains an option object that includes the suggested
text, its document frequency and score compared to the suggest entry
text. The meaning of the score depends on the used suggester. The term
suggester’s score is based on the edit distance.

.. code:: js

    "options": [
      {
        "text": "amsterdam",
        "freq": 77,
        "score": 0.8888889
      },
      ...
    ]

**Global suggest text**

To avoid repetition of the suggest text, it is possible to define a
global text. In the example below the suggest text is defined globally
and applies to the ``my-suggest-1`` and ``my-suggest-2`` suggestions.

.. code:: js

    "suggest" : {
      "text" : "the amsterdma meetpu",
      "my-suggest-1" : {
        "term" : {
          "field" : "title"
        }
      },
      "my-suggest-2" : {
        "term" : {
          "field" : "body"
        }
      }
    }

In the above example, the suggest text can also be specified as a
suggestion-specific option. The suggest text specified at the suggestion
level overrides the suggest text at the global level.
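
As a sketch, the global text could be overridden for one suggestion only, so that ``my-suggest-1`` uses the global text while ``my-suggest-2`` supplies its own:

.. code:: js

    "suggest" : {
      "text" : "the amsterdma meetpu",
      "my-suggest-1" : {
        "term" : {
          "field" : "title"
        }
      },
      "my-suggest-2" : {
        "text" : "the rottredam meetpu",
        "term" : {
          "field" : "body"
        }
      }
    }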

**Other suggest example**

In the below example we request suggestions for the following suggest
text: ``devloping distibutd saerch engies`` on the ``title`` field with
a maximum of 3 suggestions per term inside the suggest text. Note that
in this example we use the ``count`` search type. This isn’t required,
but it is a nice optimization. The suggestions are gathered in the
``query`` phase and, in the case that we only care about suggestions (so
no hits), we don’t need to execute the ``fetch`` phase.

.. code:: js

    curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
      "suggest" : {
        "my-title-suggestions-1" : {
          "text" : "devloping distibutd saerch engies",
          "term" : {
            "size" : 3,
            "field" : "title"
          }
        }
      }
    }'

The above request could yield the response shown in the code example
below. As you can see, if we take the first suggested option of each
suggestion entry we get ``developing distributed search engines`` as the
result.

.. code:: js

    {
      ...
      "suggest": {
        "my-title-suggestions-1": [
          {
            "text": "devloping",
            "offset": 0,
            "length": 9,
            "options": [
              {
                "text": "developing",
                "freq": 77,
                "score": 0.8888889
              },
              {
                "text": "deloping",
                "freq": 1,
                "score": 0.875
              },
              {
                "text": "deploying",
                "freq": 2,
                "score": 0.7777778
              }
            ]
          },
          {
            "text": "distibutd",
            "offset": 10,
            "length": 9,
            "options": [
              {
                "text": "distributed",
                "freq": 217,
                "score": 0.7777778
              },
              {
                "text": "disributed",
                "freq": 1,
                "score": 0.7777778
              },
              {
                "text": "distribute",
                "freq": 1,
                "score": 0.7777778
              }
            ]
          },
          {
            "text": "saerch",
            "offset": 20,
            "length": 6,
            "options": [
              {
                "text": "search",
                "freq": 1038,
                "score": 0.8333333
              },
              {
                "text": "smerch",
                "freq": 3,
                "score": 0.8333333
              },
              {
                "text": "serch",
                "freq": 2,
                "score": 0.8
              }
            ]
          },
          {
            "text": "engies",
            "offset": 27,
            "length": 6,
            "options": [
              {
                "text": "engines",
                "freq": 568,
                "score": 0.8333333
              },
              {
                "text": "engles",
                "freq": 3,
                "score": 0.8333333
              },
              {
                "text": "eggies",
                "freq": 1,
                "score": 0.8333333
              }
            ]
          }
        ]
      }
      ...
    }

Term suggester
--------------

    **Note**

    In order to understand the format of suggestions, please read the ?
    page first.

The ``term`` suggester suggests terms based on edit distance. The
provided suggest text is analyzed before terms are suggested. The
suggested terms are provided per analyzed suggest text token. The
``term`` suggester doesn’t take the query that is part of the request
into account.

Common suggest options:
~~~~~~~~~~~~~~~~~~~~~~~

+------------+---------------------------------------------------------------+
| ``text``   | The suggest text. The suggest text is a required option that  |
|            | needs to be set globally or per suggestion.                   |
+------------+---------------------------------------------------------------+
| ``field``  | The field to fetch the candidate suggestions from. This is a  |
|            | required option that either needs to be set globally or per   |
|            | suggestion.                                                   |
+------------+---------------------------------------------------------------+
| ``analyzer | The analyzer to analyse the suggest text with. Defaults to    |
| ``         | the search analyzer of the suggest field.                     |
+------------+---------------------------------------------------------------+
| ``size``   | The maximum corrections to be returned per suggest text       |
|            | token.                                                        |
+------------+---------------------------------------------------------------+
| ``sort``   | Defines how suggestions should be sorted per suggest text     |
|            | term. Two possible values:                                    |
|            |                                                               |
|            | -  ``score``: Sort by score first, then document frequency    |
|            |    and then the term itself.                                  |
|            |                                                               |
|            | -  ``frequency``: Sort by document frequency first, then      |
|            |    similarity score and then the term itself.                 |
|            |                                                               |
+------------+---------------------------------------------------------------+
| ``suggest_ | The suggest mode controls which suggestions are included, or  |
| mode``     | for which suggest text terms suggestions should be provided.  |
|            | Three possible values can be specified:                       |
|            |                                                               |
|            | -  ``missing``: Only provide suggestions for suggest text     |
|            |    terms that are not in the index. This is the default.      |
|            |                                                               |
|            | -  ``popular``: Only suggest suggestions that occur in more   |
|            |    docs than the original suggest text term.                  |
|            |                                                               |
|            | -  ``always``: Suggest any matching suggestions based on      |
|            |    terms in the suggest text.                                 |
|            |                                                               |
+------------+---------------------------------------------------------------+
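
As an illustration of these common options, a term suggester request
could combine them as in the following sketch (the index, field, and
values are only examples):

.. code:: js

    curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
      "suggest" : {
        "my-suggestion" : {
          "text" : "devloping",
          "term" : {
            "field" : "title",
            "size" : 5,
            "sort" : "frequency",
            "suggest_mode" : "popular"
          }
        }
      }
    }'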

Other term suggest options:
~~~~~~~~~~~~~~~~~~~~~~~~~~~

+------------+---------------------------------------------------------------+
| ``lowercas | Lower cases the suggest text terms after text analysis.       |
| e_terms``  |                                                               |
+------------+---------------------------------------------------------------+
| ``max_edit | The maximum edit distance candidate suggestions can have in   |
| s``        | order to be considered as a suggestion. Can only be a value   |
|            | between 1 and 2. Any other value results in a bad request     |
|            | error being thrown. Defaults to 2.                            |
+------------+---------------------------------------------------------------+
| ``prefix_l | The minimal number of prefix characters that must match in    |
| ength``    | order to be a candidate suggestion. Defaults to 1. Increasing |
|            | this number improves spellcheck performance. Usually          |
|            | misspellings don’t occur in the beginning of terms. (Old name |
|            | "prefix\_len" is deprecated)                                  |
+------------+---------------------------------------------------------------+
| ``min_word | The minimum length a suggest text term must have in order to  |
| _length``  | be included. Defaults to 4. (Old name "min\_word\_len" is     |
|            | deprecated)                                                   |
+------------+---------------------------------------------------------------+
| ``shard_si | Sets the maximum number of suggestions to be retrieved from   |
| ze``       | each individual shard. During the reduce phase only the top N |
|            | suggestions are returned based on the ``size`` option.        |
|            | Defaults to the ``size`` option. Setting this to a value      |
|            | higher than the ``size`` can be useful in order to get a more |
|            | accurate document frequency for spelling corrections at the   |
|            | cost of performance. Due to the fact that terms are           |
|            | partitioned amongst shards, the shard level document          |
|            | frequencies of spelling corrections may not be precise.       |
|            | Increasing this will make these document frequencies more     |
|            | precise.                                                      |
+------------+---------------------------------------------------------------+
| ``max_insp | A factor that is used to multiply with the ``shard_size`` in  |
| ections``  | order to inspect more candidate spell corrections on the      |
|            | shard level. Can improve accuracy at the cost of performance. |
|            | Defaults to 5.                                                |
+------------+---------------------------------------------------------------+
| ``min_doc_ | The minimal threshold in number of documents a suggestion     |
| freq``     | should appear in. This can be specified as an absolute number |
|            | or as a relative percentage of number of documents. This can  |
|            | improve quality by only suggesting high frequency terms.      |
|            | Defaults to 0f and is not enabled. If a value higher than 1   |
|            | is specified then the number cannot be fractional. The shard  |
|            | level document frequencies are used for this option.          |
+------------+---------------------------------------------------------------+
| ``max_term | The maximum threshold in number of documents a suggest text   |
| _freq``    | token can exist in order to be included. Can be a relative    |
|            | percentage number (e.g 0.4) or an absolute number to          |
|            | represent document frequencies. If a value higher than 1 is   |
|            | specified, it cannot be fractional. Defaults to               |
|            | 0.01f. This can be used to exclude high frequency terms from  |
|            | being spellchecked. High frequency terms are usually spelled  |
|            | correctly; excluding them also improves the spellcheck        |
|            | performance. The shard level document frequencies are used    |
|            | for this option.                                              |
+------------+---------------------------------------------------------------+
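
These options can be combined in the same request. The following hedged
sketch (values are illustrative) restricts suggestions to close matches
that are reasonably frequent in the index:

.. code:: js

    curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
      "suggest" : {
        "my-suggestion" : {
          "text" : "distibutd",
          "term" : {
            "field" : "title",
            "max_edits" : 1,
            "prefix_length" : 2,
            "min_doc_freq" : 2,
            "max_term_freq" : 0.01
          }
        }
      }
    }'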

Phrase Suggester
----------------

    **Note**

    In order to understand the format of suggestions, please read the ?
    page first.

The ``term`` suggester provides a very convenient API to access word
alternatives on a per token basis within a certain string distance. The
API allows accessing each token in the stream individually while
suggest-selection is left to the API consumer. Yet, often pre-selected
suggestions are required in order to present them to the end-user. The
``phrase`` suggester adds additional logic on top of the ``term``
suggester to select entire corrected phrases instead of individual
tokens weighted based on ``ngram-language`` models. In practice this
suggester will be able to make better decisions about which tokens to
pick based on co-occurrence and frequencies.

API Example
~~~~~~~~~~~

The ``phrase`` request is defined alongside the query part in the JSON
request:

.. code:: js

    curl -XPOST 'localhost:9200/_search' -d '{
      "suggest" : {
        "text" : "Xor the Got-Jewel",
        "simple_phrase" : {
          "phrase" : {
            "analyzer" : "body",
            "field" : "bigram",
            "size" : 1,
            "real_word_error_likelihood" : 0.95,
            "max_errors" : 0.5,
            "gram_size" : 2,
            "direct_generator" : [ {
              "field" : "body",
              "suggest_mode" : "always",
              "min_word_length" : 1
            } ],
            "highlight": {
              "pre_tag": "<em>",
              "post_tag": "</em>"
            }
          }
        }
      }
    }'

The response contains suggestions scored by the most likely spell
correction first. In this case we received the expected correction
``xorr the god jewel`` first, while the second correction is less
conservative, with only one of the errors corrected. Note that the
request is executed with ``max_errors`` set to ``0.5``, so 50% of the
terms may contain misspellings (see the parameter descriptions below).

.. code:: js

    {
      "took" : 5,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 2938,
        "max_score" : 0.0,
        "hits" : [ ]
      },
      "suggest" : {
        "simple_phrase" : [ {
          "text" : "Xor the Got-Jewel",
          "offset" : 0,
          "length" : 17,
          "options" : [ {
            "text" : "xorr the god jewel",
            "highlighted": "<em>xorr</em> the <em>god</em> jewel",
            "score" : 0.17877324
          }, {
            "text" : "xor the god jewel",
            "highlighted": "xor the <em>god</em> jewel",
            "score" : 0.14231323
          } ]
        } ]
      }
    }

Basic Phrase suggest API parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+------------+---------------------------------------------------------------+
| ``field``  | the name of the field used to do n-gram lookups for the       |
|            | language model, the suggester will use this field to gain     |
|            | statistics to score corrections. This field is mandatory.     |
+------------+---------------------------------------------------------------+
| ``gram_siz | sets max size of the n-grams (shingles) in the ``field``. If  |
| e``        | the field doesn’t contain n-grams (shingles) this should be   |
|            | omitted or set to ``1``. Note that Elasticsearch tries to     |
|            | detect the gram size based on the specified ``field``. If the |
|            | field uses a ``shingle`` filter the ``gram_size`` is set to   |
|            | the ``max_shingle_size`` if not explicitly set.               |
+------------+---------------------------------------------------------------+
| ``real_wor | the likelihood of a term being misspelled even if the term    |
| d_error_li | exists in the dictionary. The default is ``0.95``,            |
| kelihood`` | corresponding to 5% of the real words being misspelled.       |
+------------+---------------------------------------------------------------+
| ``confiden | The confidence level defines a factor applied to the input    |
| ce``       | phrases score which is used as a threshold for other suggest  |
|            | candidates. Only candidates that score higher than the        |
|            | threshold will be included in the result. For instance a      |
|            | confidence level of ``1.0`` will only return suggestions that |
|            | score higher than the input phrase. If set to ``0.0`` the top |
|            | N candidates are returned. The default is ``1.0``.            |
+------------+---------------------------------------------------------------+
| ``max_erro | the maximum percentage of the terms that can be considered    |
| rs``       | misspellings in order to form a correction. This method       |
|            | accepts a float value in the range ``[0..1)`` as a fraction   |
|            | of the actual query terms, or a number ``>=1`` as an absolute |
|            | number of query terms. The default is ``1.0``, which means    |
|            | that only corrections with at most one misspelled term are    |
|            | returned. Note that setting this too high can negatively      |
|            | impact performance. Low values like ``1`` or ``2`` are        |
|            | recommended; otherwise the time spent in suggest calls might  |
|            | exceed the time spent in query execution.                     |
+------------+---------------------------------------------------------------+
| ``separato | the separator that is used to separate terms in the bigram    |
| r``        | field. If not set the whitespace character is used as a       |
|            | separator.                                                    |
+------------+---------------------------------------------------------------+
| ``size``   | the number of candidates that are generated for each          |
|            | individual query term. Low numbers like ``3`` or ``5``        |
|            | typically produce good results. Raising this can bring up     |
|            | terms with higher edit distances. The default is ``5``.       |
+------------+---------------------------------------------------------------+
| ``analyzer | Sets the analyzer to analyse the suggest text with. Defaults  |
| ``         | to the search analyzer of the suggest field passed via        |
|            | ``field``.                                                    |
+------------+---------------------------------------------------------------+
| ``shard_si | Sets the maximum number of suggested terms to be retrieved    |
| ze``       | from each individual shard. During the reduce phase, only the |
|            | top N suggestions are returned based on the ``size`` option.  |
|            | Defaults to ``5``.                                            |
+------------+---------------------------------------------------------------+
| ``text``   | Sets the text / query to provide suggestions for.             |
+------------+---------------------------------------------------------------+
| ``highligh | Sets up suggestion highlighting. If not provided then no      |
| t``        | ``highlighted`` field is returned. If provided must contain   |
|            | exactly ``pre_tag`` and ``post_tag`` which are wrapped around |
|            | the changed tokens. If multiple tokens in a row are changed   |
|            | the entire phrase of changed tokens is wrapped rather than    |
|            | each token.                                                   |
+------------+---------------------------------------------------------------+
| ``collate` | Checks each suggestion against the specified ``query`` or     |
| `          | ``filter`` to prune suggestions for which no matching docs    |
|            | exist in the index. Either a ``query`` or a ``filter`` must   |
|            | be specified, and it is run as a `template                    |
|            | query <#query-dsl-template-query>`__. The current suggestion  |
|            | is automatically made available as the ``{{suggestion}}``     |
|            | variable, which should be used in your query/filter. You can  |
|            | still specify your own template ``params`` — the              |
|            | ``suggestion`` value will be added to the variables you       |
|            | specify. You can specify a ``preference`` to control on which |
|            | shards the query is executed (see ?). The default value is    |
|            | ``_only_local``. Additionally, you can specify a ``prune`` to |
|            | control if all phrase suggestions will be returned, when set  |
|            | to ``true`` the suggestions will have an additional option    |
|            | ``collate_match``, which will be ``true`` if matching         |
|            | documents for the phrase were found, ``false`` otherwise. The |
|            | default value for ``prune`` is ``false``.                     |
+------------+---------------------------------------------------------------+

.. code:: js

    curl -XPOST 'localhost:9200/_search' -d '{
       "suggest" : {
         "text" : "Xor the Got-Jewel",
         "simple_phrase" : {
           "phrase" : {
             "field" :  "bigram",
             "size" :   1,
             "direct_generator" : [ {
               "field" :            "body",
               "suggest_mode" :     "always",
               "min_word_length" :  1
             } ],
             "collate": {
               "query": { 
                 "match": {
                     "{{field_name}}" : "{{suggestion}}" 
                 }
               },
               "params": {"field_name" : "title"}, 
               "preference": "_primary", 
               "prune": true 
             }
           }
         }
       }
     }'

This query will be run once for every suggestion.

The ``{{suggestion}}`` variable will be replaced by the text of each
suggestion.

An additional ``field_name`` variable has been specified in ``params``
and is used by the ``match`` query.

The default ``preference`` has been changed to ``_primary``.

All suggestions will be returned with an extra ``collate_match`` option
indicating whether the generated phrase matched any document.

Smoothing Models
~~~~~~~~~~~~~~~~

The ``phrase`` suggester supports multiple smoothing models to balance
weight between infrequent grams (grams (shingles) that do not exist in
the index) and frequent grams (grams that appear at least once in the
index).

+------------+---------------------------------------------------------------+
| ``stupid_b | a simple backoff model that backs off to lower order n-gram   |
| ackoff``   | models if the higher order count is ``0`` and discounts the   |
|            | lower order n-gram model by a constant factor. The default    |
|            | ``discount`` is ``0.4``. Stupid Backoff is the default model. |
+------------+---------------------------------------------------------------+
| ``laplace` | a smoothing model that uses an additive smoothing where a     |
| `          | constant (typically ``1.0`` or smaller) is added to all       |
|            | counts to balance weights. The default ``alpha`` is ``0.5``.  |
+------------+---------------------------------------------------------------+
| ``linear_i | a smoothing model that takes the weighted mean of the         |
| nterpolati | unigrams, bigrams and trigrams based on user supplied weights |
| on``       | (lambdas). Linear Interpolation doesn’t have any default      |
|            | values. All parameters (``trigram_lambda``,                   |
|            | ``bigram_lambda``, ``unigram_lambda``) must be supplied.      |
+------------+---------------------------------------------------------------+
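
A smoothing model is selected with the ``smoothing`` element inside the
``phrase`` suggester. The following is a minimal sketch, reusing the
``bigram`` field from the earlier examples, that configures the
``laplace`` model with an illustrative ``alpha`` value:

.. code:: js

    curl -XPOST 'localhost:9200/_search' -d '{
      "suggest" : {
        "text" : "Xor the Got-Jewel",
        "simple_phrase" : {
          "phrase" : {
            "field" : "bigram",
            "size" : 1,
            "smoothing" : {
              "laplace" : {
                "alpha" : 0.7
              }
            }
          }
        }
      }
    }'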

Candidate Generators
~~~~~~~~~~~~~~~~~~~~

The ``phrase`` suggester uses candidate generators to produce a list of
possible terms per term in the given text. A single candidate generator
is similar to a ``term`` suggester called for each individual term in
the text. The output of the generators is subsequently scored in
combination with the candidates from the other terms to form suggestion
candidates.

Currently only one type of candidate generator is supported, the
``direct_generator``. The Phrase suggest API accepts a list of
generators under the key ``direct_generator``; each of the generators in
the list is called per term in the original text.

Direct Generators
~~~~~~~~~~~~~~~~~

The direct generators support the following parameters:

+------------+---------------------------------------------------------------+
| ``field``  | The field to fetch the candidate suggestions from. This is a  |
|            | required option that either needs to be set globally or per   |
|            | suggestion.                                                   |
+------------+---------------------------------------------------------------+
| ``size``   | The maximum corrections to be returned per suggest text       |
|            | token.                                                        |
+------------+---------------------------------------------------------------+
| ``suggest_ | The suggest mode controls which suggestions are included, or  |
| mode``     | for which suggest text terms suggestions should be provided.  |
|            | Three possible values can be specified:                       |
|            |                                                               |
|            | -  ``missing``: Only suggest terms in the suggest text that   |
|            |    aren’t in the index. This is the default.                  |
|            |                                                               |
|            | -  ``popular``: Only suggest suggestions that occur in more   |
|            |    docs than the original suggest text term.                  |
|            |                                                               |
|            | -  ``always``: Suggest any matching suggestions based on      |
|            |    terms in the suggest text.                                 |
|            |                                                               |
+------------+---------------------------------------------------------------+
| ``max_edit | The maximum edit distance candidate suggestions can have in   |
| s``        | order to be considered as a suggestion. Can only be a value   |
|            | between 1 and 2. Any other value results in a bad request     |
|            | error being thrown. Defaults to 2.                            |
+------------+---------------------------------------------------------------+
| ``prefix_l | The minimal number of prefix characters that must match in    |
| ength``    | order to be a candidate suggestion. Defaults to 1. Increasing |
|            | this number improves spellcheck performance. Usually          |
|            | misspellings don’t occur in the beginning of terms. (Old name |
|            | "prefix\_len" is deprecated)                                  |
+------------+---------------------------------------------------------------+
| ``min_word | The minimum length a suggest text term must have in order to  |
| _length``  | be included. Defaults to 4. (Old name "min\_word\_len" is     |
|            | deprecated)                                                   |
+------------+---------------------------------------------------------------+
| ``max_insp | A factor that is used to multiply with the ``shard_size`` in  |
| ections``  | order to inspect more candidate spell corrections on the      |
|            | shard level. Can improve accuracy at the cost of performance. |
|            | Defaults to 5.                                                |
+------------+---------------------------------------------------------------+
| ``min_doc_ | The minimal threshold in number of documents a suggestion     |
| freq``     | should appear in. This can be specified as an absolute number |
|            | or as a relative percentage of number of documents. This can  |
|            | improve quality by only suggesting high frequency terms.      |
|            | Defaults to 0f and is not enabled. If a value higher than 1   |
|            | is specified then the number cannot be fractional. The shard  |
|            | level document frequencies are used for this option.          |
+------------+---------------------------------------------------------------+
| ``max_term | The maximum threshold in number of documents a suggest text   |
| _freq``    | token can exist in order to be included. Can be a relative    |
|            | percentage number (e.g 0.4) or an absolute number to          |
|            | represent document frequencies. If a value higher than 1 is   |
|            | specified, it cannot be fractional. Defaults to               |
|            | 0.01f. This can be used to exclude high frequency terms from  |
|            | being spellchecked. High frequency terms are usually spelled  |
|            | correctly; excluding them also improves the spellcheck        |
|            | performance. The shard level document frequencies are used    |
|            | for this option.                                              |
+------------+---------------------------------------------------------------+
| ``pre_filt | a filter (analyzer) that is applied to each of the tokens     |
| er``       | passed to this candidate generator. This filter is applied to |
|            | the original token before candidates are generated.           |
+------------+---------------------------------------------------------------+
| ``post_fil | a filter (analyzer) that is applied to each of the generated  |
| ter``      | tokens before they are passed to the actual phrase scorer.    |
+------------+---------------------------------------------------------------+

The following example shows a ``phrase`` suggest call with two
generators, the first one is using a field containing ordinary indexed
terms and the second one uses a field that uses terms indexed with a
``reverse`` filter (tokens are indexed in reverse order). This is used to
overcome the limitation of the direct generators to require a constant
prefix to provide high-performance suggestions. The ``pre_filter`` and
``post_filter`` options accept ordinary analyzer names.

.. code:: js

    curl -s -XPOST 'localhost:9200/_search' -d '{
     "suggest" : {
        "text" : "Xor the Got-Jewel",
        "simple_phrase" : {
          "phrase" : {
            "analyzer" : "body",
            "field" : "bigram",
            "size" : 4,
            "real_word_error_likelihood" : 0.95,
            "confidence" : 2.0,
            "gram_size" : 2,
            "direct_generator" : [ {
              "field" : "body",
              "suggest_mode" : "always",
              "min_word_length" : 1
            }, {
              "field" : "reverse",
              "suggest_mode" : "always",
              "min_word_length" : 1,
              "pre_filter" : "reverse",
              "post_filter" : "reverse"
            } ]
          }
        }
      }
    }'

``pre_filter`` and ``post_filter`` can also be used to inject synonyms
after candidates are generated. For instance, for the query
``captain usq`` we might generate a candidate ``usa`` for the term
``usq``, which is a synonym for ``america``, which allows us to present
``captain america`` to the user if this phrase scores high enough.
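
A sketch of such a setup could look like the following. It assumes an
analyzer named ``synonym`` that expands ``usa`` to ``america``; defining
that analyzer in the index settings is not shown here:

.. code:: js

    curl -XPOST 'localhost:9200/_search' -d '{
      "suggest" : {
        "text" : "captain usq",
        "simple_phrase" : {
          "phrase" : {
            "field" : "body",
            "direct_generator" : [ {
              "field" : "body",
              "suggest_mode" : "always",
              "post_filter" : "synonym"
            } ]
          }
        }
      }
    }'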

Completion Suggester
--------------------

    **Note**

    In order to understand the format of suggestions, please read the ?
    page first.

The ``completion`` suggester is a so-called prefix suggester. It does
not do spell correction like the ``term`` or ``phrase`` suggesters but
allows basic ``auto-complete`` functionality.

Why another suggester? Why not prefix queries?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The first question that comes to mind when reading about a prefix
suggester is why you should use it at all if you already have prefix
queries. The answer is simple: prefix suggestions are fast.

The data structures are internally backed by Lucene’s
``AnalyzingSuggester``, which uses FSTs to execute suggestions. Usually
these data structures are costly to create, stored in-memory and need to
be rebuilt every now and then to reflect changes in your indexed
documents. The ``completion`` suggester circumvents this by storing the
FST as part of your index during index time. This allows for really fast
loads and executions.

Mapping
~~~~~~~

In order to use this feature, you have to specify a special mapping for
this field, which enables the special storage of the field.

.. code:: js

    curl -X PUT localhost:9200/music
    curl -X PUT localhost:9200/music/song/_mapping -d '{
      "song" : {
            "properties" : {
                "name" : { "type" : "string" },
                "suggest" : { "type" : "completion",
                              "index_analyzer" : "simple",
                              "search_analyzer" : "simple",
                              "payloads" : true
                }
            }
        }
    }'

Mapping supports the following parameters:

``index_analyzer``
    The index analyzer to use, defaults to ``simple``.

``search_analyzer``
    The search analyzer to use, defaults to ``simple``. In case you are
    wondering why we did not opt for the ``standard`` analyzer: We try
    to have easy to understand behaviour here, and if you index the
    field content ``At the Drive-in``, you will not get any suggestions
    for ``a``, nor for ``d`` (the first non stopword).

``payloads``
    Enables the storing of payloads, defaults to ``false``

``preserve_separators``
    Preserves the separators, defaults to ``true``. If disabled, you
    could find a field starting with ``Foo Fighters``, if you suggest
    for ``foof``.

``preserve_position_increments``
    Enables position increments, defaults to ``true``. If disabled and
    using stopwords analyzer, you could get a field starting with
    ``The Beatles``, if you suggest for ``b``. **Note**: You could also
    achieve this by indexing two inputs, ``Beatles`` and
    ``The Beatles``, no need to change a simple analyzer, if you are
    able to enrich your data.

``max_input_length``
    Limits the length of a single input, defaults to ``50`` UTF-16 code
    points. This limit is only used at index time to reduce the total
    number of characters per input string in order to prevent massive
    inputs from bloating the underlying data structure. Most use cases
    won’t be influenced by the default value since prefix completions
    seldom grow beyond prefixes longer than a handful of characters.
    (Old name "max\_input\_len" is deprecated)
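
As a sketch, the mapping from the ``music`` example above could set some
of these parameters explicitly (the values here are purely illustrative):

.. code:: js

    curl -X PUT localhost:9200/music/song/_mapping -d '{
      "song" : {
            "properties" : {
                "suggest" : { "type" : "completion",
                              "index_analyzer" : "simple",
                              "search_analyzer" : "simple",
                              "payloads" : true,
                              "preserve_separators" : false,
                              "max_input_length" : 25
                }
            }
        }
    }'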

Indexing
~~~~~~~~

.. code:: js

    curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
        "name" : "Nevermind",
        "suggest" : {
            "input": [ "Nevermind", "Nirvana" ],
            "output": "Nirvana - Nevermind",
            "payload" : { "artistId" : 2321 },
            "weight" : 34
        }
    }'

The following parameters are supported:

``input``
    The input to store. This can be an array of strings or just a
    string. This field is mandatory.

``output``
    The string to return, if a suggestion matches. This is very useful
    to normalize outputs (i.e. have them always in the format
    ``artist - songname``). This is optional. **Note**: The result is
    de-duplicated if several documents have the same output, i.e. only
    one is returned as part of the suggest result.

``payload``
    An arbitrary JSON object, which is simply returned in the suggest
    option. You could store data like the id of a document, in order to
    load it from elasticsearch without executing another search (which
    might not yield any results, if ``input`` and ``output`` differ
    strongly).

``weight``
    A positive integer or a string containing a positive integer, which
    defines a weight and allows you to rank your suggestions. This field
    is optional.

    **Note**

    Even though you will lose most of the features of the completion
    suggest, you can choose to use the following shorthand form. Keep in
    mind that you will not be able to use several inputs, an output,
    payloads or weights. This form does still work inside of multi
    fields.

.. code:: js

    {
      "suggest" : "Nirvana"
    }

    **Note**

    The suggest data structure might not reflect deletes on documents
    immediately. You may need to do an ? for that. You can call optimize
    with ``only_expunge_deletes=true`` to only cater for deletes, or
    alternatively call a ? operation.

Querying
~~~~~~~~

Suggesting works as usual, except that you have to specify the suggest
type as ``completion``.

.. code:: js

    curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
        "song-suggest" : {
            "text" : "n",
            "completion" : {
                "field" : "suggest"
            }
        }
    }'

    {
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "song-suggest" : [ {
        "text" : "n",
        "offset" : 0,
        "length" : 4,
        "options" : [ {
          "text" : "Nirvana - Nevermind",
          "score" : 34.0, "payload" : {"artistId":2321}
        } ]
      } ]
    }

As you can see, the payload is included in the response, if configured
appropriately. If you configured a weight for a suggestion, this weight
is used as ``score``. Also the ``text`` field uses the ``output`` of
your indexed suggestion, if configured, otherwise the matched part of
the ``input`` field.

The basic completion suggester query supports the following two
parameters:

``field``
    The name of the field on which to run the query (required).

``size``
    The number of suggestions to return (defaults to ``5``).

    **Note**

    The completion suggester considers all documents in the index. See ?
    for an explanation of how to query a subset of documents instead.

Fuzzy queries
~~~~~~~~~~~~~

The completion suggester also supports fuzzy queries - this means you
can actually have a typo in your search and still get results back.

.. code:: js

    curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
        "song-suggest" : {
            "text" : "n",
            "completion" : {
                "field" : "suggest",
                "fuzzy" : {
                    "fuzziness" : 2
                }
            }
        }
    }'

The fuzzy query can take specific fuzzy parameters. The following
parameters are supported:

+------------+---------------------------------------------------------------+
| ``fuzzines | The fuzziness factor, defaults to ``AUTO``. See ? for allowed |
| s``        | settings.                                                     |
+------------+---------------------------------------------------------------+
| ``transpos | Sets if transpositions should be counted as one or two        |
| itions``   | changes, defaults to ``true``                                 |
+------------+---------------------------------------------------------------+
| ``min_leng | Minimum length of the input before fuzzy suggestions are      |
| th``       | returned, defaults to ``3``                                   |
+------------+---------------------------------------------------------------+
| ``prefix_l | Minimum length of the input, which is not checked for fuzzy   |
| ength``    | alternatives, defaults to ``1``                               |
+------------+---------------------------------------------------------------+
| ``unicode_ | Sets all measurements (like edit distance, transpositions     |
| aware``    | and lengths) in Unicode code points (actual letters) instead  |
|            | of bytes.                                                     |
+------------+---------------------------------------------------------------+

    **Note**

    If you want to stick with the default values, but still use fuzzy,
    you can either use ``fuzzy: {}`` or ``fuzzy: true``.
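
For instance, the earlier query could be written with the shorthand form
(a minimal sketch against the same ``music`` index):

.. code:: js

    curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
        "song-suggest" : {
            "text" : "n",
            "completion" : {
                "field" : "suggest",
                "fuzzy" : true
            }
        }
    }'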

Context Suggester
-----------------

The context suggester is an extension to the suggest API of
Elasticsearch. The suggester system provides a very fast way of
searching documents by handling these entirely in memory. But this
special treatment does not allow the handling of traditional queries and
filters, because those would have a notable impact on performance. So
the context extension is designed to take so-called context information
into account to provide a more accurate way of searching within the
suggester system. Instead of using the traditional query and filter
system, a predefined ``context`` is configured to limit suggestions
to a particular subset of suggestions. Such a context is defined by a
set of context mappings which can either be a simple **category** or a
**geo location**. The information used by the context suggester is
configured in the type mapping with the ``context`` parameter, which
lists all of the contexts that need to be specified in each document and
in each suggestion request. For instance:

.. code:: js

    PUT services/_mapping/service
    {
        "service": {
            "properties": {
                "name": {
                    "type" : "string"
                },
                "tag": {
                    "type" : "string"
                },
                "suggest_field": {
                    "type": "completion",
                    "context": {
                        "color": { 
                            "type": "category",
                            "path": "color_field",
                            "default": ["red", "green", "blue"]
                        },
                        "location": { 
                            "type": "geo",
                            "precision": "5m",
                            "neighbors": true,
                            "default": "u33"
                        }
                    }
                }
            }
        }
    }

See ?

See ?

However contexts are specified (as type ``category`` or ``geo``, both of
which are discussed below), each context value generates a new subset of
documents which can be queried by the completion suggester. Both types
accept a ``default`` parameter which provides a default value to use if
the corresponding context value is absent.

The basic structure of this element is that each field forms a new
context and the field name is used to reference this context information
later on during indexing or querying. All context mappings have the
``default`` and the ``type`` option in common. The value of the
``default`` field is used whenever no specific value is provided for the
given context. Note that a context is defined by at least one value.
The ``type`` option defines the kind of information held by this
context. These types are explained further in the following sections.

**Category Context**

The ``category`` context allows you to specify one or more categories in
the document at index time. The document will be assigned to each named
category, which can then be queried later. The category type also allows
you to specify a field to extract the categories from. The ``path``
parameter is used to specify this field of the documents that should be
used. If the referenced field contains multiple values, all these values
will be used as alternative categories.

**Category Mapping**

The mapping for a category is simply defined by its ``default`` values.
These can either be defined as a list of **default** categories:

.. code:: js

    "context": {
        "color": {
            "type": "category",
            "default": ["red", "orange"]
        }
    }

or as a single value

.. code:: js

    "context": {
        "color": {
            "type": "category",
            "default": "red"
        }
    }

or as reference to another field within the documents indexed:

.. code:: js

    "context": {
        "color": {
            "type": "category",
            "default": "red"
            "path": "color_field"
        }
    }

In this case the **default** categories will only be used if the given
field does not exist within the document. In the example above the
categories are read from a field named ``color_field``. If this field
does not exist, a category **red** is assumed for the context
**color**.

**Indexing category contexts**

Within a document the category is specified either as an ``array`` of
values, a single value or ``null``. A list of values is interpreted as
alternative categories. So a document belongs to all the categories
defined. If the category is ``null`` or remains unset, the categories
will be retrieved from the document’s field addressed by the ``path``
parameter. If this value is not set or the field is missing, the default
values of the mapping will be assigned to the context.

.. code:: js

    PUT services/service/1
    {
        "name": "knapsack",
        "suggest_field": {
            "input": ["knacksack", "backpack", "daypack"],
            "context": {
                "color": ["red", "yellow"]
            }
        }
    }
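
If instead the context is omitted and the ``path`` parameter from the
mapping is relied on, the categories are taken from the referenced
field. A hedged sketch, reusing the ``color_field`` from the mapping
above (document id and values are illustrative):

.. code:: js

    PUT services/service/2
    {
        "name": "rucksack",
        "color_field": ["red", "blue"],
        "suggest_field": {
            "input": ["rucksack", "backpack"]
        }
    }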

**Category Query**

A query within a category works similarly to the configuration. If the
value is ``null``, the mapping’s default categories will be used.
Otherwise the suggestion takes place for all documents that have at
least one category in common with the query.

.. code:: js

    POST services/_suggest?pretty
    {
        "suggest" : {
            "text" : "m",
            "completion" : {
                "field" : "suggest_field",
                "size": 10,
                "context": {
                    "color": "red"
                }
            }
        }
    }

**Geo location Context**

A ``geo`` context allows you to limit results to those that lie within a
certain distance of a specified geolocation. At index time, a lat/long
geo point is converted into a geohash of a certain precision, which
provides the context.

**Geo location Mapping**

The mapping for a geo context accepts four settings, of which only
``precision`` is required:

+------------+---------------------------------------------------------------+
| ``precisio | This defines the precision of the geohash and can be          |
| n``        | specified as ``5m``, ``10km``, or as a raw geohash precision: |
|            | ``1``..\ ``12``. It’s also possible to set up multiple        |
|            | precisions by defining a list of precisions:                  |
|            | ``["5m", "10km"]``                                            |
+------------+---------------------------------------------------------------+
| ``neighbor | Geohashes are rectangles, so a geolocation, which in reality  |
| s``        | is only 1 metre away from the specified point, may fall into  |
|            | the neighbouring rectangle. Set ``neighbours`` to ``true`` to |
|            | the neighbouring rectangle. Set ``neighbors`` to ``true`` to  |
|            | is **on**)                                                    |
+------------+---------------------------------------------------------------+
| ``path``   | Optionally specify a field to use to look up the geopoint.    |
+------------+---------------------------------------------------------------+
| ``default` | The geopoint to use if no geopoint has been specified.        |
| `          |                                                               |
+------------+---------------------------------------------------------------+

Since all locations of this mapping are translated into geohashes, each
location matches a geohash cell. So some results that lie within the
specified range but not in the same cell as the query location will not
match. To avoid this the ``neighbors`` option allows matching of cells
that border the region of the document’s location. This option
is turned on by default. If a document or a query doesn’t define a
location, a value to use instead can be defined by the ``default`` option.
The value of this option supports all the ways a ``geo_point`` can be
defined. The ``path`` refers to another field within the document to
retrieve the location. If this field contains multiple values, the
document will be linked to all these locations.

.. code:: js

    "context": {
        "location": {
            "type": "geo",
            "precision": ["1km", "5m"],
            "neighbors": true,
            "path": "pin",
            "default": {
                "lat": 0.0,
                "lon": 0.0
            }
        }
    }

**Geo location Config**

Within a document a geo location retrieved from the mapping definition
can be overridden by another location. In this case the context mapped
to a geo location supports all variants of defining a ``geo_point``.

.. code:: js

    PUT services/service/1
    {
        "name": "some hotel 1",
        "suggest_field": {
            "input": ["my hotel", "this hotel"],
            "context": {
                "location": {
                        "lat": 0,
                        "lon": 0
                }
            }
        }
    }

**Geo location Query**

As in the configuration, when querying with a geo location context, the
geo location query supports all representations of a ``geo_point`` to
define the location. In this simple case all precision values defined in
the mapping will be applied to the given location.

.. code:: js

    POST services/_suggest
    {
        "suggest" : {
            "text" : "m",
            "completion" : {
                "field" : "suggest_field",
                "size": 10,
                "context": {
                    "location": {
                        "lat": 0,
                        "lon": 0
                    }
                }
            }
        }
    }

But it is also possible to use only a subset of the precisions set in the
mapping, by using the ``precision`` parameter. As in the mapping, this
parameter can be set to a single precision value or a list of
these.

.. code:: js

    POST services/_suggest
    {
        "suggest" : {
            "text" : "m",
            "completion" : {
                "field" : "suggest_field",
                "size": 10,
                "context": {
                    "location": {
                        "value": {
                            "lat": 0,
                            "lon": 0
                        },
                        "precision": "1km"
                    }
                }
            }
        }
    }

A special form of the query is defined by an extension of the object
representation of the ``geo_point``. Using this representation allows
setting the ``precision`` parameter within the location itself:

.. code:: js

    POST services/_suggest
    {
        "suggest" : {
            "text" : "m",
            "completion" : {
                "field" : "suggest_field",
                "size": 10,
                "context": {
                    "location": {
                            "lat": 0,
                            "lon": 0,
                            "precision": "1km"
                    }
                }
            }
        }
    }

Multi Search API
================

The multi search API allows you to execute several search requests within
the same API call. The endpoint for it is ``_msearch``.

The format of the request is similar to the bulk API format, and the
structure is as follows (the structure is specifically optimized to
reduce parsing if a specific search ends up redirected to another node):

.. code:: js

    header\n
    body\n
    header\n
    body\n

The header part includes which index / indices to search on, optional
(mapping) types to search on, the ``search_type``, ``preference``, and
``routing``. The body includes the typical search body request
(including the ``query``, ``aggregations``, ``from``, ``size``, and so
on). Here is an example:

.. code:: js

    $ cat requests
    {"index" : "test"}
    {"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
    {"index" : "test", "search_type" : "count"}
    {"query" : {"match_all" : {}}}
    {}
    {"query" : {"match_all" : {}}}

    {"query" : {"match_all" : {}}}
    {"search_type" : "count"}
    {"query" : {"match_all" : {}}}

    $ curl -XGET localhost:9200/_msearch --data-binary @requests; echo

Note that the above includes an example of an empty header (it can also
be without any content), which is supported as well.

The response returns a ``responses`` array, which includes the search
response for each search request matching its order in the original
multi search request. If there was a complete failure for that specific
search request, an object with an ``error`` message will be returned in
place of the actual search response.
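
As a rough sketch (contents abbreviated with ``...``), the response
therefore has the following shape, with one entry per request in the
original order:

.. code:: js

    {
      "responses" : [
        {
          "took" : 2,
          "hits" : { ... }
        },
        {
          "error" : "..."
        }
      ]
    }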

The endpoint also allows searching against an index/indices and
type/types in the URI itself, in which case they will be used as the
default unless explicitly defined otherwise in the header. For example:

.. code:: js

    $ cat requests
    {}
    {"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
    {}
    {"query" : {"match_all" : {}}}
    {"index" : "test2"}
    {"query" : {"match_all" : {}}}

    $ curl -XGET localhost:9200/test/_msearch --data-binary @requests; echo

The above will execute the search against the ``test`` index for all the
requests that don’t define an index, and the last one will be executed
against the ``test2`` index.

The ``search_type`` can be set in a similar manner to globally apply to
all search requests.

**Security**

See ?

Count API
=========

The count API allows you to easily execute a query and get the number of
matches for that query. It can be executed across one or more indices
and across one or more types. The query can either be provided using a
simple query string as a parameter, or using the `Query
DSL <#query-dsl>`__ defined within the request body. Here is an example:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/tweet/_count?q=user:kimchy'

    $ curl -XGET 'http://localhost:9200/twitter/tweet/_count' -d '
    {
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }'

    **Note**

    The query being sent in the body must be nested in a ``query`` key,
    same as how the `search api <#search-search>`__ works.

Both examples above do the same thing, which is to count the number of
tweets from the twitter index for a certain user. The result is:

.. code:: js

    {
        "count" : 1,
        "_shards" : {
            "total" : 5,
            "successful" : 5,
            "failed" : 0
        }
    }

The query is optional, and when not provided, it will use ``match_all``
to count all the docs.

**Multi index, Multi type**

The count API can be applied to `multiple types in multiple
indices <#search-multi-index-type>`__.

**Request Parameters**

When executing count using the query parameter ``q``, the query passed
is a query string using Lucene query parser. There are additional
parameters that can be passed:

+--------------------------------------+--------------------------------------+
| Name                                 | Description                          |
+======================================+======================================+
| df                                   | The default field to use when no     |
|                                      | field prefix is defined within the   |
|                                      | query.                               |
+--------------------------------------+--------------------------------------+
| analyzer                             | The analyzer name to be used when    |
|                                      | analyzing the query string.          |
+--------------------------------------+--------------------------------------+
| default\_operator                    | The default operator to be used, can |
|                                      | be ``AND`` or ``OR``. Defaults to    |
|                                      | ``OR``.                              |
+--------------------------------------+--------------------------------------+
| terminate\_after                     | The maximum count for each shard,    |
|                                      | upon reaching which the query        |
|                                      | execution will terminate early. If   |
|                                      | set, the response will have a        |
|                                      | boolean field ``terminated_early``   |
|                                      | to indicate whether the query        |
|                                      | execution has actually               |
|                                      | terminated\_early. Defaults to no    |
|                                      | terminate\_after.                    |
+--------------------------------------+--------------------------------------+
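
For example, ``terminate_after`` can be combined with the ``q``
parameter; the sketch below (index and query are illustrative) stops
counting on each shard after 100 matching documents:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/tweet/_count?q=user:kimchy&terminate_after=100'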

**Request Body**

The count can use the `Query DSL <#query-dsl>`__ within its body in
order to express the query that should be executed. The body content can
also be passed as a REST parameter named ``source``.

Both HTTP GET and HTTP POST can be used to execute count with body.
Since not all clients support GET with body, POST is allowed as well.

**Distributed**

The count operation is broadcast across all shards. For each shard id
group, a replica is chosen and the count is executed against it. This
means that replicas increase the scalability of count.

**Routing**

The routing value (a comma separated list of the routing values) can be
specified to control which shards the count request will be executed on.

Search Exists API
=================

The exists API allows you to easily determine whether any matching
documents exist for a provided query. It can be executed across one or more
indices and across one or more types. The query can either be provided
using a simple query string as a parameter, or using the `Query
DSL <#query-dsl>`__ defined within the request body. Here is an example:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/tweet/_search/exists?q=user:kimchy'

    $ curl -XGET 'http://localhost:9200/twitter/tweet/_search/exists' -d '
    {
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }'

    **Note**

    The query being sent in the body must be nested in a ``query`` key,
    same as how the `search api <#search-search>`__ works.

Both the examples above do the same thing, which is to determine the
existence of tweets from the twitter index for a certain user. The
response body will be of the following format:

.. code:: js

    {
        "exists" : true
    }

**Multi index, Multi type**

The exists API can be applied to `multiple types in multiple
indices <#search-multi-index-type>`__.

**Request Parameters**

When executing exists using the query parameter ``q``, the query passed
is a query string using Lucene query parser. There are additional
parameters that can be passed:

+--------------------------------------+--------------------------------------+
| Name                                 | Description                          |
+======================================+======================================+
| df                                   | The default field to use when no     |
|                                      | field prefix is defined within the   |
|                                      | query.                               |
+--------------------------------------+--------------------------------------+
| analyzer                             | The analyzer name to be used when    |
|                                      | analyzing the query string.          |
+--------------------------------------+--------------------------------------+
| default\_operator                    | The default operator to be used, can |
|                                      | be ``AND`` or ``OR``. Defaults to    |
|                                      | ``OR``.                              |
+--------------------------------------+--------------------------------------+

**Request Body**

The exists API can use the `Query DSL <#query-dsl>`__ within its body in
order to express the query that should be executed. The body content can
also be passed as a REST parameter named ``source``.

Both HTTP GET and HTTP POST can be used to execute exists with a request
body. Since not all clients support GET with a body, POST is allowed as
well.

**Distributed**

The exists operation is broadcast across all shards. For each shard id
group, a replica is chosen and the request is executed against it. This
means that replicas increase the scalability of exists. The exists
operation also terminates shard requests early, as soon as the first
shard reports a matching document.

**Routing**

The routing value (a comma separated list of the routing values) can be
specified to control which shards the exists request will be executed
on.

Validate API
============

The validate API allows a user to validate a potentially expensive query
without executing it. The following example first indexes a document
that the queries below are validated against:

.. code:: js

    curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'

When the query is valid, the response contains ``valid:true``:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/_validate/query?q=user:foo'
    {"valid":true,"_shards":{"total":1,"successful":1,"failed":0}}

Or, with a request body:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/_validate/query' -d '{
      "query" : {
        "filtered" : {
          "query" : {
            "query_string" : {
              "query" : "*:*"
            }
          },
          "filter" : {
            "term" : { "user" : "kimchy" }
          }
        }
      }
    }'
    {"valid":true,"_shards":{"total":1,"successful":1,"failed":0}}

**Note**

The query being sent in the body must be nested in a ``query`` key,
the same as how the `search api <#search-search>`__ works.

If the query is invalid, ``valid`` will be ``false``. Here the query is
invalid because Elasticsearch knows the post\_date field should be a
date due to dynamic mapping, and *foo* does not correctly parse into a
date:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/_validate/query?q=post_date:foo'
    {"valid":false,"_shards":{"total":1,"successful":1,"failed":0}}

An ``explain`` parameter can be specified to get more detailed
information about why a query failed:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/tweet/_validate/query?q=post_date:foo&pretty=true&explain=true'
    {
      "valid" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "explanations" : [ {
        "index" : "twitter",
        "valid" : false,
        "error" : "org.elasticsearch.index.query.QueryParsingException: [twitter] Failed to parse; org.elasticsearch.ElasticsearchParseException: failed to parse date field [foo], tried both date format [dateOptionalTime], and timestamp number; java.lang.IllegalArgumentException: Invalid format: \"foo\""
      } ]
    }

Explain API
===========

The explain api computes a score explanation for a query and a specific
document. This can give useful feedback on whether or not a document
matches a specific query.

The ``index`` and ``type`` parameters expect a single index and a single
type respectively.

**Usage**

Full query example:

.. code:: js

    curl -XGET 'localhost:9200/twitter/tweet/1/_explain' -d '{
          "query" : {
            "term" : { "message" : "search" }
          }
    }'

This will yield the following result:

.. code:: js

    {
      "matches" : true,
      "explanation" : {
        "value" : 0.15342641,
        "description" : "fieldWeight(message:search in 0), product of:",
        "details" : [ {
          "value" : 1.0,
          "description" : "tf(termFreq(message:search)=1)"
        }, {
          "value" : 0.30685282,
          "description" : "idf(docFreq=1, maxDocs=1)"
        }, {
          "value" : 0.5,
          "description" : "fieldNorm(field=message, doc=0)"
        } ]
      }
    }

There is also a simpler way of specifying the query via the ``q``
parameter. The specified ``q`` parameter value is then parsed as if the
``query_string`` query was used. Example usage of the ``q`` parameter in
the explain api:

.. code:: js

    curl -XGET 'localhost:9200/twitter/tweet/1/_explain?q=message:search'

This will yield the same result as the previous request.

**All parameters:**

+------------------------------+---------------------------------------------------------------+
| ``_source``                  | Set to ``true`` to retrieve the ``_source`` of the document   |
|                              | explained. You can also retrieve part of the document by      |
|                              | using ``_source_include`` & ``_source_exclude`` (see `Get     |
|                              | API <#get-source-filtering>`__ for more details)              |
+------------------------------+---------------------------------------------------------------+
| ``fields``                   | Allows to control which stored fields to return as part of    |
|                              | the document explained.                                       |
+------------------------------+---------------------------------------------------------------+
| ``routing``                  | Controls the routing in the case the routing was used during  |
|                              | indexing.                                                     |
+------------------------------+---------------------------------------------------------------+
| ``parent``                   | Same effect as setting the routing parameter.                 |
+------------------------------+---------------------------------------------------------------+
| ``preference``               | Controls on which shard the explain is executed.              |
+------------------------------+---------------------------------------------------------------+
| ``source``                   | Allows the data of the request to be put in the query string  |
|                              | of the url.                                                   |
+------------------------------+---------------------------------------------------------------+
| ``q``                        | The query string (maps to the query\_string query).           |
+------------------------------+---------------------------------------------------------------+
| ``df``                       | The default field to use when no field prefix is defined      |
|                              | within the query. Defaults to \_all field.                    |
+------------------------------+---------------------------------------------------------------+
| ``analyzer``                 | The analyzer name to be used when analyzing the query string. |
|                              | Defaults to the analyzer of the \_all field.                  |
+------------------------------+---------------------------------------------------------------+
| ``analyze_wildcard``         | Should wildcard and prefix queries be analyzed or not.        |
|                              | Defaults to false.                                            |
+------------------------------+---------------------------------------------------------------+
| ``lowercase_expanded_terms`` | Should terms be automatically lowercased or not. Defaults to  |
|                              | true.                                                         |
+------------------------------+---------------------------------------------------------------+
| ``lenient``                  | If set to true will cause format based failures (like         |
|                              | providing text to a numeric field) to be ignored. Defaults to |
|                              | false.                                                        |
+------------------------------+---------------------------------------------------------------+
| ``default_operator``         | The default operator to be used, can be AND or OR. Defaults   |
|                              | to OR.                                                        |
+------------------------------+---------------------------------------------------------------+

Percolator
==========

Traditionally you design documents based on your data and store them
into an index and then define queries via the search api in order to
retrieve these documents. The percolator works in the opposite
direction, first you store queries into an index and then via the
percolate api you define documents in order to retrieve these queries.

The reason that queries can be stored comes from the fact that in
Elasticsearch both documents and queries are defined in JSON. This
allows you to embed queries into documents via the index api.
Elasticsearch can extract the query from a document and make it
available to the percolate api. Since documents are also defined as
json, you can define a document in a request to the percolate api.

The percolator and most of its features work in realtime, so once a
percolate query is indexed it can immediately be used in the percolate
api.

    **Important**

    Fields referred to in a percolator query must **already** exist in
    the mapping associated with the index used for percolation. There
    are two ways to make sure that a field mapping exists:

    -  Add or update a mapping via the `create
       index <#indices-create-index>`__ or `put
       mapping <#indices-put-mapping>`__ apis.

    -  Percolate a document before registering a query. Percolating a
       document can add field mappings dynamically, in the same way as
       happens when indexing a document.

**Sample usage**

Create an index with a mapping for the field ``message``:

.. code:: js

    curl -XPUT 'localhost:9200/my-index' -d '{
      "mappings": {
        "my-type": {
          "properties": {
            "message": {
              "type": "string"
            }
          }
        }
      }
    }'

Register a query in the percolator:

.. code:: js

    curl -XPUT 'localhost:9200/my-index/.percolator/1' -d '{
        "query" : {
            "match" : {
                "message" : "bonsai tree"
            }
        }
    }'

Match a document to the registered percolator queries:

.. code:: js

    curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{
        "doc" : {
            "message" : "A new bonsai tree in the office"
        }
    }'

The above request will yield the following response:

.. code:: js

    {
        "took" : 19,
        "_shards" : {
            "total" : 5,
            "successful" : 5,
            "failed" : 0
        },
        "total" : 1,
        "matches" : [ 
            {
              "_index" : "my-index",
              "_id" : "1"
            }
        ]
    }

The percolate query with id ``1`` matches our document.

**Indexing percolator queries**

Percolate queries are stored as documents in a specific format and in an
arbitrary index under a reserved type with the name ``.percolator``. The
query itself is placed as is in a json object under the top level field
``query``.

.. code:: js

    {
        "query" : {
            "match" : {
                "field" : "value"
            }
        }
    }

Since this is just an ordinary document, any field can be added to this
document. This can be useful later on to only percolate documents by
specific queries.

.. code:: js

    {
        "query" : {
            "match" : {
                "field" : "value"
            }
        },
        "priority" : "high"
    }

On top of this, a mapping type can also be associated with this query.
This allows you to control how certain queries, such as range queries,
shape filters, and other queries and filters that rely on mapping
settings, get constructed. This is important since the percolate queries
are indexed into the ``.percolator`` type, and the queries / filters
that rely on mapping settings would otherwise yield unexpected
behaviour. Note that by default field names do get resolved in a smart
manner, but in certain cases with multiple types this can lead to
unexpected behaviour, so being explicit about it will help.

.. code:: js

    {
        "query" : {
            "range" : {
                "created_at" : {
                    "gte" : "2010-01-01T00:00:00",
                    "lte" : "2011-01-01T00:00:00"
                }
            }
        },
        "type" : "tweet",
        "priority" : "high"
    }

In the above example the range query is parsed into a Lucene numeric
range query, based on the settings for the field ``created_at`` in the
type ``tweet``.

Just as with any other type, the ``.percolator`` type has a mapping,
which you can configure via the mappings apis. The default percolate
mapping doesn’t index the query field, it only stores it. By default the
following mapping is active:

.. code:: js

    {
        ".percolator" : {
            "properties" : {
                "query" : {
                    "type" : "object",
                    "enabled" : false
                }
            }
        }
    }

If needed this mapping can be modified with the update mapping api.

In order to un-register a percolate query the delete api can be used. So
if the previously added query needs to be deleted, the following delete
request needs to be executed:

.. code:: js

    curl -XDELETE localhost:9200/my-index/.percolator/1

**Percolate api**

The percolate api executes in a distributed manner, meaning it executes
on all shards an index points to.

-  ``index`` - The index that contains the ``.percolator`` type. This
   can also be an alias.

-  ``type`` - The type of the document to be percolated. The mapping of
   that type is used to parse the document.

-  ``doc`` - The actual document to percolate. Unlike the other two
   options this needs to be specified in the request body. Note this
   isn’t required when percolating an existing document.

.. code:: js

    curl -XGET 'localhost:9200/twitter/tweet/_percolate' -d '{
        "doc" : {
            "created_at" : "2010-10-10T00:00:00",
            "message" : "some text"
        }
    }'

-  ``routing`` - In the case the percolate queries are partitioned by a
   custom routing value, that routing option makes sure that the
   percolate request only gets executed on the shard where the routing
   value is partitioned to. This means that the percolate request only
   gets executed on one shard instead of all shards. Multiple values can
   be specified as a comma separated string, in which case the request
   can be executed on more than one shard.

-  ``preference`` - Controls which shard replicas are preferred to
   execute the request on. Works the same as in the search api.

-  ``ignore_unavailable`` - Controls if missing concrete indices should
   silently be ignored. Works the same as in the search api.

-  ``percolate_format`` - If ``ids`` is specified then the matches array
   in the percolate response will contain a string array of the matching
   ids instead of an array of objects. This can be useful to reduce the
   amount of data being sent back to the client. Obviously if there are
   two percolator queries with the same id from different indices there
   is no way to find out which percolator query belongs to which index.
   Any other value for ``percolate_format`` will be ignored.

-  ``filter`` - Reduces the number of queries to execute during
   percolating. Only the percolator queries that match the filter
   will be included in the percolate execution. The filter option works
   in near realtime, so a refresh needs to have occurred for the filter
   to include the latest percolate queries.

-  ``query`` - Same as the ``filter`` option, but also the score is
   computed. The computed scores can then be used by the
   ``track_scores`` and ``sort`` option.

-  ``size`` - Defines the maximum number of matches (percolate queries)
   to be returned. Defaults to unlimited.

-  ``track_scores`` - Whether the ``_score`` is included for each match.
   The ``_score`` is based on the query and represents how the query
   matched the **percolate query’s metadata**, **not** how the document
   (that is being percolated) matched the query. The ``query`` option is
   required for this option. Defaults to ``false``.

-  ``sort`` - Define a sort specification like in the search api.
   Currently only sorting on ``_score`` in reverse (default relevancy)
   is supported. Other sort fields will throw an exception. The ``size``
   and ``query`` options are required for this setting. Like
   ``track_scores``, the score is based on the query and represents how
   the query matched the percolate query’s metadata and **not** how
   the document being percolated matched the query.

-  ``aggs`` - Allows aggregation definitions to be included. The
   aggregations are based on the matching percolator queries, look at
   the aggregation documentation on how to define aggregations.

-  ``highlight`` - Allows highlight definitions to be included. The
   document being percolated is highlighted for each matching query.
   This allows you to see how each match is highlighting the document
   being percolated. See the highlighting documentation on how to define
   highlights. The ``size`` option is required for highlighting; the
   performance of highlighting in the percolate api depends on how many
   matches are being highlighted.

**Dedicated percolator index**

Percolate queries can be added to any index. Instead of adding percolate
queries to the index the data resides in, these queries can also be
added to a dedicated index. The advantage of this is that this
dedicated percolator index can have its own index settings (for example,
the number of primary and replica shards). If you choose to have a
dedicated percolate index, you need to make sure that the mappings from
the normal index are also available on the percolate index. Otherwise
percolate queries can be parsed incorrectly.
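
For example, a minimal sketch of a dedicated percolator index (the name
``my-percolate-queries`` is illustrative) that copies the ``message``
mapping from the earlier sample and then registers a query in it:

.. code:: js

    curl -XPUT 'localhost:9200/my-percolate-queries' -d '{
        "settings" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 2
        },
        "mappings" : {
            "my-type" : {
                "properties" : {
                    "message" : { "type" : "string" }
                }
            }
        }
    }'

    curl -XPUT 'localhost:9200/my-percolate-queries/.percolator/1' -d '{
        "query" : {
            "match" : { "message" : "bonsai tree" }
        }
    }'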

**Filtering Executed Queries**

Filtering allows you to reduce the number of queries executed: any
filter that the search api supports (except the ones mentioned in the
important notes) can also be used in the percolate api. The filter only
works on the metadata fields; the ``query`` field isn’t indexed by
default. Based on the query we indexed before, the following filter can
be defined:

.. code:: js

    curl -XGET localhost:9200/test/type1/_percolate -d '{
        "doc" : {
            "field" : "value"
        },
        "filter" : {
            "term" : {
                "priority" : "high"
            }
        }
    }'

**Percolator count api**

The count percolate api only keeps track of the number of matches and
doesn’t keep track of the actual matches. Example:

.. code:: js

    curl -XGET 'localhost:9200/my-index/my-type/_percolate/count' -d '{
       "doc" : {
           "message" : "some message"
       }
    }'

Response:

.. code:: js

    {
       ... // header
       "total" : 3
    }

**Percolating an existing document**

In order to percolate a newly indexed document, the percolate existing
document API can be used. Based on the response from an index request,
the ``_id`` and other meta information can be used to immediately
percolate the newly added document.

-  ``id`` - The id of the document to retrieve the source for.

-  ``percolate_index`` - The index containing the percolate queries.
   Defaults to the ``index`` defined in the url.

-  ``percolate_type`` - The percolate type (used for parsing the
   document). Default to ``type`` defined in the url.

-  ``routing`` - The routing value to use when retrieving the document
   to percolate.

-  ``preference`` - Which shard to prefer when retrieving the existing
   document.

-  ``percolate_routing`` - The routing value to use when percolating the
   existing document.

-  ``percolate_preference`` - Which shard to prefer when executing the
   percolate request.

-  ``version`` - Enables a version check. If the fetched document’s
   version isn’t equal to the specified version then the request fails
   with a version conflict and the percolation request is aborted.

Internally the percolate api will issue a get request to fetch the
``_source`` of the document to percolate. For this feature to work, the
``_source`` of the documents to be percolated needs to be stored.

**Example**

Index response:

.. code:: js

    {
        "_index" : "my-index",
        "_type" : "message",
        "_id" : "1",
        "_version" : 1,
        "created" : true
    }

Percolating an existing document:

.. code:: js

    curl -XGET 'localhost:9200/my-index/message/1/_percolate'

The response is the same as with the regular percolate api.

**Multi percolate api**

The multi percolate api allows to bundle multiple percolate requests
into a single request, similar to what the multi search api does to
search requests. The request body format is line based. Each percolate
request item takes two lines, the first line is the header and the
second line is the body.

The header can contain any parameter that normally would be set via the
request path or query string parameters. There are several percolate
actions, because there are multiple types of percolate requests.

-  ``percolate`` - Action for defining a regular percolate request.

-  ``count`` - Action for defining a count percolate request.

Depending on the percolate action different parameters can be specified.
For example the percolate and percolate existing document actions
support different parameters.

-  ``GET|POST /[index]/[type]/_mpercolate``

-  ``GET|POST /[index]/_mpercolate``

-  ``GET|POST /_mpercolate``

The ``index`` and ``type`` defined in the url path are the default index
and type.

**Example**

Request:

.. code:: js

    curl -XGET 'localhost:9200/twitter/tweet/_mpercolate' --data-binary @requests.txt; echo

The index ``twitter`` is the default index and the type ``tweet`` is the
default type; they will be used when a header doesn’t specify an index
or type.

requests.txt:

.. code:: js

    {"percolate" : {"index" : "twitter", "type" : "tweet"}}
    {"doc" : {"message" : "some text"}}
    {"percolate" : {"index" : "twitter", "type" : "tweet", "id" : "1"}}
    {}
    {"percolate" : {"index" : "users", "type" : "user", "id" : "3", "percolate_index" : "users_2012" }}
    {"size" : 10}
    {"count" : {"index" : "twitter", "type" : "tweet"}}
    {"doc" : {"message" : "some other text"}}
    {"count" : {"index" : "twitter", "type" : "tweet", "id" : "1"}}
    {}

For a percolate existing document item (headers with the ``id`` field),
the request body can be an empty json object. All the required options
are set in the header.

Response:

.. code:: js

    {
        "items" : [
            {
                "took" : 24,
                "_shards" : {
                    "total" : 5,
                    "successful" : 5,
                    "failed" : 0,
                },
                "total" : 3,
                "matches" : ["1", "2", "3"]
            },
            {
                "took" : 12,
                "_shards" : {
                    "total" : 5,
                    "successful" : 5,
                    "failed" : 0,
                },
                "total" : 3,
                "matches" : ["4", "5", "6"]
            },
            {
                "error" : "[user][3]document missing"
            },
            {
                "took" : 12,
                "_shards" : {
                    "total" : 5,
                    "successful" : 5,
                    "failed" : 0,
                },
                "total" : 3
            },
            {
                "took" : 14,
                "_shards" : {
                    "total" : 5,
                    "successful" : 5,
                    "failed" : 0,
                },
                "total" : 3
            }
        ]
    }

Each item represents a percolate response; the order of the items maps
to the order in which the percolate requests were specified. In case a
percolate request failed, the item response is substituted with an error
message.

**How it works under the hood**

When a document that contains a query is indexed into the
``.percolator`` type of an index, the query part of the document gets
parsed into a Lucene query and is kept in memory until that percolator
document is removed or the index containing the ``.percolator`` type is
removed. So all the active percolator queries are kept in memory.

At percolate time the document specified in the request gets parsed into
a Lucene document and is stored in an in-memory Lucene index. This
in-memory index can just hold this one document and it is optimized for
that. Then all the queries that are registered to the index that the
percolate request is targeted at are executed on this single-document
in-memory index. This happens on each shard the percolate request needs
to execute on.

By using ``routing``, ``filter`` or ``query`` features the amount of
queries that need to be executed can be reduced and thus the time the
percolate api needs to run can be decreased.

**Important notes**

Because the percolator API processes one document at a time, it doesn’t
support queries and filters that run against child documents such as
``has_child``, ``has_parent`` and ``top_children``.

The ``wildcard`` and ``regexp`` queries natively use a lot of memory and
because the percolator keeps the queries in memory this can easily take
up the available memory in the heap space. If possible try to use a
``prefix`` query or ngramming to achieve the same result (with way less
memory being used).

The delete-by-query api doesn’t work to unregister a query; it only
deletes the percolate documents from disk. In order to update the
registered queries in memory, the index needs to be closed and opened
again.
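
For example, assuming the queries live in ``my-index``, a close followed
by an open reloads them:

.. code:: js

    curl -XPOST 'localhost:9200/my-index/_close'

    curl -XPOST 'localhost:9200/my-index/_open'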

More Like This API
==================

The more like this (mlt) API allows you to get documents that are "like"
a specified document. Here is an example:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/tweet/1/_mlt?mlt_fields=tag,content&min_doc_freq=1'

The API simply results in executing a search request with
`moreLikeThis <#query-dsl-mlt-query>`__ query (http parameters match the
parameters to the ``more_like_this`` query). This means that the body of
the request can optionally include all the request body options in the
`search API <#search-search>`__ (aggs, from/to and so on). Internally,
the more like this API is equivalent to performing a boolean query of
``more_like_this_field`` queries, with one query per specified
``mlt_fields``.

Rest parameters relating to search are also allowed, including
``search_type``, ``search_indices``, ``search_types``,
``search_scroll``, ``search_size`` and ``search_from``.
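
For example, a sketch based on the earlier request that restricts the
generated query to two fields and pages the underlying search with the
REST parameters listed above:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/tweet/1/_mlt?mlt_fields=tag,content&min_doc_freq=1&search_size=5&search_from=0'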

When no ``mlt_fields`` are specified, all the fields of the document
will be used in the ``more_like_this`` query generated.

By default, the queried document is excluded from the response
(``include`` set to false).

Note: In order to use the ``mlt`` feature, a ``mlt_field`` needs to
either be ``stored``, store ``term_vector``, or ``source`` needs to be
enabled.

Benchmark
=========

    **Important**

    This feature is marked as experimental, and may be subject to change
    in the future. If you use this feature, please let us know your
    experience with it!

The benchmark API provides a standard mechanism for submitting queries
and measuring their performance relative to one another.

    **Important**

    To be eligible to run benchmarks nodes must be started with:
    ``--node.bench true``. This is just a way to mark certain nodes as
    "executors". Searches will still be distributed out to the cluster
    in the normal manner. This is primarily a defensive measure to
    prevent production nodes from being flooded with potentially many
    requests. Typically one would start a single node with this setting
    and submit benchmark requests to it.

.. code:: bash

    $ ./bin/elasticsearch --node.bench true

Benchmarking a search request is as simple as executing the following
command:

.. code:: js

    $ curl -XPUT 'localhost:9200/_bench/?pretty=true' -d '{
        "name": "my_benchmark",
        "competitors": [ {
            "name": "my_competitor",
            "requests": [ {
                "query": {
                    "match": { "_all": "a*" }
                }
            } ]
        } ]
    }'

Response:

.. code:: js

    {
      "status" : "complete",
      "competitors" : {
        "my_competitor" : {
          "summary" : {
            "nodes" : [ "localhost" ],
            "total_iterations" : 5,
            "completed_iterations" : 5,
            "total_queries" : 1000,
            "concurrency" : 5,
            "multiplier" : 100,
            "avg_warmup_time" : 43.0,
            "statistics" : {
              "min" : 1,
              "max" : 10,
              "mean" : 4.19,
              "qps" : 238.663,
              "std_dev" : 1.938,
              "millis_per_hit" : 1.064,
              "percentile_10" : 2,
              "percentile_25" : 3,
              "percentile_50" : 4,
              "percentile_75" : 5,
              "percentile_90" : 7,
              "percentile_99" : 10
            }
          }
        }
      }
    }

A *competitor* defines one or more search requests to execute along with
parameters that describe how the search(es) should be run. Multiple
competitors may be submitted as a group in which case they will execute
one after the other. This makes it easy to compare various competing
alternatives side-by-side.

There are several parameters which may be set at the competition level:

+------------------------------+---------------------------------------------------------------+
| ``name``                     | Unique name for the competition.                              |
+------------------------------+---------------------------------------------------------------+
| ``iterations``               | Number of times to run the competitors. Defaults to ``5``.    |
+------------------------------+---------------------------------------------------------------+
| ``concurrency``              | Within each iteration use this level of parallelism. Defaults |
|                              | to ``5``.                                                     |
+------------------------------+---------------------------------------------------------------+
| ``multiplier``               | Within each iteration run the query this many times. Defaults |
|                              | to ``1000``.                                                  |
+------------------------------+---------------------------------------------------------------+
| ``warmup``                   | Perform warmup of query. Defaults to ``true``.                |
+------------------------------+---------------------------------------------------------------+
| ``num_slowest``              | Record N slowest queries. Defaults to ``1``.                  |
+------------------------------+---------------------------------------------------------------+
| ``search_type``              | Type of search, e.g. "query\_then\_fetch",                    |
|                              | "dfs\_query\_then\_fetch", "count". Defaults to               |
|                              | ``query_then_fetch``.                                         |
+------------------------------+---------------------------------------------------------------+
| ``requests``                 | Query DSL describing search requests.                         |
+------------------------------+---------------------------------------------------------------+
| ``clear_caches``             | Whether caches should be cleared on each iteration, and if    |
|                              | so, how. Caches are not cleared by default.                   |
+------------------------------+---------------------------------------------------------------+
| ``indices``                  | Array of indices to search, e.g. ["my\_index\_1",             |
|                              | "my\_index\_2", "my\_index\_3"].                              |
+------------------------------+---------------------------------------------------------------+
| ``types``                    | Array of index types to search, e.g. ["my\_type\_1",          |
|                              | "my\_type\_2"].                                               |
+------------------------------+---------------------------------------------------------------+

Cache clearing parameters:

+------------------------------+---------------------------------------------------------------+
| ``clear_caches``             | Set to *false* to disable cache clearing completely.          |
+------------------------------+---------------------------------------------------------------+
| ``clear_caches.filter``      | Whether to clear the filter cache.                            |
+------------------------------+---------------------------------------------------------------+
| ``clear_caches.field_data``  | Whether to clear the field data cache.                        |
+------------------------------+---------------------------------------------------------------+
| ``clear_caches.id``          | Whether to clear the id cache.                                |
+------------------------------+---------------------------------------------------------------+
| ``clear_caches.recycler``    | Whether to clear the recycler cache.                          |
+------------------------------+---------------------------------------------------------------+
| ``clear_caches.fields``      | Array of fields to clear.                                     |
+------------------------------+---------------------------------------------------------------+
| ``clear_caches.filter_keys`` | Array of filter keys to clear.                                |
+------------------------------+---------------------------------------------------------------+

Global parameters:

+------------------------------+---------------------------------------------------------------+
| ``name``                     | Unique name for the benchmark.                                |
+------------------------------+---------------------------------------------------------------+
| ``num_executor_nodes``       | Number of cluster nodes from which to submit and time         |
|                              | benchmarks. Allows user to run a benchmark simultaneously on  |
|                              | one or more nodes and compare timings. Note that this does    |
|                              | not control how many nodes a search request will actually     |
|                              | execute on. Defaults to: 1.                                   |
+------------------------------+---------------------------------------------------------------+
| ``percentiles``              | Array of percentile values to report. Defaults to: [10, 25,   |
|                              | 50, 75, 90, 99].                                              |
+------------------------------+---------------------------------------------------------------+

Additionally, the following competition-level parameters may be set
globally: iterations, concurrency, multiplier, warmup, and clear\_caches.

Using these parameters it is possible to describe precisely how to
execute a benchmark under various conditions. In the following example
we run a filtered query against two different indices using two
different search types.

.. code:: js

    $ curl -XPUT 'localhost:9200/_bench/?pretty=true' -d '{
        "name": "my_benchmark",
        "num_executor_nodes": 1,
        "percentiles" : [ 25, 50, 75 ],
        "iterations": 5,
        "multiplier": 1000,
        "concurrency": 5,
        "num_slowest": 0,
        "warmup": true,
        "clear_caches": false,

        "requests": [ {
            "query" : {
                "filtered" : {
                    "query" : { "match" : { "_all" : "*" } },
                    "filter" : {
                        "and" : [ { "term" : { "title" : "Spain" } },
                                  { "term" : { "title" : "rain" } },
                                  { "term" : { "title" : "plain" } } ]
                    }
                }
            }
        } ],

        "competitors": [ {
            "name": "competitor_1",
            "search_type": "query_then_fetch",
            "indices": [ "my_index_1" ],
            "types": [ "my_type_1" ],
            "clear_caches" : {
                "filter" : true,
                "field_data" : true,
                "id" : true,
                "recycler" : true,
                "fields": ["title"]
            }
        }, {
            "name": "competitor_2",
            "search_type": "dfs_query_then_fetch",
            "indices": [ "my_index_2" ],
            "types": [ "my_type_2" ],
            "clear_caches" : {
                "filter" : true,
                "field_data" : true,
                "id" : true,
                "recycler" : true,
                "fields": ["title"]
            }
        } ]
    }'

Response:

.. code:: js

    {
      "status" : "complete",
      "competitors" : {
        "competitor_1" : {
          "summary" : {
            "nodes" : [ "localhost" ],
            "total_iterations" : 5,
            "completed_iterations" : 5,
            "total_queries" : 5000,
            "concurrency" : 5,
            "multiplier" : 1000,
            "avg_warmup_time" : 54.0,
            "statistics" : {
              "min" : 0,
              "max" : 3,
              "mean" : 0.533,
              "qps" : 1872.659,
              "std_dev" : 0.528,
              "millis_per_hit" : 0.0,
              "percentile_25" : 0.0,
              "percentile_50" : 1.0,
              "percentile_75" : 1.0
            }
          }
        },
        "competitor_2" : {
          "summary" : {
            "nodes" : [ "localhost" ],
            "total_iterations" : 5,
            "completed_iterations" : 5,
            "total_queries" : 5000,
            "concurrency" : 5,
            "multiplier" : 1000,
            "avg_warmup_time" : 4.0,
            "statistics" : {
              "min" : 0,
              "max" : 4,
              "mean" : 0.487,
              "qps" : 2049.180,
              "std_dev" : 0.545,
              "millis_per_hit" : 0.0,
              "percentile_25" : 0.0,
              "percentile_50" : 0.0,
              "percentile_75" : 1.0
            }
          }
        }
      }
    }

In some cases it may be desirable to view the progress of a long-running
benchmark and optionally terminate it early. To view all active
benchmarks use:

.. code:: js

    $ curl -XGET 'localhost:9200/_bench?pretty'

This would display run-time statistics in the same format as the sample
output above.

To abort a long-running benchmark use the *abort* endpoint:

.. code:: js

    $ curl -XPOST 'localhost:9200/_bench/abort/my_benchmark?pretty'

Response:

.. code:: js

    {
        "aborted_benchmarks" : [ {
            "node" : "localhost",
            "benchmark_name" : "my_benchmark",
            "aborted" : true
        } ]
    }

Indices APIs
============

The indices APIs are used to manage individual indices, index settings,
aliases, mappings, index templates and warmers.

**Index management:**

-  Create Index

-  Delete Index

-  Get Index

-  Indices Exists

-  Open / Close Index API

**Mapping management:**

-  Put Mapping

-  Get Mapping

-  Get Field Mapping

-  Types Exists

-  Delete Mapping

**Alias management:**

-  Index Aliases

**Index settings:**

-  Update Indices Settings

-  Get Settings

-  Analyze

-  Index Templates

-  Warmers

**Monitoring:**

-  Indices Stats

-  Indices Segments

-  Indices Recovery

**Status management:**

-  Clear Cache

-  Flush

-  Refresh

-  Optimize

-  Upgrade

Create Index
============

The create index API allows to instantiate an index. Elasticsearch
provides support for multiple indices, including executing operations
across several indices.

**Index Settings**

Each index created can have specific settings associated with it.

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/'

    $ curl -XPUT 'http://localhost:9200/twitter/' -d '
    index :
        number_of_shards : 3 
        number_of_replicas : 2 
    '

The default for ``number_of_shards`` is 5.

The default for ``number_of_replicas`` is 1 (i.e. one replica for each
primary shard).

The above second curl example shows how an index called ``twitter`` can
be created with specific settings for it using
`YAML <http://www.yaml.org>`__. In this case, creating an index with 3
shards, each with 2 replicas. The index settings can also be defined
with `JSON <http://www.json.org>`__:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/' -d '{
        "settings" : {
            "index" : {
                "number_of_shards" : 3,
                "number_of_replicas" : 2
            }
        }
    }'

or more simplified

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/' -d '{
        "settings" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 2
        }
    }'

**Note**

You do not have to explicitly specify the ``index`` section inside the
``settings`` section.

For more information regarding all the different index level settings
that can be set when creating an index, please check the `index
modules <#index-modules>`__ section.

**Mappings**

The create index API allows to provide a set of one or more mappings:

.. code:: js

    curl -XPOST localhost:9200/test -d '{
        "settings" : {
            "number_of_shards" : 1
        },
        "mappings" : {
            "type1" : {
                "_source" : { "enabled" : false },
                "properties" : {
                    "field1" : { "type" : "string", "index" : "not_analyzed" }
                }
            }
        }
    }'

**Warmers**

The create index API allows also to provide a set of
`warmers <#indices-warmers>`__:

.. code:: js

    curl -XPUT localhost:9200/test -d '{
        "warmers" : {
            "warmer_1" : {
                "source" : {
                    "query" : {
                        ...
                    }
                }
            }
        }
    }'

**Aliases**

The create index API allows also to provide a set of
`aliases <#indices-aliases>`__:

.. code:: js

    curl -XPUT localhost:9200/test -d '{
        "aliases" : {
            "alias_1" : {},
            "alias_2" : {
                "filter" : {
                    "term" : {"user" : "kimchy" }
                },
                "routing" : "kimchy"
            }
        }
    }'

**Creation Date**

When an index is created, a timestamp is stored in the index metadata
for the creation date. By default it is automatically generated, but it
can also be specified using the ``creation_date`` parameter on the
create index API:

.. code:: js

    curl -XPUT localhost:9200/test -d '{
        "creation_date" : 1407751337000 
    }'

``creation_date`` is set using epoch time in milliseconds.

Delete Index
============

The delete index API allows to delete an existing index.

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/twitter/'

The above example deletes an index called ``twitter``. Specifying an
index, alias or wildcard expression is required.

The delete index API can also be applied to more than one index, or on
all indices (be careful!) by using ``_all`` or ``*`` as index.

To disable the ability to delete indices via wildcards or ``_all``, set
the ``action.destructive_requires_name`` setting in the config to
``true``. This setting can also be changed via the cluster update
settings api.
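
For example, a sketch that deletes two indices in a single call, and one
that enables the safety setting dynamically via the cluster update
settings API (the index names are illustrative):

.. code:: js

    $ curl -XDELETE 'http://localhost:9200/index_one,index_two/'

    $ curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
        "transient" : {
            "action.destructive_requires_name" : true
        }
    }'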

Get Index
=========

The get index API allows you to retrieve information about one or more
indices.

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/'

The above example gets the information for an index called ``twitter``.
Specifying an index, alias or wildcard expression is required.

The get index API can also be applied to more than one index, or on all
indices by using ``_all`` or ``*`` as index.

**Filtering index information**

The information returned by the get API can be filtered to include only
specific features by specifying a comma delimited list of features in
the URL:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/_settings,_mappings'

The above command will only return the settings and mappings for the
index called ``twitter``.

The available features are ``_settings``, ``_mappings``, ``_warmers``
and ``_aliases``.

Indices Exists
==============

Used to check if the index (indices) exists or not. For example:

.. code:: js

    curl -XHEAD 'http://localhost:9200/twitter'

The HTTP status code indicates if the index exists or not. A ``404``
means it does not exist, and ``200`` means it does.
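
For example, a quick sketch using ``curl -I`` (which issues a HEAD
request) so the status code is visible on the command line:

.. code:: js

    curl -I 'http://localhost:9200/twitter'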

Open / Close Index API
======================

The open and close index APIs allow you to close an index and later
open it again. A closed index has almost no overhead on the cluster
(except for maintaining its metadata), and is blocked for read/write
operations. A closed index can be opened, which will then go through the
normal recovery process.

The REST endpoint is ``/{index}/_close`` and ``/{index}/_open``. For
example:

.. code:: js

    curl -XPOST 'localhost:9200/my_index/_close'

    curl -XPOST 'localhost:9200/my_index/_open'

It is possible to open and close multiple indices. An error will be
thrown if the request explicitly refers to a missing index. This
behaviour can be disabled using the ``ignore_unavailable=true``
parameter.

All indices can be opened or closed at once using ``_all`` as the index
name or specifying patterns that identify them all (e.g. ``*``).
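
For example, a sketch that closes every index matching a pattern (the
pattern ``old_logs_*`` is illustrative):

.. code:: js

    curl -XPOST 'localhost:9200/old_logs_*/_close?ignore_unavailable=true'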

Identifying indices via wildcards or ``_all`` can be disabled by setting
the ``action.destructive_requires_name`` flag in the config file to
``true``. This setting can also be changed via the cluster update
settings api.

Put Mapping
===========

The put mapping API allows you to register a specific mapping definition
for a specific type.

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/_mapping/tweet' -d '
    {
        "tweet" : {
            "properties" : {
                "message" : {"type" : "string", "store" : true }
            }
        }
    }
    '

The above example creates a mapping called ``tweet`` within the
``twitter`` index. The mapping simply defines that the ``message`` field
should be stored (by default, fields are not stored, just indexed) so we
can retrieve it later on using selective loading.

More information on how to define type mappings can be found in the
`mapping <#mapping>`__ section.

**Merging & Conflicts**

When a mapping already exists under the given type, the existing mapping
definition and the new one are merged. The ``ignore_conflicts``
parameter can be used to control whether conflicts should be ignored; by
default it is set to ``false``, which means conflicts are **not**
ignored.

The definition of conflict is really dependent on the type merged, but
in general, if a different core type is defined, it is considered as a
conflict. New mapping definitions can be added to object types, and core
type mappings can be upgraded by specifying multi fields on a core type.
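
For example, a sketch that updates the ``tweet`` mapping and explicitly
ignores conflicting field definitions via the ``ignore_conflicts``
parameter:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/_mapping/tweet?ignore_conflicts=true' -d '
    {
        "tweet" : {
            "properties" : {
                "message" : {"type" : "string", "store" : true }
            }
        }
    }
    '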

**Multi Index**

The put mapping API can be applied to more than one index with a single
call, or even on ``_all`` the indices.

.. code:: js

    $ curl -XPUT 'http://localhost:9200/kimchy,elasticsearch/_mapping/tweet' -d '
    {
        "tweet" : {
            "properties" : {
                "message" : {"type" : "string", "store" : true }
            }
        }
    }
    '

All options:

.. code:: js

    PUT /{index}/_mapping/{type}

where

+--------------+---------------------------------------------------------------+
| ``{index}``  | ``blank | * | _all | glob pattern | name1, name2, …``         |
+--------------+---------------------------------------------------------------+
| ``{type}``   | Name of the type to add. Must be the name of the type defined |
|              | in the body.                                                  |
+--------------+---------------------------------------------------------------+

Instead of ``_mapping`` you can also use the plural ``_mappings``.

Get Mapping
===========

The get mapping API allows to retrieve mapping definitions for an index
or index/type.

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/_mapping/tweet'

**Multiple Indices and Types**

The get mapping API can be used to get more than one index or type
mapping with a single call. General usage of the API follows the
following syntax: ``host:port/{index}/_mapping/{type}`` where both
``{index}`` and ``{type}`` can accept a comma-separated list of names.
To get mappings for all indices you can use ``_all`` for ``{index}``.
The following are some examples:

.. code:: js

    curl -XGET 'http://localhost:9200/_mapping/twitter,kimchy'

    curl -XGET 'http://localhost:9200/_all/_mapping/tweet,book'

If you want to get mappings of all indices and types then the following
two examples are equivalent:

.. code:: js

    curl -XGET 'http://localhost:9200/_all/_mapping'

    curl -XGET 'http://localhost:9200/_mapping'

Get Field Mapping
=================

The get field mapping API allows you to retrieve mapping definitions for
one or more fields. This is useful when you do not need the complete
type mapping returned by the get mapping API.

The following returns the mapping of the field ``text`` only:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/_mapping/tweet/field/text'

For which the response is (assuming ``text`` is a default string field):

.. code:: js

    {
       "twitter": {
          "tweet": {
             "text": {
                "full_name": "text",
                "mapping": {
                   "text": { "type": "string" }
                }
             }
          }
       }
    }

**Multiple Indices, Types and Fields**

The get field mapping API can be used to get the mapping of multiple
fields from more than one index or type with a single call. General
usage of the API follows the following syntax:
``host:port/{index}/{type}/_mapping/field/{field}`` where ``{index}``,
``{type}`` and ``{field}`` can stand for a comma-separated list of names
or wildcards. To get mappings for all indices you can use ``_all`` for
``{index}``. The following are some examples:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter,kimchy/_mapping/field/message'

    curl -XGET 'http://localhost:9200/_all/_mapping/tweet,book/field/message,user.id'

    curl -XGET 'http://localhost:9200/_all/_mapping/tw*/field/*.id'

**Specifying fields**

The get field mapping API allows you to specify one or more fields
separated by a comma. You can also use wildcards. The field names can be
any of the following:

+------------+---------------------------------------------------------------+
| Full names | the full path, including any parent object name the field is  |
|            | part of (ex. ``user.id``).                                    |
+------------+---------------------------------------------------------------+
| Index      | the name of the lucene field (can be different than the field |
| names      | name if the ``index_name`` option of the mapping is used).    |
+------------+---------------------------------------------------------------+
| Field      | the name of the field without the path to it (ex. ``id`` for  |
| names      | ``{ "user" : { "id" : 1 } }``).                               |
+------------+---------------------------------------------------------------+

The above options are specified in the order the ``field`` parameter is
resolved. The first field found which matches is returned. This is
especially important if index names or field names are used as those can
be ambiguous.

For example, consider the following mapping:

.. code:: js

     {
         "article": {
             "properties": {
                 "id": { "type": "string" },
                 "title":  { "type": "string", "index_name": "text" },
                 "abstract": { "type": "string", "index_name": "text" },
                 "author": {
                     "properties": {
                         "id": { "type": "string" },
                         "name": { "type": "string", "index_name": "author" }
                     }
                 }
             }
         }
     }

To select the ``id`` of the ``author`` field, you can use its full name
``author.id``. Using ``text`` will return the mapping of ``abstract`` as
it is one of the fields which map to the Lucene field ``text``. ``name``
will return the field ``author.name``:

.. code:: js

    curl -XGET "http://localhost:9200/publications/_mapping/article/field/author.id,text,name"

returns:

.. code:: js

    {
       "publications": {
          "article": {
             "text": {
                "full_name": "abstract",
                "mapping": {
                   "abstract": { "type": "string", "index_name": "text" }
                }
             },
             "author.id": {
                "full_name": "author.id",
                "mapping": {
                   "id": { "type": "string" }
                }
             },
             "name": {
                "full_name": "author.name",
                "mapping": {
                   "name": { "type": "string", "index_name": "author" }
                }
             }
          }
       }
    }

Note how the response always uses the same fields specified in the
request as keys. The ``full_name`` in every entry contains the full name
of the field whose mapping was returned. This is useful when the
request refers to multiple fields (like ``text`` above).

**Other options**

+------------------------+---------------------------------------------------------------+
| ``include_defaults``   | adding ``include_defaults=true`` to the query string will     |
|                        | cause the response to include default values, which are       |
|                        | normally suppressed.                                          |
+------------------------+---------------------------------------------------------------+
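
For example, based on the earlier request for the ``text`` field:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter/_mapping/tweet/field/text?include_defaults=true'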

Types Exists
============

Used to check if a type/types exists in an index/indices.

.. code:: js

    curl -XHEAD 'http://localhost:9200/twitter/tweet'

The HTTP status code indicates if the type exists or not. A ``404``
means it does not exist, and ``200`` means it does.

Delete Mapping
==============

Allows you to delete a mapping (type) along with its data. The REST
endpoints are

.. code:: js

    [DELETE] /{index}/{type}

    [DELETE] /{index}/{type}/_mapping

    [DELETE] /{index}/_mapping/{type}

where

+------------+---------------------------------------------------------------+
| ``index``  | ``* | _all | glob pattern | name1, name2, …``                 |
+------------+---------------------------------------------------------------+
| ``type``   | ``* | _all | glob pattern | name1, name2, …``                 |
+------------+---------------------------------------------------------------+
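
For example, a sketch that deletes the ``tweet`` mapping (and its
documents) from the ``twitter`` index:

.. code:: js

    curl -XDELETE 'http://localhost:9200/twitter/_mapping/tweet'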

Note, most of the time it makes more sense to reindex the data into a
fresh index rather than deleting large chunks of it.

Index Aliases
=============

APIs in Elasticsearch accept an index name when working against a
specific index, and several indices when applicable. The index aliases
API allows you to alias an index with a name, with all APIs
automatically converting the alias name to the actual index name. An
alias can also be mapped to more than one index, and when specifying it,
the alias will automatically expand to the aliased indices. An alias can
also be associated with a filter that will automatically be applied when
searching, and with routing values.

Here is a sample of associating the alias ``alias1`` with index
``test1``:

.. code:: js

    curl -XPOST 'http://localhost:9200/_aliases' -d '
    {
        "actions" : [
            { "add" : { "index" : "test1", "alias" : "alias1" } }
        ]
    }'

An alias can also be removed, for example:

.. code:: js

    curl -XPOST 'http://localhost:9200/_aliases' -d '
    {
        "actions" : [
            { "remove" : { "index" : "test1", "alias" : "alias1" } }
        ]
    }'

Renaming an alias is a simple ``remove`` then ``add`` operation within
the same API. This operation is atomic; there is no need to worry about
a short period of time where the alias does not point to an index:

.. code:: js

    curl -XPOST 'http://localhost:9200/_aliases' -d '
    {
        "actions" : [
            { "remove" : { "index" : "test1", "alias" : "alias1" } },
            { "add" : { "index" : "test1", "alias" : "alias2" } }
        ]
    }'

Associating an alias with more than one index is simply several ``add``
actions:

.. code:: js

    curl -XPOST 'http://localhost:9200/_aliases' -d '
    {
        "actions" : [
            { "add" : { "index" : "test1", "alias" : "alias1" } },
            { "add" : { "index" : "test2", "alias" : "alias1" } }
        ]
    }'

It is an error to index to an alias which points to more than one index.
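
Searching through such an alias, on the other hand, simply fans out to all
the indices it points to. A minimal sketch (the query is illustrative):

.. code:: js

    curl -XGET 'http://localhost:9200/alias1/_search?q=user:kimchy'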

**Filtered Aliases**

Aliases with filters provide an easy way to create different "views" of
the same index. The filter can be defined using Query DSL and is applied
to all Search, Count, Delete By Query and More Like This operations with
this alias.

To create a filtered alias, first we need to ensure that the fields
already exist in the mapping:

.. code:: js

    curl -XPUT 'http://localhost:9200/test1' -d '{
      "mappings": {
        "type1": {
          "properties": {
            "user" : {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }'

Now we can create an alias that uses a filter on field ``user``:

.. code:: js

    curl -XPOST 'http://localhost:9200/_aliases' -d '{
        "actions" : [
            {
                "add" : {
                     "index" : "test1",
                     "alias" : "alias2",
                     "filter" : { "term" : { "user" : "kimchy" } }
                }
            }
        ]
    }'

**Routing**

It is possible to associate routing values with aliases. This feature
can be used together with filtering aliases in order to avoid
unnecessary shard operations.

The following command creates a new alias ``alias1`` that points to
index ``test``. After ``alias1`` is created, all operations with this
alias are automatically modified to use value ``1`` for routing:

.. code:: js

    curl -XPOST 'http://localhost:9200/_aliases' -d '
    {
        "actions" : [
            {
                "add" : {
                     "index" : "test",
                     "alias" : "alias1",
                     "routing" : "1"
                }
            }
        ]
    }'

It’s also possible to specify different routing values for searching and
indexing operations:

.. code:: js

    curl -XPOST 'http://localhost:9200/_aliases' -d '
    {
        "actions" : [
            {
                "add" : {
                     "index" : "test",
                     "alias" : "alias2",
                     "search_routing" : "1,2",
                     "index_routing" : "2"
                }
            }
        ]
    }'

As shown in the example above, search routing may contain several values
separated by comma. Index routing can contain only a single value.

If an operation that uses a routing alias also has a routing parameter,
an intersection of both the alias routing and the routing specified in
the parameter is used. For example, the following command will use "2"
as a routing value:

.. code:: js

    curl -XGET 'http://localhost:9200/alias2/_search?q=user:kimchy&routing=2,3'

**Add a single alias**

An alias can also be added with the endpoint

``PUT /{index}/_alias/{name}``

where

+------------+---------------------------------------------------------------+
| ``index``  | The index the alias refers to. Can be any of                  |
|            | ``* | _all | glob pattern | name1, name2, …``                 |
+------------+---------------------------------------------------------------+
| ``name``   | The name of the alias. This is a required option.             |
+------------+---------------------------------------------------------------+
| ``routing``| An optional routing that can be associated with an alias.     |
+------------+---------------------------------------------------------------+
| ``filter`` | An optional filter that can be associated with an alias.      |
+------------+---------------------------------------------------------------+

You can also use the plural ``_aliases``.

**Examples:**

Adding time based alias
    .. code:: js

        curl -XPUT 'localhost:9200/logs_201305/_alias/2013'

Adding a user alias
    First create the index and add a mapping for the ``user_id`` field:

    .. code:: js

        curl -XPUT 'localhost:9200/users' -d '{
            "mappings" : {
                "user" : {
                    "properties" : {
                        "user_id" : {"type" : "integer"}
                    }
                }
            }
        }'

    Then add the alias for a specific user:

    .. code:: js

        curl -XPUT 'localhost:9200/users/_alias/user_12' -d '{
            "routing" : "12",
            "filter" : {
                "term" : {
                    "user_id" : 12
                }
            }
        }'

**Aliases during index creation**

Aliases can also be specified during `index
creation <#create-index-aliases>`__:

.. code:: js

    curl -XPUT localhost:9200/logs_20142801 -d '{
        "mappings" : {
            "type" : {
                "properties" : {
                    "year" : {"type" : "integer"}
                }
            }
        },
        "aliases" : {
            "current_day" : {},
            "2014" : {
                "filter" : {
                    "term" : {"year" : 2014 }
                }
            }
        }
    }'

**Delete aliases**

The rest endpoint is: ``/{index}/_alias/{name}``

where

+------------+---------------------------------------------------------------+
| ``index``  | ``* | _all | glob pattern | name1, name2, …``                 |
+------------+---------------------------------------------------------------+
| ``name``   | ``* | _all | glob pattern | name1, name2, …``                 |
+------------+---------------------------------------------------------------+

Alternatively you can use the plural ``_aliases``. Example:

.. code:: js

    curl -XDELETE 'localhost:9200/users/_alias/user_12'

**Retrieving existing aliases**

The get index alias api allows filtering by alias name and index name.
This api redirects to the master and fetches the requested index
aliases, if available. Only the found index aliases are serialised in
the response.

Possible options:

+------------------------+---------------------------------------------------------------+
| ``index``              | The index name to get aliases for. Partial names are         |
|                        | supported via wildcards, and multiple index names can be     |
|                        | specified separated by commas. The alias name for an index   |
|                        | can also be used.                                            |
+------------------------+---------------------------------------------------------------+
| ``alias``              | The name of the alias to return in the response. Like the    |
|                        | index option, this option supports wildcards and allows      |
|                        | specifying multiple alias names separated by commas.         |
+------------------------+---------------------------------------------------------------+
| ``ignore_unavailable`` | What to do if a specified index name doesn’t exist. If set   |
|                        | to ``true`` then those indices are ignored.                  |
+------------------------+---------------------------------------------------------------+

The rest endpoint is: ``/{index}/_alias/{alias}``.

**Examples:**

All aliases for the index users:

.. code:: js

    curl -XGET 'localhost:9200/users/_alias/*'

Response:

.. code:: js

     {
      "users" : {
        "aliases" : {
          "user_13" : {
            "filter" : {
              "term" : {
                "user_id" : 13
              }
            },
            "index_routing" : "13",
            "search_routing" : "13"
          },
          "user_14" : {
            "filter" : {
              "term" : {
                "user_id" : 14
              }
            },
            "index_routing" : "14",
            "search_routing" : "14"
          },
          "user_12" : {
            "filter" : {
              "term" : {
                "user_id" : 12
              }
            },
            "index_routing" : "12",
            "search_routing" : "12"
          }
        }
      }
    }

All aliases with the name 2013 in any index:

.. code:: js

    curl -XGET 'localhost:9200/_alias/2013'

Response:

.. code:: js

    {
      "logs_201304" : {
        "aliases" : {
          "2013" : { }
        }
      },
      "logs_201305" : {
        "aliases" : {
          "2013" : { }
        }
      }
    }

All aliases that start with 2013\_01 in any index:

.. code:: js

    curl -XGET 'localhost:9200/_alias/2013_01*'

Response:

.. code:: js

    {
      "logs_20130101" : {
        "aliases" : {
          "2013_01" : { }
        }
      }
    }

There is also a HEAD variant of the get indices aliases api to check if
index aliases exist. The indices aliases exists api supports the same
options as the get indices aliases api. Examples:

.. code:: js

    curl -XHEAD 'localhost:9200/_alias/2013'
    curl -XHEAD 'localhost:9200/_alias/2013_01*'
    curl -XHEAD 'localhost:9200/users/_alias/*'

Update Indices Settings
=======================

Change specific index level settings in real time.

The REST endpoint is ``/_settings`` (to update all indices) or
``{index}/_settings`` to update one (or more) indices settings. The body
of the request includes the updated settings, for example:

.. code:: js

    {
        "index" : {
            "number_of_replicas" : 4
        } }

The above will change the number of replicas to 4 from the current
number of replicas. Here is a curl example:

.. code:: js

    curl -XPUT 'localhost:9200/my_index/_settings' -d '
    {
        "index" : {
            "number_of_replicas" : 4
        } }
    '

Below is the list of settings that can be changed using the update
settings API:

``index.number_of_replicas``
    The number of replicas each shard has.

``index.auto_expand_replicas`` (string)
    Set to a dash delimited lower and upper bound (e.g. ``0-5``) or one
    may use ``all`` as the upper bound (e.g. ``0-all``), or ``false`` to
    disable it.

``index.blocks.read_only``
    Set to ``true`` to have the index read only, ``false`` to allow
    writes and metadata changes.

``index.blocks.read``
    Set to ``true`` to disable read operations against the index.

``index.blocks.write``
    Set to ``true`` to disable write operations against the index.

``index.blocks.metadata``
    Set to ``true`` to disable metadata operations against the index.

``index.refresh_interval``
    The async refresh interval of a shard.

``index.index_concurrency``
    Defaults to ``8``.

``index.codec.bloom.load``
    Whether to load the bloom filter. Defaults to ``false``. See the
    note on bloom filters below.

``index.fail_on_merge_failure``
    Defaults to ``true``.

``index.translog.flush_threshold_ops``
    When to flush based on operations.

``index.translog.flush_threshold_size``
    When to flush based on translog (bytes) size.

``index.translog.flush_threshold_period``
    When to flush based on a period of not flushing.

``index.translog.disable_flush``
    Disables flushing. Note, should be set for a short interval and then
    enabled.

``index.cache.filter.max_size``
    The maximum size of filter cache (per segment in shard). Set to
    ``-1`` to disable.

``index.cache.filter.expire``
    The expire after access time for filter cache. Set to ``-1`` to
    disable.

``index.gateway.snapshot_interval``
    The gateway snapshot interval (only applies to shared gateways).
    Defaults to 10s.

`merge policy <#index-modules-merge>`__
    All the settings for the merge policy currently configured. A
    different merge policy can’t be set.

``index.routing.allocation.include.*``
    A node matching any rule will be allowed to host shards from the
    index.

``index.routing.allocation.exclude.*``
    A node matching any rule will NOT be allowed to host shards from the
    index.

``index.routing.allocation.require.*``
    Only nodes matching all rules will be allowed to host shards from
    the index.

``index.routing.allocation.disable_allocation``
    Disable allocation. Defaults to ``false``. Deprecated in favour of
    ``index.routing.allocation.enable``.

``index.routing.allocation.disable_new_allocation``
    Disable new allocation. Defaults to ``false``. Deprecated in favour
    of ``index.routing.allocation.enable``.

``index.routing.allocation.disable_replica_allocation``
    Disable replica allocation. Defaults to ``false``. Deprecated in
    favour of ``index.routing.allocation.enable``.

``index.routing.allocation.enable``
    Enables shard allocation for a specific index. It can be set to:

    -  ``all`` (default) - Allows shard allocation for all shards.

    -  ``primaries`` - Allows shard allocation only for primary shards.

    -  ``new_primaries`` - Allows shard allocation only for primary
       shards for new indices.

    -  ``none`` - No shard allocation is allowed.

``index.routing.rebalance.enable``
    Enables shard rebalancing for a specific index. It can be set to:

    -  ``all`` (default) - Allows shard rebalancing for all shards.

    -  ``primaries`` - Allows shard rebalancing only for primary shards.

    -  ``replicas`` - Allows shard rebalancing only for replica shards.

    -  ``none`` - No shard rebalancing is allowed.

``index.routing.allocation.total_shards_per_node``
    Controls the total number of shards (replicas and primaries) allowed
    to be allocated on a single node. Defaults to unbounded (``-1``).

``index.recovery.initial_shards``
    When using the local gateway, a particular shard is recovered only
    if a quorum of shards can be allocated in the cluster. It can be
    set to:

    -  ``quorum`` (default)

    -  ``quorum-1`` (or ``half``)

    -  ``full``

    -  ``full-1``.

    -  Number values are also supported, e.g. ``1``.

``index.gc_deletes``; ``index.ttl.disable_purge``
    Temporarily disables the purge of expired docs.

`store level throttling <#index-modules-store>`__
    All the settings for the store level throttling policy currently
    configured.

``index.translog.fs.type``
    Either ``simple`` or ``buffered`` (default).

``index.compound_format``
    See ```index.compound_format`` <#index-compound-format>`__.

``index.compound_on_flush``
    See ```index.compound_on_flush`` <#index-compound-on-flush>`__.

slow log
    All the settings for the slow log.

``index.warmer.enabled``
    See the Warmers section below. Defaults to ``true``.
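
Several of these dynamic settings can be combined in a single update
request. A minimal sketch (index name and values are illustrative):

.. code:: js

    curl -XPUT 'localhost:9200/my_index/_settings' -d '
    {
        "index" : {
            "number_of_replicas" : 2,
            "refresh_interval" : "30s"
        }
    }'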

**Bulk Indexing Usage**

For example, the update settings API can be used to dynamically change
the index from being more performant for bulk indexing, and then move it
to more real time indexing state. Before the bulk indexing is started,
use:

.. code:: js

    curl -XPUT localhost:9200/test/_settings -d '{
        "index" : {
            "refresh_interval" : "-1"
        } }'

(Another optimization option is to start the index without any replicas,
and only add them later, but that really depends on the use case.)

Then, once bulk indexing is done, the settings can be updated (back to
the defaults for example):

.. code:: js

    curl -XPUT localhost:9200/test/_settings -d '{
        "index" : {
            "refresh_interval" : "1s"
        } }'

And, an optimize should be called:

.. code:: js

    curl -XPOST 'http://localhost:9200/test/_optimize?max_num_segments=5'

**Updating Index Analysis**

It is also possible to define new `analyzers <#analysis>`__ for the
index. But it is required to `close <#indices-open-close>`__ the index
first and `open <#indices-open-close>`__ it after the changes are made.

For example, if the ``content`` analyzer hasn’t been defined on
``myindex`` yet, you can use the following commands to add it:

.. code:: js

    curl -XPOST 'localhost:9200/myindex/_close'

    curl -XPUT 'localhost:9200/myindex/_settings' -d '{
      "analysis" : {
        "analyzer":{
          "content":{
            "type":"custom",
            "tokenizer":"whitespace"
          }
        }
      }
    }'

    curl -XPOST 'localhost:9200/myindex/_open'

**Bloom filters**

Up to version 1.3, Elasticsearch used to generate bloom filters for the
``_uid`` field at indexing time and to load them at search time in order
to speed up primary-key lookups by saving disk seeks.

As of 1.4, bloom filters are still generated at indexing time, but they
are no longer loaded at search time by default: they consume RAM in
proportion to the number of unique terms, which can quickly add up for
certain use cases, and separate performance improvements have made the
performance gains with bloom filters very small.

    **Tip**

    You can enable loading of the bloom filter at search time on a
    per-index basis by updating the index settings:

    .. code:: js

        PUT /old_index/_settings?index.codec.bloom.load=true

    This setting, which defaults to ``false``, can be updated on a live
    index. Note, however, that changing the value will cause the index
    to be reopened, which will invalidate any existing caches.

Get Settings
============

The get settings API allows to retrieve settings of index/indices:

.. code:: js

    $ curl -XGET 'http://localhost:9200/twitter/_settings'

**Multiple Indices and Types**

The get settings API can be used to get settings for more than one index
with a single call. General usage of the API follows the following
syntax: ``host:port/{index}/_settings`` where ``{index}`` can stand for
comma-separated list of index names and aliases. To get settings for all
indices you can use ``_all`` for ``{index}``. Wildcard expressions are
also supported. The following are some examples:

.. code:: js

    curl -XGET 'http://localhost:9200/twitter,kimchy/_settings'

    curl -XGET 'http://localhost:9200/_all/_settings'

    curl -XGET 'http://localhost:9200/2013-*/_settings'

**Prefix option**

There is also support for a ``prefix`` query string option that allows
including only settings that match the specified prefix.

.. code:: js

    curl -XGET 'http://localhost:9200/my-index/_settings?prefix=index.'

    curl -XGET 'http://localhost:9200/_all/_settings?prefix=index.routing.allocation.'

    curl -XGET 'http://localhost:9200/2013-*/_settings?name=index.merge.*'

    curl -XGET 'http://localhost:9200/2013-*/_settings/index.merge.*'

The first example returns all index settings that start with ``index.``
in the index ``my-index``, the second example gets all index settings
that start with ``index.routing.allocation.`` for all indices, and the
last two examples return all index settings that start with
``index.merge.`` in indices whose names start with ``2013-``.

Analyze
=======

Performs the analysis process on a text and returns the token breakdown
of the text.

It can be used without specifying an index, against one of the many
built-in analyzers:

.. code:: js

    curl -XGET 'localhost:9200/_analyze?analyzer=standard' -d 'this is a test'

Or by building a custom transient analyzer out of tokenizers, token
filters and char filters. Token filters can use the shorter *filters*
parameter name:

.. code:: js

    curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filters=lowercase' -d 'this is a test'

    curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&token_filters=lowercase&char_filters=html_strip' -d 'this is a <b>test</b>'

It can also run against a specific index:

.. code:: js

    curl -XGET 'localhost:9200/test/_analyze?text=this+is+a+test'

The above will run an analysis on the "this is a test" text, using the
default index analyzer associated with the ``test`` index. An
``analyzer`` can also be provided to use a different analyzer:

.. code:: js

    curl -XGET 'localhost:9200/test/_analyze?analyzer=whitespace' -d 'this is a test'

Also, the analyzer can be derived based on a field mapping, for example:

.. code:: js

    curl -XGET 'localhost:9200/test/_analyze?field=obj1.field1' -d 'this is a test'

Will cause the analysis to happen based on the analyzer configured in
the mapping for ``obj1.field1`` (and if not, the default index
analyzer).

Also, the text can be provided as part of the request body, and not as a
parameter.

Index Templates
===============

Index templates allow you to define templates that will automatically
be applied to newly created indices. The templates include both
settings and mappings, and a simple pattern that controls whether the
template will be applied to a newly created index. For example:

.. code:: js

    curl -XPUT localhost:9200/_template/template_1 -d '
    {
        "template" : "te*",
        "settings" : {
            "number_of_shards" : 1
        },
        "mappings" : {
            "type1" : {
                "_source" : { "enabled" : false }
            }
        }
    }
    '

This defines a template named ``template_1``, with a template pattern
of ``te*``. The settings and mappings will be applied to any index
whose name matches the ``te*`` pattern.
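
For example, creating an index whose name matches the pattern (the name
below is illustrative) automatically picks up the template's settings,
which can then be verified with the get settings API:

.. code:: js

    curl -XPUT localhost:9200/test_index

    curl -XGET localhost:9200/test_index/_settings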

It is also possible to include aliases in an index template as follows:

.. code:: js

    curl -XPUT localhost:9200/_template/template_1 -d '
    {
        "template" : "te*",
        "settings" : {
            "number_of_shards" : 1
        },
        "aliases" : {
            "alias1" : {},
            "alias2" : {
                "filter" : {
                    "term" : {"user" : "kimchy" }
                },
                "routing" : "kimchy"
            },
            "{index}-alias" : {} 
        }
    }
    '

The ``{index}`` placeholder within the alias name will be replaced with
the actual index name that the template gets applied to during index
creation.

**Deleting a Template**

Index templates are identified by a name (in the above case
``template_1``) and can be deleted as well:

.. code:: js

    curl -XDELETE localhost:9200/_template/template_1

**GETting templates**

Index templates are identified by a name (in the above case
``template_1``) and can be retrieved using the following:

.. code:: js

    curl -XGET localhost:9200/_template/template_1

You can also match several templates by using wildcards like:

.. code:: js

    curl -XGET localhost:9200/_template/temp*
    curl -XGET localhost:9200/_template/template_1,template_2

To get a list of all index templates you can run:

.. code:: js

    curl -XGET localhost:9200/_template/

**Multiple Template Matching**

Multiple index templates can potentially match an index, in this case,
both the settings and mappings are merged into the final configuration
of the index. The order of the merging can be controlled using the
``order`` parameter, with lower order being applied first, and higher
orders overriding them. For example:

.. code:: js

    curl -XPUT localhost:9200/_template/template_1 -d '
    {
        "template" : "*",
        "order" : 0,
        "settings" : {
            "number_of_shards" : 1
        },
        "mappings" : {
            "type1" : {
                "_source" : { "enabled" : false }
            }
        }
    }
    '

    curl -XPUT localhost:9200/_template/template_2 -d '
    {
        "template" : "te*",
        "order" : 1,
        "settings" : {
            "number_of_shards" : 1
        },
        "mappings" : {
            "type1" : {
                "_source" : { "enabled" : true }
            }
        }
    }
    '

The above will disable storing the ``_source`` on all ``type1`` types,
but for indices whose names start with ``te``, ``_source`` will still
be enabled. Note, for mappings, the merging is "deep", meaning that
specific object/property based mappings can easily be added/overridden
on higher order templates, with lower order templates providing the
basis.

**Config**

Index templates can also be placed within the config location
(``path.conf``) under the ``templates`` directory (note, make sure to
place them on all master-eligible nodes). For example, a file called
``template_1.json`` can be placed under ``config/templates`` and it
will be applied if it matches an index. Here is a sample of such a
file:

.. code:: js

    {
        "template_1" : {
            "template" : "*",
            "settings" : {
                "index.number_of_shards" : 2
            },
            "mappings" : {
                "_default_" : {
                    "_source" : {
                        "enabled" : false
                    }
                },
                "type1" : {
                    "_all" : {
                        "enabled" : false
                    }
                }
            }
        }
    }

Warmers
=======

Index warming allows running registered search requests to warm up the
index before it is made available for search. With the near real time
aspect of search, cold data (segments) will be warmed up before it
becomes available for search. This includes things such as the filter
cache, filesystem cache, and loading field data for fields.

Warmup searches typically include requests that require heavy loading
of data, such as aggregations or sorting on specific fields. The warmup
APIs allow registering warmup (search) requests under specific names,
removing them, and getting them.

Index warmup can be disabled by setting ``index.warmer.enabled`` to
``false``. It is supported as a realtime setting using the update
settings API. This can be handy when doing initial bulk indexing:
disable pre-registered warmers to make indexing faster and less
expensive, and then re-enable the warmers afterwards.
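
A minimal sketch of toggling warmers around a bulk indexing run (the
``test`` index name is illustrative):

.. code:: js

    # disable warmers before bulk indexing
    curl -XPUT 'localhost:9200/test/_settings' -d '{
        "index" : {
            "warmer" : { "enabled" : false }
        }
    }'

    # re-enable warmers once bulk indexing is done
    curl -XPUT 'localhost:9200/test/_settings' -d '{
        "index" : {
            "warmer" : { "enabled" : true }
        }
    }'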

**Index Creation / Templates**

Warmers can be registered when an index gets created, for example:

.. code:: js

    curl -XPUT localhost:9200/test -d '{
        "warmers" : {
            "warmer_1" : {
                "types" : [],
                "source" : {
                    "query" : {
                        ...
                    },
                    "aggs" : {
                        ...
                    }
                }
            }
        }
    }'

Or, in an index template:

.. code:: js

    curl -XPUT localhost:9200/_template/template_1 -d '
    {
        "template" : "te*",
        "warmers" : {
            "warmer_1" : {
                "types" : [],
                "source" : {
                    "query" : {
                        ...
                    },
                    "aggs" : {
                        ...
                    }
                }
            }
        }
    }'

On the same level as ``types`` and ``source``, the ``query_cache`` flag
is supported to enable query caching for the warmed search request. If
not specified, it will use the index level configuration of query
caching.

**Put Warmer**

Allows to put a warmup search request on a specific index (or indices),
with the body composing of a regular search request. Types can be
provided as part of the URI if the search request is designed to be run
only against the specific types.

Here is an example that registers a warmup called ``warmer_1`` against
index ``test`` (can be alias or several indices), for a search request
that runs against all types:

.. code:: js

    curl -XPUT localhost:9200/test/_warmer/warmer_1 -d '{
        "query" : {
            "match_all" : {}
        },
        "aggs" : {
            "aggs_1" : {
                "terms" : {
                    "field" : "field"
                }
            }
        }
    }'

And an example that registers a warmup against specific types:

.. code:: js

    curl -XPUT localhost:9200/test/type1/_warmer/warmer_1 -d '{
        "query" : {
            "match_all" : {}
        },
        "aggs" : {
            "aggs_1" : {
                "terms" : {
                    "field" : "field"
                }
            }
        }
    }'

All options:

.. code:: js

    PUT _warmer/{warmer_name}

    PUT /{index}/_warmer/{warmer_name}

    PUT /{index}/{type}/_warmer/{warmer_name}

where

+------------+---------------------------------------------------------------+
| ``{index}``| ``* | _all | glob pattern | name1, name2, …``                 |
+------------+---------------------------------------------------------------+
| ``{type}`` | ``* | _all | glob pattern | name1, name2, …``                 |
+------------+---------------------------------------------------------------+

Instead of ``_warmer`` you can also use the plural ``_warmers``.

The ``query_cache`` parameter can be used to enable query caching for
the search request. If not specified, it will use the index level
configuration of query caching.

**Delete Warmers**

Warmers can be deleted using the following endpoint:

.. code:: js

    [DELETE] /{index}/_warmer/{name}

where

+------------+---------------------------------------------------------------+
| ``{index}``| ``* | _all | glob pattern | name1, name2, …``                 |
+------------+---------------------------------------------------------------+
| ``{name}`` | ``* | _all | glob pattern | name1, name2, …``                 |
+------------+---------------------------------------------------------------+

Instead of ``_warmer`` you can also use the plural ``_warmers``.

**GETting Warmer**

Gets a warmer for a specific index (or alias, or several indices) based
on its name. The provided name can be a simple wildcard expression or
omitted to get all warmers.

Some examples:

.. code:: js

    # get warmer named warmer_1 on test index
    curl -XGET localhost:9200/test/_warmer/warmer_1

    # get all warmers that start with warm on test index
    curl -XGET localhost:9200/test/_warmer/warm*

    # get all warmers for test index
    curl -XGET localhost:9200/test/_warmer/

Indices Stats
=============

Indices level stats provide statistics on different operations happening
on an index. The API provides statistics on the index level scope
(though most stats can also be retrieved using node level scope).

The following returns high level aggregation and index level stats for
all indices:

.. code:: js

    curl localhost:9200/_stats

Specific index stats can be retrieved using:

.. code:: js

    curl localhost:9200/index1,index2/_stats

By default, all stats are returned; the specific stats to return can
also be specified in the URI. Those stats can be any of:

+-----------------+---------------------------------------------------------------+
| ``docs``        | The number of docs / deleted docs (docs not yet merged out).  |
|                 | Note, affected by refreshing the index.                       |
+-----------------+---------------------------------------------------------------+
| ``store``       | The size of the index.                                        |
+-----------------+---------------------------------------------------------------+
| ``indexing``    | Indexing statistics, can be combined with a comma separated   |
|                 | list of ``types`` to provide document type level stats.       |
+-----------------+---------------------------------------------------------------+
| ``get``         | Get statistics, including missing stats.                      |
+-----------------+---------------------------------------------------------------+
| ``search``      | Search statistics. You can include statistics for custom      |
|                 | groups by adding an extra ``groups`` parameter (search        |
|                 | operations can be associated with one or more groups). The    |
|                 | ``groups`` parameter accepts a comma separated list of group  |
|                 | names. Use ``_all`` to return statistics for all groups.      |
+-----------------+---------------------------------------------------------------+
| ``completion``  | Completion suggest statistics.                                |
+-----------------+---------------------------------------------------------------+
| ``fielddata``   | Fielddata statistics.                                         |
+-----------------+---------------------------------------------------------------+
| ``flush``       | Flush statistics.                                             |
+-----------------+---------------------------------------------------------------+
| ``merge``       | Merge statistics.                                             |
+-----------------+---------------------------------------------------------------+
| ``query_cache`` | `Shard query cache <#index-modules-shard-query-cache>`__      |
|                 | statistics.                                                   |
+-----------------+---------------------------------------------------------------+
| ``refresh``     | Refresh statistics.                                           |
+-----------------+---------------------------------------------------------------+
| ``suggest``     | Suggest statistics.                                           |
+-----------------+---------------------------------------------------------------+
| ``warmer``      | Warmer statistics.                                            |
+-----------------+---------------------------------------------------------------+

Some statistics allow per-field granularity, which accepts a
comma-separated list of included fields. By default all fields are
included:

+------------------------+---------------------------------------------------------------+
| ``fields``             | List of fields to be included in the statistics. This is used |
|                        | as the default list unless a more specific field list is      |
|                        | provided (see below).                                         |
+------------------------+---------------------------------------------------------------+
| ``completion_fields``  | List of fields to be included in the Completion Suggest       |
|                        | statistics.                                                   |
+------------------------+---------------------------------------------------------------+
| ``fielddata_fields``   | List of fields to be included in the Fielddata statistics.    |
+------------------------+---------------------------------------------------------------+

Here are some samples:

.. code:: js

    # Get back stats for merge and refresh only for all indices
    curl 'localhost:9200/_stats/merge,refresh'
    # Get back stats for type1 and type2 documents for the my_index index
    curl 'localhost:9200/my_index/_stats/indexing?types=type1,type2'
    # Get back just search stats for group1 and group2
    curl 'localhost:9200/_stats/search?groups=group1,group2'

The stats returned are aggregated on the index level, with ``primaries``
and ``total`` aggregations. In order to get back shard level stats, set
the ``level`` parameter to ``shards``.
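
For example, to get shard level stats for a single (illustrative) index:

.. code:: js

    curl 'localhost:9200/my_index/_stats?level=shards'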

Note, as shards move around the cluster, their stats will be cleared as
they are created on other nodes. On the other hand, even though a shard
has "left" a node, that node will still retain the stats that the shard
contributed to it.

Indices Segments
================

Provides low level segment information that a Lucene index (shard
level) is built with. It can be used to provide more information on the
state of a shard and an index, possibly optimization information, data
"wasted" on deletes, and so on.

Endpoints include segments for a specific index, several indices, or
all:

.. code:: js

    curl -XGET 'http://localhost:9200/test/_segments'
    curl -XGET 'http://localhost:9200/test1,test2/_segments'
    curl -XGET 'http://localhost:9200/_segments'

Response:

.. code:: js

    {
        ...
            "_3": {
                "generation": 3,
                "num_docs": 1121,
                "deleted_docs": 53,
                "size_in_bytes": 228288,
                "memory_in_bytes": 3211,
                "committed": true,
                "search": true,
                "version": "4.6",
                "compound": true
            }
        ...
    }

\_3
    The key of the JSON document is the name of the segment. This name
    is used to generate file names: all files starting with this segment
    name in the directory of the shard belong to this segment.

generation
    A generation number that is basically incremented when needing to
    write a new segment. The segment name is derived from this
    generation number.

num\_docs
    The number of non-deleted documents that are stored in this segment.

deleted\_docs
    The number of deleted documents that are stored in this segment. It
    is perfectly fine if this number is greater than 0, space is going
    to be reclaimed when this segment gets merged.

size\_in\_bytes
    The amount of disk space that this segment uses, in bytes.

memory\_in\_bytes
    Segments need to store some data into memory in order to be
    searchable efficiently. This number returns the number of bytes that
    are used for that purpose. A value of -1 indicates that
    Elasticsearch was not able to compute this number.

committed
    Whether the segment has been sync’ed on disk. Segments that are
    committed would survive a hard reboot. No need to worry in case of
    false, the data from uncommitted segments is also stored in the
    transaction log so that Elasticsearch is able to replay changes on
    the next start.

search
    Whether the segment is searchable. A value of false would most
    likely mean that the segment has been written to disk but no refresh
    occurred since then to make it searchable.

version
    The version of Lucene that has been used to write this segment.

compound
    Whether the segment is stored in a compound file. When true, this
    means that Lucene merged all files from the segment in a single one
    in order to save file descriptors.

Indices Recovery
================

The indices recovery API provides insight into on-going index shard
recoveries. Recovery status may be reported for specific indices, or
cluster-wide.

For example, the following command would show recovery information for
the indices "index1" and "index2".

.. code:: js

    curl -XGET http://localhost:9200/index1,index2/_recovery?pretty=true

To see cluster-wide recovery status simply leave out the index names.

.. code:: js

    curl -XGET http://localhost:9200/_recovery?pretty=true

Response:

.. code:: js

    {
      "index1" : {
        "shards" : [ {
          "id" : 0,
          "type" : "snapshot",
          "stage" : "index",
          "primary" : true,
          "start_time" : "2014-02-24T12:15:59.716",
          "stop_time" : 0,
          "total_time_in_millis" : 175576,
          "source" : {
            "repository" : "my_repository",
            "snapshot" : "my_snapshot",
            "index" : "index1"
          },
          "target" : {
            "id" : "ryqJ5lO5S4-lSFbGntkEkg",
            "hostname" : "my.fqdn",
            "ip" : "10.0.1.7",
            "name" : "my_es_node"
          },
          "index" : {
            "files" : {
              "total" : 73,
              "reused" : 0,
              "recovered" : 69,
              "percent" : "94.5%"
            },
            "bytes" : {
              "total" : 79063092,
              "reused" : 0,
              "recovered" : 68891939,
              "percent" : "87.1%"
            },
            "total_time_in_millis" : 0
          },
          "translog" : {
            "recovered" : 0,
            "total_time_in_millis" : 0
          },
          "start" : {
            "check_index_time" : 0,
            "total_time_in_millis" : 0
          }
        } ]
      }
    }

The above response shows a single index recovering a single shard. In
this case, the source of the recovery is a snapshot repository and the
target of the recovery is the node with name "my\_es\_node".

Additionally, the output shows the number and percent of files
recovered, as well as the number and percent of bytes recovered.

In some cases a higher level of detail may be preferable. Setting
"detailed=true" will present a list of physical files in recovery.

.. code:: js

    curl -XGET 'http://localhost:9200/_recovery?pretty=true&detailed=true'

Response:

.. code:: js

    {
      "index1" : {
        "shards" : [ {
          "id" : 0,
          "type" : "gateway",
          "stage" : "done",
          "primary" : true,
          "start_time" : "2014-02-24T12:38:06.349",
          "stop_time" : "2014-02-24T12:38:08.464",
          "total_time_in_millis" : 2115,
          "source" : {
            "id" : "RGMdRc-yQWWKIBM4DGvwqQ",
            "hostname" : "my.fqdn",
            "ip" : "10.0.1.7",
            "name" : "my_es_node"
          },
          "target" : {
            "id" : "RGMdRc-yQWWKIBM4DGvwqQ",
            "hostname" : "my.fqdn",
            "ip" : "10.0.1.7",
            "name" : "my_es_node"
          },
          "index" : {
            "files" : {
              "total" : 26,
              "reused" : 26,
              "recovered" : 26,
              "percent" : "100.0%",
              "details" : [ {
                "name" : "segments.gen",
                "length" : 20,
                "recovered" : 20
              }, {
                "name" : "_0.cfs",
                "length" : 135306,
                "recovered" : 135306
              }, {
                "name" : "segments_2",
                "length" : 251,
                "recovered" : 251
              },
               ...
              ]
            },
            "bytes" : {
              "total" : 26001617,
              "reused" : 26001617,
              "recovered" : 26001617,
              "percent" : "100.0%"
            },
            "total_time_in_millis" : 2
          },
          "translog" : {
            "recovered" : 71,
            "total_time_in_millis" : 2025
          },
          "start" : {
            "check_index_time" : 0,
            "total_time_in_millis" : 88
          }
        } ]
      }
    }

This response shows a detailed listing (truncated for brevity) of the
actual files recovered and their sizes.

Also shown are the timings in milliseconds of the various stages of
recovery: index retrieval, translog replay, and index start time.

Note that the above listing indicates that the recovery is in stage
"done". All recoveries, whether on-going or complete, are kept in
cluster state and may be reported on at any time. Setting
"active\_only=true" will cause only on-going recoveries to be reported.

Here is a complete list of options:

+-----------------+---------------------------------------------------------------+
| ``detailed``    | Display a detailed view. This is primarily useful for viewing |
|                 | the recovery of physical index files. Default: false.         |
+-----------------+---------------------------------------------------------------+
| ``active_only`` | Display only those recoveries that are currently on-going.    |
|                 | Default: false.                                               |
+-----------------+---------------------------------------------------------------+

Description of output fields:

+---------------------------+---------------------------------------------------------------+
| ``id``                    | Shard ID                                                      |
+---------------------------+---------------------------------------------------------------+
| ``type``                  | Recovery type:                                                |
|                           |                                                               |
|                           | -  gateway                                                    |
|                           |                                                               |
|                           | -  snapshot                                                   |
|                           |                                                               |
|                           | -  replica                                                    |
|                           |                                                               |
|                           | -  relocating                                                 |
+---------------------------+---------------------------------------------------------------+
| ``stage``                 | Recovery stage:                                               |
|                           |                                                               |
|                           | -  init: Recovery has not started                             |
|                           |                                                               |
|                           | -  index: Reading index meta-data and copying bytes from      |
|                           |    source to destination                                      |
|                           |                                                               |
|                           | -  start: Starting the engine; opening the index for use      |
|                           |                                                               |
|                           | -  translog: Replaying transaction log                        |
|                           |                                                               |
|                           | -  finalize: Cleanup                                          |
|                           |                                                               |
|                           | -  done: Complete                                             |
+---------------------------+---------------------------------------------------------------+
| ``primary``               | True if shard is primary, false otherwise                     |
+---------------------------+---------------------------------------------------------------+
| ``start_time``            | Timestamp of recovery start                                   |
+---------------------------+---------------------------------------------------------------+
| ``stop_time``             | Timestamp of recovery finish                                  |
+---------------------------+---------------------------------------------------------------+
| ``total_time_in_millis``  | Total time to recover shard in milliseconds                   |
+---------------------------+---------------------------------------------------------------+
| ``source``                | Recovery source:                                              |
|                           |                                                               |
|                           | -  repository description if recovery is from a snapshot      |
|                           |                                                               |
|                           | -  description of source node otherwise                       |
+---------------------------+---------------------------------------------------------------+
| ``target``                | Destination node                                              |
+---------------------------+---------------------------------------------------------------+
| ``index``                 | Statistics about physical index recovery                      |
+---------------------------+---------------------------------------------------------------+
| ``translog``              | Statistics about translog recovery                            |
+---------------------------+---------------------------------------------------------------+
| ``start``                 | Statistics about time to open and start the index             |
+---------------------------+---------------------------------------------------------------+

Clear Cache
===========

The clear cache API allows clearing either all caches or specific
caches associated with one or more indices.

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/_cache/clear'

The API, by default, will clear all caches. Specific caches can be
cleared explicitly by setting ``filter``, ``fielddata``,
``query_cache``, or ``id_cache`` to ``true``.

All caches relating to a specific field(s) can also be cleared by
specifying ``fields`` parameter with a comma delimited list of the
relevant fields.
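
For example, a sketch of clearing only the filter cache, and then only the
caches related to specific (illustrative) fields:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/_cache/clear?filter=true'

    $ curl -XPOST 'http://localhost:9200/twitter/_cache/clear?fields=user,location'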

**Multi Index**

The clear cache API can be applied to more than one index with a single
call, or even on ``_all`` the indices.

.. code:: js

    $ curl -XPOST 'http://localhost:9200/kimchy,elasticsearch/_cache/clear'

    $ curl -XPOST 'http://localhost:9200/_cache/clear'

    **Note**

    The ``filter`` cache is not cleared immediately but is scheduled to
    be cleared within 60 seconds.

Flush
=====

The flush API allows flushing one or more indices through an API. The
flush process of an index basically frees memory from the index by
flushing data to the index storage and clearing the internal
`transaction log <#index-modules-translog>`__. By default, Elasticsearch
uses memory heuristics in order to automatically trigger flush
operations as required in order to clear memory.

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/_flush'

**Request Parameters**

The flush API accepts the following request parameters:

+---------------------+---------------------------------------------------------------+
| ``wait_if_ongoing`` | If set to ``true`` the flush operation will block until the   |
|                     | flush can be executed if another flush operation is already   |
|                     | executing. The default is ``false`` and will cause an         |
|                     | exception to be thrown on the shard level if another flush    |
|                     | operation is already running.                                 |
+---------------------+---------------------------------------------------------------+
| ``full``            | If set to ``true`` a new index writer is created and settings |
|                     | that have been changed related to the index writer will be    |
|                     | refreshed. Note: if a full flush is required for a setting to |
|                     | take effect this will be part of the settings update process  |
|                     | and is not required to be executed by the user. (This setting |
|                     | can be considered as internal)                                |
+---------------------+---------------------------------------------------------------+
| ``force``           | Whether a flush should be forced even if it is not            |
|                     | necessarily needed ie. if no changes will be committed to the |
|                     | index. This is useful if transaction log IDs should be        |
|                     | incremented even if no uncommitted changes are present. (This |
|                     | setting can be considered as internal)                        |
+---------------------+---------------------------------------------------------------+
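
For example, a sketch of a flush that waits for an already running flush
instead of failing:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/_flush?wait_if_ongoing=true'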

**Multi Index**

The flush API can be applied to more than one index with a single call,
or even on ``_all`` the indices.

.. code:: js

    $ curl -XPOST 'http://localhost:9200/kimchy,elasticsearch/_flush'

    $ curl -XPOST 'http://localhost:9200/_flush'

Refresh
=======

The refresh API allows explicitly refreshing one or more indices,
making all operations performed since the last refresh available for
search.
The (near) real-time capabilities depend on the index engine used. For
example, the internal one requires refresh to be called, but by default
a refresh is scheduled periodically.

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/_refresh'

**Multi Index**

The refresh API can be applied to more than one index with a single
call, or even on ``_all`` the indices.

.. code:: js

    $ curl -XPOST 'http://localhost:9200/kimchy,elasticsearch/_refresh'

    $ curl -XPOST 'http://localhost:9200/_refresh'

Optimize
========

The optimize API allows optimizing one or more indices through an API.
The optimize process basically optimizes the index for faster search
operations (and relates to the number of segments a Lucene index holds
within each shard). The optimize operation allows reducing the number
of segments by merging them.

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/_optimize'

**Request Parameters**

The optimize API accepts the following request parameters:

+--------------------------+---------------------------------------------------------------+
| ``max_num_segments``     | The number of segments to optimize to. To fully optimize the  |
|                          | index, set it to ``1``. Defaults to simply checking if a      |
|                          | merge needs to execute, and if so, executes it.               |
+--------------------------+---------------------------------------------------------------+
| ``only_expunge_deletes`` | Should the optimize process only expunge segments with        |
|                          | deletes in it. In Lucene, a document is not deleted from a    |
|                          | segment, just marked as deleted. During a merge process of    |
|                          | segments, a new segment is created that does not have those   |
|                          | deletes. This flag allows to only merge segments that have    |
|                          | deletes. Defaults to ``false``. Note that this won’t override |
|                          | the ``index.merge.policy.expunge_deletes_allowed`` threshold. |
+--------------------------+---------------------------------------------------------------+
| ``flush``                | Should a flush be performed after the optimize. Defaults to   |
|                          | ``true``.                                                     |
+--------------------------+---------------------------------------------------------------+
| ``wait_for_merge``       | Should the request wait for the merge to end. Defaults to     |
|                          | ``true``. Note, a merge can potentially be a very heavy       |
|                          | operation, so it might make sense to run it set to ``false``. |
+--------------------------+---------------------------------------------------------------+
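
For example, a sketch of fully optimizing an index down to a single segment
without waiting for the merge to finish:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/twitter/_optimize?max_num_segments=1&wait_for_merge=false'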

**Multi Index**

The optimize API can be applied to more than one index with a single
call, or even on ``_all`` the indices.

.. code:: js

    $ curl -XPOST 'http://localhost:9200/kimchy,elasticsearch/_optimize'

    $ curl -XPOST 'http://localhost:9200/_optimize'

Upgrade
=======

The upgrade API allows upgrading one or more indices to the latest
format through an API. The upgrade process converts any segments
written with previous formats.

**Start an upgrade**

.. code:: sh

    $ curl -XPOST 'http://localhost:9200/twitter/_upgrade'

    **Note**

    Upgrading is an I/O intensive operation, and is limited to
    processing a single shard per node at a time. It also is not allowed
    to run at the same time as optimize.

**Request Parameters**

The ``upgrade`` API accepts the following request parameters:

+-------------------------+---------------------------------------------------------------+
| ``wait_for_completion`` | Should the request wait for the upgrade to complete. Defaults |
|                         | to ``false``.                                                 |
+-------------------------+---------------------------------------------------------------+
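
For example, a sketch of starting an upgrade and waiting for it to
complete:

.. code:: sh

    $ curl -XPOST 'http://localhost:9200/twitter/_upgrade?wait_for_completion=true'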

**Check upgrade status**

Use a ``GET`` request to monitor how much of an index is upgraded. This
can also be used prior to starting an upgrade to identify which indices
you want to upgrade at the same time.

.. code:: sh

    curl 'http://localhost:9200/twitter/_upgrade?human'

.. code:: js

    {
       "twitter": {
          "size": "21gb",
          "size_in_bytes": "21000000000",
          "size_to_upgrade": "10gb",
          "size_to_upgrade_in_bytes": "10000000000"
       }
    }

**Introduction**

JSON is great… for computers. Even if it’s pretty-printed, trying to
find relationships in the data is tedious. Human eyes, especially when
looking at an ssh terminal, need compact and aligned text. The cat API
aims to meet this need.

All the cat commands accept a query string parameter ``help`` to see all
the headers and info they provide, and the ``/_cat`` command alone lists
all the available commands.
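
For example, to list the available commands (output omitted here):

.. code:: shell

    % curl 'localhost:9200/_cat'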

**Common parameters**

**Verbose**

Each of the commands accepts a query string parameter ``v`` to turn on
verbose output.

.. code:: shell

    % curl 'localhost:9200/_cat/master?v'
    id                     ip        node
    EGtKWZlWQYWDmX29fUnp3Q 127.0.0.1 Grey, Sara

**Help**

Each of the commands accepts a query string parameter ``help`` which
will output its available columns.

.. code:: shell

    % curl 'localhost:9200/_cat/master?help'
    id   | node id
    ip   | node transport ip address
    node | node name

**Headers**

Each of the commands accepts a query string parameter ``h`` which forces
only those columns to appear.

.. code:: shell

    % curl 'n1:9200/_cat/nodes?h=ip,port,heapPercent,name'
    192.168.56.40 9300 40.3 Captain Universe
    192.168.56.20 9300 15.3 Kaluu
    192.168.56.50 9300 17.0 Yellowjacket
    192.168.56.10 9300 12.3 Remy LeBeau
    192.168.56.30 9300 43.9 Ramsey, Doug

**Numeric formats**

Many commands provide a few types of numeric output, either a byte value
or a time value. By default, these types are human-formatted, for
example, ``3.5mb`` instead of ``3763212``. The human-formatted values
are not sortable numerically, so in order to operate on these values
where order is important, you can switch to a machine-friendly format.

Say you want to find the largest index in your cluster (storage used by
all the shards, not number of documents). The ``/_cat/indices`` API is
ideal. We only need to tweak two things. First, we want to turn off
human mode. We’ll use a byte-level resolution. Then we’ll pipe our
output into ``sort`` using the appropriate column, which in this case is
the eighth one.

.. code:: shell

    % curl '192.168.56.10:9200/_cat/indices?bytes=b' | sort -rnk8
    green wiki2 3 0 10000   0 105274918 105274918
    green wiki1 3 0 10000 413 103776272 103776272
    green foo   1 0   227   0   2065131   2065131

cat aliases
===========

``aliases`` shows information about currently configured aliases to
indices, including filter and routing information.

.. code:: shell

    % curl '192.168.56.10:9200/_cat/aliases?v'
    alias  index filter indexRouting searchRouting
    alias2 test1 *      -            -
    alias4 test1 -      2            1,2
    alias1 test1 -      -            -
    alias3 test1 -      1            1

The output shows that ``alias2`` has a filter configured, and that
``alias3`` and ``alias4`` have specific routing configurations.

If you only want to get information about a single alias, you can
specify the alias in the URL, for example ``/_cat/aliases/alias1``.
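
Continuing the sample data above, the output would then be limited to
that alias (illustrative):

.. code:: shell

    % curl '192.168.56.10:9200/_cat/aliases/alias1?v'
    alias  index filter indexRouting searchRouting
    alias1 test1 -      -            -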

cat allocation
==============

``allocation`` provides a snapshot of how shards are allocated across
the cluster and the state of disk usage.

.. code:: shell

    % curl '192.168.56.10:9200/_cat/allocation?v'
    shards diskUsed diskAvail diskRatio ip            node
         1    5.6gb    72.2gb      7.8% 192.168.56.10 Jarella
         1    5.6gb    72.2gb      7.8% 192.168.56.30 Solarr
         1    5.5gb    72.3gb      7.6% 192.168.56.20 Adam II

Here we can see that each node has been allocated a single shard and
that they’re all using about the same amount of space.

cat count
=========

``count`` provides quick access to the document count of the entire
cluster, or individual indices.

.. code:: shell

    % curl 192.168.56.10:9200/_cat/indices
    green wiki1 3 0 10000 331 168.5mb 168.5mb
    green wiki2 3 0   428   0     8mb     8mb
    % curl 192.168.56.10:9200/_cat/count
    1384314124582 19:42:04 10428
    % curl 192.168.56.10:9200/_cat/count/wiki2
    1384314139815 19:42:19 428

cat fielddata
=============

``fielddata`` shows information about currently loaded fielddata on a
per-node basis.

.. code:: shell

    % curl '192.168.56.10:9200/_cat/fielddata?v'
    id                     host    ip            node          total   body    text
    c223lARiSGeezlbrcugAYQ myhost1 10.20.100.200 Jessica Jones 385.6kb 159.8kb 225.7kb
    waPCbitNQaCL6xC8VxjAwg myhost2 10.20.100.201 Adversary     435.2kb 159.8kb 275.3kb
    yaDkp-G3R0q1AJ-HUEvkSQ myhost3 10.20.100.202 Microchip     284.6kb 109.2kb 175.3kb

Fields can be specified either as a query parameter, or in the URL path:

.. code:: shell

    % curl '192.168.56.10:9200/_cat/fielddata?v&fields=body'
    id                     host    ip            node          total   body
    c223lARiSGeezlbrcugAYQ myhost1 10.20.100.200 Jessica Jones 385.6kb 159.8kb
    waPCbitNQaCL6xC8VxjAwg myhost2 10.20.100.201 Adversary     435.2kb 159.8kb
    yaDkp-G3R0q1AJ-HUEvkSQ myhost3 10.20.100.202 Microchip     284.6kb 109.2kb

    % curl '192.168.56.10:9200/_cat/fielddata/body,text?v'
    id                     host    ip            node          total   body    text
    c223lARiSGeezlbrcugAYQ myhost1 10.20.100.200 Jessica Jones 385.6kb 159.8kb 225.7kb
    waPCbitNQaCL6xC8VxjAwg myhost2 10.20.100.201 Adversary     435.2kb 159.8kb 275.3kb
    yaDkp-G3R0q1AJ-HUEvkSQ myhost3 10.20.100.202 Microchip     284.6kb 109.2kb 175.3kb

The output shows the total fielddata and then the individual fielddata
for the ``body`` and ``text`` fields.

cat health
==========

``health`` is a terse, one-line representation of the same information
from ``/_cluster/health``. It has one option ``ts`` to disable the
timestamping.

.. code:: shell

    % curl 192.168.56.10:9200/_cat/health
    1384308967 18:16:07 foo green 3 3 3 3 0 0 0
    % curl '192.168.56.10:9200/_cat/health?v&ts=0'
    cluster status nodeTotal nodeData shards pri relo init unassign
    foo     green          3        3      3   3    0    0        0

A common use of this command is to verify the health is consistent
across nodes:

.. code:: shell

    % pssh -i -h list.of.cluster.hosts curl -s localhost:9200/_cat/health
    [1] 20:20:52 [SUCCESS] es3.vm
    1384309218 18:20:18 foo green 3 3 3 3 0 0 0
    [2] 20:20:52 [SUCCESS] es1.vm
    1384309218 18:20:18 foo green 3 3 3 3 0 0 0
    [3] 20:20:52 [SUCCESS] es2.vm
    1384309218 18:20:18 foo green 3 3 3 3 0 0 0

A less obvious use is to track recovery of a large cluster over time.
With enough shards, starting a cluster, or even recovering after losing
a node, can take time (depending on your network & disk). A way to track
its progress is by using this command in a delayed loop:

.. code:: shell

    % while true; do curl 192.168.56.10:9200/_cat/health; sleep 120; done
    1384309446 18:24:06 foo red 3 3 20 20 0 0 1812
    1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870
    1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492
    1384309806 18:30:06 foo green 3 3 1832 916 4 0 0
    ^C

In this scenario, we can tell that recovery took roughly four minutes.
If this were going on for hours, we would be able to watch the
``UNASSIGNED`` shards drop precipitously. If that number remained
static, we would have an idea that there is a problem.

**Why the timestamp?**

You typically use the ``health`` command when a cluster is
malfunctioning. During this period, it’s extremely important to
correlate activities across log files, alerting systems, etc.

There are two outputs. The ``HH:MM:SS`` output is simply for quick human
consumption. The epoch time retains more information, including date,
and is machine sortable if your recovery spans days.
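
If you need to turn an epoch value back into a readable date, something
like GNU ``date`` does the trick (a sketch; BSD/macOS ``date`` uses
``-r`` instead of ``-d``):

.. code:: shell

    % date -ud @1384308967
    Wed Nov 13 02:16:07 UTC 2013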

cat indices
===========

The ``indices`` command provides a cross-section of each index. This
information **spans nodes**.

.. code:: shell

    % curl 'localhost:9200/_cat/indices/twi*?v'
    health index    pri rep docs.count docs.deleted store.size pri.store.size
    green  twitter    5   1      11434            0       64mb           32mb
    green  twitter2   2   0       2030            0      5.8mb          5.8mb

We can tell quickly how many shards make up an index, the number of
docs, deleted docs, primary store size, and total store size (all shards
including replicas).

**Primaries**

By default, the index stats are shown for all of an index’s shards,
including replicas. A ``pri`` flag can be supplied to view the relevant
stats in the context of only the primaries, as shown in the sketch
below.
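
For instance, an illustrative sketch reusing the ``twitter`` index from
above, where primary-only variants appear next to the selected stats
columns:

.. code:: shell

    % curl 'localhost:9200/_cat/indices/twitter?pri&v&h=health,index,docs.count,store.size'
    health index   docs.count store.size pri.store.size
    green  twitter      11434       64mb           32mb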

**Examples**

Which indices are yellow?

.. code:: shell

    % curl localhost:9200/_cat/indices | grep ^yell
    yellow wiki     2 1  6401 1115 151.4mb 151.4mb
    yellow twitter  5 1 11434    0    32mb    32mb

What’s my largest index by disk usage not including replicas?

.. code:: shell

    % curl 'localhost:9200/_cat/indices?bytes=b' | sort -rnk7
    green wiki     2 0  6401 1115 158843725 158843725
    green twitter  5 1 11434    0  67155614  33577857
    green twitter2 2 0  2030    0   6125085   6125085

How many merge operations have the shards for the ``wiki`` index completed?

.. code:: shell

    % curl 'localhost:9200/_cat/indices/wiki?pri&v&h=health,index,prirep,docs.count,mt'
    health index docs.count mt pri.mt
    green  wiki        9646 16     16

How much memory is used per index?

.. code:: shell

    % curl 'localhost:9200/_cat/indices?v&h=i,tm'
    i     tm
    wiki  8.1gb
    test  30.5kb
    user  1.9mb

cat master
==========

``master`` doesn’t have any extra options. It simply displays the
master’s node ID, bound IP address, and node name.

.. code:: shell

    % curl 'localhost:9200/_cat/master?v'
    id                     ip            node
    Ntgn2DcuTjGuXlhKDUD4vA 192.168.56.30 Solarr

This information is also available via the ``nodes`` command, but this
is slightly shorter when all you want to do, for example, is verify all
nodes agree on the master:

.. code:: shell

    % pssh -i -h list.of.cluster.hosts curl -s localhost:9200/_cat/master
    [1] 19:16:37 [SUCCESS] es3.vm
    Ntgn2DcuTjGuXlhKDUD4vA 192.168.56.30 Solarr
    [2] 19:16:37 [SUCCESS] es2.vm
    Ntgn2DcuTjGuXlhKDUD4vA 192.168.56.30 Solarr
    [3] 19:16:37 [SUCCESS] es1.vm
    Ntgn2DcuTjGuXlhKDUD4vA 192.168.56.30 Solarr

cat nodes
=========

The ``nodes`` command shows the cluster topology.

.. code:: sh

    % curl 192.168.56.10:9200/_cat/nodes
    SP4H 4727 192.168.56.30 9300 1.4.0 1.8.0_25 72.1gb 35.4 93.9mb 79 239.1mb 0.45 3.4h d m Boneyard
    _uhJ 5134 192.168.56.10 9300 1.4.0 1.8.0_25 72.1gb 33.3 93.9mb 85 239.1mb 0.06 3.4h d * Athena
    HfDp 4562 192.168.56.20 9300 1.4.0 1.8.0_25 72.2gb 74.5 93.9mb 83 239.1mb 0.12 3.4h d m Zarek

The first few columns tell you where your nodes live. For sanity they
also tell you what version of ES and the JVM each one runs.

.. code:: sh

    nodeId pid  ip            port version jdk
    u2PZ   4234 192.168.56.30 9300 1.4.0   1.8.0_25
    URzf   5443 192.168.56.10 9300 1.4.0   1.8.0_25
    ActN   3806 192.168.56.20 9300 1.4.0   1.8.0_25

The next few give a picture of your heap, memory, and load.

.. code:: shell

    diskAvail heapPercent heapMax ramPercent  ramMax load
       72.1gb        31.3  93.9mb         81 239.1mb 0.24
       72.1gb        19.6  93.9mb         82 239.1mb 0.05
       72.2gb        64.9  93.9mb         84 239.1mb 0.12

The last columns provide ancillary information that can often be useful
when looking at the cluster as a whole, particularly a large one. How
many master-eligible nodes do I have? How many client nodes? It looks
like someone restarted a node recently; which one was it?

.. code:: shell

    uptime data/client master name
      3.5h d           m      Boneyard
      3.5h d           *      Athena
      3.5h d           m      Zarek

**Columns**

Below is an exhaustive list of the existing headers that can be passed
to ``nodes?h=`` to retrieve the relevant details in ordered columns. If
no headers are specified, then those marked to Appear by Default will
appear. If any header is specified, then the defaults are not used.

Aliases can be used in place of the full header name for brevity.
Columns appear in the order that they are listed below unless a
different order is specified (e.g., ``h=pid,id`` versus ``h=id,pid``).

When specifying headers, the headers are not placed in the output by
default. To have the headers appear in the output, use verbose mode
(``v``). The header name will match the supplied value (e.g., ``pid``
versus ``p``). For example:

.. code:: sh

    % curl '192.168.56.10:9200/_cat/nodes?v&h=id,ip,port,v,m'
    id   ip            port version m
    pLSN 192.168.56.30 9300 1.4.0   m
    k0zy 192.168.56.10 9300 1.4.0   m
    6Tyi 192.168.56.20 9300 1.4.0   *
    % curl 192.168.56.10:9200/_cat/nodes?h=id,ip,port,v,m
    pLSN 192.168.56.30 9300 1.4.0 m
    k0zy 192.168.56.10 9300 1.4.0 m
    6Tyi 192.168.56.20 9300 1.4.0 *

+----------------+----------------+----------------+----------------+----------------+
| Header         | Alias          | Appear by      | Description    | Example        |
|                |                | Default        |                |                |
+================+================+================+================+================+
| ``id``         | ``nodeId``     | No             | Unique node ID | k0zy           |
+----------------+----------------+----------------+----------------+----------------+
| ``pid``        | ``p``          | No             | Process ID     | 13061          |
+----------------+----------------+----------------+----------------+----------------+
| ``host``       | ``h``          | Yes            | Host name      | n1             |
+----------------+----------------+----------------+----------------+----------------+
| ``ip``         | ``i``          | Yes            | IP address     | 127.0.1.1      |
+----------------+----------------+----------------+----------------+----------------+
| ``port``       | ``po``         | No             | Bound          | 9300           |
|                |                |                | transport port |                |
+----------------+----------------+----------------+----------------+----------------+
| ``version``    | ``v``          | No             | Elasticsearch  | 1.4.0          |
|                |                |                | version        |                |
+----------------+----------------+----------------+----------------+----------------+
| ``build``      | ``b``          | No             | Elasticsearch  | 5c03844        |
|                |                |                | Build hash     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``jdk``        | ``j``          | No             | Running Java   | 1.8.0          |
|                |                |                | version        |                |
+----------------+----------------+----------------+----------------+----------------+
| ``disk.avail`` | ``d``,         | No             | Available disk | 1.8gb          |
|                | ``disk``,      |                | space          |                |
|                | ``diskAvail``  |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``heap.current | ``hc``,        | No             | Used heap      | 311.2mb        |
| ``             | ``heapCurrent` |                |                |                |
|                | `              |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``heap.percent | ``hp``,        | Yes            | Used heap      | 7              |
| ``             | ``heapPercent` |                | percentage     |                |
|                | `              |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``heap.max``   | ``hm``,        | No             | Maximum        | 1015.6mb       |
|                | ``heapMax``    |                | configured     |                |
|                |                |                | heap           |                |
+----------------+----------------+----------------+----------------+----------------+
| ``ram.current` | ``rc``,        | No             | Used total     | 513.4mb        |
| `              | ``ramCurrent`` |                | memory         |                |
+----------------+----------------+----------------+----------------+----------------+
| ``ram.percent` | ``rp``,        | Yes            | Used total     | 47             |
| `              | ``ramPercent`` |                | memory         |                |
|                |                |                | percentage     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``ram.max``    | ``rm``,        | No             | Total memory   | 2.9gb          |
|                | ``ramMax``     |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``file_desc.cu | ``fdc``,       | No             | Used file      | 123            |
| rrent``        | ``fileDescript |                | descriptors    |                |
|                | orCurrent``    |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``file_desc.pe | ``fdp``,       | Yes            | Used file      | 1              |
| rcent``        | ``fileDescript |                | descriptors    |                |
|                | orPercent``    |                | percentage     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``file_desc.ma | ``fdm``,       | No             | Maximum number | 1024           |
| x``            | ``fileDescript |                | of file        |                |
|                | orMax``        |                | descriptors    |                |
+----------------+----------------+----------------+----------------+----------------+
| ``load``       | ``l``          | No             | Most recent    | 0.22           |
|                |                |                | load average   |                |
+----------------+----------------+----------------+----------------+----------------+
| ``uptime``     | ``u``          | No             | Node uptime    | 17.3m          |
+----------------+----------------+----------------+----------------+----------------+
| ``node.role``  | ``r``,         | Yes            | Data node (d); | d              |
|                | ``role``,      |                | Client node    |                |
|                | ``dc``,        |                | (c)            |                |
|                | ``nodeRole``   |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``master``     | ``m``          | Yes            | Current master | m              |
|                |                |                | (\*); master   |                |
|                |                |                | eligible (m)   |                |
+----------------+----------------+----------------+----------------+----------------+
| ``name``       | ``n``          | Yes            | Node name      | Venom          |
+----------------+----------------+----------------+----------------+----------------+
| ``completion.s | ``cs``,        | No             | Size of        | 0b             |
| ize``          | ``completionSi |                | completion     |                |
|                | ze``           |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``fielddata.me | ``fm``,        | No             | Used fielddata | 0b             |
| mory_size``    | ``fielddataMem |                | cache memory   |                |
|                | ory``          |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``fielddata.ev | ``fe``,        | No             | Fielddata      | 0              |
| ictions``      | ``fielddataEvi |                | cache          |                |
|                | ctions``       |                | evictions      |                |
+----------------+----------------+----------------+----------------+----------------+
| ``filter_cache | ``fcm``,       | No             | Used filter    | 0b             |
| .memory_size`` | ``filterCacheM |                | cache memory   |                |
|                | emory``        |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``filter_cache | ``fce``,       | No             | Filter cache   | 0              |
| .evictions``   | ``filterCacheE |                | evictions      |                |
|                | victions``     |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``flush.total` | ``ft``,        | No             | Number of      | 1              |
| `              | ``flushTotal`` |                | flushes        |                |
+----------------+----------------+----------------+----------------+----------------+
| ``flush.total_ | ``ftt``,       | No             | Time spent in  | 1              |
| time``         | ``flushTotalTi |                | flush          |                |
|                | me``           |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``get.current` | ``gc``,        | No             | Number of      | 0              |
| `              | ``getCurrent`` |                | current get    |                |
|                |                |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``get.time``   | ``gti``,       | No             | Time spent in  | 14ms           |
|                | ``getTime``    |                | get            |                |
+----------------+----------------+----------------+----------------+----------------+
| ``get.total``  | ``gto``,       | No             | Number of get  | 2              |
|                | ``getTotal``   |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``get.exists_t | ``geti``,      | No             | Time spent in  | 14ms           |
| ime``          | ``getExistsTim |                | successful     |                |
|                | e``            |                | gets           |                |
+----------------+----------------+----------------+----------------+----------------+
| ``get.exists_t | ``geto``,      | No             | Number of      | 2              |
| otal``         | ``getExistsTot |                | successful get |                |
|                | al``           |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``get.missing_ | ``gmti``,      | No             | Time spent in  | 0s             |
| time``         | ``getMissingTi |                | failed gets    |                |
|                | me``           |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``get.missing_ | ``gmto``,      | No             | Number of      | 1              |
| total``        | ``getMissingTo |                | failed get     |                |
|                | tal``          |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``id_cache.mem | ``im``,        | No             | Used ID cache  | 216b           |
| ory_size``     | ``idCacheMemor |                | memory         |                |
|                | y``            |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``indexing.del | ``idc``,       | No             | Number of      | 0              |
| ete_current``  | ``indexingDele |                | current        |                |
|                | teCurrent``    |                | deletion       |                |
|                |                |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``indexing.del | ``idti``,      | No             | Time spent in  | 2ms            |
| ete_time``     | ``indexingDele |                | deletions      |                |
|                | teTime``       |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``indexing.del | ``idto``,      | No             | Number of      | 2              |
| ete_total``    | ``indexingDele |                | deletion       |                |
|                | teTotal``      |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``indexing.ind | ``iic``,       | No             | Number of      | 0              |
| ex_current``   | ``indexingInde |                | current        |                |
|                | xCurrent``     |                | indexing       |                |
|                |                |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``indexing.ind | ``iiti``,      | No             | Time spent in  | 134ms          |
| ex_time``      | ``indexingInde |                | indexing       |                |
|                | xTime``        |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``indexing.ind | ``iito``,      | No             | Number of      | 1              |
| ex_total``     | ``indexingInde |                | indexing       |                |
|                | xTotal``       |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``merges.curre | ``mc``,        | No             | Number of      | 0              |
| nt``           | ``mergesCurren |                | current merge  |                |
|                | t``            |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``merges.curre | ``mcd``,       | No             | Number of      | 0              |
| nt_docs``      | ``mergesCurren |                | current        |                |
|                | tDocs``        |                | merging        |                |
|                |                |                | documents      |                |
+----------------+----------------+----------------+----------------+----------------+
| ``merges.curre | ``mcs``,       | No             | Size of        | 0b             |
| nt_size``      | ``mergesCurren |                | current merges |                |
|                | tSize``        |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``merges.total | ``mt``,        | No             | Number of      | 0              |
| ``             | ``mergesTotal` |                | completed      |                |
|                | `              |                | merge          |                |
|                |                |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``merges.total | ``mtd``,       | No             | Number of      | 0              |
| _docs``        | ``mergesTotalD |                | merged         |                |
|                | ocs``          |                | documents      |                |
+----------------+----------------+----------------+----------------+----------------+
| ``merges.total | ``mts``,       | No             | Size of        | 0b             |
| _size``        | ``mergesTotalS |                | current merges |                |
|                | ize``          |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``merges.total | ``mtt``,       | No             | Time spent     | 0s             |
| _time``        | ``mergesTotalT |                | merging        |                |
|                | ime``          |                | documents      |                |
+----------------+----------------+----------------+----------------+----------------+
| ``percolate.cu | ``pc``,        | No             | Number of      | 0              |
| rrent``        | ``percolateCur |                | current        |                |
|                | rent``         |                | percolations   |                |
+----------------+----------------+----------------+----------------+----------------+
| ``percolate.me | ``pm``,        | No             | Memory used by | 0b             |
| mory_size``    | ``percolateMem |                | current        |                |
|                | ory``          |                | percolations   |                |
+----------------+----------------+----------------+----------------+----------------+
| ``percolate.qu | ``pq``,        | No             | Number of      | 0              |
| eries``        | ``percolateQue |                | registered     |                |
|                | ries``         |                | percolation    |                |
|                |                |                | queries        |                |
+----------------+----------------+----------------+----------------+----------------+
| ``percolate.ti | ``pti``,       | No             | Time spent     | 0s             |
| me``           | ``percolateTim |                | percolating    |                |
|                | e``            |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``percolate.to | ``pto``,       | No             | Total          | 0              |
| tal``          | ``percolateTot |                | percolations   |                |
|                | al``           |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``refresh.tota | ``rto``,       | No             | Number of      | 16             |
| l``            | ``refreshTotal |                | refreshes      |                |
|                | ``             |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``refresh.time | ``rti``,       | No             | Time spent in  | 91ms           |
| ``             | ``refreshTime` |                | refreshes      |                |
|                | `              |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``search.fetch | ``sfc``,       | No             | Current fetch  | 0              |
| _current``     | ``searchFetchC |                | phase          |                |
|                | urrent``       |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``search.fetch | ``sfti``,      | No             | Time spent in  | 37ms           |
| _time``        | ``searchFetchT |                | fetch phase    |                |
|                | ime``          |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``search.fetch | ``sfto``,      | No             | Number of      | 7              |
| _total``       | ``searchFetchT |                | fetch          |                |
|                | otal``         |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``search.open_ | ``so``,        | No             | Open search    | 0              |
| contexts``     | ``searchOpenCo |                | contexts       |                |
|                | ntexts``       |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``search.query | ``sqc``,       | No             | Current query  | 0              |
| _current``     | ``searchQueryC |                | phase          |                |
|                | urrent``       |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``search.query | ``sqti``,      | No             | Time spent in  | 43ms           |
| _time``        | ``searchQueryT |                | query phase    |                |
|                | ime``          |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``search.query | ``sqto``,      | No             | Number of      | 9              |
| _total``       | ``searchQueryT |                | query          |                |
|                | otal``         |                | operations     |                |
+----------------+----------------+----------------+----------------+----------------+
| ``segments.cou | ``sc``,        | No             | Number of      | 4              |
| nt``           | ``segmentsCoun |                | segments       |                |
|                | t``            |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``segments.mem | ``sm``,        | No             | Memory used by | 1.4kb          |
| ory``          | ``segmentsMemo |                | segments       |                |
|                | ry``           |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``segments.ind | ``siwm``,      | No             | Memory used by | 18mb           |
| ex_writer_memo | ``segmentsInde |                | index writer   |                |
| ry``           | xWriterMemory` |                |                |                |
|                | `              |                |                |                |
+----------------+----------------+----------------+----------------+----------------+
| ``segments.ind | ``siwmx``,     | No             | Maximum memory | 32mb           |
| ex_writer_max_ | ``segmentsInde |                | index writer   |                |
| memory``       | xWriterMaxMemo |                | may use before |                |
|                | ry``           |                | it must write  |                |
|                |                |                | buffered       |                |
|                |                |                | documents to a |                |
|                |                |                | new segment    |                |
+----------------+----------------+----------------+----------------+----------------+
| ``segments.ver | ``svmm``,      | No             | Memory used by | 1.0kb          |
| sion_map_memor | ``segmentsVers |                | version map    |                |
| y``            | ionMapMemory`` |                |                |                |
+----------------+----------------+----------------+----------------+----------------+

cat pending tasks
=================

``pending_tasks`` provides the same information as the
`/_cluster/pending_tasks <#cluster-pending>`__ API in a convenient
tabular format.

.. code:: shell

    % curl 'localhost:9200/_cat/pending_tasks?v'
    insertOrder timeInQueue priority source
           1685       855ms HIGH     update-mapping [foo][t]
           1686       843ms HIGH     update-mapping [foo][t]
           1693       753ms HIGH     refresh-mapping [foo][[t]]
           1688       816ms HIGH     update-mapping [foo][t]
           1689       802ms HIGH     update-mapping [foo][t]
           1690       787ms HIGH     update-mapping [foo][t]
           1691       773ms HIGH     update-mapping [foo][t]

cat plugins
===========

The ``plugins`` command provides a view per node of running plugins.
This information **spans nodes**.

.. code:: shell

    % curl 'localhost:9200/_cat/plugins?v'
    name    component       version        type isolation url
    Abraxas cloud-azure     2.1.0-SNAPSHOT j    x
    Abraxas lang-groovy     2.0.0          j    x
    Abraxas lang-javascript 2.0.0-SNAPSHOT j    x
    Abraxas marvel          NA             j/s  x         /_plugin/marvel/
    Abraxas lang-python     2.0.0-SNAPSHOT j    x
    Abraxas inquisitor      NA             s              /_plugin/inquisitor/
    Abraxas kopf            0.5.2          s              /_plugin/kopf/
    Abraxas segmentspy      NA             s              /_plugin/segmentspy/

We can tell quickly how many plugins per node we have and which
versions.

cat recovery
============

The ``recovery`` command is a view of index shard recoveries, both
on-going and previously completed. It is a more compact view of the JSON
`recovery <#indices-recovery>`__ API.

A recovery event occurs anytime an index shard moves to a different node
in the cluster. This can happen during a snapshot recovery, a change in
replication level, node failure, or on node startup. This last type is
called a local gateway recovery and is the normal way for shards to be
loaded from disk when a node starts up.

As an example, here is what the recovery state of a cluster may look
like when there are no shards in transit from one node to another:

.. code:: shell

    > curl -XGET 'localhost:9200/_cat/recovery?v'
    index shard time type    stage source target files percent bytes     percent
    wiki  0     73   gateway done  hostA  hostA  36    100.0%  24982806 100.0%
    wiki  1     245  gateway done  hostA  hostA  33    100.0%  24501912 100.0%
    wiki  2     230  gateway done  hostA  hostA  36    100.0%  30267222 100.0%

In the above case, the source and target nodes are the same because the
recovery type was gateway, i.e. the shards were read from local storage
on node startup.

Now let’s see what a live shard recovery looks like. We can observe one
by increasing the replica count of our index and bringing another node
online to host the replicas.

.. code:: shell

    > curl -XPUT 'localhost:9200/wiki/_settings' -d'{"number_of_replicas":1}'
    {"acknowledged":true}

    > curl -XGET 'localhost:9200/_cat/recovery?v'
    index shard time type    stage source target files percent bytes    percent
    wiki  0     1252 gateway done  hostA  hostA  4     100.0%  23638870 100.0%
    wiki  0     1672 replica index hostA  hostB  4     75.0%   23638870 48.8%
    wiki  1     1698 replica index hostA  hostB  4     75.0%   23348540 49.4%
    wiki  1     4812 gateway done  hostA  hostA  33    100.0%  24501912 100.0%
    wiki  2     1689 replica index hostA  hostB  4     75.0%   28681851 40.2%
    wiki  2     5317 gateway done  hostA  hostA  36    100.0%  30267222 100.0%

We can see in the above listing that our 3 initial shards are in various
stages of being replicated from one node to another. Notice that the
recovery type is shown as ``replica``. The files and bytes copied are
real-time measurements.

Finally, let’s see what a snapshot recovery looks like. Assuming I have
previously made a backup of my index, I can restore it using the
`snapshot and restore <#modules-snapshots>`__ API.

.. code:: shell

    > curl -XPOST 'localhost:9200/_snapshot/imdb/snapshot_2/_restore'
    {"acknowledged":true}
    > curl -XGET 'localhost:9200/_cat/recovery?v'
    index shard time type     stage repository snapshot files percent bytes percent
    imdb  0     1978 snapshot done  imdb       snap_1   79    8.0%    12086 9.0%
    imdb  1     2790 snapshot index imdb       snap_1   88    7.7%    11025 8.1%
    imdb  2     2790 snapshot index imdb       snap_1   85    0.0%    12072 0.0%
    imdb  3     2796 snapshot index imdb       snap_1   85    2.4%    12048 7.2%
    imdb  4     819  snapshot init  imdb       snap_1   0     0.0%    0     0.0%

cat thread pool
===============

The ``thread_pool`` command shows cluster-wide thread pool statistics
per node. By default the active, queue, and rejected statistics are
returned for the bulk, index, and search thread pools.

.. code:: shell

    % curl 192.168.56.10:9200/_cat/thread_pool
    host1 192.168.1.35 0 0 0 0 0 0 0 0 0
    host2 192.168.1.36 0 0 0 0 0 0 0 0 0

The first two columns contain the host and ip of a node.

.. code:: shell

    host      ip
    host1 192.168.1.35
    host2 192.168.1.36

The next three columns show the active, queue, and rejected statistics
for the bulk thread pool.

.. code:: shell

    bulk.active bulk.queue bulk.rejected
              0          0             0

The remaining columns show the active, queue, and rejected statistics of
the index and search thread pools, respectively.

Statistics for other thread pools can also be retrieved by using the
``h`` (header) parameter.

.. code:: shell

    % curl 'localhost:9200/_cat/thread_pool?v&h=id,host,suggest.active,suggest.rejected,suggest.completed'
    host      suggest.active suggest.rejected suggest.completed
    host1                  0                0                 0
    host2                  0                0                 0

Here the host column and the active, rejected, and completed statistics
for the suggest thread pool are displayed. The suggest thread pool isn’t
shown by default, so you always need to be specific about which
statistics you want to display.

**Available Thread Pools**

Currently available `thread pools <#modules-threadpool>`__:

+--------------------------+--------------------------+--------------------------+
| Thread Pool              | Alias                    | Description              |
+==========================+==========================+==========================+
| ``bulk``                 | ``b``                    | Thread pool used for     |
|                          |                          | `bulk <#docs-bulk>`__    |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``flush``                | ``f``                    | Thread pool used for     |
|                          |                          | `flush <#indices-flush>` |
|                          |                          | __                       |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``generic``              | ``ge``                   | Thread pool used for     |
|                          |                          | generic operations (e.g. |
|                          |                          | background node          |
|                          |                          | discovery)               |
+--------------------------+--------------------------+--------------------------+
| ``get``                  | ``g``                    | Thread pool used for     |
|                          |                          | `get <#docs-get>`__      |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``index``                | ``i``                    | Thread pool used for     |
|                          |                          | `index <#docs-index_>`__ |
|                          |                          | /`delete <#docs-delete>` |
|                          |                          | __                       |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``management``           | ``ma``                   | Thread pool used for     |
|                          |                          | management of            |
|                          |                          | Elasticsearch (e.g.      |
|                          |                          | cluster management)      |
+--------------------------+--------------------------+--------------------------+
| ``merge``                | ``m``                    | Thread pool used for     |
|                          |                          | `merge <#index-modules-m |
|                          |                          | erge>`__                 |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``optimize``             | ``o``                    | Thread pool used for     |
|                          |                          | `optimize <#indices-opti |
|                          |                          | mize>`__                 |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``percolate``            | ``p``                    | Thread pool used for     |
|                          |                          | `percolator <#search-per |
|                          |                          | colate>`__               |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``refresh``              | ``r``                    | Thread pool used for     |
|                          |                          | `refresh <#indices-refre |
|                          |                          | sh>`__                   |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``search``               | ``s``                    | Thread pool used for     |
|                          |                          | `search <#search-search> |
|                          |                          | `__/`count <#search-coun |
|                          |                          | t>`__                    |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``snapshot``             | ``sn``                   | Thread pool used for     |
|                          |                          | `snapshot <#modules-snap |
|                          |                          | shots>`__                |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``suggest``              | ``su``                   | Thread pool used for     |
|                          |                          | `suggester <#search-sugg |
|                          |                          | esters>`__               |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+
| ``warmer``               | ``w``                    | Thread pool used for     |
|                          |                          | `index                   |
|                          |                          | warm-up <#indices-warmer |
|                          |                          | s>`__                    |
|                          |                          | operations               |
+--------------------------+--------------------------+--------------------------+

The thread pool name (or alias) must be combined with a thread pool
field below to retrieve the requested information.

**Thread Pool Fields**

For each thread pool, you can load details about it by using the field
names in the table below, either using the full field name (e.g.
``bulk.active``) or its alias (e.g. ``sa`` is equivalent to
``search.active``).

+--------------------------+--------------------------+--------------------------+
| Field Name               | Alias                    | Description              |
+==========================+==========================+==========================+
| ``type``                 | ``t``                    | The current (\*) type of |
|                          |                          | thread pool (``cached``, |
|                          |                          | ``fixed`` or             |
|                          |                          | ``scaling``)             |
+--------------------------+--------------------------+--------------------------+
| ``active``               | ``a``                    | The number of active     |
|                          |                          | threads in the current   |
|                          |                          | thread pool              |
+--------------------------+--------------------------+--------------------------+
| ``size``                 | ``s``                    | The number of threads in |
|                          |                          | the current thread pool  |
+--------------------------+--------------------------+--------------------------+
| ``queue``                | ``q``                    | The number of tasks in   |
|                          |                          | the queue for the        |
|                          |                          | current thread pool      |
+--------------------------+--------------------------+--------------------------+
| ``queueSize``            | ``qs``                   | The maximum number of    |
|                          |                          | tasks in the queue for   |
|                          |                          | the current thread pool  |
+--------------------------+--------------------------+--------------------------+
| ``rejected``             | ``r``                    | The number of rejected   |
|                          |                          | threads in the current   |
|                          |                          | thread pool              |
+--------------------------+--------------------------+--------------------------+
| ``largest``              | ``l``                    | The highest number of    |
|                          |                          | active threads in the    |
|                          |                          | current thread pool      |
+--------------------------+--------------------------+--------------------------+
| ``completed``            | ``c``                    | The number of completed  |
|                          |                          | threads in the current   |
|                          |                          | thread pool              |
+--------------------------+--------------------------+--------------------------+
| ``min``                  | ``mi``                   | The configured minimum   |
|                          |                          | number of active threads |
|                          |                          | allowed in the current   |
|                          |                          | thread pool              |
+--------------------------+--------------------------+--------------------------+
| ``max``                  | ``ma``                   | The configured maximum   |
|                          |                          | number of active threads |
|                          |                          | allowed in the current   |
|                          |                          | thread pool              |
+--------------------------+--------------------------+--------------------------+
| ``keepAlive``            | ``k``                    | The configured keep      |
|                          |                          | alive time for threads   |
+--------------------------+--------------------------+--------------------------+
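
As a sketch of combining a pool alias with the field aliases above,
``sa``, ``sq``, and ``sr`` would select the active, queue, and rejected
statistics of the ``search`` pool (output illustrative):

.. code:: shell

    % curl 'localhost:9200/_cat/thread_pool?v&h=host,sa,sq,sr'
    host  sa sq sr
    host1  0  0  0
    host2  0  0  0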

**Other Fields**

In addition to details about each thread pool, it is also convenient to
get an understanding of where those thread pools reside. As such, you
can request other details like the ``ip`` of the responding node(s).

+--------------------------+--------------------------+--------------------------+
| Field Name               | Alias                    | Description              |
+==========================+==========================+==========================+
| ``id``                   | ``nodeId``               | The unique node ID       |
+--------------------------+--------------------------+--------------------------+
| ``pid``                  | ``p``                    | The process ID of the    |
|                          |                          | running node             |
+--------------------------+--------------------------+--------------------------+
| ``host``                 | ``h``                    | The hostname for the     |
|                          |                          | current node             |
+--------------------------+--------------------------+--------------------------+
| ``ip``                   | ``i``                    | The IP address for the   |
|                          |                          | current node             |
+--------------------------+--------------------------+--------------------------+
| ``port``                 | ``po``                   | The bound transport port |
|                          |                          | for the current node     |
+--------------------------+--------------------------+--------------------------+

cat shards
==========

The ``shards`` command is the detailed view of what nodes contain which
shards. It will tell you if it’s a primary or replica, the number of
docs, the bytes it takes on disk, and the node where it’s located.

Here we see a single index, with three primary shards and no replicas:

.. code:: shell

    % curl 192.168.56.20:9200/_cat/shards
    wiki1 0 p STARTED 3014 31.1mb 192.168.56.10 Stiletto
    wiki1 1 p STARTED 3013 29.6mb 192.168.56.30 Frankie Raye
    wiki1 2 p STARTED 3973 38.1mb 192.168.56.20 Commander Kraken

Index pattern
-------------

If you have many shards, you may wish to limit which indices show up in
the output. You can always do this with ``grep``, but you can save some
bandwidth by supplying an index pattern to the end.

.. code:: shell

    % curl 192.168.56.20:9200/_cat/shards/wiki2
    wiki2 0 p STARTED 197 3.2mb 192.168.56.10 Stiletto
    wiki2 1 p STARTED 205 5.9mb 192.168.56.30 Frankie Raye
    wiki2 2 p STARTED 275 7.8mb 192.168.56.20 Commander Kraken

Relocation
----------

Let’s say you’ve checked your health and you see two relocating shards.
Where are they from and where are they going?

.. code:: shell

    % curl 192.168.56.10:9200/_cat/health
    1384315316 20:01:56 foo green 3 3 12 6 2 0 0
    % curl 192.168.56.10:9200/_cat/shards | fgrep RELO
    wiki1 0 r RELOCATING 3014 31.1mb 192.168.56.20 Commander Kraken -> 192.168.56.30 Frankie Raye
    wiki1 1 r RELOCATING 3013 29.6mb 192.168.56.10 Stiletto -> 192.168.56.30 Frankie Raye

Shard states
------------

Before a shard can be used, it goes through an ``INITIALIZING`` state.
``shards`` can show you which ones.

.. code:: shell

    % curl -XPUT 192.168.56.20:9200/_settings -d'{"number_of_replicas":1}'
    {"acknowledged":true}
    % curl 192.168.56.20:9200/_cat/shards
    wiki1 0 p STARTED      3014 31.1mb 192.168.56.10 Stiletto
    wiki1 0 r INITIALIZING    0 14.3mb 192.168.56.30 Frankie Raye
    wiki1 1 p STARTED      3013 29.6mb 192.168.56.30 Frankie Raye
    wiki1 1 r INITIALIZING    0 13.1mb 192.168.56.20 Commander Kraken
    wiki1 2 r INITIALIZING    0   14mb 192.168.56.10 Stiletto
    wiki1 2 p STARTED      3973 38.1mb 192.168.56.20 Commander Kraken

If a shard cannot be assigned, for example because you’ve overallocated
the number of replicas for the number of nodes in the cluster, it will
remain ``UNASSIGNED``.

.. code:: shell

    % curl -XPUT 192.168.56.20:9200/_settings -d'{"number_of_replicas":3}'
    % curl 192.168.56.20:9200/_cat/health
    1384316325 20:18:45 foo yellow 3 3 9 3 0 0 3
    % curl 192.168.56.20:9200/_cat/shards
    wiki1 0 p STARTED    3014 31.1mb 192.168.56.10 Stiletto
    wiki1 0 r STARTED    3014 31.1mb 192.168.56.30 Frankie Raye
    wiki1 0 r STARTED    3014 31.1mb 192.168.56.20 Commander Kraken
    wiki1 0 r UNASSIGNED
    wiki1 1 r STARTED    3013 29.6mb 192.168.56.10 Stiletto
    wiki1 1 p STARTED    3013 29.6mb 192.168.56.30 Frankie Raye
    wiki1 1 r STARTED    3013 29.6mb 192.168.56.20 Commander Kraken
    wiki1 1 r UNASSIGNED
    wiki1 2 r STARTED    3973 38.1mb 192.168.56.10 Stiletto
    wiki1 2 r STARTED    3973 38.1mb 192.168.56.30 Frankie Raye
    wiki1 2 p STARTED    3973 38.1mb 192.168.56.20 Commander Kraken
    wiki1 2 r UNASSIGNED

**Node specification**

Most cluster level APIs allow you to specify which nodes to execute on
(for example, getting the node stats for a node). Nodes can be
identified in the APIs either using their internal node id, the node
name, the address, custom attributes, or just the ``_local`` node
receiving the request. For example, here are some sample nodes info
calls:

.. code:: js

    # Local
    curl localhost:9200/_nodes/_local
    # Address
    curl localhost:9200/_nodes/10.0.0.3,10.0.0.4
    curl localhost:9200/_nodes/10.0.0.*
    # Names
    curl localhost:9200/_nodes/node_name_goes_here
    curl localhost:9200/_nodes/node_name_goes_*
    # Attributes (set something like node.rack: 2 in the config)
    curl localhost:9200/_nodes/rack:2
    curl localhost:9200/_nodes/ra*:2
    curl localhost:9200/_nodes/ra*:2*

Cluster Health
==============

The cluster health API allows you to get a very simple status of the
health of the cluster.

.. code:: js

    $ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
    {
      "cluster_name" : "testcluster",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 2,
      "number_of_data_nodes" : 2,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    }

The API can also be executed against one or more indices to get the
health of just those indices:

.. code:: js

    $ curl -XGET 'http://localhost:9200/_cluster/health/test1,test2'

The cluster health status is: ``green``, ``yellow`` or ``red``. On the
shard level, a ``red`` status indicates that the specific shard is not
allocated in the cluster, ``yellow`` means that the primary shard is
allocated but replicas are not, and ``green`` means that all shards are
allocated. The index level status is controlled by the worst shard
status. The cluster status is controlled by the worst index status.

One of the main benefits of the API is the ability to wait until the
cluster reaches a certain high water-mark health level. For example, the
following will wait for 50 seconds for the cluster to reach the
``yellow`` level (if it reaches the ``green`` or ``yellow`` status
before 50 seconds elapse, it will return at that point):

.. code:: js

    $ curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=50s'

**Request Parameters**

The cluster health API accepts the following request parameters:

``level``
    Can be one of ``cluster``, ``indices`` or ``shards``. Controls the
    details level of the health information returned. Defaults to
    ``cluster``.

``wait_for_status``
    One of ``green``, ``yellow`` or ``red``. Will wait (until the
    timeout provided) until the status of the cluster changes to the one
    provided or better, i.e. ``green`` > ``yellow`` > ``red``. By
    default, will not wait for any status.

``wait_for_relocating_shards``
    A number controlling how many relocating shards to wait for.
    Usually this will be ``0``, to indicate that the request should
    wait until all relocations have finished. By default the request
    does not wait.

``wait_for_nodes``
    The request waits until the specified number ``N`` of nodes is
    available. It also accepts ``>=N``, ``<=N``, ``>N`` and ``<N``.
    Alternatively, it is possible to use ``ge(N)``, ``le(N)``, ``gt(N)``
    and ``lt(N)`` notation.

``timeout``
    A time based parameter controlling how long to wait if one of the
    wait\_for\_XXX parameters is provided. Defaults to ``30s``.
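
For instance, to wait until three nodes are available before returning
(a sketch; the node count and timeout values here are only
illustrative):

.. code:: js

    $ curl -XGET 'http://localhost:9200/_cluster/health?wait_for_nodes=3&timeout=30s'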

The following is an example of getting the cluster health at the
``shards`` level:

.. code:: js

    $ curl -XGET 'http://localhost:9200/_cluster/health/twitter?level=shards'

Cluster State
=============

The cluster state API allows you to get comprehensive state information
for the whole cluster.

.. code:: js

    $ curl -XGET 'http://localhost:9200/_cluster/state'

By default, the cluster state request is routed to the master node, to
ensure that the latest cluster state is returned. For debugging
purposes, you can retrieve the cluster state local to a particular node
by adding ``local=true`` to the query string.
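
For example, the following sketch returns the cluster state as seen by
the node that receives the request:

.. code:: js

    $ curl -XGET 'http://localhost:9200/_cluster/state?local=true'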

**Response Filters**

As the cluster state can grow (depending on the number of shards and
indices, your mapping, templates), it is possible to filter the cluster
state response by specifying the parts you want in the URL.

.. code:: js

    $ curl -XGET 'http://localhost:9200/_cluster/state/{metrics}/{indices}'

``metrics`` can be a comma-separated list of:

``version``
    Shows the cluster state version.

``master_node``
    Shows the elected ``master_node`` part of the response.

``nodes``
    Shows the ``nodes`` part of the response.

``routing_table``
    Shows the ``routing_table`` part of the response. If you supply a
    comma separated list of indices, the returned output will only
    contain the indices listed.

``metadata``
    Shows the ``metadata`` part of the response. If you supply a comma
    separated list of indices, the returned output will only contain the
    indices listed.

``blocks``
    Shows the ``blocks`` part of the response.

A couple of example calls:

.. code:: js

    # return only metadata and routing_table data for specified indices
    $ curl -XGET 'http://localhost:9200/_cluster/state/metadata,routing_table/foo,bar'

    # return everything for these two indices
    $ curl -XGET 'http://localhost:9200/_cluster/state/_all/foo,bar'

    # Return only blocks data
    $ curl -XGET 'http://localhost:9200/_cluster/state/blocks'

Cluster Stats
=============

The Cluster Stats API allows you to retrieve statistics from a
cluster-wide perspective. The API returns basic index metrics (shard
numbers, store
size, memory usage) and information about the current nodes that form
the cluster (number, roles, os, jvm versions, memory usage, cpu and
installed plugins).

.. code:: js

    curl -XGET 'http://localhost:9200/_cluster/stats?human&pretty'

Will return, for example:

.. code:: js

    {
       "cluster_name": "elasticsearch",
       "status": "green",
       "indices": {
          "count": 3,
          "shards": {
             "total": 35,
             "primaries": 15,
             "replication": 1.333333333333333,
             "index": {
                "shards": {
                   "min": 10,
                   "max": 15,
                   "avg": 11.66666666666666
                },
                "primaries": {
                   "min": 5,
                   "max": 5,
                   "avg": 5
                },
                "replication": {
                   "min": 1,
                   "max": 2,
                   "avg": 1.3333333333333333
                }
             }
          },
          "docs": {
             "count": 2,
             "deleted": 0
          },
          "store": {
             "size": "5.6kb",
             "size_in_bytes": 5770,
             "throttle_time": "0s",
             "throttle_time_in_millis": 0
          },
          "fielddata": {
             "memory_size": "0b",
             "memory_size_in_bytes": 0,
             "evictions": 0
          },
          "filter_cache": {
             "memory_size": "0b",
             "memory_size_in_bytes": 0,
             "evictions": 0
          },
          "id_cache": {
             "memory_size": "0b",
             "memory_size_in_bytes": 0
          },
          "completion": {
             "size": "0b",
             "size_in_bytes": 0
          },
          "segments": {
             "count": 2
          }
       },
       "nodes": {
          "count": {
             "total": 2,
             "master_only": 0,
             "data_only": 0,
             "master_data": 2,
             "client": 0
          },
          "versions": [
             "1.4.0"
          ],
          "os": {
             "available_processors": 4,
             "mem": {
                "total": "8gb",
                "total_in_bytes": 8589934592
             },
             "cpu": [
                {
                   "vendor": "Intel",
                   "model": "MacBookAir5,2",
                   "mhz": 2000,
                   "total_cores": 4,
                   "total_sockets": 4,
                   "cores_per_socket": 16,
                   "cache_size": "256b",
                   "cache_size_in_bytes": 256,
                   "count": 1
                }
             ]
          },
          "process": {
             "cpu": {
                "percent": 3
             },
             "open_file_descriptors": {
                "min": 200,
                "max": 346,
                "avg": 273
             }
          },
          "jvm": {
             "max_uptime": "24s",
             "max_uptime_in_millis": 24054,
             "version": [
                {
                   "version": "1.6.0_45",
                   "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
                   "vm_version": "20.45-b01-451",
                   "vm_vendor": "Apple Inc.",
                   "count": 2
                }
             ],
             "mem": {
                "heap_used": "38.3mb",
                "heap_used_in_bytes": 40237120,
                "heap_max": "1.9gb",
                "heap_max_in_bytes": 2130051072
             },
             "threads": 89
          },
          "fs":
             {
                "total": "232.9gb",
                "total_in_bytes": 250140434432,
                "free": "31.3gb",
                "free_in_bytes": 33705881600,
                "available": "31.1gb",
                "available_in_bytes": 33443737600,
                "disk_reads": 21202753,
                "disk_writes": 27028840,
                "disk_io_op": 48231593,
                "disk_read_size": "528gb",
                "disk_read_size_in_bytes": 566980806656,
                "disk_write_size": "617.9gb",
                "disk_write_size_in_bytes": 663525366784,
                "disk_io_size": "1145.9gb",
                "disk_io_size_in_bytes": 1230506173440
           },
          "plugins": [
             // all plugins installed on nodes
             {
                "name": "inquisitor",
                "description": "",
                "url": "/_plugin/inquisitor/",
                "jvm": false,
                "site": true
             }
          ]
       }
    }

Pending cluster tasks
=====================

The pending cluster tasks API returns a list of any cluster-level
changes (e.g. create index, update mapping, allocate or fail shard)
which have not yet been executed.

.. code:: js

    $ curl -XGET 'http://localhost:9200/_cluster/pending_tasks'

This will usually return an empty list, as cluster-level changes are
applied quickly. However, if there are tasks queued up, the output will
look something like this:

.. code:: js

    {
       "tasks": [
          {
             "insert_order": 101,
             "priority": "URGENT",
             "source": "create-index [foo_9], cause [api]",
             "time_in_queue_millis": 86,
             "time_in_queue": "86ms"
          },
          {
             "insert_order": 46,
             "priority": "HIGH",
             "source": "shard-started ([foo_2][1], node[tMTocMvQQgGCkj7QDHl3OA], [P], s[INITIALIZING]), reason [after recovery from gateway]",
             "time_in_queue_millis": 842,
             "time_in_queue": "842ms"
          },
          {
             "insert_order": 45,
             "priority": "HIGH",
             "source": "shard-started ([foo_2][0], node[tMTocMvQQgGCkj7QDHl3OA], [P], s[INITIALIZING]), reason [after recovery from gateway]",
             "time_in_queue_millis": 858,
             "time_in_queue": "858ms"
          }
      ]
    }

Cluster Reroute
===============

The reroute command allows you to explicitly execute a cluster reroute
allocation command, including specific commands. For example, a shard
can be moved from one node to another explicitly, an allocation can be
canceled, or an unassigned shard can be explicitly allocated on a
specific node.

Here is a short example of a simple reroute API call:

.. code:: js

    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
        "commands" : [ {
            "move" :
                {
                  "index" : "test", "shard" : 0,
                  "from_node" : "node1", "to_node" : "node2"
                }
            },
            {
              "allocate" : {
                  "index" : "test", "shard" : 1, "node" : "node3"
              }
            }
        ]
    }'

An important aspect to remember is that once an allocation occurs, the
cluster will aim at re-balancing its state back to an even state. For
example, if the allocation includes moving a shard from ``node1`` to
``node2``, and the cluster was in an even state, then another shard may
be moved from ``node2`` to ``node1`` to even things out.

The cluster can be set to disable allocations, which means that only
the explicitly issued allocations will be performed. Obviously, only
once all commands have been applied will the cluster aim to re-balance
its state.

Another option is to run the commands in ``dry_run`` (as a URI flag, or
in the request body). This will cause the commands to be applied to the
current cluster state, and return the resulting cluster state after the
commands (and re-balancing) have been applied.

If the ``explain`` parameter is specified, a detailed explanation of why
the commands could or could not be executed is returned.

The commands supported are:

``move``
    Move a started shard from one node to another node. Accepts
    ``index`` and ``shard`` for index name and shard number,
    ``from_node`` for the node to move the shard ``from``, and
    ``to_node`` for the node to move the shard to.

``cancel``
    Cancel allocation of a shard (or recovery). Accepts ``index`` and
    ``shard`` for index name and shard number, and ``node`` for the node
    to cancel the shard allocation on. It also accepts ``allow_primary``
    flag to explicitly specify that it is allowed to cancel allocation
    for a primary shard. This can be used to force resynchronization of
    existing replicas from the primary shard by cancelling them and
    allowing them to be reinitialized through the standard reallocation
    process.

``allocate``
    Allocate an unassigned shard to a node. Accepts the ``index`` and
    ``shard`` for index name and shard number, and ``node`` to allocate
    the shard to. It also accepts ``allow_primary`` flag to explicitly
    specify that it is allowed to explicitly allocate a primary shard
    (might result in data loss).
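
As a sketch, a ``cancel`` command takes the parameters described above
(the index, shard number and node name here are hypothetical):

.. code:: js

    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
        "commands" : [ {
            "cancel" : {
                "index" : "test", "shard" : 1, "node" : "node2"
            }
        } ]
    }'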

Cluster Update Settings
=======================

Allows you to update cluster-wide settings. Settings updated can either
be persistent (applied across restarts) or transient (will not survive
a full cluster restart). Here is an example:

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "persistent" : {
            "discovery.zen.minimum_master_nodes" : 2
        }
    }'

Or:

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "discovery.zen.minimum_master_nodes" : 2
        }
    }'

The cluster responds with the updated settings. So the response for the
last example will be:

.. code:: js

    {
        "persistent" : {},
        "transient" : {
            "discovery.zen.minimum_master_nodes" : "2"
        }
    }

Cluster wide settings can be returned using:

.. code:: js

    curl -XGET localhost:9200/_cluster/settings

There is a specific list of settings that can be updated; these include:

**Cluster settings**

**Routing allocation**

**Awareness**

``cluster.routing.allocation.awareness.attributes``
    See ?.

``cluster.routing.allocation.awareness.force.*``
    See ?.

**Balanced Shards**

All these values are relative to one another. The first three are used
to compose three separate weighting functions into one. The cluster is
balanced when no allowed action can bring the weights of each node
closer together by more than the fourth setting. Actions might not be
allowed, for instance, due to forced awareness or allocation filtering.

``cluster.routing.allocation.balance.shard``
    Defines the weight factor for shards allocated on a node (float).
    Defaults to ``0.45f``. Raising this raises the tendency to equalize
    the number of shards across all nodes in the cluster.

``cluster.routing.allocation.balance.index``
    Defines a factor to the number of shards per index allocated on a
    specific node (float). Defaults to ``0.5f``. Raising this raises the
    tendency to equalize the number of shards per index across all nodes
    in the cluster.

``cluster.routing.allocation.balance.primary``
    Defines a weight factor for the number of primaries of a specific
    index allocated on a node (float). Defaults to ``0.05f``. Raising
    this raises
    the tendency to equalize the number of primary shards across all
    nodes in the cluster.

``cluster.routing.allocation.balance.threshold``
    Minimal optimization value of operations that should be performed
    (non negative float). Defaults to ``1.0f``. Raising this will cause
    the cluster to be less aggressive about optimizing the shard
    balance.

**Concurrent Rebalance**

``cluster.routing.allocation.cluster_concurrent_rebalance``
    Controls how many concurrent shard rebalances are allowed cluster
    wide (integer). Defaults to ``2``; ``-1`` means unlimited. See
    also ?.

**Enable allocation**

``cluster.routing.allocation.enable``
    See ?.

**Throttling allocation**

``cluster.routing.allocation.node_initial_primaries_recoveries``
    See ?.

``cluster.routing.allocation.node_concurrent_recoveries``
    See ?.

**Filter allocation**

``cluster.routing.allocation.include.*``
    See ?.

``cluster.routing.allocation.exclude.*``
    See ?.

``cluster.routing.allocation.require.*`` See ?.

**Metadata**

``cluster.blocks.read_only``
    Make the whole cluster read only (indices do not accept write
    operations); metadata is not allowed to be modified (create or
    delete indices).
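
For example, the read only block could be enabled transiently through
the update settings API shown above (a sketch):

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "cluster.blocks.read_only" : true
        }
    }'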

**Discovery**

``discovery.zen.minimum_master_nodes``
    See ?

``discovery.zen.publish_timeout``
    See ?

**Threadpools**

``threadpool.*``
    See ?

**Index settings**

**Index filter cache**

``indices.cache.filter.size``
    See ?

``indices.cache.filter.expire`` (time)
    See ?

**TTL interval**

``indices.ttl.interval`` (time)
    See ?

**Recovery**

``indices.recovery.concurrent_streams``
    See ?

``indices.recovery.file_chunk_size``
    See ?

``indices.recovery.translog_ops``
    See ?

``indices.recovery.translog_size``
    See ?

``indices.recovery.compress``
    See ?

``indices.recovery.max_bytes_per_sec``
    See ?

**Store level throttling**

``indices.store.throttle.type``
    See ?

``indices.store.throttle.max_bytes_per_sec``
    See ?

**Logger**

Logger values can also be updated by using the ``logger.`` prefix. More
settings will be allowed to be updated over time.
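
As a sketch, a log level could be changed through the same update
settings API (the ``indices.recovery`` logger name is only an
illustration):

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "logger.indices.recovery" : "DEBUG"
        }
    }'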

**Field data circuit breaker**

``indices.fielddata.breaker.limit``
    See ?

``indices.fielddata.breaker.overhead``
    See ?

Nodes Stats
===========

**Nodes statistics**

The cluster nodes stats API allows you to retrieve statistics for one
or more (or all) of the cluster nodes.

.. code:: js

    curl -XGET 'http://localhost:9200/_nodes/stats'
    curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2/stats'

The first command retrieves stats of all the nodes in the cluster. The
second command selectively retrieves nodes stats of only ``nodeId1`` and
``nodeId2``. All the nodes selective options are explained
`here <#cluster-nodes>`__.

By default, all stats are returned. You can limit this by combining any
of ``indices``, ``os``, ``process``, ``jvm``, ``network``,
``transport``, ``http``, ``fs``, ``breaker`` and ``thread_pool``:

``indices``
    Indices stats about size, document count, indexing and deletion
    times, search times, field cache size, merges and flushes

``fs``
    File system information, data path, free disk space, read/write
    stats

``http``
    HTTP connection information

``jvm``
    JVM stats, memory pool information, garbage collection, buffer
    pools

``network``
    TCP information

``os``
    Operating system stats, load average, cpu, mem, swap

``process``
    Process statistics, memory consumption, cpu usage, open file
    descriptors

``thread_pool``
    Statistics about each thread pool, including current size, queue
    and rejected tasks

``transport``
    Transport statistics about sent and received bytes in cluster
    communication

``breaker``
    Statistics about the field data circuit breaker

For example:

.. code:: js

    # return indices and os
    curl -XGET 'http://localhost:9200/_nodes/stats/indices,os'
    # return just os and process
    curl -XGET 'http://localhost:9200/_nodes/stats/os,process'
    # specific type endpoint
    curl -XGET 'http://localhost:9200/_nodes/stats/process'
    curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/stats/process'

The ``all`` flag can be set to return all the stats.

**Field data statistics**

You can get information about field data memory usage at the node level
or at the index level.

.. code:: js

    # Node Stats
    curl 'localhost:9200/_nodes/stats/indices/?fields=field1,field2&pretty'

    # Indices Stats
    curl 'localhost:9200/_stats/fielddata/?fields=field1,field2&pretty'

    # You can use wildcards for field names
    curl 'localhost:9200/_stats/fielddata/?fields=field*&pretty'
    curl 'localhost:9200/_nodes/stats/indices/?fields=field*&pretty'

**Search groups**

You can get statistics about search groups for searches executed on this
node.

.. code:: js

    # All groups with all stats
    curl 'localhost:9200/_nodes/stats?pretty&groups=_all'

    # Some groups from just the indices stats
    curl 'localhost:9200/_nodes/stats/indices?pretty&groups=foo,bar'

Nodes Info
==========

The cluster nodes info API allows you to retrieve information for one
or more (or all) of the cluster nodes.

.. code:: js

    curl -XGET 'http://localhost:9200/_nodes'
    curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2'

The first command retrieves information of all the nodes in the cluster.
The second command selectively retrieves nodes information of only
``nodeId1`` and ``nodeId2``. All the nodes selective options are
explained `here <#cluster-nodes>`__.

By default, it just returns all attributes and core settings for a node.
It also allows you to get information on only ``settings``, ``os``,
``process``, ``jvm``, ``thread_pool``, ``network``, ``transport``,
``http`` and ``plugins``:

.. code:: js

    curl -XGET 'http://localhost:9200/_nodes/process'
    curl -XGET 'http://localhost:9200/_nodes/_all/process'
    curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2/jvm,process'
    # same as above
    curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2/info/jvm,process'

    curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2/_all'

The ``_all`` flag can be set to return all the information - or you can
simply omit it.

``plugins`` - if set, the result will contain details about the loaded
plugins per node:

-  ``name``: plugin name

-  ``description``: plugin description if any

-  ``site``: ``true`` if the plugin is a site plugin

-  ``jvm``: ``true`` if the plugin is a plugin running in the JVM

-  ``url``: URL if the plugin is a site plugin

The result will look similar to:

.. code:: js

    {
      "cluster_name" : "test-cluster-MacBook-Air-de-David.local",
      "nodes" : {
        "hJLXmY_NTrCytiIMbX4_1g" : {
          "name" : "node4",
          "transport_address" : "inet[/172.18.58.139:9303]",
          "hostname" : "MacBook-Air-de-David.local",
          "version" : "0.90.0.Beta2-SNAPSHOT",
          "http_address" : "inet[/172.18.58.139:9203]",
          "plugins" : [ {
            "name" : "test-plugin",
            "description" : "test-plugin description",
            "site" : true,
            "jvm" : false
          }, {
            "name" : "test-no-version-plugin",
            "description" : "test-no-version-plugin description",
            "site" : true,
            "jvm" : false
          }, {
            "name" : "dummy",
            "description" : "No description found for dummy.",
            "url" : "/_plugin/dummy/",
            "site" : false,
            "jvm" : true
          } ]
        }
      }
    }

If your ``plugin`` data is subject to change, use
``plugins.info_refresh_interval`` to change or disable the caching
interval:

.. code:: js

    # Change cache to 20 seconds
    plugins.info_refresh_interval: 20s

    # Infinite cache
    plugins.info_refresh_interval: -1

    # Disable cache
    plugins.info_refresh_interval: 0

Nodes hot\_threads
==================

This API allows you to get the current hot threads on each node in the
cluster. Endpoints are ``/_nodes/hot_threads`` and
``/_nodes/{nodesIds}/hot_threads``.

The output is plain text with a breakdown of each node’s top hot
threads. The allowed parameters are:

``threads``
    Number of hot threads to provide. Defaults to 3.

``interval``
    The interval to do the second sampling of threads. Defaults to
    500ms.

``type``
    The type to sample. Defaults to ``cpu``, but supports ``wait`` and
    ``block`` to see hot threads that are in wait or block state.
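
A sketch combining these parameters (the values are only illustrative):

.. code:: js

    curl -XGET 'http://localhost:9200/_nodes/hot_threads?threads=5&interval=1s&type=wait'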

Nodes Shutdown
==============

The nodes shutdown API allows you to shut down one or more (or all)
nodes in the cluster. Here is an example of shutting down the
``_local`` node the request is directed to:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'

Specific node(s) can be shut down as well using their respective node
ids (or other selective options as explained `here <#cluster-nodes>`__):

.. code:: js

    $ curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'

The master (of the cluster) can also be shut down using:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'

Finally, all nodes can be shutdown using one of the options below:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/_shutdown'

    $ curl -XPOST 'http://localhost:9200/_cluster/nodes/_shutdown'

    $ curl -XPOST 'http://localhost:9200/_cluster/nodes/_all/_shutdown'

**Delay**

By default, the shutdown will be executed after a 1 second delay
(``1s``). The delay can be customized by setting the ``delay`` parameter
in a time value format. For example:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown?delay=10s'

**Disable Shutdown**

The shutdown API can be disabled by setting ``action.disable_shutdown``
in the node configuration.
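
As a sketch, the setting would go into the node configuration file
(``elasticsearch.yml``):

.. code:: js

    action.disable_shutdown: true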

**elasticsearch** provides a full Query DSL based on JSON to define
queries. In general, there are basic queries such as
`term <#query-dsl-term-query>`__ or
`prefix <#query-dsl-prefix-query>`__. There are also compound queries
like the `bool <#query-dsl-bool-query>`__ query. Queries can also have
filters associated with them such as the
`filtered <#query-dsl-filtered-query>`__ or
`constant\_score <#query-dsl-constant-score-query>`__ queries, with
specific filter queries.

Think of the Query DSL as an AST of queries. Certain queries can contain
other queries (like the `bool <#query-dsl-bool-query>`__ query), others
can contain filters (like the
`constant\_score <#query-dsl-constant-score-query>`__), and some can
contain both a query and a filter (like the
`filtered <#query-dsl-filtered-query>`__). Each of those can contain
**any** query of the list of queries or **any** filter from the list of
filters, resulting in the ability to build quite complex (and
interesting) queries.

Both queries and filters can be used in different APIs. For example,
within a `search query <#search-request-query>`__, or as an `aggregation
filter <#search-aggregations-bucket-filter-aggregation>`__. This section
explains the components (queries and filters) that can form the AST one
can use.

Filters are very handy since they perform an order of magnitude better
than plain queries: no scoring is performed and they are automatically
cached.

Queries
=======

As a general rule, queries should be used instead of filters:

-  for full text search

-  where the result depends on a relevance score

Match Query
-----------

A family of ``match`` queries that accepts text/numerics/dates, analyzes
the input, and constructs a query out of it. For example:

.. code:: js

    {
        "match" : {
            "message" : "this is a test"
        }
    }

Note that ``message`` is the name of a field; you can substitute the
name of any field (including ``_all``) instead.

**Types of Match Queries**

**boolean**

The default ``match`` query is of type ``boolean``. It means that the
text provided is analyzed and the analysis process constructs a boolean
query from the provided text. The ``operator`` flag can be set to ``or``
or ``and`` to control the boolean clauses (defaults to ``or``). The
minimum number of optional ``should`` clauses to match can be set using
the ```minimum_should_match`` <#query-dsl-minimum-should-match>`__
parameter.

The ``analyzer`` can be set to control which analyzer will perform the
analysis process on the text. It defaults to the field's explicit
mapping definition, or the default search analyzer.

``fuzziness`` allows *fuzzy matching* based on the type of field being
queried. See ? for allowed settings.

The ``prefix_length`` and ``max_expansions`` can be set in this case to
control the fuzzy process. If the fuzzy option is set, the query will
use ``constant_score_rewrite`` as its `rewrite
method <#query-dsl-multi-term-rewrite>`__; the ``fuzzy_rewrite``
parameter allows you to control how the query will get rewritten.
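
As a sketch, a fuzzy ``match`` query combining these parameters (the
values are only illustrative) could look like this:

.. code:: js

    {
        "match" : {
            "message" : {
                "query" : "this is a tset",
                "fuzziness" : 2,
                "prefix_length" : 1,
                "max_expansions" : 50
            }
        }
    }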

Here is an example when providing additional parameters (note the slight
change in structure, ``message`` is the field name):

.. code:: js

    {
        "match" : {
            "message" : {
                "query" : "this is a test",
                "operator" : "and"
            }
        }
    }

**zero_terms_query**

If the analyzer used removes all tokens in a query, like a ``stop``
filter does, the default behavior is to match no documents at all. In
order to change that, the ``zero_terms_query`` option can be used. It
accepts ``none`` (default) and ``all``, which corresponds to a
``match_all`` query.

.. code:: js

    {
        "match" : {
            "message" : {
                "query" : "to be or not to be",
                "operator" : "and",
                "zero_terms_query": "all"
            }
        }
    }

**cutoff_frequency**

The match query supports a ``cutoff_frequency`` that allows specifying
an absolute or relative document frequency above which terms are moved
into an optional subquery. These high-frequency terms are only scored
if at least one of the low-frequency (below the cutoff) terms matches,
in the case of an ``or`` operator, or if all of the low-frequency terms
match, in the case of an ``and`` operator.

This query allows handling ``stopwords`` dynamically at runtime, is
domain independent and doesn’t require a stopword file. It prevents
scoring / iterating over high-frequency terms and only takes those
terms into account if more significant / lower-frequency terms match a
document. Yet, if all of the query terms are above the given
``cutoff_frequency``, the query is automatically transformed into a
pure conjunction (``and``) query to ensure fast execution.

The ``cutoff_frequency`` can either be relative to the number of
documents in the index if in the range ``[0..1)``, or absolute if
greater than or equal to ``1.0``.

Here is an example showing a query composed exclusively of stopwords:

.. code:: js

    {
        "match" : {
            "message" : {
                "query" : "to be or not to be",
                "cutoff_frequency" : 0.001
            }
        }
    }

**phrase**

The ``match_phrase`` query analyzes the text and creates a ``phrase``
query out of the analyzed text. For example:

.. code:: js

    {
        "match_phrase" : {
            "message" : "this is a test"
        }
    }

Since ``match_phrase`` is only a ``type`` of a ``match`` query, it can
also be used in the following manner:

.. code:: js

    {
        "match" : {
            "message" : {
                "query" : "this is a test",
                "type" : "phrase"
            }
        }
    }

A phrase query matches terms up to a configurable ``slop`` (which
defaults to 0) in any order. Transposed terms have a slop of 2.
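
For example, a sketch of a phrase query with a non-default ``slop``:

.. code:: js

    {
        "match_phrase" : {
            "message" : {
                "query" : "this test",
                "slop" : 2
            }
        }
    }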

The ``analyzer`` can be set to control which analyzer will perform the
analysis process on the text. It defaults to the field's explicit
mapping definition, or the default search analyzer, for example:

.. code:: js

    {
        "match_phrase" : {
            "message" : {
                "query" : "this is a test",
                "analyzer" : "my_analyzer"
            }
        }
    }

**match\_phrase\_prefix**

The ``match_phrase_prefix`` is the same as ``match_phrase``, except that
it allows for prefix matches on the last term in the text. For example:

.. code:: js

    {
        "match_phrase_prefix" : {
            "message" : "this is a test"
        }
    }

Or:

.. code:: js

    {
        "match" : {
            "message" : {
                "query" : "this is a test",
                "type" : "phrase_prefix"
            }
        }
    }

It accepts the same parameters as the phrase type. In addition, it also
accepts a ``max_expansions`` parameter that controls how many prefixes
the last term will be expanded to. It is highly recommended to set
it to an acceptable value to control the execution time of the query.
For example:

.. code:: js

    {
        "match_phrase_prefix" : {
            "message" : {
                "query" : "this is a test",
                "max_expansions" : 10
            }
        }
    }

**Comparison to query\_string / field**

The match family of queries does not go through a "query parsing"
process. It does not support field name prefixes, wildcard characters,
or other "advanced" features. For this reason, the chances of it
failing are very small / non-existent, and it provides excellent
behavior when it comes to simply analyzing the text and running it as a
query (which is usually what a text search box does). Also, the
``phrase_prefix`` type can provide a great "as you type" behavior to
automatically load search results.

**Other options**

-  ``lenient`` - If set to ``true``, format based failures (like
   providing text to a numeric field) will be ignored. Defaults to
   ``false``.
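
As a sketch (the numeric ``age`` field here is purely illustrative):

.. code:: js

    {
        "match" : {
            "age" : {
                "query" : "ten",
                "lenient" : true
            }
        }
    }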

Multi Match Query
-----------------

The ``multi_match`` query builds on the ```match``
query <#query-dsl-match-query>`__ to allow multi-field queries:

.. code:: js

    {
      "multi_match" : {
        "query":    "this is a test", 
        "fields": [ "subject", "message" ] 
      }
    }

Here, ``query`` is the query string and ``fields`` lists the fields to
be queried.

**``fields`` and per-field boosting**

Fields can be specified with wildcards, eg:

.. code:: js

    {
      "multi_match" : {
        "query":    "Will Smith",
        "fields": [ "title", "*_name" ] 
      }
    }

This queries the ``title``, ``first_name`` and ``last_name`` fields.

Individual fields can be boosted with the caret (``^``) notation:

.. code:: js

    {
      "multi_match" : {
        "query" : "this is a test",
        "fields" : [ "subject^3", "message" ] 
      }
    }

The ``subject`` field is three times as important as the ``message``
field.

**Types of ``multi_match`` query:**

The way the ``multi_match`` query is executed internally depends on the
``type`` parameter, which can be set to:

``best_fields``
    (**default**) Finds documents which match any field, but uses the
    ``_score`` from the best field. See ?.

``most_fields``
    Finds documents which match any field and combines the ``_score``
    from each field. See ?.

``cross_fields``
    Treats fields with the same ``analyzer`` as though they were one
    big field. Looks for each word in **any** field. See ?.

``phrase``
    Runs a ``match_phrase`` query on each field and combines the
    ``_score`` from each field. See ?.

``phrase_prefix``
    Runs a ``match_phrase_prefix`` query on each field and combines the
    ``_score`` from each field. See ?.

``best_fields``
~~~~~~~~~~~~~~~

The ``best_fields`` type is most useful when you are searching for
multiple words best found in the same field. For instance “brown fox” in
a single field is more meaningful than “brown” in one field and “fox” in
the other.

The ``best_fields`` type generates a ```match``
query <#query-dsl-match-query>`__ for each field and wraps them in a
```dis_max`` <#query-dsl-dis-max-query>`__ query, to find the single
best matching field. For instance, this query:

.. code:: js

    {
      "multi_match" : {
        "query":      "brown fox",
        "type":       "best_fields",
        "fields":     [ "subject", "message" ],
        "tie_breaker": 0.3
      }
    }

would be executed as:

.. code:: js

    {
      "dis_max": {
        "queries": [
          { "match": { "subject": "brown fox" }},
          { "match": { "message": "brown fox" }}
        ],
        "tie_breaker": 0.3
      }
    }

Normally the ``best_fields`` type uses the score of the **single** best
matching field, but if ``tie_breaker`` is specified, then it calculates
the score as follows:

-  the score from the best matching field

-  plus ``tie_breaker * _score`` for all other matching fields

It also accepts ``analyzer``, ``boost``, ``operator``,
``minimum_should_match``, ``fuzziness``, ``prefix_length``,
``max_expansions``, ``rewrite``, ``zero_terms_query`` and
``cutoff_frequency``, as explained in `match
query <#query-dsl-match-query>`__.

    **Important**

    The ``best_fields`` and ``most_fields`` types are
    *field-centric* — they generate a ``match`` query **per field**.
    This means that the ``operator`` and ``minimum_should_match``
    parameters are applied to each field individually, which is probably
    not what you want.

    Take this query for example:

    .. code:: js

        {
          "multi_match" : {
            "query":      "Will Smith",
            "type":       "best_fields",
            "fields":     [ "first_name", "last_name" ],
            "operator":   "and" 
          }
        }

    The ``and`` operator means that all terms must be present.

    This query is executed as:

    ::

          (+first_name:will +first_name:smith)
        | (+last_name:will  +last_name:smith)

    In other words, **all terms** must be present **in a single field**
    for a document to match.

    See ? for a better solution.

``most_fields``
~~~~~~~~~~~~~~~

The ``most_fields`` type is most useful when querying multiple fields
that contain the same text analyzed in different ways. For instance, the
main field may contain synonyms, stemming and terms without diacritics.
A second field may contain the original terms, and a third field might
contain shingles. By combining scores from all three fields we can match
as many documents as possible with the main field, but use the second
and third fields to push the most similar results to the top of the
list.

This query:

.. code:: js

    {
      "multi_match" : {
        "query":      "quick brown fox",
        "type":       "most_fields",
        "fields":     [ "title", "title.original", "title.shingles" ]
      }
    }

would be executed as:

.. code:: js

    {
      "bool": {
        "should": [
          { "match": { "title":          "quick brown fox" }},
          { "match": { "title.original": "quick brown fox" }},
          { "match": { "title.shingles": "quick brown fox" }}
        ]
      }
    }

The scores from each ``match`` clause are added together, then divided
by the number of ``match`` clauses.

It also accepts ``analyzer``, ``boost``, ``operator``,
``minimum_should_match``, ``fuzziness``, ``prefix_length``,
``max_expansions``, ``rewrite``, ``zero_terms_query`` and
``cutoff_frequency``, as explained in `match
query <#query-dsl-match-query>`__, but **see ?**.

``phrase`` and ``phrase_prefix``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``phrase`` and ``phrase_prefix`` types behave just like ?, but they
use a ``match_phrase`` or ``match_phrase_prefix`` query instead of a
``match`` query.

This query:

.. code:: js

    {
      "multi_match" : {
        "query":      "quick brown f",
        "type":       "phrase_prefix",
        "fields":     [ "subject", "message" ]
      }
    }

would be executed as:

.. code:: js

    {
      "dis_max": {
        "queries": [
          { "match_phrase_prefix": { "subject": "quick brown f" }},
          { "match_phrase_prefix": { "message": "quick brown f" }}
        ]
      }
    }

It also accepts ``analyzer``, ``boost``, ``slop`` and ``zero_terms_query``
as explained in ?. Type ``phrase_prefix`` additionally accepts
``max_expansions``.

``cross_fields``
~~~~~~~~~~~~~~~~

The ``cross_fields`` type is particularly useful with structured
documents where multiple fields **should** match. For instance, when
querying the ``first_name`` and ``last_name`` fields for “Will Smith”,
the best match is likely to have “Will” in one field and “Smith” in the
other.

This sounds like a job for ? but there are two problems with that
approach. The first problem is that ``operator`` and
``minimum_should_match`` are applied per-field, instead of per-term (see
`explanation above <#operator-min>`__).

The second problem is to do with relevance: the different term
frequencies in the ``first_name`` and ``last_name`` fields can produce
unexpected results.

For instance, imagine we have two people: “Will Smith” and “Smith
Jones”. “Smith” as a last name is very common (and so is of low
importance) but “Smith” as a first name is very uncommon (and so is of
great importance).

If we do a search for “Will Smith”, the “Smith Jones” document will
probably appear above the better matching “Will Smith” because the score
of ``first_name:smith`` has trumped the combined scores of
``first_name:will`` plus ``last_name:smith``.

One way of dealing with these types of queries is simply to index the
``first_name`` and ``last_name`` fields into a single ``full_name``
field. Of course, this can only be done at index time.

The ``cross_field`` type tries to solve these problems at query time by
taking a *term-centric* approach. It first analyzes the query string
into individual terms, then looks for each term in any of the fields, as
though they were one big field.

A query like:

.. code:: js

    {
      "multi_match" : {
        "query":      "Will Smith",
        "type":       "cross_fields",
        "fields":     [ "first_name", "last_name" ],
        "operator":   "and"
      }
    }

is executed as:

::

    +(first_name:will  last_name:will)
    +(first_name:smith last_name:smith)

In other words, **all terms** must be present **in at least one field**
for a document to match. (Compare this to `the logic used for
``best_fields`` and ``most_fields`` <#operator-min>`__.)

That solves one of the two problems. The problem of differing term
frequencies is solved by *blending* the term frequencies for all fields
in order to even out the differences. In other words,
``first_name:smith`` will be treated as though it has the same weight as
``last_name:smith``. (Actually, ``first_name:smith`` is given a tiny
advantage over ``last_name:smith``, just to make the order of results
more stable.)

If you run the above query through the ?, it returns this explanation:

::

    +blended("will",  fields: [first_name, last_name])
    +blended("smith", fields: [first_name, last_name])

It also accepts ``analyzer``, ``boost``, ``operator``,
``minimum_should_match``, ``zero_terms_query`` and ``cutoff_frequency``,
as explained in `match query <#query-dsl-match-query>`__.

``cross_field`` and analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``cross_field`` type can only work in term-centric mode on fields
that have the same analyzer. Fields with the same analyzer are grouped
together as in the example above. If there are multiple groups, they are
combined with a ``bool`` query.

For instance, if we have a ``first`` and ``last`` field which have the
same analyzer, plus a ``first.edge`` and ``last.edge`` which both use an
``edge_ngram`` analyzer, this query:

.. code:: js

    {
      "multi_match" : {
        "query":      "Jon",
        "type":       "cross_fields",
        "fields":     [
            "first", "first.edge",
            "last",  "last.edge"
        ]
      }
    }

would be executed as:

::

        blended("jon", fields: [first, last])
    | (
        blended("j",   fields: [first.edge, last.edge])
        blended("jo",  fields: [first.edge, last.edge])
        blended("jon", fields: [first.edge, last.edge])
    )

In other words, ``first`` and ``last`` would be grouped together and
treated as a single field, and ``first.edge`` and ``last.edge`` would be
grouped together and treated as a single field.

Having multiple groups is fine, but when combined with ``operator`` or
``minimum_should_match``, it can suffer from the `same
problem <#operator-min>`__ as ``most_fields`` or ``best_fields``.

You can easily rewrite this query yourself as two separate
``cross_fields`` queries combined with a ``bool`` query, and apply the
``minimum_should_match`` parameter to just one of them:

.. code:: js

    {
        "bool": {
            "should": [
                {
                  "multi_match" : {
                    "query":      "Will Smith",
                    "type":       "cross_fields",
                    "fields":     [ "first", "last" ],
                    "minimum_should_match": "50%" 
                  }
                },
                {
                  "multi_match" : {
                    "query":      "Will Smith",
                    "type":       "cross_fields",
                    "fields":     [ "*.edge" ]
                  }
                }
            ]
        }
    }

Either ``will`` or ``smith`` must be present in either of the ``first``
or ``last`` fields.

You can force all fields into the same group by specifying the
``analyzer`` parameter in the query.

.. code:: js

    {
      "multi_match" : {
        "query":      "Jon",
        "type":       "cross_fields",
        "analyzer":   "standard", 
        "fields":     [ "first", "last", "*.edge" ]
      }
    }

Use the ``standard`` analyzer for all fields.

which will be executed as:

::

    blended("will",  fields: [first, first.edge, last.edge, last])
    blended("smith", fields: [first, first.edge, last.edge, last])

``tie_breaker``
^^^^^^^^^^^^^^^

By default, each per-term ``blended`` query will use the best score
returned by any field in a group, then these scores are added together
to give the final score. The ``tie_breaker`` parameter can change the
default behaviour of the per-term ``blended`` queries. It accepts:

``0.0``
    Take the single best score out of (eg) ``first_name:will`` and
    ``last_name:will`` (**default**)

``1.0``
    Add together the scores for (eg) ``first_name:will`` and
    ``last_name:will``

``0.0 < n < 1.0``
    Take the single best score plus ``tie_breaker`` multiplied by each
    of the scores from other matching fields.
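
As a sketch, a ``cross_fields`` query with a ``tie_breaker`` (reusing
the field names from the examples above):

.. code:: js

    {
      "multi_match" : {
        "query":       "Will Smith",
        "type":        "cross_fields",
        "fields":      [ "first_name", "last_name" ],
        "tie_breaker": 0.3
      }
    }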

Bool Query
----------

A query that matches documents matching boolean combinations of other
queries. The bool query maps to Lucene ``BooleanQuery``. It is built
using one or more boolean clauses, each clause with a typed occurrence.
The occurrence types are:

+--------------------------------------+--------------------------------------+
| Occur                                | Description                          |
+======================================+======================================+
| ``must``                             | The clause (query) must appear in    |
|                                      | matching documents.                  |
+--------------------------------------+--------------------------------------+
| ``should``                           | The clause (query) should appear in  |
|                                      | the matching document. In a boolean  |
|                                      | query with no ``must`` clauses, one  |
|                                      | or more ``should`` clauses must      |
|                                      | match a document. The minimum number |
|                                      | of should clauses to match can be    |
|                                      | set using the                        |
|                                      | ```minimum_should_match`` <#query-ds |
|                                      | l-minimum-should-match>`__           |
|                                      | parameter.                           |
+--------------------------------------+--------------------------------------+
| ``must_not``                         | The clause (query) must not appear   |
|                                      | in the matching documents.           |
+--------------------------------------+--------------------------------------+

The bool query also supports a ``disable_coord`` parameter (defaults to
``false``). Basically the coord similarity computes a score factor based
on the fraction of all query terms that a document contains. See Lucene
``BooleanQuery`` for more details.

.. code:: js

    {
        "bool" : {
            "must" : {
                "term" : { "user" : "kimchy" }
            },
            "must_not" : {
                "range" : {
                    "age" : { "from" : 10, "to" : 20 }
                }
            },
            "should" : [
                {
                    "term" : { "tag" : "wow" }
                },
                {
                    "term" : { "tag" : "elasticsearch" }
                }
            ],
            "minimum_should_match" : 1,
            "boost" : 1.0
        }
    }
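
As a sketch, coord scoring could be disabled on a similar query like
this:

.. code:: js

    {
        "bool" : {
            "disable_coord" : true,
            "should" : [
                { "term" : { "tag" : "wow" } },
                { "term" : { "tag" : "elasticsearch" } }
            ]
        }
    }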

Boosting Query
--------------

The ``boosting`` query can be used to effectively demote results that
match a given query. Unlike the "NOT" clause in bool query, this still
selects documents that contain undesirable terms, but reduces their
overall score.

.. code:: js

    {
        "boosting" : {
            "positive" : {
                "term" : {
                    "field1" : "value1"
                }
            },
            "negative" : {
                "term" : {
                    "field2" : "value2"
                }
            },
            "negative_boost" : 0.2
        }
    }

Common Terms Query
------------------

The ``common`` terms query is a modern alternative to stopwords which
improves the precision and recall of search results (by taking stopwords
into account), without sacrificing performance.

**The problem**

Every term in a query has a cost. A search for ``"The brown fox"``
requires three term queries, one for each of ``"the"``, ``"brown"`` and
``"fox"``, all of which are executed against all documents in the index.
The query for ``"the"`` is likely to match many documents and thus has a
much smaller impact on relevance than the other two terms.

Previously, the solution to this problem was to ignore terms with high
frequency. By treating ``"the"`` as a *stopword*, we reduce the index
size and reduce the number of term queries that need to be executed.

The problem with this approach is that, while stopwords have a small
impact on relevance, they are still important. If we remove stopwords,
we lose precision (eg we are unable to distinguish between ``"happy"``
and ``"not happy"``) and we lose recall (eg text like ``"The The"`` or
``"To be or not to be"`` would simply not exist in the index).

**The solution**

The ``common`` terms query divides the query terms into two groups: more
important (ie *low frequency* terms) and less important (ie *high
frequency* terms which would previously have been stopwords).

First it searches for documents which match the more important terms.
These are the terms which appear in fewer documents and have a greater
impact on relevance.

Then, it executes a second query for the less important terms — terms
which appear frequently and have a low impact on relevance. But instead
of calculating the relevance score for **all** matching documents, it
only calculates the ``_score`` for documents already matched by the
first query. In this way the high frequency terms can improve the
relevance calculation without paying the cost of poor performance.

If a query consists only of high frequency terms, then a single query is
executed as an ``AND`` (conjunction) query, in other words all terms are
required. Even though each individual term will match many documents,
the combination of terms narrows down the resultset to only the most
relevant. The single query can also be executed as an ``OR`` with a
specific ```minimum_should_match`` <#query-dsl-minimum-should-match>`__,
in this case a high enough value should probably be used.

Terms are allocated to the high or low frequency groups based on the
``cutoff_frequency``, which can be specified as an absolute frequency
(``>=1``) or as a relative frequency (``0.0 .. 1.0``).

Perhaps the most interesting property of this query is that it adapts to
domain specific stopwords automatically. For example, on a video hosting
site, common terms like ``"clip"`` or ``"video"`` will automatically
behave as stopwords without the need to maintain a manual list.

**Examples**

In this example, words that have a document frequency greater than 0.1%
(eg ``"this"`` and ``"is"``) will be treated as *common terms*.

.. code:: js

    {
      "common": {
        "body": {
          "query":            "this is bonsai cool",
          "cutoff_frequency": 0.001
        }
      }
    }

The number of terms which should match can be controlled with the
```minimum_should_match`` <#query-dsl-minimum-should-match>`__
(``high_freq``, ``low_freq``), ``low_freq_operator`` (default ``"or"``)
and ``high_freq_operator`` (default ``"or"``) parameters.

For low frequency terms, set the ``low_freq_operator`` to ``"and"`` to
make all terms required:

.. code:: js

    {
      "common": {
        "body": {
          "query":            "nelly the elephant as a cartoon",
          "cutoff_frequency": 0.001,
          "low_freq_operator" "and"
        }
      }
    }

which is roughly equivalent to:

.. code:: js

    {
      "bool": {
        "must": [
          { "term": { "body": "nelly"}},
          { "term": { "body": "elephant"}},
          { "term": { "body": "cartoon"}}
        ],
        "should": [
          { "term": { "body": "the"}}
          { "term": { "body": "as"}}
          { "term": { "body": "a"}}
        ]
      }
    }

Alternatively use
```minimum_should_match`` <#query-dsl-minimum-should-match>`__ to
specify a minimum number or percentage of low frequency terms which must
be present, for instance:

.. code:: js

    {
      "common": {
        "body": {
          "query":                "nelly the elephant as a cartoon",
          "cutoff_frequency":     0.001,
          "minimum_should_match": 2
        }
      }
    }

which is roughly equivalent to:

.. code:: js

    {
      "bool": {
        "must": {
          "bool": {
            "should": [
              { "term": { "body": "nelly"}},
              { "term": { "body": "elephant"}},
              { "term": { "body": "cartoon"}}
            ],
            "minimum_should_match": 2
          }
        },
        "should": [
          { "term": { "body": "the"}}
          { "term": { "body": "as"}}
          { "term": { "body": "a"}}
        ]
      }
    }

**minimum_should_match**

A different
```minimum_should_match`` <#query-dsl-minimum-should-match>`__ can be
applied for low and high frequency terms with the additional
``low_freq`` and ``high_freq`` parameters. Here is an example when
providing additional parameters (note the change in structure):

.. code:: js

    {
      "common": {
        "body": {
          "query":                "nelly the elephant not as a cartoon",
          "cutoff_frequency":     0.001,
          "minimum_should_match": {
              "low_freq" : 2,
              "high_freq" : 3
           }
        }
      }
    }

which is roughly equivalent to:

.. code:: js

    {
      "bool": {
        "must": {
          "bool": {
            "should": [
              { "term": { "body": "nelly"}},
              { "term": { "body": "elephant"}},
              { "term": { "body": "cartoon"}}
            ],
            "minimum_should_match": 2
          }
        },
        "should": {
          "bool": {
            "should": [
              { "term": { "body": "the"}},
              { "term": { "body": "not"}},
              { "term": { "body": "as"}},
              { "term": { "body": "a"}}
            ],
            "minimum_should_match": 3
          }
        }
      }
    }

In this case it means the high frequency terms have only an impact on
relevance when there are at least three of them. But the most
interesting use of the
```minimum_should_match`` <#query-dsl-minimum-should-match>`__ for high
frequency terms is when there are only high frequency terms:

.. code:: js

    {
      "common": {
        "body": {
          "query":                "how not to be",
          "cutoff_frequency":     0.001,
          "minimum_should_match": {
              "low_freq" : 2,
              "high_freq" : 3
           }
        }
      }
    }

which is roughly equivalent to:

.. code:: js

    {
      "bool": {
        "should": [
          { "term": { "body": "how"}},
          { "term": { "body": "not"}},
          { "term": { "body": "to"}},
          { "term": { "body": "be"}}
        ],
        "minimum_should_match": "3<50%"
      }
    }

The query generated for the high frequency terms is then slightly less
restrictive than with an ``AND``.

The ``common`` terms query also supports ``boost``, ``analyzer`` and
``disable_coord`` as parameters.
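
For illustration only, a sketch combining these parameters (the
``body`` field and the ``standard`` analyzer are just example choices)
might look like:

.. code:: js

    {
      "common": {
        "body": {
          "query":            "this is bonsai cool",
          "cutoff_frequency": 0.001,
          "boost":            2.0,
          "analyzer":         "standard",
          "disable_coord":    true
        }
      }
    }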

Constant Score Query
--------------------

A query that wraps a filter or another query and simply returns a
constant score equal to the query boost for every document in the
filter. Maps to Lucene ``ConstantScoreQuery``.

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "term" : { "user" : "kimchy"}
            },
            "boost" : 1.2
        }
    }

The filter object can hold only filter elements, not queries. Filters
can be much faster compared to queries since they don’t perform any
scoring, especially when they are cached.

A query can also be wrapped in a ``constant_score`` query:

.. code:: js

    {
        "constant_score" : {
            "query" : {
                "term" : { "user" : "kimchy"}
            },
            "boost" : 1.2
        }
    }

Dis Max Query
-------------

A query that generates the union of documents produced by its
subqueries, and that scores each document with the maximum score for
that document as produced by any subquery, plus a tie breaking increment
for any additional matching subqueries.

This is useful when searching for a word in multiple fields with
different boost factors (so that the fields cannot be combined
equivalently into a single search field). We want the primary score to
be the one associated with the highest boost, not the sum of the field
scores (as Boolean Query would give). If the query is "albino elephant"
this ensures that "albino" matching one field and "elephant" matching
another gets a higher score than "albino" matching both fields. To get
this result, use both Boolean Query and DisjunctionMax Query: for each
term a DisjunctionMaxQuery searches for it in each field, while the set
of these DisjunctionMaxQuery’s is combined into a BooleanQuery.

The tie breaker capability allows results that include the same term in
multiple fields to be judged better than results that include this term
in only the best of those multiple fields, without confusing this with
the better case of two different terms in the multiple fields. The
default ``tie_breaker`` is ``0.0``.

This query maps to Lucene ``DisjunctionMaxQuery``.

.. code:: js

    {
        "dis_max" : {
            "tie_breaker" : 0.7,
            "boost" : 1.2,
            "queries" : [
                {
                    "term" : { "age" : 34 }
                },
                {
                    "term" : { "age" : 35 }
                }
            ]
        }
    }
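
For the multi-field "albino elephant" scenario described above, a
simplified sketch (the ``title`` and ``body`` fields are hypothetical)
might look like:

.. code:: js

    {
        "dis_max" : {
            "tie_breaker" : 0.3,
            "queries" : [
                {
                    "match" : { "title" : "albino elephant" }
                },
                {
                    "match" : { "body" : "albino elephant" }
                }
            ]
        }
    }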

Filtered Query
--------------

The ``filtered`` query is used to combine another query with any
`filter <#query-dsl-filters>`__. Filters are usually faster than queries
because:

-  they don’t have to calculate the relevance ``_score`` for each
   document —  the answer is just a boolean “Yes, the document matches
   the filter” or “No, the document does not match the filter”.

-  the results from most filters can be cached in memory, making
   subsequent executions faster.

    **Tip**

    Exclude as many documents as you can with a filter, then query just
    the documents that remain.

.. code:: js

    {
      "filtered": {
        "query": {
          "match": { "tweet": "full text search" }
        },
        "filter": {
          "range": { "created": { "gte": "now - 1d / d" }}
        }
      }
    }

The ``filtered`` query can be used wherever a ``query`` is expected, for
instance, to use the above example in a search request:

.. code:: js

    curl -XGET localhost:9200/_search -d '
    {
      "query": {
        "filtered": { 
          "query": {
            "match": { "tweet": "full text search" }
          },
          "filter": {
            "range": { "created": { "gte": "now - 1d / d" }}
          }
        }
      }
    }
    '

The ``filtered`` query is passed as the value of the ``query`` parameter
in the search request.

Filtering without a query
~~~~~~~~~~~~~~~~~~~~~~~~~

If a ``query`` is not specified, it defaults to the ```match_all``
query <#query-dsl-match-all-query>`__. This means that the ``filtered``
query can be used to wrap just a filter, so that it can be used wherever
a query is expected.

.. code:: js

    curl -XGET localhost:9200/_search -d '
    {
      "query": {
        "filtered": { 
          "filter": {
            "range": { "created": { "gte": "now - 1d / d" }}
          }
        }
      }
    }
    '

No ``query`` has been specified, so this request applies just the
filter, returning all documents created since yesterday.

Multiple filters
~~~~~~~~~~~~~~~~

Multiple filters can be applied by wrapping them in a ```bool``
filter <#query-dsl-bool-filter>`__, for example:

.. code:: js

    {
      "filtered": {
        "query": { "match": { "tweet": "full text search" }},
        "filter": {
          "bool": {
            "must": { "range": { "created": { "gte": "now - 1d / d" }}},
            "should": [
              { "term": { "featured": true }},
              { "term": { "starred":  true }}
            ],
            "must_not": { "term": { "deleted": false }}
          }
        }
      }
    }

Similarly, multiple queries can be combined with a ```bool``
query <#query-dsl-bool-query>`__.
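
For example, a sketch following the same pattern, with two queries
combined in a ``bool`` query inside the ``filtered`` query:

.. code:: js

    {
      "filtered": {
        "query": {
          "bool": {
            "must":   { "match": { "tweet": "full text search" }},
            "should": { "match": { "tweet": "quick brown fox"   }}
          }
        },
        "filter": {
          "range": { "created": { "gte": "now-1d/d" }}
        }
      }
    }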

Filter strategy
~~~~~~~~~~~~~~~

You can control how the filter and query are executed with the
``strategy`` parameter:

.. code:: js

    {
        "filtered" : {
            "query" :   { ... },
            "filter" :  { ... },
            "strategy": "leap_frog"
        }
    }

    **Important**

    This is an *expert-level* setting. Most users can simply ignore it.

The ``strategy`` parameter accepts the following options:

+--------------------------------+---------------------------------------------+
| ``leap_frog_query_first``      | Look for the first document matching the    |
|                                | query, and then alternately advance the     |
|                                | query and the filter to find common         |
|                                | matches.                                    |
+--------------------------------+---------------------------------------------+
| ``leap_frog_filter_first``     | Look for the first document matching the    |
|                                | filter, and then alternately advance the    |
|                                | query and the filter to find common         |
|                                | matches.                                    |
+--------------------------------+---------------------------------------------+
| ``leap_frog``                  | Same as ``leap_frog_query_first``.          |
+--------------------------------+---------------------------------------------+
| ``query_first``                | If the filter supports random access,       |
|                                | then search for documents using the         |
|                                | query, and then consult the filter to       |
|                                | check whether there is a match. Otherwise   |
|                                | fall back to ``leap_frog_query_first``.     |
+--------------------------------+---------------------------------------------+
| ``random_access_${threshold}`` | If the filter supports random access and if |
|                                | there is at least one matching document     |
|                                | among the first ``${threshold}`` ones, then |
|                                | apply the filter first. Otherwise fall back |
|                                | to ``leap_frog_query_first``.               |
|                                | ``${threshold}`` must be greater than or    |
|                                | equal to ``1``.                             |
+--------------------------------+---------------------------------------------+
| ``random_access_always``       | Apply the filter first if it supports       |
|                                | random access. Otherwise fall back to       |
|                                | ``leap_frog_query_first``.                  |
+--------------------------------+---------------------------------------------+

The default strategy is to use ``query_first`` on filters that are not
advanceable such as geo filters and script filters, and
``random_access_100`` on other filters.

Fuzzy Like This Query
---------------------

The fuzzy like this query finds documents that are "like" the provided
text by running it against one or more fields.

.. code:: js

    {
        "fuzzy_like_this" : {
            "fields" : ["name.first", "name.last"],
            "like_text" : "text like this one",
            "max_query_terms" : 12
        }
    }

``fuzzy_like_this`` can be shortened to ``flt``.

The ``fuzzy_like_this`` top level parameters include:

+--------------------------------------+--------------------------------------+
| Parameter                            | Description                          |
+======================================+======================================+
| ``fields``                           | A list of the fields to run the more |
|                                      | like this query against. Defaults to |
|                                      | the ``_all`` field.                  |
+--------------------------------------+--------------------------------------+
| ``like_text``                        | The text to find documents like it,  |
|                                      | **required**.                        |
+--------------------------------------+--------------------------------------+
| ``ignore_tf``                        | Should term frequency be ignored.    |
|                                      | Defaults to ``false``.               |
+--------------------------------------+--------------------------------------+
| ``max_query_terms``                  | The maximum number of query terms    |
|                                      | that will be included in any         |
|                                      | generated query. Defaults to ``25``. |
+--------------------------------------+--------------------------------------+
| ``fuzziness``                        | The minimum similarity of the term   |
|                                      | variants. Defaults to ``0.5``. See   |
|                                      | ?.                                   |
+--------------------------------------+--------------------------------------+
| ``prefix_length``                    | Length of required common prefix on  |
|                                      | variant terms. Defaults to ``0``.    |
+--------------------------------------+--------------------------------------+
| ``boost``                            | Sets the boost value of the query.   |
|                                      | Defaults to ``1.0``.                 |
+--------------------------------------+--------------------------------------+
| ``analyzer``                         | The analyzer that will be used to    |
|                                      | analyze the text. Defaults to the    |
|                                      | analyzer associated with the field.  |
+--------------------------------------+--------------------------------------+

**How it Works**

Fuzzifies ALL terms provided as strings and then picks the best n
differentiating terms. In effect this mixes the behaviour of FuzzyQuery
and MoreLikeThis but with special consideration of fuzzy scoring
factors. This generally produces good results for queries where users
may provide details in a number of fields and have no knowledge of
boolean query syntax and also want a degree of fuzzy matching and a fast
query.

For each source term the fuzzy variants are held in a BooleanQuery with
no coord factor (because we are not looking for matches on multiple
variants in any one doc). Additionally, a specialized TermQuery is used
for variants and does not use that variant term’s IDF because this would
favor rarer terms, such as misspellings. Instead, all variants use the
same IDF ranking (the one for the source query term) and this is
factored into the variant’s boost. If the source query term does not
exist in the index the average IDF of the variants is used.

Fuzzy Like This Field Query
---------------------------

The ``fuzzy_like_this_field`` query is the same as the
``fuzzy_like_this`` query, except that it runs against a single field.
It provides nicer query DSL over the generic ``fuzzy_like_this`` query,
and supports typed fields (it automatically wraps typed fields with a
type filter to match only on the specific type).

.. code:: js

    {
        "fuzzy_like_this_field" : {
            "name.first" : {
                "like_text" : "text like this one",
                "max_query_terms" : 12
            }
        }
    }

``fuzzy_like_this_field`` can be shortened to ``flt_field``.

The ``fuzzy_like_this_field`` top level parameters include:

+--------------------------------------+--------------------------------------+
| Parameter                            | Description                          |
+======================================+======================================+
| ``like_text``                        | The text to find documents like it,  |
|                                      | **required**.                        |
+--------------------------------------+--------------------------------------+
| ``ignore_tf``                        | Should term frequency be ignored.    |
|                                      | Defaults to ``false``.               |
+--------------------------------------+--------------------------------------+
| ``max_query_terms``                  | The maximum number of query terms    |
|                                      | that will be included in any         |
|                                      | generated query. Defaults to ``25``. |
+--------------------------------------+--------------------------------------+
| ``fuzziness``                        | The fuzziness of the term variants.  |
|                                      | Defaults to ``0.5``. See ?.          |
+--------------------------------------+--------------------------------------+
| ``prefix_length``                    | Length of required common prefix on  |
|                                      | variant terms. Defaults to ``0``.    |
+--------------------------------------+--------------------------------------+
| ``boost``                            | Sets the boost value of the query.   |
|                                      | Defaults to ``1.0``.                 |
+--------------------------------------+--------------------------------------+
| ``analyzer``                         | The analyzer that will be used to    |
|                                      | analyze the text. Defaults to the    |
|                                      | analyzer associated with the field.  |
+--------------------------------------+--------------------------------------+

Function Score Query
--------------------

The ``function_score`` allows you to modify the score of documents that
are retrieved by a query. This can be useful if, for example, a score
function is computationally expensive and it is sufficient to compute
the score on a filtered set of documents.

Using function score
~~~~~~~~~~~~~~~~~~~~

To use ``function_score``, the user has to define a query and one or
more functions that compute a new score for each document returned by
the query.

``function_score`` can be used with only one function like this:

.. code:: js

    "function_score": {
        "(query|filter)": {},
        "boost": "boost for the whole query",
        "FUNCTION": {},
        "boost_mode":"(multiply|replace|...)"
    }

Furthermore, several functions can be combined. In this case one can
optionally choose to apply the function only if a document matches a
given filter:

.. code:: js

    "function_score": {
        "(query|filter)": {},
        "boost": "boost for the whole query",
        "functions": [
            {
                "filter": {},
                "FUNCTION": {},
                "weight": number
            },
            {
                "FUNCTION": {}
            },
            {
                "filter": {},
                "weight": number
            }
        ],
        "max_boost": number,
        "score_mode": "(multiply|max|...)",
        "boost_mode": "(multiply|replace|...)"
    }

If no filter is given with a function, this is equivalent to specifying
``"match_all": {}``.

First, each document is scored by the defined functions. The parameter
``score_mode`` specifies how the computed scores are combined:

+--------------+------------------------------------------------------------+
| ``multiply`` | scores are multiplied (default)                            |
+--------------+------------------------------------------------------------+
| ``sum``      | scores are summed                                          |
+--------------+------------------------------------------------------------+
| ``avg``      | scores are averaged                                        |
+--------------+------------------------------------------------------------+
| ``first``    | the first function that has a matching filter is applied   |
+--------------+------------------------------------------------------------+
| ``max``      | maximum score is used                                      |
+--------------+------------------------------------------------------------+
| ``min``      | minimum score is used                                      |
+--------------+------------------------------------------------------------+

Because scores can be on different scales (for example, between 0 and 1
for decay functions but arbitrary for ``field_value_factor``) and also
because sometimes a different impact of functions on the score is
desirable, the score of each function can be adjusted with a user
defined ``weight``. The ``weight`` can be defined per function in the
``functions`` array (example above) and is multiplied with the score
computed by the respective function. If weight is given without any
other function declaration, ``weight`` acts as a function that simply
returns the ``weight``.

The new score can be restricted to not exceed a certain limit by setting
the ``max_boost`` parameter. The default for ``max_boost`` is FLT\_MAX.

Finally, the newly computed score is combined with the score of the
query. The parameter ``boost_mode`` defines how:

+--------------+------------------------------------------------------------+
| ``multiply`` | query score and function score are multiplied (default)    |
+--------------+------------------------------------------------------------+
| ``replace``  | only function score is used, the query score is ignored    |
+--------------+------------------------------------------------------------+
| ``sum``      | query score and function score are added                   |
+--------------+------------------------------------------------------------+
| ``avg``      | average                                                    |
+--------------+------------------------------------------------------------+
| ``max``      | max of query score and function score                      |
+--------------+------------------------------------------------------------+
| ``min``      | min of query score and function score                      |
+--------------+------------------------------------------------------------+
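
As an illustration only, a sketch that combines a weighted filter
function with a ``field_value_factor`` function (the ``tweet``,
``featured`` and ``likes`` fields are hypothetical) might look like:

.. code:: js

    {
        "function_score": {
            "query": { "match": { "tweet": "elasticsearch" } },
            "functions": [
                {
                    "filter": { "term": { "featured": true } },
                    "weight": 2
                },
                {
                    "field_value_factor": {
                        "field": "likes",
                        "modifier": "log1p"
                    }
                }
            ],
            "score_mode": "sum",
            "boost_mode": "multiply",
            "max_boost": 10
        }
    }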

Score functions
~~~~~~~~~~~~~~~

The ``function_score`` query provides several types of score functions.

Script score
^^^^^^^^^^^^

The ``script_score`` function allows you to wrap another query and
customize the scoring of it optionally with a computation derived from
other numeric field values in the doc using a script expression. Here is
a simple sample:

.. code:: js

    "script_score" : {
        "script" : "_score * doc['my_numeric_field'].value"
    }

On top of the different scripting field values and expression, the
``_score`` script parameter can be used to retrieve the score based on
the wrapped query.

Scripts are cached for faster execution. If the script has parameters
that it needs to take into account, it is preferable to reuse the same
script, and provide parameters to it:

.. code:: js

    "script_score": {
        "lang": "lang",
        "params": {
            "param1": value1,
            "param2": value2
         },
        "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
    }

Note that unlike the ``custom_score`` query, the score of the query is
multiplied with the result of the script scoring. If you wish to inhibit
this, set ``"boost_mode": "replace"``.

Weight
^^^^^^

The ``weight`` score allows you to multiply the score by the provided
``weight``. This can sometimes be desired since a boost value set on
specific queries gets normalized, while this score function does not.

.. code:: js

    "weight" : number

Random
^^^^^^

The ``random_score`` generates scores using a hash of the ``_uid``
field, with a ``seed`` for variation. If ``seed`` is not specified, the
current time is used.

    **Note**

    Using this feature will load field data for ``_uid``, which can be a
    memory intensive operation since the values are unique.

.. code:: js

    "random_score": {
        "seed" : number
    }

Field Value factor
^^^^^^^^^^^^^^^^^^

The ``field_value_factor`` function allows you to use a field from a
document to influence the score. It’s similar to using the
``script_score`` function, however, it avoids the overhead of scripting.
If used on a multi-valued field, only the first value of the field is
used in calculations.

As an example, imagine you have a document indexed with a numeric
``popularity`` field and wish to influence the score of a document with
this field. An example of doing so would look like:

.. code:: js

    "field_value_factor": {
      "field": "popularity",
      "factor": 1.2,
      "modifier": "sqrt"
    }

Which will translate into the following formula for scoring:

``sqrt(1.2 * doc['popularity'].value)``

There are a number of options for the ``field_value_factor`` function:

+--------------------------------------+--------------------------------------+
| Parameter                            | Description                          |
+======================================+======================================+
| ``field``                            | Field to be extracted from the       |
|                                      | document.                            |
+--------------------------------------+--------------------------------------+
| ``factor``                           | Optional factor to multiply the      |
|                                      | field value with, defaults to 1.     |
+--------------------------------------+--------------------------------------+
| ``modifier``                         | Modifier to apply to the field       |
|                                      | value, can be one of: ``none``,      |
|                                      | ``log``, ``log1p``, ``log2p``,       |
|                                      | ``ln``, ``ln1p``, ``ln2p``,          |
|                                      | ``square``, ``sqrt``, or             |
|                                      | ``reciprocal``. Defaults to          |
|                                      | ``none``.                            |
+--------------------------------------+--------------------------------------+

Keep in mind that taking the log() of 0, or the square root of a
negative number is an illegal operation, and an exception will be
thrown. Be sure to limit the values of the field with a range filter to
avoid this, or use ``log1p`` and ``ln1p``.
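
For example, a sketch that sidesteps the zero problem by using the
``log1p`` modifier (the ``votes`` field is hypothetical):

.. code:: js

    "field_value_factor": {
      "field": "votes",
      "modifier": "log1p"
    }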

Decay functions
^^^^^^^^^^^^^^^

Decay functions score a document with a function that decays depending
on the distance of a numeric field value of the document from a user
given origin. This is similar to a range query, but with smooth edges
instead of boxes.

To use distance scoring on a query that has numerical fields, the user
has to define an ``origin`` and a ``scale`` for each field. The
``origin`` is needed to define the “central point” from which the
distance is calculated, and the ``scale`` to define the rate of decay.
The decay function is specified as

.. code:: js

    "DECAY_FUNCTION": {
        "FIELD_NAME": {
              "origin": "11, 12",
              "scale": "2km",
              "offset": "0km",
              "decay": 0.33
        }
    }

where ``DECAY_FUNCTION`` can be "linear", "exp" and "gauss" (see below).
The specified field must be a numeric field. In the above example, the
field is a ``geo_point`` and the origin can be provided in geo format.
``scale`` and ``offset`` must be given with a unit in this case. If your
field is a date field, you can set ``scale`` and ``offset`` as days,
weeks, and so on. Example:

.. code:: js

        "DECAY_FUNCTION": {
            "FIELD_NAME": {
                  "origin": "2013-09-17",
                  "scale": "10d",
                  "offset": "5d",
                  "decay" : 0.5
            }
        }

The format of the origin depends on the date format defined in your
mapping. If you do not define the origin, the current time is used.

The ``offset`` and ``decay`` parameters are optional.

+------------+---------------------------------------------------------------+
| ``offset`` | If an ``offset`` is defined, the decay function will only     |
|            | compute the decay function for documents with a distance      |
|            | greater than the defined ``offset``. The default is 0.        |
+------------+---------------------------------------------------------------+
| ``decay``  | The ``decay`` parameter defines how documents are scored at   |
|            | the distance given at ``scale``. If no ``decay`` is defined,  |
|            | documents at the distance ``scale`` will be scored 0.5.       |
+------------+---------------------------------------------------------------+

In the first example, your documents might represent hotels and contain
a geo location field. You want to compute a decay function depending on
how far the hotel is from a given location. You might not immediately
see what scale to choose for the gauss function, but you can say
something like: "At a distance of 2km from the desired location, the
score should be reduced to one third." The parameter "scale" will then
be adjusted automatically to assure that the score function computes a
score of 0.33 for hotels that are 2km away from the desired location.

In the second example, documents with a field value between 2013-09-12
and 2013-09-22 would get a weight of 1.0 and documents which are 15 days
from that date a weight of 0.5.

The ``DECAY_FUNCTION`` determines the shape of the decay:

+------------+---------------------------------------------------------------+
| ``gauss``  | Normal decay, computed as:                                    |
|            |                                                               |
|            | |images/Gaussian.png|                                         |
+------------+---------------------------------------------------------------+

where |images/sigma.png| is computed to assure that the score takes the
value ``decay`` at distance ``scale`` from ``origin``\ +-\ ``offset``

|images/sigma\_calc.png|

+------------+---------------------------------------------------------------+
| ``exp``    | Exponential decay, computed as:                               |
|            |                                                               |
|            | |images/Exponential.png|                                      |
+------------+---------------------------------------------------------------+

where again the parameter |images/lambda.png| is computed to assure that
the score takes the value ``decay`` at distance ``scale`` from
``origin``\ +-\ ``offset``

|images/lambda\_calc.png|

+------------+---------------------------------------------------------------+
| ``linear`` | Linear decay, computed as:                                    |
|            |                                                               |
|            | |images/Linear.png|.                                          |
+------------+---------------------------------------------------------------+

where again the parameter ``s`` is computed to assure that the score
takes the value ``decay`` at distance ``scale`` from
``origin``\ +-\ ``offset``

|images/s\_calc.png|

In contrast to the normal and exponential decay, this function actually
sets the score to 0 if the field value exceeds twice the user given
scale value.

For single functions the three decay functions together with their
parameters can be visualized like this (the field in this example is
called "age"):

|images/decay\_2d.png|

Multiple values:
^^^^^^^^^^^^^^^^

If a field used for computing the decay contains multiple values, per
default the value closest to the origin is chosen for determining the
distance. This can be changed by setting ``multi_value_mode``.

+------------+---------------------------------------------------------------+
| ``min``    | Distance is the minimum distance                              |
+------------+---------------------------------------------------------------+
| ``max``    | Distance is the maximum distance                              |
+------------+---------------------------------------------------------------+
| ``avg``    | Distance is the average distance                              |
+------------+---------------------------------------------------------------+
| ``sum``    | Distance is the sum of all distances                          |
+------------+---------------------------------------------------------------+

Example:

.. code:: js

        "DECAY_FUNCTION": {
            "FIELD_NAME": {
                  "origin": ...,
                  "scale": ...
            },
            "multi_value_mode": "avg"
        }

Detailed example
~~~~~~~~~~~~~~~~

Suppose you are searching for a hotel in a certain town. Your budget is
limited. Also, you would like the hotel to be close to the town center,
so the farther the hotel is from the desired location the less likely
you are to check in.

You would like the query results that match your criterion (for example,
"hotel, Nancy, non-smoker") to be scored with respect to distance to the
town center and also the price.

Intuitively, you would like to define the town center as the origin and
maybe you are willing to walk 2km to the town center from the hotel. In
this case your **origin** for the location field is the town center and
the **scale** is ~2km.

If your budget is low, you would probably prefer something cheap above
something expensive. For the price field, the **origin** would be 0
Euros and the **scale** depends on how much you are willing to pay, for
example 20 Euros.

In this example, the fields might be called "price" for the price of the
hotel and "location" for the coordinates of this hotel.

The function for ``price`` in this case would be

.. code:: js

    "DECAY_FUNCTION": {
        "price": {
              "origin": "0",
              "scale": "20"
        }
    }

and for ``location``:

.. code:: js

    "DECAY_FUNCTION": {
        "location": {
              "origin": "11, 12",
              "scale": "2km"
        }
    }

where ``DECAY_FUNCTION`` can be "linear", "exp" and "gauss".

Suppose you want to multiply these two functions on the original score,
the request would look like this:

.. code:: js

    curl 'localhost:9200/hotels/_search/' -d '{
    "query": {
        "function_score": {
            "functions": [
                {
                    "DECAY_FUNCTION": {
                        "price": {
                            "origin": "0",
                            "scale": "20"
                        }
                    }
                },
                {
                    "DECAY_FUNCTION": {
                        "location": {
                            "origin": "11, 12",
                            "scale": "2km"
                        }
                    }
                }
            ],
            "query": {
                "match": {
                    "properties": "balcony"
                }
            },
            "score_mode": "multiply"
        }
    }
    }'

Next, we show how the computed score looks like for each of the three
possible decay functions.

Normal decay, keyword ``gauss``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When choosing ``gauss`` as the decay function in the above example, the
contour and surface plot of the multiplier looks like this:

|https://f.cloud.github.com/assets/4320215/768157/cd0e18a6-e898-11e2-9b3c-f0145078bd6f.png|

|https://f.cloud.github.com/assets/4320215/768160/ec43c928-e898-11e2-8e0d-f3c4519dbd89.png|

Suppose your original search results match three hotels:

-  "Backback Nap"

-  "Drink n Drive"

-  "BnB Bellevue".

"Drink n Drive" is pretty far from your defined location (nearly 2 km)
and is not too cheap (about 13 Euros) so it gets a low factor a factor
of 0.56. "BnB Bellevue" and "Backback Nap" are both pretty close to the
defined location but "BnB Bellevue" is cheaper, so it gets a multiplier
of 0.86 whereas "Backpack Nap" gets a value of 0.66.

Exponential decay, keyword ``exp``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When choosing ``exp`` as the decay function in the above example, the
contour and surface plot of the multiplier looks like this:

|https://f.cloud.github.com/assets/4320215/768161/082975c0-e899-11e2-86f7-174c3a729d64.png|

|https://f.cloud.github.com/assets/4320215/768162/0b606884-e899-11e2-907b-aefc77eefef6.png|

Linear decay, keyword ``linear``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When choosing ``linear`` as the decay function in the above example, the
contour and surface plot of the multiplier looks like this:

|https://f.cloud.github.com/assets/4320215/768164/1775b0ca-e899-11e2-9f4a-776b406305c6.png|

|https://f.cloud.github.com/assets/4320215/768165/19d8b1aa-e899-11e2-91bc-6b0553e8d722.png|

Supported fields for decay functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Only single valued numeric fields, including time and geo locations, are
supported.

What if a field is missing?
~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the numeric field is missing in the document, the function will
return 1.

Fuzzy Query
-----------

The fuzzy query uses similarity based on Levenshtein edit distance for
``string`` fields, and a ``+/-`` margin on numeric and date fields.

String fields
~~~~~~~~~~~~~

The ``fuzzy`` query generates all possible matching terms that are
within the maximum edit distance specified in ``fuzziness`` and then
checks the term dictionary to find out which of those generated terms
actually exist in the index.

Here is a simple example:

.. code:: js

    {
        "fuzzy" : { "user" : "ki" }
    }

Or with more advanced settings:

.. code:: js

    {
        "fuzzy" : {
            "user" : {
                "value" :         "ki",
                "boost" :         1.0,
                "fuzziness" :     2,
                "prefix_length" : 0,
                "max_expansions": 100
            }
        }
    }

**Parameters**

+--------------------+----------------------------------------------------------+
| ``fuzziness``      | The maximum edit distance. Defaults to ``AUTO``.         |
+--------------------+----------------------------------------------------------+
| ``prefix_length``  | The number of initial characters which will not be       |
|                    | “fuzzified”. This helps to reduce the number of terms    |
|                    | which must be examined. Defaults to ``0``.               |
+--------------------+----------------------------------------------------------+
| ``max_expansions`` | The maximum number of terms that the ``fuzzy`` query     |
|                    | will expand to. Defaults to ``50``.                      |
+--------------------+----------------------------------------------------------+

    **Warning**

    this query can be very heavy if ``prefix_length`` and
    ``max_expansions`` are both set to ``0``. This could cause every
    term in the index to be examined!

**Numeric and date fields**

Performs a range query “around” the value using the ``fuzziness`` value as a
``+/-`` range, where:

::

    -fuzziness <= field value <= +fuzziness

For example:

.. code:: js

    {
        "fuzzy" : {
            "price" : {
                "value" : 12,
                "fuzziness" : 2
            }
        }
    }

Will result in a range query between 10 and 14. Date fields support
`time values <#time-units>`__, eg:

.. code:: js

    {
        "fuzzy" : {
            "created" : {
                "value" : "2010-02-05T12:05:07",
                "fuzziness" : "1d"
            }
        }
    }

See the ``fuzziness`` documentation for more details about accepted values.

GeoShape Query
--------------

Query version of the `geo\_shape
Filter <#query-dsl-geo-shape-filter>`__.

Requires the `geo\_shape Mapping <#mapping-geo-shape-type>`__.

Given a document that looks like this:

.. code:: js

    {
        "name": "Wind & Wetter, Berlin, Germany",
        "location": {
            "type": "Point",
            "coordinates": [13.400544, 52.530286]
        }
    }

The following query will find the point:

.. code:: js

    {
        "query": {
            "geo_shape": {
                "location": {
                    "shape": {
                        "type": "envelope",
                        "coordinates": [[13, 53],[14, 52]]
                    }
                }
            }
        }
    }

See the Filter’s documentation for more information.

**Relevancy and Score**

Currently Elasticsearch does not have any notion of geo shape relevancy,
consequently the Query internally uses a ``constant_score`` Query which
wraps a `geo\_shape filter <#query-dsl-geo-shape-filter>`__.

Has Child Query
---------------

The ``has_child`` query works the same as the
`has\_child <#query-dsl-has-child-filter>`__ filter, by automatically
wrapping the filter with a
`constant\_score <#query-dsl-constant-score-query>`__ (when using the
default score type). It has the same syntax as the
`has\_child <#query-dsl-has-child-filter>`__ filter:

.. code:: js

    {
        "has_child" : {
            "type" : "blog_tag",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }

An important difference with the ``top_children`` query is that this
query is always executed in two iterations whereas the ``top_children``
query can be executed in one or more iterations. When using the
``has_child`` query the ``total_hits`` is always correct.

**Scoring capabilities**

The ``has_child`` also has scoring support. The supported score types
are ``min``, ``max``, ``sum``, ``avg`` or ``none``. The default is
``none`` and yields the same behaviour as in previous versions. If the
score type is set to another value than ``none``, the scores of all the
matching child documents are aggregated into the associated parent
documents. The score type can be specified with the ``score_mode`` field
inside the ``has_child`` query:

.. code:: js

    {
        "has_child" : {
            "type" : "blog_tag",
            "score_mode" : "sum",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }

**Min/Max Children**

The ``has_child`` query allows you to specify that a minimum and/or
maximum number of children are required to match for the parent doc to
be considered a match:

.. code:: js

    {
        "has_child" : {
            "type" : "blog_tag",
            "score_mode" : "sum",
            "min_children": 2, 
            "max_children": 10, 
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }

Both ``min_children`` and ``max_children`` are optional.

The ``min_children`` and ``max_children`` parameters can be combined
with the ``score_mode`` parameter.

**Memory Considerations**

In order to support parent-child joins, all of the (string) parent IDs
must be resident in memory (in the `field data
cache <#index-modules-fielddata>`__). Additionally, every child document
is mapped to its parent using a long value (approximately). It is
advisable to keep the string parent ID short in order to reduce memory
usage.

You can check how much memory is being used by the ID cache using the
`indices stats <#indices-stats>`__ or `nodes
stats <#cluster-nodes-stats>`__ APIs, eg:

.. code:: js

    curl -XGET "http://localhost:9200/_stats/id_cache?pretty&human"

Has Parent Query
----------------

The ``has_parent`` query works the same as the
`has\_parent <#query-dsl-has-parent-filter>`__ filter, by automatically
wrapping the filter with a constant\_score (when using the default score
type). It has the same syntax as the
`has\_parent <#query-dsl-has-parent-filter>`__ filter.

.. code:: js

    {
        "has_parent" : {
            "parent_type" : "blog",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }

**Scoring capabilities**

The ``has_parent`` also has scoring support. The supported score types
are ``score`` or ``none``. The default is ``none`` and this ignores the
score from the parent document. The score is in this case equal to the
boost on the ``has_parent`` query (defaults to 1). If the score type is
set to ``score``, then the score of the matching parent document is
aggregated into the child documents belonging to the matching parent
document. The score type can be specified with the ``score_mode`` field
inside the ``has_parent`` query:

.. code:: js

    {
        "has_parent" : {
            "parent_type" : "blog",
            "score_mode" : "score",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }

**Memory Considerations**

In order to support parent-child joins, all of the (string) parent IDs
must be resident in memory (in the `field data
cache <#index-modules-fielddata>`__). Additionally, every child document
is mapped to its parent using a long value (approximately). It is
advisable to keep the string parent ID short in order to reduce memory
usage.

You can check how much memory is being used by the ID cache using the
`indices stats <#indices-stats>`__ or `nodes
stats <#cluster-nodes-stats>`__ APIs, eg:

.. code:: js

    curl -XGET "http://localhost:9200/_stats/id_cache?pretty&human"

Ids Query
---------

Filters documents that only have the provided ids. Note, this filter
does not require the `\_id <#mapping-id-field>`__ field to be indexed
since it works using the `\_uid <#mapping-uid-field>`__ field.

.. code:: js

    {
        "ids" : {
            "type" : "my_type",
            "values" : ["1", "4", "100"]
        }
    }

The ``type`` is optional and can be omitted, and can also accept an
array of values.
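
For example, a sketch that omits the ``type`` entirely:

.. code:: js

    {
        "ids" : {
            "values" : ["1", "4", "100"]
        }
    }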

Indices Query
-------------

The ``indices`` query can be used when executing a search across
multiple indices. It allows you to specify one query that is executed
only on indices matching a given list, and another query
(``no_match_query``) that is executed on indices not in that list.

.. code:: js

    {
        "indices" : {
            "indices" : ["index1", "index2"],
            "query" : {
                "term" : { "tag" : "wow" }
            },
            "no_match_query" : {
                "term" : { "tag" : "kow" }
            }
        }
    }

You can use the ``index`` field to provide a single index.

``no_match_query`` can also be set to the string value ``none`` (to
match no documents) or ``all`` (to match all documents). Defaults to
``all``.

``query`` is mandatory, as well as ``indices`` (or ``index``).
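
For example, a sketch using the single ``index`` form and matching no
documents on other indices:

.. code:: js

    {
        "indices" : {
            "index" : "index1",
            "query" : {
                "term" : { "tag" : "wow" }
            },
            "no_match_query" : "none"
        }
    }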

    **Tip**

    The fields order is important: if the ``indices`` are provided
    before ``query`` or ``no_match_query``, the related queries get
    parsed only against the indices that they are going to be executed
    on. This is useful to avoid parsing queries when it is not necessary
    and prevent potential mapping errors.

Match All Query
---------------

A query that matches all documents. Maps to Lucene
``MatchAllDocsQuery``.

.. code:: js

    {
        "match_all" : { }
    }

Which can also have boost associated with it:

.. code:: js

    {
        "match_all" : { "boost" : 1.2 }
    }

More Like This Query
--------------------

The more like this query finds documents that are "like" the provided
text by running it against one or more fields.

.. code:: js

    {
        "more_like_this" : {
            "fields" : ["name.first", "name.last"],
            "like" : "text like this one",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }

More Like This can find documents that are "like" a set of chosen
documents. The syntax to specify one or more documents is similar to the
`Multi GET API <#docs-multi-get>`__. If only one document is specified,
the query behaves the same as the `More Like This
API <#search-more-like-this>`__.

.. code:: js

    {
        "more_like_this" : {
            "fields" : ["name.first", "name.last"],
            "like" : [
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "1"
            },
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "2"
            },
            "and also some text like this one!"
            ],
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }

Additionally, `artificial
documents <#docs-termvectors-artificial-doc>`__ are also supported. This
is useful in order to specify one or more documents not present in the
index.

.. code:: js

    {
        "more_like_this" : {
            "fields" : ["name.first", "name.last"],
            "like" : [
            {
                "_index" : "test",
                "_type" : "type",
                "doc" : {
                    "name": {
                        "first": "Ben",
                        "last": "Grimm"
                    },
                    "tweet": "You got no idea what I'd... what I'd give to be invisible."
                }
            },
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "2"
            }
            ],
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }

``more_like_this`` can be shortened to ``mlt``.

Under the hood, ``more_like_this`` simply creates multiple ``should``
clauses in a ``bool`` query of interesting terms extracted from some
provided text. The interesting terms are selected with respect to their
tf-idf scores. These are controlled by ``min_term_freq``,
``min_doc_freq``, and ``max_doc_freq``. The number of interesting terms
is controlled by ``max_query_terms``. While the minimum number of
clauses that must be satisfied is controlled by
``minimum_should_match``. The terms are extracted from the text in
``like`` and analyzed by the analyzer associated with the field, unless
specified by ``analyzer``. There are other parameters, such as
``min_word_length``, ``max_word_length`` or ``stop_words``, to control
what terms should be considered as interesting. In order to give more
weight to more interesting terms, each boolean clause associated with a
term could be boosted by the term tf-idf score times some boosting
factor ``boost_terms``. When a search for multiple documents is issued,
More Like This generates a ``more_like_this`` query per document field
in ``fields``. These ``fields`` are specified as a top level parameter
or within each document request.
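
For illustration, a sketch that tunes some of these term-selection
parameters (the values are arbitrary):

.. code:: js

    {
        "more_like_this" : {
            "fields" : ["name.first", "name.last"],
            "like" : "text like this one",
            "min_term_freq" : 1,
            "min_doc_freq" : 3,
            "max_query_terms" : 20,
            "minimum_should_match" : "50%",
            "stop_words" : ["the", "a"],
            "boost_terms" : 1
        }
    }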

    **Important**

    The fields must be indexed and of type ``string``. Additionally,
    when using ``like`` with documents, the fields must be either
    ``stored``, store ``term_vector`` or ``_source`` must be enabled.

The ``more_like_this`` top level parameters include:

+--------------------------------------+--------------------------------------+
| Parameter                            | Description                          |
+======================================+======================================+
| ``fields``                           | A list of the fields to run the more |
|                                      | like this query against. Defaults to |
|                                      | the ``_all`` field for text and to   |
|                                      | all possible fields for documents.   |
+--------------------------------------+--------------------------------------+
| ``like``                             | coming[2.0] Can either be some text, |
|                                      | some documents or a combination of   |
|                                      | all, **required**. A document        |
|                                      | request follows the same syntax as   |
|                                      | the `Multi Get                       |
|                                      | API <#docs-multi-get>`__ or `Multi   |
|                                      | Term Vectors                         |
|                                      | API <#docs-multi-termvectors>`__. In |
|                                      | this case, the text is fetched from  |
|                                      | ``fields`` unless specified          |
|                                      | otherwise in each document request.  |
|                                      | The text is analyzed by the default  |
|                                      | analyzer at the field, unless        |
|                                      | overridden by the                    |
|                                      | ``per_field_analyzer`` parameter of  |
|                                      | the `Term Vectors                    |
|                                      | API <#docs-termvectors-per-field-ana |
|                                      | lyzer>`__.                           |
+--------------------------------------+--------------------------------------+
| ``like_text``                        | deprecated[2.0,Replaced by ``like``] |
|                                      | The text to find documents like it,  |
|                                      | **required** if ``ids`` or ``docs``  |
|                                      | are not specified.                   |
+--------------------------------------+--------------------------------------+
| ``ids`` or ``docs``                  | deprecated[2.0,Replaced by ``like``] |
|                                      | A list of documents following the    |
|                                      | same syntax as the `Multi GET        |
|                                      | API <#docs-multi-get>`__ or `Multi   |
|                                      | termvectors                          |
|                                      | API <#docs-multi-termvectors>`__.    |
|                                      | The text is fetched from ``fields``  |
|                                      | unless specified otherwise in each   |
|                                      | ``doc``. The text is analyzed by the |
|                                      | default analyzer at the field,       |
|                                      | unless specified by the              |
|                                      | ``per_field_analyzer`` parameter of  |
|                                      | the `Term Vectors                    |
|                                      | API <#docs-termvectors-per-field-ana |
|                                      | lyzer>`__.                           |
+--------------------------------------+--------------------------------------+
| ``include``                          | When using ``like`` with document    |
|                                      | requests, specifies whether the      |
|                                      | input documents should also be       |
|                                      | included in the search results.      |
|                                      | Defaults to ``false``.               |
+--------------------------------------+--------------------------------------+
| ``minimum_should_match``             | From the generated query, the number |
|                                      | of terms that must match following   |
|                                      | the `minimum should                  |
|                                      | syntax <#query-dsl-minimum-should-ma |
|                                      | tch>`__.                             |
|                                      | (Defaults to ``"30%"``).             |
+--------------------------------------+--------------------------------------+
| ``min_term_freq``                    | The frequency below which terms will |
|                                      | be ignored in the source doc. The    |
|                                      | default frequency is ``2``.          |
+--------------------------------------+--------------------------------------+
| ``max_query_terms``                  | The maximum number of query terms    |
|                                      | that will be included in any         |
|                                      | generated query. Defaults to ``25``. |
+--------------------------------------+--------------------------------------+
| ``stop_words``                       | An array of stop words. Any word in  |
|                                      | this set is considered               |
|                                      | "uninteresting" and ignored. Even if |
|                                      | your Analyzer allows stopwords, you  |
|                                      | might want to tell the MoreLikeThis  |
|                                      | code to ignore them, as for the      |
|                                      | purposes of document similarity it   |
|                                      | seems reasonable to assume that "a   |
|                                      | stop word is never interesting".     |
+--------------------------------------+--------------------------------------+
| ``min_doc_freq``                     | Ignore words which do not occur in   |
|                                      | at least this many docs. Defaults to |
|                                      | ``5``.                               |
+--------------------------------------+--------------------------------------+
| ``max_doc_freq``                     | The maximum frequency in which words |
|                                      | may still appear. Words that appear  |
|                                      | in more than this many docs will be  |
|                                      | ignored. Defaults to unbounded.      |
+--------------------------------------+--------------------------------------+
| ``min_word_length``                  | The minimum word length below which  |
|                                      | words will be ignored. Defaults to   |
|                                      | ``0``. (Old name "min\_word\_len" is |
|                                      | deprecated)                          |
+--------------------------------------+--------------------------------------+
| ``max_word_length``                  | The maximum word length above which  |
|                                      | words will be ignored. Defaults to   |
|                                      | unbounded (``0``). (Old name         |
|                                      | "max\_word\_len" is deprecated)      |
+--------------------------------------+--------------------------------------+
| ``boost_terms``                      | Sets the boost factor to use when    |
|                                      | boosting terms. Defaults to          |
|                                      | deactivated (``0``). Any other value |
|                                      | activates boosting with given boost  |
|                                      | factor.                              |
+--------------------------------------+--------------------------------------+
| ``boost``                            | Sets the boost value of the query.   |
|                                      | Defaults to ``1.0``.                 |
+--------------------------------------+--------------------------------------+
| ``analyzer``                         | The analyzer that will be used to    |
|                                      | analyze the ``like`` text. Defaults  |
|                                      | to the analyzer associated with the  |
|                                      | first field in ``fields``.           |
+--------------------------------------+--------------------------------------+
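
As an illustration, here is a minimal sketch of the ``like`` parameter
combining free text with a document reference (the field names, index,
type and id used here are hypothetical):

.. code:: js

    {
        "more_like_this" : {
            "fields" : ["title", "description"],
            "like" : [
                "some free text to find similar documents for",
                {
                    "_index" : "imdb",
                    "_type" : "movies",
                    "_id" : "1"
                }
            ],
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }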

Nested Query
------------

The nested query allows you to query nested objects / docs (see `nested
mapping <#mapping-nested-type>`__). The query is executed against the
nested objects / docs as if they were indexed as separate docs (they
are, internally) and returns the root parent doc (or parent nested
mapping). Here is a sample mapping we will work with:

.. code:: js

    {
        "type1" : {
            "properties" : {
                "obj1" : {
                    "type" : "nested"
                }
            }
        }
    }

And here is a sample nested query usage:

.. code:: js

    {
        "nested" : {
            "path" : "obj1",
            "score_mode" : "avg",
            "query" : {
                "bool" : {
                    "must" : [
                        {
                            "match" : {"obj1.name" : "blue"}
                        },
                        {
                            "range" : {"obj1.count" : {"gt" : 5}}
                        }
                    ]
                }
            }
        }
    }

The query ``path`` points to the nested object path, and the ``query``
(or ``filter``) includes the query that will run on the nested docs
matching the direct path, and joining with the root parent docs. Note
that any fields referenced inside the query must use the complete path
(fully qualified).

The ``score_mode`` allows you to set how matching inner children affect
the score of the parent. It defaults to ``avg``, but can be set to
``sum``, ``max`` or ``none``.

Multi level nesting is automatically supported and detected, so an
inner nested query will automatically match the relevant nesting level
(and not the root) if it exists within another nested query.
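
For example, here is a minimal sketch of multi level nesting, assuming
a mapping in which the nested object ``obj1`` itself contains a nested
object ``obj2`` (both names are hypothetical):

.. code:: js

    {
        "nested" : {
            "path" : "obj1",
            "query" : {
                "nested" : {
                    "path" : "obj1.obj2",
                    "query" : {
                        "match" : { "obj1.obj2.name" : "blue" }
                    }
                }
            }
        }
    }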

Prefix Query
------------

Matches documents that have fields containing terms with a specified
prefix (**not analyzed**). The prefix query maps to Lucene
``PrefixQuery``. The following matches documents where the user field
contains a term that starts with ``ki``:

.. code:: js

    {
        "prefix" : { "user" : "ki" }
    }

A boost can also be associated with the query:

.. code:: js

    {
        "prefix" : { "user" :  { "value" : "ki", "boost" : 2.0 } }
    }

Or :

.. code:: js

    {
        "prefix" : { "user" :  { "prefix" : "ki", "boost" : 2.0 } }
    }

This multi term query allows you to control how it gets rewritten using
the `rewrite <#query-dsl-multi-term-rewrite>`__ parameter.
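
As an illustration, a minimal sketch that sets a rewrite method
explicitly (``constant_score_boolean`` is just one of the supported
rewrite methods; see the linked section for the full list):

.. code:: js

    {
        "prefix" : { "user" :  { "value" : "ki", "rewrite" : "constant_score_boolean" } }
    }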

Query String Query
------------------

A query that uses a query parser in order to parse its content. Here is
an example:

.. code:: js

    {
        "query_string" : {
            "default_field" : "content",
            "query" : "this AND that OR thus"
        }
    }

The ``query_string`` top level parameters include:

+--------------------------------------+--------------------------------------+
| Parameter                            | Description                          |
+======================================+======================================+
| ``query``                            | The actual query to be parsed. See   |
|                                      | the query string syntax below.       |
+--------------------------------------+--------------------------------------+
| ``default_field``                    | The default field for query terms if |
|                                      | no prefix field is specified.        |
|                                      | Defaults to the                      |
|                                      | ``index.query.default_field`` index  |
|                                      | settings, which in turn defaults to  |
|                                      | ``_all``.                            |
+--------------------------------------+--------------------------------------+
| ``default_operator``                 | The default operator used if no      |
|                                      | explicit operator is specified. For  |
|                                      | example, with a default operator of  |
|                                      | ``OR``, the query                    |
|                                      | ``capital of Hungary`` is translated |
|                                      | to ``capital OR of OR Hungary``, and |
|                                      | with default operator of ``AND``,    |
|                                      | the same query is translated to      |
|                                      | ``capital AND of AND Hungary``. The  |
|                                      | default value is ``OR``.             |
+--------------------------------------+--------------------------------------+
| ``analyzer``                         | The analyzer name used to analyze    |
|                                      | the query string.                    |
+--------------------------------------+--------------------------------------+
| ``allow_leading_wildcard``           | When set, ``*`` or ``?`` are allowed |
|                                      | as the first character. Defaults to  |
|                                      | ``true``.                            |
+--------------------------------------+--------------------------------------+
| ``lowercase_expanded_terms``         | Whether terms of wildcard, prefix,   |
|                                      | fuzzy, and range queries are to be   |
|                                      | automatically lower-cased or not     |
|                                      | (since they are not analyzed).       |
|                                      | Defaults to ``true``.                |
+--------------------------------------+--------------------------------------+
| ``enable_position_increments``       | Set to ``true`` to enable position   |
|                                      | increments in result queries.        |
|                                      | Defaults to ``true``.                |
+--------------------------------------+--------------------------------------+
| ``fuzzy_max_expansions``             | Controls the number of terms fuzzy   |
|                                      | queries will expand to. Defaults to  |
|                                      | ``50``                               |
+--------------------------------------+--------------------------------------+
| ``fuzziness``                        | Set the fuzziness for fuzzy queries. |
|                                      | Defaults to ``AUTO``. See the        |
|                                      | fuzziness documentation for allowed  |
|                                      | settings.                            |
+--------------------------------------+--------------------------------------+
| ``fuzzy_prefix_length``              | Set the prefix length for fuzzy      |
|                                      | queries. Default is ``0``.           |
+--------------------------------------+--------------------------------------+
| ``phrase_slop``                      | Sets the default slop for phrases.   |
|                                      | If zero, then exact phrase matches   |
|                                      | are required. Default value is       |
|                                      | ``0``.                               |
+--------------------------------------+--------------------------------------+
| ``boost``                            | Sets the boost value of the query.   |
|                                      | Defaults to ``1.0``.                 |
+--------------------------------------+--------------------------------------+
| ``analyze_wildcard``                 | By default, wildcard terms in a      |
|                                      | query string are not analyzed. By    |
|                                      | setting this value to ``true``, a    |
|                                      | best effort will be made to analyze  |
|                                      | those as well.                       |
+--------------------------------------+--------------------------------------+
| ``auto_generate_phrase_queries``     | Defaults to ``false``.               |
+--------------------------------------+--------------------------------------+
| ``max_determinized_states``          | Limit on how many automaton states   |
|                                      | regexp queries are allowed to        |
|                                      | create. This protects against        |
|                                      | too-difficult (e.g. exponentially    |
|                                      | hard) regexps. Defaults to 10000.    |
+--------------------------------------+--------------------------------------+
| ``minimum_should_match``             | A value controlling how many         |
|                                      | "should" clauses in the resulting    |
|                                      | boolean query should match. It can   |
|                                      | be an absolute value (``2``), a      |
|                                      | percentage (``30%``) or a            |
|                                      | `combination of                      |
|                                      | both <#query-dsl-minimum-should-matc |
|                                      | h>`__.                               |
+--------------------------------------+--------------------------------------+
| ``lenient``                          | If set to ``true`` will cause format |
|                                      | based failures (like providing text  |
|                                      | to a numeric field) to be ignored.   |
+--------------------------------------+--------------------------------------+
| ``locale``                           | Locale that should be used for       |
|                                      | string conversions. Defaults to      |
|                                      | ``ROOT``.                            |
+--------------------------------------+--------------------------------------+
| ``time_zone``                        | Time Zone to be applied to any range |
|                                      | query related to dates. See also     |
|                                      | `JODA                                |
|                                      | timezone <http://joda-time.sourcefor |
|                                      | ge.net/api-release/org/joda/time/Dat |
|                                      | eTimeZone.html>`__.                  |
+--------------------------------------+--------------------------------------+

When a multi term query is being generated, one can control how it gets
rewritten using the `rewrite <#query-dsl-multi-term-rewrite>`__
parameter.

**Default Field**

When the field to search on is not explicitly specified in the query
string syntax, the ``index.query.default_field`` setting is used to
derive which field to search on. It defaults to the ``_all`` field.

So, if the ``_all`` field is disabled, it might make sense to set a
different default field.
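
As an illustration, a minimal sketch of setting a different default
field at index creation time (the index name and field name are
hypothetical):

.. code:: js

    curl -XPUT "http://localhost:9200/my_index" -d '{
        "settings" : {
            "index.query.default_field" : "content"
        }
    }'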

**Multi Field**

The ``query_string`` query can also run against multiple fields. Fields
can be provided via the ``"fields"`` parameter (example below).

The idea of running the ``query_string`` query against multiple fields
is to expand each query term to an OR clause like this:

::

    field1:query_term OR field2:query_term | ...

For example, the following query

.. code:: js

    {
        "query_string" : {
            "fields" : ["content", "name"],
            "query" : "this AND that"
        }
    }

matches the same words as

.. code:: js

    {
        "query_string": {
          "query": "(content:this OR content:that) AND (name:this OR name:that)"
        }
    }

Since several queries are generated from the individual search terms,
combining them can be automatically done using either a ``dis_max``
query or a simple ``bool`` query. For example (the ``name`` is boosted
by 5 using ``^5`` notation):

.. code:: js

    {
        "query_string" : {
            "fields" : ["content", "name^5"],
            "query" : "this AND that OR thus",
            "use_dis_max" : true
        }
    }

A simple wildcard can also be used to search "within" specific inner
elements of the document. For example, if we have a ``city`` object with
several fields (or an inner object with fields) in it, we can
automatically search on all "city" fields:

.. code:: js

    {
        "query_string" : {
            "fields" : ["city.*"],
            "query" : "this AND that OR thus",
            "use_dis_max" : true
        }
    }

Another option is to provide the wildcard fields search in the query
string itself (properly escaping the ``*`` sign), for example:
``city.\*:something``.

When running the ``query_string`` query against multiple fields, the
following additional parameters are allowed:

+--------------------------------------+--------------------------------------+
| Parameter                            | Description                          |
+======================================+======================================+
| ``use_dis_max``                      | Should the queries be combined using |
|                                      | ``dis_max`` (set it to ``true``), or |
|                                      | a ``bool`` query (set it to          |
|                                      | ``false``). Defaults to ``true``.    |
+--------------------------------------+--------------------------------------+
| ``tie_breaker``                      | When using ``dis_max``, the          |
|                                      | disjunction max tie breaker.         |
|                                      | Defaults to ``0``.                   |
+--------------------------------------+--------------------------------------+
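
As an illustration, a minimal sketch that combines the per-field
queries with ``dis_max`` and a tie breaker (the ``0.3`` value is
arbitrary):

.. code:: js

    {
        "query_string" : {
            "fields" : ["content", "name"],
            "query" : "this AND that",
            "use_dis_max" : true,
            "tie_breaker" : 0.3
        }
    }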

The ``fields`` parameter can also include pattern based field names,
allowing automatic expansion to the relevant fields (dynamically
introduced fields included). For example:

.. code:: js

    {
        "query_string" : {
            "fields" : ["content", "name.*^5"],
            "query" : "this AND that OR thus",
            "use_dis_max" : true
        }
    }

Query string syntax
~~~~~~~~~~~~~~~~~~~

The query string “mini-language” is used by the Query String Query and by
the ``q`` query string parameter in the ```search`` API <#search-search>`__.

The query string is parsed into a series of *terms* and *operators*. A
term can be a single word — \ ``quick`` or ``brown`` — or a phrase,
surrounded by double quotes — \ ``"quick brown"`` — which searches for
all the words in the phrase, in the same order.

Operators allow you to customize the search — the available options are
explained below.

Field names
^^^^^^^^^^^

As mentioned above, the ``default_field`` is searched for the search
terms, but it is possible to specify other fields in the query syntax:

-  where the ``status`` field contains ``active``

   ::

       status:active

-  where the ``title`` field contains ``quick`` or ``brown``. If you
   omit the OR operator the default operator will be used

   ::

       title:(quick OR brown)
       title:(quick brown)

-  where the ``author`` field contains the exact phrase ``"john smith"``

   ::

       author:"John Smith"

-  where any of the fields ``book.title``, ``book.content`` or
   ``book.date`` contains ``quick`` or ``brown`` (note how we need to
   escape the ``*`` with a backslash):

   ::

       book.\*:(quick brown)

-  where the field ``title`` has no value (or is missing):

   ::

       _missing_:title

-  where the field ``title`` has any non-null value:

   ::

       _exists_:title

Wildcards
^^^^^^^^^

Wildcard searches can be run on individual terms, using ``?`` to replace
a single character, and ``*`` to replace zero or more characters:

::

    qu?ck bro*

Be aware that wildcard queries can use an enormous amount of memory and
perform very badly — just think how many terms need to be queried to
match the query string ``"a* b* c*"``.

    **Warning**

    Allowing a wildcard at the beginning of a word (eg ``"*ing"``) is
    particularly heavy, because all terms in the index need to be
    examined, just in case they match. Leading wildcards can be disabled
    by setting ``allow_leading_wildcard`` to ``false``.

Wildcarded terms are not analyzed by default — they are lowercased
(``lowercase_expanded_terms`` defaults to ``true``) but no further
analysis is done, mainly because it is impossible to accurately analyze
a word that is missing some of its letters. However, by setting
``analyze_wildcard`` to ``true``, an attempt will be made to analyze
wildcarded words before searching the term list for matching terms.
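
As an illustration, a minimal sketch that enables wildcard analysis
(the ``content`` field is hypothetical):

.. code:: js

    {
        "query_string" : {
            "default_field" : "content",
            "query" : "qu?ck bro*",
            "analyze_wildcard" : true
        }
    }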

Regular expressions
^^^^^^^^^^^^^^^^^^^

Regular expression patterns can be embedded in the query string by
wrapping them in forward-slashes (``"/"``):

::

    name:/joh?n(ath[oa]n)/

The supported regular expression syntax is explained in the Regular
expression syntax section below.

    **Warning**

    The ``allow_leading_wildcard`` parameter does not have any control
    over regular expressions. A query string such as the following would
    force Elasticsearch to visit every term in the index:

    ::

        /.*n/

    Use with caution!

Fuzziness
^^^^^^^^^

We can search for terms that are similar to, but not exactly like our
search terms, using the “fuzzy” operator:

::

    quikc~ brwn~ foks~

This uses the `Damerau-Levenshtein
distance <http://en.wikipedia.org/wiki/Damerau-Levenshtein_distance>`__
to find all terms with a maximum of two changes, where a change is the
insertion, deletion or substitution of a single character, or
transposition of two adjacent characters.

The default *edit distance* is ``2``, but an edit distance of ``1``
should be sufficient to catch 80% of all human misspellings. It can be
specified as:

::

    quikc~1

Proximity searches
^^^^^^^^^^^^^^^^^^

While a phrase query (eg ``"john smith"``) expects all of the terms in
exactly the same order, a proximity query allows the specified words to
be further apart or in a different order. In the same way that fuzzy
queries can specify a maximum edit distance for characters in a word, a
proximity search allows us to specify a maximum edit distance of words
in a phrase:

::

    "fox quick"~5

The closer the text in a field is to the original order specified in the
query string, the more relevant that document is considered to be. When
compared to the above example query, the phrase ``"quick fox"`` would be
considered more relevant than ``"quick brown fox"``.

Ranges
^^^^^^

Ranges can be specified for date, numeric or string fields. Inclusive
ranges are specified with square brackets ``[min TO max]`` and exclusive
ranges with curly brackets ``{min TO max}``.

-  All days in 2012:

   ::

       date:[2012-01-01 TO 2012-12-31]

-  Numbers 1..5

   ::

       count:[1 TO 5]

-  Tags between ``alpha`` and ``omega``, excluding ``alpha`` and
   ``omega``:

   ::

       tag:{alpha TO omega}

-  Numbers from 10 upwards

   ::

       count:[10 TO *]

-  Dates before 2012

   ::

       date:{* TO 2012-01-01}

Curly and square brackets can be combined:

-  Numbers from 1 up to but not including 5

   ::

       count:[1 TO 5}

Ranges with one side unbounded can use the following syntax:

::

    age:>10
    age:>=10
    age:<10
    age:<=10

    **Note**

    To combine an upper and lower bound with the simplified syntax, you
    would need to join two clauses with an ``AND`` operator:

    ::

        age:(>=10 AND <20)
        age:(+>=10 +<20)

The parsing of ranges in query strings can be complex and error prone.
It is much more reliable to use an explicit ```range``
filter <#query-dsl-range-filter>`__.

Boosting
^^^^^^^^

Use the *boost* operator ``^`` to make one term more relevant than
another. For instance, if we want to find all documents about foxes, but
we are especially interested in quick foxes:

::

    quick^2 fox

The default ``boost`` value is 1, but can be any positive floating point
number. Boosts between 0 and 1 reduce relevance.

Boosts can also be applied to phrases or to groups:

::

    "john smith"^2   (foo bar)^4

Boolean operators
^^^^^^^^^^^^^^^^^

By default, all terms are optional, as long as one term matches. A
search for ``foo bar baz`` will find any document that contains one or
more of ``foo`` or ``bar`` or ``baz``. We have already discussed the
``default_operator`` above which allows you to force all terms to be
required, but there are also *boolean operators* which can be used in
the query string itself to provide more control.

The preferred operators are ``+`` (this term **must** be present) and
``-`` (this term **must not** be present). All other terms are optional.
For example, this query:

::

    quick brown +fox -news

states that:

-  ``fox`` must be present

-  ``news`` must not be present

-  ``quick`` and ``brown`` are optional — their presence increases the
   relevance

The familiar operators ``AND``, ``OR`` and ``NOT`` (also written ``&&``,
``||`` and ``!``) are also supported. However, the effects of these
operators can be more complicated than is obvious at first glance.
``NOT`` takes precedence over ``AND``, which takes precedence over
``OR``. While the ``+`` and ``-`` only affect the term to the right of
the operator, ``AND`` and ``OR`` can affect the terms to the left and
right.

Rewriting the above query using ``AND``, ``OR`` and ``NOT`` demonstrates
the complexity:

``quick OR brown AND fox AND NOT news``
    This is incorrect, because ``brown`` is now a required term.

``(quick OR brown) AND fox AND NOT news``
    This is incorrect because at least one of ``quick`` or ``brown`` is
    now required and the search for those terms would be scored
    differently from the original query.

``((quick AND fox) OR (brown AND fox) OR fox) AND NOT news``
    This form now replicates the logic from the original query
    correctly, but the relevance scoring bears little resemblance to the
    original.

In contrast, the same query rewritten using the ```match``
query <#query-dsl-match-query>`__ would look like this:

::

    {
        "bool": {
            "must":     { "match": "fox"         },
            "should":   { "match": "quick brown" },
            "must_not": { "match": "news"        }
        }
    }

Grouping
^^^^^^^^

Multiple terms or clauses can be grouped together with parentheses, to
form sub-queries:

::

    (quick OR brown) AND fox

Groups can be used to target a particular field, or to boost the result
of a sub-query:

::

    status:(active OR pending) title:(full text search)^2

Reserved characters
^^^^^^^^^^^^^^^^^^^

If you need to use any of the characters which function as operators in
your query itself (and not as operators), then you should escape them
with a leading backslash. For instance, to search for ``(1+1)=2``, you
would need to write your query as ``\(1\+1\)=2``.

The reserved characters are: ``+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /``

Failing to escape these special characters correctly could lead to a
syntax error which prevents your query from running.

A space may also be a reserved character. For instance, if you have a
synonym list which converts ``"wi fi"`` to ``"wifi"``, a
``query_string`` search for ``"wi fi"`` would fail. The query string
parser would interpret your query as a search for ``"wi OR fi"``, while
the token stored in your index is actually ``"wifi"``. Escaping the
space will protect it from being touched by the query string parser:
``"wi\ fi"``.

Empty Query
^^^^^^^^^^^

If the query string is empty or contains only whitespace, the query is
interpreted as a ``no_docs_query`` and will yield an empty result set.

Simple Query String Query
-------------------------

A query that uses the SimpleQueryParser to parse its content. Unlike the
regular ``query_string`` query, the ``simple_query_string`` query will
never throw an exception, and discards invalid parts of the query. Here
is an example:

.. code:: js

    {
        "simple_query_string" : {
            "query": "\"fried eggs\" +(eggplant | potato) -frittata",
            "analyzer": "snowball",
            "fields": ["body^5","_all"],
            "default_operator": "and"
        }
    }

The ``simple_query_string`` top level parameters include:

+--------------------------------------+--------------------------------------+
| Parameter                            | Description                          |
+======================================+======================================+
| ``query``                            | The actual query to be parsed. See   |
|                                      | below for syntax.                    |
+--------------------------------------+--------------------------------------+
| ``fields``                           | The fields to perform the parsed     |
|                                      | query against. Defaults to the       |
|                                      | ``index.query.default_field`` index  |
|                                      | settings, which in turn defaults to  |
|                                      | ``_all``.                            |
+--------------------------------------+--------------------------------------+
| ``default_operator``                 | The default operator used if no      |
|                                      | explicit operator is specified. For  |
|                                      | example, with a default operator of  |
|                                      | ``OR``, the query                    |
|                                      | ``capital of Hungary`` is translated |
|                                      | to ``capital OR of OR Hungary``, and |
|                                      | with default operator of ``AND``,    |
|                                      | the same query is translated to      |
|                                      | ``capital AND of AND Hungary``. The  |
|                                      | default value is ``OR``.             |
+--------------------------------------+--------------------------------------+
| ``analyzer``                         | The analyzer used to analyze each    |
|                                      | term of the query when creating      |
|                                      | composite queries.                   |
+--------------------------------------+--------------------------------------+
| ``flags``                            | Flags specifying which features of   |
|                                      | the ``simple_query_string`` to       |
|                                      | enable. Defaults to ``ALL``.         |
+--------------------------------------+--------------------------------------+
| ``lowercase_expanded_terms``         | Whether terms of prefix and fuzzy    |
|                                      | queries are to be automatically      |
|                                      | lower-cased or not (since they are   |
|                                      | not analyzed). Defaults to true.     |
+--------------------------------------+--------------------------------------+
| ``locale``                           | Locale that should be used for       |
|                                      | string conversions. Defaults to      |
|                                      | ``ROOT``.                            |
+--------------------------------------+--------------------------------------+
| ``lenient``                          | If set to ``true`` will cause format |
|                                      | based failures (like providing text  |
|                                      | to a numeric field) to be ignored.   |
+--------------------------------------+--------------------------------------+

**Simple Query String Syntax**

The ``simple_query_string`` supports the following special characters:

-  ``+`` signifies AND operation

-  ``|`` signifies OR operation

-  ``-`` negates a single token

-  ``"`` wraps a number of tokens to signify a phrase for searching

-  ``*`` at the end of a term signifies a prefix query

-  ``(`` and ``)`` signify precedence

-  ``~N`` after a word signifies edit distance (fuzziness)

-  ``~N`` after a phrase signifies slop amount

In order to search for any of these special characters, they will need
to be escaped with ``\``.
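
For illustration, a minimal sketch combining a few of these operators
(the ``content`` field is hypothetical):

.. code:: js

    {
        "simple_query_string" : {
            "fields" : ["content"],
            "query" : "\"quick brown\"~2 fox* -news"
        }
    }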

**Default Field**

When the field to search on is not explicitly specified in the query
string syntax, the ``index.query.default_field`` setting is used to
derive which field to search on. It defaults to the ``_all`` field.

So, if the ``_all`` field is disabled, it might make sense to set a
different default field.

**Multi Field**

The ``fields`` parameter can also include pattern based field names,
allowing automatic expansion to the relevant fields (dynamically
introduced fields included). For example:

.. code:: js

    {
        "simple_query_string" : {
            "fields" : ["content", "name.*^5"],
            "query" : "foo bar baz"
        }
    }

**Flags**

``simple_query_string`` supports multiple flags to specify which parsing
features should be enabled. They are specified as a ``|``-delimited
string with the ``flags`` parameter:

.. code:: js

    {
        "simple_query_string" : {
            "query" : "foo | bar  & baz*",
            "flags" : "OR|AND|PREFIX"
        }
    }

The available flags are: ``ALL``, ``NONE``, ``AND``, ``OR``, ``NOT``,
``PREFIX``, ``PHRASE``, ``PRECEDENCE``, ``ESCAPE``, ``WHITESPACE``,
``FUZZY``, ``NEAR``, and ``SLOP``.

Range Query
-----------

Matches documents with fields that have terms within a certain range.
The type of the Lucene query depends on the field type: for ``string``
fields it is a ``TermRangeQuery``, while for number/date fields it is a
``NumericRangeQuery``. The following example returns all documents
where ``age`` is between ``10`` and ``20``:

.. code:: js

    {
        "range" : {
            "age" : {
                "gte" : 10,
                "lte" : 20,
                "boost" : 2.0
            }
        }
    }

The ``range`` query accepts the following parameters:

+------------+---------------------------------------------------------------+
| ``gte``    | Greater-than or equal to                                      |
+------------+---------------------------------------------------------------+
| ``gt``     | Greater-than                                                  |
+------------+---------------------------------------------------------------+
| ``lte``    | Less-than or equal to                                         |
+------------+---------------------------------------------------------------+
| ``lt``     | Less-than                                                     |
+------------+---------------------------------------------------------------+
| ``boost``  | Sets the boost value of the query, defaults to ``1.0``        |
+------------+---------------------------------------------------------------+

**Date options**

When applied to ``date`` fields, the ``range`` query also accepts a
``time_zone`` parameter. The ``time_zone`` parameter is applied to your
input lower and upper bounds and moves them to UTC-based dates:

.. code:: js

    {
        "range" : {
            "born" : {
                "gte": "2012-01-01",
                "lte": "now",
                "time_zone": "+1:00"
            }
        }
    }

In the above example, ``gte`` will actually be converted to the
``2011-12-31T23:00:00`` UTC date.

    **Note**

    If you give a date with a time zone explicitly defined and use the
    ``time_zone`` parameter, ``time_zone`` will be ignored. For example,
    setting ``from`` to ``2012-01-01T00:00:00+01:00`` with
    ``"time_zone":"+10:00"`` will still use the ``+01:00`` time zone.

When applied to ``date`` fields, the ``range`` query also accepts a
``format`` parameter. The ``format`` parameter allows you to use a date
format other than the one defined in the mapping:

.. code:: js

    {
        "range" : {
            "born" : {
                "gte": "01/01/2012",
                "lte": "2013",
                "format": "dd/MM/yyyy||yyyy"
            }
        }
    }

Regexp Query
------------

The ``regexp`` query allows you to use regular expression term queries.
See the Regular expression syntax section below for details of the
supported regular expression language. The "term queries" in the first
sentence means that Elasticsearch will apply the regexp to the terms
produced by the tokenizer for that field, and not to the original text
of the field.

**Note**: The performance of a ``regexp`` query heavily depends on the
regular expression chosen. Matching everything, like ``.*``, is very
slow, as is using lookaround regular expressions. If possible, you
should try to use a long prefix before your regular expression starts.
Wildcard matchers like ``.*?+`` will mostly lower performance.

.. code:: js

    {
        "regexp":{
            "name.first": "s.*y"
        }
    }

Boosting is also supported:

.. code:: js

    {
        "regexp":{
            "name.first":{
                "value":"s.*y",
                "boost":1.2
            }
        }
    }

You can also use special flags:

.. code:: js

    {
        "regexp":{
            "name.first": {
                "value": "s.*y",
                "flags" : "INTERSECTION|COMPLEMENT|EMPTY"
            }
        }
    }

Possible flags are ``ALL``, ``ANYSTRING``, ``AUTOMATON``,
``COMPLEMENT``, ``EMPTY``, ``INTERSECTION``, ``INTERVAL``, or ``NONE``.
Please check the `Lucene
documentation <http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/util/automaton/RegExp.html>`__
for their meaning.

Regular expressions are dangerous because it’s easy to accidentally
create an innocuous looking one that requires an exponential number of
internal determinized automaton states (and corresponding RAM and CPU)
for Lucene to execute. Lucene prevents these using the
``max_determinized_states`` setting (defaults to 10000). You can raise
this limit to allow more complex regular expressions to execute.

.. code:: js

    {
        "regexp":{
            "name.first": {
                "value": "s.*y",
                "flags" : "INTERSECTION|COMPLEMENT|EMPTY",
                "max_determinized_states": 20000
            }
        }
    }

Regular expression syntax
~~~~~~~~~~~~~~~~~~~~~~~~~

Regular expression queries are supported by the ``regexp`` and the
``query_string`` queries. The Lucene regular expression engine is not
Perl-compatible but supports a smaller range of operators.

    **Note**

    We will not attempt to explain regular expressions, but just explain
    the supported operators.

Standard operators
^^^^^^^^^^^^^^^^^^

Anchoring
    Most regular expression engines allow you to match any part of a
    string. If you want the regexp pattern to start at the beginning of
    the string or finish at the end of the string, then you have to
    *anchor* it specifically, using ``^`` to indicate the beginning or
    ``$`` to indicate the end.

    Lucene’s patterns are always anchored. The pattern provided must
    match the entire string. For string ``"abcde"``:

    ::

        ab.*     # match
        abcd     # no match

Allowed characters
    Any Unicode characters may be used in the pattern, but certain
    characters are reserved and must be escaped. The standard reserved
    characters are:

    ::

        . ? + * | { } [ ] ( ) " \

    If you enable optional features (see below) then these characters
    may also be reserved:

    ::

        # @ & < >  ~

    Any reserved character can be escaped with a backslash ``"\*"``
    including a literal backslash character: ``"\\"``

    Additionally, any characters (except double quotes) are interpreted
    literally when surrounded by double quotes:

    ::

        john"@smith.com"

Match any character
    The period ``"."`` can be used to represent any character. For
    string ``"abcde"``:

    ::

        ab...   # match
        a.c.e   # match

One-or-more
    The plus sign ``"+"`` can be used to repeat the preceding shortest
    pattern once or more times. For string ``"aaabbb"``:

    ::

        a+b+        # match
        aa+bb+      # match
        a+.+        # match
        aa+bbb+     # match

Zero-or-more
    The asterisk ``"*"`` can be used to match the preceding shortest
    pattern zero-or-more times. For string ``"aaabbb"``:

    ::

        a*b*        # match
        a*b*c*      # match
        .*bbb.*     # match
        aaa*bbb*    # match

Zero-or-one
    The question mark ``"?"`` makes the preceding shortest pattern
    optional. It matches zero or one times. For string ``"aaabbb"``:

    ::

        aaa?bbb?    # match
        aaaa?bbbb?  # match
        .....?.?    # match
        aa?bb?      # no match

Min-to-max
    Curly brackets ``"{}"`` can be used to specify a minimum and
    (optionally) a maximum number of times the preceding shortest
    pattern can repeat. The allowed forms are:

    ::

        {5}     # repeat exactly 5 times
        {2,5}   # repeat at least twice and at most 5 times
        {2,}    # repeat at least twice

    For string ``"aaabbb"``:

    ::

        a{3}b{3}        # match
        a{2,4}b{2,4}    # match
        a{2,}b{2,}      # match
        .{3}.{3}        # match
        a{4}b{4}        # no match
        a{4,6}b{4,6}    # no match
        a{4,}b{4,}      # no match

Grouping
    Parentheses ``"()"`` can be used to form sub-patterns. The quantity
    operators listed above operate on the shortest previous pattern,
    which can be a group. For string ``"ababab"``:

    ::

        (ab)+       # match
        ab(ab)+     # match
        (..)+       # match
        (...)+      # no match
        (ab)*       # match
        abab(ab)?   # match
        ab(ab)?     # no match
        (ab){3}     # match
        (ab){1,2}   # no match

Alternation
    The pipe symbol ``"|"`` acts as an OR operator. The match will
    succeed if the pattern on either the left-hand side OR the
    right-hand side matches. The alternation applies to the *longest
    pattern*, not the shortest. For string ``"aabb"``:

    ::

        aabb|bbaa   # match
        aacc|bb     # no match
        aa(cc|bb)   # match
        a+|b+       # no match
        a+b+|b+a+   # match
        a+(b|c)+    # match

Character classes
    Ranges of potential characters may be represented as character
    classes by enclosing them in square brackets ``"[]"``. A leading
    ``^`` negates the character class. The allowed forms are:

    ::

        [abc]   # 'a' or 'b' or 'c'
        [a-c]   # 'a' or 'b' or 'c'
        [-abc]  # '-' or 'a' or 'b' or 'c'
        [abc\-] # '-' or 'a' or 'b' or 'c'
        [^abc]  # any character except 'a' or 'b' or 'c'
        [^a-c]  # any character except 'a' or 'b' or 'c'
        [^-abc]  # any character except '-' or 'a' or 'b' or 'c'
        [^abc\-] # any character except '-' or 'a' or 'b' or 'c'

    Note that the dash ``"-"`` indicates a range of characters, unless
    it is the first character or if it is escaped with a backslash.

    For string ``"abcd"``:

    ::

        ab[cd]+     # match
        [a-d]+      # match
        [^a-d]+     # no match

Optional operators
^^^^^^^^^^^^^^^^^^

These operators are only available when they are explicitly enabled, by
passing ``flags`` to the query.

Multiple flags can be enabled either using the ``ALL`` flag, or by
concatenating flags with a pipe ``"|"``:

.. code:: js

    {
        "regexp": {
            "username": {
                "value": "john~athon<1-5>",
                "flags": "COMPLEMENT|INTERVAL"
            }
        }
    }

Complement
    The complement is probably the most useful option. The shortest
    pattern that follows a tilde ``"~"`` is negated. For the string
    ``"abcdef"``:

    ::

        ab~df     # match
        ab~cf     # no match
        a~(cd)f   # match
        a~(bc)f   # no match

    Enabled with the ``COMPLEMENT`` or ``ALL`` flags.

Interval
    The interval option enables the use of numeric ranges, enclosed by
    angle brackets ``"<>"``. For string: ``"foo80"``:

    ::

        foo<1-100>     # match
        foo<01-100>    # match
        foo<001-100>   # no match

    Enabled with the ``INTERVAL`` or ``ALL`` flags.

Intersection
    The ampersand ``"&"`` joins two patterns in a way that both of them
    have to match. For string ``"aaabbb"``:

    ::

        aaa.+&.+bbb     # match
        aaa&bbb         # no match

    Using this feature usually means that you should rewrite your
    regular expression.

    Enabled with the ``INTERSECTION`` or ``ALL`` flags.

Any string
    The at sign ``"@"`` matches any string in its entirety. This could
    be combined with the intersection and complement above to express
    “everything except”. For instance:

    ::

        @&~(foo.+)      # anything except string beginning with "foo"

    Enabled with the ``ANYSTRING`` or ``ALL`` flags.

Span First Query
----------------

Matches spans near the beginning of a field. The span first query maps
to Lucene ``SpanFirstQuery``. Here is an example:

.. code:: js

    {
        "span_first" : {
            "match" : {
                "span_term" : { "user" : "kimchy" }
            },
            "end" : 3
        }
    }

The ``match`` clause can be any other span type query. The ``end``
controls the maximum end position permitted in a match.

Span Multi Term Query
---------------------

The ``span_multi`` query allows you to wrap a ``multi term query`` (one
of fuzzy, prefix, term range or regexp query) as a ``span query``, so it
can be nested. Example:

.. code:: js

    {
        "span_multi":{
            "match":{
                "prefix" : { "user" :  { "value" : "ki" } }
            }
        }
    }

A boost can also be associated with the query:

.. code:: js

    {
        "span_multi":{
            "match":{
                "prefix" : { "user" :  { "value" : "ki", "boost" : 1.08 } }
            }
        }
    }

Span Near Query
---------------

Matches spans which are near one another. One can specify *slop*, the
maximum number of intervening unmatched positions, as well as whether
matches are required to be in-order. The span near query maps to Lucene
``SpanNearQuery``. Here is an example:

.. code:: js

    {
        "span_near" : {
            "clauses" : [
                { "span_term" : { "field" : "value1" } },
                { "span_term" : { "field" : "value2" } },
                { "span_term" : { "field" : "value3" } }
            ],
            "slop" : 12,
            "in_order" : false,
            "collect_payloads" : false
        }
    }

The ``clauses`` element is a list of one or more other span type queries
and the ``slop`` controls the maximum number of intervening unmatched
positions permitted.

Span Not Query
--------------

Removes matches which overlap with another span query. The span not
query maps to Lucene ``SpanNotQuery``. Here is an example:

.. code:: js

    {
        "span_not" : {
            "include" : {
                "span_term" : { "field1" : "hoya" }
            },
            "exclude" : {
                "span_near" : {
                    "clauses" : [
                        { "span_term" : { "field1" : "la" } },
                        { "span_term" : { "field1" : "hoya" } }
                    ],
                    "slop" : 0,
                    "in_order" : true
                }
            }
        }
    }

The ``include`` and ``exclude`` clauses can be any span type query. The
``include`` clause is the span query whose matches are filtered, and the
``exclude`` clause is the span query whose matches must not overlap
those returned.

In the above example, all documents with the term *hoya* are matched,
except those where *hoya* is preceded by *la*.

Span Or Query
-------------

Matches the union of its span clauses. The span or query maps to Lucene
``SpanOrQuery``. Here is an example:

.. code:: js

    {
        "span_or" : {
            "clauses" : [
                { "span_term" : { "field" : "value1" } },
                { "span_term" : { "field" : "value2" } },
                { "span_term" : { "field" : "value3" } }
            ]
        }
    }

The ``clauses`` element is a list of one or more other span type
queries.

Span Term Query
---------------

Matches spans containing a term. The span term query maps to Lucene
``SpanTermQuery``. Here is an example:

.. code:: js

    {
        "span_term" : { "user" : "kimchy" }
    }

A boost can also be associated with the query:

.. code:: js

    {
        "span_term" : { "user" : { "value" : "kimchy", "boost" : 2.0 } }
    }

Or :

.. code:: js

    {
        "span_term" : { "user" : { "term" : "kimchy", "boost" : 2.0 } }
    }

Term Query
----------

Matches documents that have fields that contain a term (**not
analyzed**). The term query maps to Lucene ``TermQuery``. The following
matches documents where the user field contains the term ``kimchy``:

.. code:: js

    {
        "term" : { "user" : "kimchy" }
    }

A boost can also be associated with the query:

.. code:: js

    {
        "term" : { "user" : { "value" : "kimchy", "boost" : 2.0 } }
    }

Or :

.. code:: js

    {
        "term" : { "user" : { "term" : "kimchy", "boost" : 2.0 } }
    }

Terms Query
-----------

A query that matches on any (configurable) number of the provided terms.
This is a simpler syntax for a ``bool`` query with several ``term``
queries in the ``should`` clauses. For example:

.. code:: js

    {
        "terms" : {
            "tags" : [ "blue", "pill" ],
            "minimum_should_match" : 1
        }
    }

The ``terms`` query is also aliased with ``in`` as the query name for
simpler usage.

Top Children Query
------------------

The ``top_children`` query runs the child query with an estimated hits
size, and out of the hit docs, aggregates them into parent docs. If there
aren’t enough parent docs matching the requested from/size search
request, then it is run again with a wider (more hits) search.

The ``top_children`` query also provides scoring capabilities, with the
ability to specify ``max``, ``sum`` or ``avg`` as the score type.

One downside of using the ``top_children`` is that if there are more
child docs matching the required hits when executing the child query,
then the ``total_hits`` result of the search response will be incorrect.

How many hits are asked for in the first child query run is controlled
using the ``factor`` parameter (defaults to ``5``). For example, when
asking for 10 parent docs (with ``from`` set to 0), then the child query
will execute with 50 hits expected. If not enough parents are found (in
our example 10), and there are still more child docs to query, then the
child search hits are expanded by multiplying by the
``incremental_factor`` (defaults to ``2``).

The required parameters are the ``query`` and ``type`` (the child type
to execute the query on). Here is an example with all different
parameters, including the default values:

.. code:: js

    {
        "top_children" : {
            "type": "blog_tag",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            },
            "score" : "max",
            "factor" : 5,
            "incremental_factor" : 2
        }
    }

**Scope**

A ``_scope`` can be defined on the query allowing to run aggregations on
the same scope name that will work against the child documents. For
example:

.. code:: js

    {
        "top_children" : {
            "_scope" : "my_scope",
            "type": "blog_tag",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }

**Memory Considerations**

In order to support parent-child joins, all of the (string) parent IDs
must be resident in memory (in the `field data
cache <#index-modules-fielddata>`__). Additionally, every child document
is mapped to its parent using a long value (approximately). It is
advisable to keep the string parent ID short in order to reduce memory
usage.

You can check how much memory is being used by the ID cache using the
`indices stats <#indices-stats>`__ or `nodes
stats <#cluster-nodes-stats>`__ APIs, e.g.:

.. code:: js

    curl -XGET "http://localhost:9200/_stats/id_cache?pretty&human"

Wildcard Query
--------------

Matches documents that have fields matching a wildcard expression (**not
analyzed**). Supported wildcards are ``*``, which matches any character
sequence (including the empty one), and ``?``, which matches any single
character. Note this query can be slow, as it needs to iterate over many
terms. In order to prevent extremely slow wildcard queries, a wildcard
term should not start with one of the wildcards ``*`` or ``?``. The
wildcard query maps to Lucene ``WildcardQuery``.

.. code:: js

    {
        "wildcard" : { "user" : "ki*y" }
    }

A boost can also be associated with the query:

.. code:: js

    {
        "wildcard" : { "user" : { "value" : "ki*y", "boost" : 2.0 } }
    }

Or :

.. code:: js

    {
        "wildcard" : { "user" : { "wildcard" : "ki*y", "boost" : 2.0 } }
    }

This multi term query allows you to control how it gets rewritten using
the `rewrite <#query-dsl-multi-term-rewrite>`__ parameter.

Minimum Should Match
--------------------

The possible values of the ``minimum_should_match`` parameter are:

+--------------------------+--------------------------+--------------------------+
| Type                     | Example                  | Description              |
+==========================+==========================+==========================+
| Integer                  | ``3``                    | Indicates a fixed value  |
|                          |                          | regardless of the number |
|                          |                          | of optional clauses.     |
+--------------------------+--------------------------+--------------------------+
| Negative integer         | ``-2``                   | Indicates that the total |
|                          |                          | number of optional       |
|                          |                          | clauses, minus this      |
|                          |                          | number should be         |
|                          |                          | mandatory.               |
+--------------------------+--------------------------+--------------------------+
| Percentage               | ``75%``                  | Indicates that this      |
|                          |                          | percent of the total     |
|                          |                          | number of optional       |
|                          |                          | clauses are necessary.   |
|                          |                          | The number computed from |
|                          |                          | the percentage is        |
|                          |                          | rounded down and used as |
|                          |                          | the minimum.             |
+--------------------------+--------------------------+--------------------------+
| Negative percentage      | ``-25%``                 | Indicates that this      |
|                          |                          | percent of the total     |
|                          |                          | number of optional       |
|                          |                          | clauses can be missing.  |
|                          |                          | The number computed from |
|                          |                          | the percentage is        |
|                          |                          | rounded down, before     |
|                          |                          | being subtracted from    |
|                          |                          | the total to determine   |
|                          |                          | the minimum.             |
+--------------------------+--------------------------+--------------------------+
| Combination              | ``3<90%``                | A positive integer,      |
|                          |                          | followed by the          |
|                          |                          | less-than symbol,        |
|                          |                          | followed by any of the   |
|                          |                          | previously mentioned     |
|                          |                          | specifiers is a          |
|                          |                          | conditional              |
|                          |                          | specification. It        |
|                          |                          | indicates that if the    |
|                          |                          | number of optional       |
|                          |                          | clauses is equal to (or  |
|                          |                          | less than) the integer,  |
|                          |                          | they are all required,   |
|                          |                          | but if it’s greater than |
|                          |                          | the integer, the         |
|                          |                          | specification applies.   |
|                          |                          | In this example: if      |
|                          |                          | there are 1 to 3 clauses |
|                          |                          | they are all required,   |
|                          |                          | but for 4 or more        |
|                          |                          | clauses only 90% are     |
|                          |                          | required.                |
+--------------------------+--------------------------+--------------------------+
| Multiple combinations    | ``2<-25% 9<-3``          | Multiple conditional     |
|                          |                          | specifications can be    |
|                          |                          | separated by spaces,     |
|                          |                          | each one only being      |
|                          |                          | valid for numbers        |
|                          |                          | greater than the one     |
|                          |                          | before it. In this       |
|                          |                          | example: if there are 1  |
|                          |                          | or 2 clauses both are    |
|                          |                          | required, if there are   |
|                          |                          | 3-9 clauses all but 25%  |
|                          |                          | are required, and if     |
|                          |                          | there are more than 9    |
|                          |                          | clauses, all but three   |
|                          |                          | are required.            |
+--------------------------+--------------------------+--------------------------+

**NOTE:**

When dealing with percentages, negative values can be used to get
different behavior in edge cases. 75% and -25% mean the same thing when
dealing with 4 clauses, but when dealing with 5 clauses 75% means 3 are
required, but -25% means 4 are required.

If the calculations based on the specification determine that no
optional clauses are needed, the usual rules about BooleanQueries still
apply at search time (a BooleanQuery containing no required clauses must
still match at least one optional clause).

No matter what number the calculation arrives at, a value greater than
the number of optional clauses or a value less than 1 will never be
used. (In other words: no matter how low or how high the calculated
result is, the minimum number of required matches will never be lower
than 1 or greater than the number of clauses.)
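
For illustration, here is a minimal sketch (the ``tag`` field and its
values are made up) of a ``bool`` query with three optional clauses,
where ``minimum_should_match`` set to ``2`` requires at least two of
them to match:

.. code:: js

    {
        "bool" : {
            "should" : [
                { "term" : { "tag" : "wow" } },
                { "term" : { "tag" : "elasticsearch" } },
                { "term" : { "tag" : "search" } }
            ],
            "minimum_should_match" : 2
        }
    }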

Multi Term Query Rewrite
------------------------

Multi term queries, like `wildcard <#query-dsl-wildcard-query>`__ and
`prefix <#query-dsl-prefix-query>`__, end up going through a process of
rewrite. This also happens on the
`query\_string <#query-dsl-query-string-query>`__ query. All of those
queries allow you to control how they will get rewritten using the
``rewrite`` parameter (a short example follows the list below):

-  When not set, or set to ``constant_score_auto``, defaults to
   automatically choosing either ``constant_score_boolean`` or
   ``constant_score_filter`` based on query characteristics.

-  ``scoring_boolean``: A rewrite method that first translates each term
   into a should clause in a boolean query, and keeps the scores as
   computed by the query. Note that typically such scores are
   meaningless to the user, and require non-trivial CPU to compute, so
   it’s almost always better to use ``constant_score_auto``. This
   rewrite method will hit a too-many-clauses failure if it exceeds the
   boolean query limit (defaults to ``1024``).

-  ``constant_score_boolean``: Similar to ``scoring_boolean`` except
   scores are not computed. Instead, each matching document receives a
   constant score equal to the query’s boost. This rewrite method will
   hit a too-many-clauses failure if it exceeds the boolean query limit
   (defaults to ``1024``).

-  ``constant_score_filter``: A rewrite method that first creates a
   private Filter by visiting each term in sequence and marking all docs
   for that term. Matching documents are assigned a constant score equal
   to the query’s boost.

-  ``top_terms_N``: A rewrite method that first translates each term
   into a should clause in a boolean query, and keeps the scores as
   computed
   by the query. This rewrite method only uses the top scoring terms so
   it will not overflow boolean max clause count. The ``N`` controls the
   size of the top scoring terms to use.

-  ``top_terms_boost_N``: A rewrite method that first translates each
   term into a should clause in a boolean query, but the scores are only
   computed as the boost. This rewrite method only uses the top scoring
   terms so it will not overflow the boolean max clause count. The ``N``
   controls the size of the top scoring terms to use.
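
As a minimal sketch (reusing the field and value from the wildcard
example above), the ``rewrite`` parameter is set directly inside the
query, here choosing ``top_terms_10``:

.. code:: js

    {
        "wildcard" : {
            "user" : {
                "value" : "ki*y",
                "rewrite" : "top_terms_10"
            }
        }
    }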

Template Query
--------------

A query that accepts a query template and a map of key/value pairs to
fill in template parameters.

.. code:: js

    GET /_search
    {
        "query": {
            "template": {
                "query": {"match_{{template}}": {}},
                "params" : {
                    "template" : "all"
                }
            }
        }
    }

Alternatively, passing the template as an escaped string works as well:

.. code:: js

    GET /_search
    {
        "query": {
            "template": {
                "query": "{\"match_{{template}}\": {}}\"", 
                "params" : {
                    "template" : "all"
                }
            }
        }
    }

New line characters (``\n``) should be escaped as ``\\n`` or removed,
and quotes (``"``) should be escaped as ``\\"``.

You can register a template by storing it in the ``config/scripts``
directory, in a file using the ``.mustache`` extension. In order to
execute the stored template, reference it by name in the ``query``
parameter:

.. code:: js

    GET /_search
    {
        "query": {
            "template": {
                "query": "storedTemplate", 
                "params" : {
                    "template" : "all"
                }
            }
        }
    }

The name of the query template in ``config/scripts/``, i.e.,
``storedTemplate.mustache``.

Templating is based on Mustache. For simple token substitution all you
provide is a query containing some variable that you want to substitute
and the actual values:

.. code:: js

    GET /_search
    {
        "query": {
            "template": {
                "query": {"match_{{template}}": {}},
                "params" : {
                    "template" : "all"
                }
            }
        }
    }

which is then turned into:

.. code:: js

    {
        "query": {
            "match_all": {}
        }
    }

You can register a template by storing it in the Elasticsearch index
``.scripts`` or by using the REST API (see ? for more details). In order
to execute the stored template, reference it by name in the ``query``
parameter:

.. code:: js

    GET /_search
    {
        "query": {
            "template": {
                "query": "templateName", 
                "params" : {
                    "template" : "all"
                }
            }
        }
    }

The name of the query template stored in the index.


There is also a dedicated ``template`` endpoint that allows you to
template an entire search request. Please see ? for more details.
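
As a sketch only (assuming the dedicated endpoint is exposed at
``/_search/template``), a templated search request might look like
this:

.. code:: js

    GET /_search/template
    {
        "template" : {
            "query" : { "match_{{template}}" : {} }
        },
        "params" : {
            "template" : "all"
        }
    }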

Filters
=======

As a general rule, filters should be used instead of queries (see the
example after the list below):

-  for binary yes/no searches

-  for queries on exact values
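
For example, a minimal sketch (the ``status`` field and value are made
up) of an exact-value check expressed as a ``term`` filter inside a
``filtered`` query:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "term" : { "status" : "published" }
            }
        }
    }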

**Filters and Caching**

Filters can be a great candidate for caching. Caching the result of a
filter does not require a lot of memory, and will cause other queries
executing against the same filter (same parameters) to be blazingly
fast.

Some filters already produce a result that is easily cacheable, and the
difference between caching and not caching them is the act of placing
the result in the cache or not. These filters, which include the
`term <#query-dsl-term-filter>`__, `terms <#query-dsl-terms-filter>`__,
`prefix <#query-dsl-prefix-filter>`__, and
`range <#query-dsl-range-filter>`__ filters, are cached by default and
are recommended over the equivalent query version when the same filter
(same parameters) will be used across multiple different queries (for
example, a range filter with age higher than 10).

Other filters, usually already working with the field data loaded into
memory, are not cached by default. Those filters are already very fast,
and the process of caching them requires extra processing in order to
allow the filter result to be used with different queries than the one
executed. These filters, including the geo and
`script <#query-dsl-script-filter>`__ filters, are not cached by default.

The last type of filters are those working with other filters. The
`and <#query-dsl-and-filter>`__, `not <#query-dsl-not-filter>`__ and
`or <#query-dsl-or-filter>`__ filters are not cached as they basically
just manipulate the internal filters.

All filters allow you to set a ``_cache`` element on them to explicitly
control caching. They also allow you to set a ``_cache_key`` which will
be used as the caching key for that filter. This can be handy when using
very large filters (like a terms filter with many elements in it).
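
For example, here is a sketch of a ``terms`` filter with explicit
caching and a custom cache key (the field, values and key are made up):

.. code:: js

    {
        "terms" : {
            "user" : ["kimchy", "elasticsearch"],
            "_cache" : true,
            "_cache_key" : "term_filter_users"
        }
    }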

And Filter
----------

A filter that matches documents using the ``AND`` boolean operator on
other filters. Can be placed within queries that accept a filter.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "term" : { "name.first" : "shay" }
            },
            "filter" : {
                "and" : [
                    {
                        "range" : {
                            "postDate" : {
                                "from" : "2010-03-01",
                                "to" : "2010-04-01"
                            }
                        }
                    },
                    {
                        "prefix" : { "name.second" : "ba" }
                    }
                ]
            }
        }
    }

**Caching**

The result of the filter is not cached by default. The ``_cache`` can be
set to ``true`` in order to cache it (though usually not needed). Since
the ``_cache`` element requires to be set on the ``and`` filter itself,
the structure then changes a bit to have the filters provided within a
``filters`` element:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "term" : { "name.first" : "shay" }
            },
            "filter" : {
                "and" : {
                    "filters": [
                        {
                            "range" : {
                                "postDate" : {
                                    "from" : "2010-03-01",
                                    "to" : "2010-04-01"
                                }
                            }
                        },
                        {
                            "prefix" : { "name.second" : "ba" }
                        }
                    ],
                    "_cache" : true
                }
            }
        }
    }

Bool Filter
-----------

A filter that matches documents matching boolean combinations of other
queries. Similar in concept to `Boolean
query <#query-dsl-bool-query>`__, except that the clauses are other
filters. Can be placed within queries that accept a filter.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "queryString" : {
                    "default_field" : "message",
                    "query" : "elasticsearch"
                }
            },
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : { "tag" : "wow" }
                    },
                    "must_not" : {
                        "range" : {
                            "age" : { "from" : 10, "to" : 20 }
                        }
                    },
                    "should" : [
                        {
                            "term" : { "tag" : "sometag" }
                        },
                        {
                            "term" : { "tag" : "sometagtag" }
                        }
                    ]
                }
            }
        }
    }

**Caching**

The result of the ``bool`` filter is not cached by default (though
internal filters might be). The ``_cache`` can be set to ``true`` in
order to enable caching.

Exists Filter
-------------

Returns documents that have at least one non-\ ``null`` value in the
original field:

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "user" }
            }
        }
    }

For instance, these documents would all match the above filter:

.. code:: js

    { "user": "jane" }
    { "user": "" } 
    { "user": "-" } 
    { "user": ["jane"] }
    { "user": ["jane", null ] } 

-  An empty string is a non-\ ``null`` value.

-  Even though the ``standard`` analyzer would emit zero tokens, the
   original field is non-\ ``null``.

-  At least one non-\ ``null`` value is required.

These documents would **not** match the above filter:

.. code:: js

    { "user": null }
    { "user": [] } 
    { "user": [null] } 
    { "foo":  "bar" } 

-  This field has no values.

-  At least one non-\ ``null`` value is required.

-  The ``user`` field is missing completely.

**``null_value`` mapping**

If the field mapping includes the ``null_value`` setting (see ?) then
explicit ``null`` values are replaced with the specified ``null_value``.
For instance, if the ``user`` field were mapped as follows:

.. code:: js

      "user": {
        "type": "string",
        "null_value": "_null_"
      }

then explicit ``null`` values would be indexed as the string ``_null_``,
and the following docs would match the ``exists`` filter:

.. code:: js

    { "user": null }
    { "user": [null] }

However, these docs—without explicit ``null`` values—would still have no
values in the ``user`` field and thus would not match the ``exists``
filter:

.. code:: js

    { "user": [] }
    { "foo": "bar" }

**Caching**

The result of the filter is always cached.

Geo Bounding Box Filter
-----------------------

A filter allowing to filter hits based on a point location using a
bounding box. Assuming the following indexed document:

.. code:: js

    {
        "pin" : {
            "location" : {
                "lat" : 40.12,
                "lon" : -71.34
            }
        }
    }

Then the following simple query can be executed with a
``geo_bounding_box`` filter:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : {
                            "lat" : 40.73,
                            "lon" : -74.1
                        },
                        "bottom_right" : {
                            "lat" : 40.01,
                            "lon" : -71.12
                        }
                    }
                }
            }
        }
    }

**Accepted Formats**

In much the same way the ``geo_point`` type can accept different
representations of the geo point, the filter can accept them as well:

**Lat Lon As Properties**

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : {
                            "lat" : 40.73,
                            "lon" : -74.1
                        },
                        "bottom_right" : {
                            "lat" : 40.01,
                            "lon" : -71.12
                        }
                    }
                }
            }
        }
    }

**Lat Lon As Array**

Format in ``[lon, lat]``. Note the order of lon/lat here, in order to
conform with `GeoJSON <http://geojson.org/>`__.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : [-74.1, 40.73],
                        "bottom_right" : [-71.12, 40.01]
                    }
                }
            }
        }
    }

**Lat Lon As String**

Format in ``lat,lon``.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : "40.73, -74.1",
                        "bottom_right" : "40.01, -71.12"
                    }
                }
            }
        }
    }

**Geohash**

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : "dr5r9ydj2y73",
                        "bottom_right" : "drj7teegpus6"
                    }
                }
            }
        }
    }

**Vertices**

The vertices of the bounding box can either be set by ``top_left`` and
``bottom_right`` or by ``top_right`` and ``bottom_left`` parameters.
Moreover, the names ``topLeft``, ``bottomRight``, ``topRight`` and
``bottomLeft`` are also supported. Instead of setting the values
pairwise, one can use the simple names ``top``, ``left``, ``bottom`` and
``right`` to set the values separately.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top" : -74.1,
                        "left" : 40.73,
                        "bottom" : -71.12,
                        "right" : 40.01
                    }
                }
            }
        }
    }

**geo\_point Type**

The filter **requires** the ``geo_point`` type to be set on the relevant
field.

**Multi Location Per Document**

The filter can work with multiple locations / points per document. Once
a single location / point matches the filter, the document will be
included in the filter

**Type**

The type of the bounding box execution is set to ``memory`` by default,
which means the check whether a document falls within the bounding box
is done in memory. In some cases, the ``indexed`` option will perform
faster (but note that the ``geo_point`` type must have lat and lon
indexed in this case). Note, when using the ``indexed`` option, multiple
locations per document field are not supported. Here is an example:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : {
                            "lat" : 40.73,
                            "lon" : -74.1
                        },
                        "bottom_right" : {
                            "lat" : 40.10,
                            "lon" : -71.12
                        }
                    },
                    "type" : "indexed"
                }
            }
        }
    }

**Caching**

The result of the filter is not cached by default. The ``_cache`` can be
set to ``true`` to cache the **result** of the filter. This is handy
when the same bounding box parameters are used on several (many) other
queries. Note, the cost of the first execution is higher when caching
(since the cached result needs to satisfy different queries).

Geo Distance Filter
-------------------

Filters documents to include only hits that exist within a specific
distance from a geo point. Assuming the following indexed JSON:

.. code:: js

    {
        "pin" : {
            "location" : {
                "lat" : 40.12,
                "lon" : -71.34
            }
        }
    }

Then the following simple query can be executed with a ``geo_distance``
filter:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "200km",
                    "pin.location" : {
                        "lat" : 40,
                        "lon" : -70
                    }
                }
            }
        }
    }

**Accepted Formats**

In much the same way the ``geo_point`` type can accept different
representations of the geo point, the filter can accept them as well:

**Lat Lon As Properties**

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "12km",
                    "pin.location" : {
                        "lat" : 40,
                        "lon" : -70
                    }
                }
            }
        }
    }

**Lat Lon As Array**

Format in ``[lon, lat]``. Note the order of lon/lat here, in order to
conform with `GeoJSON <http://geojson.org/>`__.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "12km",
                    "pin.location" : [-70, 40]
                }
            }
        }
    }

**Lat Lon As String**

Format in ``lat,lon``.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "12km",
                    "pin.location" : "40,-70"
                }
            }
        }
    }

**Geohash**

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "12km",
                    "pin.location" : "drm3btev3e86"
                }
            }
        }
    }

**Options**

The following are options allowed on the filter:

+------------+---------------------------------------------------------------+
| ``distance | The radius of the circle centred on the specified location.   |
| ``         | Points which fall into this circle are considered to be       |
|            | matches. The ``distance`` can be specified in various units.  |
|            | See ?.                                                        |
+------------+---------------------------------------------------------------+
| ``distance | How to compute the distance. Can either be ``sloppy_arc``     |
| _type``    | (default), ``arc`` (slightly more precise but significantly   |
|            | slower) or ``plane`` (faster, but inaccurate on long          |
|            | distances and close to the poles).                            |
+------------+---------------------------------------------------------------+
| ``optimize | Whether to use the optimization of first running a bounding   |
| _bbox``    | box check before the distance check. Defaults to ``memory``   |
|            | which will do in memory checks. Can also have values of       |
|            | ``indexed`` to use indexed value check (make sure the         |
|            | ``geo_point`` type index lat lon in this case), or ``none``   |
|            | which disables bounding box optimization.                     |
+------------+---------------------------------------------------------------+
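
As a sketch combining these options with the earlier sample point (the
values are illustrative only):

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "12km",
                    "distance_type" : "plane",
                    "optimize_bbox" : "indexed",
                    "pin.location" : {
                        "lat" : 40,
                        "lon" : -70
                    }
                }
            }
        }
    }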

**geo\_point Type**

The filter **requires** the ``geo_point`` type to be set on the relevant
field.

**Multi Location Per Document**

The ``geo_distance`` filter can work with multiple locations / points
per document. Once a single location / point matches the filter, the
document will be included in the filter.

**Caching**

The result of the filter is not cached by default. The ``_cache`` can be
set to ``true`` to cache the **result** of the filter. This is handy
when the same point and distance parameters are used on several (many)
other queries. Note, the cost of the first execution is higher when
caching (since the cached result needs to satisfy different queries).

Geo Distance Range Filter
-------------------------

Filters documents that exist within a range from a specific point:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance_range" : {
                    "from" : "200km",
                    "to" : "400km"
                    "pin.location" : {
                        "lat" : 40,
                        "lon" : -70
                    }
                }
            }
        }
    }

Supports the same point location parameter as the
`geo\_distance <#query-dsl-geo-distance-filter>`__ filter. It also
supports the common parameters for range (``lt``, ``lte``, ``gt``,
``gte``, ``from``, ``to``, ``include_upper`` and ``include_lower``).
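
For example, a sketch of the same filter expressed with ``gte``/``lt``
instead of ``from``/``to``:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance_range" : {
                    "gte" : "200km",
                    "lt" : "400km",
                    "pin.location" : {
                        "lat" : 40,
                        "lon" : -70
                    }
                }
            }
        }
    }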

Geo Polygon Filter
------------------

A filter allowing to include only hits that fall within a polygon of
points. Here is an example:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_polygon" : {
                    "person.location" : {
                        "points" : [
                            {"lat" : 40, "lon" : -70},
                            {"lat" : 30, "lon" : -80},
                            {"lat" : 20, "lon" : -90}
                        ]
                    }
                }
            }
        }
    }

**Allowed Formats**

**Lat Lon as Array**

Format in ``[lon, lat]``. Note the order of lon/lat here, in order to
conform with `GeoJSON <http://geojson.org/>`__.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_polygon" : {
                    "person.location" : {
                        "points" : [
                            [-70, 40],
                            [-80, 30],
                            [-90, 20]
                        ]
                    }
                }
            }
        }
    }

**Lat Lon as String**

Format in ``lat,lon``.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_polygon" : {
                    "person.location" : {
                        "points" : [
                            "40, -70",
                            "30, -80",
                            "20, -90"
                        ]
                    }
                }
            }
        }
    }

**Geohash**

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_polygon" : {
                    "person.location" : {
                        "points" : [
                            "drn5x1g8cu2y",
                            "30, -80",
                            "20, -90"
                        ]
                    }
                }
            }
        }
    }

**geo\_point Type**

The filter **requires** the `geo\_point <#mapping-geo-point-type>`__
type to be set on the relevant field.

**Caching**

The result of the filter is not cached by default. The ``_cache`` can be
set to ``true`` to cache the **result** of the filter. This is handy
when the same points parameters are used on several (many) other
queries. Note, the cost of the first execution is higher when caching
(since the cached result needs to satisfy different queries).

GeoShape Filter
---------------

Filter documents indexed using the ``geo_shape`` type.

Requires the `geo\_shape Mapping <#mapping-geo-shape-type>`__.

You may also use the `geo\_shape Query <#query-dsl-geo-shape-query>`__.

The ``geo_shape`` Filter uses the same grid square representation as the
geo\_shape mapping to find documents that have a shape that intersects
with the query shape. It will also use the same PrefixTree configuration
as defined for the field mapping.

**Filter Format**

The Filter supports two ways of defining the Filter shape, either by
providing a whole shape definition, or by referencing the name of a
shape pre-indexed in another index. Both formats are defined below with
examples.

**Provided Shape Definition**

Similar to the ``geo_shape`` type, the ``geo_shape`` Filter uses
`GeoJSON <http://www.geojson.org>`__ to represent shapes.

Given a document that looks like this:

.. code:: js

    {
        "name": "Wind & Wetter, Berlin, Germany",
        "location": {
            "type": "Point",
            "coordinates": [13.400544, 52.530286]
        }
    }

The following query will find the point using Elasticsearch’s
``envelope`` GeoJSON extension:

.. code:: js

    {
        "query":{
            "filtered": {
                "query": {
                    "match_all": {}
                },
                "filter": {
                    "geo_shape": {
                        "location": {
                            "shape": {
                                "type": "envelope",
                                "coordinates" : [[13.0, 53.0], [14.0, 52.0]]
                            }
                        }
                    }
                }
            }
        }
    }

**Pre-Indexed Shape**

The Filter also supports using a shape which has already been indexed in
another index and/or index type. This is particularly useful when you
have a pre-defined list of shapes which are useful to your application
and you want to reference them using a logical name (for
example *New Zealand*) rather than having to provide their coordinates
each time. In this situation it is only necessary to provide:

-  ``id`` - The ID of the document containing the pre-indexed
   shape.

-  ``index`` - Name of the index where the pre-indexed shape is.
   Defaults to *shapes*.

-  ``type`` - Index type where the pre-indexed shape is.

-  ``path`` - The field specified as path containing the pre-indexed
   shape. Defaults to *shape*.

The following is an example of using the Filter with a pre-indexed
shape:

.. code:: js

    {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "geo_shape": {
                    "location": {
                        "indexed_shape": {
                            "id": "DEU",
                            "type": "countries",
                            "index": "shapes",
                            "path": "location"
                        }
                    }
                }
            }
        }
    }

**Caching**

The result of the Filter is not cached by default. Setting ``_cache`` to
``true`` will mean the results of the Filter will be cached. Since
shapes can contain 10s-100s of coordinates and any one differing means a
new shape, it may make sense to only use caching when you are sure that
the shapes will remain reasonably static.

Geohash Cell Filter
-------------------

The ``geohash_cell`` filter provides access to a hierarchy of geohashes.
By defining a geohash cell, only `geopoints <#mapping-geo-point-type>`__
within this cell will match this filter.

To make this filter work, all prefixes of a geohash need to be indexed.
For example, a geohash ``u30`` needs to be decomposed into three terms:
``u30``, ``u3`` and ``u``. This decomposition must be enabled in the
mapping of the `geopoint <#mapping-geo-point-type>`__ field that’s going
to be filtered by setting the ``geohash_prefix`` option:

.. code:: js

    {
        "mappings" : {
            "location": {
                "properties": {
                    "pin": {
                        "type": "geo_point",
                        "geohash": true,
                        "geohash_prefix": true,
                        "geohash_precision": 10
                    }
                }
            }
        }
    }

The geohash cell can be defined by all formats of ``geo_points``. If
such a cell is defined by a latitude and longitude pair, the size of the
cell needs to be set up. This can be done by the ``precision`` parameter
of the filter. This parameter can be set to an integer value which sets
the length of the geohash prefix. Instead of setting a geohash length
directly it is also possible to define the precision as a distance, for
example ``"precision": "50m"``. (See ?.)

The ``neighbors`` option of the filter offers the possibility to also
filter cells next to the given cell.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geohash_cell": {
                    "pin": {
                        "lat": 13.4080,
                        "lon": 52.5186
                    },
                    "precision": 3,
                    "neighbors": true
                }
            }
        }
    }
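
A sketch of the same filter using a distance-based ``precision``
instead of a geohash length (reusing the same illustrative point as
above):

.. code:: js

    {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "geohash_cell": {
                    "pin": {
                        "lat": 13.4080,
                        "lon": 52.5186
                    },
                    "precision": "50m",
                    "neighbors": true
                }
            }
        }
    }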

**Caching**

The result of the filter is not cached by default. The ``_cache``
parameter can be set to ``true`` to turn caching on. By default the
filter uses the resulting geohash cells as a cache key. This can be
changed by using the ``_cache_key`` option.

Has Child Filter
----------------

The ``has_child`` filter accepts a query and the child type to run
against, and results in parent documents that have child docs matching
the query. Here is an example:

.. code:: js

    {
        "has_child" : {
            "type" : "blog_tag",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }

The ``type`` is the child type to query against. The parent type to
return is automatically detected based on the mappings.

The filter is implemented by first running the child query and then
mapping each matching child document up to its parent document.

The ``has_child`` filter also accepts a filter instead of a query:

.. code:: js

    {
        "has_child" : {
            "type" : "comment",
            "filter" : {
                "term" : {
                    "user" : "john"
                }
            }
        }
    }

**Min/Max Children**

The ``has_child`` filter allows you to specify that a minimum and/or
maximum number of children are required to match for the parent doc to
be considered a match:

.. code:: js

    {
        "has_child" : {
            "type" : "comment",
            "min_children": 2, 
            "max_children": 10, 
            "filter" : {
                "term" : {
                    "user" : "john"
                }
            }
        }
    }

Both ``min_children`` and ``max_children`` are optional.

The execution speed of the ``has_child`` filter is equivalent to that of
the ``has_child`` query when ``min_children`` or ``max_children`` is
specified.

**Memory Considerations**

In order to support parent-child joins, all of the (string) parent IDs
must be resident in memory (in the `field data
cache <#index-modules-fielddata>`__). Additionally, every child document
is mapped to its parent using a long value (approximately). It is
advisable to keep the string parent ID short in order to reduce memory
usage.

You can check how much memory is being used by the ID cache using the
`indices stats <#indices-stats>`__ or `nodes
stats <#cluster-nodes-stats>`__ APIs, e.g.:

.. code:: js

    curl -XGET "http://localhost:9200/_stats/id_cache?pretty&human"

**Caching**

The ``has_child`` filter cannot be cached in the filter cache. The
``_cache`` and ``_cache_key`` options are a no-op in this filter. Also
any filter that wraps the ``has_child`` filter either directly or
indirectly will not be cached.

Has Parent Filter
-----------------

The ``has_parent`` filter accepts a query and a parent type. The query
is executed in the parent document space, which is specified by the
parent type. This filter returns child documents whose associated
parents have matched. For the rest, the ``has_parent`` filter has the
same options and works in the same manner as the ``has_child`` filter.

**Filter example**

.. code:: js

    {
        "has_parent" : {
            "parent_type" : "blog",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }

The ``parent_type`` field name can also be abbreviated to ``type``.

The filter is implemented by first running the parent query and then
mapping each matching parent document down to its child documents.

The ``has_parent`` filter also accepts a filter instead of a query:

.. code:: js

    {
        "has_parent" : {
            "type" : "blog",
            "filter" : {
                "term" : {
                    "text" : "bonsai three"
                }
            }
        }
    }

**Memory Considerations**

In order to support parent-child joins, all of the (string) parent IDs
must be resident in memory (in the `field data
cache <#index-modules-fielddata>`__). Additionally, every child document
is mapped to its parent using a long value (approximately). It is
advisable to keep the string parent ID short in order to reduce memory
usage.

You can check how much memory is being used by the ID cache using the
`indices stats <#indices-stats>`__ or `nodes
stats <#cluster-nodes-stats>`__ APIs, e.g.:

.. code:: js

    curl -XGET "http://localhost:9200/_stats/id_cache?pretty&human"

**Caching**

The ``has_parent`` filter cannot be cached in the filter cache. The
``_cache`` and ``_cache_key`` options are a no-op in this filter. Also
any filter that wraps the ``has_parent`` filter either directly or
indirectly will not be cached.

Ids Filter
----------

Filters documents that have only the provided ids. Note, this filter
does not require the `\_id <#mapping-id-field>`__ field to be indexed
since it works using the `\_uid <#mapping-uid-field>`__ field.

.. code:: js

    {
        "ids" : {
            "type" : "my_type",
            "values" : ["1", "4", "100"]
        }
    }

The ``type`` is optional and can be omitted; it can also accept an
array of values.
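
For example, a sketch of the same filter with an array of types (the
type names are made up):

.. code:: js

    {
        "ids" : {
            "type" : ["my_type", "my_other_type"],
            "values" : ["1", "4", "100"]
        }
    }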

Indices Filter
--------------

The ``indices`` filter can be used when a search is executed across
multiple indices, allowing a filter to be applied only on indices that
match a specific list, and a different filter to be applied on indices
that do not match the listed indices.

.. code:: js

    {
        "indices" : {
            "indices" : ["index1", "index2"],
            "filter" : {
                "term" : { "tag" : "wow" }
            },
            "no_match_filter" : {
                "term" : { "tag" : "kow" }
            }
        }
    }

You can use the ``index`` field to provide a single index.

``no_match_filter`` can also have a string value of ``none`` (to match
no documents) or ``all`` (to match all documents). Defaults to ``all``.

``filter`` is mandatory, as well as ``indices`` (or ``index``).
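
For example, a sketch using a single ``index`` and matching no
documents on the non-listed indices:

.. code:: js

    {
        "indices" : {
            "index" : "index1",
            "filter" : {
                "term" : { "tag" : "wow" }
            },
            "no_match_filter" : "none"
        }
    }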

    **Tip**

    The fields order is important: if the ``indices`` are provided
    before ``filter`` or ``no_match_filter``, the related filters get
    parsed only against the indices that they are going to be executed
    on. This is useful to avoid parsing filters when it is not necessary
    and prevent potential mapping errors.

Limit Filter
------------

A limit filter limits the number of documents (per shard) to execute on.
For example:

.. code:: js

    {
        "filtered" : {
            "filter" : {
                 "limit" : {"value" : 100}
             },
             "query" : {
                "term" : { "name.first" : "shay" }
            }
        }
    }

Match All Filter
----------------

A filter that matches on all documents:

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "match_all" : { }
            }
        }
    }

Missing Filter
--------------

Returns documents that have no non-\ ``null`` values in the original
field:

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "missing" : { "field" : "user" }
            }
        }
    }

For instance, the following docs would match the above filter:

.. code:: js

    { "user": null }
    { "user": [] } 
    { "user": [null] } 
    { "foo":  "bar" } 

-  This field has no values.

-  This field has no non-\ ``null`` values.

-  The ``user`` field is missing completely.

These documents would **not** match the above filter:

.. code:: js

    { "user": "jane" }
    { "user": "" } 
    { "user": "-" } 
    { "user": ["jane"] }
    { "user": ["jane", null ] } 

-  An empty string is a non-\ ``null`` value.

-  Even though the ``standard`` analyzer would emit zero tokens, the
   original field is non-\ ``null``.

-  This field has one non-\ ``null`` value.

**``null_value`` mapping**

If the field mapping includes a ``null_value`` (see ?) then explicit
``null`` values are replaced with the specified ``null_value``. For
instance, if the ``user`` field were mapped as follows:

.. code:: js

      "user": {
        "type": "string",
        "null_value": "_null_"
      }

then explicit ``null`` values would be indexed as the string ``_null_``,
and the following docs would **not** match the ``missing`` filter:

.. code:: js

    { "user": null }
    { "user": [null] }

However, these docs—without explicit ``null`` values—would still have no
values in the ``user`` field and thus would match the ``missing``
filter:

.. code:: js

    { "user": [] }
    { "foo": "bar" }

**``existence`` and ``null_value`` parameters**

When the field being queried has a ``null_value`` mapping, then the
behaviour of the ``missing`` filter can be altered with the
``existence`` and ``null_value`` parameters:

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "missing" : {
                    "field" : "user",
                    "existence" : true,
                    "null_value" : false
                }
            }
        }
    }

``existence``
    When the ``existence`` parameter is set to ``true`` (the default),
    the missing filter will include documents where the field has **no**
    values, ie:

    .. code:: js

        { "user": [] }
        { "foo": "bar" }

    When set to ``false``, these documents will not be included.

``null_value``
    When the ``null_value`` parameter is set to ``true``, the missing
    filter will include documents where the field contains a ``null``
    value, ie:

    .. code:: js

        { "user": null }
        { "user": [null] }
        { "user": ["jane",null] } 

    Matches because the field contains a ``null`` value, even though it
    also contains a non-\ ``null`` value.

    When set to ``false`` (the default), these documents will not be
    included.

    **Note**

    Either ``existence`` or ``null_value`` or both must be set to
    ``true``.

**Caching**

The result of the filter is always cached.

Nested Filter
-------------

A ``nested`` filter works in a similar fashion to the
`nested <#query-dsl-nested-query>`__ query, except it’s used as a
filter. It follows exactly the same structure, but also allows you to
cache the results (set ``_cache`` to ``true``) and to name it (set the
``_name`` value). For example:

.. code:: js

    {
        "filtered" : {
            "query" : { "match_all" : {} },
            "filter" : {
                "nested" : {
                    "path" : "obj1",
                    "filter" : {
                        "bool" : {
                            "must" : [
                                {
                                    "term" : {"obj1.name" : "blue"}
                                },
                                {
                                    "range" : {"obj1.count" : {"gt" : 5}}
                                }
                            ]
                        }
                    },
                    "_cache" : true
                }
            }
        }
    }

**Join option**

The nested filter also supports a ``join`` option which controls
whether to perform the block join or not. By default it is enabled, but
when it is disabled, the hidden nested documents are emitted as hits
instead of the joined root document.

This is useful when a ``nested`` filter is used in a facet where nested
is enabled, as you can see in the example below:

.. code:: js

    {
        "query" : {
            "nested" : {
                "path" : "offers",
                "query" : {
                    "match" : {
                        "offers.color" : "blue"
                    }
                }
            }
        },
        "facets" : {
            "size" : {
                "terms" : {
                    "field" : "offers.size"
                },
                "facet_filter" : {
                    "nested" : {
                        "path" : "offers",
                        "query" : {
                            "match" : {
                                "offers.color" : "blue"
                            }
                        },
                        "join" : false
                    }
                },
                "nested" : "offers"
            }
        }
    }

Not Filter
----------

A filter that filters out matched documents using a query. Can be placed
within queries that accept a filter.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "term" : { "name.first" : "shay" }
            },
            "filter" : {
                "not" : {
                    "range" : {
                        "postDate" : {
                            "from" : "2010-03-01",
                            "to" : "2010-04-01"
                        }
                    }
                }
            }
        }
    }

Or, in a longer form with a ``filter`` element:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "term" : { "name.first" : "shay" }
            },
            "filter" : {
                "not" : {
                    "filter" :  {
                        "range" : {
                            "postDate" : {
                                "from" : "2010-03-01",
                                "to" : "2010-04-01"
                            }
                        }
                    }
                }
            }
        }
    }

**Caching**

The result of the filter is not cached by default. The ``_cache`` can be
set to ``true`` in order to cache it (though usually not needed). Here
is an example:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "term" : { "name.first" : "shay" }
            },
            "filter" : {
                "not" : {
                    "filter" :  {
                        "range" : {
                            "postDate" : {
                                "from" : "2010-03-01",
                                "to" : "2010-04-01"
                            }
                        }
                    },
                    "_cache" : true
                }
            }
        }
    }

Or Filter
---------

A filter that matches documents using the ``OR`` boolean operator on
other filters. Can be placed within queries that accept a filter.

.. code:: js

    {
        "filtered" : {
            "query" : {
                "term" : { "name.first" : "shay" }
            },
            "filter" : {
                "or" : [
                    {
                        "term" : { "name.second" : "banon" }
                    },
                    {
                        "term" : { "name.nick" : "kimchy" }
                    }
                ]
            }
        }
    }

**Caching**

The result of the filter is not cached by default. The ``_cache`` can be
set to ``true`` in order to cache it (though usually not needed). Since
the ``_cache`` element requires to be set on the ``or`` filter itself,
the structure then changes a bit to have the filters provided within a
``filters`` element:

.. code:: js

    {
        "filtered" : {
            "query" : {
                "term" : { "name.first" : "shay" }
            },
            "filter" : {
                "or" : {
                    "filters" : [
                        {
                            "term" : { "name.second" : "banon" }
                        },
                        {
                            "term" : { "name.nick" : "kimchy" }
                        }
                    ],
                    "_cache" : true
                }
            }
        }
    }

Prefix Filter
-------------

Filters documents that have fields containing terms with a specified
prefix (**not analyzed**). Similar to the prefix query, except that it
acts as a filter. Can be placed within queries that accept a filter.

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "prefix" : { "user" : "ki" }
            }
        }
    }

**Caching**

The result of the filter is cached by default. The ``_cache`` can be set
to ``false`` in order not to cache it. Here is an example:

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "prefix" : {
                    "user" : "ki",
                    "_cache" : false
                }
            }
        }
    }

Query Filter
------------

Wraps any query to be used as a filter. Can be placed within queries
that accept a filter.

.. code:: js

    {
        "constantScore" : {
            "filter" : {
                "query" : {
                    "query_string" : {
                        "query" : "this AND that OR thus"
                    }
                }
            }
        }
    }

**Caching**

The result of the filter is not cached by default. The ``_cache`` can be
set to ``true`` to cache the **result** of the filter. This is handy
when the same query is used on several (many) other queries. Note, the
cost of the first execution is higher when caching (since the cached
result needs to satisfy different queries).

Setting the ``_cache`` element requires a different format for the
``query``:

.. code:: js

    {
        "constantScore" : {
            "filter" : {
                "fquery" : {
                    "query" : {
                        "query_string" : {
                            "query" : "this AND that OR thus"
                        }
                    },
                    "_cache" : true
                }
            }
        }
    }

Range Filter
------------

Filters documents with fields that have terms within a certain range.
Similar to `range query <#query-dsl-range-query>`__, except that it acts
as a filter. Can be placed within queries that accept a filter.

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "range" : {
                    "age" : {
                        "gte": 10,
                        "lte": 20
                    }
                }
            }
        }
    }

The ``range`` filter accepts the following parameters:

+------------+---------------------------------------------------------------+
| ``gte``    | Greater-than or equal to                                      |
+------------+---------------------------------------------------------------+
| ``gt``     | Greater-than                                                  |
+------------+---------------------------------------------------------------+
| ``lte``    | Less-than or equal to                                         |
+------------+---------------------------------------------------------------+
| ``lt``     | Less-than                                                     |
+------------+---------------------------------------------------------------+

**Date options**

When applied on ``date`` fields, the ``range`` filter also accepts a
``time_zone`` parameter. The ``time_zone`` parameter will be applied to
your input lower and upper bounds and will convert them to UTC-based
dates:

.. code:: js

    {
        "constant_score": {
            "filter": {
                "range" : {
                    "born" : {
                        "gte": "2012-01-01",
                        "lte": "now",
                        "time_zone": "+1:00"
                    }
                }
            }
        }
    }

In the above example, ``gte`` will actually be moved to the UTC date
``2011-12-31T23:00:00``.

    **Note**

    if you give a date with a timezone explicitly defined and use the
    ``time_zone`` parameter, ``time_zone`` will be ignored. For example,
    setting ``from`` to ``2012-01-01T00:00:00+01:00`` with
    ``"time_zone":"+10:00"`` will still use ``+01:00`` time zone.

When applied on ``date`` fields, the ``range`` filter also accepts a
``format`` parameter. The ``format`` parameter helps support a date
format other than the one defined in the mapping:

.. code:: js

    {
        "constant_score": {
            "filter": {
                "range" : {
                    "born" : {
                        "gte": "01/01/2012",
                        "lte": "2013",
                        "format": "dd/MM/yyyy||yyyy"
                    }
                }
            }
        }
    }

**Execution**

The ``execution`` option controls how the range filter internally
executes. The ``execution`` option accepts the following values:

+------------+---------------------------------------------------------------+
| ``index``  | Uses the field’s inverted index in order to determine whether |
|            | documents fall within the specified range.                    |
+------------+---------------------------------------------------------------+
| ``fielddat | Uses fielddata in order to determine whether documents fall   |
| a``        | within the specified range.                                   |
+------------+---------------------------------------------------------------+

In general for small ranges the ``index`` execution is faster and for
longer ranges the ``fielddata`` execution is faster.

The ``fielddata`` execution, as the name suggests, uses field data and
therefore requires more memory, so make sure you have sufficient memory
on your nodes in order to use this execution mode. It usually makes
sense to use it on fields you’re already aggregating or sorting by.
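
For example, a sketch of forcing ``fielddata`` execution on a range
filter (the ``age`` field is illustrative):

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "range" : {
                    "age" : {
                        "gte": 10,
                        "lte": 20
                    },
                    "execution" : "fielddata"
                }
            }
        }
    }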

**Caching**

The result of the filter is only automatically cached by default if the
``execution`` is set to ``index``. The ``_cache`` can be set to
``false`` to turn it off.

If the ``now`` date math expression is used without rounding then a
range filter will never be cached even if ``_cache`` is set to ``true``.
Also any filter that wraps this filter will never be cached.

Regexp Filter
-------------

The ``regexp`` filter is similar to the
`regexp <#query-dsl-regexp-query>`__ query, except that it is cacheable
and can speed up performance if you reuse this filter in your queries.

See the `regexp query <#query-dsl-regexp-query>`__ for details of the
supported regular expression language.

.. code:: js

    {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "regexp":{
                    "name.first" : "s.*y"
                }
            }
        }
    }

You can also select the cache name and use the same regexp flags in the
filter as in the query.

Regular expressions are dangerous because it’s easy to accidentally
create an innocuous looking one that requires an exponential number of
internal determinized automaton states (and corresponding RAM and CPU)
for Lucene to execute. Lucene prevents these using the
``max_determinized_states`` setting (defaults to 10000). You can raise
this limit to allow more complex regular expressions to execute.

You have to enable caching explicitly in order to have the ``regexp``
filter cached.

.. code:: js

    {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "regexp":{
                    "name.first" : {
                        "value" : "s.*y",
                        "flags" : "INTERSECTION|COMPLEMENT|EMPTY",
                "max_determinized_states": 20000
                    },
                    "_name":"test",
                    "_cache" : true,
                    "_cache_key" : "key"
                }
            }
        }
    }

Script Filter
-------------

A filter that allows `scripts <#modules-scripting>`__ to be used as filters.
For example:

.. code:: js

    "filtered" : {
        "query" : {
            ...
        },
        "filter" : {
            "script" : {
                "script" : "doc['num1'].value > 1"
            }
        }
    }

**Custom Parameters**

Scripts are compiled and cached for faster execution. If the same script
can be reused, just with different parameters provided, it is preferable
to pass those parameters to the script itself, for example:

.. code:: js

    "filtered" : {
        "query" : {
            ...
        },
        "filter" : {
            "script" : {
                "script" : "doc['num1'].value > param1"
                "params" : {
                    "param1" : 5
                }
            }
        }
    }

**Caching**

The result of the filter is not cached by default. The ``_cache`` can be
set to ``true`` to cache the **result** of the filter. This is handy
when the same script and parameters are used across several (many)
queries. Note that the cost of the first execution is higher when
caching is enabled (since the cached result needs to be able to satisfy
different queries).
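
For example, a sketch of a cached script filter (the field and parameter
names are illustrative):

.. code:: js

    "filtered" : {
        "query" : {
            "match_all" : {}
        },
        "filter" : {
            "script" : {
                "script" : "doc['num1'].value > param1",
                "params" : {
                    "param1" : 5
                },
                "_cache" : true
            }
        }
    }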

Term Filter
-----------

Filters documents that have fields that contain a term (**not
analyzed**). Similar to `term query <#query-dsl-term-query>`__, except
that it acts as a filter. Can be placed within queries that accept a
filter, for example:

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "term" : { "user" : "kimchy"}
            }
        }
    }

**Caching**

The result of the filter is automatically cached by default. The
``_cache`` can be set to ``false`` to turn it off. Here is an example:

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "user" : "kimchy",
                    "_cache" : false
                }
            }
        }
    }

Terms Filter
------------

Filters documents that have fields that match any of the provided terms
(**not analyzed**). For example:

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "terms" : { "user" : ["kimchy", "elasticsearch"]}
            }
        }
    }

The ``terms`` filter is also aliased with ``in`` as the filter name for
simpler usage.
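
For example, a sketch of the same filter written with the ``in`` alias:

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "in" : { "user" : ["kimchy", "elasticsearch"] }
            }
        }
    }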

**Execution Mode**

The terms filter executes by iterating over the terms provided, finding
the matching docs (loading them into a bitset), and caching the result.
Sometimes we want a different execution model, which can still be
achieved by building more complex queries in the DSL, but can also be
supported in the more compact model that the terms filter provides.

The ``execution`` option accepts the following values:

+---------------+---------------------------------------------------------------+
| ``plain``     | The default. Iterates over all the terms, building a bit set  |
|               | matching them, and filtering. The total filter is cached.     |
+---------------+---------------------------------------------------------------+
| ``fielddata`` | Generates a terms filter that uses the fielddata cache to     |
|               | compare terms. This execution mode is great to use when       |
|               | filtering on a field that is already loaded into the          |
|               | fielddata cache from aggregating, sorting, or index warmers.  |
|               | When filtering on a large number of terms, this execution can |
|               | be considerably faster than the other modes. The total filter |
|               | is not cached unless explicitly configured to do so.          |
+---------------+---------------------------------------------------------------+
| ``bool``      | Generates a term filter (which is cached) for each term, and  |
|               | wraps those in a bool filter. The bool filter itself is not   |
|               | cached as it can operate very quickly on the cached term      |
|               | filters.                                                      |
+---------------+---------------------------------------------------------------+
| ``and``       | Generates a term filter (which is cached) for each term, and  |
|               | wraps those in an and filter. The and filter itself is not    |
|               | cached.                                                       |
+---------------+---------------------------------------------------------------+
| ``or``        | Generates a term filter (which is cached) for each term, and  |
|               | wraps those in an or filter. The or filter itself is not      |
|               | cached. Generally, the ``bool`` execution mode should be      |
|               | preferred.                                                    |
+---------------+---------------------------------------------------------------+

If you don’t want the generated individual term queries to be cached,
you can use: ``bool_nocache``, ``and_nocache`` or ``or_nocache``
instead, but be aware that this will affect performance.

The "total" terms filter caching can still be explicitly controlled
using the ``_cache`` option. Note the default value for it depends on
the execution value.

For example:

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "terms" : {
                    "user" : ["kimchy", "elasticsearch"],
                    "execution" : "bool",
                    "_cache": true
                }
            }
        }
    }

**Caching**

The result of the filter is automatically cached by default. The
``_cache`` can be set to ``false`` to turn it off.

**Terms lookup mechanism**

When you need to specify a ``terms`` filter with a lot of terms, it can
be beneficial to fetch those term values from a document in an index. A
concrete example would be filtering tweets tweeted by your followers.
The number of user ids specified in the terms filter can potentially be
very large. In this scenario it makes sense to use the terms filter’s
terms lookup mechanism.

The terms lookup mechanism supports the following options:

+-------------+---------------------------------------------------------------+
| ``index``   | The index to fetch the term values from. Defaults to the      |
|             | current index.                                                |
+-------------+---------------------------------------------------------------+
| ``type``    | The type to fetch the term values from.                       |
+-------------+---------------------------------------------------------------+
| ``id``      | The id of the document to fetch the term values from.         |
+-------------+---------------------------------------------------------------+
| ``path``    | The field specified as path to fetch the actual values for    |
|             | the ``terms`` filter.                                         |
+-------------+---------------------------------------------------------------+
| ``routing`` | A custom routing value to be used when retrieving the         |
|             | external terms doc.                                           |
+-------------+---------------------------------------------------------------+
| ``cache``   | Whether to cache the filter built from the retrieved document |
|             | (``true`` - default) or whether to fetch and rebuild the      |
|             | filter on every request (``false``). See "`Terms lookup       |
|             | caching <#query-dsl-terms-filter-lookup-caching>`__\ " below  |
+-------------+---------------------------------------------------------------+

The values for the ``terms`` filter will be fetched from a field in a
document with the specified id in the specified type and index.
Internally a get request is executed to fetch the values from the
specified path. At the moment for this feature to work the ``_source``
needs to be stored.

Also, consider using an index with a single shard and fully replicated
across all nodes if the "reference" terms data is not large. The lookup
terms filter will prefer to execute the get request on a local node if
possible, reducing the need for networking.
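
For example, a minimal sketch of creating such a single-shard lookup
index that expands its replicas to all nodes (the ``users`` index name
matches the example below; the settings shown are one possible choice):

.. code:: js

    curl -XPUT 'localhost:9200/users' -d '{
      "settings" : {
        "number_of_shards" : 1,
        "auto_expand_replicas" : "0-all"
      }
    }'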

**Terms lookup caching**

There is an additional cache involved, which caches the mapping of the
lookup document to the actual terms. This lookup cache is an LRU cache.
This cache has the following options:

``indices.cache.filter.terms.size``
    The size of the lookup cache. The default is ``10mb``.

``indices.cache.filter.terms.expire_after_access``
    The time after the last read an entry should expire. Disabled by
    default.

``indices.cache.filter.terms.expire_after_write``
    The time after the last write an entry should expire. Disabled by
    default.

All the options for the lookup document cache can only be configured via
the ``elasticsearch.yml`` file.

When using the terms lookup the ``execution`` option isn’t taken into
account and behaves as if the execution mode was set to ``plain``.

**Terms lookup twitter example**

.. code:: js

    # index the information for user with id 2, specifically, its followers
    curl -XPUT localhost:9200/users/user/2 -d '{
       "followers" : ["1", "3"]
    }'

    # index a tweet, from user with id 2
    curl -XPUT localhost:9200/tweets/tweet/1 -d '{
       "user" : "2"
    }'

    # search on all the tweets that match the followers of user 2
    curl -XGET localhost:9200/tweets/_search -d '{
      "query" : {
        "filtered" : {
          "filter" : {
            "terms" : {
              "user" : {
                "index" : "users",
                "type" : "user",
                "id" : "2",
                "path" : "followers"
              },
              "_cache_key" : "user_2_friends"
            }
          }
        }
      }
    }'

The above is highly optimized, both in the sense that the list of
followers will not be fetched if the filter is already cached in the
filter cache, and through the internal LRU cache used for fetching
external values for the terms filter. Also, the entry in the filter
cache will not hold ``all`` the terms, reducing the memory required for
it.

Setting ``_cache_key`` is recommended, so it’s simple to clear the
cache associated with it using the clear cache API. For example:

.. code:: js

    curl -XPOST 'localhost:9200/tweets/_cache/clear?filter_keys=user_2_friends'

The structure of the external terms document can also include an array
of inner objects, for example:

.. code:: js

    curl -XPUT localhost:9200/users/user/2 -d '{
     "followers" : [
       {
         "id" : "1"
       },
       {
         "id" : "2"
       }
     ]
    }'

In which case, the lookup path will be ``followers.id``.

Type Filter
-----------

Filters documents matching the provided document / mapping type. Note,
this filter can work even when the ``_type`` field is not indexed (using
the `\_uid <#mapping-uid-field>`__ field).

.. code:: js

    {
        "type" : {
            "value" : "my_type"
        }
    }

Mapping
=======

Mapping is the process of defining how a document should be mapped to
the Search Engine, including its searchable characteristics such as
which fields are searchable and if/how they are tokenized. In
Elasticsearch, an index may store documents of different "mapping
types". Elasticsearch allows one to associate multiple mapping
definitions with each mapping type.

Explicit mapping is defined on an index/type level. By default, there
isn’t a need to define an explicit mapping, since one is automatically
created and registered when a new type or new field is introduced (with
no performance overhead) and has sensible defaults. Only when the
defaults need to be overridden must a mapping definition be provided.

**Mapping Types**

Mapping types are a way to divide the documents in an index into logical
groups. Think of it as tables in a database. Though there is separation
between types, it’s not a full separation (all end up as a document
within the same Lucene index).

Fields with the same name across types are highly recommended to have
the same type and the same mapping characteristics (analysis settings,
for example). There is an effort to allow explicitly "choosing" which
field to use via a type prefix (``my_type.my_field``), but it’s not
complete, and there are places where it will never work (like
aggregations on the field).

In practice though, this restriction is almost never an issue. The field
name usually ends up being a good indication of its "typeness" (e.g.
"first\_name" will always be a string). Note also that this does not
apply to the cross-index case.

**Mapping API**

To create a mapping, you will need the `Put Mapping
API <#indices-put-mapping>`__, or you can add multiple mappings when you
`create an index <#indices-create-index>`__.
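
For example, a minimal sketch of adding a mapping with the Put Mapping
API (the index, type, and field names here are illustrative):

.. code:: js

    curl -XPUT 'localhost:9200/twitter/_mapping/tweet' -d '{
      "tweet" : {
        "properties" : {
          "message" : {"type" : "string", "store" : true}
        }
      }
    }'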

**Global Settings**

The ``index.mapping.ignore_malformed`` global setting can be set on the
index level to ignore malformed content globally across all mapping
types (an example of malformed content is trying to index a text string
value as a numeric type).

The ``index.mapping.coerce`` global setting can be set on the index
level to coerce numeric content globally across all mapping types. The
default setting is ``true``, and the coercions attempted are to convert
strings containing numbers into numeric types, and to truncate the
fraction part of numeric values so that they fit integer/short/long
types. When the permitted conversions fail, the value is considered
malformed and the ``ignore_malformed`` setting dictates what will happen
next.
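
For example, a sketch of enabling ``ignore_malformed`` globally when
creating an index (the index name is illustrative):

.. code:: js

    curl -XPUT 'localhost:9200/my_index' -d '{
      "settings" : {
        "index.mapping.ignore_malformed" : true
      }
    }'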

Fields
======

Each mapping has a number of fields associated with it which can be used
to control how the document metadata (e.g. the ``_all`` field) is
indexed.

``_uid``
--------

Each document indexed is associated with an id and a type, the internal
``_uid`` field is the unique identifier of a document within an index
and is composed of the type and the id (meaning that different types can
have the same id and still maintain uniqueness).

The ``_uid`` field is automatically used when ``_type`` is not indexed
to perform type based filtering, and does not require the ``_id`` to be
indexed.

``_id``
-------

Each document indexed is associated with an id and a type. The ``_id``
field can be used to index just the id, and possibly also store it. By
default it is not indexed and not stored (thus, not created).

Note, even though the ``_id`` is not indexed, all the APIs still work
(since they work with the ``_uid`` field), as well as fetching by ids
using ``term``, ``terms`` or ``prefix`` queries/filters (including the
specific ``ids`` query/filter).
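
For example, a sketch of fetching documents by id with the ``ids`` query
(the type and values are illustrative):

.. code:: js

    {
        "ids" : {
            "type" : "tweet",
            "values" : ["1", "4", "100"]
        }
    }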

The ``_id`` field can be enabled to be indexed, and possibly stored,
using the appropriate mapping attributes:

.. code:: js

    {
        "tweet" : {
            "_id" : {
                "index" : "not_analyzed",
                "store" : true
            }
        }
    }

The ``_id`` mapping can also be associated with a ``path`` that will be
used to extract the id from a different location in the source document.
For example, having the following mapping:

.. code:: js

    {
        "tweet" : {
            "_id" : {
                "path" : "post_id"
            }
        }
    }

Will cause ``1`` to be used as the id for:

.. code:: js

    {
        "message" : "You know, for Search",
        "post_id" : "1"
    }

This does require an additional lightweight parsing step while indexing,
in order to extract the id to decide which shard the index operation
will be executed on.

``_type``
---------

Each document indexed is associated with an id and a type. The type,
when indexing, is automatically indexed into a ``_type`` field. By
default, the ``_type`` field is indexed (but **not** analyzed) and not
stored. This means that the ``_type`` field can be queried.

The ``_type`` field can be stored as well, for example:

.. code:: js

    {
        "tweet" : {
            "_type" : {"store" : true}
        }
    }

The ``_type`` field can also not be indexed, and all the APIs will still
work except for specific queries (term queries / filters) or
aggregations done on the ``_type`` field.

.. code:: js

    {
        "tweet" : {
            "_type" : {"index" : "no"}
        }
    }

``_source``
-----------

The ``_source`` field is an automatically generated field that stores
the actual JSON that was used as the indexed document. It is not indexed
(searchable), just stored. When executing "fetch" requests, like
`get <#docs-get>`__ or `search <#search-search>`__, the ``_source``
field is returned by default.

Though very handy to have around, the source field does incur storage
overhead within the index. For this reason, it can be disabled. For
example:

.. code:: js

    {
        "tweet" : {
            "_source" : {"enabled" : false}
        }
    }

**Includes / Excludes**

Allows paths in the source to be specified that will be included /
excluded when it is stored, supporting ``*`` as a wildcard. For example:

.. code:: js

    {
        "my_type" : {
            "_source" : {
                "includes" : ["path1.*", "path2.*"],
                "excludes" : ["path3.*"]
            }
        }
    }

``_all``
--------

The idea of the ``_all`` field is that it includes the text of one or
more other fields within the document indexed. It can come in very
handy, especially for search requests, where we want to execute a search
query against the content of a document, without knowing which fields to
search on. This comes at the expense of CPU cycles and index size.

The ``_all`` fields can be completely disabled. Explicit field mappings
and object mappings can be excluded / included in the ``_all`` field. By
default, it is enabled and all fields are included in it for ease of
use.

When disabling the ``_all`` field, it is a good practice to set
``index.query.default_field`` to a different value (for example, if you
have a main "message" field in your data, set it to ``message``).
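
For example, a minimal sketch of disabling ``_all`` and pointing
``index.query.default_field`` at a main ``message`` field when creating
an index (the index, type, and field names are illustrative):

.. code:: js

    curl -XPUT 'localhost:9200/my_index' -d '{
      "settings" : {
        "index.query.default_field" : "message"
      },
      "mappings" : {
        "person" : {
          "_all" : { "enabled" : false },
          "properties" : {
            "message" : { "type" : "string" }
          }
        }
      }
    }'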

One of the nice features of the ``_all`` field is that it takes into
account specific fields boost levels. Meaning that if a title field is
boosted more than content, the title (part) in the ``_all`` field will
mean more than the content (part) in the ``_all`` field.

Here is a sample mapping:

.. code:: js

    {
        "person" : {
            "_all" : {"enabled" : true},
            "properties" : {
                "name" : {
                    "type" : "object",
                    "dynamic" : false,
                    "properties" : {
                        "first" : {"type" : "string", "store" : true , "include_in_all" : false},
                        "last" : {"type" : "string", "index" : "not_analyzed"}
                    }
                },
                "address" : {
                    "type" : "object",
                    "include_in_all" : false,
                    "properties" : {
                        "first" : {
                            "properties" : {
                                "location" : {"type" : "string", "store" : true, "index_name" : "firstLocation"}
                            }
                        },
                        "last" : {
                            "properties" : {
                                "location" : {"type" : "string"}
                            }
                        }
                    }
                },
                "simple1" : {"type" : "long", "include_in_all" : true},
                "simple2" : {"type" : "long", "include_in_all" : false}
            }
        }
    }

The ``_all`` field allows ``store``, ``term_vector`` and ``analyzer``
(with specific ``index_analyzer`` and ``search_analyzer``) to be set.

**Highlighting**

For any field to allow `highlighting <#search-request-highlighting>`__
it has to be either stored or part of the ``_source`` field. By default
the ``_all`` field does not qualify for either, so highlighting for it
does not yield any data.

Although it is possible to ``store`` the ``_all`` field, it is basically
an aggregation of all fields, which means more data will be stored, and
highlighting it might produce strange results.

``_analyzer``
-------------

The ``_analyzer`` mapping allows a document field to be used as the name
of the analyzer that will be used to index the document. The analyzer
will be used for any field that does not explicitly define an
``analyzer`` or ``index_analyzer`` when indexing.

Here is a simple mapping:

.. code:: js

    {
        "type1" : {
            "_analyzer" : {
                "path" : "my_field"
            }
        }
    }

The above will use the value of ``my_field`` to look up an analyzer
registered under that name. For example, indexing the following doc:

.. code:: js

    {
        "my_field" : "whitespace"
    }

Will cause the ``whitespace`` analyzer to be used as the index analyzer
for all fields without explicit analyzer setting.

The default path value is ``_analyzer``, so the analyzer can be driven
for a specific document by setting the ``_analyzer`` field in it. If a
custom json field name is needed, an explicit mapping with a different
path should be set.

By default, the ``_analyzer`` field is indexed; it can be disabled by
setting ``index`` to ``no`` in the mapping.

``_parent``
-----------

The parent field mapping is defined on a child mapping, and points to
the parent type this child relates to. For example, in case of a
``blog`` type and a ``blog_tag`` type child document, the mapping for
``blog_tag`` should be:

.. code:: js

    {
        "blog_tag" : {
            "_parent" : {
                "type" : "blog"
            }
        }
    }

The mapping is automatically stored and indexed (meaning it can be
searched on using the ``_parent`` field notation).

``_field_names``
----------------

The ``_field_names`` field indexes the field names of a document, which
can later be used to search for documents based on the fields that they
contain, typically using the ``exists`` and ``missing`` filters.

``_field_names`` is indexed by default for indices that have been
created after Elasticsearch 1.3.0.
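
For example, a sketch of an ``exists`` filter that relies on
``_field_names`` (the ``user`` field name is illustrative):

.. code:: js

    {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "user" }
            }
        }
    }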

``_routing``
------------

The routing field allows the ``_routing`` aspect to be controlled when
indexing data and explicit routing control is required.

**store / index**

The first thing the ``_routing`` mapping does is to store the routing
value provided (``store`` set to ``true``) and index it (``index`` set
to ``not_analyzed``). The reason why the routing is stored by default is
so reindexing data will be possible if the routing value is completely
external and not part of the docs.

**required**

Another aspect of the ``_routing`` mapping is the ability to define it
as required by setting ``required`` to ``true``. This is very important
to set when using routing features, as it allows different APIs to make
use of it. For example, an index operation will be rejected if no
routing value has been provided (or derived from the doc). A delete
operation will be broadcasted to all shards if no routing value is
provided and ``_routing`` is required.
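
For example, a minimal sketch of a mapping that makes routing required
for a type (the type name is illustrative):

.. code:: js

    {
        "comment" : {
            "_routing" : {
                "required" : true
            }
        }
    }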

**path**

The routing value can be provided as an external value when indexing
(and still stored as part of the document, in much the same way
``_source`` is stored). But, it can also be automatically extracted from
the index doc based on a ``path``. For example, having the following
mapping:

.. code:: js

    {
        "comment" : {
            "_routing" : {
                "required" : true,
                "path" : "blog.post_id"
            }
        }
    }

Will cause the following doc to be routed based on the ``111222`` value:

.. code:: js

    {
        "text" : "the comment text"
        "blog" : {
            "post_id" : "111222"
        }
    }

Note, using ``path`` without an explicit routing value provided requires
an additional (though quite fast) parsing phase.

**id uniqueness**

When indexing documents specifying a custom ``_routing``, the uniqueness
of the ``_id`` is not guaranteed throughout all the shards that the
index is composed of. In fact, documents with the same ``_id`` might end
up in different shards if indexed with different ``_routing`` values.

``_index``
----------

The ``_index`` field provides the ability to store in a document the
index it belongs to. By default it is disabled; in order to enable it,
the following mapping should be defined:

.. code:: js

    {
        "tweet" : {
            "_index" : { "enabled" : true }
        }
    }

``_size``
---------

The ``_size`` field allows the size of the original ``_source`` to be
automatically indexed. By default, it’s disabled. In order to enable it,
set the mapping to:

.. code:: js

    {
        "tweet" : {
            "_size" : {"enabled" : true}
        }
    }

In order to also store it, use:

.. code:: js

    {
        "tweet" : {
            "_size" : {"enabled" : true, "store" : true }
        }
    }

``_timestamp``
--------------

The ``_timestamp`` field allows the timestamp of a document to be
automatically indexed. It can be provided externally via the index
request or in the ``_source``. If it is not provided externally it will
be automatically set to a `default
date <#mapping-timestamp-field-default>`__.

**enabled**

By default it is disabled. In order to enable it, the following mapping
should be defined:

.. code:: js

    {
        "tweet" : {
            "_timestamp" : { "enabled" : true }
        }
    }

**store / index**

By default the ``_timestamp`` field has ``store`` set to ``true`` and
``index`` set to ``not_analyzed``. It can be queried as a standard date
field.
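
For example, a sketch of a range query against the ``_timestamp`` field
(the time window is illustrative):

.. code:: js

    {
        "query" : {
            "range" : {
                "_timestamp" : {
                    "gte" : "now-1d"
                }
            }
        }
    }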

**path**

The ``_timestamp`` value can be provided as an external value when
indexing. But, it can also be automatically extracted from the document
to index based on a ``path``. For example, having the following mapping:

.. code:: js

    {
        "tweet" : {
            "_timestamp" : {
                "enabled" : true,
                "path" : "post_date"
            }
        }
    }

Will cause ``2009-11-15T14:12:12`` to be used as the timestamp value
for:

.. code:: js

    {
        "message" : "You know, for Search",
        "post_date" : "2009-11-15T14:12:12"
    }

Note, using ``path`` without explicit timestamp value provided requires
an additional (though quite fast) parsing phase.

**format**

You can define the `date format <#mapping-date-format>`__ used to parse
the provided timestamp value. For example:

.. code:: js

    {
        "tweet" : {
            "_timestamp" : {
                "enabled" : true,
                "path" : "post_date",
                "format" : "YYYY-MM-dd"
            }
        }
    }

Note, the default format is ``dateOptionalTime``. The timestamp value
will first be parsed as a number and if it fails the format will be
tried.

**default**

You can define a default value for when timestamp is not provided within
the index request or in the ``_source`` document.

By default, the default value is ``now`` which means the date the
document was processed by the indexing chain.

You can disable that default value by setting ``default`` to ``null``.
It means that ``timestamp`` is mandatory:

.. code:: js

    {
        "tweet" : {
            "_timestamp" : {
                "enabled" : true,
                "default" : null
            }
        }
    }

If you don’t provide any timestamp value, indexing the document will fail.

You can also set the default value to any date respecting `timestamp
format <#mapping-timestamp-field-format>`__:

.. code:: js

    {
        "tweet" : {
            "_timestamp" : {
                "enabled" : true,
                "format" : "YYYY-MM-dd",
                "default" : "1970-01-01"
            }
        }
    }

If you don’t provide any timestamp value, indexing the document will fail.

``_ttl``
--------

A lot of documents naturally come with an expiration date. Documents can
therefore have a ``_ttl`` (time to live), which will cause the expired
documents to be deleted automatically.

``_ttl`` accepts two parameters which are described below, every other
setting will be silently ignored.

**enabled**

By default it is disabled, in order to enable it, the following mapping
should be defined:

.. code:: js

    {
        "tweet" : {
            "_ttl" : { "enabled" : true }
        }
    }

``_ttl`` can only be enabled once and never be disabled again.

**default**

You can provide a per index/type default ``_ttl`` value as follows:

.. code:: js

    {
        "tweet" : {
            "_ttl" : { "enabled" : true, "default" : "1d" }
        }
    }

In this case, if you don’t provide a ``_ttl`` value in your query or in
the ``_source`` all tweets will have a ``_ttl`` of one day.

In case you do not specify a time unit like ``d`` (days), ``m``
(minutes), ``h`` (hours), ``ms`` (milliseconds) or ``w`` (weeks),
milliseconds is used as default unit.

If no ``default`` is set and no ``_ttl`` value is given then the
document has an infinite ``_ttl`` and will not expire.

You can dynamically update the ``default`` value using the put mapping
API. It won’t change the ``_ttl`` of already indexed documents but will
be used for future documents.
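
For example, a sketch of updating the default ``_ttl`` with the put
mapping API (the index and type names are illustrative):

.. code:: js

    curl -XPUT 'localhost:9200/my_index/_mapping/tweet' -d '{
      "tweet" : {
        "_ttl" : { "enabled" : true, "default" : "2d" }
      }
    }'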

**Note on documents expiration**

Expired documents will be automatically deleted regularly. You can
dynamically set the ``indices.ttl.interval`` to fit your needs. The
default value is ``60s``.

The deletion orders are processed by bulk. You can set
``indices.ttl.bulk_size`` to fit your needs. The default value is
``10000``.

Note that the expiration procedure handles versioning properly, so if a
document is updated between the collection of documents to expire and
the delete order, the document won’t be deleted.

Types
=====

The datatype for each field in a document (eg strings, numbers, objects
etc) can be controlled via the type mapping.

Core Types
----------

Each JSON field can be mapped to a specific core type. JSON itself
already provides us with some typing, with its support for ``string``,
``integer``/``long``, ``float``/``double``, ``boolean``, and ``null``.

The following sample tweet JSON document will be used to explain the
core types:

.. code:: js

    {
        "tweet" {
            "user" : "kimchy",
            "message" : "This is a tweet!",
            "postDate" : "2009-11-15T14:12:12",
            "priority" : 4,
            "rank" : 12.3
        }
    }

Explicit mapping for the above JSON tweet can be:

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "user" : {"type" : "string", "index" : "not_analyzed"},
                "message" : {"type" : "string", "null_value" : "na"},
                "postDate" : {"type" : "date"},
                "priority" : {"type" : "integer"},
                "rank" : {"type" : "float"}
            }
        }
    }

**String**

The text based string type is the most basic type, and contains one or
more characters. An example mapping can be:

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "message" : {
                    "type" : "string",
                    "store" : true,
                    "index" : "analyzed",
                    "null_value" : "na"
                },
                "user" : {
                    "type" : "string",
                    "index" : "not_analyzed",
                    "norms" : {
                        "enabled" : false
                    }
                }
            }
        }
    }

The above mapping defines a ``string`` ``message`` property/field within
the ``tweet`` type. The field is stored in the index (so it can later be
retrieved using selective loading when searching), and it gets analyzed
(broken down into searchable terms). If the message has a ``null``
value, then the value that will be stored is ``na``. There is also a
``string`` ``user`` which is indexed as-is (not broken down into tokens)
and has norms disabled (so that matching this field is a binary
decision, no match is better than another one).

The following table lists all the attributes that can be used with the
``string`` type:

+--------------------------------------+--------------------------------------+
| Attribute                            | Description                          |
+======================================+======================================+
| ``index_name``                       | The name of the field that will be   |
|                                      | stored in the index. Defaults to the |
|                                      | property/field name.                 |
+--------------------------------------+--------------------------------------+
| ``store``                            | Set to ``true`` to actually store    |
|                                      | the field in the index, ``false`` to |
|                                      | not store it. Defaults to ``false``  |
|                                      | (note, the JSON document itself is   |
|                                      | stored, and it can be retrieved from |
|                                      | it).                                 |
+--------------------------------------+--------------------------------------+
| ``index``                            | Set to ``analyzed`` for the field to |
|                                      | be indexed and searchable after      |
|                                      | being broken down into tokens using  |
|                                      | an analyzer. ``not_analyzed`` means  |
|                                      | that it’s still searchable, but does |
|                                      | not go through any analysis process  |
|                                      | or broken down into tokens. ``no``   |
|                                      | means that it won’t be searchable at |
|                                      | all (as an individual field; it may  |
|                                      | still be included in ``_all``).      |
|                                      | Setting to ``no`` disables           |
|                                      | ``include_in_all``. Defaults to      |
|                                      | ``analyzed``.                        |
+--------------------------------------+--------------------------------------+
| ``doc_values``                       | Set to ``true`` to store field       |
|                                      | values in a column-stride fashion.   |
|                                      | Automatically set to ``true`` when   |
|                                      | the ```fielddata``                   |
|                                      | format <#fielddata-formats>`__ is    |
|                                      | ``doc_values``.                      |
+--------------------------------------+--------------------------------------+
| ``term_vector``                      | Possible values are ``no``, ``yes``, |
|                                      | ``with_offsets``,                    |
|                                      | ``with_positions``,                  |
|                                      | ``with_positions_offsets``. Defaults |
|                                      | to ``no``.                           |
+--------------------------------------+--------------------------------------+
| ``boost``                            | The boost value. Defaults to         |
|                                      | ``1.0``.                             |
+--------------------------------------+--------------------------------------+
| ``null_value``                       | When there is a (JSON) null value    |
|                                      | for the field, use the               |
|                                      | ``null_value`` as the field value.   |
|                                      | Defaults to not adding the field at  |
|                                      | all.                                 |
+--------------------------------------+--------------------------------------+
| ``norms: {enabled: <value>}``        | Boolean value if norms should be     |
|                                      | enabled or not. Defaults to ``true`` |
|                                      | for ``analyzed`` fields, and to      |
|                                      | ``false`` for ``not_analyzed``       |
|                                      | fields. See the `section about       |
|                                      | norms <#norms>`__.                   |
+--------------------------------------+--------------------------------------+
| ``norms: {loading: <value>}``        | Describes how norms should be        |
|                                      | loaded, possible values are          |
|                                      | ``eager`` and ``lazy`` (default). It |
|                                      | is possible to change the default    |
|                                      | value to eager for all fields by     |
|                                      | configuring the index setting        |
|                                      | ``index.norms.loading`` to           |
|                                      | ``eager``.                           |
+--------------------------------------+--------------------------------------+
| ``index_options``                    | Allows to set the indexing options,  |
|                                      | possible values are ``docs`` (only   |
|                                      | doc numbers are indexed), ``freqs``  |
|                                      | (doc numbers and term frequencies),  |
|                                      | and ``positions`` (doc numbers, term |
|                                      | frequencies and positions). Defaults |
|                                      | to ``positions`` for ``analyzed``    |
|                                      | fields, and to ``docs`` for          |
|                                      | ``not_analyzed`` fields. It is also  |
|                                      | possible to set it to ``offsets``    |
|                                      | (doc numbers, term frequencies,      |
|                                      | positions and offsets).              |
+--------------------------------------+--------------------------------------+
| ``analyzer``                         | The analyzer used to analyze the     |
|                                      | text contents when ``analyzed``      |
|                                      | during indexing and when searching   |
|                                      | using a query string. Defaults to    |
|                                      | the globally configured analyzer.    |
+--------------------------------------+--------------------------------------+
| ``index_analyzer``                   | The analyzer used to analyze the     |
|                                      | text contents when ``analyzed``      |
|                                      | during indexing.                     |
+--------------------------------------+--------------------------------------+
| ``search_analyzer``                  | The analyzer used to analyze the     |
|                                      | field when part of a query string.   |
|                                      | Can be updated on an existing field. |
+--------------------------------------+--------------------------------------+
| ``include_in_all``                   | Should the field be included in the  |
|                                      | ``_all`` field (if enabled). If      |
|                                      | ``index`` is set to ``no`` this      |
|                                      | defaults to ``false``, otherwise,    |
|                                      | defaults to ``true`` or to the       |
|                                      | parent ``object`` type setting.      |
+--------------------------------------+--------------------------------------+
| ``ignore_above``                     | The analyzer will ignore strings     |
|                                      | larger than this size. Useful for    |
|                                      | generic ``not_analyzed`` fields that |
|                                      | should ignore long text.             |
+--------------------------------------+--------------------------------------+
| ``position_offset_gap``              | Position increment gap between field |
|                                      | instances with the same field name.  |
|                                      | Defaults to 0.                       |
+--------------------------------------+--------------------------------------+

The ``string`` type also supports custom indexing parameters associated
with the indexed value. For example:

.. code:: js

    {
        "message" : {
            "_value":  "boosted value",
            "_boost":  2.0
        }
    }

The mapping is required to disambiguate the meaning of the document.
Otherwise, the structure would interpret "message" as a value of type
"object". The key ``_value`` (or ``value``) in the inner document
specifies the real string content that should eventually be indexed. The
``_boost`` (or ``boost``) key specifies the per field document boost
(here 2.0).

**Norms**

Norms store various normalization factors that are later used (at query
time) in order to compute the score of a document relatively to a query.

Although useful for scoring, norms also require quite a lot of memory
(typically in the order of one byte per document per field in your
index, even for documents that don’t have this specific field). As a
consequence, if you don’t need scoring on a specific field, it is highly
recommended to disable norms on it. In particular, this is the case for
fields that are used solely for filtering or aggregations.

In case you would like to disable norms after the fact, it is possible
to do so by using the `PUT mapping API <#indices-put-mapping>`__. Note,
however, that norms won’t be removed instantly, but only as your index
receives new insertions or updates and segments get merged. Any score
computation on a field whose norms have been removed might return
inconsistent results, since some documents won’t have norms anymore
while other documents might still have norms.
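
For example, a minimal sketch of disabling norms on an existing field
with the PUT mapping API (the index, type, and field names are
illustrative):

.. code:: js

    curl -XPUT 'localhost:9200/my_index/_mapping/tweet' -d '{
      "tweet" : {
        "properties" : {
          "user" : {
            "type" : "string",
            "norms" : { "enabled" : false }
          }
        }
      }
    }'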

**Number**

A number based type supporting ``float``, ``double``, ``byte``,
``short``, ``integer``, and ``long``. It uses specific constructs within
Lucene in order to support numeric values. The number types have the
same ranges as corresponding `Java
types <http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html>`__.
An example mapping can be:

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "rank" : {
                    "type" : "float",
                    "null_value" : 1.0
                }
            }
        }
    }

The following table lists all the attributes that can be used with a
number type:

+--------------------------------------+--------------------------------------+
| Attribute                            | Description                          |
+======================================+======================================+
| ``type``                             | The type of the number. Can be       |
|                                      | ``float``, ``double``, ``integer``,  |
|                                      | ``long``, ``short``, ``byte``.       |
|                                      | Required.                            |
+--------------------------------------+--------------------------------------+
| ``index_name``                       | The name of the field that will be   |
|                                      | stored in the index. Defaults to the |
|                                      | property/field name.                 |
+--------------------------------------+--------------------------------------+
| ``store``                            | Set to ``true`` to store actual      |
|                                      | field in the index, ``false`` to not |
|                                      | store it. Defaults to ``false``      |
|                                      | (note, the JSON document itself is   |
|                                      | stored, and it can be retrieved from |
|                                      | it).                                 |
+--------------------------------------+--------------------------------------+
| ``index``                            | Set to ``no`` if the value should    |
|                                      | not be indexed. Setting to ``no``    |
|                                      | disables ``include_in_all``. If set  |
|                                      | to ``no`` the field should be either |
|                                      | stored in ``_source``, have          |
|                                      | ``include_in_all`` enabled, or       |
|                                      | ``store`` be set to ``true`` for     |
|                                      | this to be useful.                   |
+--------------------------------------+--------------------------------------+
| ``doc_values``                       | Set to ``true`` to store field       |
|                                      | values in a column-stride fashion.   |
|                                      | Automatically set to ``true`` when   |
|                                      | the fielddata format is              |
|                                      | ``doc_values``.                      |
+--------------------------------------+--------------------------------------+
| ``precision_step``                   | The precision step (influences the   |
|                                      | number of terms generated for each   |
|                                      | number value). Defaults to ``16``    |
|                                      | for ``long``, ``double``, ``8`` for  |
|                                      | ``short``, ``integer``, ``float``,   |
|                                      | and ``2147483647`` for ``byte``.     |
+--------------------------------------+--------------------------------------+
| ``boost``                            | The boost value. Defaults to         |
|                                      | ``1.0``.                             |
+--------------------------------------+--------------------------------------+
| ``null_value``                       | When there is a (JSON) null value    |
|                                      | for the field, use the               |
|                                      | ``null_value`` as the field value.   |
|                                      | Defaults to not adding the field at  |
|                                      | all.                                 |
+--------------------------------------+--------------------------------------+
| ``include_in_all``                   | Should the field be included in the  |
|                                      | ``_all`` field (if enabled). If      |
|                                      | ``index`` is set to ``no`` this      |
|                                      | defaults to ``false``, otherwise,    |
|                                      | defaults to ``true`` or to the       |
|                                      | parent ``object`` type setting.      |
+--------------------------------------+--------------------------------------+
| ``ignore_malformed``                 | Ignores a malformed number. Defaults |
|                                      | to ``false``.                        |
+--------------------------------------+--------------------------------------+
| ``coerce``                           | Tries to convert strings to numbers  |
|                                      | and truncate fractions for integers. |
|                                      | Defaults to ``true``.                |
+--------------------------------------+--------------------------------------+

**Token Count**

The ``token_count`` type maps to the JSON string type but indexes and
stores the number of tokens in the string rather than the string itself.
For example:

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "name" : {
                    "type" : "string",
                    "fields" : {
                        "word_count": {
                            "type" : "token_count",
                            "store" : "yes",
                            "analyzer" : "standard"
                        }
                    }
                }
            }
        }
    }

All the configuration that can be specified for a number can be
specified for a token\_count. The only extra configuration is the
required ``analyzer`` field which specifies which analyzer to use to
break the string into tokens. For best performance, use an analyzer with
no token filters.

    **Note**

    Technically the ``token_count`` type sums position increments rather
    than counting tokens. This means that even if the analyzer filters
    out stop words they are included in the count.

**Date**

The date type is a special type which maps to JSON string type. It
follows a specific format that can be explicitly set. All dates are
``UTC``. Internally, a date maps to a number type ``long``, with the
added parsing stage from string to long and from long to string. An
example mapping:

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "postDate" : {
                    "type" : "date",
                    "format" : "YYYY-MM-dd"
                }
            }
        }
    }

The date type will also accept a long number representing UTC
milliseconds since the epoch, regardless of the format it can handle.

The following table lists all the attributes that can be used with a
date type:

+--------------------------------------+--------------------------------------+
| Attribute                            | Description                          |
+======================================+======================================+
| ``index_name``                       | The name of the field that will be   |
|                                      | stored in the index. Defaults to the |
|                                      | property/field name.                 |
+--------------------------------------+--------------------------------------+
| ``format``                           | The `date                            |
|                                      | format <#mapping-date-format>`__.    |
|                                      | Defaults to ``dateOptionalTime``.    |
+--------------------------------------+--------------------------------------+
| ``store``                            | Set to ``true`` to store actual      |
|                                      | field in the index, ``false`` to not |
|                                      | store it. Defaults to ``false``      |
|                                      | (note, the JSON document itself is   |
|                                      | stored, and it can be retrieved from |
|                                      | it).                                 |
+--------------------------------------+--------------------------------------+
| ``index``                            | Set to ``no`` if the value should    |
|                                      | not be indexed. Setting to ``no``    |
|                                      | disables ``include_in_all``. If set  |
|                                      | to ``no`` the field should be either |
|                                      | stored in ``_source``, have          |
|                                      | ``include_in_all`` enabled, or       |
|                                      | ``store`` be set to ``true`` for     |
|                                      | this to be useful.                   |
+--------------------------------------+--------------------------------------+
| ``doc_values``                       | Set to ``true`` to store field       |
|                                      | values in a column-stride fashion.   |
|                                      | Automatically set to ``true`` when   |
|                                      | the fielddata format is              |
|                                      | ``doc_values``.                      |
+--------------------------------------+--------------------------------------+
| ``precision_step``                   | The precision step (influences the   |
|                                      | number of terms generated for each   |
|                                      | number value). Defaults to ``16``.   |
+--------------------------------------+--------------------------------------+
| ``boost``                            | The boost value. Defaults to         |
|                                      | ``1.0``.                             |
+--------------------------------------+--------------------------------------+
| ``null_value``                       | When there is a (JSON) null value    |
|                                      | for the field, use the               |
|                                      | ``null_value`` as the field value.   |
|                                      | Defaults to not adding the field at  |
|                                      | all.                                 |
+--------------------------------------+--------------------------------------+
| ``include_in_all``                   | Should the field be included in the  |
|                                      | ``_all`` field (if enabled). If      |
|                                      | ``index`` is set to ``no`` this      |
|                                      | defaults to ``false``, otherwise,    |
|                                      | defaults to ``true`` or to the       |
|                                      | parent ``object`` type setting.      |
+--------------------------------------+--------------------------------------+
| ``ignore_malformed``                 | Ignores a malformed date. Defaults   |
|                                      | to ``false``.                        |
+--------------------------------------+--------------------------------------+

**Boolean**

The boolean type maps to the JSON boolean type. It ends up storing
within the index either ``T`` or ``F``, with automatic translation to
``true`` and ``false`` respectively.

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "hes_my_special_tweet" : {
                    "type" : "boolean"
                }
            }
        }
    }

The boolean type also supports passing the value as a number or a string
(in this case ``0``, an empty string, ``false``, ``off`` and ``no`` are
treated as ``false``; all other values are treated as ``true``).
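
For example, assuming the mapping above, the following documents would all
be interpreted consistently (a minimal sketch; the values simply follow the
rules just described):

.. code:: js

    { "hes_my_special_tweet" : true }
    { "hes_my_special_tweet" : 1 }      // any value outside the "false" set is true
    { "hes_my_special_tweet" : "no" }   // interpreted as false
    { "hes_my_special_tweet" : 0 }      // interpreted as false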

The following table lists all the attributes that can be used with the
boolean type:

+--------------------------------------+--------------------------------------+
| Attribute                            | Description                          |
+======================================+======================================+
| ``index_name``                       | The name of the field that will be   |
|                                      | stored in the index. Defaults to the |
|                                      | property/field name.                 |
+--------------------------------------+--------------------------------------+
| ``store``                            | Set to ``true`` to store actual      |
|                                      | field in the index, ``false`` to not |
|                                      | store it. Defaults to ``false``      |
|                                      | (note, the JSON document itself is   |
|                                      | stored, and it can be retrieved from |
|                                      | it).                                 |
+--------------------------------------+--------------------------------------+
| ``index``                            | Set to ``no`` if the value should    |
|                                      | not be indexed. Setting to ``no``    |
|                                      | disables ``include_in_all``. If set  |
|                                      | to ``no`` the field should be either |
|                                      | stored in ``_source``, have          |
|                                      | ``include_in_all`` enabled, or       |
|                                      | ``store`` be set to ``true`` for     |
|                                      | this to be useful.                   |
+--------------------------------------+--------------------------------------+
| ``boost``                            | The boost value. Defaults to         |
|                                      | ``1.0``.                             |
+--------------------------------------+--------------------------------------+
| ``null_value``                       | When there is a (JSON) null value    |
|                                      | for the field, use the               |
|                                      | ``null_value`` as the field value.   |
|                                      | Defaults to not adding the field at  |
|                                      | all.                                 |
+--------------------------------------+--------------------------------------+

**Binary**

The binary type is a base64 representation of binary data that can be
stored in the index. The field is not stored by default and not indexed
at all.

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "image" : {
                    "type" : "binary"
                }
            }
        }
    }

The following table lists all the attributes that can be used with the
binary type:

+--------------------------------------+--------------------------------------+
| Attribute                            | Description                          |
+======================================+======================================+
| ``index_name``                       | The name of the field that will be   |
|                                      | stored in the index. Defaults to the |
|                                      | property/field name.                 |
+--------------------------------------+--------------------------------------+
| ``store``                            | Set to ``true`` to store actual      |
|                                      | field in the index, ``false`` to not |
|                                      | store it. Defaults to ``false``      |
|                                      | (note, the JSON document itself is   |
|                                      | already stored, so the binary field  |
|                                      | can be retrieved from there).        |
+--------------------------------------+--------------------------------------+
| ``doc_values``                       | Set to ``true`` to store field       |
|                                      | values in a column-stride fashion.   |
+--------------------------------------+--------------------------------------+
| ``compress``                         | Set to ``true`` to compress the      |
|                                      | stored binary value.                 |
+--------------------------------------+--------------------------------------+
| ``compress_threshold``               | Compression will only be applied to  |
|                                      | stored binary fields that are        |
|                                      | greater than this size. Defaults to  |
|                                      | ``-1``.                              |
+--------------------------------------+--------------------------------------+

    **Note**

    Enabling compression on stored binary fields only makes sense on
    large and highly-compressible values. Otherwise per-field
    compression is usually not worth doing as the space savings do not
    compensate for the overhead of the compression format. Normally, you
    should not configure any compression and just rely on the block
    compression of stored fields (which is enabled by default and can’t
    be disabled).

**Fielddata filters**

It is possible to control which field values are loaded into memory,
which is particularly useful for aggregations on string fields, using
fielddata filters, which are explained in detail in the
`Fielddata <#index-modules-fielddata>`__ section.

Fielddata filters can exclude terms which do not match a regex, or which
don’t fall between a ``min`` and ``max`` frequency range:

.. code:: js

    {
        tweet: {
            type:      "string",
            analyzer:  "whitespace",
            fielddata: {
                filter: {
                    regex: {
                        "pattern":        "^#.*"
                    },
                    frequency: {
                        min:              0.001,
                        max:              0.1,
                        min_segment_size: 500
                    }
                }
            }
        }
    }

These filters can be updated on an existing field mapping and will take
effect the next time the fielddata for a segment is loaded. Use the
`Clear Cache <#indices-clearcache>`__ API to reload the fielddata using
the new filters.
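
As a rough sketch (the index name ``my_index``, type name ``tweet``, field
name ``tags``, and the exact endpoint forms are assumptions for
illustration), the updated filters would be submitted with a regular
put-mapping request for the field, followed by a clear-cache call for
fielddata:

.. code:: js

    // PUT /my_index/_mapping/tweet   (hypothetical index and type names)
    {
        "tweet" : {
            "properties" : {
                "tags" : {
                    "type" : "string",
                    "fielddata" : {
                        "filter" : {
                            "frequency" : {
                                "min" : 0.01
                            }
                        }
                    }
                }
            }
        }
    }
    // then: POST /my_index/_cache/clear?fielddata=true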

**Similarity**

Elasticsearch allows you to configure a similarity (scoring algorithm)
per field. The ``similarity`` setting provides a simple way of choosing
a similarity algorithm other than the default TF/IDF, such as ``BM25``.

You can configure similarities via the `similarity
module <#index-modules-similarity>`__.

**Configuring Similarity per Field**

Defining the Similarity for a field is done via the ``similarity``
mapping property, as this example shows:

.. code:: js

    {
       "book":{
          "properties":{
             "title":{
                "type":"string", "similarity":"BM25"
             }
          }
       }
    }

The following similarities are configured out of the box:

``default``
    The Default TF/IDF algorithm used by Elasticsearch and Lucene in
    previous versions.

``BM25``
    The BM25 algorithm. `See
    Okapi\_BM25 <http://en.wikipedia.org/wiki/Okapi_BM25>`__ for more
    details.

**Copy to field**

Adding the ``copy_to`` parameter to any field mapping will cause all
values of this field to be copied to the fields specified in the
parameter. In the following example all values from the fields
``title`` and ``abstract`` will be copied to the field ``meta_data``.

.. code:: js

    {
      "book" : {
        "properties" : {
          "title" : { "type" : "string", "copy_to" : "meta_data" },
          "abstract" : { "type" : "string", "copy_to" : "meta_data" },
          "meta_data" : { "type" : "string" }
        }
      }
    }

Multiple fields are also supported:

.. code:: js

    {
      "book" : {
        "properties" : {
          "title" : { "type" : "string", "copy_to" : ["meta_data", "article_info"] }
        }
      }
    }

**Multi fields**

The ``fields`` option allows several core type fields to be mapped onto
a single JSON source field. This can be useful if a single field needs
to be used in different ways, for example when a single field is to be
used for both free text search and sorting.

.. code:: js

    {
      "tweet" : {
        "properties" : {
          "name" : {
            "type" : "string",
            "index" : "analyzed",
            "fields" : {
              "raw" : {"type" : "string", "index" : "not_analyzed"}
            }
          }
        }
      }
    }

In the above example the field ``name`` gets processed twice. The first
time it is processed as an analyzed string, and this version is
accessible under the field name ``name``; this is the main field and
behaves just like any other field. The second time it is processed as a
not analyzed string and is accessible under the name ``name.raw``.
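
For example, a search request can then analyze the query text against
``name`` while sorting on the untouched ``name.raw`` value (a minimal
sketch of a search body; the query text is illustrative):

.. code:: js

    {
        "query" : {
            "match" : { "name" : "shay banon" }
        },
        "sort" : [
            { "name.raw" : "asc" }
        ]
    }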

**Include in All**

The ``include_in_all`` setting is ignored on any field that is defined
in the ``fields`` options. Setting ``include_in_all`` only makes sense
on the main field, since it is the raw field value that is copied to the
``_all`` field, not the tokens.

**Updating a field**

In essence, a field can’t be updated. However, multi fields can be added
to existing fields. This allows, for example, having a different
``index_analyzer`` configuration in addition to the ``index_analyzer``
configuration already specified in the main and other multi fields.

Also, a new multi field will only be applied to documents that have been
added after the multi field was introduced; the new multi field doesn’t
exist in previously indexed documents.

Another important note is that new multi fields will be merged into the
list of existing multi fields, so when adding new multi fields for a
field, previously added multi fields don’t need to be specified.
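
A sketch of such an update, reusing the ``name`` field from the earlier
example (the ``shingled`` sub-field name and ``my_shingle_analyzer`` are
hypothetical), would be to put the mapping again with only the addition:

.. code:: js

    {
      "tweet" : {
        "properties" : {
          "name" : {
            "type" : "string",
            "fields" : {
              "shingled" : {"type" : "string", "index_analyzer" : "my_shingle_analyzer"}
            }
          }
        }
      }
    }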

Array Type
----------

JSON documents allow you to define an array (list) of fields or objects.
Mapping array types could not be simpler, since arrays are automatically
detected and mapping them can be done either with `Core
Types <#mapping-core-types>`__ or `Object Type <#mapping-object-type>`__
mappings. For example, the following JSON defines several arrays:

.. code:: js

    {
        "tweet" : {
            "message" : "some arrays in this tweet...",
            "tags" : ["elasticsearch", "wow"],
            "lists" : [
                {
                    "name" : "prog_list",
                    "description" : "programming list"
                },
                {
                    "name" : "cool_list",
                    "description" : "cool stuff list"
                }
            ]
        }
    }

The above JSON has the ``tags`` property defining a list of a simple
``string`` type, and the ``lists`` property is an ``object`` type array.
Here is a sample explicit mapping:

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "message" : {"type" : "string"},
                "tags" : {"type" : "string", "index_name" : "tag"},
                "lists" : {
                    "properties" : {
                        "name" : {"type" : "string"},
                        "description" : {"type" : "string"}
                    }
                }
            }
        }
    }

The fact that array types are automatically supported is shown by the
following JSON document, which uses single values instead of arrays and
is perfectly fine:

.. code:: js

    {
        "tweet" : {
            "message" : "some arrays in this tweet...",
            "tags" : "elasticsearch",
            "lists" : {
                "name" : "prog_list",
                "description" : "programming list"
            }
        }
    }

Note also that, because we used ``index_name`` to set the non-plural
form (``tag`` instead of ``tags``), we can refer to the field using the
``index_name`` as well. For example, we can execute a query using
``tweet.tags:wow`` or ``tweet.tag:wow``. We could, of course, simply
name the field ``tag`` and skip the ``index_name`` altogether.

Object Type
-----------

JSON documents are hierarchical in nature, allowing them to define inner
"objects" within the actual JSON. Elasticsearch completely understands
the nature of these inner objects and can map them easily, providing
query support for their inner fields. Because each document can have
objects with different fields each time, objects mapped this way are
known as "dynamic". Dynamic mapping is enabled by default. Let’s take
the following JSON as an example:

.. code:: js

    {
        "tweet" : {
            "person" : {
                "name" : {
                    "first_name" : "Shay",
                    "last_name" : "Banon"
                },
                "sid" : "12345"
            },
            "message" : "This is a tweet!"
        }
    }

The above shows an example where a tweet includes the actual ``person``
details. A ``person`` is an object, with a ``sid``, and a ``name``
object which has ``first_name`` and ``last_name``. It’s important to
note that ``tweet`` is also an object, although it is a special `root
object type <#mapping-root-object-type>`__ which allows for additional
mapping definitions.

The following is an example of explicit mapping for the above JSON:

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "person" : {
                    "type" : "object",
                    "properties" : {
                        "name" : {
                            "type" : "object",
                            "properties" : {
                                "first_name" : {"type" : "string"},
                                "last_name" : {"type" : "string"}
                            }
                        },
                        "sid" : {"type" : "string", "index" : "not_analyzed"}
                    }
                },
                "message" : {"type" : "string"}
            }
        }
    }

In order to mark a mapping of type ``object``, set the ``type`` to
``object``. This step is optional, since if ``properties`` are defined
for it, it will automatically be identified as an ``object`` mapping.

**properties**

An object mapping can optionally define one or more properties using the
``properties`` tag for a field. Each property can be either another
``object``, or one of the `core\_types <#mapping-core-types>`__.

**dynamic**

One of the most important features of Elasticsearch is its ability to be
schema-less. This means that, in our example above, the ``person``
object can be indexed later with a new property — \ ``age``, for
example — and it will automatically be added to the mapping definitions.
Same goes for the ``tweet`` root object.

This feature is turned on by default and reflects the ``dynamic`` nature
of each mapped object. Each mapped object is automatically dynamic,
though it can be explicitly turned off:

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "person" : {
                    "type" : "object",
                    "properties" : {
                        "name" : {
                            "dynamic" : false,
                            "properties" : {
                                "first_name" : {"type" : "string"},
                                "last_name" : {"type" : "string"}
                            }
                        },
                        "sid" : {"type" : "string", "index" : "not_analyzed"}
                    }
                },
                "message" : {"type" : "string"}
            }
        }
    }

In the above example, the ``name`` object mapped is not dynamic, meaning
that if, in the future, we try to index JSON with a ``middle_name``
within the ``name`` object, it will get discarded and not added.

There is no performance overhead if an ``object`` is dynamic; the
ability to turn it off is provided as a safety mechanism so that
"malformed" objects won’t, by mistake, index data that we do not wish to
be indexed.

If a dynamic object contains yet another inner ``object``, it will be
automatically added to the index and mapped as well.

When processing dynamic new fields, their type is automatically derived.
For example, if it is a ``number``, it will automatically be treated as
a number `core\_type <#mapping-core-types>`__. Dynamic fields use the
default attributes, for example, they are not stored and they are always
indexed.

Date fields are special since they are represented as a ``string``. Date
fields are detected if they can be parsed as a date when they are first
introduced into the system. The set of date formats that are tested
against can be configured using the ``dynamic_date_formats`` on the root
object, which is explained later.

Note, once a field has been added, **its type can not change**. For
example, if we added age and its value is a number, then it can’t be
treated as a string.

The ``dynamic`` parameter can also be set to ``strict``, meaning that
not only will new fields not be introduced into the mapping, but also
that parsing (indexing) docs with such new fields will fail.
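
For example (a minimal sketch), setting ``dynamic`` to ``strict`` on the
root object causes indexing of documents that contain unmapped fields to
fail rather than having those fields silently dropped:

.. code:: js

    {
        "tweet" : {
            "dynamic" : "strict",
            "properties" : {
                "message" : {"type" : "string"}
            }
        }
    }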

**enabled**

The ``enabled`` flag allows you to disable parsing and indexing a named
object completely. This is handy when a portion of the JSON document
contains arbitrary JSON which should not be indexed, nor added to the
mapping. For example:

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "person" : {
                    "type" : "object",
                    "properties" : {
                        "name" : {
                            "type" : "object",
                            "enabled" : false
                        },
                        "sid" : {"type" : "string", "index" : "not_analyzed"}
                    }
                },
                "message" : {"type" : "string"}
            }
        }
    }

In the above, ``name`` and its content will not be indexed at all.

**include\_in\_all**

``include_in_all`` can be set on the ``object`` type level. When set, it
propagates down to all the inner mappings defined within the ``object``
that do not explicitly set it.
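
For example (a sketch based on the earlier ``person`` mapping), setting
``include_in_all`` to ``false`` on the ``person`` object excludes all of
its inner fields from the ``_all`` field unless they explicitly override
it:

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "person" : {
                    "type" : "object",
                    "include_in_all" : false,
                    "properties" : {
                        "name" : {"type" : "string"},
                        "sid" : {"type" : "string", "index" : "not_analyzed"}
                    }
                },
                "message" : {"type" : "string"}
            }
        }
    }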

Root Object Type
----------------

The root object mapping is an `object type
mapping <#mapping-object-type>`__ that maps the root object (the type
itself). It supports all of the different mappings that can be set using
the `object type mapping <#mapping-object-type>`__.

The root object mapping allows you to index a JSON document that
contains only its fields. For example, the following ``tweet`` JSON can
be indexed without specifying the ``tweet`` type in the document itself:

.. code:: js

    {
        "message" : "This is a tweet!"
    }

**Index / Search Analyzers**

The root object allows you to define type mapping level analyzers for
index and search that will be used with all fields that do not
explicitly set analyzers of their own. Here is an example:

.. code:: js

    {
        "tweet" : {
            "index_analyzer" : "standard",
            "search_analyzer" : "standard"
        }
    }

The above simply explicitly defines both the ``index_analyzer`` and
``search_analyzer`` that will be used. There is also an option to use
the ``analyzer`` attribute to set both the ``search_analyzer`` and
``index_analyzer``.

**dynamic\_date\_formats**

``dynamic_date_formats`` (the old setting name ``date_formats`` still
works) allows you to set one or more date formats that will be used to
detect ``date`` fields. For example:

.. code:: js

    {
        "tweet" : {
            "dynamic_date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
            "properties" : {
                "message" : {"type" : "string"}
            }
        }
    }

In the above mapping, if a new JSON field of type string is detected,
the date formats specified will be used to check whether it is a date.
If it passes parsing, then the field will be declared with the ``date``
type, and will use the matching format as its format attribute. The date
format itself is explained `here <#mapping-date-format>`__.

The default formats are: ``dateOptionalTime`` (ISO) and
``yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z``.

**Note:** ``dynamic_date_formats`` are used **only** for dynamically
added date fields, not for ``date`` fields that you specify in your
mapping.

**date\_detection**

Allows you to disable automatic date type detection (applied when a new
field is introduced and matches the provided format), for example:

.. code:: js

    {
        "tweet" : {
            "date_detection" : false,
            "properties" : {
                "message" : {"type" : "string"}
            }
        }
    }

**numeric\_detection**

Sometimes, even though JSON has support for native numeric types,
numeric values are still provided as strings. In order to try to
automatically detect numeric values from strings, ``numeric_detection``
can be set to ``true``. For example:

.. code:: js

    {
        "tweet" : {
            "numeric_detection" : true,
            "properties" : {
                "message" : {"type" : "string"}
            }
        }
    }

**dynamic\_templates**

Dynamic templates allow you to define mapping templates that will be
applied when fields or objects are dynamically introduced.

    **Important**

    Dynamic field mappings are only added when a field contains a
    concrete value — not ``null`` or an empty array. This means that if
    the ``null_value`` option is used in a ``dynamic_template``, it will
    only be applied after the first document with a concrete value for
    the field has been indexed.

For example, we might want all fields to be stored by default, or all
``string`` fields to be stored, or ``string`` fields to always be
indexed with the multi fields syntax, once analyzed and once
not\_analyzed. Here is a simple example:

.. code:: js

    {
        "person" : {
            "dynamic_templates" : [
                {
                    "template_1" : {
                        "match" : "multi*",
                        "mapping" : {
                            "type" : "{dynamic_type}",
                            "index" : "analyzed",
                            "fields" : {
                                "org" : {"type": "{dynamic_type}", "index" : "not_analyzed"}
                            }
                        }
                    }
                },
                {
                    "template_2" : {
                        "match" : "*",
                        "match_mapping_type" : "string",
                        "mapping" : {
                            "type" : "string",
                            "index" : "not_analyzed"
                        }
                    }
                }
            ]
        }
    }

The above mapping will create a field with multi fields for all field
names starting with ``multi``, and will map all ``string`` types to be
``not_analyzed``.

Dynamic templates are named to allow for simple merge behavior. A new
mapping with just a new template can be "put" and that template will be
added; if it has the same name as an existing one, it will be replaced.

The ``match`` option allows matching on the field name. An ``unmatch``
option is also available to exclude fields if they do match on
``match``. The ``match_mapping_type`` controls whether this template
will be applied only for dynamic fields of the specified type (as
guessed by the JSON format).

Another option is to use ``path_match``, which allows matching the
dynamic template against the "full" dot notation name of the field (for
example ``obj1.*.value`` or ``obj1.obj2.*``), with the corresponding
``path_unmatch``.

The matching uses a simple wildcard format by default, allowing ``*`` as
a matching element and supporting patterns such as ``xxx*``, ``*xxx``
and ``xxx*yyy`` (with an arbitrary number of pattern elements), as well
as direct equality. The ``match_pattern`` can be set to ``regex`` to
allow regular expression based matching.
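
For example, a template that matches field names with a regular
expression might look like the following sketch (the template name and
pattern are illustrative):

.. code:: js

    {
        "person" : {
            "dynamic_templates" : [
                {
                    "num_fields" : {
                        "match_pattern" : "regex",
                        "match" : "^num_.*",
                        "mapping" : {
                            "type" : "{dynamic_type}",
                            "store" : true
                        }
                    }
                }
            ]
        }
    }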

The ``mapping`` element provides the actual mapping definition. The
``{name}`` keyword can be used and will be replaced with the actual
dynamic field name being introduced. The ``{dynamic_type}`` (or
``{dynamicType}``) can be used and will be replaced with the mapping
derived based on the field type (or the derived type, like ``date``).

Complete generic settings can also be applied, for example, to have all
mappings be stored, just set:

.. code:: js

    {
        "person" : {
            "dynamic_templates" : [
                {
                    "store_generic" : {
                        "match" : "*",
                        "mapping" : {
                            "store" : true
                        }
                    }
                }
            ]
        }
    }

Such generic templates should be placed at the end of the
``dynamic_templates`` list because when two or more dynamic templates
match a field, only the first matching one from the list is used.

Nested Type
-----------

The ``nested`` type works like the ```object``
type <#mapping-object-type>`__ except that an array of ``objects`` is
flattened, while an array of ``nested`` objects allows each object to be
queried independently. To explain, consider this document:

.. code:: js

    {
        "group" : "fans",
        "user" : [
            {
                "first" : "John",
                "last" :  "Smith"
            },
            {
                "first" : "Alice",
                "last" :  "White"
            }
        ]
    }

If the ``user`` field is of type ``object``, this document would be
indexed internally something like this:

.. code:: js

    {
        "group" :        "fans",
        "user.first" : [ "alice", "john" ],
        "user.last" :  [ "smith", "white" ]
    }

The ``first`` and ``last`` fields are flattened, and the association
between ``alice`` and ``white`` is lost. This document would incorrectly
match a query for ``alice AND smith``.

If the ``user`` field is of type ``nested``, each object is indexed as a
separate document, something like this:

.. code:: js

    { 
        "user.first" : "alice",
        "user.last" :  "white"
    }
    { 
        "user.first" : "john",
        "user.last" :  "smith"
    }
    { 
        "group" :       "fans"
    }

The first two are the hidden nested documents; the last is the visible
“parent” document.

By keeping each nested object separate, the association between the
``user.first`` and ``user.last`` fields is maintained. The query for
``alice AND
smith`` would **not** match this document.

Searching on nested docs can be done using either the `nested
query <#query-dsl-nested-query>`__ or `nested
filter <#query-dsl-nested-filter>`__.
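
For example, the ``alice AND smith`` case above could be expressed as a
``nested`` query over the ``user`` path, which would correctly **not**
match the example document (a minimal sketch of the query body):

.. code:: js

    {
        "query" : {
            "nested" : {
                "path" : "user",
                "query" : {
                    "bool" : {
                        "must" : [
                            { "match" : { "user.first" : "alice" } },
                            { "match" : { "user.last" :  "smith" } }
                        ]
                    }
                }
            }
        }
    }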

Mapping
~~~~~~~

The mapping for ``nested`` fields is the same as ``object`` fields,
except that it uses type ``nested``:

.. code:: js

    {
        "type1" : {
            "properties" : {
                "users" : {
                    "type" : "nested",
                    "properties": {
                        "first" : {"type": "string" },
                        "last"  : {"type": "string" }
                    }
                }
            }
        }
    }

    **Note**

    Changing an ``object`` type to ``nested`` type requires reindexing.

You may want to index inner objects both as ``nested`` fields **and** as
flattened ``object`` fields, eg for highlighting. This can be achieved
by setting ``include_in_parent`` to ``true``:

.. code:: js

    {
        "type1" : {
            "properties" : {
                "users" : {
                    "type" : "nested",
                    "include_in_parent": true,
                    "properties": {
                        "first" : {"type": "string" },
                        "last"  : {"type": "string" }
                    }
                }
            }
        }
    }

The result of indexing our example document would be something like
this:

.. code:: js

    { 
        "user.first" : "alice",
        "user.last" :  "white"
    }
    { 
        "user.first" : "john",
        "user.last" :  "smith"
    }
    { 
        "group" :        "fans",
        "user.first" : [ "alice", "john" ],
        "user.last" :  [ "smith", "white" ]
    }

The first two are the hidden nested documents; the last is the visible
“parent” document, which also contains the flattened ``user.first`` and
``user.last`` values.

Nested fields may contain other nested fields. The ``include_in_parent``
parameter refers to the direct parent of the field, while the
``include_in_root`` parameter refers only to the topmost “root” object
or document.

Nested docs will automatically use the root doc ``_all`` field only.

Internally, nested objects are indexed as additional documents, but,
since they can be guaranteed to be indexed within the same "block", it
allows for extremely fast joining with parent docs.

Those internal nested documents are automatically masked away when doing
operations against the index (like searching with a match\_all query),
and they bubble out when using the nested query.

Because nested docs are always masked to the parent doc, the nested docs
can never be accessed outside the scope of the ``nested`` query. For
example stored fields can be enabled on fields inside nested objects,
but there is no way of retrieving them, since stored fields are fetched
outside of the ``nested`` query scope.

The ``_source`` field is always associated with the parent document,
and because of that, field values for nested objects can still be
fetched via the source.

IP Type
-------

The ``ip`` mapping type allows *IPv4* addresses to be stored in a
numeric form, making it easy to sort and range query them (using IP
values).

The following table lists all the attributes that can be used with an ip
type:

+--------------------------------------+--------------------------------------+
| Attribute                            | Description                          |
+======================================+======================================+
| ``index_name``                       | The name of the field that will be   |
|                                      | stored in the index. Defaults to the |
|                                      | property/field name.                 |
+--------------------------------------+--------------------------------------+
| ``store``                            | Set to ``true`` to store actual      |
|                                      | field in the index, ``false`` to not |
|                                      | store it. Defaults to ``false``      |
|                                      | (note, the JSON document itself is   |
|                                      | stored, and it can be retrieved from |
|                                      | it).                                 |
+--------------------------------------+--------------------------------------+
| ``index``                            | Set to ``no`` if the value should    |
|                                      | not be indexed. In this case,        |
|                                      | ``store`` should be set to ``true``, |
|                                      | since if it’s not indexed and not    |
|                                      | stored, there is nothing to do with  |
|                                      | it.                                  |
+--------------------------------------+--------------------------------------+
| ``precision_step``                   | The precision step (influences the   |
|                                      | number of terms generated for each   |
|                                      | number value). Defaults to ``16``.   |
+--------------------------------------+--------------------------------------+
| ``boost``                            | The boost value. Defaults to         |
|                                      | ``1.0``.                             |
+--------------------------------------+--------------------------------------+
| ``null_value``                       | When there is a (JSON) null value    |
|                                      | for the field, use the               |
|                                      | ``null_value`` as the field value.   |
|                                      | Defaults to not adding the field at  |
|                                      | all.                                 |
+--------------------------------------+--------------------------------------+
| ``include_in_all``                   | Should the field be included in the  |
|                                      | ``_all`` field (if enabled).         |
|                                      | Defaults to ``true`` or to the       |
|                                      | parent ``object`` type setting.      |
+--------------------------------------+--------------------------------------+

Geo Point Type
--------------

The ``geo_point`` mapper type supports geo-based points. The declaration
looks as follows:

.. code:: js

    {
        "pin" : {
            "properties" : {
                "location" : {
                    "type" : "geo_point"
                }
            }
        }
    }

**Indexed Fields**

The ``geo_point`` mapping will index a single field with the format of
``lat,lon``. The ``lat_lon`` option can be set to also index the
``.lat`` and ``.lon`` as numeric fields, and ``geohash`` can be set to
``true`` to also index the ``.geohash`` value.

A good practice is to enable indexing ``lat_lon`` as well, since the
geo distance and bounding box filters can be executed either using
in-memory checks or using the indexed lat/lon values, and which one
performs better really depends on the data set. Note, though, that
indexed lat/lon values only make sense when there is a single geo point
value for the field, and not multiple values.
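
A sketch of the ``pin`` mapping from above with ``lat_lon`` (and,
optionally, ``geohash``) enabled:

.. code:: js

    {
        "pin" : {
            "properties" : {
                "location" : {
                    "type" : "geo_point",
                    "lat_lon" : true,
                    "geohash" : true
                }
            }
        }
    }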

**Geohashes**

Geohashes are a form of lat/lon encoding which divides the earth up into
a grid. Each cell in this grid is represented by a geohash string. Each
cell in turn can be further subdivided into smaller cells which are
represented by a longer string. So the longer the geohash, the smaller
(and thus more accurate) the cell is.

Because geohashes are just strings, they can be stored in an inverted
index like any other string, which makes querying them very efficient.

If you enable the ``geohash`` option, a ``geohash`` “sub-field” will be
indexed as, eg ``pin.geohash``. The length of the geohash is controlled
by the ``geohash_precision`` parameter, which can either be set to an
absolute length (eg ``12``, the default) or to a distance (eg ``1km``).

More usefully, set the ``geohash_prefix`` option to ``true`` to index
not only the geohash value but all the enclosing cells as well. For
instance, a geohash of ``u30`` will be indexed as ``[u,u3,u30]``. This
option makes it possible to find geopoints within a particular cell
very efficiently.
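
For example (a minimal sketch), enabling prefix indexing together with a
distance-based precision:

.. code:: js

    {
        "pin" : {
            "properties" : {
                "location" : {
                    "type" : "geo_point",
                    "geohash_prefix" : true,
                    "geohash_precision" : "1km"
                }
            }
        }
    }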

**Input Structure**

The above mapping defines a ``geo_point``, which accepts different
formats. The following formats are supported:

**Lat Lon as Properties**

.. code:: js

    {
        "pin" : {
            "location" : {
                "lat" : 41.12,
                "lon" : -71.34
            }
        }
    }

**Lat Lon as String**

Format in ``lat,lon``.

.. code:: js

    {
        "pin" : {
            "location" : "41.12,-71.34"
        }
    }

**Geohash**

.. code:: js

    {
        "pin" : {
            "location" : "drm3btev3e86"
        }
    }

**Lat Lon as Array**

Format in ``[lon, lat]``. Note the lon/lat order here, which conforms to
`GeoJSON <http://geojson.org/>`__.

.. code:: js

    {
        "pin" : {
            "location" : [-71.34, 41.12]
        }
    }

**Mapping Options**

+--------------------------------------+--------------------------------------+
| Option                               | Description                          |
+======================================+======================================+
| ``lat_lon``                          | Set to ``true`` to also index the    |
|                                      | ``.lat`` and ``.lon`` as fields.     |
|                                      | Defaults to ``false``.               |
+--------------------------------------+--------------------------------------+
| ``geohash``                          | Set to ``true`` to also index the    |
|                                      | ``.geohash`` as a field. Defaults to |
|                                      | ``false``.                           |
+--------------------------------------+--------------------------------------+
| ``geohash_precision``                | Sets the geohash precision. It can   |
|                                      | be set to an absolute geohash length |
|                                      | or a distance value (eg 1km, 1m,     |
|                                      | 1ml) defining the size of the        |
|                                      | smallest cell. Defaults to an        |
|                                      | absolute length of 12.               |
+--------------------------------------+--------------------------------------+
| ``geohash_prefix``                   | If this option is set to ``true``,   |
|                                      | not only the geohash but also all    |
|                                      | its parent cells (true prefixes)     |
|                                      | will be indexed as well. The number  |
|                                      | of terms that will be indexed        |
|                                      | depends on the                       |
|                                      | ``geohash_precision``. Defaults to   |
|                                      | ``false``. **Note**: This option     |
|                                      | implicitly enables ``geohash``.      |
+--------------------------------------+--------------------------------------+
| ``validate``                         | Set to ``true`` to reject geo points |
|                                      | with invalid latitude or longitude   |
|                                      | (default is ``false``). **Note**:    |
|                                      | Validation only works when           |
|                                      | normalization has been disabled.     |
+--------------------------------------+--------------------------------------+
| ``validate_lat``                     | Set to ``true`` to reject geo points |
|                                      | with an invalid latitude.            |
+--------------------------------------+--------------------------------------+
| ``validate_lon``                     | Set to ``true`` to reject geo points |
|                                      | with an invalid longitude.           |
+--------------------------------------+--------------------------------------+
| ``normalize``                        | Set to ``true`` to normalize         |
|                                      | latitude and longitude (default is   |
|                                      | ``true``).                           |
+--------------------------------------+--------------------------------------+
| ``normalize_lat``                    | Set to ``true`` to normalize         |
|                                      | latitude.                            |
+--------------------------------------+--------------------------------------+
| ``normalize_lon``                    | Set to ``true`` to normalize         |
|                                      | longitude.                           |
+--------------------------------------+--------------------------------------+
| ``precision_step``                   | The precision step (influences the   |
|                                      | number of terms generated for each   |
|                                      | number value) for ``.lat`` and       |
|                                      | ``.lon`` fields if ``lat_lon`` is    |
|                                      | set to ``true``. Defaults to ``16``. |
+--------------------------------------+--------------------------------------+

**Field data**

By default, geo points use the ``array`` format which loads geo points
into two parallel double arrays, making sure there is no precision loss.
However, this can require a non-negligible amount of memory (16 bytes
per document) which is why Elasticsearch also provides a field data
implementation with lossy compression called ``compressed``:

.. code:: js

    {
        "pin" : {
            "properties" : {
                "location" : {
                    "type" : "geo_point",
                    "fielddata" : {
                        "format" : "compressed",
                        "precision" : "1cm"
                    }
                }
            }
        }
    }

This field data format comes with a ``precision`` option which allows
you to configure how much precision can be traded for memory. The
default value is ``1cm``. The following table presents the memory
savings for various precisions:

+--------------------------+--------------------------+--------------------------+
| Precision                | Bytes per point          | Size reduction           |
+==========================+==========================+==========================+
| 1km                      | 4                        | 75%                      |
+--------------------------+--------------------------+--------------------------+
| 3m                       | 6                        | 62.5%                    |
+--------------------------+--------------------------+--------------------------+
| 1cm                      | 8                        | 50%                      |
+--------------------------+--------------------------+--------------------------+
| 1mm                      | 10                       | 37.5%                    |
+--------------------------+--------------------------+--------------------------+

Precision can be changed on a live index by using the update mapping
API.

**Usage in Scripts**

When using ``doc[geo_field_name]`` (in the above mapping,
``doc['location']``), the ``doc[...].value`` returns a ``GeoPoint``,
which then allows access to ``lat`` and ``lon`` (for example,
``doc[...].value.lat``). For performance, it is better to access the
``lat`` and ``lon`` directly using ``doc[...].lat`` and
``doc[...].lon``.
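
For example, a search request could return the coordinates via
``script_fields`` (a sketch; it assumes the ``location`` field from the
mapping above and the default script language of this Elasticsearch
version):

.. code:: js

    {
        "query" : { "match_all" : {} },
        "script_fields" : {
            "lat" : { "script" : "doc['location'].lat" },
            "lon" : { "script" : "doc['location'].lon" }
        }
    }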

Geo Shape Type
--------------

The ``geo_shape`` mapping type facilitates the indexing of and searching
with arbitrary geo shapes such as rectangles and polygons. It should be
used when either the data being indexed or the queries being executed
contain shapes other than just points.

You can query documents using this type using `geo\_shape
Filter <#query-dsl-geo-shape-filter>`__ or `geo\_shape
Query <#query-dsl-geo-shape-query>`__.
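
For example (a sketch; the field name ``location`` and the envelope
coordinates are illustrative), a filtered query using a ``geo_shape``
filter with an inline shape:

.. code:: js

    {
        "query" : {
            "filtered" : {
                "query" : { "match_all" : {} },
                "filter" : {
                    "geo_shape" : {
                        "location" : {
                            "shape" : {
                                "type" : "envelope",
                                "coordinates" : [ [13.0, 53.0], [14.0, 52.0] ]
                            }
                        }
                    }
                }
            }
        }
    }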

**Mapping Options**

The geo\_shape mapping maps geo\_json geometry objects to the geo\_shape
type. To enable it, users must explicitly map fields to the geo\_shape
type.

+--------------------------------------+--------------------------------------+
| Option                               | Description                          |
+======================================+======================================+
| ``tree``                             | Name of the PrefixTree               |
|                                      | implementation to be used:           |
|                                      | ``geohash`` for GeohashPrefixTree    |
|                                      | and ``quadtree`` for QuadPrefixTree. |
|                                      | Defaults to ``geohash``.             |
+--------------------------------------+--------------------------------------+
| ``precision``                        | This parameter may be used instead   |
|                                      | of ``tree_levels`` to set an         |
|                                      | appropriate value for the            |
|                                      | ``tree_levels`` parameter. The value |
|                                      | specifies the desired precision and  |
|                                      | Elasticsearch will calculate the     |
|                                      | best tree\_levels value to honor     |
|                                      | this precision. The value should be  |
|                                      | a number followed by an optional     |
|                                      | distance unit. Valid distance units  |
|                                      | include: ``in``, ``inch``, ``yd``,   |
|                                      | ``yard``, ``mi``, ``miles``, ``km``, |
|                                      | ``kilometers``, ``m``,\ ``meters``   |
|                                      | (default), ``cm``,\ ``centimeters``, |
|                                      | ``mm``, ``millimeters``.             |
+--------------------------------------+--------------------------------------+
| ``tree_levels``                      | Maximum number of layers to be used  |
|                                      | by the PrefixTree. This can be used  |
|                                      | to control the precision of shape    |
|                                      | representations and therefore how    |
|                                      | many terms are indexed. Defaults to  |
|                                      | the default value of the chosen      |
|                                      | PrefixTree implementation. Since     |
|                                      | this parameter requires a certain    |
|                                      | level of understanding of the        |
|                                      | underlying implementation, users may |
|                                      | use the ``precision`` parameter      |
|                                      | instead. However, Elasticsearch only |
|                                      | uses the tree\_levels parameter      |
|                                      | internally and this is what is       |
|                                      | returned via the mapping API even if |
|                                      | you use the precision parameter.     |
+--------------------------------------+--------------------------------------+
| ``distance_error_pct``               | Used as a hint to the PrefixTree     |
|                                      | about how precise it should be.      |
|                                      | Defaults to 0.025 (2.5%) with 0.5 as |
|                                      | the maximum supported value.         |
+--------------------------------------+--------------------------------------+

**Prefix trees**

To efficiently represent shapes in the index, shapes are converted into
a series of hashes representing grid squares using implementations of a
PrefixTree. The tree notion comes from the fact that the PrefixTree uses
multiple grid layers, each with an increasing level of precision to
represent the Earth.

Multiple PrefixTree implementations are provided:

-  GeohashPrefixTree - Uses
   `geohashes <http://en.wikipedia.org/wiki/Geohash>`__ for grid
   squares. Geohashes are base32 encoded strings of the bits of the
   latitude and longitude interleaved. So the longer the hash, the more
   precise it is. Each character added to the geohash represents another
   tree level and adds 5 bits of precision to the geohash. A geohash
   represents a rectangular area and has 32 sub rectangles. The maximum
   amount of levels in Elasticsearch is 24.

-  QuadPrefixTree - Uses a
   `quadtree <http://en.wikipedia.org/wiki/Quadtree>`__ for grid
   squares. Similar to geohash, quad trees interleave the bits of the
   latitude and longitude; the resulting hash is a bit set. A tree level
   in a quad tree represents 2 bits in this bit set, one for each
   coordinate. The maximum amount of levels for the quad trees in
   Elasticsearch is 50.

**Accuracy**

Geo\_shape does not provide 100% accuracy and depending on how it is
configured it may return some false positives or false negatives for
certain queries. To mitigate this, it is important to select an
appropriate value for the tree\_levels parameter and to adjust
expectations accordingly. For example, a point may be near the border of
a particular grid cell and may thus not match a query that only matches
the cell right next to it — even though the shape is very close to the
point.

**Example**

.. code:: js

    {
        "properties": {
            "location": {
                "type": "geo_shape",
                "tree": "quadtree",
                "precision": "1m"
            }
        }
    }

This mapping maps the location field to the geo\_shape type using the
``quadtree`` implementation and a precision of 1m. Elasticsearch
translates this into a tree\_levels setting of 26.

**Performance considerations**

Elasticsearch uses the paths in the prefix tree as terms in the index
and in queries. The higher the level (and thus the precision), the
more terms are generated. Of course, calculating the terms, keeping them
in memory, and storing them on disk all have a price. Especially with
higher tree levels, indices can become extremely large even with a
modest amount of data. Additionally, the size of the features also
matters. Big, complex polygons can take up a lot of space at higher tree
levels. Which setting is right depends on the use case. Generally one
trades off accuracy against index size and query performance.

The defaults in Elasticsearch for both implementations are a compromise
between index size and a reasonable level of precision of 50m at the
equator. This allows for indexing tens of millions of shapes without
overly bloating the resulting index relative to the input size.

**Input Structure**

The `GeoJSON <http://www.geojson.org>`__ format is used to represent
`shapes <http://geojson.org/geojson-spec.html#geometry-objects>`__ as
input as follows:

+--------------------------+--------------------------+--------------------------+
| GeoJSON Type             | Elasticsearch Type       | Description              |
+==========================+==========================+==========================+
| ``Point``                | ``point``                | A single geographic      |
|                          |                          | coordinate.              |
+--------------------------+--------------------------+--------------------------+
| ``LineString``           | ``linestring``           | An arbitrary line given  |
|                          |                          | two or more points.      |
+--------------------------+--------------------------+--------------------------+
| ``Polygon``              | ``polygon``              | A *closed* polygon whose |
|                          |                          | first and last point     |
|                          |                          | must match, thus         |
|                          |                          | requiring ``n + 1``      |
|                          |                          | vertices to create an    |
|                          |                          | ``n``-sided polygon and  |
|                          |                          | a minimum of ``4``       |
|                          |                          | vertices.                |
+--------------------------+--------------------------+--------------------------+
| ``MultiPoint``           | ``multipoint``           | An array of unconnected, |
|                          |                          | but likely related       |
|                          |                          | points.                  |
+--------------------------+--------------------------+--------------------------+
| ``MultiLineString``      | ``multilinestring``      | An array of separate     |
|                          |                          | linestrings.             |
+--------------------------+--------------------------+--------------------------+
| ``MultiPolygon``         | ``multipolygon``         | An array of separate     |
|                          |                          | polygons.                |
+--------------------------+--------------------------+--------------------------+
| ``GeometryCollection``   | ``geometrycollection``   | A GeoJSON shape similar  |
|                          |                          | to the ``multi*`` shapes |
|                          |                          | except that multiple     |
|                          |                          | types can coexist (e.g., |
|                          |                          | a Point and a            |
|                          |                          | LineString).             |
+--------------------------+--------------------------+--------------------------+
| ``N/A``                  | ``envelope``             | A bounding rectangle, or |
|                          |                          | envelope, specified by   |
|                          |                          | specifying only the top  |
|                          |                          | left and bottom right    |
|                          |                          | points.                  |
+--------------------------+--------------------------+--------------------------+
| ``N/A``                  | ``circle``               | A circle specified by a  |
|                          |                          | center point and radius  |
|                          |                          | with units, which        |
|                          |                          | default to ``METERS``.   |
+--------------------------+--------------------------+--------------------------+

    **Note**

    For all types, both the inner ``type`` and ``coordinates`` fields
    are required.

    Note: In GeoJSON, and therefore Elasticsearch, the correct
    **coordinate order is longitude, latitude (X, Y)** within coordinate
    arrays. This differs from many Geospatial APIs (e.g., Google Maps)
    that generally use the colloquial latitude, longitude (Y, X).

**`Point <http://geojson.org/geojson-spec.html#id2>`__**

A point is a single geographic coordinate, such as the location of a
building or the current position given by a smartphone’s Geolocation
API.

.. code:: js

    {
        "location" : {
            "type" : "point",
            "coordinates" : [-77.03653, 38.897676]
        }
    }

**`LineString <http://geojson.org/geojson-spec.html#id3>`__**

A ``linestring`` is defined by an array of two or more positions. By
specifying only two points, the ``linestring`` will represent a straight
line. Specifying more than two points creates an arbitrary path.

.. code:: js

    {
        "location" : {
            "type" : "linestring",
            "coordinates" : [[-77.03653, 38.897676], [-77.009051, 38.889939]]
        }
    }

The above ``linestring`` would draw a straight line from the White House
to the US Capitol Building.

**`Polygon <http://www.geojson.org/geojson-spec.html#id4>`__**

A polygon is defined by a list of lists of points. The first and last
points in each (outer) list must be the same (the polygon must be
closed).

.. code:: js

    {
        "location" : {
            "type" : "polygon",
            "coordinates" : [
                [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]
            ]
        }
    }

The first array represents the outer boundary of the polygon, the other
arrays represent the interior shapes ("holes"):

.. code:: js

    {
        "location" : {
            "type" : "polygon",
            "coordinates" : [
                [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ],
                [ [100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2] ]
            ]
        }
    }

**`MultiPoint <http://www.geojson.org/geojson-spec.html#id5>`__**

A list of geojson points.

.. code:: js

    {
        "location" : {
            "type" : "multipoint",
            "coordinates" : [
                [102.0, 2.0], [103.0, 2.0]
            ]
        }
    }

**`MultiLineString <http://www.geojson.org/geojson-spec.html#id6>`__**

A list of geojson linestrings.

.. code:: js

    {
        "location" : {
            "type" : "multilinestring",
            "coordinates" : [
                [ [102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0] ],
                [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0] ],
                [ [100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8] ]
            ]
        }
    }

**`MultiPolygon <http://www.geojson.org/geojson-spec.html#id7>`__**

A list of GeoJSON polygons.

.. code:: js

    {
        "location" : {
            "type" : "multipolygon",
            "coordinates" : [
                [ [[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0]] ],

                [ [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]],
                  [[100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2]] ]
            ]
        }
    }

**`Geometry
Collection <http://geojson.org/geojson-spec.html#geometrycollection>`__**

A collection of GeoJSON geometry objects.

.. code:: js

    {
        "location" : {
            "type": "geometrycollection",
            "geometries": [
                {
                    "type": "point",
                    "coordinates": [100.0, 0.0]
                },
                {
                    "type": "linestring",
                    "coordinates": [ [101.0, 0.0], [102.0, 1.0] ]
                }
            ]
        }
    }

**Envelope**

Elasticsearch supports an ``envelope`` type, which consists of
coordinates for upper left and lower right points of the shape to
represent a bounding rectangle:

.. code:: js

    {
        "location" : {
            "type" : "envelope",
            "coordinates" : [ [-45.0, 45.0], [45.0, -45.0] ]
        }
    }

**Circle**

Elasticsearch supports a ``circle`` type, which consists of a center
point with a radius:

.. code:: js

    {
        "location" : {
            "type" : "circle",
            "coordinates" : [-45.0, 45.0],
            "radius" : "100m"
        }
    }

Note: The inner ``radius`` field is required. If no unit is specified,
the radius is interpreted in ``METERS``.

**Sorting and Retrieving Index Shapes**

Due to the complex input structure and index representation of shapes,
it is not currently possible to sort shapes or retrieve their fields
directly. The geo\_shape value is only retrievable through the
``_source`` field.
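
For example, the shape can still be returned as part of the document
source. The following is a minimal sketch (the ``example`` index and the
``location`` field name are assumptions, not fixed names):

.. code:: js

    curl -XGET 'localhost:9200/example/_search' -d '
    {
        "query" : { "match_all" : {} },
        "_source" : [ "location" ]
    }'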

Attachment Type
---------------

The ``attachment`` type allows you to index different "attachment" type
fields (encoded as ``base64``), for example, Microsoft Office formats,
open document formats, ePub, HTML, and so on (the full list can be found
`here <http://lucene.apache.org/tika/0.10/formats.html>`__).

The ``attachment`` type is provided as a `plugin
extension <https://github.com/elasticsearch/elasticsearch-mapper-attachments>`__.
The plugin is a simple zip file that can be downloaded and placed under
the ``$ES_HOME/plugins`` directory. It will be automatically detected and
the ``attachment`` type will be added.

Note, the ``attachment`` type is experimental.

Using the attachment type is simple: in your mapping JSON, set the
relevant JSON element's type to ``attachment``, for example:

.. code:: js

    {
        "person" : {
            "properties" : {
                "my_attachment" : { "type" : "attachment" }
            }
        }
    }

In this case, the JSON to index can be:

.. code:: js

    {
        "my_attachment" : "... base64 encoded attachment ..."
    }

It is also possible to use a more elaborate JSON structure if the
content type or resource name needs to be set explicitly:

.. code:: js

    {
        "my_attachment" : {
            "_content_type" : "application/pdf",
            "_name" : "resource/name/of/my.pdf",
            "content" : "... base64 encoded attachment ..."
        }
    }

The ``attachment`` type not only indexes the content of the document, it
also automatically adds metadata about the attachment (when available).
The supported metadata fields are ``date``, ``title``, ``author``, and
``keywords``. They can be queried using "dot notation", for example:
``my_attachment.author``.
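
As a quick sketch, a search against the extracted author metadata could
look like the following (the ``test`` index, ``person`` type, and the
author value are assumptions):

.. code:: js

    curl -XPOST 'localhost:9200/test/person/_search' -d '
    {
        "query" : {
            "match" : { "my_attachment.author" : "john" }
        }
    }'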

Both the metadata and the actual content are simple core type mappers
(string, date, …), so they can be controlled in the mappings. For
example:

.. code:: js

    {
        "person" : {
            "properties" : {
                "file" : {
                    "type" : "attachment",
                    "fields" : {
                        "file" : {"index" : "no"},
                        "date" : {"store" : true},
                        "author" : {"analyzer" : "myAnalyzer"}
                    }
                }
            }
        }
    }

In the above example, the actual indexed content is mapped under the
``fields`` name ``file``, and we chose not to index it, so it will only
be available in the ``_all`` field. The other fields map to their
respective metadata names, but there is no need to specify the ``type``
(like ``string`` or ``date``) since it is already known.

The plugin uses `Apache Tika <http://lucene.apache.org/tika/>`__ to
parse attachments, so many formats are supported, listed
`here <http://lucene.apache.org/tika/0.10/formats.html>`__.

Date Format
===========

In JSON documents, dates are represented as strings. Elasticsearch uses
a set of pre-configured formats to recognize and convert them, but you
can change the defaults by specifying the ``format`` option when
defining a ``date`` type, or by specifying ``dynamic_date_formats`` in
the ``root object`` mapping (which will be used unless explicitly
overridden by a ``date`` type). Built-in formats are supported, as well
as completely custom ones.

The parsing of dates uses `Joda <http://joda-time.sourceforge.net/>`__.
The default date parsing used if no format is specified is
`ISODateTimeFormat.dateOptionalTimeParser <http://joda-time.sourceforge.net/api-release/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser()>`__.

An extension to the format allows several formats to be defined using
the ``||`` separator, which makes it possible to accept less strict
formats. For example, the ``yyyy/MM/dd HH:mm:ss||yyyy/MM/dd`` format
will parse both ``yyyy/MM/dd HH:mm:ss`` and ``yyyy/MM/dd``. The first
format also acts as the one that converts back from milliseconds to a
string representation.
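
As a minimal sketch, such a multi-format could be attached to a ``date``
field like this (the ``tweet`` type and ``post_date`` field name are
assumptions):

.. code:: js

    {
        "tweet" : {
            "properties" : {
                "post_date" : {
                    "type" : "date",
                    "format" : "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd"
                }
            }
        }
    }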

**Date Math**

The ``date`` type supports using date math expressions when used in a
query/filter (this mainly makes sense in a ``range`` query/filter).

The expression starts with an "anchor" date, which can be either ``now``
or a date string (in the applicable format) ending with ``||``. It can
then be followed by a math expression, supporting ``+``, ``-`` and ``/``
(rounding). The units supported are ``y`` (year), ``M`` (month), ``w``
(week), ``d`` (day), ``h`` (hour), ``m`` (minute), and ``s`` (second).

Here are some samples: ``now+1h``, ``now+1h+1m``, ``now+1h/d``,
``2012-01-01||+1M/d``.
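
For example, a sketch of a ``range`` filter that uses date math to match
documents from the start of yesterday up to the start of today (the
``post_date`` field name is an assumption):

.. code:: js

    {
        "query" : {
            "filtered" : {
                "filter" : {
                    "range" : {
                        "post_date" : {
                            "gte" : "now-1d/d",
                            "lt"  : "now/d"
                        }
                    }
                }
            }
        }
    }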

Note that in ``range`` type searches where the upper value is inclusive,
the rounding is applied to the ceiling instead of the floor.

To change this behavior, set ``"mapping.date.round_ceil": false``.

**Built In Formats**

The following table lists all the default ISO formats supported:

+--------------------------------------+--------------------------------------+
| Name                                 | Description                          |
+======================================+======================================+
| ``basic_date``                       | A basic formatter for a full date as |
|                                      | four digit year, two digit month of  |
|                                      | year, and two digit day of month     |
|                                      | (yyyyMMdd).                          |
+--------------------------------------+--------------------------------------+
| ``basic_date_time``                  | A basic formatter that combines a    |
|                                      | basic date and time, separated by a  |
|                                      | *T* (yyyyMMdd'T'HHmmss.SSSZ).        |
+--------------------------------------+--------------------------------------+
| ``basic_date_time_no_millis``        | A basic formatter that combines a    |
|                                      | basic date and time without millis,  |
|                                      | separated by a *T*                   |
|                                      | (yyyyMMdd'T'HHmmssZ).                |
+--------------------------------------+--------------------------------------+
| ``basic_ordinal_date``               | A formatter for a full ordinal date, |
|                                      | using a four digit year and three    |
|                                      | digit dayOfYear (yyyyDDD).           |
+--------------------------------------+--------------------------------------+
| ``basic_ordinal_date_time``          | A formatter for a full ordinal date  |
|                                      | and time, using a four digit year    |
|                                      | and three digit dayOfYear            |
|                                      | (yyyyDDD'T'HHmmss.SSSZ).             |
+--------------------------------------+--------------------------------------+
| ``basic_ordinal_date_time_no_millis``| A formatter for a full ordinal date  |
|                                      | and time without millis, using a     |
|                                      | four digit year and three digit      |
|                                      | dayOfYear (yyyyDDD'T'HHmmssZ).       |
+--------------------------------------+--------------------------------------+
| ``basic_time``                       | A basic formatter for a two digit    |
|                                      | hour of day, two digit minute of     |
|                                      | hour, two digit second of minute,    |
|                                      | three digit millis, and time zone    |
|                                      | offset (HHmmss.SSSZ).                |
+--------------------------------------+--------------------------------------+
| ``basic_time_no_millis``             | A basic formatter for a two digit    |
|                                      | hour of day, two digit minute of     |
|                                      | hour, two digit second of minute,    |
|                                      | and time zone offset (HHmmssZ).      |
+--------------------------------------+--------------------------------------+
| ``basic_t_time``                     | A basic formatter for a two digit    |
|                                      | hour of day, two digit minute of     |
|                                      | hour, two digit second of minute,    |
|                                      | three digit millis, and time zone    |
|                                      | off set prefixed by *T*              |
|                                      | ('T'HHmmss.SSSZ).                    |
+--------------------------------------+--------------------------------------+
| ``basic_t_time_no_millis``           | A basic formatter for a two digit    |
|                                      | hour of day, two digit minute of     |
|                                      | hour, two digit second of minute,    |
|                                      | and time zone offset prefixed by *T* |
|                                      | ('T'HHmmssZ).                        |
+--------------------------------------+--------------------------------------+
| ``basic_week_date``                  | A basic formatter for a full date as |
|                                      | four digit weekyear, two digit week  |
|                                      | of weekyear, and one digit day of    |
|                                      | week (xxxx'W'wwe).                   |
+--------------------------------------+--------------------------------------+
| ``basic_week_date_time``             | A basic formatter that combines a    |
|                                      | basic weekyear date and time,        |
|                                      | separated by a *T*                   |
|                                      | (xxxx'W'wwe'T'HHmmss.SSSZ).          |
+--------------------------------------+--------------------------------------+
| ``basic_week_date_time_no_millis``   | A basic formatter that combines a    |
|                                      | basic weekyear date and time without |
|                                      | millis, separated by a *T*           |
|                                      | (xxxx'W'wwe'T'HHmmssZ).              |
+--------------------------------------+--------------------------------------+
| ``date``                             | A formatter for a full date as four  |
|                                      | digit year, two digit month of year, |
|                                      | and two digit day of month           |
|                                      | (yyyy-MM-dd).                        |
+--------------------------------------+--------------------------------------+
| ``date_hour``                        | A formatter that combines a full     |
|                                      | date and two digit hour of day.      |
+--------------------------------------+--------------------------------------+
| ``date_hour_minute``                 | A formatter that combines a full     |
|                                      | date, two digit hour of day, and two |
|                                      | digit minute of hour.                |
+--------------------------------------+--------------------------------------+
| ``date_hour_minute_second``          | A formatter that combines a full     |
|                                      | date, two digit hour of day, two     |
|                                      | digit minute of hour, and two digit  |
|                                      | second of minute.                    |
+--------------------------------------+--------------------------------------+
| ``date_hour_minute_second_fraction`` | A formatter that combines a full     |
|                                      | date, two digit hour of day, two     |
|                                      | digit minute of hour, two digit      |
|                                      | second of minute, and three digit    |
|                                      | fraction of second                   |
|                                      | (yyyy-MM-dd'T'HH:mm:ss.SSS).         |
+--------------------------------------+--------------------------------------+
| ``date_hour_minute_second_millis``   | A formatter that combines a full     |
|                                      | date, two digit hour of day, two     |
|                                      | digit minute of hour, two digit      |
|                                      | second of minute, and three digit    |
|                                      | fraction of second                   |
|                                      | (yyyy-MM-dd'T'HH:mm:ss.SSS).         |
+--------------------------------------+--------------------------------------+
| ``date_optional_time``               | a generic ISO datetime parser where  |
|                                      | the date is mandatory and the time   |
|                                      | is optional.                         |
+--------------------------------------+--------------------------------------+
| ``date_time``                        | A formatter that combines a full     |
|                                      | date and time, separated by a *T*    |
|                                      | (yyyy-MM-dd'T'HH:mm:ss.SSSZZ).       |
+--------------------------------------+--------------------------------------+
| ``date_time_no_millis``              | A formatter that combines a full     |
|                                      | date and time without millis,        |
|                                      | separated by a *T*                   |
|                                      | (yyyy-MM-dd'T'HH:mm:ssZZ).           |
+--------------------------------------+--------------------------------------+
| ``hour``                             | A formatter for a two digit hour of  |
|                                      | day.                                 |
+--------------------------------------+--------------------------------------+
| ``hour_minute``                      | A formatter for a two digit hour of  |
|                                      | day and two digit minute of hour.    |
+--------------------------------------+--------------------------------------+
| ``hour_minute_second``               | A formatter for a two digit hour of  |
|                                      | day, two digit minute of hour, and   |
|                                      | two digit second of minute.          |
+--------------------------------------+--------------------------------------+
| ``hour_minute_second_fraction``      | A formatter for a two digit hour of  |
|                                      | day, two digit minute of hour, two   |
|                                      | digit second of minute, and three    |
|                                      | digit fraction of second             |
|                                      | (HH:mm:ss.SSS).                      |
+--------------------------------------+--------------------------------------+
| ``hour_minute_second_millis``        | A formatter for a two digit hour of  |
|                                      | day, two digit minute of hour, two   |
|                                      | digit second of minute, and three    |
|                                      | digit fraction of second             |
|                                      | (HH:mm:ss.SSS).                      |
+--------------------------------------+--------------------------------------+
| ``ordinal_date``                     | A formatter for a full ordinal date, |
|                                      | using a four digit year and three    |
|                                      | digit dayOfYear (yyyy-DDD).          |
+--------------------------------------+--------------------------------------+
| ``ordinal_date_time``                | A formatter for a full ordinal date  |
|                                      | and time, using a four digit year    |
|                                      | and three digit dayOfYear            |
|                                      | (yyyy-DDD'T'HH:mm:ss.SSSZZ).         |
+--------------------------------------+--------------------------------------+
| ``ordinal_date_time_no_millis``      | A formatter for a full ordinal date  |
|                                      | and time without millis, using a     |
|                                      | four digit year and three digit      |
|                                      | dayOfYear (yyyy-DDD'T'HH:mm:ssZZ).   |
+--------------------------------------+--------------------------------------+
| ``time``                             | A formatter for a two digit hour of  |
|                                      | day, two digit minute of hour, two   |
|                                      | digit second of minute, three digit  |
|                                      | fraction of second, and time zone    |
|                                      | offset (HH:mm:ss.SSSZZ).             |
+--------------------------------------+--------------------------------------+
| ``time_no_millis``                   | A formatter for a two digit hour of  |
|                                      | day, two digit minute of hour, two   |
|                                      | digit second of minute, and time     |
|                                      | zone offset (HH:mm:ssZZ).            |
+--------------------------------------+--------------------------------------+
| ``t_time``                           | A formatter for a two digit hour of  |
|                                      | day, two digit minute of hour, two   |
|                                      | digit second of minute, three digit  |
|                                      | fraction of second, and time zone    |
|                                      | offset prefixed by *T*               |
|                                      | ('T'HH:mm:ss.SSSZZ).                 |
+--------------------------------------+--------------------------------------+
| ``t_time_no_millis``                 | A formatter for a two digit hour of  |
|                                      | day, two digit minute of hour, two   |
|                                      | digit second of minute, and time     |
|                                      | zone offset prefixed by *T*          |
|                                      | ('T'HH:mm:ssZZ).                     |
+--------------------------------------+--------------------------------------+
| ``week_date``                        | A formatter for a full date as four  |
|                                      | digit weekyear, two digit week of    |
|                                      | weekyear, and one digit day of week  |
|                                      | (xxxx-'W'ww-e).                      |
+--------------------------------------+--------------------------------------+
| ``week_date_time``                   | A formatter that combines a full     |
|                                      | weekyear date and time, separated by |
|                                      | a *T*                                |
|                                      | (xxxx-'W'ww-e'T'HH:mm:ss.SSSZZ).     |
+--------------------------------------+--------------------------------------+
| ``weekDateTimeNoMillis``             | A formatter that combines a full     |
|                                      | weekyear date and time without       |
|                                      | millis, separated by a *T*           |
|                                      | (xxxx-'W'ww-e'T'HH:mm:ssZZ).         |
+--------------------------------------+--------------------------------------+
| ``week_year``                        | A formatter for a four digit         |
|                                      | weekyear.                            |
+--------------------------------------+--------------------------------------+
| ``weekyearWeek``                     | A formatter for a four digit         |
|                                      | weekyear and two digit week of       |
|                                      | weekyear.                            |
+--------------------------------------+--------------------------------------+
| ``weekyearWeekDay``                  | A formatter for a four digit         |
|                                      | weekyear, two digit week of          |
|                                      | weekyear, and one digit day of week. |
+--------------------------------------+--------------------------------------+
| ``year``                             | A formatter for a four digit year.   |
+--------------------------------------+--------------------------------------+
| ``year_month``                       | A formatter for a four digit year    |
|                                      | and two digit month of year.         |
+--------------------------------------+--------------------------------------+
| ``year_month_day``                   | A formatter for a four digit year,   |
|                                      | two digit month of year, and two     |
|                                      | digit day of month.                  |
+--------------------------------------+--------------------------------------+

**Custom Format**

A completely customizable date format is also supported, as explained
`here <http://joda-time.sourceforge.net/api-release/org/joda/time/format/DateTimeFormat.html>`__.

Dynamic Mapping
===============

Default mappings allow generic mapping definitions to be automatically
applied to types that do not have mappings predefined. This is mainly
done thanks to the fact that the `object
mapping <#mapping-object-type>`__ and namely the `root object
mapping <#mapping-root-object-type>`__ allow for schema-less dynamic
addition of unmapped fields.

The default mapping definition is a plain mapping definition that is
embedded within the distribution:

.. code:: js

    {
        "_default_" : {
        }
    }

Pretty short, isn’t it? Basically, everything is defaulted, especially
the dynamic nature of the root object mapping. The default mapping
definition can be overridden in several ways. The simplest is to define
a file called ``default-mapping.json`` and place it under the ``config``
directory (which can be configured to exist in a different location).
The location can also be set explicitly using the
``index.mapper.default_mapping_location`` setting.

The dynamic creation of mappings for unmapped types can be completely
disabled by setting ``index.mapper.dynamic`` to ``false``.
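
As a sketch, assuming the setting is applied when an index is created
(the ``test`` index name is an assumption), it could look like this; the
same setting can also be placed in ``elasticsearch.yml``:

.. code:: js

    curl -XPUT 'localhost:9200/test' -d '
    {
        "settings" : {
            "index.mapper.dynamic" : false
        }
    }'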

The dynamic creation of fields within a type can be completely disabled
by setting the ``dynamic`` property of the type to ``strict``.

Here is a `Put Mapping <#indices-put-mapping>`__ example that disables
dynamic field creation for a ``tweet``:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/twitter/_mapping/tweet' -d '
    {
        "tweet" : {
            "dynamic": "strict",
            "properties" : {
                "message" : {"type" : "string", "store" : true }
            }
        }
    }
    '

Here is how we can change the default
`date\_formats <#mapping-date-format>`__ used in the root and inner
object types:

.. code:: js

    {
        "_default_" : {
            "dynamic_date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy", "date_optional_time"]
        }
    }

**Unmapped fields in queries**

Queries and filters can refer to fields that don’t exist in a mapping.
Whether this is allowed is controlled by the
``index.query.parse.allow_unmapped_fields`` setting. This setting
defaults to ``true``. Setting it to ``false`` will disallow the usage of
unmapped fields in queries.
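
A minimal sketch of applying this setting at index creation time (the
``test`` index name is an assumption):

.. code:: js

    curl -XPUT 'localhost:9200/test' -d '
    {
        "settings" : {
            "index.query.parse.allow_unmapped_fields" : false
        }
    }'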

When registering a new `percolator query <#search-percolate>`__ or
creating a `filtered alias <#filtered>`__, the
``index.query.parse.allow_unmapped_fields`` setting is forcefully
overridden to disallow unmapped fields.

Config Mappings
===============

Creating new mappings can be done using the `Put
Mapping <#indices-put-mapping>`__ API. When a document is indexed with
no mapping associated with it in the specific index, the `dynamic /
default mapping <#mapping-dynamic-mapping>`__ feature will kick in and
automatically create a mapping definition for it.

Mappings can also be provided on the node level, meaning that each index
created will automatically be started with all the mappings defined
within a certain location.

Mappings can be defined within files called ``[mapping_name].json`` and
placed either under the ``config/mappings/_default`` location, or under
``config/mappings/[index_name]`` (for mappings that should be associated
only with a specific index).

Meta
====

Each mapping can have custom metadata associated with it. These are
simple storage elements that are persisted along with the mapping and
can be retrieved when fetching the mapping definition. The metadata is
defined under the ``_meta`` element, for example:

.. code:: js

    {
        "tweet" : {
            "_meta" : {
                "attr1" : "value1",
                "attr2" : {
                    "attr3" : "value3"
                }
            }
        }
    }

Meta can be handy, for example, for client libraries that perform
serialization and deserialization and need to store their meta model
(for example, the class the document maps to).
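
The ``_meta`` element is returned together with the rest of the mapping,
so it can be read back with the get mapping API, for example (the
``twitter`` index name is an assumption; the ``tweet`` type comes from
the example above):

.. code:: bash

    curl -XGET 'http://localhost:9200/twitter/_mapping/tweet?pretty'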

Transform
=========

The document can be transformed before it is indexed by registering a
script in the ``transform`` element of the mapping. The result of the
transform is indexed but the original source is stored in the
``_source`` field. Example:

.. code:: js

    {
        "example" : {
            "transform" : {
                "script" : "if (ctx._source['title']?.startsWith('t')) ctx._source['suggest'] = ctx._source['content']",
                "params" : {
                    "variable" : "not used but an example anyway"
                },
                "lang": "groovy"
            },
            "properties": {
               "title": { "type": "string" },
               "content": { "type": "string" },
               "suggest": { "type": "string" }
            }
        }
    }

It is also possible to specify multiple transforms:

.. code:: js

    {
        "example" : {
            "transform" : [
                {"script": "ctx._source['suggest'] = ctx._source['content']"}
                {"script": "ctx._source['foo'] = ctx._source['bar'];"}
            ]
        }
    }

Because the result isn’t stored in the source it can’t normally be
fetched by source filtering. It can be highlighted if it is marked as
stored.

Get Transformed
---------------

The get endpoint will retransform the source if the
``_source_transform`` parameter is set. Example:

.. code:: bash

    curl -XGET "http://localhost:9200/test/example/3?pretty&_source_transform"

The transform is performed before any source filtering. It is mostly
designed to make it easy to see what was actually passed to the index,
which is useful for debugging.

Immutable Transformation
------------------------

Once configured, the transform script cannot be modified. This is not
because it is technically impossible, but because madness lies down
that road.

Analysis
========

The index analysis module acts as a configurable registry of Analyzers
that can be used both to break down indexed (analyzed) fields when a
document is indexed and to process query strings. It maps to the Lucene
``Analyzer``.

Analyzers are composed of a single `Tokenizer <#analysis-tokenizers>`__
and zero or more `TokenFilters <#analysis-tokenfilters>`__. The
tokenizer may be preceded by one or more
`CharFilters <#analysis-charfilters>`__. The analysis module allows one
to register ``TokenFilters``, ``Tokenizers`` and ``Analyzers`` under
logical names that can then be referenced either in mapping definitions
or in certain APIs. The Analysis module automatically registers (**if
not explicitly defined**) built in analyzers, token filters, and
tokenizers.

Here is a sample configuration:

.. code:: js

    index :
        analysis :
            analyzer :
                standard :
                    type : standard
                    stopwords : [stop1, stop2]
                myAnalyzer1 :
                    type : standard
                    stopwords : [stop1, stop2, stop3]
                    max_token_length : 500
                # configure a custom analyzer which is
                # exactly like the default standard analyzer
                myAnalyzer2 :
                    tokenizer : standard
                    filter : [standard, lowercase, stop]
            tokenizer :
                myTokenizer1 :
                    type : standard
                    max_token_length : 900
                myTokenizer2 :
                    type : keyword
                    buffer_size : 512
            filter :
                myTokenFilter1 :
                    type : stop
                    stopwords : [stop1, stop2, stop3, stop4]
                myTokenFilter2 :
                    type : length
                    min : 0
                    max : 2000

**Backwards compatibility**

All analyzers, tokenizers, and token filters can be configured with a
``version`` parameter to control which Lucene version behavior they
should use. Possible values are: ``3.0`` - ``3.6``, ``4.0`` - ``4.3``
(the highest version number is the default option).
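
For example, a sketch of pinning an analyzer to an older Lucene version
behavior (the ``my_standard`` analyzer name is an assumption):

.. code:: js

    index :
        analysis :
            analyzer :
                my_standard :
                    type : standard
                    version : 3.6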

Analyzers
=========

Analyzers are composed of a single `Tokenizer <#analysis-tokenizers>`__
and zero or more `TokenFilters <#analysis-tokenfilters>`__. The
tokenizer may be preceded by one or more
`CharFilters <#analysis-charfilters>`__. The analysis module allows you
to register ``Analyzers`` under logical names which can then be
referenced either in mapping definitions or in certain APIs.

Elasticsearch comes with a number of prebuilt analyzers which are ready
to use. Alternatively, you can combine the built in character filters,
tokenizers and token filters to create `custom
analyzers <#analysis-custom-analyzer>`__.

**Default Analyzers**

An analyzer is registered under a logical name and can then be
referenced from mapping definitions or certain APIs. When no analyzer is
defined, defaults are used. There is an option to define which analyzers
will be used by default when none can be derived.

The ``default`` logical name allows one to configure an analyzer that
will be used both for indexing and for searching APIs. The
``default_index`` logical name can be used to configure a default
analyzer that will be used just when indexing, and the
``default_search`` can be used to configure a default analyzer that will
be used just when searching.
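
For example, a minimal sketch that uses the ``standard`` analyzer at
index time and the ``simple`` analyzer at search time (the choice of
analyzers here is purely illustrative):

.. code:: js

    index :
        analysis :
            analyzer :
                default_index :
                    type : standard
                default_search :
                    type : simple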

**Aliasing Analyzers**

Analyzers can be aliased to have several registered lookup names
associated with them. For example, the following will allow the
``standard`` analyzer to also be referenced with ``alias1`` and
``alias2`` values.

.. code:: js

    index :
      analysis :
        analyzer :
          standard :
            alias: [alias1, alias2]
            type : standard
            stopwords : [test1, test2, test3]

Below is a list of the built in analyzers.

Standard Analyzer
-----------------

An analyzer of type ``standard`` is built using the `Standard
Tokenizer <#analysis-standard-tokenizer>`__ with the `Standard Token
Filter <#analysis-standard-tokenfilter>`__, `Lower Case Token
Filter <#analysis-lowercase-tokenfilter>`__, and `Stop Token
Filter <#analysis-stop-tokenfilter>`__.

The following are settings that can be set for a ``standard`` analyzer
type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``stopwords``                        | A list of stopwords to initialize    |
|                                      | the stop filter with. Defaults to an |
|                                      | *empty* stopword list. Check `Stop   |
|                                      | Analyzer <#analysis-stop-analyzer>`__|
|                                      | for more details.                    |
+--------------------------------------+--------------------------------------+
| ``max_token_length``                 | The maximum token length. If a token |
|                                      | is seen that exceeds this length     |
|                                      | then it is discarded. Defaults to    |
|                                      | ``255``.                             |
+--------------------------------------+--------------------------------------+

Simple Analyzer
---------------

An analyzer of type ``simple`` that is built using a `Lower Case
Tokenizer <#analysis-lowercase-tokenizer>`__.

Whitespace Analyzer
-------------------

An analyzer of type ``whitespace`` that is built using a `Whitespace
Tokenizer <#analysis-whitespace-tokenizer>`__.

Stop Analyzer
-------------

An analyzer of type ``stop`` that is built using a `Lower Case
Tokenizer <#analysis-lowercase-tokenizer>`__, with `Stop Token
Filter <#analysis-stop-tokenfilter>`__.

The following are settings that can be set for a ``stop`` analyzer type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``stopwords``                        | A list of stopwords to initialize    |
|                                      | the stop filter with. Defaults to    |
|                                      | the English stop words.              |
+--------------------------------------+--------------------------------------+
| ``stopwords_path``                   | A path (either relative to           |
|                                      | ``config`` location, or absolute) to |
|                                      | a stopwords file configuration.      |
+--------------------------------------+--------------------------------------+

Use ``stopwords: _none_`` to explicitly specify an *empty* stopword
list.
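
For example, a sketch of a ``stop`` analyzer that reads its stopwords
from a file (the analyzer name and the file path are assumptions):

.. code:: js

    index :
        analysis :
            analyzer :
                my_stop_analyzer :
                    type : stop
                    stopwords_path : stopwords/my_stopwords.txt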

Keyword Analyzer
----------------

An analyzer of type ``keyword`` that "tokenizes" an entire stream as a
single token. This is useful for data like zip codes, ids and so on.
Note, when using mapping definitions, it might make more sense to simply
mark the field as ``not_analyzed``.
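
For example, a sketch of the ``not_analyzed`` alternative mentioned
above (the ``zip_code`` field name is an assumption):

.. code:: js

    {
        "properties" : {
            "zip_code" : { "type" : "string", "index" : "not_analyzed" }
        }
    }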

Pattern Analyzer
----------------

An analyzer of type ``pattern`` that can flexibly separate text into
terms via a regular expression.

The following are settings that can be set for a ``pattern`` analyzer
type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``lowercase``                        | Should terms be lowercased or not.   |
|                                      | Defaults to ``true``.                |
+--------------------------------------+--------------------------------------+
| ``pattern``                          | The regular expression pattern,      |
|                                      | defaults to ``\W+``.                 |
+--------------------------------------+--------------------------------------+
| ``flags``                            | The regular expression flags.        |
+--------------------------------------+--------------------------------------+
| ``stopwords``                        | A list of stopwords to initialize    |
|                                      | the stop filter with. Defaults to an |
|                                      | *empty* stopword list. Check `Stop   |
|                                      | Analyzer <#analysis-stop-analyzer>`__|
|                                      | for more details.                    |
+--------------------------------------+--------------------------------------+

**IMPORTANT**: The regular expression should match the **token
separators**, not the tokens themselves.

Flags should be pipe-separated, eg ``"CASE_INSENSITIVE|COMMENTS"``.
Check `Java Pattern
API <http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#field_summary>`__
for more details about ``flags`` options.
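
For example, a sketch of a ``pattern`` analyzer that sets ``flags`` (the
``my_pattern`` analyzer name is an assumption):

.. code:: js

    curl -XPUT 'localhost:9200/test' -d '
    {
        "settings" : {
            "analysis" : {
                "analyzer" : {
                    "my_pattern" : {
                        "type" : "pattern",
                        "pattern" : "\\W+",
                        "flags" : "CASE_INSENSITIVE|COMMENTS"
                    }
                }
            }
        }
    }'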

**Pattern Analyzer Examples**

In order to try out these examples, you should delete the ``test`` index
before running each example:

.. code:: js

        curl -XDELETE localhost:9200/test

**Whitespace tokenizer**

.. code:: js

        curl -XPUT 'localhost:9200/test' -d '
        {
            "settings":{
                "analysis": {
                    "analyzer": {
                        "whitespace":{
                            "type": "pattern",
                            "pattern":"\\\\s+"
                        }
                    }
                }
            }
        }'

        curl 'localhost:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo,bar baz'
        # "foo,bar", "baz"

**Non-word character tokenizer**

.. code:: js

        curl -XPUT 'localhost:9200/test' -d '
        {
            "settings":{
                "analysis": {
                    "analyzer": {
                        "nonword":{
                            "type": "pattern",
                            "pattern":"[^\\\\w]+"
                        }
                    }
                }
            }
        }'

        curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'foo,bar baz'
        # "foo,bar baz" becomes "foo", "bar", "baz"

        curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'type_1-type_4'
        # "type_1","type_4"

**CamelCase tokenizer**

.. code:: js

        curl -XPUT 'localhost:9200/test?pretty=1' -d '
        {
            "settings":{
                "analysis": {
                    "analyzer": {
                        "camel":{
                            "type": "pattern",
                            "pattern":"([^\\\\p{L}\\\\d]+)|(?<=\\\\D)(?=\\\\d)|(?<=\\\\d)(?=\\\\D)|(?<=[\\\\p{L}&&[^\\\\p{Lu}]])(?=\\\\p{Lu})|(?<=\\\\p{Lu})(?=\\\\p{Lu}[\\\\p{L}&&[^\\\\p{Lu}]])"
                        }
                    }
                }
            }
        }'

        curl 'localhost:9200/test/_analyze?pretty=1&analyzer=camel' -d '
            MooseX::FTPClass2_beta
        '
        # "moose","x","ftp","class","2","beta"

The regex above is easier to understand as:

.. code:: js

          ([^\\p{L}\\d]+)                 # swallow non letters and numbers,
        | (?<=\\D)(?=\\d)                 # or non-number followed by number,
        | (?<=\\d)(?=\\D)                 # or number followed by non-number,
        | (?<=[ \\p{L} && [^\\p{Lu}]])    # or lower case
          (?=\\p{Lu})                    #   followed by upper case,
        | (?<=\\p{Lu})                   # or upper case
          (?=\\p{Lu}                     #   followed by upper case
            [\\p{L}&&[^\\p{Lu}]]          #   then lower case
          )

Language Analyzers
------------------

A set of analyzers aimed at analyzing specific language text. The
following types are supported: ```arabic`` <#arabic-analyzer>`__,
```armenian`` <#armenian-analyzer>`__,
```basque`` <#basque-analyzer>`__,
```brazilian`` <#brazilian-analyzer>`__,
```bulgarian`` <#bulgarian-analyzer>`__,
```catalan`` <#catalan-analyzer>`__, ```cjk`` <#cjk-analyzer>`__,
```czech`` <#czech-analyzer>`__, ```danish`` <#danish-analyzer>`__,
```dutch`` <#dutch-analyzer>`__, ```english`` <#english-analyzer>`__,
```finnish`` <#finnish-analyzer>`__, ```french`` <#french-analyzer>`__,
```galician`` <#galician-analyzer>`__,
```german`` <#german-analyzer>`__, ```greek`` <#greek-analyzer>`__,
```hindi`` <#hindi-analyzer>`__,
```hungarian`` <#hungarian-analyzer>`__,
```indonesian`` <#indonesian-analyzer>`__,
```irish`` <#irish-analyzer>`__, ```italian`` <#italian-analyzer>`__,
```latvian`` <#latvian-analyzer>`__,
```norwegian`` <#norwegian-analyzer>`__,
```persian`` <#persian-analyzer>`__,
```portuguese`` <#portuguese-analyzer>`__,
```romanian`` <#romanian-analyzer>`__,
```russian`` <#russian-analyzer>`__, ```sorani`` <#sorani-analyzer>`__,
```spanish`` <#spanish-analyzer>`__,
```swedish`` <#swedish-analyzer>`__,
```turkish`` <#turkish-analyzer>`__, ```thai`` <#thai-analyzer>`__.

Configuring language analyzers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Stopwords
^^^^^^^^^

All analyzers support setting custom ``stopwords`` either internally in
the config, or by using an external stopwords file by setting
``stopwords_path``. Check `Stop Analyzer <#analysis-stop-analyzer>`__
for more details.

Excluding words from stemming
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``stem_exclusion`` parameter allows you to specify an array of
lowercase words that should not be stemmed. Internally, this
functionality is implemented by adding the ```keyword_marker`` token
filter <#analysis-keyword-marker-tokenfilter>`__ with the ``keywords``
set to the value of the ``stem_exclusion`` parameter.
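
For example, a sketch of an ``english`` analyzer configured with a
``stem_exclusion`` list (the ``my_english`` analyzer name and the
excluded words are assumptions):

.. code:: js

    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_english": {
              "type":           "english",
              "stem_exclusion": [ "organization", "organizations" ]
            }
          }
        }
      }
    }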

The following analyzers support setting a custom ``stem_exclusion``
list: ``arabic``, ``armenian``, ``basque``, ``bulgarian``, ``catalan``,
``czech``, ``dutch``, ``english``, ``finnish``, ``french``,
``galician``, ``german``, ``hindi``, ``hungarian``, ``indonesian``,
``irish``, ``italian``, ``latvian``, ``norwegian``, ``portuguese``,
``romanian``, ``russian``, ``sorani``, ``spanish``, ``swedish``,
``turkish``.

Reimplementing language analyzers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The built-in language analyzers can be reimplemented as ``custom``
analyzers (as described below) in order to customize their behaviour.

    **Note**

    If you do not intend to exclude words from being stemmed (the
    equivalent of the ``stem_exclusion`` parameter above), then you
    should remove the ``keyword_marker`` token filter from the custom
    analyzer configuration.

``arabic`` analyzer
^^^^^^^^^^^^^^^^^^^

The ``arabic`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "arabic_stop": {
              "type":       "stop",
              "stopwords":  "_arabic_" 
            },
            "arabic_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "arabic_stemmer": {
              "type":       "stemmer",
              "language":   "arabic"
            }
          },
          "analyzer": {
            "arabic": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "arabic_stop",
                "arabic_normalization",
                "arabic_keywords",
                "arabic_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``arabic_keywords`` filter should be removed unless there are words
which should be excluded from stemming.

``armenian`` analyzer
^^^^^^^^^^^^^^^^^^^^^

The ``armenian`` analyzer could be reimplemented as a ``custom``
analyzer as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "armenian_stop": {
              "type":       "stop",
              "stopwords":  "_armenian_" 
            },
            "armenian_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "armenian_stemmer": {
              "type":       "stemmer",
              "language":   "armenian"
            }
          },
          "analyzer": {
            "armenian": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "armenian_stop",
                "armenian_keywords",
                "armenian_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``armenian_keywords`` filter should be removed unless there are
words which should be excluded from stemming.

``basque`` analyzer
^^^^^^^^^^^^^^^^^^^

The ``basque`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "basque_stop": {
              "type":       "stop",
              "stopwords":  "_basque_" 
            },
            "basque_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "basque_stemmer": {
              "type":       "stemmer",
              "language":   "basque"
            }
          },
          "analyzer": {
            "basque": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "basque_stop",
                "basque_keywords",
                "basque_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``basque_keywords`` filter should be removed unless there are words
which should be excluded from stemming.

``brazilian`` analyzer
^^^^^^^^^^^^^^^^^^^^^^

The ``brazilian`` analyzer could be reimplemented as a ``custom``
analyzer as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "brazilian_stop": {
              "type":       "stop",
              "stopwords":  "_brazilian_" 
            },
            "brazilian_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "brazilian_stemmer": {
              "type":       "stemmer",
              "language":   "brazilian"
            }
          },
          "analyzer": {
            "brazilian": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "brazilian_stop",
                "brazilian_keywords",
                "brazilian_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``brazilian_keywords`` filter should be removed unless there are
words which should be excluded from stemming.

``bulgarian`` analyzer
^^^^^^^^^^^^^^^^^^^^^^

The ``bulgarian`` analyzer could be reimplemented as a ``custom``
analyzer as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "bulgarian_stop": {
              "type":       "stop",
              "stopwords":  "_bulgarian_" 
            },
            "bulgarian_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "bulgarian_stemmer": {
              "type":       "stemmer",
              "language":   "bulgarian"
            }
          },
          "analyzer": {
            "bulgarian": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "bulgarian_stop",
                "bulgarian_keywords",
                "bulgarian_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``bulgarian_keywords`` filter should be removed unless there are
words which should be excluded from stemming.

``catalan`` analyzer
^^^^^^^^^^^^^^^^^^^^

The ``catalan`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "catalan_elision": {
            "type":         "elision",
                "articles": [ "d", "l", "m", "n", "s", "t"]
            },
            "catalan_stop": {
              "type":       "stop",
              "stopwords":  "_catalan_" 
            },
            "catalan_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "catalan_stemmer": {
              "type":       "stemmer",
              "language":   "catalan"
            }
          },
          "analyzer": {
            "catalan": {
              "tokenizer":  "standard",
              "filter": [
                "catalan_elision",
                "lowercase",
                "catalan_stop",
                "catalan_keywords",
                "catalan_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``catalan_keywords`` filter should be removed unless there are words
which should be excluded from stemming.

``cjk`` analyzer
^^^^^^^^^^^^^^^^

The ``cjk`` analyzer could be reimplemented as a ``custom`` analyzer as
follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "english_stop": {
              "type":       "stop",
              "stopwords":  "_english_" 
            }
          },
          "analyzer": {
            "cjk": {
              "tokenizer":  "standard",
              "filter": [
                "cjk_width",
                "lowercase",
                "cjk_bigram",
                "english_stop"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

``czech`` analyzer
^^^^^^^^^^^^^^^^^^

The ``czech`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "czech_stop": {
              "type":       "stop",
              "stopwords":  "_czech_" 
            },
            "czech_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "czech_stemmer": {
              "type":       "stemmer",
              "language":   "czech"
            }
          },
          "analyzer": {
            "czech": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "czech_stop",
                "czech_keywords",
                "czech_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``czech_keywords`` filter should be removed unless there are words
which should be excluded from stemming.

``danish`` analyzer
^^^^^^^^^^^^^^^^^^^

The ``danish`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "danish_stop": {
              "type":       "stop",
              "stopwords":  "_danish_" 
            },
            "danish_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "danish_stemmer": {
              "type":       "stemmer",
              "language":   "danish"
            }
          },
          "analyzer": {
            "danish": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "danish_stop",
                "danish_keywords",
                "danish_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``danish_keywords`` filter should be removed unless there are words
which should be excluded from stemming.

``dutch`` analyzer
^^^^^^^^^^^^^^^^^^

The ``dutch`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "dutch_stop": {
              "type":       "stop",
              "stopwords":  "_dutch_" 
            },
            "dutch_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "dutch_stemmer": {
              "type":       "stemmer",
              "language":   "dutch"
            },
            "dutch_override": {
              "type":       "stemmer_override",
              "rules": [
                "fiets=>fiets",
                "bromfiets=>bromfiets",
                "ei=>eier",
                "kind=>kinder"
              ]
            }
          },
          "analyzer": {
            "dutch": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "dutch_stop",
                "dutch_keywords",
                "dutch_override",
                "dutch_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``dutch_keywords`` filter should be removed unless there are words
which should be excluded from stemming.

``english`` analyzer
^^^^^^^^^^^^^^^^^^^^

The ``english`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "english_stop": {
              "type":       "stop",
              "stopwords":  "_english_" 
            },
            "english_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "english_stemmer": {
              "type":       "stemmer",
              "language":   "english"
            },
            "english_possessive_stemmer": {
              "type":       "stemmer",
              "language":   "possessive_english"
            }
          },
          "analyzer": {
            "english": {
              "tokenizer":  "standard",
              "filter": [
                "english_possessive_stemmer",
                "lowercase",
                "english_stop",
                "english_keywords",
                "english_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``finnish`` analyzer
^^^^^^^^^^^^^^^^^^^^

The ``finnish`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "finnish_stop": {
              "type":       "stop",
              "stopwords":  "_finnish_" 
            },
            "finnish_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "finnish_stemmer": {
              "type":       "stemmer",
              "language":   "finnish"
            }
          },
          "analyzer": {
            "finnish": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "finnish_stop",
                "finnish_keywords",
                "finnish_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``french`` analyzer
^^^^^^^^^^^^^^^^^^^

The ``french`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "french_elision": {
            "type":         "elision",
                "articles": [ "l", "m", "t", "qu", "n", "s",
                              "j", "d", "c", "jusqu", "quoiqu",
                              "lorsqu", "puisqu"
                            ]
            },
            "french_stop": {
              "type":       "stop",
              "stopwords":  "_french_" 
            },
            "french_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "french_stemmer": {
              "type":       "stemmer",
              "language":   "light_french"
            }
          },
          "analyzer": {
            "french": {
              "tokenizer":  "standard",
              "filter": [
                "french_elision",
                "lowercase",
                "french_stop",
                "french_keywords",
                "french_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``galician`` analyzer
^^^^^^^^^^^^^^^^^^^^^

The ``galician`` analyzer could be reimplemented as a ``custom``
analyzer as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "galician_stop": {
              "type":       "stop",
              "stopwords":  "_galician_" 
            },
            "galician_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "galician_stemmer": {
              "type":       "stemmer",
              "language":   "galician"
            }
          },
          "analyzer": {
            "galician": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "galician_stop",
                "galician_keywords",
                "galician_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``german`` analyzer
^^^^^^^^^^^^^^^^^^^

The ``german`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "german_stop": {
              "type":       "stop",
              "stopwords":  "_german_" 
            },
            "german_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "german_stemmer": {
              "type":       "stemmer",
              "language":   "light_german"
            }
          },
          "analyzer": {
            "german": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "german_stop",
                "german_keywords",
                "german_normalization",
                "german_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``greek`` analyzer
^^^^^^^^^^^^^^^^^^

The ``greek`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "greek_stop": {
              "type":       "stop",
              "stopwords":  "_greek_" 
            },
            "greek_lowercase": {
              "type":       "lowercase",
              "language":   "greek"
            },
            "greek_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "greek_stemmer": {
              "type":       "stemmer",
              "language":   "greek"
            }
          },
          "analyzer": {
            "greek": {
              "tokenizer":  "standard",
              "filter": [
                "greek_lowercase",
                "greek_stop",
                "greek_keywords",
                "greek_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``hindi`` analyzer
^^^^^^^^^^^^^^^^^^

The ``hindi`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "hindi_stop": {
              "type":       "stop",
              "stopwords":  "_hindi_" 
            },
            "hindi_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "hindi_stemmer": {
              "type":       "stemmer",
              "language":   "hindi"
            }
          },
          "analyzer": {
            "hindi": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "indic_normalization",
                "hindi_normalization",
                "hindi_stop",
                "hindi_keywords",
                "hindi_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``hungarian`` analyzer
^^^^^^^^^^^^^^^^^^^^^^

The ``hungarian`` analyzer could be reimplemented as a ``custom``
analyzer as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "hungarian_stop": {
              "type":       "stop",
              "stopwords":  "_hungarian_" 
            },
            "hungarian_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "hungarian_stemmer": {
              "type":       "stemmer",
              "language":   "hungarian"
            }
          },
          "analyzer": {
            "hungarian": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "hungarian_stop",
                "hungarian_keywords",
                "hungarian_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``indonesian`` analyzer
^^^^^^^^^^^^^^^^^^^^^^^

The ``indonesian`` analyzer could be reimplemented as a ``custom``
analyzer as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "indonesian_stop": {
              "type":       "stop",
              "stopwords":  "_indonesian_" 
            },
            "indonesian_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "indonesian_stemmer": {
              "type":       "stemmer",
              "language":   "indonesian"
            }
          },
          "analyzer": {
            "indonesian": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "indonesian_stop",
                "indonesian_keywords",
                "indonesian_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``irish`` analyzer
^^^^^^^^^^^^^^^^^^

The ``irish`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "irish_elision": {
              "type":       "elision",
              "articles": [ "h", "n", "t" ]
            },
            "irish_stop": {
              "type":       "stop",
              "stopwords":  "_irish_" 
            },
            "irish_lowercase": {
              "type":       "lowercase",
              "language":   "irish"
            },
            "irish_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "irish_stemmer": {
              "type":       "stemmer",
              "language":   "irish"
            }
          },
          "analyzer": {
            "irish": {
              "tokenizer":  "standard",
              "filter": [
                "irish_stop",
                "irish_elision",
                "irish_lowercase",
                "irish_keywords",
                "irish_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``italian`` analyzer
^^^^^^^^^^^^^^^^^^^^

The ``italian`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "italian_elision": {
            "type":         "elision",
                "articles": [
                    "c", "l", "all", "dall", "dell",
                    "nell", "sull", "coll", "pell",
                    "gl", "agl", "dagl", "degl", "negl",
                    "sugl", "un", "m", "t", "s", "v", "d"
                ]
            },
            "italian_stop": {
              "type":       "stop",
              "stopwords":  "_italian_" 
            },
            "italian_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "italian_stemmer": {
              "type":       "stemmer",
              "language":   "light_italian"
            }
          },
          "analyzer": {
            "italian": {
              "tokenizer":  "standard",
              "filter": [
                "italian_elision",
                "lowercase",
                "italian_stop",
                "italian_keywords",
                "italian_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``latvian`` analyzer
^^^^^^^^^^^^^^^^^^^^

The ``latvian`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "latvian_stop": {
              "type":       "stop",
              "stopwords":  "_latvian_" 
            },
            "latvian_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "latvian_stemmer": {
              "type":       "stemmer",
              "language":   "latvian"
            }
          },
          "analyzer": {
            "latvian": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "latvian_stop",
                "latvian_keywords",
                "latvian_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``norwegian`` analyzer
^^^^^^^^^^^^^^^^^^^^^^

The ``norwegian`` analyzer could be reimplemented as a ``custom``
analyzer as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "norwegian_stop": {
              "type":       "stop",
              "stopwords":  "_norwegian_" 
            },
            "norwegian_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "norwegian_stemmer": {
              "type":       "stemmer",
              "language":   "norwegian"
            }
          },
          "analyzer": {
            "norwegian": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "norwegian_stop",
                "norwegian_keywords",
                "norwegian_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``persian`` analyzer
^^^^^^^^^^^^^^^^^^^^

The ``persian`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "char_filter": {
            "zero_width_spaces": {
                "type":       "mapping",
                "mappings": [ "\\u200C=> "] 
            }
          },
          "filter": {
            "persian_stop": {
              "type":       "stop",
              "stopwords":  "_persian_" 
            }
          },
          "analyzer": {
            "persian": {
              "tokenizer":     "standard",
              "char_filter": [ "zero_width_spaces" ],
              "filter": [
                "lowercase",
                "arabic_normalization",
                "persian_normalization",
                "persian_stop"
              ]
            }
          }
        }
      }
    }

The ``zero_width_spaces`` char filter replaces zero-width non-joiners
with an ASCII space.

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

``portuguese`` analyzer
^^^^^^^^^^^^^^^^^^^^^^^

The ``portuguese`` analyzer could be reimplemented as a ``custom``
analyzer as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "portuguese_stop": {
              "type":       "stop",
              "stopwords":  "_portuguese_" 
            },
            "portuguese_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "portuguese_stemmer": {
              "type":       "stemmer",
              "language":   "light_portuguese"
            }
          },
          "analyzer": {
            "portuguese": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "portuguese_stop",
                "portuguese_keywords",
                "portuguese_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``romanian`` analyzer
^^^^^^^^^^^^^^^^^^^^^

The ``romanian`` analyzer could be reimplemented as a ``custom``
analyzer as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "romanian_stop": {
              "type":       "stop",
              "stopwords":  "_romanian_" 
            },
            "romanian_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "romanian_stemmer": {
              "type":       "stemmer",
              "language":   "romanian"
            }
          },
          "analyzer": {
            "romanian": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "romanian_stop",
                "romanian_keywords",
                "romanian_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``russian`` analyzer
^^^^^^^^^^^^^^^^^^^^

The ``russian`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "russian_stop": {
              "type":       "stop",
              "stopwords":  "_russian_" 
            },
            "russian_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "russian_stemmer": {
              "type":       "stemmer",
              "language":   "russian"
            }
          },
          "analyzer": {
            "russian": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "russian_stop",
                "russian_keywords",
                "russian_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``sorani`` analyzer
^^^^^^^^^^^^^^^^^^^

The ``sorani`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "sorani_stop": {
              "type":       "stop",
              "stopwords":  "_sorani_" 
            },
            "sorani_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "sorani_stemmer": {
              "type":       "stemmer",
              "language":   "sorani"
            }
          },
          "analyzer": {
            "sorani": {
              "tokenizer":  "standard",
              "filter": [
                "sorani_normalization",
                "lowercase",
                "sorani_stop",
                "sorani_keywords",
                "sorani_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``spanish`` analyzer
^^^^^^^^^^^^^^^^^^^^

The ``spanish`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "spanish_stop": {
              "type":       "stop",
              "stopwords":  "_spanish_" 
            },
            "spanish_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "spanish_stemmer": {
              "type":       "stemmer",
              "language":   "light_spanish"
            }
          },
          "analyzer": {
            "spanish": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "spanish_stop",
                "spanish_keywords",
                "spanish_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``swedish`` analyzer
^^^^^^^^^^^^^^^^^^^^

The ``swedish`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "swedish_stop": {
              "type":       "stop",
              "stopwords":  "_swedish_" 
            },
            "swedish_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "swedish_stemmer": {
              "type":       "stemmer",
              "language":   "swedish"
            }
          },
          "analyzer": {
            "swedish": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "swedish_stop",
                "swedish_keywords",
                "swedish_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``turkish`` analyzer
^^^^^^^^^^^^^^^^^^^^

The ``turkish`` analyzer could be reimplemented as a ``custom`` analyzer
as follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "turkish_stop": {
              "type":       "stop",
              "stopwords":  "_turkish_" 
            },
            "turkish_lowercase": {
              "type":       "lowercase",
              "language":   "turkish"
            },
            "turkish_keywords": {
              "type":       "keyword_marker",
              "keywords":   [] 
            },
            "turkish_stemmer": {
              "type":       "stemmer",
              "language":   "turkish"
            }
          },
          "analyzer": {
            "turkish": {
              "tokenizer":  "standard",
              "filter": [
                "apostrophe",
                "turkish_lowercase",
                "turkish_stop",
                "turkish_keywords",
                "turkish_stemmer"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

The ``keyword_marker`` filter should be removed unless there are words
which should be excluded from stemming.

``thai`` analyzer
^^^^^^^^^^^^^^^^^

The ``thai`` analyzer could be reimplemented as a ``custom`` analyzer as
follows:

.. code:: js

    {
      "settings": {
        "analysis": {
          "filter": {
            "thai_stop": {
              "type":       "stop",
              "stopwords":  "_thai_" 
            }
          },
          "analyzer": {
            "thai": {
              "tokenizer":  "thai",
              "filter": [
                "lowercase",
                "thai_stop"
              ]
            }
          }
        }
      }
    }

The default stopwords can be overridden with the ``stopwords`` or
``stopwords_path`` parameters.

Snowball Analyzer
-----------------

An analyzer of type ``snowball`` that uses the `standard
tokenizer <#analysis-standard-tokenizer>`__, with `standard
filter <#analysis-standard-tokenfilter>`__, `lowercase
filter <#analysis-lowercase-tokenfilter>`__, `stop
filter <#analysis-stop-tokenfilter>`__, and `snowball
filter <#analysis-snowball-tokenfilter>`__.

The Snowball Analyzer is a stemming analyzer from Lucene that is
originally based on the snowball project from
`snowball.tartarus.org <http://snowball.tartarus.org>`__.

Sample usage:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "my_analyzer" : {
                        "type" : "snowball",
                        "language" : "English"
                    }
                }
            }
        }
    }

The ``language`` parameter can have the same values as the `snowball
filter <#analysis-snowball-tokenfilter>`__ and defaults to ``English``.
Note that not all the language analyzers have a default set of stopwords
provided.

The ``stopwords`` parameter can be used to provide stopwords for the
languages that have no defaults, or to simply replace the default set
with your custom list. Check `Stop Analyzer <#analysis-stop-analyzer>`__
for more details. A default set of stopwords for many of these languages
is available, for instance,
`here <https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/>`__
and
`here <https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball>`__.

A sample configuration (in YAML format) specifying Swedish with
stopwords:

.. code:: js

    index :
        analysis :
            analyzer :
               my_analyzer:
                    type: snowball
                    language: Swedish
                    stopwords: "och,det,att,i,en,jag,hon,som,han,på,den,med,var,sig,för,så,till,är,men,ett,om,hade,de,av,icke,mig,du,henne,då,sin,nu,har,inte,hans,honom,skulle,hennes,där,min,man,ej,vid,kunde,något,från,ut,när,efter,upp,vi,dem,vara,vad,över,än,dig,kan,sina,här,ha,mot,alla,under,någon,allt,mycket,sedan,ju,denna,själv,detta,åt,utan,varit,hur,ingen,mitt,ni,bli,blev,oss,din,dessa,några,deras,blir,mina,samma,vilken,er,sådan,vår,blivit,dess,inom,mellan,sådant,varför,varje,vilka,ditt,vem,vilket,sitta,sådana,vart,dina,vars,vårt,våra,ert,era,vilkas"

Custom Analyzer
---------------

An analyzer of type ``custom`` that allows you to combine a ``Tokenizer``
with zero or more ``Token Filters``, and zero or more ``Char Filters``.
The custom analyzer accepts a logical/registered name of the tokenizer
to use, and a list of logical/registered names of token filters.

The following are settings that can be set for a ``custom`` analyzer
type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``tokenizer``                        | The logical / registered name of the |
|                                      | tokenizer to use.                    |
+--------------------------------------+--------------------------------------+
| ``filter``                           | An optional list of logical /        |
|                                      | registered names of token filters.   |
+--------------------------------------+--------------------------------------+
| ``char_filter``                      | An optional list of logical /        |
|                                      | registered names of char filters.    |
+--------------------------------------+--------------------------------------+

Here is an example:

.. code:: js

    index :
        analysis :
            analyzer :
                myAnalyzer2 :
                    type : custom
                    tokenizer : myTokenizer1
                    filter : [myTokenFilter1, myTokenFilter2]
                    char_filter : [my_html]
            tokenizer :
                myTokenizer1 :
                    type : standard
                    max_token_length : 900
            filter :
                myTokenFilter1 :
                    type : stop
                    stopwords : [stop1, stop2, stop3, stop4]
                myTokenFilter2 :
                    type : length
                    min : 0
                    max : 2000
            char_filter :
                  my_html :
                    type : html_strip
                    escaped_tags : [xxx, yyy]
                    read_ahead : 1024

Tokenizers
==========

Tokenizers are used to break a string down into a stream of terms or
tokens. A simple tokenizer might split the string up into terms wherever
it encounters whitespace or punctuation.

Elasticsearch has a number of built in tokenizers which can be used to
build `custom analyzers <#analysis-custom-analyzer>`__.

Standard Tokenizer
------------------

A tokenizer of type ``standard`` that provides grammar-based tokenization
and works well for most European language documents. The tokenizer
implements the Unicode Text Segmentation algorithm, as specified in
`Unicode Standard Annex #29 <http://unicode.org/reports/tr29/>`__.

The following are settings that can be set for a ``standard`` tokenizer
type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``max_token_length``                 | The maximum token length. If a token |
|                                      | is seen that exceeds this length     |
|                                      | then it is discarded. Defaults to    |
|                                      | ``255``.                             |
+--------------------------------------+--------------------------------------+
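
As an illustration, ``max_token_length`` can be set by registering a
``standard`` tokenizer under a custom name and referencing it from a
custom analyzer. The names ``my_standard_analyzer`` and
``my_standard_tokenizer`` below are made up for this sketch:

.. code:: js

    {
      "settings" : {
        "analysis" : {
          "analyzer" : {
            "my_standard_analyzer" : {
              "tokenizer" : "my_standard_tokenizer"
            }
          },
          "tokenizer" : {
            "my_standard_tokenizer" : {
              "type" : "standard",
              "max_token_length" : 10
            }
          }
        }
      }
    }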

Edge NGram Tokenizer
--------------------

A tokenizer of type ``edgeNGram``.

This tokenizer is very similar to ``nGram`` but only keeps n-grams which
start at the beginning of a token.

The following are settings that can be set for an ``edgeNGram`` tokenizer
type:

+--------------------------+--------------------------+--------------------------+
| Setting                  | Description              | Default value            |
+==========================+==========================+==========================+
| ``min_gram``             | Minimum size in          | ``1``.                   |
|                          | codepoints of a single   |                          |
|                          | n-gram                   |                          |
+--------------------------+--------------------------+--------------------------+
| ``max_gram``             | Maximum size in          | ``2``.                   |
|                          | codepoints of a single   |                          |
|                          | n-gram                   |                          |
+--------------------------+--------------------------+--------------------------+
| ``token_chars``          | Character classes to     | ``[]`` (Keep all         |
|                          | keep in the tokens.      | characters)              |
|                          | Elasticsearch will split |                          |
|                          | on characters that don’t |                          |
|                          | belong to any of these   |                          |
|                          | classes.                 |                          |
+--------------------------+--------------------------+--------------------------+

``token_chars`` accepts the following character classes:

+-----------------+-------------------------------------------+
| ``letter``      | for example ``a``, ``b``, ``ï`` or ``京``  |
+-----------------+-------------------------------------------+
| ``digit``       | for example ``3`` or ``7``                |
+-----------------+-------------------------------------------+
| ``whitespace``  | for example ``" "`` or ``"\n"``           |
+-----------------+-------------------------------------------+
| ``punctuation`` | for example ``!`` or ``"``                |
+-----------------+-------------------------------------------+
| ``symbol``      | for example ``$`` or ``√``                |
+-----------------+-------------------------------------------+

**Example**

.. code:: js

        curl -XPUT 'localhost:9200/test' -d '
        {
            "settings" : {
                "analysis" : {
                    "analyzer" : {
                        "my_edge_ngram_analyzer" : {
                            "tokenizer" : "my_edge_ngram_tokenizer"
                        }
                    },
                    "tokenizer" : {
                        "my_edge_ngram_tokenizer" : {
                            "type" : "edgeNGram",
                            "min_gram" : "2",
                            "max_gram" : "5",
                            "token_chars": [ "letter", "digit" ]
                        }
                    }
                }
            }
        }'

        curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_edge_ngram_analyzer' -d 'FC Schalke 04'
        # FC, Sc, Sch, Scha, Schal, 04

**``side`` deprecated**

There used to be a ``side`` parameter up to ``0.90.1`` but it is now
deprecated. In order to emulate the behavior of ``"side" : "BACK"`` a
```reverse`` token filter <#analysis-reverse-tokenfilter>`__ should be
used together with the ```edgeNGram`` token
filter <#analysis-edgengram-tokenfilter>`__. The ``edgeNGram`` filter
must be enclosed in ``reverse`` filters like this:

.. code:: js

        "filter" : ["reverse", "edgeNGram", "reverse"]

which essentially reverses the token, builds front ``EdgeNGrams`` and
reverses the ngram again. This has the same effect as the previous
``"side" : "BACK"`` setting.

Keyword Tokenizer
-----------------

A tokenizer of type ``keyword`` that emits the entire input as a single
output.

The following are settings that can be set for a ``keyword`` tokenizer
type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``buffer_size``                      | The term buffer size. Defaults to    |
|                                      | ``256``.                             |
+--------------------------------------+--------------------------------------+
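
For illustration, the ``keyword`` tokenizer is often combined with token
filters such as ``lowercase`` to normalize a value while still indexing
it as a single term. The index and analyzer names below are made up for
this sketch:

.. code:: js

        curl -XPUT 'localhost:9200/test' -d '
        {
            "settings" : {
                "analysis" : {
                    "analyzer" : {
                        "my_keyword_analyzer" : {
                            "tokenizer" : "keyword",
                            "filter" : [ "lowercase" ]
                        }
                    }
                }
            }
        }'

        curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_keyword_analyzer' -d 'New York'
        # new york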

Letter Tokenizer
----------------

A tokenizer of type ``letter`` that divides text at non-letters. That’s
to say, it defines tokens as maximal strings of adjacent letters. Note,
this does a decent job for most European languages, but does a terrible
job for some Asian languages, where words are not separated by spaces.

Lowercase Tokenizer
-------------------

A tokenizer of type ``lowercase`` that performs the function of `Letter
Tokenizer <#analysis-letter-tokenizer>`__ and `Lower Case Token
Filter <#analysis-lowercase-tokenfilter>`__ together. It divides text at
non-letters and converts them to lower case. While it is functionally
equivalent to the combination of `Letter
Tokenizer <#analysis-letter-tokenizer>`__ and `Lower Case Token
Filter <#analysis-lowercase-tokenfilter>`__, there is a performance
advantage to doing the two tasks at once, hence this (redundant)
implementation.

NGram Tokenizer
---------------

A tokenizer of type ``nGram``.

The following are settings that can be set for a ``nGram`` tokenizer
type:

+--------------------------+--------------------------+--------------------------+
| Setting                  | Description              | Default value            |
+==========================+==========================+==========================+
| ``min_gram``             | Minimum size in          | ``1``.                   |
|                          | codepoints of a single   |                          |
|                          | n-gram                   |                          |
+--------------------------+--------------------------+--------------------------+
| ``max_gram``             | Maximum size in          | ``2``.                   |
|                          | codepoints of a single   |                          |
|                          | n-gram                   |                          |
+--------------------------+--------------------------+--------------------------+
| ``token_chars``          | Character classes to     | ``[]`` (Keep all         |
|                          | keep in the tokens.      | characters)              |
|                          | Elasticsearch will split |                          |
|                          | on characters that don’t |                          |
|                          | belong to any of these   |                          |
|                          | classes.                 |                          |
+--------------------------+--------------------------+--------------------------+

``token_chars`` accepts the following character classes:

+-----------------+-------------------------------------------+
| ``letter``      | for example ``a``, ``b``, ``ï`` or ``京``  |
+-----------------+-------------------------------------------+
| ``digit``       | for example ``3`` or ``7``                |
+-----------------+-------------------------------------------+
| ``whitespace``  | for example ``" "`` or ``"\n"``           |
+-----------------+-------------------------------------------+
| ``punctuation`` | for example ``!`` or ``"``                |
+-----------------+-------------------------------------------+
| ``symbol``      | for example ``$`` or ``√``                |
+-----------------+-------------------------------------------+

**Example**

.. code:: js

        curl -XPUT 'localhost:9200/test' -d '
        {
            "settings" : {
                "analysis" : {
                    "analyzer" : {
                        "my_ngram_analyzer" : {
                            "tokenizer" : "my_ngram_tokenizer"
                        }
                    },
                    "tokenizer" : {
                        "my_ngram_tokenizer" : {
                            "type" : "nGram",
                            "min_gram" : "2",
                            "max_gram" : "3",
                            "token_chars": [ "letter", "digit" ]
                        }
                    }
                }
            }
        }'

        curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'
        # FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04

Whitespace Tokenizer
--------------------

A tokenizer of type ``whitespace`` that divides text at whitespace.

Pattern Tokenizer
-----------------

A tokenizer of type ``pattern`` that can flexibly separate text into
terms via a regular expression. Accepts the following settings:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``pattern``                          | The regular expression pattern,      |
|                                      | defaults to ``\W+``.                 |
+--------------------------------------+--------------------------------------+
| ``flags``                            | The regular expression flags.        |
+--------------------------------------+--------------------------------------+
| ``group``                            | Which group to extract into tokens.  |
|                                      | Defaults to ``-1`` (split).          |
+--------------------------------------+--------------------------------------+

**IMPORTANT**: The regular expression should match the **token
separators**, not the tokens themselves.

Note that you may need to escape ``pattern`` string literal according to
your client language rules. For example, in many programming languages a
string literal for ``\W+`` pattern is written as ``"\\W+"``. There is
nothing special about ``pattern`` (you may have to escape other string
literals as well); escaping ``pattern`` is common just because it often
contains characters that should be escaped.

``group`` set to ``-1`` (the default) is equivalent to "split". Using
group >= 0 selects the matching group as the token. For example, if you
have:

::

    pattern = '([^']+)'
    group   = 0
    input   = aaa 'bbb' 'ccc'

the output will be two tokens: ``'bbb'`` and ``'ccc'`` (including the
``'`` marks). With the same input but using group=1, the output would
be: ``bbb`` and ``ccc`` (no ``'`` marks).
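
As a concrete illustration, a ``pattern`` tokenizer that splits
comma-separated values could be configured roughly as follows; the
analyzer and tokenizer names are made up for this sketch, and the
default ``group`` of ``-1`` is used, so the pattern matches the
separators:

.. code:: js

    {
      "settings" : {
        "analysis" : {
          "analyzer" : {
            "my_csv_analyzer" : {
              "tokenizer" : "my_comma_tokenizer"
            }
          },
          "tokenizer" : {
            "my_comma_tokenizer" : {
              "type" : "pattern",
              "pattern" : ","
            }
          }
        }
      }
    }

Analyzing ``foo,bar,baz`` with this analyzer would be expected to produce
the tokens ``foo``, ``bar`` and ``baz``.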

UAX Email URL Tokenizer
-----------------------

A tokenizer of type ``uax_url_email`` which works exactly like the
``standard`` tokenizer, but tokenizes emails and urls as single tokens.

The following are settings that can be set for a ``uax_url_email``
tokenizer type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``max_token_length``                 | The maximum token length. If a token |
|                                      | is seen that exceeds this length     |
|                                      | then it is discarded. Defaults to    |
|                                      | ``255``.                             |
+--------------------------------------+--------------------------------------+
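
A brief sketch of how this tokenizer keeps an email address intact,
using made-up index and analyzer names:

.. code:: js

        curl -XPUT 'localhost:9200/test' -d '
        {
            "settings" : {
                "analysis" : {
                    "analyzer" : {
                        "my_url_email_analyzer" : {
                            "tokenizer" : "uax_url_email"
                        }
                    }
                }
            }
        }'

        curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_url_email_analyzer' -d 'Email john.smith@example.com for details'
        # Email, john.smith@example.com, for, details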

Path Hierarchy Tokenizer
------------------------

The ``path_hierarchy`` tokenizer takes something like this:

::

    /something/something/else

And produces tokens:

::

    /something
    /something/something
    /something/something/else

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``delimiter``                        | The character delimiter to use,      |
|                                      | defaults to ``/``.                   |
+--------------------------------------+--------------------------------------+
| ``replacement``                      | An optional replacement character to |
|                                      | use. Defaults to the ``delimiter``.  |
+--------------------------------------+--------------------------------------+
| ``buffer_size``                      | The buffer size to use, defaults to  |
|                                      | ``1024``.                            |
+--------------------------------------+--------------------------------------+
| ``reverse``                          | Generates tokens in reverse order,   |
|                                      | defaults to ``false``.               |
+--------------------------------------+--------------------------------------+
| ``skip``                             | Controls initial tokens to skip,     |
|                                      | defaults to ``0``.                   |
+--------------------------------------+--------------------------------------+
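
A minimal sketch of how such a tokenizer could be registered, using
made-up analyzer and tokenizer names:

.. code:: js

    {
      "settings" : {
        "analysis" : {
          "analyzer" : {
            "my_path_analyzer" : {
              "tokenizer" : "my_path_tokenizer"
            }
          },
          "tokenizer" : {
            "my_path_tokenizer" : {
              "type" : "path_hierarchy",
              "delimiter" : "/"
            }
          }
        }
      }
    }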

Classic Tokenizer
-----------------

A tokenizer of type ``classic`` that provides grammar-based tokenization
and works well for English language documents. This tokenizer has
heuristics for special treatment of acronyms, company names, email
addresses, and internet host names. However, these rules don’t always
work, and the tokenizer doesn’t work well for most languages other than
English.

The following are settings that can be set for a ``classic`` tokenizer
type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``max_token_length``                 | The maximum token length. If a token |
|                                      | is seen that exceeds this length     |
|                                      | then it is discarded. Defaults to    |
|                                      | ``255``.                             |
+--------------------------------------+--------------------------------------+

Thai Tokenizer
--------------

A tokenizer of type ``thai`` that segments Thai text into words. This
tokenizer uses the built-in Thai segmentation algorithm included with
Java to divide up Thai text. Text in other languages in general will be
treated the same as ``standard``.

Token Filters
=============

Token filters accept a stream of tokens from a
`tokenizer <#analysis-tokenizers>`__ and can modify tokens (eg
lowercasing), delete tokens (eg remove stopwords) or add tokens (eg
synonyms).

Elasticsearch has a number of built in token filters which can be used
to build `custom analyzers <#analysis-custom-analyzer>`__.

Standard Token Filter
---------------------

A token filter of type ``standard`` that normalizes tokens extracted
with the `Standard Tokenizer <#analysis-standard-tokenizer>`__.

ASCII Folding Token Filter
--------------------------

A token filter of type ``asciifolding`` that converts alphabetic,
numeric, and symbolic Unicode characters which are not in the first 127
ASCII characters (the "Basic Latin" Unicode block) into their ASCII
equivalents, if one exists. Example:

.. code:: js

    "index" : {
        "analysis" : {
            "analyzer" : {
                "default" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "asciifolding"]
                }
            }
        }
    }

Accepts a ``preserve_original`` setting which defaults to ``false``, but
if set to ``true`` it will keep the original token as well as emit the
folded token. For example:

.. code:: js

    "index" : {
        "analysis" : {
            "analyzer" : {
                "default" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "my_ascii_folding"]
                }
            },
            "filter" : {
                "my_ascii_folding" : {
                    "type" : "asciifolding",
                    "preserve_original" : true
                }
            }
        }
    }

Length Token Filter
-------------------

A token filter of type ``length`` that removes words that are too long
or too short for the stream.

The following are settings that can be set for a ``length`` token filter
type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``min``                              | The minimum number. Defaults to      |
|                                      | ``0``.                               |
+--------------------------------------+--------------------------------------+
| ``max``                              | The maximum number. Defaults to      |
|                                      | ``Integer.MAX_VALUE``.               |
+--------------------------------------+--------------------------------------+
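
A small sketch of a custom analyzer that drops very short and very long
tokens; the filter and analyzer names are made up for the example:

.. code:: js

    {
      "settings" : {
        "analysis" : {
          "filter" : {
            "my_length" : {
              "type" : "length",
              "min" : 2,
              "max" : 10
            }
          },
          "analyzer" : {
            "my_length_analyzer" : {
              "tokenizer" : "standard",
              "filter" : [ "lowercase", "my_length" ]
            }
          }
        }
      }
    }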

Lowercase Token Filter
----------------------

A token filter of type ``lowercase`` that normalizes token text to lower
case.

The lowercase token filter supports Greek, Irish, and Turkish
language-specific lowercasing through the ``language`` parameter. Below
is a usage example in a custom analyzer:

.. code:: js

    index :
        analysis :
            analyzer :
                myAnalyzer2 :
                    type : custom
                    tokenizer : myTokenizer1
                    filter : [myTokenFilter1, myGreekLowerCaseFilter]
                    char_filter : [my_html]
            tokenizer :
                myTokenizer1 :
                    type : standard
                    max_token_length : 900
            filter :
                myTokenFilter1 :
                    type : stop
                    stopwords : [stop1, stop2, stop3, stop4]
                myGreekLowerCaseFilter :
                    type : lowercase
                    language : greek
            char_filter :
                  my_html :
                    type : html_strip
                    escaped_tags : [xxx, yyy]
                    read_ahead : 1024

Uppercase Token Filter
----------------------

A token filter of type ``uppercase`` that normalizes token text to upper
case.

NGram Token Filter
------------------

A token filter of type ``nGram``.

The following are settings that can be set for a ``nGram`` token filter
type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``min_gram``                         | Defaults to ``1``.                   |
+--------------------------------------+--------------------------------------+
| ``max_gram``                         | Defaults to ``2``.                   |
+--------------------------------------+--------------------------------------+
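
For illustration, an ``nGram`` token filter is typically registered with
explicit sizes and placed after ``lowercase`` in a custom analyzer; the
filter and analyzer names below are made up for the sketch:

.. code:: js

    {
      "settings" : {
        "analysis" : {
          "filter" : {
            "my_ngram" : {
              "type" : "nGram",
              "min_gram" : 2,
              "max_gram" : 3
            }
          },
          "analyzer" : {
            "my_ngram_filter_analyzer" : {
              "tokenizer" : "standard",
              "filter" : [ "lowercase", "my_ngram" ]
            }
          }
        }
      }
    }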

Edge NGram Token Filter
-----------------------

A token filter of type ``edgeNGram``.

The following are settings that can be set for an ``edgeNGram`` token
filter type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``min_gram``                         | Defaults to ``1``.                   |
+--------------------------------------+--------------------------------------+
| ``max_gram``                         | Defaults to ``2``.                   |
+--------------------------------------+--------------------------------------+
| ``side``                             | Either ``front`` or ``back``.        |
|                                      | Defaults to ``front``.               |
+--------------------------------------+--------------------------------------+

Porter Stem Token Filter
------------------------

A token filter of type ``porter_stem`` that transforms the token stream
as per the Porter stemming algorithm.

Note, the input to the stemming filter must already be in lower case, so
you will need to use `Lower Case Token
Filter <#analysis-lowercase-tokenfilter>`__ or `Lower Case
Tokenizer <#analysis-lowercase-tokenizer>`__ earlier in the analysis
chain in order for this to work properly. For example, when using a
custom analyzer, make sure the ``lowercase`` filter comes before the
``porter_stem`` filter in the list of filters.
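
A minimal custom analyzer that respects this ordering might look as
follows; the analyzer name is illustrative, and ``lowercase`` is
deliberately placed before ``porter_stem``:

.. code:: js

    {
      "settings" : {
        "analysis" : {
          "analyzer" : {
            "my_porter_analyzer" : {
              "tokenizer" : "standard",
              "filter" : [ "lowercase", "porter_stem" ]
            }
          }
        }
      }
    }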

Shingle Token Filter
--------------------

A token filter of type ``shingle`` that constructs shingles (token
n-grams) from a token stream. In other words, it creates combinations of
tokens as a single token. For example, the sentence "please divide this
sentence into shingles" might be tokenized into shingles "please
divide", "divide this", "this sentence", "sentence into", and "into
shingles".

This filter handles position increments > 1 by inserting filler tokens
(tokens with termtext "\_"). It does not handle a position increment of
0.

The following are settings that can be set for a ``shingle`` token
filter type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``max_shingle_size``                 | The maximum shingle size. Defaults   |
|                                      | to ``2``.                            |
+--------------------------------------+--------------------------------------+
| ``min_shingle_size``                 | The minimum shingle size. Defaults   |
|                                      | to ``2``.                            |
+--------------------------------------+--------------------------------------+
| ``output_unigrams``                  | If ``true`` the output will contain  |
|                                      | the input tokens (unigrams) as well  |
|                                      | as the shingles. Defaults to         |
|                                      | ``true``.                            |
+--------------------------------------+--------------------------------------+
| ``output_unigrams_if_no_shingles``   | If ``output_unigrams`` is ``false``  |
|                                      | the output will contain the input    |
|                                      | tokens (unigrams) if no shingles are |
|                                      | available. Note if                   |
|                                      | ``output_unigrams`` is set to        |
|                                      | ``true`` this setting has no effect. |
|                                      | Defaults to ``false``.               |
+--------------------------------------+--------------------------------------+
| ``token_separator``                  | The string to use when joining       |
|                                      | adjacent tokens to form a shingle.   |
|                                      | Defaults to ``" "``.                 |
+--------------------------------------+--------------------------------------+
| ``filler_token``                     | The string to use as a replacement   |
|                                      | for each position at which there is  |
|                                      | no actual token in the stream. For   |
|                                      | instance this string is used if the  |
|                                      | position increment is greater than   |
|                                      | one when a ``stop`` filter is used   |
|                                      | together with the ``shingle``        |
|                                      | filter. Defaults to ``"_"``          |
+--------------------------------------+--------------------------------------+
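
A small sketch of a custom analyzer using a ``shingle`` filter; the
filter and analyzer names are made up, and the settings simply echo the
defaults described above except for ``max_shingle_size``:

.. code:: js

    {
      "settings" : {
        "analysis" : {
          "filter" : {
            "my_shingle" : {
              "type" : "shingle",
              "min_shingle_size" : 2,
              "max_shingle_size" : 3,
              "output_unigrams" : true
            }
          },
          "analyzer" : {
            "my_shingle_analyzer" : {
              "tokenizer" : "standard",
              "filter" : [ "lowercase", "my_shingle" ]
            }
          }
        }
      }
    }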

Stop Token Filter
-----------------

A token filter of type ``stop`` that removes stop words from token
streams.

The following are settings that can be set for a ``stop`` token filter
type:

+---------------------+---------------------------------------------------------------+
| ``stopwords``       | A list of stop words to use. Defaults to ``_english_`` stop   |
|                     | words.                                                        |
+---------------------+---------------------------------------------------------------+
| ``stopwords_path``  | A path (either relative to ``config`` location, or absolute)  |
|                     | to a stopwords file configuration. Each stop word should be   |
|                     | in its own "line" (separated by a line break). The file must  |
|                     | be UTF-8 encoded.                                             |
+---------------------+---------------------------------------------------------------+
| ``ignore_case``     | Set to ``true`` to lower case all words first. Defaults to    |
|                     | ``false``.                                                    |
+---------------------+---------------------------------------------------------------+
| ``remove_trailing`` | Set to ``false`` in order to not ignore the last term of a    |
|                     | search if it is a stop word. This is very useful for the      |
|                     | completion suggester as a query like ``green a`` can be       |
|                     | extended to ``green apple`` even though you remove stop words |
|                     | in general. Defaults to ``true``.                             |
+---------------------+---------------------------------------------------------------+

The ``stopwords`` parameter accepts either an array of stopwords:

.. code:: json

    PUT /my_index
    {
        "settings": {
            "analysis": {
                "filter": {
                    "my_stop": {
                        "type":       "stop",
                        "stopwords": ["and", "is", "the"]
                    }
                }
            }
        }
    }

or a predefined language-specific list:

.. code:: json

    PUT /my_index
    {
        "settings": {
            "analysis": {
                "filter": {
                    "my_stop": {
                        "type":       "stop",
                        "stopwords":  "_english_"
                    }
                }
            }
        }
    }

Elasticsearch provides the following predefined list of languages:

``_arabic_``, ``_armenian_``, ``_basque_``, ``_brazilian_``,
``_bulgarian_``, ``_catalan_``, ``_czech_``, ``_danish_``, ``_dutch_``,
``_english_``, ``_finnish_``, ``_french_``, ``_galician_``,
``_german_``, ``_greek_``, ``_hindi_``, ``_hungarian_``,
``_indonesian_``, ``_irish_``, ``_italian_``, ``_latvian_``,
``_norwegian_``, ``_persian_``, ``_portuguese_``, ``_romanian_``,
``_russian_``, ``_sorani_``, ``_spanish_``, ``_swedish_``, ``_thai_``,
``_turkish_``.

To use an empty stopwords list (that is, to disable stopwords), use ``_none_``.

Word Delimiter Token Filter
---------------------------

Named ``word_delimiter``, it splits words into subwords and performs
optional transformations on subword groups. Words are split into
subwords according to the following rules:

-  split on intra-word delimiters (by default, all non-alphanumeric
   characters): "Wi-Fi" → "Wi", "Fi"

-  split on case transitions: "PowerShot" → "Power", "Shot"

-  split on letter-number transitions: "SD500" → "SD", "500"

-  leading and trailing intra-word delimiters on each subword are
   ignored: "//hello---there, *dude*" → "hello", "there", "dude"

-  trailing "'s" are removed for each subword: "O’Neil’s" → "O", "Neil"

Parameters include:

``generate_word_parts``
    If ``true`` causes parts of words to be generated: "PowerShot" ⇒
    "Power" "Shot". Defaults to ``true``.

``generate_number_parts``
    If ``true`` causes number subwords to be generated: "500-42" ⇒ "500"
    "42". Defaults to ``true``.

``catenate_words``
    If ``true`` causes maximum runs of word parts to be catenated:
    "wi-fi" ⇒ "wifi". Defaults to ``false``.

``catenate_numbers``
    If ``true`` causes maximum runs of number parts to be catenated:
    "500-42" ⇒ "50042". Defaults to ``false``.

``catenate_all``
    If ``true`` causes all subword parts to be catenated: "wi-fi-4000" ⇒
    "wifi4000". Defaults to ``false``.

``split_on_case_change``
    If ``true`` causes "PowerShot" to be two tokens; ("Power-Shot"
    remains two parts regards). Defaults to ``true``.

``preserve_original``
    If ``true`` includes original words in subwords: "500-42" ⇒ "500-42"
    "500" "42". Defaults to ``false``.

``split_on_numerics``
    If ``true`` causes "j2se" to be three tokens; "j" "2" "se". Defaults
    to ``true``.

``stem_english_possessive``
    If ``true`` causes trailing "'s" to be removed for each subword:
    "O’Neil’s" ⇒ "O", "Neil". Defaults to ``true``.

Advanced settings include:

``protected_words``
    A list of words protected from being delimited. Either an array, or
    you can also set ``protected_words_path``, which resolves to a file
    configured with protected words (one on each line). It automatically
    resolves to a ``config/`` based location if it exists.

``type_table``
    A custom type mapping table, for example (when configured using
    ``type_table_path``):

.. code:: js

        # Map the $, %, '.', and ',' characters to DIGIT
        # This might be useful for financial data.
        $ => DIGIT
        % => DIGIT
        . => DIGIT
        \\u002C => DIGIT

        # in some cases you might not want to split on ZWJ
        # this also tests the case where we need a bigger byte[]
        # see http://en.wikipedia.org/wiki/Zero-width_joiner
        \\u200D => ALPHANUM
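
As a sketch of how the parameters above combine, the following settings
define a ``word_delimiter`` filter that also keeps the original token
(the ``my_analyzer`` and ``my_word_delimiter`` names are illustrative):

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "whitespace",
                        "filter" : ["my_word_delimiter", "lowercase"]
                    }
                },
                "filter" : {
                    "my_word_delimiter" : {
                        "type" : "word_delimiter",
                        "catenate_words" : true,
                        "preserve_original" : true
                    }
                }
            }
        }
    }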

Stemmer Token Filter
--------------------

A filter that provides access to (almost) all of the available stemming
token filters through a single unified interface. For example:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "standard",
                        "filter" : ["standard", "lowercase", "my_stemmer"]
                    }
                },
                "filter" : {
                    "my_stemmer" : {
                        "type" : "stemmer",
                        "name" : "light_german"
                    }
                }
            }
        }
    }

The ``language``/``name`` parameter controls the stemmer with the
following available values (the preferred filters are marked in
**bold**):

+------------+---------------------------------------------------------------+
| Arabic     | `**``arabic``** <http://lucene.apache.org/core/4_9_0/analyzer |
|            | s-common/org/apache/lucene/analysis/ar/ArabicStemmer.html>`__ |
+------------+---------------------------------------------------------------+
| Armenian   | `**``armenian``** <http://snowball.tartarus.org/algorithms/ar |
|            | menian/stemmer.html>`__                                       |
+------------+---------------------------------------------------------------+
| Basque     | `**``basque``** <http://snowball.tartarus.org/algorithms/basq |
|            | ue/stemmer.html>`__                                           |
+------------+---------------------------------------------------------------+
| Brazilian  | `**``brazilian``** <http://lucene.apache.org/core/4_9_0/analy |
| Portuguese | zers-common/org/apache/lucene/analysis/br/BrazilianStemmer.ht |
|            | ml>`__                                                        |
+------------+---------------------------------------------------------------+
| Bulgarian  | `**``bulgarian``** <http://members.unine.ch/jacques.savoy/Pap |
|            | ers/BUIR.pdf>`__                                              |
+------------+---------------------------------------------------------------+
| Catalan    | `**``catalan``** <http://snowball.tartarus.org/algorithms/cat |
|            | alan/stemmer.html>`__                                         |
+------------+---------------------------------------------------------------+
| Czech      | `**``czech``** <http://portal.acm.org/citation.cfm?id=1598600 |
|            | >`__                                                          |
+------------+---------------------------------------------------------------+
| Danish     | `**``danish``** <http://snowball.tartarus.org/algorithms/dani |
|            | sh/stemmer.html>`__                                           |
+------------+---------------------------------------------------------------+
| Dutch      | `**``dutch``** <http://snowball.tartarus.org/algorithms/dutch |
|            | /stemmer.html>`__,                                            |
|            | ```dutch_kp`` <http://snowball.tartarus.org/algorithms/kraaij |
|            | _pohlmann/stemmer.html>`__                                    |
+------------+---------------------------------------------------------------+
| English    | `**``english``** <http://snowball.tartarus.org/algorithms/por |
|            | ter/stemmer.html>`__,                                         |
|            | ```light_english`` <http://ciir.cs.umass.edu/pubfiles/ir-35.p |
|            | df>`__,                                                       |
|            | ```minimal_english`` <http://www.researchgate.net/publication |
|            | /220433848_How_effective_is_suffixing>`__,                    |
|            | ```possessive_english`` <http://lucene.apache.org/core/4_9_0/ |
|            | analyzers-common/org/apache/lucene/analysis/en/EnglishPossess |
|            | iveFilter.html>`__,                                           |
|            | ```porter2`` <http://snowball.tartarus.org/algorithms/english |
|            | /stemmer.html>`__,                                            |
|            | ```lovins`` <http://snowball.tartarus.org/algorithms/lovins/s |
|            | temmer.html>`__                                               |
+------------+---------------------------------------------------------------+
| Finnish    | `**``finnish``** <http://snowball.tartarus.org/algorithms/fin |
|            | nish/stemmer.html>`__,                                        |
|            | ```light_finnish`` <http://clef.isti.cnr.it/2003/WN_web/22.pd |
|            | f>`__                                                         |
+------------+---------------------------------------------------------------+
| French     | ```french`` <http://snowball.tartarus.org/algorithms/french/s |
|            | temmer.html>`__,                                              |
|            | `**``light_french``** <http://dl.acm.org/citation.cfm?id=1141 |
|            | 523>`__,                                                      |
|            | ```minimal_french`` <http://dl.acm.org/citation.cfm?id=318984 |
|            | >`__                                                          |
+------------+---------------------------------------------------------------+
| Galician   | `**``galician``** <http://bvg.udc.es/recursos_lingua/stemming |
|            | .jsp>`__,                                                     |
|            | ```minimal_galician`` <http://bvg.udc.es/recursos_lingua/stem |
|            | ming.jsp>`__                                                  |
|            | (Plural step only)                                            |
+------------+---------------------------------------------------------------+
| German     | ```german`` <http://snowball.tartarus.org/algorithms/german/s |
|            | temmer.html>`__,                                              |
|            | ```german2`` <http://snowball.tartarus.org/algorithms/german2 |
|            | /stemmer.html>`__,                                            |
|            | `**``light_german``** <http://dl.acm.org/citation.cfm?id=1141 |
|            | 523>`__,                                                      |
|            | ```minimal_german`` <http://members.unine.ch/jacques.savoy/cl |
|            | ef/morpho.pdf>`__                                             |
+------------+---------------------------------------------------------------+
| Greek      | `**``greek``** <http://sais.se/mthprize/2007/ntais2007.pdf>`_ |
|            | _                                                             |
+------------+---------------------------------------------------------------+
| Hindi      | `**``hindi``** <http://computing.open.ac.uk/Sites/EACLSouthAs |
|            | ia/Papers/p6-Ramanathan.pdf>`__                               |
+------------+---------------------------------------------------------------+
| Hungarian  | `**``hungarian``** <http://snowball.tartarus.org/algorithms/h |
|            | ungarian/stemmer.html>`__,                                    |
|            | ```light_hungarian`` <http://dl.acm.org/citation.cfm?id=11415 |
|            | 23&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181>`__         |
+------------+---------------------------------------------------------------+
| Indonesian | `**``indonesian``** <http://www.illc.uva.nl/Publications/Rese |
|            | archReports/MoL-2003-02.text.pdf>`__                          |
+------------+---------------------------------------------------------------+
| Irish      | `**``irish``** <http://snowball.tartarus.org/otherapps/oregan |
|            | /intro.html>`__                                               |
+------------+---------------------------------------------------------------+
| Italian    | ```italian`` <http://snowball.tartarus.org/algorithms/italian |
|            | /stemmer.html>`__,                                            |
|            | `**``light_italian``** <http://www.ercim.eu/publication/ws-pr |
|            | oceedings/CLEF2/savoy.pdf>`__                                 |
+------------+---------------------------------------------------------------+
| Kurdish    | `**``sorani``** <http://lucene.apache.org/core/4_9_0/analyzer |
| (Sorani)   | s-common/org/apache/lucene/analysis/ckb/SoraniStemmer.html>`_ |
|            | _                                                             |
+------------+---------------------------------------------------------------+
| Latvian    | `**``latvian``** <http://lucene.apache.org/core/4_9_0/analyze |
|            | rs-common/org/apache/lucene/analysis/lv/LatvianStemmer.html>` |
|            | __                                                            |
+------------+---------------------------------------------------------------+
| Norwegian  | `**``norwegian``** <http://snowball.tartarus.org/algorithms/n |
| (Bokmål)   | orwegian/stemmer.html>`__,                                    |
|            | `**``light_norwegian``** <http://lucene.apache.org/core/4_9_0 |
|            | /analyzers-common/org/apache/lucene/analysis/no/NorwegianLigh |
|            | tStemmer.html>`__,                                            |
|            | ```minimal_norwegian`` <http://lucene.apache.org/core/4_9_0/a |
|            | nalyzers-common/org/apache/lucene/analysis/no/NorwegianMinima |
|            | lStemmer.html>`__                                             |
+------------+---------------------------------------------------------------+
| Norwegian  | `**``light_nynorsk``** <http://lucene.apache.org/core/4_9_0/a |
| (Nynorsk)  | nalyzers-common/org/apache/lucene/analysis/no/NorwegianLightS |
|            | temmer.html>`__,                                              |
|            | ```minimal_nynorsk`` <http://lucene.apache.org/core/4_9_0/ana |
|            | lyzers-common/org/apache/lucene/analysis/no/NorwegianMinimalS |
|            | temmer.html>`__                                               |
+------------+---------------------------------------------------------------+
| Portuguese | ```portuguese`` <http://snowball.tartarus.org/algorithms/port |
|            | uguese/stemmer.html>`__,                                      |
|            | `**``light_portuguese``** <http://dl.acm.org/citation.cfm?id= |
|            | 1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181>`__,   |
|            | ```minimal_portuguese`` <http://www.inf.ufrgs.br/~buriol/pape |
|            | rs/Orengo_CLEF07.pdf>`__,                                     |
|            | ```portuguese_rslp`` <http://www.inf.ufrgs.br/\~viviane/rslp/ |
|            | index.htm>`__                                                 |
+------------+---------------------------------------------------------------+
| Romanian   | `**``romanian``** <http://snowball.tartarus.org/algorithms/ro |
|            | manian/stemmer.html>`__                                       |
+------------+---------------------------------------------------------------+
| Russian    | `**``russian``** <http://snowball.tartarus.org/algorithms/rus |
|            | sian/stemmer.html>`__,                                        |
|            | ```light_russian`` <http://doc.rero.ch/lm.php?url=1000%2C43%2 |
|            | C4%2C20091209094227-CA%2FDolamic_Ljiljana_-_Indexing_and_Sear |
|            | ching_Strategies_for_the_Russian_20091209.pdf>`__             |
+------------+---------------------------------------------------------------+
| Spanish    | ```spanish`` <http://snowball.tartarus.org/algorithms/spanish |
|            | /stemmer.html>`__,                                            |
|            | `**``light_spanish``** <http://www.ercim.eu/publication/ws-pr |
|            | oceedings/CLEF2/savoy.pdf>`__                                 |
+------------+---------------------------------------------------------------+
| Swedish    | `**``swedish``** <http://snowball.tartarus.org/algorithms/swe |
|            | dish/stemmer.html>`__,                                        |
|            | ```light_swedish`` <http://clef.isti.cnr.it/2003/WN_web/22.pd |
|            | f>`__                                                         |
+------------+---------------------------------------------------------------+
| Turkish    | `**``turkish``** <http://snowball.tartarus.org/algorithms/tur |
|            | kish/stemmer.html>`__                                         |
+------------+---------------------------------------------------------------+

Stemmer Override Token Filter
-----------------------------

Overrides stemming algorithms by applying a custom mapping, and then
protects these terms from being modified by stemmers. It must be placed
before any stemming filters.

Rules are separated by ``=>``

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``rules``                            | A list of mapping rules to use.      |
+--------------------------------------+--------------------------------------+
| ``rules_path``                       | A path (either relative to           |
|                                      | ``config`` location, or absolute) to |
|                                      | a list of mappings.                  |
+--------------------------------------+--------------------------------------+

Here is an example:

.. code:: js

    index :
        analysis :
            analyzer :
                myAnalyzer :
                    type : custom
                    tokenizer : standard
                    filter : [lowercase, custom_stems, porter_stem]
            filter:
                custom_stems:
                    type: stemmer_override
                    rules_path : analysis/custom_stems.txt
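
The referenced ``analysis/custom_stems.txt`` would contain one rule per
line, with the token on the left of ``=>`` and the replacement stem on
the right; for instance (illustrative contents):

.. code:: js

    running => run
    mice => mouse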

Keyword Marker Token Filter
---------------------------

Protects words from being modified by stemmers. Must be placed before
any stemming filters.

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``keywords``                         | A list of words to use.              |
+--------------------------------------+--------------------------------------+
| ``keywords_path``                    | A path (either relative to           |
|                                      | ``config`` location, or absolute) to |
|                                      | a list of words.                     |
+--------------------------------------+--------------------------------------+
| ``ignore_case``                      | Set to ``true`` to lower case all    |
|                                      | words first. Defaults to ``false``.  |
+--------------------------------------+--------------------------------------+

Here is an example:

.. code:: js

    index :
        analysis :
            analyzer :
                myAnalyzer :
                    type : custom
                    tokenizer : standard
                    filter : [lowercase, protwords, porter_stem]
            filter :
                protwords :
                    type : keyword_marker
                    keywords_path : analysis/protwords.txt

Keyword Repeat Token Filter
---------------------------

The ``keyword_repeat`` token filter emits each incoming token twice,
once as a keyword and once as a non-keyword, to allow an unstemmed
version of a term to be indexed side by side with the stemmed version of
the term. Given the nature of this filter, each token that isn’t
transformed by a subsequent stemmer will be indexed twice. Therefore,
consider adding a ``unique`` filter with ``only_on_same_position`` set
to ``true`` to drop unnecessary duplicates.

Here is an example:

.. code:: js

    index :
        analysis :
            analyzer :
                myAnalyzer :
                    type : custom
                    tokenizer : standard
                    filter : [lowercase, keyword_repeat, porter_stem, unique_stem]
            filter :
                unique_stem:
                    type: unique
                    only_on_same_position : true

KStem Token Filter
------------------

The ``kstem`` token filter is a high performance filter for English. All
terms must already be lowercased (use the ``lowercase`` filter) for this
filter to work correctly.
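
A minimal configuration sketch (the ``my_kstem_analyzer`` name is
illustrative); the built-in ``kstem`` filter is simply placed after
``lowercase``:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "my_kstem_analyzer" : {
                        "tokenizer" : "standard",
                        "filter" : ["standard", "lowercase", "kstem"]
                    }
                }
            }
        }
    }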

Snowball Token Filter
---------------------

A filter that stems words using a Snowball-generated stemmer. The
``language`` parameter controls the stemmer with the following available
values: ``Armenian``, ``Basque``, ``Catalan``, ``Danish``, ``Dutch``,
``English``, ``Finnish``, ``French``, ``German``, ``German2``,
``Hungarian``, ``Italian``, ``Kp``, ``Lovins``, ``Norwegian``,
``Porter``, ``Portuguese``, ``Romanian``, ``Russian``, ``Spanish``,
``Swedish``, ``Turkish``.

For example:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "standard",
                        "filter" : ["standard", "lowercase", "my_snow"]
                    }
                },
                "filter" : {
                    "my_snow" : {
                        "type" : "snowball",
                        "language" : "Lovins"
                    }
                }
            }
        }
    }

Phonetic Token Filter
---------------------

The ``phonetic`` token filter is provided as a plugin and located
`here <https://github.com/elasticsearch/elasticsearch-analysis-phonetic>`__.

Synonym Token Filter
--------------------

The ``synonym`` token filter allows you to easily handle synonyms during
the analysis process. Synonyms are configured using a configuration file.
Here is an example:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "synonym" : {
                        "tokenizer" : "whitespace",
                        "filter" : ["synonym"]
                    }
                },
                "filter" : {
                    "synonym" : {
                        "type" : "synonym",
                        "synonyms_path" : "analysis/synonym.txt"
                    }
                }
            }
        }
    }

The above configures a ``synonym`` filter, with a path of
``analysis/synonym.txt`` (relative to the ``config`` location). The
``synonym`` analyzer is then configured with the filter. Additional
settings are: ``ignore_case`` (defaults to ``false``), and ``expand``
(defaults to ``true``).

The ``tokenizer`` parameter controls the tokenizer that will be used to
tokenize the synonyms, and defaults to the ``whitespace`` tokenizer.

Two synonym formats are supported: Solr, WordNet.

**Solr synonyms**

The following is a sample format of the file:

.. code:: js

    # blank lines and lines starting with pound are comments.

    #Explicit mappings match any token sequence on the LHS of "=>"
    #and replace with all alternatives on the RHS.  These types of mappings
    #ignore the expand parameter in the schema.
    #Examples:
    i-pod, i pod => ipod,
    sea biscuit, sea biscit => seabiscuit

    #Equivalent synonyms may be separated with commas and give
    #no explicit mapping.  In this case the mapping behavior will
    #be taken from the expand parameter in the schema.  This allows
    #the same synonym file to be used in different synonym handling strategies.
    #Examples:
    ipod, i-pod, i pod
    foozball , foosball
    universe , cosmos

    # If expand==true, "ipod, i-pod, i pod" is equivalent
    # to the explicit mapping:
    ipod, i-pod, i pod => ipod, i-pod, i pod
    # If expand==false, "ipod, i-pod, i pod" is equivalent
    # to the explicit mapping:
    ipod, i-pod, i pod => ipod

    #multiple synonym mapping entries are merged.
    foo => foo bar
    foo => baz
    #is equivalent to
    foo => foo bar, baz

You can also define synonyms for the filter directly in the
configuration file (note use of ``synonyms`` instead of
``synonyms_path``):

.. code:: js

    {
        "filter" : {
            "synonym" : {
                "type" : "synonym",
                "synonyms" : [
                    "i-pod, i pod => ipod",
                    "universe, cosmos"
                ]
            }
        }
    }

However, it is recommended to define large synonym sets in a file using
``synonyms_path``.

**WordNet synonyms**

Synonyms based on `WordNet <http://wordnet.princeton.edu/>`__ format can
be declared using ``format``:

.. code:: js

    {
        "filter" : {
            "synonym" : {
                "type" : "synonym",
                "format" : "wordnet",
                "synonyms" : [
                    "s(100000001,1,'abstain',v,1,0).",
                    "s(100000001,2,'refrain',v,1,0).",
                    "s(100000001,3,'desist',v,1,0)."
                ]
            }
        }
    }

Using ``synonyms_path`` to define WordNet synonyms in a file is
supported as well.

Compound Word Token Filter
--------------------------

Token filters that decompose compound words. There are two types
available: ``dictionary_decompounder`` and
``hyphenation_decompounder``.

The following are settings that can be set for a compound word token
filter type:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``word_list``                        | A list of words to use.              |
+--------------------------------------+--------------------------------------+
| ``word_list_path``                   | A path (either relative to           |
|                                      | ``config`` location, or absolute) to |
|                                      | a list of words.                     |
+--------------------------------------+--------------------------------------+
| ``hyphenation_patterns_path``        | A path (either relative to           |
|                                      | ``config`` location, or absolute) to |
|                                      | a FOP XML hyphenation pattern file.  |
|                                      | (See                                 |
|                                      | http://offo.sourceforge.net/hyphenat |
|                                      | ion/)                                |
|                                      | Required for                         |
|                                      | ``hyphenation_decompounder``.        |
+--------------------------------------+--------------------------------------+
| ``min_word_size``                    | Minimum word size (Integer).         |
|                                      | Defaults to 5.                       |
+--------------------------------------+--------------------------------------+
| ``min_subword_size``                 | Minimum subword size (Integer).      |
|                                      | Defaults to 2.                       |
+--------------------------------------+--------------------------------------+
| ``max_subword_size``                 | Maximum subword size (Integer).      |
|                                      | Defaults to 15.                      |
+--------------------------------------+--------------------------------------+
| ``only_longest_match``               | Only matching the longest (Boolean). |
|                                      | Defaults to ``false``.               |
+--------------------------------------+--------------------------------------+

Here is an example:

.. code:: js

    index :
        analysis :
            analyzer :
                myAnalyzer2 :
                    type : custom
                    tokenizer : standard
                    filter : [myTokenFilter1, myTokenFilter2]
            filter :
                myTokenFilter1 :
                    type : dictionary_decompounder
                    word_list: [one, two, three]
                myTokenFilter2 :
                    type : hyphenation_decompounder
                    word_list_path: path/to/words.txt
                    max_subword_size : 22

Reverse Token Filter
--------------------

A token filter of type ``reverse`` that simply reverses each token.

Elision Token Filter
--------------------

A token filter which removes elisions. For example, "l’avion" (the
plane) will be tokenized as "avion" (plane).

Accepts an ``articles`` setting, which is a set of stop word articles.
For example:

.. code:: js

    "index" : {
        "analysis" : {
            "analyzer" : {
                "default" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "elision"]
                }
            },
            "filter" : {
                "elision" : {
                    "type" : "elision",
                    "articles" : ["l", "m", "t", "qu", "n", "s", "j"]
                }
            }
        }
    }

Truncate Token Filter
---------------------

The ``truncate`` token filter can be used to truncate tokens to a
specific length. This can come in handy with keyword (single token)
based mapped fields that are used for sorting, in order to reduce memory
usage.

It accepts a ``length`` parameter which controls the number of
characters to truncate to; it defaults to ``10``.
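
A minimal configuration sketch (the ``truncate_20`` filter name is
illustrative):

.. code:: js

    {
        "index" : {
            "analysis" : {
                "filter" : {
                    "truncate_20" : {
                        "type" : "truncate",
                        "length" : 20
                    }
                }
            }
        }
    }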

Unique Token Filter
-------------------

The ``unique`` token filter can be used to only index unique tokens
during analysis. By default it is applied to the whole token stream. If
``only_on_same_position`` is set to ``true``, it will only remove
duplicate tokens on the same position.

Pattern Capture Token Filter
----------------------------

The ``pattern_capture`` token filter, unlike the ``pattern`` tokenizer,
emits a token for every capture group in the regular expression.
Patterns are not anchored to the beginning and end of the string, so
each pattern can match multiple times, and matches are allowed to
overlap.

For instance, a pattern like:

.. code:: js

    "(([a-z]+)(\d*))"

when matched against:

.. code:: js

    "abc123def456"

would produce the tokens: [ ``abc123``, ``abc``, ``123``, ``def456``,
``def``, ``456`` ]

If ``preserve_original`` is set to ``true`` (the default) then it would
also emit the original token: ``abc123def456``.

This is particularly useful for indexing text like camel-case code, e.g.
``stripHTML``, where a user may search for ``"strip html"`` or
``"striphtml"``:

.. code:: js

    curl -XPUT localhost:9200/test/  -d '
    {
       "settings" : {
          "analysis" : {
             "filter" : {
                "code" : {
                   "type" : "pattern_capture",
                   "preserve_original" : 1,
                   "patterns" : [
                      "(\\p{Ll}+|\\p{Lu}\\p{Ll}+|\\p{Lu}+)",
                      "(\\d+)"
                   ]
                }
             },
             "analyzer" : {
                "code" : {
                   "tokenizer" : "pattern",
                   "filter" : [ "code", "lowercase" ]
                }
             }
          }
       }
    }
    '

When used to analyze the text

.. code:: js

    import static org.apache.commons.lang.StringEscapeUtils.escapeHtml

this emits the tokens: [ ``import``, ``static``, ``org``, ``apache``,
``commons``, ``lang``, ``stringescapeutils``, ``string``, ``escape``,
``utils``, ``escapehtml``, ``escape``, ``html`` ]

Another example is analyzing email addresses:

.. code:: js

    curl -XPUT localhost:9200/test/  -d '
    {
       "settings" : {
          "analysis" : {
             "filter" : {
                "email" : {
                   "type" : "pattern_capture",
                   "preserve_original" : 1,
                   "patterns" : [
                      "(\\w+)",
                      "(\\p{L}+)",
                      "(\\d+)",
                      "@(.+)"
                   ]
                }
             },
             "analyzer" : {
                "email" : {
                   "tokenizer" : "uax_url_email",
                   "filter" : [ "email", "lowercase",  "unique" ]
                }
             }
          }
       }
    }
    '

When the above analyzer is used on an email address like:

.. code:: js

    john-smith_123@foo-bar.com

it would produce the following tokens: [ ``john-smith_123``,
``foo-bar.com``, ``john``, ``smith_123``, ``smith``, ``123``, ``foo``,
``foo-bar.com``, ``bar``, ``com`` ]

Multiple patterns are required to allow overlapping captures, but this
also means that the patterns are less dense and easier to understand.

**Note:** All tokens are emitted in the same position, and with the same
character offsets, so when combined with highlighting, the whole
original token will be highlighted, not just the matching subset. For
instance, querying the above email address for ``"smith"`` would
highlight:

.. code:: js

      <em>john-smith_123@foo-bar.com</em>

not:

.. code:: js

      john-<em>smith</em>_123@foo-bar.com

Pattern Replace Token Filter
----------------------------

The ``pattern_replace`` token filter makes it easy to handle string
replacements based on a regular expression. The regular expression is
defined using the ``pattern`` parameter, and the replacement string can
be provided using the ``replacement`` parameter (which supports
referencing the original text, as explained
`here <http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)>`__).
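
As an illustrative sketch (the ``digits_to_hash`` name and the pattern
are made up for this example), such a filter could be defined like this:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "filter" : {
                    "digits_to_hash" : {
                        "type" : "pattern_replace",
                        "pattern" : "\\d+",
                        "replacement" : "#"
                    }
                }
            }
        }
    }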

Trim Token Filter
-----------------

The ``trim`` token filter trims the whitespace surrounding a token.

Limit Token Count Token Filter
------------------------------

Limits the number of tokens that are indexed per document and field.

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``max_token_count``                  | The maximum number of tokens that    |
|                                      | should be indexed per document and   |
|                                      | field. The default is ``1``.         |
+--------------------------------------+--------------------------------------+
| ``consume_all_tokens``               | If set to ``true`` the filter        |
|                                      | exhausts the stream even if          |
|                                      | ``max_token_count`` tokens have been |
|                                      | consumed already. The default is     |
|                                      | ``false``.                           |
+--------------------------------------+--------------------------------------+

Here is an example:

.. code:: js

    index :
        analysis :
            analyzer :
                myAnalyzer :
                    type : custom
                    tokenizer : standard
                    filter : [lowercase, five_token_limit]
            filter :
                five_token_limit :
                    type : limit
                    max_token_count : 5

Hunspell Token Filter
---------------------

Basic support for hunspell stemming. Hunspell dictionaries will be
picked up from a dedicated hunspell directory on the filesystem
(defaults to ``<path.conf>/hunspell``). Each dictionary is expected to
have its own directory named after its associated locale (language).
This dictionary directory is expected to hold a single ``*.aff`` and one
or more ``*.dic`` files (all of which will automatically be picked up).
For example, assuming the default hunspell location is used, the
following directory layout will define the ``en_US`` dictionary:

.. code:: js

    - conf
        |-- hunspell
        |    |-- en_US
        |    |    |-- en_US.dic
        |    |    |-- en_US.aff

The location of the hunspell directory can be configured using the
``indices.analysis.hunspell.dictionary.location`` setting in
*elasticsearch.yml*.

Each dictionary can be configured with one setting:

``ignore_case``
    If true, dictionary matching will be case insensitive (defaults to
    ``false``)

This setting can be configured globally in ``elasticsearch.yml`` using

-  ``indices.analysis.hunspell.dictionary.ignore_case``

or for specific dictionaries:

-  ``indices.analysis.hunspell.dictionary.en_US.ignore_case``.

It is also possible to add a ``settings.yml`` file under the dictionary
directory which holds these settings (this will override any other
settings defined in ``elasticsearch.yml``).

One can use the hunspell stem filter by configuring it in the analysis
settings:

.. code:: js

    {
        "analysis" : {
            "analyzer" : {
                "en" : {
                    "tokenizer" : "standard",
                    "filter" : [ "lowercase", "en_US" ]
                }
            },
            "filter" : {
                "en_US" : {
                    "type" : "hunspell",
                    "locale" : "en_US",
                    "dedup" : true
                }
            }
        }
    }

The hunspell token filter accepts four options:

``locale``
    A locale for this filter. If this is unset, the ``lang`` or
    ``language`` setting is used instead, so one of these has to be set.

``dictionary``
    The name of a dictionary. The path to your hunspell dictionaries
    should be configured via
    ``indices.analysis.hunspell.dictionary.location`` before.

``dedup``
    If only unique terms should be returned, this needs to be set to
    ``true``. Defaults to ``true``.

``longest_only``
    If only the longest term should be returned, set this to ``true``.
    Defaults to ``false``: all possible stems are returned.

    **Note**

    As opposed to the snowball stemmers (which are algorithm based) this
    is a dictionary lookup based stemmer and therefore the quality of
    the stemming is determined by the quality of the dictionary.

**Dictionary loading**

By default, the configured
(``indices.analysis.hunspell.dictionary.location``) or default Hunspell
directory (``config/hunspell/``) is checked for dictionaries when the
node starts up, and any dictionaries are automatically loaded.

Dictionary loading can be deferred until the dictionaries are actually
used, by setting ``indices.analysis.hunspell.dictionary.lazy`` to
``true`` in the config file.
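
For instance, the following *elasticsearch.yml* snippet (the path is
illustrative) sets a custom dictionary location and defers loading:

.. code:: js

    indices.analysis.hunspell.dictionary.location: /path/to/hunspell
    indices.analysis.hunspell.dictionary.lazy: true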

**References**

Hunspell is a spell checker and morphological analyzer designed for
languages with rich morphology and complex word compounding and
character encoding.

1. Wikipedia, http://en.wikipedia.org/wiki/Hunspell

2. Source code, http://hunspell.sourceforge.net/

3. Open Office Hunspell dictionaries,
   http://wiki.openoffice.org/wiki/Dictionaries

4. Mozilla Hunspell dictionaries,
   https://addons.mozilla.org/en-US/firefox/language-tools/

5. Chromium Hunspell dictionaries,
   http://src.chromium.org/viewvc/chrome/trunk/deps/third_party/hunspell_dictionaries/

Common Grams Token Filter
-------------------------

Token filter that generates bigrams for frequently occurring terms.
Single terms are still indexed. It can be used as an alternative to the
`Stop Token Filter <#analysis-stop-tokenfilter>`__ when we don’t want to
completely ignore common terms.

For example, assuming "the", "is" and "a" are common words, the text
"the quick brown is a fox" will be tokenized as "the", "the\_quick",
"quick", "brown", "brown\_is", "is\_a", "a\_fox", "fox".

When ``query_mode`` is enabled, the token filter removes common words
and single terms followed by a common word. This parameter should be
enabled in the search analyzer.

For example, the query "the quick brown is a fox" will be tokenized as
"the\_quick", "quick", "brown\_is", "is\_a", "a\_fox", "fox".

The following are settings that can be set:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``common_words``                     | A list of common words to use.       |
+--------------------------------------+--------------------------------------+
| ``common_words_path``                | A path (either relative to           |
|                                      | ``config`` location, or absolute) to |
|                                      | a list of common words. Each word    |
|                                      | should be in its own "line"          |
|                                      | (separated by a line break). The     |
|                                      | file must be UTF-8 encoded.          |
+--------------------------------------+--------------------------------------+
| ``ignore_case``                      | If true, common words matching will  |
|                                      | be case insensitive (defaults to     |
|                                      | ``false``).                          |
+--------------------------------------+--------------------------------------+
| ``query_mode``                       | Generates bigrams then removes       |
|                                      | common words and single terms        |
|                                      | followed by a common word (defaults  |
|                                      | to ``false``).                       |
+--------------------------------------+--------------------------------------+

Note that either the ``common_words`` or the ``common_words_path`` setting is required.

Here is an example:

.. code:: js

    index :
        analysis :
            analyzer :
                index_grams :
                    tokenizer : whitespace
                    filter : [common_grams]
                search_grams :
                    tokenizer : whitespace
                    filter : [common_grams_query]
            filter :
                common_grams :
                    type : common_grams
                    common_words: [a, an, the]
                common_grams_query :
                    type : common_grams
                    query_mode: true
                    common_words: [a, an, the]

Normalization Token Filter
--------------------------

There are several token filters available which try to normalize special
characters of a certain language.

+------------+---------------------------------------------------------------+
| Arabic     | ```arabic_normalization`` <http://lucene.apache.org/core/4_9_ |
|            | 0/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormal |
|            | izer.html>`__                                                 |
+------------+---------------------------------------------------------------+
| German     | ```german_normalization`` <http://lucene.apache.org/core/4_9_ |
|            | 0/analyzers-common/org/apache/lucene/analysis/de/GermanNormal |
|            | izationFilter.html>`__                                        |
+------------+---------------------------------------------------------------+
| Hindi      | ```hindi_normalization`` <http://lucene.apache.org/core/4_9_0 |
|            | /analyzers-common/org/apache/lucene/analysis/hi/HindiNormaliz |
|            | er.html>`__                                                   |
+------------+---------------------------------------------------------------+
| Indic      | ```indic_normalization`` <http://lucene.apache.org/core/4_9_0 |
|            | /analyzers-common/org/apache/lucene/analysis/in/IndicNormaliz |
|            | er.html>`__                                                   |
+------------+---------------------------------------------------------------+
| Kurdish    | ```sorani_normalization`` <http://lucene.apache.org/core/4_9_ |
| (Sorani)   | 0/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNorma |
|            | lizer.html>`__                                                |
+------------+---------------------------------------------------------------+
| Persian    | ```persian_normalization`` <http://lucene.apache.org/core/4_9 |
|            | _0/analyzers-common/org/apache/lucene/analysis/fa/PersianNorm |
|            | alizer.html>`__                                               |
+------------+---------------------------------------------------------------+
| Scandinavi | ```scandinavian_normalization`` <http://lucene.apache.org/cor |
| an         | e/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellan |
|            | eous/ScandinavianNormalizationFilter.html>`__,                |
|            | ```scandinavian_folding`` <http://lucene.apache.org/core/4_9_ |
|            | 0/analyzers-common/org/apache/lucene/analysis/miscellaneous/S |
|            | candinavianFoldingFilter.html>`__                             |
+------------+---------------------------------------------------------------+
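
As a sketch, one of these filters is simply added to an analyzer's
filter chain (the ``my_german`` analyzer name is illustrative):

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "my_german" : {
                        "tokenizer" : "standard",
                        "filter" : ["lowercase", "german_normalization"]
                    }
                }
            }
        }
    }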

CJK Width Token Filter
----------------------

The ``cjk_width`` token filter normalizes CJK width differences:

-  Folds fullwidth ASCII variants into the equivalent basic Latin

-  Folds halfwidth Katakana variants into the equivalent Kana

    **Note**

    This token filter can be viewed as a subset of NFKC/NFKD Unicode
    normalization. See the ICU Analysis Plugin for full normalization
    support.
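
A minimal configuration sketch (the ``cjk_text`` analyzer name is
illustrative); the built-in ``cjk_width`` filter is used by name:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "cjk_text" : {
                        "tokenizer" : "standard",
                        "filter" : ["cjk_width", "lowercase"]
                    }
                }
            }
        }
    }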

CJK Bigram Token Filter
-----------------------

The ``cjk_bigram`` token filter forms bigrams out of the CJK terms that
are generated by the ```standard``
tokenizer <#analysis-standard-tokenizer>`__ or the ``icu_tokenizer``
(see the ICU Analysis Plugin).

By default, when a CJK character has no adjacent characters to form a
bigram, it is output in unigram form. If you always want to output both
unigrams and bigrams, set the ``output_unigrams`` flag to ``true``. This
can be used for a combined unigram+bigram approach.

Bigrams are generated for characters in ``han``, ``hiragana``,
``katakana`` and ``hangul``, but bigrams can be disabled for particular
scripts with the ``ignore_scripts`` parameter. All non-CJK input is
passed through unmodified.

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "han_bigrams" : {
                        "tokenizer" : "standard",
                        "filter" : ["han_bigrams_filter"]
                    }
                },
                "filter" : {
                    "han_bigrams_filter" : {
                        "type" : "cjk_bigram",
                        "ignore_scripts": [
                            "hiragana",
                            "katakana",
                            "hangul"
                        ],
                        "output_unigrams" : true
                    }
                }
            }
        }
    }

Delimited Payload Token Filter
------------------------------

Named ``delimited_payload_filter``. Splits tokens into tokens and
payload whenever a delimiter character is found.

Example: "the\|1 quick\|2 fox\|3" is split per default int to tokens
``fox``, ``quick`` and ``the`` with payloads ``1``, ``2`` and ``3``
respectively.

Parameters:

``delimiter``
    Character used for splitting the tokens. Default is ``|``.

``encoding``
    The type of the payload. ``int`` for integer, ``float`` for float
    and ``identity`` for characters. Default is ``float``.
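
A configuration sketch using a custom delimiter and integer payloads
(the ``my_payloads`` name is illustrative):

.. code:: js

    {
        "index" : {
            "analysis" : {
                "filter" : {
                    "my_payloads" : {
                        "type" : "delimited_payload_filter",
                        "delimiter" : "+",
                        "encoding" : "int"
                    }
                }
            }
        }
    }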

Keep Words Token Filter
-----------------------

A token filter of type ``keep`` that only keeps tokens with text
contained in a predefined set of words. The set of words can be defined
in the settings or loaded from a text file containing one word per line.

**Options**

``keep_words``
    a list of words to keep

``keep_words_path``
    a path to a words file

``keep_words_case``
    a boolean indicating whether to lower case the words (defaults to
    ``false``)

**Settings example**

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "standard",
                        "filter" : ["standard", "lowercase", "words_till_three"]
                    },
                    "my_analyzer1" : {
                        "tokenizer" : "standard",
                        "filter" : ["standard", "lowercase", "words_on_file"]
                    }
                },
                "filter" : {
                    "words_till_three" : {
                        "type" : "keep",
                        "keep_words" : [ "one", "two", "three"]
                    },
                    "words_on_file" : {
                        "type" : "keep",
                        "keep_words_path" : "/path/to/word/file"
                    }
                }
            }
        }
    }

Keep Types Token Filter
-----------------------

A token filter of type ``keep_types`` that only keeps tokens with a
token type contained in a predefined set.

**Options**

+------------+---------------------------------------------------------------+
| types      | a list of types to keep                                       |
+------------+---------------------------------------------------------------+

**Settings example**

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "standard",
                        "filter" : ["standard", "lowercase", "extract_numbers"]
                    }
                },
                "filter" : {
                    "extract_numbers" : {
                        "type" : "keep_types",
                        "types" : [ "<NUM>" ]
                    }
                }
            }
        }
    }

Classic Token Filter
--------------------

The ``classic`` token filter does optional post-processing of terms that
are generated by the ```classic``
tokenizer <#analysis-classic-tokenizer>`__.

This filter removes the English possessive from the end of words, and it
removes dots from acronyms.

Apostrophe Token Filter
-----------------------

The ``apostrophe`` token filter strips all characters after an
apostrophe, including the apostrophe itself.

Character Filters
=================

Character filters are used to preprocess the string of characters before
it is passed to the `tokenizer <#analysis-tokenizers>`__. A character
filter may be used to strip out HTML markup, or to convert ``"&"``
characters to the word ``"and"``.

Elasticsearch has built-in character filters which can be used to build
`custom analyzers <#analysis-custom-analyzer>`__.

Mapping Char Filter
-------------------

A char filter of type ``mapping`` that replaces characters of an
analyzed text with the given mapping.

Here is a sample configuration:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "char_filter" : {
                    "my_mapping" : {
                        "type" : "mapping",
                        "mappings" : ["ph=>f", "qu=>k"]
                    }
                },
                "analyzer" : {
                    "custom_with_char_filter" : {
                        "tokenizer" : "standard",
                        "char_filter" : ["my_mapping"]
                    }
                }
            }
        }
    }

Alternatively, the ``mappings_path`` setting can specify a file
containing the list of character mappings:

.. code:: js

    ph => f
    qu => k

HTML Strip Char Filter
----------------------

A char filter of type ``html_strip`` that strips out HTML elements from
an analyzed text.
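
A minimal configuration sketch (the ``html_text`` analyzer name is
illustrative):

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "html_text" : {
                        "tokenizer" : "standard",
                        "char_filter" : ["html_strip"]
                    }
                }
            }
        }
    }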

Pattern Replace Char Filter
---------------------------

The ``pattern_replace`` char filter allows the use of a regex to
manipulate the characters in a string before analysis. The regular
expression is defined using the ``pattern`` parameter, and the
replacement string can be provided using the ``replacement`` parameter
(supporting referencing the original text, as explained
`here <http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)>`__).
For more information check the `Lucene
documentation <http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilter.html>`__.

Here is a sample configuration:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "char_filter" : {
                    "my_pattern":{
                        "type":"pattern_replace",
                        "pattern":"sample(.*)",
                        "replacement":"replacedSample $1"
                    }
                },
                "analyzer" : {
                    "custom_with_char_filter" : {
                        "tokenizer" : "standard",
                        "char_filter" : ["my_pattern"]
                    }
                }
            }
        }
    }

ICU Analysis Plugin
===================

The `ICU <http://icu-project.org/>`__ analysis plugin allows for unicode
normalization, collation and folding. The plugin is called
`elasticsearch-analysis-icu <https://github.com/elasticsearch/elasticsearch-analysis-icu>`__.

The plugin includes the following analysis components:

**ICU Normalization**

Normalizes characters as explained
`here <http://userguide.icu-project.org/transforms/normalization>`__. It
registers itself by default under ``icu_normalizer`` or
``icuNormalizer`` using the default settings. Allows for the ``name``
parameter to be provided, which can include the following values:
``nfc``, ``nfkc``, and ``nfkc_cf``. Here are sample settings:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "normalization" : {
                        "tokenizer" : "keyword",
                        "filter" : ["icu_normalizer"]
                    }
                }
            }
        }
    }

**ICU Folding**

Folding of unicode characters based on ``UTR#30``. It registers itself
under ``icu_folding`` and ``icuFolding`` names. The filter also does
lowercasing, which means the lowercase filter can normally be left out.
Sample setting:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "folding" : {
                        "tokenizer" : "keyword",
                        "filter" : ["icu_folding"]
                    }
                }
            }
        }
    }

**Filtering**

The folding can be filtered by a set of unicode characters with the
parameter ``unicodeSetFilter``. This is useful for a
non-internationalized search engine that wants to retain a set of
national characters which are primary letters in a specific language.
See the syntax for the UnicodeSet
`here <http://icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html>`__.

The following example exempts Swedish characters from the folding. Note
that the filtered characters are NOT lowercased, which is why we add the
``lowercase`` filter below.

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "folding" : {
                        "tokenizer" : "standard",
                        "filter" : ["my_icu_folding", "lowercase"]
                    }
                },
                "filter" : {
                    "my_icu_folding" : {
                        "type" : "icu_folding",
                        "unicodeSetFilter" : "[^åäöÅÄÖ]"
                    }
                }
            }
        }
    }

**ICU Collation**

Uses the collation token filter. Collation rules can be specified either
with the ``rules`` parameter (defined
`here <http://www.icu-project.org/userguide/Collate_Customization.html>`__),
whose value can either point to a file location (relative to the config
location) or be expressed inline in the settings, or with the
``language`` parameter (further specialized by country and variant). By
default it registers under ``icu_collation`` or ``icuCollation`` and uses
the default locale.

Here are sample settings:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "collation" : {
                        "tokenizer" : "keyword",
                        "filter" : ["icu_collation"]
                    }
                }
            }
        }
    }

And here is a sample of custom collation:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "collation" : {
                        "tokenizer" : "keyword",
                        "filter" : ["myCollator"]
                    }
                },
                "filter" : {
                    "myCollator" : {
                        "type" : "icu_collation",
                        "language" : "en"
                    }
                }
            }
        }
    }

**Options**

``strength``
    The strength property determines the minimum level of difference
    considered significant during comparison. The default strength for
    the Collator is ``tertiary``, unless specified otherwise by the
    locale used to create the Collator. Possible values: ``primary``,
    ``secondary``, ``tertiary``, ``quaternary`` or ``identical``. See
    the `ICU
    Collation <http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Collator.html>`__
    documentation for a more detailed explanation of the specific
    values.

``decomposition``
    Possible values: ``no`` or ``canonical``. Defaults to ``no``.
    Setting this decomposition property to ``canonical`` allows the
    Collator to handle un-normalized text properly, producing the same
    results as if the text were normalized. If ``no`` is set, it is the
    user’s responsibility to ensure that all text is already in the
    appropriate form before a comparison or before getting a
    CollationKey. Adjusting decomposition mode allows the user to select
    between faster and more complete collation behavior. Since a great
    many of the world’s languages do not require text normalization,
    most locales set ``no`` as the default decomposition mode.

**Expert options:**

``alternate``
    Possible values: ``shifted`` or ``non-ignorable``. Sets the
    alternate handling for strength ``quaternary`` to be either shifted
    or non-ignorable, which boils down to ignoring punctuation and
    whitespace.

``caseLevel``
    Possible values: ``true`` or ``false``. Default is ``false``.
    Whether case level sorting is required. When strength is set to
    ``primary`` this will ignore accent differences.

``caseFirst``
    Possible values: ``lower`` or ``upper``. Useful to control which
    case is sorted first when case is not ignored for strength
    ``tertiary``.

``numeric``
    Possible values: ``true`` or ``false``. Whether digits are sorted
    according to their numeric representation. For example, the value
    ``egg-9`` is sorted before the value ``egg-21``. Defaults to
    ``false``.

``variableTop``
    Single character or contraction. Controls what is variable for
    ``alternate``.

``hiraganaQuaternaryMode``
    Possible values: ``true`` or ``false``. Defaults to ``false``.
    Whether to distinguish between Katakana and Hiragana characters in
    ``quaternary`` strength.

**ICU Tokenizer**

Breaks text into words according to `UAX #29: Unicode Text
Segmentation <http://www.unicode.org/reports/tr29/>`__.

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "collation" : {
                        "tokenizer" : "icu_tokenizer",
                    }
                }
            }
        }
    }

**ICU Normalization CharFilter**

Normalizes characters as explained
`here <http://userguide.icu-project.org/transforms/normalization>`__. It
registers itself by default under ``icu_normalizer`` or
``icuNormalizer`` using the default settings. Allows for the name
parameter to be provided which can include the following values:
``nfc``, ``nfkc``, and ``nfkc_cf``. Allows for the mode parameter to be
provided which can include the following values: ``compose`` and
``decompose``. Use ``decompose`` with ``nfc`` or ``nfkc``, to get
``nfd`` or ``nfkd``, respectively. Here are sample settings:

.. code:: js

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "collation" : {
                        "tokenizer" : "keyword",
                        "char_filter" : ["icu_normalizer"]
                    }
                }
            }
        }
    }

Cluster
=======

**Shards Allocation**

Shards allocation is the process of allocating shards to nodes. This can
happen during initial recovery, replica allocation, rebalancing, or
handling nodes being added or removed.

The following settings may be used:

``cluster.routing.allocation.allow_rebalance``
    Controls when rebalancing will happen, based on the total state of
    all the index shards in the cluster. ``always``,
    ``indices_primaries_active``, and ``indices_all_active`` are
    allowed, defaulting to ``indices_all_active`` to reduce chatter
    during initial recovery.

``cluster.routing.allocation.cluster_concurrent_rebalance``
    Controls how many concurrent shard rebalances are allowed cluster
    wide. Defaults to ``2``.

``cluster.routing.allocation.node_initial_primaries_recoveries``
    Controls the number of initial recoveries of primaries that are
    allowed per node. Since the local gateway is used most of the time,
    those recoveries are fast and more of them can be handled per node
    without creating load.

``cluster.routing.allocation.node_concurrent_recoveries``
    How many concurrent recoveries are allowed to happen on a node.
    Defaults to ``2``.

``cluster.routing.allocation.enable``
    Controls shard allocation for all indices, by allowing specific
    kinds of shard to be allocated.

    Can be set to:

    -  ``all`` - (default) Allows shard allocation for all kinds of
       shards.

    -  ``primaries`` - Allows shard allocation only for primary shards.

    -  ``new_primaries`` - Allows shard allocation only for primary
       shards for new indices.

    -  ``none`` - No shard allocation of any kind is allowed for any
       index.

``cluster.routing.rebalance.enable``
    Controls shard rebalance for all indices, by allowing specific kinds
    of shard to be rebalanced.

    Can be set to:

    -  ``all`` - (default) Allows shard balancing for all kinds of
       shards.

    -  ``primaries`` - Allows shard balancing only for primary shards.

    -  ``replicas`` - Allows shard balancing only for replica shards.

    -  ``none`` - No shard balancing of any kind is allowed for any
       index.

``cluster.routing.allocation.same_shard.host``
    Performs a check to prevent the allocation of multiple instances of
    the same shard on a single host, based on host name and
    host address. Defaults to ``false``, meaning that no check is
    performed by default. This setting only applies if multiple nodes
    are started on the same machine.

``indices.recovery.concurrent_streams``
    The number of streams to open (on a **node** level) to recover a
    shard from a peer shard. Defaults to ``3``.

**Shard Allocation Awareness**

Cluster allocation awareness allows you to configure shard and replica
allocation across generic attributes associated with the nodes. Let’s
explain it through an example:

Assume we have several racks. When we start a node, we can configure an
attribute called ``rack_id`` (any attribute name works), for example,
here is a sample config:

::

    node.rack_id: rack_one

The above sets an attribute called ``rack_id`` for the relevant node
with a value of ``rack_one``. Now, we need to configure the ``rack_id``
attribute as one of the awareness allocation attributes (set it in the
config of **all** master eligible nodes):

::

    cluster.routing.allocation.awareness.attributes: rack_id

The above means that the ``rack_id`` attribute will be used to do
awareness based allocation of shards and their replicas. For example,
let’s say we start 2 nodes with ``node.rack_id`` set to ``rack_one``,
and deploy a single index with 5 shards and 1 replica. The index will
be fully deployed on the current nodes (5 shards and 1 replica each, a
total of 10 shards).

Now, if we start two more nodes, with ``node.rack_id`` set to
``rack_two``, shards will relocate to even out the number of shards
across the nodes, but a shard and its replica will not be allocated to
nodes with the same ``rack_id`` value.

The awareness attributes can hold several values, for example:

::

    cluster.routing.allocation.awareness.attributes: rack_id,zone

**NOTE**: When using awareness attributes, shards will not be allocated
to nodes that don’t have values set for those attributes.

**Forced Awareness**

Sometimes, we know in advance the number of values an awareness
attribute can have, and moreover, we never want more replicas than
needed to be allocated on a specific group of nodes that share the same
awareness attribute value. For that, we can force awareness on
specific attributes.

For example, let’s say we have an awareness attribute called ``zone``,
and we know we are going to have two zones, ``zone1`` and ``zone2``.
Here is how we can force awareness on a node:

.. code:: js

    cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
    cluster.routing.allocation.awareness.attributes: zone

Now, let’s say we start 2 nodes with ``node.zone`` set to ``zone1`` and
create an index with 5 shards and 1 replica. The index will be created,
but only 5 shards will be allocated (with no replicas). Only when we
start more nodes with ``node.zone`` set to ``zone2`` will the replicas
be allocated.

**Automatic Preference When Searching / GETing**

When executing a search, or doing a get, the node receiving the request
will prefer to execute the request on shards that exist on nodes that
have the same attribute values as the executing node.

**Realtime Settings Update**

The settings can be updated using the `cluster update settings
API <#cluster-update-settings>`__ on a live cluster.
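
For example, here is an illustrative sketch of temporarily disabling all
shard allocation through a transient setting (the host and the value are
placeholders; adjust them to your environment):

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "cluster.routing.allocation.enable" : "none"
        }
    }'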

**Shard Allocation Filtering**

Allows control over the allocation of indices on nodes based on
include/exclude filters. The filters can be set both on the index level
and on the cluster level. Let’s start with an example of setting it on
the index level:

Let’s say we have 4 nodes, each with a specific attribute called ``tag``
associated with it (the name of the attribute can be anything). Each
node has a specific value associated with ``tag``. Node 1 has a setting
``node.tag: value1``, Node 2 a setting of ``node.tag: value2``, and so
on.

We can create an index that will only deploy on nodes that have ``tag``
set to ``value1`` and ``value2`` by setting
``index.routing.allocation.include.tag`` to ``value1,value2``. For
example:

.. code:: js

    curl -XPUT localhost:9200/test/_settings -d '{
          "index.routing.allocation.include.tag" : "value1,value2"
    }'

On the other hand, we can create an index that will be deployed on all
nodes except for nodes with a ``tag`` of value ``value3`` by setting
``index.routing.allocation.exclude.tag`` to ``value3``. For example:

.. code:: js

    curl -XPUT localhost:9200/test/_settings -d '{
          "index.routing.allocation.exclude.tag" : "value3"
    }'

``index.routing.allocation.require.*`` can be used to specify a number
of rules, all of which MUST match in order for a shard to be allocated
to a node. This is in contrast to ``include`` which will include a node
if ANY rule matches.

The ``include``, ``exclude`` and ``require`` values can have generic
simple matching wildcards, for example, ``value1*``. A special attribute
name called ``_ip`` can be used to match on node ip values. In addition
``_host`` attribute can be used to match on either the node’s hostname
or its ip address. Similarly ``_name`` and ``_id`` attributes can be
used to match on node name and node id accordingly.

Obviously a node can have several attributes associated with it, and
both the attribute name and value are controlled in the setting. For
example, here is a sample of several node configurations:

.. code:: js

    node.group1: group1_value1
    node.group2: group2_value4

In the same manner, ``include``, ``exclude`` and ``require`` can work
against several attributes, for example:

.. code:: js

    curl -XPUT localhost:9200/test/_settings -d '{
        "index.routing.allocation.include.group1" : "xxx"
        "index.routing.allocation.include.group2" : "yyy",
        "index.routing.allocation.exclude.group3" : "zzz",
        "index.routing.allocation.require.group4" : "aaa"
    }'

The provided settings can also be updated in real time using the update
settings API, allowing to "move" indices (shards) around in realtime.

Cluster wide filtering can also be defined, and be updated in real time
using the cluster update settings API. This setting can come in handy
for things like decommissioning nodes (even if the replica count is set
to 0). Here is a sample of how to decommission a node based on ``_ip``
address:

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
        }
    }'

Discovery
=========

The discovery module is responsible for discovering nodes within a
cluster, as well as electing a master node.

Note, Elasticsearch is a peer to peer based system; nodes communicate
with one another directly when operations are delegated / broadcast. All
the main APIs (index, delete, search) do not communicate with the master
node. The responsibility of the master node is to maintain the global
cluster state, and act if nodes join or leave the cluster by reassigning
shards. Each time a cluster state is changed, the state is made known to
the other nodes in the cluster (the manner depends on the actual
discovery implementation).

**Settings**

The ``cluster.name`` setting allows clusters to be separated from one
another. The default value for the cluster name is ``elasticsearch``,
though it is recommended to change this to reflect the logical group
name of the cluster running.

Azure Discovery
---------------

Azure discovery allows to use the Azure APIs to perform automatic
discovery (similar to multicast). Please check the `plugin
website <https://github.com/elasticsearch/elasticsearch-cloud-azure>`__
to find the full documentation.

EC2 Discovery
-------------

EC2 discovery allows to use the EC2 APIs to perform automatic discovery
(similar to multicast). Please check the `plugin
website <https://github.com/elasticsearch/elasticsearch-cloud-aws>`__ to
find the full documentation.

Google Compute Engine Discovery
-------------------------------

Google Compute Engine (GCE) discovery allows to use the GCE APIs to
perform automatic discovery (similar to multicast). Please check the
`plugin
website <https://github.com/elasticsearch/elasticsearch-cloud-gce>`__ to
find the full documentation.

Zen Discovery
-------------

The zen discovery is the built in discovery module for elasticsearch and
the default. It provides both multicast and unicast discovery, as well
as being easily extended to support cloud environments.

The zen discovery is integrated with other modules, for example, all
communication between nodes is done using the
`transport <#modules-transport>`__ module.

It is separated into several sub modules, which are explained below:

**Ping**

This is the process where a node uses the discovery mechanisms to find
other nodes. There is support for both multicast and unicast based
discovery (these mechanisms can be used in conjunction as well).

**Multicast**

Multicast ping discovery of other nodes is done by sending one or more
multicast requests which existing nodes will receive and respond to. It
provides the following settings with the
``discovery.zen.ping.multicast`` prefix:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``group``                            | The group address to use. Defaults   |
|                                      | to ``224.2.2.4``.                    |
+--------------------------------------+--------------------------------------+
| ``port``                             | The port to use. Defaults to         |
|                                      | ``54328``.                           |
+--------------------------------------+--------------------------------------+
| ``ttl``                              | The ttl of the multicast message.    |
|                                      | Defaults to ``3``.                   |
+--------------------------------------+--------------------------------------+
| ``address``                          | The address to bind to, defaults to  |
|                                      | ``null`` which means it will bind to |
|                                      | all available network interfaces.    |
+--------------------------------------+--------------------------------------+
| ``enabled``                          | Whether multicast ping discovery is  |
|                                      | enabled. Defaults to ``true``.       |
+--------------------------------------+--------------------------------------+
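
For example, here is a sketch of an ``elasticsearch.yml`` snippet that
tweaks a few of these settings (the values are purely illustrative):

.. code:: yaml

    discovery.zen.ping.multicast.group: 224.2.2.4
    discovery.zen.ping.multicast.ttl: 4
    discovery.zen.ping.multicast.enabled: true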

**Unicast**

The unicast discovery allows for discovery when multicast is not
enabled. It basically requires a list of hosts to use that will act as
gossip routers. It provides the following settings with the
``discovery.zen.ping.unicast`` prefix:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``hosts``                            | Either an array setting or a comma   |
|                                      | delimited setting. Each value is     |
|                                      | either in the form of ``host:port``, |
|                                      | or in the form of                    |
|                                      | ``host[port1-port2]``.               |
+--------------------------------------+--------------------------------------+

The unicast discovery uses the `transport <#modules-transport>`__ module
to perform the discovery.
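
As an illustration, here is a sketch that disables multicast and points
unicast discovery at two seed hosts (the host names are placeholders):

.. code:: yaml

    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["host1:9300", "host2[9300-9400]"]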

**Master Election**

As part of the ping process a master of the cluster is either elected or
joined to. This is done automatically. The
``discovery.zen.ping_timeout`` (which defaults to ``3s``) allows for the
tweaking of election time to handle cases of slow or congested networks
(higher values reduce the chance of failure). Once a node joins, it
will send a join request to the master (``discovery.zen.join_timeout``)
with a timeout defaulting to 20 times the ping timeout.

When the master node stops or has encountered a problem, the cluster
nodes start pinging again and will elect a new master. This pinging
round also serves as a protection against (partial) network failures
where a node may incorrectly think that the master has failed. In this case
the node will simply hear from other nodes about the currently active
master.

Nodes can be excluded from becoming a master by setting ``node.master``
to ``false``. Note, once a node is a client node (``node.client`` set to
``true``), it will not be allowed to become a master (``node.master`` is
automatically set to ``false``).

The ``discovery.zen.minimum_master_nodes`` sets the minimum number of
master eligible nodes a node should "see" in order to win a master
election. It must be set to a quorum of your master eligible nodes. It
is recommended to avoid having only two master eligible nodes, since a
quorum of two is two. Therefore, a loss of either master eligible node
will result in an inoperable cluster.
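
For example, in a cluster with three master eligible nodes, a sketch of
the relevant settings might look like this (the ping timeout value is
illustrative):

.. code:: yaml

    discovery.zen.minimum_master_nodes: 2
    discovery.zen.ping_timeout: 10s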

**Fault Detection**

There are two fault detection processes running. The first is by the
master, to ping all the other nodes in the cluster and verify that they
are alive. On the other end, each node pings the master to verify that
it is still alive or whether an election process needs to be initiated.

The following settings control the fault detection process using the
``discovery.zen.fd`` prefix:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``ping_interval``                    | How often a node gets pinged.        |
|                                      | Defaults to ``1s``.                  |
+--------------------------------------+--------------------------------------+
| ``ping_timeout``                     | How long to wait for a ping          |
|                                      | response, defaults to ``30s``.       |
+--------------------------------------+--------------------------------------+
| ``ping_retries``                     | How many ping failures / timeouts    |
|                                      | cause a node to be considered        |
|                                      | failed. Defaults to ``3``.           |
+--------------------------------------+--------------------------------------+
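
As an illustration, here is a sketch of relaxed fault detection settings
for a slow or congested network (the values are examples only):

.. code:: yaml

    discovery.zen.fd.ping_interval: 5s
    discovery.zen.fd.ping_timeout: 60s
    discovery.zen.fd.ping_retries: 5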

**External Multicast**

The multicast discovery also supports external multicast requests to
discover nodes. The external client can send a request to the multicast
IP/group and port, in the form of:

.. code:: js

    {
        "request" : {
            "cluster_name": "test_cluster"
        }
    }

And the response will be similar to node info response (with node level
information only, including transport/http addresses, and node
attributes):

.. code:: js

    {
        "response" : {
            "cluster_name" : "test_cluster",
            "transport_address" : "...",
            "http_address" : "...",
            "attributes" : {
                "..."
            }
        }
    }

Note, internal multicast discovery can be disabled while keeping
external multicast requests working, by leaving
``discovery.zen.ping.multicast.enabled`` set to ``true`` (the default)
but setting ``discovery.zen.ping.multicast.ping.enabled`` to ``false``.

**Cluster state updates**

The master node is the only node in a cluster that can make changes to
the cluster state. The master node processes one cluster state update at
a time, applies the required changes and publishes the updated cluster
state to all the other nodes in the cluster. Each node receives the
publish message, updates its own cluster state and replies to the master
node, which waits for all nodes to respond, up to a timeout, before
going ahead and processing the next update in the queue. The
``discovery.zen.publish_timeout`` is set by default to 30 seconds and
can be changed dynamically through the `cluster update settings
API <#cluster-update-settings>`__.
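
For example, an illustrative sketch of lowering the publish timeout
through a transient cluster settings update (the host and the value are
placeholders):

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "discovery.zen.publish_timeout" : "10s"
        }
    }'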

**No master block**

For a node to be fully operational, it must have an active master. The
``discovery.zen.no_master_block`` settings controls what operations
should be rejected when there is no active master.

The ``discovery.zen.no_master_block`` setting has two valid options:

+------------+---------------------------------------------------------------+
| ``all``    | All operations on the node—i.e. both read & writes—will be    |
|            | rejected. This also applies for api cluster state read or     |
|            | write operations, like the get index settings, put mapping    |
|            | and cluster state api.                                        |
+------------+---------------------------------------------------------------+
| ``write``  | (default) Write operations will be rejected. Read operations  |
|            | will succeed, based on the last known cluster configuration.  |
|            | This may result in partial reads of stale data as this node   |
|            | may be isolated from the rest of the cluster.                 |
+------------+---------------------------------------------------------------+

The ``discovery.zen.no_master_block`` setting doesn’t apply to node
based APIs (for example the cluster stats, node info and node stats
APIs), which will not be blocked and will try to execute on any node
possible.
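
For example, a sketch of rejecting all operations (reads and writes)
while no master is active:

.. code:: yaml

    discovery.zen.no_master_block: all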

Gateway
=======

The gateway module allows one to store the state of the cluster meta
data across full cluster restarts. The cluster meta data mainly holds
all the indices created with their respective (index level) settings and
explicit type mappings.

Each time the cluster meta data changes (for example, when an index is
added or deleted), those changes will be persisted using the gateway.
When the cluster first starts up, the state will be read from the
gateway and applied.

The gateway set on the node level will automatically control the index
gateway that will be used. For example, if the ``local`` gateway is
used, then automatically, each index created on the node will also use
its own respective index level ``local`` gateway. In this case, if an
index should not persist its state, it should be explicitly set to
``none`` (which is the only other value it can be set to).

The default gateway used is the `local <#modules-gateway-local>`__
gateway.

**Recovery After Nodes / Time**

In many cases, the actual cluster meta data should only be recovered
after specific nodes have started in the cluster, or a timeout has
passed. This is handy when restarting the cluster, and each node local
index storage still exists to be reused and not recovered from the
gateway (which reduces the time it takes to recover from the gateway).

The ``gateway.recover_after_nodes`` setting (which accepts a number)
controls after how many data and master eligible nodes within the
cluster recovery will start. The ``gateway.recover_after_data_nodes``
and ``gateway.recover_after_master_nodes`` setting work in a similar
fashion, except they consider only the number of data nodes and only the
number of master nodes respectively. The ``gateway.recover_after_time``
setting (which accepts a time value) sets the time to wait till recovery
happens once all ``gateway.recover_after...nodes`` conditions are met.

The ``gateway.expected_nodes`` allows to set how many data and master
eligible nodes are expected to be in the cluster, and once met, the
``gateway.recover_after_time`` is ignored and recovery starts. Setting
``gateway.expected_nodes`` also defaults ``gateway.recover_after_time``
to ``5m``. The ``gateway.expected_data_nodes`` and
``gateway.expected_master_nodes`` settings are also supported. For
example, setting:

.. code:: js

    gateway:
        recover_after_time: 5m
        expected_nodes: 2

in an expected 2 node cluster will cause recovery to start 5 minutes
after the first node is up, but once there are 2 nodes in the cluster,
recovery will begin immediately (without waiting).

Note, once the meta data has been recovered from the gateway (which
indices to create, mappings and so on), then this setting is no longer
effective until the next full restart of the cluster.

Operations are blocked while the cluster meta data has not been
recovered, in order not to mix them with the actual cluster meta data
that will be recovered once the conditions above have been met.

Local Gateway
-------------

The local gateway allows for recovery of the full cluster state and
indices from the local storage of each node, and does not require a
common node level shared storage.

Note, unlike shared gateway types, persistence to the local
gateway is **not** done in an async manner. Once an operation is
performed, the data is there for the local gateway to recover it in case
of full cluster failure.

It is important to configure the ``gateway.recover_after_nodes`` setting
to include most of the expected nodes to be started after a full cluster
restart. This will ensure that the latest cluster state is recovered.
For example:

.. code:: js

    gateway:
        recover_after_nodes: 3
        expected_nodes: 5

**Dangling indices**

When a node joins the cluster, any shards/indices stored in its local
``data/`` directory which do not already exist in the cluster will be
imported into the cluster by default. This functionality has two
purposes:

1. If a new master node is started which is unaware of the other indices
   in the cluster, adding the old nodes will cause the old indices to be
   imported, instead of being deleted.

2. An old index can be added to an existing cluster by copying it to the
   ``data/`` directory of a new node, starting the node and letting it
   join the cluster. Once the index has been replicated to other nodes
   in the cluster, the new node can be shut down and removed.

The import of dangling indices can be controlled with the
``gateway.local.auto_import_dangled`` which accepts:

+------------+---------------------------------------------------------------+
| ``yes``    | Import dangling indices into the cluster (default).           |
+------------+---------------------------------------------------------------+
| ``close``  | Import dangling indices into the cluster state, but leave     |
|            | them closed.                                                  |
+------------+---------------------------------------------------------------+
| ``no``     | Delete dangling indices after                                 |
|            | ``gateway.local.dangling_timeout``, which defaults to 2       |
|            | hours.                                                        |
+------------+---------------------------------------------------------------+
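
As an illustration, a sketch of importing dangling indices in a closed
state and shortening the deletion timeout (the timeout value is an
example only):

.. code:: yaml

    gateway.local.auto_import_dangled: close
    gateway.local.dangling_timeout: 4h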

HTTP
====

The http module allows to expose **elasticsearch** APIs over HTTP.

The http mechanism is completely asynchronous in nature, meaning that
there is no blocking thread waiting for a response. The benefit of using
asynchronous communication for HTTP is solving the `C10k
problem <http://en.wikipedia.org/wiki/C10k_problem>`__.

When possible, consider using `HTTP keep
alive <http://en.wikipedia.org/wiki/Keepalive#HTTP_Keepalive>`__ when
connecting for better performance and try to get your favorite client
not to do `HTTP
chunking <http://en.wikipedia.org/wiki/Chunked_transfer_encoding>`__.

**Settings**

The following are the settings that can be configured for HTTP:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``http.port``                        | A bind port range. Defaults to       |
|                                      | ``9200-9300``.                       |
+--------------------------------------+--------------------------------------+
| ``http.bind_host``                   | The host address to bind the HTTP    |
|                                      | service to. Defaults to              |
|                                      | ``http.host`` (if set) or            |
|                                      | ``network.bind_host``.               |
+--------------------------------------+--------------------------------------+
| ``http.publish_host``                | The host address to publish for HTTP |
|                                      | clients to connect to. Defaults to   |
|                                      | ``http.host`` (if set) or            |
|                                      | ``network.publish_host``.            |
+--------------------------------------+--------------------------------------+
| ``http.host``                        | Used to set the ``http.bind_host``   |
|                                      | and the ``http.publish_host``        |
|                                      | Defaults to ``http.host`` or         |
|                                      | ``network.host``.                    |
+--------------------------------------+--------------------------------------+
| ``http.max_content_length``          | The max content of an HTTP request.  |
|                                      | Defaults to ``100mb``                |
+--------------------------------------+--------------------------------------+
| ``http.max_initial_line_length``     | The max length of an HTTP URL.       |
|                                      | Defaults to ``4kb``                  |
+--------------------------------------+--------------------------------------+
| ``http.compression``                 | Support for compression when         |
|                                      | possible (with Accept-Encoding).     |
|                                      | Defaults to ``false``.               |
+--------------------------------------+--------------------------------------+
| ``http.compression_level``           | Defines the compression level to     |
|                                      | use. Defaults to ``6``.              |
+--------------------------------------+--------------------------------------+
| ``http.cors.enabled``                | Enable or disable cross-origin       |
|                                      | resource sharing, i.e. whether a     |
|                                      | browser on another origin can do     |
|                                      | requests to Elasticsearch. Defaults  |
|                                      | to ``false``.                        |
+--------------------------------------+--------------------------------------+
| ``http.cors.allow-origin``           | Which origins to allow. Defaults to  |
|                                      | ``*``, i.e. any origin. If you       |
|                                      | prepend and append a ``/`` to the    |
|                                      | value, this will be treated as a     |
|                                      | regular expression, allowing you to  |
|                                      | support HTTP and HTTPS. For example, |
|                                      | using                                |
|                                      | ``/https?:\/\/localhost(:[0-9]+)?/`` |
|                                      | would return the request header      |
|                                      | appropriately in both cases.         |
+--------------------------------------+--------------------------------------+
| ``http.cors.max-age``                | Browsers send a "preflight"          |
|                                      | OPTIONS-request to determine CORS    |
|                                      | settings. ``max-age`` defines how    |
|                                      | long the result should be cached     |
|                                      | for. Defaults to ``1728000`` (20     |
|                                      | days)                                |
+--------------------------------------+--------------------------------------+
| ``http.cors.allow-methods``          | Which methods to allow. Defaults to  |
|                                      | ``OPTIONS, HEAD, GET, POST, PUT,     |
|                                      | DELETE``.                            |
+--------------------------------------+--------------------------------------+
| ``http.cors.allow-headers``          | Which headers to allow. Defaults to  |
|                                      | ``X-Requested-With, Content-Type,    |
|                                      | Content-Length``.                    |
+--------------------------------------+--------------------------------------+
| ``http.cors.allow-credentials``      | Whether the                          |
|                                      | ``Access-Control-Allow-Credentials`` |
|                                      | header should be returned. Note:     |
|                                      | This header is only returned, when   |
|                                      | the setting is set to ``true``.      |
|                                      | Defaults to ``false``                |
+--------------------------------------+--------------------------------------+
| ``http.pipelining``                  | Enable or disable HTTP pipelining,   |
|                                      | defaults to ``true``.                |
+--------------------------------------+--------------------------------------+
| ``http.pipelining.max_events``       | The maximum number of events to be   |
|                                      | queued up in memory before a HTTP    |
|                                      | connection is closed, defaults to    |
|                                      | ``10000``.                           |
+--------------------------------------+--------------------------------------+

It also uses the common `network settings <#modules-network>`__.
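
For illustration, here is a sketch of a few HTTP settings in
``elasticsearch.yml`` (the values are examples, not recommendations):

.. code:: yaml

    http.port: 9200
    http.max_content_length: 200mb
    http.compression: true
    http.cors.enabled: true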

**Disable HTTP**

The http module can be completely disabled and not started by setting
``http.enabled`` to ``false``. This makes sense when creating non `data
nodes <#modules-node>`__ which accept HTTP requests, and communicate
with data nodes using the internal `transport <#modules-transport>`__.

Indices
=======

The indices module allows you to control settings that are globally
managed for all indices.

**Indexing Buffer**

The indexing buffer setting allows to control how much memory will be
allocated for the indexing process. It is a global setting that bubbles
down to all the different shards allocated on a specific node.

The ``indices.memory.index_buffer_size`` accepts either a percentage or
a byte size value. It defaults to ``10%``, meaning that ``10%`` of the
total memory allocated to a node will be used as the indexing buffer
size. This amount is then divided between all the different shards.
Also, if percentage is used, it is possible to set
``min_index_buffer_size`` (defaults to ``48mb``) and
``max_index_buffer_size`` (defaults to unbounded).

The ``indices.memory.min_shard_index_buffer_size`` allows to set a hard
lower limit for the memory allocated per shard for its own indexing
buffer. It defaults to ``4mb``.
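
As an illustration, a sketch of raising the indexing buffer on a node
(the values are examples, not recommendations):

.. code:: yaml

    indices.memory.index_buffer_size: 30%
    indices.memory.min_shard_index_buffer_size: 12mb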

**TTL interval**

You can dynamically set the ``indices.ttl.interval``, which controls
how often expired documents will be automatically deleted. The default
value is ``60s``.

Deletions are processed in bulk. You can set ``indices.ttl.bulk_size``
to fit your needs. The default value is ``10000``.
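
For example, an illustrative sketch of changing the interval on a live
cluster through a transient settings update (the host and the value are
placeholders):

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "indices.ttl.interval" : "30s"
        }
    }'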


**Recovery**

The following settings can be set to manage the recovery policy:

``indices.recovery.concurrent_streams``
    Defaults to ``3``.

``indices.recovery.file_chunk_size``
    Defaults to ``512kb``.

``indices.recovery.translog_ops``
    Defaults to ``1000``.

``indices.recovery.translog_size``
    Defaults to ``512kb``.

``indices.recovery.compress``
    Defaults to ``true``.

``indices.recovery.max_bytes_per_sec``
    Defaults to ``20mb``.
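
As an illustration, assuming these settings are updated dynamically on a
live cluster (they can also be set in ``elasticsearch.yml``), here is a
sketch of raising the recovery throttle (the host and the value are
placeholders):

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "indices.recovery.max_bytes_per_sec" : "50mb"
        }
    }'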

**Store level throttling**

The following settings can be set to control the store throttling:

``indices.store.throttle.type``
    Can be ``merge`` (default), ``none`` or ``all``.

``indices.store.throttle.max_bytes_per_sec``
    Defaults to ``20mb``.
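
For example, a sketch of throttling all store activity in
``elasticsearch.yml`` (the values are illustrative):

.. code:: yaml

    indices.store.throttle.type: all
    indices.store.throttle.max_bytes_per_sec: 50mb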

memcached
=========

The memcached module allows to expose **elasticsearch** APIs over the
memcached protocol (as closely as possible).

It is provided as a plugin called ``transport-memcached``; installing it
is explained
`here <https://github.com/elasticsearch/elasticsearch-transport-memcached>`__.
Another option is to download the memcached plugin and place it under
the ``plugins`` directory.

The plugin supports both the binary and the text memcached protocols,
automatically detecting the correct one to use.

**Mapping REST to Memcached Protocol**

Memcached commands are mapped to REST and handled by the same generic
REST layer in elasticsearch. Here is a list of the memcached commands
supported:

**GET**

The memcached ``GET`` command maps to a REST ``GET``. The key used is
the URI (with parameters). The main downside is the fact that the
memcached ``GET`` does not allow a body in the request (and ``SET`` does
not allow returning a result…). For this reason, most REST APIs (like
search) also accept the "source" as a URI parameter.

**SET**

The memcached ``SET`` command maps to a REST ``POST``. The key used is
the URI (with parameters), and the body maps to the REST body.

**DELETE**

The memcached ``DELETE`` command maps to a REST ``DELETE``. The key used
is the URI (with parameters).

**QUIT**

The memcached ``QUIT`` command is supported and disconnects the client.

**Settings**

The following are the settings that can be configured for memcached:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``memcached.port``                   | A bind port range. Defaults to       |
|                                      | ``11211-11311``.                     |
+--------------------------------------+--------------------------------------+

It also uses the common `network settings <#modules-network>`__.

**Disable memcached**

The memcached module can be completely disabled and not started by
setting ``memcached.enabled`` to ``false``. By default it is enabled
once it is detected as a plugin.

Network Settings
================

There are several modules within a Node that use network based
configuration, for example, the `transport <#modules-transport>`__ and
`http <#modules-http>`__ modules. Node level network settings allow you
to set common settings that will be shared among all network based modules
(unless explicitly overridden in each module).

The ``network.bind_host`` setting allows to control the host different
network components will bind on. By default, the bind host will be
``anyLocalAddress`` (typically ``0.0.0.0`` or ``::0``).

The ``network.publish_host`` setting allows to control the host the node
will publish itself within the cluster so other nodes will be able to
connect to it. Of course, this can’t be the ``anyLocalAddress``, and by
default, it will be the first non loopback address (if possible), or the
local address.

The ``network.host`` setting is a simple setting to automatically set
both ``network.bind_host`` and ``network.publish_host`` to the same host
value.

Both settings can be configured with either an explicit host address or
a host name. The settings also accept logical setting values explained
in the following table:

+--------------------------------------+--------------------------------------+
| Logical Host Setting Value           | Description                          |
+======================================+======================================+
| ``_local_``                          | Will be resolved to the local ip     |
|                                      | address.                             |
+--------------------------------------+--------------------------------------+
| ``_non_loopback_``                   | The first non loopback address.      |
+--------------------------------------+--------------------------------------+
| ``_non_loopback:ipv4_``              | The first non loopback IPv4 address. |
+--------------------------------------+--------------------------------------+
| ``_non_loopback:ipv6_``              | The first non loopback IPv6 address. |
+--------------------------------------+--------------------------------------+
| ``_[networkInterface]_``             | Resolves to the ip address of the    |
|                                      | provided network interface. For      |
|                                      | example ``_en0_``.                   |
+--------------------------------------+--------------------------------------+
| ``_[networkInterface]:ipv4_``        | Resolves to the ipv4 address of the  |
|                                      | provided network interface. For      |
|                                      | example ``_en0:ipv4_``.              |
+--------------------------------------+--------------------------------------+
| ``_[networkInterface]:ipv6_``        | Resolves to the ipv6 address of the  |
|                                      | provided network interface. For      |
|                                      | example ``_en0:ipv6_``.              |
+--------------------------------------+--------------------------------------+
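
For example, a sketch of binding and publishing on the IPv4 address of a
specific network interface (the interface name is a placeholder):

.. code:: yaml

    network.host: _en0:ipv4_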

When the ``cloud-aws`` plugin is installed, the following are also
allowed as valid network host settings:

+--------------------------------------+--------------------------------------+
| EC2 Host Value                       | Description                          |
+======================================+======================================+
| ``_ec2:privateIpv4_``                | The private IP address (ipv4) of the |
|                                      | machine.                             |
+--------------------------------------+--------------------------------------+
| ``_ec2:privateDns_``                 | The private host of the machine.     |
+--------------------------------------+--------------------------------------+
| ``_ec2:publicIpv4_``                 | The public IP address (ipv4) of the  |
|                                      | machine.                             |
+--------------------------------------+--------------------------------------+
| ``_ec2:publicDns_``                  | The public host of the machine.      |
+--------------------------------------+--------------------------------------+
| ``_ec2_``                            | Less verbose option for the private  |
|                                      | ip address.                          |
+--------------------------------------+--------------------------------------+
| ``_ec2:privateIp_``                  | Less verbose option for the private  |
|                                      | ip address.                          |
+--------------------------------------+--------------------------------------+
| ``_ec2:publicIp_``                   | Less verbose option for the public   |
|                                      | ip address.                          |
+--------------------------------------+--------------------------------------+

**TCP Settings**

Any component that uses TCP (like the HTTP, transport, and memcached
modules) shares the following settings:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``network.tcp.no_delay``             | Enable or disable tcp no delay       |
|                                      | setting. Defaults to ``true``.       |
+--------------------------------------+--------------------------------------+
| ``network.tcp.keep_alive``           | Enable or disable tcp keep alive.    |
|                                      | Defaults to ``true``.                |
+--------------------------------------+--------------------------------------+
| ``network.tcp.reuse_address``        | Should an address be reused or not.  |
|                                      | Defaults to ``true`` on non-windows  |
|                                      | machines.                            |
+--------------------------------------+--------------------------------------+
| ``network.tcp.send_buffer_size``     | The size of the tcp send buffer size |
|                                      | (in size setting format). By default |
|                                      | not explicitly set.                  |
+--------------------------------------+--------------------------------------+
| ``network.tcp.receive_buffer_size``  | The size of the tcp receive buffer   |
|                                      | size (in size setting format). By    |
|                                      | default not explicitly set.          |
+--------------------------------------+--------------------------------------+

Node
====

**elasticsearch** allows a node to be configured to either store data
locally or not. Storing data locally basically means that
shards of different indices are allowed to be allocated on that node. By
default, each node is considered to be a data node, and it can be turned
off by setting ``node.data`` to ``false``.

This is a powerful setting allowing you to simply create smart load
balancers that take part in some of the API processing. Let’s take an
example:

We can start a whole cluster of data nodes which do not even start an
HTTP transport by setting ``http.enabled`` to ``false``. Such nodes will
communicate with one another using the
`transport <#modules-transport>`__ module. In front of the cluster we
can start one or more "non data" nodes which will start with HTTP
enabled. All HTTP communication will be performed through these "non
data" nodes.

The first benefit of this setup is the ability to create smart load
balancers. These "non data" nodes are still part of the cluster, and
they redirect operations exactly to the node that holds the relevant
data. The other benefit is the fact that for scatter / gather based
operations (such as search), these nodes will take part in the
processing, since they start the scatter process and perform the
actual gather processing.

This lets the data nodes focus on the heavy work of indexing and
searching, without needing to process HTTP requests (parsing), overload
the network, or perform the gather processing.
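
As an illustration, a sketch of the ``elasticsearch.yml`` settings for
such a dedicated "non data" front node (a data node would invert both
values):

.. code:: yaml

    node.data: false
    http.enabled: true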

Tribe node
==========

The *tribes* feature allows a *tribe node* to act as a federated client
across multiple clusters.

The tribe node works by retrieving the cluster state from all connected
clusters and merging them into a global cluster state. With this
information at hand, it is able to perform read and write operations
against the nodes in all clusters as if they were local.

The ``elasticsearch.yml`` config file for a tribe node just needs to
list the clusters that should be joined, for instance:

.. code:: yaml

    tribe:
        t1: 
            cluster.name:   cluster_one
        t2: 
            cluster.name:   cluster_two

``t1`` and ``t2`` are arbitrary names representing the connection to
each cluster.

The example above configures connections to two clusters, named ``t1``
and ``t2`` respectively. The tribe node will create a `node
client <#modules-node>`__ to connect to each cluster, using `multicast
discovery <#multicast>`__ by default. Any other settings for the
connection can be configured under ``tribe.{name}``, just like the
``cluster.name`` in the example.

The merged global cluster state means that almost all operations work in
the same way as a single cluster: distributed search, suggest,
percolation, indexing, etc.

However, there are a few exceptions:

-  The merged view cannot handle indices with the same name in multiple
   clusters. By default it will pick one of them; see the
   ``tribe.on_conflict`` options below.

-  Master level read operations will automatically execute
   with a local flag set to true since there is no master.

-  Master level write operations are not allowed. These should be
   performed on a single cluster.

The tribe node can be configured to block all write operations and all
metadata operations with:

.. code:: yaml

    tribe:
        blocks:
            write:    true
            metadata: true

The tribe node can also configure blocks on indices explicitly:

.. code:: yaml

    tribe:
        blocks:
            indices.write: hk*,ldn*

When there is a conflict and multiple clusters hold the same index, by
default the tribe node will pick one of them. This can be configured
using the ``tribe.on_conflict`` setting. It defaults to ``any``, but can
be set to ``drop`` (drop indices that have a conflict), or
``prefer_[tribeName]`` to prefer the index from a specific tribe.

Plugins
=======

**Plugins**

Plugins are a way to enhance the basic elasticsearch functionality in a
custom manner. They range from adding custom mapping types and custom
analyzers (in a more built-in fashion), to native scripts, custom
discovery, and more.

**Installing plugins**

Installing plugins can either be done manually by placing them under the
``plugins`` directory, or by using the ``plugin`` script. Several
plugins can be found under the
`elasticsearch <https://github.com/elasticsearch>`__ organization on
GitHub, starting with ``elasticsearch-``.

Installing a plugin typically takes the following form:

.. code:: shell

    plugin --install <org>/<user/component>/<version>

In this case, the plugin will automatically be downloaded from
``download.elasticsearch.org`` and, if it does not exist there, from
Maven (Central and Sonatype).

Note that when the plugin is located in the Maven Central or Sonatype
repository, ``<org>`` is the artifact ``groupId`` and
``<user/component>`` is the ``artifactId``.
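
For example, to install the ICU analysis plugin from the elasticsearch
organization (the version number below is only illustrative and should
match your Elasticsearch version):

.. code:: shell

    bin/plugin --install elasticsearch/elasticsearch-analysis-icu/2.3.0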

A plugin can also be installed directly by specifying the URL for it,
for example:

.. code:: shell

    bin/plugin --url file:///path/to/plugin --install plugin-name

You can run ``bin/plugin -h`` to list all available options.

**Site Plugins**

Plugins can have "sites" in them, any plugin that exists under the
``plugins`` directory with a ``_site`` directory, its content will be
statically served when hitting ``/_plugin/[plugin_name]/`` url. Those
can be added even after the process has started.

Installed plugins that do not contain any java related content, will
automatically be detected as site plugins, and their content will be
moved under ``_site``.

The ability to install plugins from GitHub makes it easy to install site
plugins hosted there by downloading the actual repository. For example,
running:

.. code:: shell

    bin/plugin --install mobz/elasticsearch-head
    bin/plugin --install lukas-vlcek/bigdesk

will install both of those site plugins, with ``elasticsearch-head``
available under ``http://localhost:9200/_plugin/head/`` and ``bigdesk``
available under ``http://localhost:9200/_plugin/bigdesk/``.

**Mandatory Plugins**

If you rely on some plugins, you can define mandatory plugins using the
``plugin.mandatory`` attribute, for example, here is a sample config:

.. code:: yaml

    plugin.mandatory: mapper-attachments,lang-groovy

For safety reasons, if a mandatory plugin is not installed, the node
will not start.

**Installed Plugins**

A list of the currently loaded plugins can be retrieved using the `Node
Info API <#cluster-nodes-info>`__.

**Removing plugins**

Removing plugins can either be done manually by removing them under the
``plugins`` directory, or using the ``plugin`` script.

Removing a plugin typically takes the following form:

.. code:: shell

    plugin --remove <pluginname>

**Silent/Verbose mode**

When running the ``plugin`` script, you can get more information (debug
mode) using ``--verbose``. Conversely, if you want the ``plugin`` script
to be silent, use the ``--silent`` option.

Note that exit codes could be:

-  ``0``: everything was OK

-  ``64``: unknown command or incorrect option parameter

-  ``74``: IO error

-  ``70``: other errors

.. code:: shell

    bin/plugin --install mobz/elasticsearch-head --verbose
    plugin --remove head --silent

**Timeout settings**

By default, the ``plugin`` script will wait indefinitely when
downloading before failing. The ``timeout`` parameter can be used to
explicitly specify how long it waits. Here are some examples of setting
it to different values:

.. code:: shell

    # Wait for 30 seconds before failing
    bin/plugin --install mobz/elasticsearch-head --timeout 30s

    # Wait for 1 minute before failing
    bin/plugin --install mobz/elasticsearch-head --timeout 1m

    # Wait forever (default)
    bin/plugin --install mobz/elasticsearch-head --timeout 0

**Proxy settings**

To install a plugin via a proxy, you can pass the proxy details using
the environment variables ``proxyHost`` and ``proxyPort``.

On Linux and Mac, here is an example of setting it:

.. code:: shell

    bin/plugin -DproxyHost=host_name -DproxyPort=port_number --install mobz/elasticsearch-head

On Windows, here is an example of setting it:

.. code:: shell

    set JAVA_OPTS="-DproxyHost=host_name -DproxyPort=port_number"
    bin/plugin --install mobz/elasticsearch-head

**Lucene version dependent plugins**

For some plugins, such as analysis plugins, a specific major Lucene
version is required to run. In that case, the plugin provides the Lucene
version it was built for in its ``es-plugin.properties`` file.

If this property is present, the node will check the Lucene version at
startup before loading the plugin.

You can disable that check using ``plugins.check_lucene: false``.

**Known Plugins**

**Analysis Plugins**

-  `ICU Analysis
   plugin <https://github.com/elasticsearch/elasticsearch-analysis-icu>`__

-  `Japanese (Kuromoji) Analysis
   plugin <https://github.com/elasticsearch/elasticsearch-analysis-kuromoji>`__.

-  `Smart Chinese Analysis
   Plugin <https://github.com/elasticsearch/elasticsearch-analysis-smartcn>`__

-  `Stempel (Polish) Analysis
   plugin <https://github.com/elasticsearch/elasticsearch-analysis-stempel>`__

-  `Annotation Analysis
   Plugin <https://github.com/barminator/elasticsearch-analysis-annotation>`__
   (by Michal Samek)

-  `Combo Analysis
   Plugin <https://github.com/yakaz/elasticsearch-analysis-combo/>`__
   (by Olivier Favre, Yakaz)

-  `Hunspell Analysis
   Plugin <https://github.com/jprante/elasticsearch-analysis-hunspell>`__
   (by Jörg Prante)

-  `IK Analysis
   Plugin <https://github.com/medcl/elasticsearch-analysis-ik>`__ (by
   Medcl)

-  `Japanese Analysis
   plugin <https://github.com/suguru/elasticsearch-analysis-japanese>`__
   (by suguru).

-  `Mmseg Analysis
   Plugin <https://github.com/medcl/elasticsearch-analysis-mmseg>`__ (by
   Medcl)

-  `Morfologik (Polish) Analysis
   plugin <https://github.com/chytreg/elasticsearch-analysis-morfologik>`__
   (by chytreg)

-  `Russian and English Morphological Analysis
   Plugin <https://github.com/imotov/elasticsearch-analysis-morphology>`__
   (by Igor Motov)

-  `Hebrew Analysis
   Plugin <https://github.com/synhershko/elasticsearch-analysis-hebrew>`__
   (by Itamar Syn-Hershko)

-  `Pinyin Analysis
   Plugin <https://github.com/medcl/elasticsearch-analysis-pinyin>`__
   (by Medcl)

-  `String2Integer Analysis
   Plugin <https://github.com/medcl/elasticsearch-analysis-string2int>`__
   (by Medcl)

-  `Vietnamese Analysis
   Plugin <https://github.com/duydo/elasticsearch-analysis-vietnamese>`__
   (by Duy Do)

**Discovery Plugins**

-  `AWS Cloud
   Plugin <https://github.com/elasticsearch/elasticsearch-cloud-aws>`__
   - EC2 discovery and S3 Repository

-  `Azure Cloud
   Plugin <https://github.com/elasticsearch/elasticsearch-cloud-azure>`__
   - Azure discovery

-  `Google Compute Engine Cloud
   Plugin <https://github.com/elasticsearch/elasticsearch-cloud-gce>`__
   - GCE discovery

-  `eskka Discovery Plugin <https://github.com/shikhar/eskka>`__ (by
   Shikhar Bhushan)

**River Plugins**

-  `CouchDB River
   Plugin <https://github.com/elasticsearch/elasticsearch-river-couchdb>`__

-  `RabbitMQ River
   Plugin <https://github.com/elasticsearch/elasticsearch-river-rabbitmq>`__

-  `Twitter River
   Plugin <https://github.com/elasticsearch/elasticsearch-river-twitter>`__

-  `Wikipedia River
   Plugin <https://github.com/elasticsearch/elasticsearch-river-wikipedia>`__

-  `ActiveMQ River
   Plugin <https://github.com/domdorn/elasticsearch-river-activemq/>`__
   (by Dominik Dorn)

-  `Amazon SQS River
   Plugin <https://github.com/albogdano/elasticsearch-river-amazonsqs>`__
   (by Alex Bogdanovski)

-  `CSV River
   Plugin <https://github.com/xxBedy/elasticsearch-river-csv>`__ (by
   Martin Bednar)

-  `Dropbox River Plugin <http://www.pilato.fr/dropbox/>`__ (by David
   Pilato)

-  `FileSystem River Plugin <http://www.pilato.fr/fsriver/>`__ (by David
   Pilato)

-  `Git River
   Plugin <https://github.com/obazoud/elasticsearch-river-git>`__ (by
   Olivier Bazoud)

-  `GitHub River
   Plugin <https://github.com/uberVU/elasticsearch-river-github>`__ (by
   uberVU)

-  `Hazelcast River
   Plugin <https://github.com/sksamuel/elasticsearch-river-hazelcast>`__
   (by Steve Samuel)

-  `JDBC River
   Plugin <https://github.com/jprante/elasticsearch-river-jdbc>`__ (by
   Jörg Prante)

-  `JMS River
   Plugin <https://github.com/qotho/elasticsearch-river-jms>`__ (by
   Steve Sarandos)

-  `Kafka River
   Plugin <https://github.com/endgameinc/elasticsearch-river-kafka>`__
   (by Endgame Inc.)

-  `LDAP River
   Plugin <https://github.com/tlrx/elasticsearch-river-ldap>`__ (by
   Tanguy Leroux)

-  `MongoDB River
   Plugin <https://github.com/richardwilly98/elasticsearch-river-mongodb/>`__
   (by Richard Louapre)

-  `Neo4j River
   Plugin <https://github.com/sksamuel/elasticsearch-river-neo4j>`__ (by
   Steve Samuel)

-  `Open Archives Initiative (OAI) River
   Plugin <https://github.com/jprante/elasticsearch-river-oai/>`__ (by
   Jörg Prante)

-  `Redis River
   Plugin <https://github.com/sksamuel/elasticsearch-river-redis>`__ (by
   Steve Samuel)

-  `RethinkDB River
   Plugin <https://github.com/rethinkdb/elasticsearch-river-rethinkdb>`__
   (by RethinkDB)

-  `RSS River Plugin <http://dadoonet.github.com/rssriver/>`__ (by David
   Pilato)

-  `Sofa River
   Plugin <https://github.com/adamlofts/elasticsearch-river-sofa>`__ (by
   adamlofts)

-  `Solr River
   Plugin <https://github.com/javanna/elasticsearch-river-solr/>`__ (by
   Luca Cavanna)

-  `St9 River
   Plugin <https://github.com/sunnygleason/elasticsearch-river-st9>`__
   (by Sunny Gleason)

-  `Subversion River
   Plugin <https://github.com/plombard/SubversionRiver>`__ (by Pascal
   Lombard)

-  `DynamoDB River
   Plugin <https://github.com/kzwang/elasticsearch-river-dynamodb>`__
   (by Kevin Wang)

-  `IMAP/POP3 Email River
   Plugin <https://github.com/salyh/elasticsearch-river-imap>`__ (by
   Hendrik Saly)

-  `Web River
   Plugin <https://github.com/codelibs/elasticsearch-river-web>`__ (by
   CodeLibs Project)

-  `EEA ElasticSearch RDF River
   Plugin <https://github.com/eea/eea.elasticsearch.river.rdf>`__ (by
   the European Environment Agency)

**Transport Plugins**

-  `Memcached transport
   plugin <https://github.com/elasticsearch/elasticsearch-transport-memcached>`__

-  `Thrift
   Transport <https://github.com/elasticsearch/elasticsearch-transport-thrift>`__

-  `Servlet
   transport <https://github.com/elasticsearch/elasticsearch-transport-wares>`__

-  `ZeroMQ transport layer
   plugin <https://github.com/tlrx/transport-zeromq>`__ (by Tanguy
   Leroux)

-  `Jetty HTTP transport
   plugin <https://github.com/sonian/elasticsearch-jetty>`__ (by Sonian
   Inc.)

-  `Redis transport
   plugin <https://github.com/kzwang/elasticsearch-transport-redis>`__
   (by Kevin Wang)

**Scripting Plugins**

-  `Clojure Language
   Plugin <https://github.com/hiredman/elasticsearch-lang-clojure>`__
   (by Kevin Downey)

-  `Groovy lang
   Plugin <https://github.com/elasticsearch/elasticsearch-lang-groovy>`__

-  `JavaScript language
   Plugin <https://github.com/elasticsearch/elasticsearch-lang-javascript>`__

-  `Python language
   Plugin <https://github.com/elasticsearch/elasticsearch-lang-python>`__

-  `SQL language
   Plugin <https://github.com/NLPchina/elasticsearch-sql/>`__ (by nlpcn)

**Site Plugins**

-  `BigDesk Plugin <https://github.com/lukas-vlcek/bigdesk>`__ (by Lukáš
   Vlček)

-  `Elasticsearch Head
   Plugin <https://github.com/mobz/elasticsearch-head>`__ (by Ben Birch)

-  `Elasticsearch HQ <https://github.com/royrusso/elasticsearch-HQ>`__
   (by Roy Russo)

-  `Hammer Plugin <https://github.com/andrewvc/elastic-hammer>`__ (by
   Andrew Cholakian)

-  `Inquisitor
   Plugin <https://github.com/polyfractal/elasticsearch-inquisitor>`__
   (by Zachary Tong)

-  `Paramedic
   Plugin <https://github.com/karmi/elasticsearch-paramedic>`__ (by
   Karel Minařík)

-  `SegmentSpy
   Plugin <https://github.com/polyfractal/elasticsearch-segmentspy>`__
   (by Zachary Tong)

-  `Whatson Plugin <https://github.com/xyu/elasticsearch-whatson>`__ (by
   Xiao Yu)

**Snapshot/Restore Repository Plugins**

-  `Hadoop
   HDFS <https://github.com/elasticsearch/elasticsearch-hadoop/tree/master/repository-hdfs>`__
   Repository

-  `AWS
   S3 <https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository>`__
   Repository

-  `GridFS <https://github.com/kzwang/elasticsearch-repository-gridfs>`__
   Repository (by Kevin Wang)

**Misc Plugins**

-  `Mapper Attachments Type
   plugin <https://github.com/elasticsearch/elasticsearch-mapper-attachments>`__

-  `carrot2
   Plugin <https://github.com/carrot2/elasticsearch-carrot2>`__: Results
   clustering with carrot2 (by Dawid Weiss)

-  `Elasticsearch Changes
   Plugin <https://github.com/derryx/elasticsearch-changes-plugin>`__
   (by Thomas Peuss)

-  `Extended Analyze
   Plugin <https://github.com/johtani/elasticsearch-extended-analyze>`__
   (by Jun Ohtani)

-  `Elasticsearch Graphite
   Plugin <https://github.com/spinscale/elasticsearch-graphite-plugin>`__
   (by Alexander Reelsen)

-  `Elasticsearch Mock Solr
   Plugin <https://github.com/mattweber/elasticsearch-mocksolrplugin>`__
   (by Matt Weber)

-  `Elasticsearch New Relic
   Plugin <https://github.com/viniciusccarvalho/elasticsearch-newrelic>`__
   (by Vinicius Carvalho)

-  `Elasticsearch Statsd
   Plugin <https://github.com/swoop-inc/elasticsearch-statsd-plugin>`__
   (by Swoop Inc.)

-  `Terms Component
   Plugin <https://github.com/endgameinc/elasticsearch-term-plugin>`__
   (by Endgame Inc.)

-  `Elasticsearch View
   Plugin <http://tlrx.github.com/elasticsearch-view-plugin>`__ (by
   Tanguy Leroux)

-  `ZooKeeper Discovery
   Plugin <https://github.com/sonian/elasticsearch-zookeeper>`__ (by
   Sonian Inc.)

-  `Elasticsearch Image
   Plugin <https://github.com/kzwang/elasticsearch-image>`__ (by Kevin
   Wang)

-  `Elasticsearch Experimental
   Highlighter <https://github.com/wikimedia/search-highlighter>`__ (by
   Wikimedia Foundation/Nik Everett)

-  `Elasticsearch Trigram Accelerated Regular Expression
   Filter <https://github.com/wikimedia/search-extra>`__ (by Wikimedia
   Foundation/Nik Everett)

-  `Elasticsearch Security
   Plugin <https://github.com/salyh/elasticsearch-security-plugin>`__
   (by Hendrik Saly)

-  `Elasticsearch Taste
   Plugin <https://github.com/codelibs/elasticsearch-taste>`__ (by
   CodeLibs Project)

-  `Elasticsearch SIREn
   Plugin <http://siren.solutions/siren/downloads/>`__: Nested data
   search (by SIREn Solutions)

Scripting
=========

The scripting module allows you to use scripts in order to evaluate
custom expressions. For example, scripts can be used to return "script
fields" as part of a search request, or to evaluate a custom score for a
query, and so on.

The scripting module uses
`groovy <http://groovy.codehaus.org/>`__ by default (previously
`mvel <http://mvel.codehaus.org/>`__ in 1.3.x and earlier) as the
scripting language, with some extensions. Groovy is used since it is
extremely fast and very simple to use.

Additional ``lang`` plugins are provided to allow scripts to be executed
in different languages. Currently supported plugins are
``lang-javascript`` for JavaScript, ``lang-mvel`` for Mvel, and
``lang-python`` for Python. Wherever a ``script`` parameter can be used,
a ``lang`` parameter (on the same level) can be provided to define the
language of the script. The ``lang`` options are ``groovy``, ``js``,
``mvel``, ``python``, ``expression`` and ``native``.

To increase security, Elasticsearch does not allow you to specify
scripts for non-sandboxed languages with a request. Instead, scripts
must be placed in the ``scripts`` directory inside the configuration
directory (the directory where elasticsearch.yml is). Scripts placed
into this directory will automatically be picked up and be available to
be used. Once a script has been placed in this directory, it can be
referenced by name. For example, a script called
``calculate-score.groovy`` can be referenced in a request like this:

.. code:: sh

    $ tree config
    config
    ├── elasticsearch.yml
    ├── logging.yml
    └── scripts
        └── calculate-score.groovy

.. code:: sh

    $ cat config/scripts/calculate-score.groovy
    log(_score * 2) + my_modifier

.. code:: js

    curl -XPOST localhost:9200/_search -d '{
      "query": {
        "function_score": {
          "query": {
            "match": {
              "body": "foo"
            }
          },
          "functions": [
            {
              "script_score": {
                "script": "calculate-score",
                "params": {
                  "my_modifier": 8
                }
              }
            }
          ]
        }
      }
    }'

The name of the script is derived from the hierarchy of directories it
exists under, and the file name without the lang extension. For example,
a script placed under ``config/scripts/group1/group2/test.py`` will be
named ``group1_group2_test``.

**Indexed Scripts**

If dynamic scripting is enabled, Elasticsearch allows you to store
scripts in an internal index known as ``.scripts`` and reference them by
id. There are REST endpoints to manage indexed scripts as follows:

Requests to the scripts endpoint look like:

.. code:: js

    /_scripts/{lang}/{id}

Where the ``lang`` part is the language the script is in and the ``id``
part is the id of the script. In the ``.scripts`` index the type of the
document will be set to the ``lang``.

.. code:: js

    curl -XPOST localhost:9200/_scripts/groovy/indexedCalculateScore -d '{
         "script": "log(_score * 2) + my_modifier"
    }'

This will create a document with id: ``indexedCalculateScore`` and type:
``groovy`` in the ``.scripts`` index. The type of the document is the
language used by the script.

This script can be accessed at query time by appending ``_id`` to the
script parameter and passing the script id. So ``script`` becomes
``script_id``:

.. code:: js

    curl -XPOST localhost:9200/_search -d '{
      "query": {
        "function_score": {
          "query": {
            "match": {
              "body": "foo"
            }
          },
          "functions": [
            {
              "script_score": {
                "script_id": "indexedCalculateScore",
                "lang" : "groovy",
                "params": {
                  "my_modifier": 8
                }
              }
            }
          ]
        }
      }
    }'

Note that you must have dynamic scripting enabled to use indexed scripts
at query time.

The script can be viewed by:

.. code:: js

    curl -XGET localhost:9200/_scripts/groovy/indexedCalculateScore

This is rendered as:

.. code:: js

    '{
         "script": "log(_score * 2) + my_modifier"
    }'

Indexed scripts can be deleted by:

.. code:: js

    curl -XDELETE localhost:9200/_scripts/groovy/indexedCalculateScore

**Enabling dynamic scripting**

We recommend running Elasticsearch behind an application or proxy, which
protects Elasticsearch from the outside world. If users are allowed to
run dynamic scripts (even in a search request), then they have the same
access to your box as the user that Elasticsearch is running as. For
this reason dynamic scripting is allowed only for sandboxed languages by
default.

First, you should not run Elasticsearch as the ``root`` user, as this
would allow a script to access or do **anything** on your server,
without limitations. Second, you should not expose Elasticsearch
directly to users, but instead have a proxy application in between. If
you **do** intend to expose Elasticsearch directly to your users, then
you have to decide whether you trust them enough to run scripts on your
box or not. If you do, you can enable dynamic scripting by adding the
following setting to the ``config/elasticsearch.yml`` file on every
node:

.. code:: yaml

    script.disable_dynamic: false

While this still allows execution of named scripts provided in the
config, or *native* Java scripts registered through plugins, it also
allows users to run arbitrary scripts via the API. Instead of the name
of a file, the body of the script can be sent as the script.

There are three possible configuration values for the
``script.disable_dynamic`` setting; the default value is ``sandbox``:

+--------------------------------------+--------------------------------------+
| Value                                | Description                          |
+======================================+======================================+
| ``true``                             | all dynamic scripting is disabled,   |
|                                      | scripts must be placed in the        |
|                                      | ``config/scripts`` directory.        |
+--------------------------------------+--------------------------------------+
| ``false``                            | all dynamic scripting is enabled,    |
|                                      | scripts may be sent as strings in    |
|                                      | requests.                            |
+--------------------------------------+--------------------------------------+
| ``sandbox``                          | scripts may be sent as strings for   |
|                                      | languages that are sandboxed.        |
+--------------------------------------+--------------------------------------+

**Default Scripting Language**

The default scripting language (assuming no ``lang`` parameter is
provided) is ``groovy``. To change it, set the ``script.default_lang``
setting to the appropriate language.
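
For example (a sketch; ``js`` additionally requires the
``lang-javascript`` plugin to be installed), the setting can be placed
in ``config/elasticsearch.yml`` or passed at startup:

.. code:: shell

    bin/elasticsearch -Des.script.default_lang=js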

**Groovy Sandboxing**

Elasticsearch sandboxes Groovy scripts that are compiled and executed in
order to ensure they don’t perform unwanted actions. There are a number
of options that can be used for configuring this sandbox:

``script.groovy.sandbox.receiver_whitelist``
    Comma-separated list of string classes for objects that may have
    methods invoked.

``script.groovy.sandbox.package_whitelist``
    Comma-separated list of packages under which new objects may be
    constructed.

``script.groovy.sandbox.class_whitelist``
    Comma-separated list of classes that are allowed to be constructed.

``script.groovy.sandbox.method_blacklist``
    Comma-separated list of methods that are never allowed to be
    invoked, regardless of target object.

``script.groovy.sandbox.enabled``
    Flag to disable the sandbox (defaults to ``true`` meaning the
    sandbox is enabled).

When specifying whitelist or blacklist settings for the Groovy sandbox,
the new values replace the current list; they are not additive.

**Automatic Script Reloading**

The ``config/scripts`` directory is scanned periodically for changes.
New and changed scripts are reloaded, and deleted scripts are removed
from the preloaded scripts cache. The reload frequency can be specified
using the ``watcher.interval`` setting, which defaults to ``60s``. To
disable script reloading completely, set ``script.auto_reload_enabled``
to ``false``.

**Native (Java) Scripts**

Even though ``groovy`` is pretty fast, this feature allows you to
register native Java-based scripts for faster execution.

To register a native script, implement a ``NativeScriptFactory`` that
constructs the script to be executed. There are two main types, one that
extends ``AbstractExecutableScript`` and one that extends
``AbstractSearchScript`` (probably the one most users will extend, with
additional helper classes in ``AbstractLongSearchScript``,
``AbstractDoubleSearchScript``, and ``AbstractFloatSearchScript``).

Registering the factory can be done either through settings (for
example, setting ``script.native.my.type`` to
``sample.MyNativeScriptFactory`` registers a script named ``my``), or
from a plugin, by accessing the ``ScriptModule`` and calling
``registerScript`` on it.

Executing the script is done by specifying the ``lang`` as ``native``,
and the name of the script as the ``script``.

Note, the scripts need to be on the classpath of Elasticsearch. One
simple way to do this is to create a directory under ``plugins`` (choose
a descriptive name), and place the jar / class files there. They will be
loaded automatically.
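
As a minimal sketch (``sample.MyNativeScriptFactory`` is a hypothetical
factory class that must already be on the classpath), registration and
execution could look like this:

.. code:: shell

    # register a native script named "my" at startup
    # (the setting can also be placed in elasticsearch.yml)
    bin/elasticsearch -Des.script.native.my.type=sample.MyNativeScriptFactory

    # execute it by name, with lang set to native
    curl -XPOST localhost:9200/_search -d '{
      "query": {
        "function_score": {
          "query": { "match_all": {} },
          "functions": [
            { "script_score": { "script": "my", "lang": "native" } }
          ]
        }
      }
    }'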

**Lucene Expressions Scripts**

    **Warning**

    This feature is **experimental** and subject to change in future
    versions.

Lucene’s expressions module provides a mechanism to compile a
``javascript`` expression to bytecode. This allows very fast execution,
as if you had written a ``native`` script. Expression scripts can be
used in ``script_score``, ``script_fields``, sort scripts and numeric
aggregation scripts.

See the `expressions module
documentation <http://lucene.apache.org/core/4_9_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html>`__
for details on what operators and functions are available.

Variables in ``expression`` scripts are available to access:

-  Single valued document fields, e.g. ``doc['myfield'].value``

-  Parameters passed into the script, e.g. ``mymodifier``

-  The current document’s score, ``_score`` (only available when used in
   a ``script_score``)

There are a few limitations relative to other script languages:

-  Only numeric fields may be accessed

-  Stored fields are not available

-  If a field is sparse (only some documents contain a value), documents
   missing the field will have a value of ``0``
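
Putting these pieces together, here is a minimal sketch of an inline
``expression`` script in a ``function_score`` query (the ``mymodifier``
parameter name is purely illustrative); the script can be sent inline
because ``expression`` is a sandboxed language:

.. code:: js

    curl -XPOST localhost:9200/_search -d '{
      "query": {
        "function_score": {
          "query": {
            "match": {
              "body": "foo"
            }
          },
          "functions": [
            {
              "script_score": {
                "lang": "expression",
                "script": "_score * mymodifier",
                "params": {
                  "mymodifier": 2
                }
              }
            }
          ]
        }
      }
    }'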

**Score**

In all scripts that can be used in aggregations, the current document’s
score is accessible in ``_score``.

**Computing scores based on terms in scripts**

See the `advanced scripting documentation <#modules-advanced-scripting>`__.

**Document Fields**

Most scripts revolve around the use of specific document field data. The
``doc['field_name']`` syntax can be used to access specific field data
within a document (the document in question is usually derived from the
context in which the script is used). Document fields are very fast to
access since they end up being loaded into memory (all the relevant
field values/tokens are loaded into memory). Note, however, that the
``doc[...]`` notation only allows for simple valued fields (you can't
return a json object from it) and makes sense only on non-analyzed or
single-term based fields.

The following data can be extracted from a field:

``doc['field_name'].value``
    The native value of the field. For example, if it is a short type,
    it will be short.

``doc['field_name'].values``
    The native array values of the field. For example, if it is a short
    type, it will be short[]. Remember, a field can have several values
    within a single doc. Returns an empty array if the field has no
    values.

``doc['field_name'].empty``
    A boolean indicating if the field has no values within the doc.

``doc['field_name'].multiValued``
    A boolean indicating that the field has several values within the
    corpus.

``doc['field_name'].lat``
    The latitude of a geo point type.

``doc['field_name'].lon``
    The longitude of a geo point type.

``doc['field_name'].lats``
    The latitudes of a geo point type.

``doc['field_name'].lons``
    The longitudes of a geo point type.

``doc['field_name'].distance(lat, lon)``
    The ``plane`` distance (in meters) of this geo point field from the
    provided lat/lon.

``doc['field_name'].distanceWithDefault(lat, lon, default)``
    The ``plane`` distance (in meters) of this geo point field from the
    provided lat/lon, with a default value.

``doc['field_name'].distanceInMiles(lat, lon)``
    The ``plane`` distance (in miles) of this geo point field from the
    provided lat/lon.

``doc['field_name'].distanceInMilesWithDefault(lat, lon, default)``
    The ``plane`` distance (in miles) of this geo point field from the
    provided lat/lon, with a default value.

``doc['field_name'].distanceInKm(lat, lon)``
    The ``plane`` distance (in km) of this geo point field from the
    provided lat/lon.

``doc['field_name'].distanceInKmWithDefault(lat, lon, default)``
    The ``plane`` distance (in km) of this geo point field from the
    provided lat/lon, with a default value.

``doc['field_name'].arcDistance(lat, lon)``
    The ``arc`` distance (in meters) of this geo point field from the
    provided lat/lon.

``doc['field_name'].arcDistanceWithDefault(lat, lon, default)``
    The ``arc`` distance (in meters) of this geo point field from the
    provided lat/lon, with a default value.

``doc['field_name'].arcDistanceInMiles(lat, lon)``
    The ``arc`` distance (in miles) of this geo point field from the
    provided lat/lon.

``doc['field_name'].arcDistanceInMilesWithDefault(lat, lon, default)``
    The ``arc`` distance (in miles) of this geo point field from the
    provided lat/lon, with a default value.

``doc['field_name'].arcDistanceInKm(lat, lon)``
    The ``arc`` distance (in km) of this geo point field from the
    provided lat/lon.

``doc['field_name'].arcDistanceInKmWithDefault(lat, lon, default)``
    The ``arc`` distance (in km) of this geo point field from the
    provided lat/lon, with a default value.

``doc['field_name'].factorDistance(lat, lon)``
    The distance factor of this geo point field from the provided
    lat/lon.

``doc['field_name'].factorDistance(lat, lon, default)``
    The distance factor of this geo point field from the provided
    lat/lon, with a default value.

``doc['field_name'].geohashDistance(geohash)``
    The ``arc`` distance (in meters) of this geo point field from the
    provided geohash.

``doc['field_name'].geohashDistanceInKm(geohash)``
    The ``arc`` distance (in km) of this geo point field from the
    provided geohash.

``doc['field_name'].geohashDistanceInMiles(geohash)``
    The ``arc`` distance (in miles) of this geo point field from the
    provided geohash.
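
For illustration (the ``price`` field name is made up, and sandboxed
dynamic scripting is assumed to be enabled, which is the default), a
script field using the ``doc[...]`` notation could look like this:

.. code:: js

    curl -XPOST localhost:9200/_search -d '{
      "query": { "match_all": {} },
      "script_fields": {
        "double_price": {
          "script": "doc[\"price\"].value * 2"
        }
      }
    }'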

**Stored Fields**

Stored fields can also be accessed when executing a script. Note, they
are much slower to access compared with document fields, as they are not
loaded into memory. They can be simply accessed using
``_fields['my_field_name'].value`` or
``_fields['my_field_name'].values``.

**Accessing the score of a document within a script**

When using scripting for calculating the score of a document (for
instance, with the ``function_score`` query), you can access the score
using the ``_score`` variable inside of a Groovy script.

**Source Field**

The source field can also be accessed when executing a script. The
source field is loaded per doc, parsed, and then provided to the script
for evaluation. The ``_source`` forms the context under which the source
field can be accessed, for example ``_source.obj2.obj1.field3``.

Accessing ``_source`` is much slower compared to using ``_doc`` but the
data is not loaded into memory. For a single field access ``_fields``
may be faster than using ``_source`` due to the extra overhead of
potentially parsing large documents. However, ``_source`` may be faster
if you access multiple fields or if the source has already been loaded
for other purposes.

**Groovy Built In Functions**

There are several built in functions that can be used within scripts.
They include:

+--------------------------------------+--------------------------------------+
| Function                             | Description                          |
+======================================+======================================+
| ``sin(a)``                           | Returns the trigonometric sine of an |
|                                      | angle.                               |
+--------------------------------------+--------------------------------------+
| ``cos(a)``                           | Returns the trigonometric cosine of  |
|                                      | an angle.                            |
+--------------------------------------+--------------------------------------+
| ``tan(a)``                           | Returns the trigonometric tangent of |
|                                      | an angle.                            |
+--------------------------------------+--------------------------------------+
| ``asin(a)``                          | Returns the arc sine of a value.     |
+--------------------------------------+--------------------------------------+
| ``acos(a)``                          | Returns the arc cosine of a value.   |
+--------------------------------------+--------------------------------------+
| ``atan(a)``                          | Returns the arc tangent of a value.  |
+--------------------------------------+--------------------------------------+
| ``toRadians(angdeg)``                | Converts an angle measured in        |
|                                      | degrees to an approximately          |
|                                      | equivalent angle measured in radians |
+--------------------------------------+--------------------------------------+
| ``toDegrees(angrad)``                | Converts an angle measured in        |
|                                      | radians to an approximately          |
|                                      | equivalent angle measured in         |
|                                      | degrees.                             |
+--------------------------------------+--------------------------------------+
| ``exp(a)``                           | Returns Euler’s number *e* raised to |
|                                      | the power of value.                  |
+--------------------------------------+--------------------------------------+
| ``log(a)``                           | Returns the natural logarithm (base  |
|                                      | *e*) of a value.                     |
+--------------------------------------+--------------------------------------+
| ``log10(a)``                         | Returns the base 10 logarithm of a   |
|                                      | value.                               |
+--------------------------------------+--------------------------------------+
| ``sqrt(a)``                          | Returns the correctly rounded        |
|                                      | positive square root of a value.     |
+--------------------------------------+--------------------------------------+
| ``cbrt(a)``                          | Returns the cube root of a double    |
|                                      | value.                               |
+--------------------------------------+--------------------------------------+
| ``IEEEremainder(f1, f2)``            | Computes the remainder operation on  |
|                                      | two arguments as prescribed by the   |
|                                      | IEEE 754 standard.                   |
+--------------------------------------+--------------------------------------+
| ``ceil(a)``                          | Returns the smallest (closest to     |
|                                      | negative infinity) value that is     |
|                                      | greater than or equal to the         |
|                                      | argument and is equal to a           |
|                                      | mathematical integer.                |
+--------------------------------------+--------------------------------------+
| ``floor(a)``                         | Returns the largest (closest to      |
|                                      | positive infinity) value that is     |
|                                      | less than or equal to the argument   |
|                                      | and is equal to a mathematical       |
|                                      | integer.                             |
+--------------------------------------+--------------------------------------+
| ``rint(a)``                          | Returns the value that is closest in |
|                                      | value to the argument and is equal   |
|                                      | to a mathematical integer.           |
+--------------------------------------+--------------------------------------+
| ``atan2(y, x)``                      | Returns the angle *theta* from the   |
|                                      | conversion of rectangular            |
|                                      | coordinates (*x*, *y*) to polar      |
|                                      | coordinates (r,*theta*).             |
+--------------------------------------+--------------------------------------+
| ``pow(a, b)``                        | Returns the value of the first       |
|                                      | argument raised to the power of the  |
|                                      | second argument.                     |
+--------------------------------------+--------------------------------------+
| ``round(a)``                         | Returns the closest *int* to the     |
|                                      | argument.                            |
+--------------------------------------+--------------------------------------+
| ``random()``                         | Returns a random *double* value.     |
+--------------------------------------+--------------------------------------+
| ``abs(a)``                           | Returns the absolute value of a      |
|                                      | value.                               |
+--------------------------------------+--------------------------------------+
| ``max(a, b)``                        | Returns the greater of two values.   |
+--------------------------------------+--------------------------------------+
| ``min(a, b)``                        | Returns the smaller of two values.   |
+--------------------------------------+--------------------------------------+
| ``ulp(d)``                           | Returns the size of an ulp of the    |
|                                      | argument.                            |
+--------------------------------------+--------------------------------------+
| ``signum(d)``                        | Returns the signum function of the   |
|                                      | argument.                            |
+--------------------------------------+--------------------------------------+
| ``sinh(x)``                          | Returns the hyperbolic sine of a     |
|                                      | value.                               |
+--------------------------------------+--------------------------------------+
| ``cosh(x)``                          | Returns the hyperbolic cosine of a   |
|                                      | value.                               |
+--------------------------------------+--------------------------------------+
| ``tanh(x)``                          | Returns the hyperbolic tangent of a  |
|                                      | value.                               |
+--------------------------------------+--------------------------------------+
| ``hypot(x, y)``                      | Returns sqrt(\ *x2* + *y2*) without  |
|                                      | intermediate overflow or underflow.  |
+--------------------------------------+--------------------------------------+

**Arithmetic precision in MVEL**

When dividing two numbers using MVEL based scripts, the engine tries to
be smart and adheres to the default behaviour of java. This means if you
divide two integers (you might have configured the fields as integer in
the mapping), the result will also be an integer. This means, if a
calculation like ``1/num`` is happening in your scripts and ``num`` is
an integer with the value of ``8``, the result is ``0`` even though you
were expecting it to be ``0.125``. You may need to enforce precision by
explicitly using a double like ``1.0/num`` in order to get the expected
result.

Text scoring in scripts
=======================

Text features, such as term or document frequency for a specific term,
can be accessed in scripts (see the `scripting
documentation <#modules-scripting>`__) with the ``_index`` variable.
This can be useful if, for example, you want to implement your own
scoring model using a script inside a `function score
query <#query-dsl-function-score-query>`__. Statistics over the document
collection are computed **per shard**, not per index.

**Nomenclature:**

+------------+---------------------------------------------------------------+
| ``df``     | document frequency. The number of documents a term appears    |
|            | in. Computed per field.                                       |
+------------+---------------------------------------------------------------+
| ``tf``     | term frequency. The number times a term appears in a field in |
|            | one specific document.                                        |
+------------+---------------------------------------------------------------+
| ``ttf``    | total term frequency. The number of times this term appears   |
|            | in all documents, that is, the sum of ``tf`` over all         |
|            | documents. Computed per field.                                |
+------------+---------------------------------------------------------------+

``df`` and ``ttf`` are computed per shard and therefore these numbers
can vary depending on the shard the current document resides in.

**Shard statistics:**

``_index.numDocs()``
    Number of documents in shard.

``_index.maxDoc()``
    Maximal document number in shard.

``_index.numDeletedDocs()``
    Number of deleted documents in shard.

**Field statistics:**

Field statistics can be accessed with a subscript operator like this:
``_index['FIELD']``.

``_index['FIELD'].docCount()``
    Number of documents containing the field ``FIELD``. Does not take
    deleted documents into account.

``_index['FIELD'].sumttf()``
    Sum of ``ttf`` over all terms that appear in field ``FIELD`` in all
    documents.

``_index['FIELD'].sumdf()``
    The sum of ``df`` s over all terms that appear in field ``FIELD`` in
    all documents.

Field statistics are computed per shard and therefore these numbers can
vary depending on the shard the current document resides in. The number
of terms in a field cannot be accessed using the ``_index`` variable.
See the `word count mapping type <#mapping-core-types>`__ on how to do
that.

**Term statistics:**

Term statistics for a field can be accessed with a subscript operator
like this: ``_index['FIELD']['TERM']``. This will never return null,
even if the term or field does not exist. If you do not need the term
frequency, call ``_index['FIELD'].get('TERM', 0)`` to avoid unnecessary
initialization of the frequencies. The flag only has an effect if you
set ``index_options`` to ``docs`` (see the `mapping
documentation <#mapping-core-types>`__).

``_index['FIELD']['TERM'].df()``
    ``df`` of term ``TERM`` in field ``FIELD``. Will be returned, even
    if the term is not present in the current document.

``_index['FIELD']['TERM'].ttf()``
    The sum of term frequencies of term ``TERM`` in field ``FIELD`` over
    all documents. Will be returned, even if the term is not present in
    the current document.

``_index['FIELD']['TERM'].tf()``
    ``tf`` of term ``TERM`` in field ``FIELD``. Will be 0 if the term is
    not present in the current document.
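
For illustration, following the file-based script convention shown
earlier (the ``body`` field and the ``foo`` term are made up), a script
using these statistics could be as simple as:

.. code:: sh

    $ cat config/scripts/tf-score.groovy
    _index['body']['foo'].tf()

It can then be referenced by its name, ``tf-score``, as the ``script``
of, for example, a ``script_score`` function.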

**Term positions, offsets and payloads:**

If you need information on the positions of terms in a field, call
``_index['FIELD'].get('TERM', flag)``, where ``flag`` can be:

``_POSITIONS``
    if you need the positions of the term

``_OFFSETS``
    if you need the offsets of the term

``_PAYLOADS``
    if you need the payloads of the term

``_CACHE``
    if you need to iterate over all positions several times

The iterator uses the underlying lucene classes to iterate over
positions. For efficiency reasons, you can only iterate over positions
once. If you need to iterate over the positions several times, set the
``_CACHE`` flag.

You can combine the operators with a ``|`` if you need more than one
piece of information. For example, the following will return an object
holding the positions and payloads, as well as all statistics:

::

    _index['FIELD'].get('TERM', _POSITIONS | _PAYLOADS)

Positions can be accessed with an iterator that returns an object
(``POS_OBJECT``) holding position, offsets and payload for each term
position.

``POS_OBJECT.position``
    The position of the term.

``POS_OBJECT.startOffset``
    The start offset of the term.

``POS_OBJECT.endOffset``
    The end offset of the term.

``POS_OBJECT.payload``
    The payload of the term.

``POS_OBJECT.payloadAsInt(missingValue)``
    The payload of the term converted to integer. If the current
    position has no payload, the ``missingValue`` will be returned. Call
    this only if you know that your payloads are integers.

``POS_OBJECT.payloadAsFloat(missingValue)``
    The payload of the term converted to float. If the current position
    has no payload, the ``missingValue`` will be returned. Call this
    only if you know that your payloads are floats.

``POS_OBJECT.payloadAsString()``
    The payload of the term converted to string. If the current position
    has no payload, ``null`` will be returned. Call this only if you
    know that your payloads are strings.

The following example sums up all payloads for the term ``foo``:

.. code:: groovy

    termInfo = _index['my_field'].get('foo',_PAYLOADS);
    score = 0;
    for (pos in termInfo) {
        score = score + pos.payloadAsInt(0);
    }
    return score;

**Term vectors:**

The ``_index`` variable can only be used to gather statistics for single
terms. If you want to use information on all terms in a field, you must
store the term vectors (set ``term_vector`` in the mapping as described
in the `mapping documentation <#mapping-core-types>`__). To access them,
call ``_index.termVectors()`` to get a
`Fields <https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html>`__
instance. This object can then be used as described in `lucene
doc <https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html>`__
to iterate over fields and then for each field iterate over each term in
the field. The method will return null if the term vectors were not
stored.

Thread Pool
===========

A node holds several thread pools in order to improve how threads and
memory consumption are managed within a node. There are several thread
pools, but the important ones include:

``index``
    For index/delete operations. Defaults to ``fixed``, size
    ``# of available processors``, ``queue_size`` ``200``.

``search``
    For count/search operations. Defaults to ``fixed``, size
    ``3x # of available processors``, ``queue_size`` ``1000``.

``suggest``
    For suggest operations. Defaults to ``fixed``, size
    ``# of available processors``, ``queue_size`` ``1000``.

``get``
    For get operations. Defaults to ``fixed``, size
    ``# of available processors``, ``queue_size`` ``1000``.

``bulk``
    For bulk operations. Defaults to ``fixed``, size
    ``# of available processors``, ``queue_size`` ``50``.

``percolate``
    For percolate operations. Defaults to ``fixed``, size
    ``# of available processors``, ``queue_size`` ``1000``.

``snapshot``
    For snapshot/restore operations. Defaults to ``scaling``,
    keep-alive ``5m``, size ``(# of available processors)/2``.

``warmer``
    For segment warm-up operations. Defaults to ``scaling`` with a
    ``5m`` keep-alive.

``refresh``
    For refresh operations. Defaults to ``scaling`` with a ``5m``
    keep-alive.

``listener``
    Mainly for the Java client executing an action when listener
    threaded is set to ``true``. Size defaults to
    ``(# of available processors)/2``, with a maximum of 10.

Changing a specific thread pool can be done by setting its type and
specific type parameters, for example, changing the ``index`` thread
pool to have more threads:

.. code:: yaml

    threadpool:
        index:
            type: fixed
            size: 30

    **Note**

    You can update thread pool settings live using the cluster update
    settings API.
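
For example (a sketch using the cluster update settings API, which
accepts thread pool settings at runtime), the ``index`` pool size could
be changed on a live cluster like this:

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient": {
            "threadpool.index.size": 30
        }
    }'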

**Thread pool types**

The following are the types of thread pools that can be used and their
respective parameters:

**``cache``**

The ``cache`` thread pool is an unbounded thread pool that will spawn a
thread if there are pending requests. Here is an example of how to set
it:

.. code:: yaml

    threadpool:
        index:
            type: cached

**``fixed``**

The ``fixed`` thread pool holds a fixed size of threads to handle the
requests with a queue (optionally bounded) for pending requests that
have no threads to service them.

The ``size`` parameter controls the number of threads, and defaults to
the number of cores times 5.

The ``queue_size`` allows you to control the size of the queue of
pending requests that have no threads to execute them. By default, it is
set to ``-1``, which means it is unbounded. When a request comes in and
the queue is full, the request will be aborted.

.. code:: yaml

    threadpool:
        index:
            type: fixed
            size: 30
            queue_size: 1000

**Processors setting**

The number of processors is automatically detected, and the thread pool
settings are automatically set based on it. Sometimes the number of
processors is wrongly detected; in such cases, the number of processors
can be explicitly set using the ``processors`` setting.

In order to check the number of processors detected, use the nodes info
API with the ``os`` flag.
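
For example (``?pretty`` simply makes the output readable):

.. code:: sh

    curl 'localhost:9200/_nodes/os?pretty'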

Thrift
======

The `thrift <https://thrift.apache.org/>`__ transport module allows you
to expose the REST interface of elasticsearch using thrift. Thrift
should provide better performance than http. Since thrift provides both
the wire protocol and the transport, it should make using Elasticsearch
more efficient (though it has limited documentation).

Using thrift requires installing the ``transport-thrift`` plugin,
located
`here <https://github.com/elasticsearch/elasticsearch-transport-thrift>`__.

The thrift
`schema <https://github.com/elasticsearch/elasticsearch-transport-thrift/blob/master/elasticsearch.thrift>`__
can be used to generate thrift clients.

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``thrift.port``                      | The port to bind to. Defaults to     |
|                                      | ``9500-9600``.                       |
+--------------------------------------+--------------------------------------+
| ``thrift.frame``                     | Defaults to ``-1``, which means no   |
|                                      | framing. Set to a higher value to    |
|                                      | specify the frame size (like         |
|                                      | ``15mb``).                           |
+--------------------------------------+--------------------------------------+

Transport
=========

The transport module is used for internal communication between nodes
within the cluster. Each call that goes from one node to the other uses
the transport module (for example, when an HTTP GET request is processed
by one node, and should actually be processed by another node that holds
the data).

The transport mechanism is completely asynchronous in nature, meaning
that there is no blocking thread waiting for a response. The benefit of
using asynchronous communication is, first, solving the `C10k
problem <http://en.wikipedia.org/wiki/C10k_problem>`__, and second, being
the ideal solution for scatter (broadcast) / gather operations such as
search in Elasticsearch.

**TCP Transport**

The TCP transport is an implementation of the transport module using
TCP. It allows for the following settings:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``transport.tcp.port``               | A bind port range. Defaults to       |
|                                      | ``9300-9400``.                       |
+--------------------------------------+--------------------------------------+
| ``transport.publish_port``           | The port that other nodes in the     |
|                                      | cluster should use when              |
|                                      | communicating with this node. Useful |
|                                      | when a cluster node is behind a      |
|                                      | proxy or firewall and the            |
|                                      | ``transport.tcp.port`` is not        |
|                                      | directly addressable from the        |
|                                      | outside. Defaults to the actual port |
|                                      | assigned via ``transport.tcp.port``. |
+--------------------------------------+--------------------------------------+
| ``transport.bind_host``              | The host address to bind the         |
|                                      | transport service to. Defaults to    |
|                                      | ``transport.host`` (if set) or       |
|                                      | ``network.bind_host``.               |
+--------------------------------------+--------------------------------------+
| ``transport.publish_host``           | The host address to publish for      |
|                                      | nodes in the cluster to connect to.  |
|                                      | Defaults to ``transport.host`` (if   |
|                                      | set) or ``network.publish_host``.    |
+--------------------------------------+--------------------------------------+
| ``transport.host``                   | Used to set the                      |
|                                      | ``transport.bind_host`` and the      |
|                                      | ``transport.publish_host``. Defaults |
|                                      | to ``transport.host`` or             |
|                                      | ``network.host``.                    |
+--------------------------------------+--------------------------------------+
| ``transport.tcp.connect_timeout``    | The socket connect timeout setting   |
|                                      | (in time setting format). Defaults   |
|                                      | to ``30s``.                          |
+--------------------------------------+--------------------------------------+
| ``transport.tcp.compress``           | Set to ``true`` to enable            |
|                                      | compression (LZF) between all nodes. |
|                                      | Defaults to ``false``.               |
+--------------------------------------+--------------------------------------+

It also uses the common `network settings <#modules-network>`__.

**TCP Transport Profiles**

Elasticsearch allows you to bind to multiple ports on different
interfaces through the use of transport profiles. See the following
example configuration:

.. code:: yaml

    transport.profiles.default.port: 9300-9400
    transport.profiles.default.bind_host: 10.0.0.1
    transport.profiles.client.port: 9500-9600
    transport.profiles.client.bind_host: 192.168.0.1
    transport.profiles.dmz.port: 9700-9800
    transport.profiles.dmz.bind_host: 172.16.1.2

The ``default`` profile is special. It is used as a fallback for any
other profile that does not have a specific configuration setting set.
Note that other nodes in the cluster will usually connect to this node
via the default profile. In the future this feature will allow
node-to-node communication via multiple interfaces to be enabled.

The following parameters can be configured per profile:

-  ``port``: The port to bind to

-  ``bind_host``: The host to bind

-  ``publish_host``: The host which is published in informational APIs

-  ``tcp_no_delay``: Configures the ``TCP_NO_DELAY`` option for this
   socket

-  ``tcp_keep_alive``: Configures the ``SO_KEEPALIVE`` option for this
   socket

-  ``reuse_address``: Configures the ``SO_REUSEADDR`` option for this
   socket

-  ``tcp_send_buffer_size``: Configures the send buffer size of the
   socket

-  ``tcp_receive_buffer_size``: Configures the receive buffer size of
   the socket

**Local Transport**

This is a handy transport to use when running integration tests within
the JVM. It is automatically enabled when using
``NodeBuilder#local(true)``.

Snapshot And Restore
====================

The snapshot and restore module allows you to create snapshots of
individual indices or of an entire cluster into a remote repository. At
the time of the initial release only the shared file system repository
was supported, but now a range of backends is available via officially
supported repository plugins.

**Repositories**

Before any snapshot or restore operation can be performed, a snapshot
repository should be registered in Elasticsearch. The following command
registers a shared file system repository with the name ``my_backup``
that will use the location ``/mount/backups/my_backup`` to store
snapshots.

.. code:: js

    $ curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
        "type": "fs",
        "settings": {
            "location": "/mount/backups/my_backup",
            "compress": true
        }
    }'

Once the repository is registered, its information can be obtained using the
following command:

.. code:: js

    $ curl -XGET 'http://localhost:9200/_snapshot/my_backup?pretty'

.. code:: js

    {
      "my_backup" : {
        "type" : "fs",
        "settings" : {
          "compress" : "true",
          "location" : "/mount/backups/my_backup"
        }
      }
    }

If a repository name is not specified, or ``_all`` is used as the
repository name, Elasticsearch will return information about all
repositories currently registered in the cluster:

.. code:: js

    $ curl -XGET 'http://localhost:9200/_snapshot'

or

.. code:: js

    $ curl -XGET 'http://localhost:9200/_snapshot/_all'

**Shared File System Repository**

The shared file system repository (``"type": "fs"``) uses a shared file
system to store snapshots. The path specified in the ``location``
parameter should point to the same location on the shared filesystem and
be accessible on all data and master nodes. The following settings are
supported:

+--------------------------------+--------------------------------------------+
| ``location``                   | Location of the snapshots. Mandatory.      |
+--------------------------------+--------------------------------------------+
| ``compress``                   | Turns on compression of the snapshot       |
|                                | files. Compression is applied only to      |
|                                | metadata files (index mapping and          |
|                                | settings). Data files are not compressed.  |
|                                | Defaults to ``true``.                      |
+--------------------------------+--------------------------------------------+
| ``chunk_size``                 | Big files can be broken down into chunks   |
|                                | during snapshotting if needed. The chunk   |
|                                | size can be specified in bytes or by       |
|                                | using size value notation, i.e. 1g, 10m,   |
|                                | 5k. Defaults to ``null`` (unlimited chunk  |
|                                | size).                                     |
+--------------------------------+--------------------------------------------+
| ``max_restore_bytes_per_sec``  | Throttles per node restore rate. Defaults  |
|                                | to ``20mb`` per second.                    |
+--------------------------------+--------------------------------------------+
| ``max_snapshot_bytes_per_sec`` | Throttles per node snapshot rate. Defaults |
|                                | to ``20mb`` per second.                    |
+--------------------------------+--------------------------------------------+
| ``verify``                     | Verify repository upon creation. Defaults  |
|                                | to ``true``.                               |
+--------------------------------+--------------------------------------------+

**Read-only URL Repository**

The URL repository (``"type": "url"``) can be used as an alternative,
read-only way to access data created by the shared file system
repository. The URL specified in the ``url`` parameter should point to
the root of the shared filesystem repository. The following settings are
supported:

+------------+---------------------------------------------------------------+
| ``url``    | Location of the snapshots. Mandatory.                         |
+------------+---------------------------------------------------------------+
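
As a minimal sketch (the repository name ``my_backup_readonly`` and the
``file://`` URL are illustrative, and assume the shared filesystem from
the earlier example is mounted at the same path), such a repository could
be registered like this:

.. code:: js

    $ curl -XPUT 'http://localhost:9200/_snapshot/my_backup_readonly' -d '{
        "type": "url",
        "settings": {
            "url": "file:///mount/backups/my_backup"
        }
    }'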

**Repository plugins**

Other repository backends are available in these official plugins:

-  `AWS Cloud
   Plugin <https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository>`__
   for S3 repositories

-  `HDFS
   Plugin <https://github.com/elasticsearch/elasticsearch-hadoop/tree/master/repository-hdfs>`__
   for Hadoop environments

-  `Azure Cloud
   Plugin <https://github.com/elasticsearch/elasticsearch-cloud-azure#azure-repository>`__
   for Azure storage repositories

**Repository Verification**

When a repository is registered, it’s immediately verified on all master
and data nodes to make sure that it’s functional on all nodes currently
present in the cluster. The verification process can also be executed
manually by running the following command:

.. code:: js

    $ curl -XPOST 'http://localhost:9200/_snapshot/my_backup/_verify'

It returns a list of nodes where the repository was successfully
verified, or an error message if the verification process failed.

**Snapshot**

A repository can contain multiple snapshots of the same cluster.
Snapshots are identified by unique names within the cluster. A snapshot
with the name ``snapshot_1`` in the repository ``my_backup`` can be
created by executing the following command:

.. code:: js

    $ curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"

The ``wait_for_completion`` parameter specifies whether or not the
request should return immediately after snapshot initialization
(default) or wait for snapshot completion. During snapshot
initialization, information about all previous snapshots is loaded into
memory, which means that in large repositories it may take several
seconds (or even minutes) for this command to return even if the
``wait_for_completion`` parameter is set to ``false``.

By default, a snapshot of all open and started indices in the cluster is
created. This behavior can be changed by specifying the list of indices
in the body of the snapshot request.

.. code:: js

    $ curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_1" -d '{
        "indices": "index_1,index_2",
        "ignore_unavailable": "true",
        "include_global_state": false
    }'

The list of indices that should be included into the snapshot can be
specified using the ``indices`` parameter that supports `multi index
syntax <#search-multi-index-type>`__. The snapshot request also supports
the ``ignore_unavailable`` option. Setting it to ``true`` will cause
indices that do not exist to be ignored during snapshot creation. By
default, when the ``ignore_unavailable`` option is not set and an index
is missing, the snapshot request will fail. By setting
``include_global_state`` to ``false`` it’s possible to prevent the
cluster global state from being stored as part of the snapshot. By
default, the entire snapshot will fail if one or more indices
participating in the snapshot don’t have all primary shards available.
This behaviour can be changed by setting ``partial`` to ``true``.

The index snapshot process is incremental. In the process of making the
index snapshot Elasticsearch analyses the list of the index files that
are already stored in the repository and copies only files that were
created or changed since the last snapshot. That allows multiple
snapshots to be preserved in the repository in a compact form.
The snapshotting process is executed in a non-blocking fashion. All
indexing and searching operations can continue to be executed against
the index that is being snapshotted. However, a snapshot represents the
point-in-time view of the index at the moment the snapshot was created,
so no records that were added to the index after the snapshot process
started will be present in the snapshot. The snapshot process starts
immediately for the primary shards that have been started and are not
relocating at the moment. Before version 1.2.0, the snapshot operation
fails if the cluster has any relocating or initializing primaries of
indices participating in the snapshot. Starting with version 1.2.0,
Elasticsearch waits for relocation or initialization of shards to
complete before snapshotting them.

Besides creating a copy of each index the snapshot process can also
store global cluster metadata, which includes persistent cluster
settings and templates. The transient settings and registered snapshot
repositories are not stored as part of the snapshot.

Only one snapshot process can be executed in the cluster at any time.
While a snapshot of a particular shard is being created, this shard
cannot be moved to another node, which can interfere with the
rebalancing process and allocation filtering. Once the snapshot of the
shard is finished, Elasticsearch will be able to move the shard to
another node according to the current allocation filtering settings and
rebalancing algorithm.

Once a snapshot is created, information about this snapshot can be
obtained using the following command:

.. code:: shell

    $ curl -XGET "localhost:9200/_snapshot/my_backup/snapshot_1"

All snapshots currently stored in the repository can be listed using the
following command:

.. code:: shell

    $ curl -XGET "localhost:9200/_snapshot/my_backup/_all"

A snapshot can be deleted from the repository using the following
command:

.. code:: shell

    $ curl -XDELETE "localhost:9200/_snapshot/my_backup/snapshot_1"

When a snapshot is deleted from a repository, Elasticsearch deletes all
files that are associated with the deleted snapshot and not used by any
other snapshots. If the delete snapshot operation is executed while the
snapshot is being created, the snapshotting process will be aborted and
all files created as part of the snapshotting process will be cleaned
up. Therefore, the delete snapshot operation can be used to cancel
long-running snapshot operations that were started by mistake.

A repository can be deleted using the following command:

.. code:: shell

    $ curl -XDELETE "localhost:9200/_snapshot/my_backup"

When a repository is deleted, Elasticsearch only removes the reference
to the location where the repository is storing the snapshots. The
snapshots themselves are left untouched and in place.

**Restore**

A snapshot can be restored using the following command:

.. code:: shell

    $ curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore"

By default, all indices in the snapshot as well as cluster state are
restored. It’s possible to select indices that should be restored as
well as prevent global cluster state from being restored by using
``indices`` and ``include_global_state`` options in the restore request
body. The list of indices supports `multi index
syntax <#search-multi-index-type>`__. The ``rename_pattern`` and
``rename_replacement`` options can also be used to rename indices on
restore using a regular expression that supports referencing the
original text as explained
`here <http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)>`__.
Set ``include_aliases`` to ``false`` to prevent aliases from being
restored together with associated indices:

.. code:: js

    $ curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore" -d '{
        "indices": "index_1,index_2",
        "ignore_unavailable": "true",
        "include_global_state": false,
        "rename_pattern": "index_(.+)",
        "rename_replacement": "restored_index_$1"
    }'

The restore operation can be performed on a functioning cluster.
However, an existing index can only be restored if it’s
`closed <#indices-open-close>`__. The restore operation automatically
opens restored indices if they were closed and creates new indices if
they didn’t exist in the cluster. If the cluster state is restored, the
restored templates that don’t currently exist in the cluster are added
and existing templates with the same name are replaced by the restored
templates. The restored persistent settings are added to the existing
persistent settings.

**Partial restore**

By default, the entire restore operation will fail if one or more
indices participating in the operation don’t have snapshots of all
shards available. This can occur if, for example, some shards failed to
snapshot. It is still possible to restore such indices by setting
``partial`` to ``true``. Please note that only successfully snapshotted
shards will be restored in this case; all missing shards will be
recreated empty.

**Snapshot status**

A list of currently running snapshots with their detailed status
information can be obtained using the following command:

.. code:: shell

    $ curl -XGET "localhost:9200/_snapshot/_status"

In this format, the command will return information about all currently
running snapshots. By specifying a repository name, it’s possible to
limit the results to a particular repository:

.. code:: shell

    $ curl -XGET "localhost:9200/_snapshot/my_backup/_status"

If both repository name and snapshot id are specified, this command will
return detailed status information for the given snapshot even if it’s
not currently running:

.. code:: shell

    $ curl -XGET "localhost:9200/_snapshot/my_backup/snapshot_1/_status"

Multiple ids are also supported:

.. code:: shell

    $ curl -XGET "localhost:9200/_snapshot/my_backup/snapshot_1,snapshot_2/_status"

**Monitoring snapshot/restore progress**

There are several ways to monitor the progress of the snapshot and
restore processes while they are running. Both operations support the
``wait_for_completion`` parameter, which blocks the client until the
operation is completed. This is the simplest method that can be used to
get notified about operation completion.

The snapshot operation can also be monitored by periodic calls to the
snapshot info:

.. code:: shell

    $ curl -XGET "localhost:9200/_snapshot/my_backup/snapshot_1"

Please note that the snapshot info operation uses the same resources and
thread pool as the snapshot operation. So, executing a snapshot info
operation while large shards are being snapshotted can cause the
snapshot info operation to wait for available resources before returning
the result. On very large shards the wait time can be significant.

To get more immediate and complete information about snapshots, the
snapshot status command can be used instead:

.. code:: shell

    $ curl -XGET "localhost:9200/_snapshot/my_backup/snapshot_1/_status"

While the snapshot info method returns only basic information about the
snapshot in progress, the snapshot status command returns a complete
breakdown of the current state for each shard participating in the
snapshot.

The restore process piggybacks on the standard recovery mechanism of
Elasticsearch. As a result, standard recovery monitoring services can be
used to monitor the state of the restore. When the restore operation is
executed, the cluster typically goes into the ``red`` state. This
happens because the restore operation starts by "recovering" the primary
shards of the restored indices. During this operation the primary shards
become unavailable, which manifests itself in the ``red`` cluster state.
Once recovery of the primary shards is completed, Elasticsearch switches
to the standard replication process that creates the required number of
replicas; at this moment the cluster switches to the ``yellow`` state.
Once all required replicas are created, the cluster switches to the
``green`` state.

The cluster health operation provides only a high level status of the
restore process. It’s possible to get more detailed insight into the
current state of the recovery process by using `indices
recovery <#indices-recovery>`__ and `cat recovery <#cat-recovery>`__
APIs.

**Stopping currently running snapshot and restore operations**

The snapshot and restore framework allows running only one snapshot or
one restore operation at a time. If the currently running snapshot was
executed by mistake or is taking unusually long, it can be terminated
using the snapshot delete operation. The snapshot delete operation
checks whether the snapshot being deleted is currently running and, if
it is, stops that snapshot before deleting the snapshot data from the
repository.

The restore operation uses the standard shard recovery mechanism.
Therefore, any currently running restore operation can be canceled by
deleting the indices that are being restored. Please note that data for
all deleted indices will be removed from the cluster as a result of this
operation.

Index Modules
=============

Index modules are modules created per index and control all aspects
related to an index. Since the lifecycle of these modules is tied to an
index, all the relevant module settings can be provided when creating an
index (and this is in fact the recommended way to configure an index).

**Index Settings**

There are specific index level settings that are not associated with any
specific module. These include:

``index.compound_format``
    Should the compound file format be used (boolean setting). The
    compound format was created to reduce the number of open file
    handles when using file based storage. However, by default it is set
    to ``false`` as the non-compound format gives better performance. It
    is important that the OS is configured to give Elasticsearch “enough”
    file handles. See ?.

    Alternatively, ``compound_format`` can be set to a number between
    ``0`` and ``1``, where ``0`` means ``false``, ``1`` means ``true``
    and a number in between represents a percentage: if the merged
    segment is less than this percentage of the total index, then it is
    written in compound format, otherwise it is written in non-compound
    format.

``index.compound_on_flush``
    Should a new segment (created by indexing, not by merging) be written
    in compound format or non-compound format? Defaults to ``true``.
    This is a dynamic setting.

``index.refresh_interval``
    A time setting controlling how often the refresh operation will be
    executed. Defaults to ``1s``. Can be set to ``-1`` in order to
    disable it.

``index.shard.check_on_startup``
    Should shard consistency be checked upon opening. When ``true``, the
    shard will be checked, preventing it from being opened if some
    segments appear to be corrupted. When ``fix``, the shard will also
    be checked but segments that were reported as corrupted will be
    automatically removed. Default value is ``false``, which doesn’t
    check shards.

    **Note**

    Checking shards may take a lot of time on large indices.

    **Warning**

    Setting ``index.shard.check_on_startup`` to ``fix`` may result in
    data loss, use with caution.

Analysis
========

The index analysis module acts as a configurable registry of Analyzers
that can be used in order to break down indexed (analyzed) fields when a
document is indexed as well as to process query strings. It maps to the
Lucene ``Analyzer``.

Analyzers are (generally) composed of a single ``Tokenizer`` and zero or
more ``TokenFilters``. A set of ``CharFilters`` can be associated with
an analyzer to process the characters prior to other analysis steps. The
analysis module allows one to register ``TokenFilters``, ``Tokenizers``
and ``Analyzers`` under logical names that can then be referenced either
in mapping definitions or in certain APIs. The Analysis module
automatically registers (**if not explicitly defined**) built in
analyzers, token filters, and tokenizers.

See ? for configuration details.

Index Shard Allocation
======================

**Shard Allocation Filtering**

Shard allocation filtering allows you to control the allocation of
indices on nodes based on include/exclude filters. The filters can be
set both on the index level and on the cluster level. Let's start with
an example of setting it on the cluster level:

Let's say we have 4 nodes, each with a specific attribute called ``tag``
associated with it (the name of the attribute can be any name). Each
node has a specific value associated with ``tag``. Node 1 has a setting
``node.tag: value1``, Node 2 a setting of ``node.tag: value2``, and so
on.

We can create an index that will only deploy on nodes that have ``tag``
set to ``value1`` and ``value2`` by setting
``index.routing.allocation.include.tag`` to ``value1,value2``. For
example:

.. code:: js

    curl -XPUT localhost:9200/test/_settings -d '{
        "index.routing.allocation.include.tag" : "value1,value2"
    }'

On the other hand, we can create an index that will be deployed on all
nodes except for nodes with a ``tag`` of value ``value3`` by setting
``index.routing.allocation.exclude.tag`` to ``value3``. For example:

.. code:: js

    curl -XPUT localhost:9200/test/_settings -d '{
        "index.routing.allocation.exclude.tag" : "value3"
    }'

``index.routing.allocation.require.*`` can be used to specify a number
of rules, all of which MUST match in order for a shard to be allocated
to a node. This is in contrast to ``include`` which will include a node
if ANY rule matches.

The ``include``, ``exclude`` and ``require`` values can have generic
simple matching wildcards, for example, ``value1*``. Additionally,
special attribute names called ``_ip``, ``_name``, ``_id`` and ``_host``
can be used to match by node ip address, name, id or host name,
respectively.

Obviously a node can have several attributes associated with it, and
both the attribute name and value are controlled in the setting. For
example, here is a sample of several node configurations:

.. code:: js

    node.group1: group1_value1
    node.group2: group2_value4

In the same manner, ``include``, ``exclude`` and ``require`` can work
against several attributes, for example:

.. code:: js

    curl -XPUT localhost:9200/test/_settings -d '{
        "index.routing.allocation.include.group1" : "xxx",
        "index.routing.allocation.include.group2" : "yyy",
        "index.routing.allocation.exclude.group3" : "zzz",
        "index.routing.allocation.require.group4" : "aaa"
    }'

The provided settings can also be updated in real time using the update
settings API, allowing you to "move" indices (shards) around in real
time.

Cluster wide filtering can also be defined, and be updated in real time
using the cluster update settings API. This setting can come in handy
for things like decommissioning nodes (even if the replica count is set
to 0). Here is a sample of how to decommission a node based on ``_ip``
address:

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
        }
    }'

**Total Shards Per Node**

The ``index.routing.allocation.total_shards_per_node`` setting allows
you to control how many total shards (replicas and primaries) for an
index will be allocated per node. It can be dynamically set on a live
index using the update index settings API.
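
For example, a minimal sketch that limits the ``test`` index to at most
two of its shards per node (the value ``2`` is purely illustrative):

.. code:: js

    curl -XPUT localhost:9200/test/_settings -d '{
        "index.routing.allocation.total_shards_per_node" : 2
    }'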

**Disk-based Shard Allocation**

Disk-based shard allocation is enabled from version 1.3.0 onward.

Elasticsearch can be configured to prevent shard allocation on nodes
depending on disk usage for the node. This functionality is enabled by
default, and can be changed either in the configuration file, or
dynamically using:

.. code:: js

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "cluster.routing.allocation.disk.threshold_enabled" : false
        }
    }'

Once enabled, Elasticsearch uses two watermarks to decide whether shards
should be allocated or can remain on the node.

``cluster.routing.allocation.disk.watermark.low`` controls the low
watermark for disk usage. It defaults to 85%, meaning ES will not
allocate new shards to nodes once they have more than 85% disk used. It
can also be set to an absolute byte value (like 500mb) to prevent ES
from allocating shards if less than the configured amount of space is
available.

``cluster.routing.allocation.disk.watermark.high`` controls the high
watermark. It defaults to 90%, meaning ES will attempt to relocate
shards to another node if the node disk usage rises above 90%. It can
also be set to an absolute byte value (similar to the low watermark) to
relocate shards once less than the configured amount of space is
available on the node.

Both watermark settings can be changed dynamically using the cluster
settings API. By default, Elasticsearch will retrieve information about
the disk usage of the nodes every 30 seconds. This can also be changed
by setting the ``cluster.info.update.interval`` setting.
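
As an illustration (the values are chosen for the example only), both
watermarks and the polling interval can be adjusted via the cluster
update settings API:

.. code:: js

    # illustrative values; transient settings revert on full cluster restart
    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "cluster.routing.allocation.disk.watermark.low" : "80%",
            "cluster.routing.allocation.disk.watermark.high" : "85%",
            "cluster.info.update.interval" : "1m"
        }
    }'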

By default, Elasticsearch will take into account shards that are
currently being relocated to the target node when computing a node’s
disk usage. This can be changed by setting the
``cluster.routing.allocation.disk.include_relocations`` setting to
``false`` (defaults to ``true``). Taking relocating shards’ sizes into
account may, however, mean that the disk usage for a node is incorrectly
estimated on the high side, since the relocation could be 90% complete
and a recently retrieved disk usage would include the total size of the
relocating shard as well as the space already used by the running
relocation.

Index Slow Log
==============

**Search Slow Log**

The shard-level slow search log allows slow searches (query and fetch
executions) to be logged into a dedicated log file.

Thresholds can be set for both the query phase and the fetch phase of
the execution; here is a sample:

.. code:: js

    #index.search.slowlog.threshold.query.warn: 10s
    #index.search.slowlog.threshold.query.info: 5s
    #index.search.slowlog.threshold.query.debug: 2s
    #index.search.slowlog.threshold.query.trace: 500ms

    #index.search.slowlog.threshold.fetch.warn: 1s
    #index.search.slowlog.threshold.fetch.info: 800ms
    #index.search.slowlog.threshold.fetch.debug: 500ms
    #index.search.slowlog.threshold.fetch.trace: 200ms

By default, none are enabled (set to ``-1``). Levels (``warn``,
``info``, ``debug``, ``trace``) allow you to control the logging level
at which the slow log entry will be logged. Not all levels are required
to be configured (for example, only the ``warn`` threshold can be set).
The benefit of several levels is the ability to quickly "grep" for
specific breached thresholds.

The logging is done at the shard level scope, meaning the execution of a
search request within a specific shard. It does not encompass the whole
search request, which can be broadcast to several shards in order to
execute. One of the benefits of shard level logging is that it
associates the actual execution with the specific machine, which
request level logging cannot do.

All settings are index level settings (and each index can have different
values for them), and can be changed at runtime using the index update
settings API.
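
For example, a minimal sketch of enabling the ``warn`` threshold for the
query phase on an existing index (the index name ``my_index`` and the
``10s`` value are illustrative):

.. code:: js

    curl -XPUT localhost:9200/my_index/_settings -d '{
        "index.search.slowlog.threshold.query.warn" : "10s"
    }'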

The logging file is configured by default using the following
configuration (found in ``logging.yml``):

.. code:: js

    index_search_slow_log_file:
      type: dailyRollingFile
      file: ${path.logs}/${cluster.name}_index_search_slowlog.log
      datePattern: "'.'yyyy-MM-dd"
      layout:
        type: pattern
        conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

**Index Slow log**

The indexing slow log is similar in functionality to the search slow
log. The log file name ends with ``_index_indexing_slowlog.log``. The
log and the thresholds are configured in the ``elasticsearch.yml`` file
in the same way as the search slow log. Index slow log sample:

.. code:: js

    #index.indexing.slowlog.threshold.index.warn: 10s
    #index.indexing.slowlog.threshold.index.info: 5s
    #index.indexing.slowlog.threshold.index.debug: 2s
    #index.indexing.slowlog.threshold.index.trace: 500ms

The index slow log file is configured by default in the ``logging.yml``
file:

.. code:: js

    index_indexing_slow_log_file:
        type: dailyRollingFile
        file: ${path.logs}/${cluster.name}_index_indexing_slowlog.log
        datePattern: "'.'yyyy-MM-dd"
        layout:
          type: pattern
          conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

Merge
=====

A shard in elasticsearch is a Lucene index, and a Lucene index is broken
down into segments. Segments are internal storage elements in the index
where the index data is stored, and are immutable up to delete markers.
Segments are, periodically, merged into larger segments to keep the
index size at bay and expunge deletes.

The more segments one has in the Lucene index, the slower the searches
and the more memory used. Segment merging is used to reduce the number
of segments; however, merges can be expensive to perform, especially in
low IO environments. Merges can be throttled using `store level
throttling <#store-throttling>`__.

**Policy**

The index merge policy module allows one to control which segments of a
shard index are to be merged. There are several types of policies with
the default set to ``tiered``.

**tiered**

Merges segments of approximately equal size, subject to an allowed
number of segments per tier. This is similar to the ``log_byte_size``
merge policy, except this merge policy is able to merge non-adjacent
segments, and separates how many segments are merged at once from how
many segments are allowed per tier. This merge policy also does not
over-merge (i.e., cascade merges).

This policy has the following settings:

``index.merge.policy.expunge_deletes_allowed``
    When expungeDeletes is called, we only merge away a segment if its
    delete percentage is over this threshold. Default is ``10``.

``index.merge.policy.floor_segment``
    Segments smaller than this are "rounded up" to this size, i.e.
    treated as equal (floor) size for merge selection. This is to
    prevent frequent flushing of tiny segments, thus preventing a long
    tail in the index. Default is ``2mb``.

``index.merge.policy.max_merge_at_once``
    Maximum number of segments to be merged at a time during "normal"
    merging. Default is ``10``.

``index.merge.policy.max_merge_at_once_explicit``
    Maximum number of segments to be merged at a time, during optimize
    or expungeDeletes. Default is ``30``.

``index.merge.policy.max_merged_segment``
    Maximum sized segment to produce during normal merging (not explicit
    optimize). This setting is approximate: the estimate of the merged
    segment size is made by summing sizes of to-be-merged segments
    (compensating for percent deleted docs). Default is ``5gb``.

``index.merge.policy.segments_per_tier``
    Sets the allowed number of segments per tier. Smaller values mean
    more merging but fewer segments. Default is ``10``. Note, this value
    needs to be >= ``max_merge_at_once``, otherwise you’ll force too
    many merges to occur.

``index.merge.policy.reclaim_deletes_weight``
    Controls how aggressively merges that reclaim more deletions are
    favored. Higher values favor selecting merges that reclaim
    deletions. A value of ``0.0`` means deletions don’t impact merge
    selection. Defaults to ``2.0``.

``index.compound_format``
    Should the index be stored in compound format or not. Defaults to
    ``false``. See
    ```index.compound_format`` <#index-compound-format>`__ in ?.

For normal merging, this policy first computes a "budget" of how many
segments are allowed to be in the index. If the index is over-budget,
then the policy sorts segments by decreasing size (proportionally
considering percent deletes), and then finds the least-cost merge. Merge
cost is measured by a combination of the "skew" of the merge (size of
largest seg divided by smallest seg), total merge size and pct deletes
reclaimed, so that merges with lower skew, smaller size and those
reclaiming more deletes, are favored.

If a merge will produce a segment that’s larger than
``max_merged_segment`` then the policy will merge fewer segments (down
to 1 at once, if that one has deletions) to keep the segment size under
budget.

Note, this can mean that for large shards that hold many gigabytes of
data, the default ``max_merged_segment`` (``5gb``) can result in many
segments remaining in the index, causing searches to be slower. Use the
indices segments API to see the segments that an index has, and possibly
either increase ``max_merged_segment`` or issue an optimize call for
the index (try to issue it at a low traffic time).
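
For example (the index name ``my_index`` and the ``10gb`` value are
illustrative), the segments can be inspected and the limit raised
dynamically like this:

.. code:: js

    # inspect the segments of the index
    curl -XGET 'localhost:9200/my_index/_segments?pretty'

    # raise the maximum merged segment size
    curl -XPUT 'localhost:9200/my_index/_settings' -d '{
        "index.merge.policy.max_merged_segment" : "10gb"
    }'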

**log\_byte\_size**

A merge policy that merges segments into levels of exponentially
increasing **byte size**, where each level has fewer segments than the
value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

This policy has the following settings:

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| index.merge.policy.merge\_factor     | Determines how often segment indices |
|                                      | are merged by index operation. With  |
|                                      | smaller values, less RAM is used     |
|                                      | while indexing, and searches on      |
|                                      | unoptimized indices are faster, but  |
|                                      | indexing speed is slower. With       |
|                                      | larger values, more RAM is used      |
|                                      | during indexing, and while searches  |
|                                      | on unoptimized indices are slower,   |
|                                      | indexing is faster. Thus larger      |
|                                      | values (greater than 10) are best    |
|                                      | for batch index creation, and        |
|                                      | smaller values (lower than 10) for   |
|                                      | indices that are interactively       |
|                                      | maintained. Defaults to ``10``.      |
+--------------------------------------+--------------------------------------+
| index.merge.policy.min\_merge\_size  | A size setting type which sets the   |
|                                      | minimum size for the lowest level    |
|                                      | segments. Any segments below this    |
|                                      | size are considered to be on the     |
|                                      | same level (even if they vary        |
|                                      | drastically in size) and will be     |
|                                      | merged whenever there are            |
|                                      | mergeFactor of them. This            |
|                                      | effectively truncates the "long      |
|                                      | tail" of small segments that would   |
|                                      | otherwise be created into a single   |
|                                      | level. If you set this too large, it |
|                                      | could greatly increase the merging   |
|                                      | cost during indexing (if you flush   |
|                                      | many small segments). Defaults to    |
|                                      | ``1.6mb``                            |
+--------------------------------------+--------------------------------------+
| index.merge.policy.max\_merge\_size  | A size setting type which sets the   |
|                                      | largest segment (measured by total   |
|                                      | byte size of the segment’s files)    |
|                                      | that may be merged with other        |
|                                      | segments. Defaults to unbounded.     |
+--------------------------------------+--------------------------------------+
| index.merge.policy.max\_merge\_docs  | Determines the largest segment       |
|                                      | (measured by document count) that    |
|                                      | may be merged with other segments.   |
|                                      | Defaults to unbounded.               |
+--------------------------------------+--------------------------------------+

**log\_doc**

A merge policy that tries to merge segments into levels of exponentially
increasing **document count**, where each level has fewer segments than
the value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| index.merge.policy.merge\_factor     | Determines how often segment indices |
|                                      | are merged by index operation. With  |
|                                      | smaller values, less RAM is used     |
|                                      | while indexing, and searches on      |
|                                      | unoptimized indices are faster, but  |
|                                      | indexing speed is slower. With       |
|                                      | larger values, more RAM is used      |
|                                      | during indexing, and while searches  |
|                                      | on unoptimized indices are slower,   |
|                                      | indexing is faster. Thus larger      |
|                                      | values (greater than 10) are best    |
|                                      | for batch index creation, and        |
|                                      | smaller values (lower than 10) for   |
|                                      | indices that are interactively       |
|                                      | maintained. Defaults to ``10``.      |
+--------------------------------------+--------------------------------------+
| index.merge.policy.min\_merge\_docs  | Sets the minimum size for the lowest |
|                                      | level segments. Any segments below   |
|                                      | this size are considered to be on    |
|                                      | the same level (even if they vary    |
|                                      | drastically in size) and will be     |
|                                      | merged whenever there are            |
|                                      | mergeFactor of them. This            |
|                                      | effectively truncates the "long      |
|                                      | tail" of small segments that would   |
|                                      | otherwise be created into a single   |
|                                      | level. If you set this too large, it |
|                                      | could greatly increase the merging   |
|                                      | cost during indexing (if you flush   |
|                                      | many small segments). Defaults to    |
|                                      | ``1000``.                            |
+--------------------------------------+--------------------------------------+
| index.merge.policy.max\_merge\_docs  | Determines the largest segment       |
|                                      | (measured by document count) that    |
|                                      | may be merged with other segments.   |
|                                      | Defaults to unbounded.               |
+--------------------------------------+--------------------------------------+

**Scheduling**

The merge scheduler (ConcurrentMergeScheduler) controls the execution of
merge operations once they are needed (according to the merge policy).
Merges run in separate threads, and when the maximum number of threads
is reached, further merges will wait until a merge thread becomes
available. The merge scheduler supports this setting:

``index.merge.scheduler.max_thread_count``
    The maximum number of threads that may be merging at once. Defaults
    to
    ``Math.max(1, Math.min(3, Runtime.getRuntime().availableProcessors() / 2))``
    which works well for a good solid-state-disk (SSD). If your index is
    on spinning platter drives instead, decrease this to 1.
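
For instance, on an index that lives on spinning disks, this setting
could be lowered at index creation time (a sketch; the index name
``my_index`` is illustrative):

.. code:: js

    curl -XPUT 'localhost:9200/my_index' -d '{
        "settings": {
            "index.merge.scheduler.max_thread_count": 1
        }
    }'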

**SerialMergeScheduler**

This is accepted for backwards compatibility, but just uses
ConcurrentMergeScheduler with index.merge.scheduler.max\_thread\_count
set to 1 so that only 1 merge may run at a time.

Store
=====

The store module allows you to control how index data is stored.

The index can either be stored in-memory (no persistence) or on-disk
(the default). In-memory indices provide better performance at the cost
of limiting the index size to the amount of available physical memory.

When using a local gateway (the default), file system storage with
**no** in memory storage is required to maintain index consistency. This
is required since the local gateway constructs its state from the local
index state of each node.

Another important aspect of memory based storage is the fact that
Elasticsearch supports storing the index in memory **outside of the JVM
heap space** using the "Memory" (see below) storage type. It translates
to the fact that there is no need for extra large JVM heaps (with their
own consequences) for storing the index in memory.

**Store Level Throttling**

The way Lucene, the IR library elasticsearch uses under the covers,
works is by creating immutable segments (up to deletes) and constantly
merging them (the merge policy settings allow control over how those
merges happen). The merge process happens in an asynchronous manner
without affecting the indexing / search speed. The problem though,
especially on systems with low IO, is that the merge process can be
expensive and affect search / index operations simply because the box
is now taxed with more IO.

The store module allows throttling to be configured for merges (or for
all operations) either on the node level or on the index level. The node
level throttling will make sure that, across all the shards allocated on
that node, the merge process won’t exceed the configured bytes per
second. It can be set by setting ``indices.store.throttle.type`` to
``merge``, and setting ``indices.store.throttle.max_bytes_per_sec`` to
something like ``5mb``. The node level settings can be changed
dynamically using the cluster update settings API. The default is set to
``20mb`` with type ``merge``.
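
For example, a minimal sketch of applying the node level values mentioned
above dynamically:

.. code:: js

    # transient settings revert on full cluster restart
    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "indices.store.throttle.type" : "merge",
            "indices.store.throttle.max_bytes_per_sec" : "5mb"
        }
    }'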

If specific index level configuration is needed, regardless of the node
level settings, it can be set as well using the
``index.store.throttle.type``, and
``index.store.throttle.max_bytes_per_sec``. The default value for the
type is ``node``, meaning it will throttle based on the node level
settings and participate in the global throttling happening. Both
settings can be set using the index update settings API dynamically.

**File system storage types**

File system based storage is the default storage used. There are
different implementations or *storage types*. The best one for the
operating environment will be automatically chosen: ``mmapfs`` on
Windows 64bit, ``simplefs`` on Windows 32bit, and ``default`` (hybrid
``niofs`` and ``mmapfs``) for the rest.

This can be overridden for all indices by adding this to the
``config/elasticsearch.yml`` file:

.. code:: yaml

    index.store.type: niofs

It can also be set on a per-index basis at index creation time:

.. code:: json

    curl -XPUT localhost:9200/my_index -d '{
        "settings": {
            "index.store.type": "niofs"
        }
    }';

The following sections list all the different storage types supported.

**Simple FS**

The ``simplefs`` type is a straightforward implementation of file system
storage (maps to Lucene ``SimpleFsDirectory``) using a random access
file. This implementation has poor concurrent performance (multiple
threads will bottleneck). It is usually better to use the ``niofs`` when
you need index persistence.

**NIO FS**

The ``niofs`` type stores the shard index on the file system (maps to
Lucene ``NIOFSDirectory``) using NIO. It allows multiple threads to read
from the same file concurrently. It is not recommended on Windows
because of a bug in the SUN Java implementation.

**MMap FS**

The ``mmapfs`` type stores the shard index on the file system (maps to
Lucene ``MMapDirectory``) by mapping a file into memory (mmap). Memory
mapping uses up a portion of the virtual memory address space in your
process equal to the size of the file being mapped. Before using this
class, be sure you have plenty of virtual address space. See ?

**Hybrid MMap / NIO FS**

The ``default`` type stores the shard index on the file system depending
on the file type by mapping a file into memory (mmap) or using Java NIO.
Currently only the Lucene term dictionary and doc values files are
memory mapped to reduce the impact on the operating system. All other
files are opened using Lucene ``NIOFSDirectory``. Address space settings
(?) might also apply if your term dictionaries are large.

**Memory**

The ``memory`` type stores the index in main memory, using Lucene’s
``RamIndexStore``.

Mapper
======

The mapper module acts as a registry for the type mapping definitions
added to an index either when creating it or by using the put mapping
api. It also handles the dynamic mapping support for types that have no
explicit mappings predefined. For more information about mapping
definitions, check out the `mapping section <#mapping>`__.

**Dynamic Mappings**

New types and new fields within types can be added dynamically just by
indexing a document. When Elasticsearch encounters a new type, it
creates the type using the ``_default_`` mapping (see below).

When it encounters a new field within a type, it autodetects the
datatype that the field contains and adds it to the type mapping
automatically.

See ? for details of how to control and configure dynamic mapping.

**Default Mapping**

When a new type is created (at `index
creation <#indices-create-index>`__ time, using the ```put-mapping``
API <#indices-put-mapping>`__ or just by indexing a document into it),
the type uses the ``_default_`` mapping as its basis. Any mapping
specified in the ```create-index`` <#indices-create-index>`__ or
```put-mapping`` <#indices-put-mapping>`__ request overrides values set
in the ``_default_`` mapping.

The default mapping definition is a plain mapping definition that is
embedded within ElasticSearch:

.. code:: js

    {
        _default_ : {
        }
    }

Pretty short, isn’t it? Basically, everything is *defaulted*, including
the dynamic nature of the root object mapping which allows new fields to
be added automatically.

The built-in default mapping definition can be overridden in several
ways. A ``_default_`` mapping can be specified when creating a new
index, or the global ``_default_`` mapping (for all indices) can be
configured by creating a file called ``config/default-mapping.json``.
(This location can be changed with the
``index.mapper.default_mapping_location`` setting.)

Dynamic creation of mappings for unmapped types can be completely
disabled by setting ``index.mapper.dynamic`` to ``false``.

Translog
========

Each shard has a transaction log or write ahead log associated with it.
It guarantees that when an index/delete operation occurs, it is applied
atomically, while not "committing" the internal Lucene index for each
request. A flush ("commit") still happens based on several parameters:

``index.translog.flush_threshold_ops``
    After how many operations to flush. Defaults to ``unlimited``.

``index.translog.flush_threshold_size``
    Once the translog hits this size, a flush will happen. Defaults to
    ``200mb``.

``index.translog.flush_threshold_period``
    The period with no flush happening to force a flush. Defaults to
    ``30m``.

``index.translog.interval``
    How often to check if a flush is needed, randomized between the
    interval value and 2x the interval value. Defaults to ``5s``.

``index.gateway.local.sync``
    How often the translog is ``fsync``\ ed to disk. Defaults to ``5s``.

Note: these parameters can be updated at runtime using the index
settings update API (for example, these numbers can be increased when
executing bulk updates to support higher TPS).
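
For example, a minimal sketch of raising the flush threshold during a
large bulk load (the index name ``my_index`` and the ``500mb`` value are
illustrative):

.. code:: js

    curl -XPUT 'localhost:9200/my_index/_settings' -d '{
        "index.translog.flush_threshold_size" : "500mb"
    }'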

Cache
=====

There are different inner caching modules associated with an index. They
include ``filter`` and others.

**Filter Cache**

The filter cache is responsible for caching the results of filters (used
in the query). The default implementation of a filter cache (and the one
recommended to use in almost all cases) is the ``node`` filter cache
type.

**Node Filter Cache**

The ``node`` filter cache may be configured to use either a percentage
of the total memory allocated to the process or a specific amount of
memory. All shards present on a node share a single node cache (that’s
why it’s called ``node``). The cache implements an LRU eviction policy:
when the cache becomes full, the least recently used data is evicted to
make way for new data.

The setting that allows one to control the memory size for the filter
cache is ``indices.cache.filter.size``, which defaults to ``10%``.
**Note**, this is **not** an index level setting but a node level
setting (can be configured in the node configuration).

``indices.cache.filter.size`` can accept either a percentage value, like
``30%``, or an exact value, like ``512mb``.

Shard query cache
=================

When a search request is run against an index or against many indices,
each involved shard executes the search locally and returns its local
results to the *coordinating node*, which combines these shard-level
results into a “global” result set.

The shard-level query cache module caches the local results on each
shard. This allows frequently used (and potentially heavy) search
requests to return results almost instantly. The query cache is a very
good fit for the logging use case, where only the most recent index is
being actively updated — results from older indices will be served
directly from the cache.

    **Important**

    For now, the query cache will only cache the results of search
    requests where ```?search_type=count`` <#count>`__, so it will not
    cache ``hits``, but it will cache ``hits.total``,
    `aggregations <#search-aggregations>`__, and
    `suggestions <#search-suggesters>`__.

    Queries that use ``now`` cannot be cached.

**Cache invalidation**

The cache is smart — it keeps the same *near real-time* promise as
uncached search.

Cached results are invalidated automatically whenever the shard
refreshes, but only if the data in the shard has actually changed. In
other words, you will always get the same results from the cache as you
would for an uncached search request.

The longer the refresh interval, the longer that cached entries will
remain valid. If the cache is full, the least recently used cache keys
will be evicted.

The cache can be expired manually with the ```clear-cache``
API <#indices-clearcache>`__:

.. code:: json

    curl -XPOST 'localhost:9200/kimchy,elasticsearch/_cache/clear?query_cache=true'

**Enabling caching by default**

The cache is not enabled by default, but can be enabled when creating a
new index as follows:

.. code:: json

    curl -XPUT localhost:9200/my_index -d'
    {
      "settings": {
        "index.cache.query.enable": true
      }
    }
    '

It can also be enabled or disabled dynamically on an existing index with
the ```update-settings`` <#indices-update-settings>`__ API:

.. code:: json

    curl -XPUT localhost:9200/my_index/_settings -d'
    { "index.cache.query.enable": true }
    '

**Enabling caching per request**

The ``query_cache`` query-string parameter can be used to enable or
disable caching on a **per-query** basis. If set, it overrides the
index-level setting:

.. code:: json

    curl 'localhost:9200/my_index/_search?search_type=count&query_cache=true' -d'
    {
      "aggs": {
        "popular_colors": {
          "terms": {
            "field": "colors"
          }
        }
      }
    }
    '

    **Important**

    If your query uses a script whose result is not deterministic (e.g.
    it uses a random function or references the current time) you should
    set the ``query_cache`` flag to ``false`` to disable caching for
    that request.

**Cache key**

The whole JSON body is used as the cache key. This means that if the
JSON changes — for instance if keys are output in a different
order — then the cache key will not be recognised.

    **Tip**

    Most JSON libraries support a *canonical* mode which ensures that
    JSON keys are always emitted in the same order. This canonical mode
    can be used in the application to ensure that a request is always
    serialized in the same way.

**Cache settings**

The cache is managed at the node level, and has a default maximum size
of ``1%`` of the heap. This can be changed in the
``config/elasticsearch.yml`` file with:

.. code:: yaml

    indices.cache.query.size: 2%

Also, you can use the ``indices.cache.query.expire`` setting to specify
a TTL for cached results, but there should be no reason to do so.
Remember that stale results are automatically invalidated when the index
is refreshed. This setting is provided for completeness' sake only.

**Monitoring cache usage**

The size of the cache (in bytes) and the number of evictions can be
viewed by index, with the ```indices-stats`` <#indices-stats>`__ API:

.. code:: json

    curl 'localhost:9200/_stats/query_cache?pretty&human'

or by node with the ```nodes-stats`` <#cluster-nodes-stats>`__ API:

.. code:: json

    curl 'localhost:9200/_nodes/stats/indices/query_cache?pretty&human'

Field data
==========

The field data cache is used mainly when sorting on or computing
aggregations on a field. It loads all the field values into memory in
order to provide fast document-based access to those values. The field
data cache can be expensive to build for a field, so it's recommended to
have enough memory to allocate it, and to keep it loaded.

The amount of memory used for the field data cache can be controlled
using ``indices.fielddata.cache.size``. Note: reloading field data that
does not fit into your cache will be expensive and perform poorly.

+--------------------------------------+--------------------------------------+
| Setting                              | Description                          |
+======================================+======================================+
| ``indices.fielddata.cache.size``     | The max size of the field data       |
|                                      | cache, eg ``30%`` of node heap       |
|                                      | space, or an absolute value, eg      |
|                                      | ``12GB``. Defaults to unbounded.     |
+--------------------------------------+--------------------------------------+
| ``indices.fielddata.cache.expire``   | A time based setting that expires    |
|                                      | field data after a certain time of   |
|                                      | inactivity. Defaults to ``-1``. For  |
|                                      | example, can be set to ``5m`` for a  |
|                                      | 5 minute expiry.                     |
+--------------------------------------+--------------------------------------+

**Circuit Breaker**

Elasticsearch contains multiple circuit breakers used to prevent
operations from causing an OutOfMemoryError. Each breaker specifies a
limit for how much memory it can use. Additionally, there is a
parent-level breaker that specifies the total amount of memory that can
be used across all breakers.

The parent-level breaker can be configured with the following setting:

``indices.breaker.total.limit``
    Starting limit for overall parent breaker, defaults to 70% of JVM
    heap

All circuit breaker settings can be changed dynamically using the
cluster update settings API.
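
For example, the parent breaker limit could be adjusted on a running
cluster roughly as follows (a sketch; the ``transient`` scope and the
``75%`` value are only illustrative):

.. code:: json

    curl -XPUT 'localhost:9200/_cluster/settings' -d '
    {
      "transient": {
        "indices.breaker.total.limit": "75%"
      }
    }
    '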

**Field data circuit breaker**

The field data circuit breaker allows Elasticsearch to estimate the
amount of memory a field will require to be loaded into memory. It can
then prevent the field data from being loaded by raising an exception.
By default the limit is configured to 60% of the maximum JVM heap. It
can be configured with the following parameters:

``indices.breaker.fielddata.limit``
    Limit for fielddata breaker, defaults to 60% of JVM heap

``indices.breaker.fielddata.overhead``
    A constant that all field data estimations are multiplied with to
    determine a final estimation. Defaults to 1.03

**Request circuit breaker**

The request circuit breaker allows Elasticsearch to prevent per-request
data structures (for example, memory used for calculating aggregations
during a request) from exceeding a certain amount of memory.

``indices.breaker.request.limit``
    Limit for request breaker, defaults to 40% of JVM heap

``indices.breaker.request.overhead``
    A constant that all request estimations are multiplied with to
    determine a final estimation. Defaults to 1

**Monitoring field data**

You can monitor memory usage for field data as well as the field data
circuit breaker using the `Nodes Stats API <#cluster-nodes-stats>`__.
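
A minimal sketch of such requests (assuming the standard ``fielddata``
and ``breaker`` stats sections are available in your version):

.. code:: json

    curl 'localhost:9200/_nodes/stats/indices/fielddata?pretty&human'
    curl 'localhost:9200/_nodes/stats/breaker?pretty&human'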

Field data formats
==================

The field data format controls how field data should be stored.

Depending on the field type, there might be several field data types
available. In particular, string and numeric types support the
``doc_values`` format which allows for computing the field data
data-structures at indexing time and storing them on disk. Although it
will make the index larger and may be slightly slower, this
implementation will be more near-realtime-friendly and will require much
less memory from the JVM than other implementations.

Here is an example of how to configure the ``tag`` field to use the
``fst`` field data format.

.. code:: js

    {
        "tag": {
            "type":      "string",
            "fielddata": {
                "format": "fst"
            }
        }
    }

It is possible to change the field data format (and the field data
settings in general) on a live index by using the update mapping API.
When doing so, field data which had already been loaded for existing
segments will remain alive while new segments will use the new field
data configuration. Thanks to the background merging process, all
segments will eventually use the new field data format.
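
A sketch of such an update, assuming an index ``my_index`` with a type
``my_type`` that contains the ``tag`` field from the example above:

.. code:: json

    curl -XPUT 'localhost:9200/my_index/_mapping/my_type' -d '
    {
      "properties": {
        "tag": {
          "type": "string",
          "fielddata": {
            "format": "paged_bytes"
          }
        }
      }
    }
    '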

**String field data types**

``paged_bytes`` (default)
    Stores unique terms sequentially in a large buffer and maps
    documents to the indices of the terms they contain in this large
    buffer.

``fst``
    Stores terms in a FST. Slower to build than ``paged_bytes`` but can
    help lower memory usage if many terms share common prefixes and/or
    suffixes.

``doc_values``
    Computes and stores field data data-structures on disk at indexing
    time. Lowers memory usage but only works on non-analyzed strings
    (``index``: ``no`` or ``not_analyzed``).
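
For example, a string field backed by ``doc_values`` might be mapped
along these lines (a sketch; note the ``not_analyzed`` index setting
that this format requires):

.. code:: js

    {
        "status": {
            "type":      "string",
            "index":     "not_analyzed",
            "fielddata": {
                "format": "doc_values"
            }
        }
    }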

**Numeric field data types**

``array`` (default)
    Stores field values in memory using arrays.

``doc_values``
    Computes and stores field data data-structures on disk at indexing
    time.

**Geo point field data types**

``array`` (default)
    Stores latitudes and longitudes in arrays.

``doc_values``
    Computes and stores field data data-structures on disk at indexing
    time.

**Global ordinals**

Global ordinals is a data structure on top of field data that maintains
an incremental numbering for all the terms in field data in
lexicographic order: each term has a unique number, and the number of
term *A* is lower than the number of term *B* if *A* sorts before *B*.
Global ordinals are only supported on string fields.

Field data on strings also has ordinals, which is a unique numbering for
all terms in a particular segment and field. Global ordinals just build
on top of this by providing a mapping between the segment ordinals and
the global ordinals, the latter being unique across the entire shard.

Global ordinals can be beneficial in search features that already use
segment ordinals, such as the terms aggregator, to improve execution
time. Often these search features need to merge the segment ordinal
results into a cross-segment terms result. With global ordinals this
mapping happens at field data load time instead of during each query
execution. Search features then only need to resolve the actual term
when building the (shard) response; during the execution itself there is
no need to use the actual terms, and the unique numbering provided by
global ordinals is sufficient and improves the execution time.

Global ordinals for a specified field are tied to all the segments of a
shard (Lucene index), whereas field data for a specific field is tied to
a single segment. For this reason global ordinals need to be rebuilt in
their entirety once new segments become visible. This one-time cost
would happen anyway without global ordinals, but then it would happen
for each search execution instead!

The loading time of global ordinals depends on the number of terms in a
field, but in general it is low, since the source field data has already
been loaded. The memory overhead of global ordinals is small because
they are very efficiently compressed. Eager loading of global ordinals
can move the loading time from the first search request to the refresh
itself.

**Fielddata loading**

By default, field data is loaded lazily, i.e. the first time that a
query that requires it is executed. However, this can make the first
requests that follow a merge operation quite slow since fielddata
loading is a heavy operation.

It is possible to force field data to be loaded and cached eagerly
through the ``loading`` setting of fielddata:

.. code:: js

    {
        "category": {
            "type":      "string",
            "fielddata": {
                "loading": "eager"
            }
        }
    }

Global ordinals can also be eagerly loaded:

.. code:: js

    {
        "category": {
            "type":      "string",
            "fielddata": {
                "loading": "eager_global_ordinals"
            }
        }
    }

With the above setting both field data and global ordinals for a
specific field are eagerly loaded.

**Disabling field data loading**

Field data can take a lot of RAM so it makes sense to disable field data
loading on the fields that don’t need field data, for example those that
are used for full-text search only. In order to disable field data
loading, just change the field data format to ``disabled``. When
disabled, all requests that try to load field data, e.g. when they
include aggregations and/or sorting, will return an error.

.. code:: js

    {
        "text": {
            "type":      "string",
            "fielddata": {
                "format": "disabled"
            }
        }
    }

The ``disabled`` format is supported by all field types.

**Filtering fielddata**

It is possible to control which field values are loaded into memory,
which is particularly useful for string fields. When specifying the
`mapping <#mapping-core-types>`__ for a field, you can also specify a
fielddata filter.

Fielddata filters can be changed using the `PUT
mapping <#indices-put-mapping>`__ API. After changing the filters, use
the `Clear Cache <#indices-clearcache>`__ API to reload the fielddata
using the new filters.
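
For example, after updating a fielddata filter with a put-mapping
request, the cached fielddata for the index could be dropped like this
(``my_index`` is a placeholder):

.. code:: json

    curl -XPOST 'localhost:9200/my_index/_cache/clear?fielddata=true'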

**Filtering by frequency:**

The frequency filter allows you to only load terms whose frequency falls
between a ``min`` and ``max`` value, which can be expressed as an absolute
number or as a percentage (eg ``0.01`` is ``1%``). Frequency is
calculated **per segment**. Percentages are based on the number of docs
which have a value for the field, as opposed to all docs in the segment.

Small segments can be excluded completely by specifying the minimum
number of docs that the segment should contain with
``min_segment_size``:

.. code:: js

    {
        "tag": {
            "type":      "string",
            "fielddata": {
                "filter": {
                    "frequency": {
                        "min":              0.001,
                        "max":              0.1,
                        "min_segment_size": 500
                    }
                }
            }
        }
    }

**Filtering by regex**

Terms can also be filtered by regular expression - only values which
match the regular expression are loaded. Note: the regular expression is
applied to each term in the field, not to the whole field value. For
instance, to only load hashtags from a tweet, we can use a regular
expression which matches terms beginning with ``#``:

.. code:: js

    {
        "tweet": {
            "type":      "string",
            "analyzer":  "whitespace"
            "fielddata": {
                "filter": {
                    "regex": {
                        "pattern": "^#.*"
                    }
                }
            }
        }
    }

**Combining filters**

The ``frequency`` and ``regex`` filters can be combined:

.. code:: js

    {
        "tweet": {
            "type":      "string",
            "analyzer":  "whitespace"
            "fielddata": {
                "filter": {
                    "regex": {
                        "pattern":          "^#.*",
                    },
                    "frequency": {
                        "min":              0.001,
                        "max":              0.1,
                        "min_segment_size": 500
                    }
                }
            }
        }
    }

Similarity module
=================

A similarity (scoring / ranking model) defines how matching documents
are scored. Similarity is per field, meaning that via the mapping one
can define a different similarity per field.

Configuring a custom similarity is considered an expert feature and the
built-in similarities are most likely sufficient, as described in the
`mapping section <#mapping-core-types>`__.

**Configuring a similarity**

Most existing or custom similarities have configuration options which
can be configured via the index settings as shown below. These settings
can be provided when creating an index or when updating index settings.

.. code:: js

    "similarity" : {
      "my_similarity" : {
        "type" : "DFR",
        "basic_model" : "g",
        "after_effect" : "l",
        "normalization" : "h2",
        "normalization.h2.c" : "3.0"
      }
    }

Here we configure the DFRSimilarity so it can be referenced as
``my_similarity`` in mappings, as illustrated in the example below:

.. code:: js

    {
      "book" : {
        "properties" : {
          "title" : { "type" : "string", "similarity" : "my_similarity" }
        }
      }
    }
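
Putting the two together, the similarity settings and the mapping could
be supplied in a single create-index request (a sketch; ``my_index`` is
a placeholder):

.. code:: json

    curl -XPUT 'localhost:9200/my_index' -d '
    {
      "settings": {
        "similarity": {
          "my_similarity": {
            "type": "DFR",
            "basic_model": "g",
            "after_effect": "l",
            "normalization": "h2",
            "normalization.h2.c": "3.0"
          }
        }
      },
      "mappings": {
        "book": {
          "properties": {
            "title": { "type": "string", "similarity": "my_similarity" }
          }
        }
      }
    }
    '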

**Available similarities**

**Default similarity**

The default similarity is based on the TF/IDF model. This similarity has
the following option:

``discount_overlaps``
    Determines whether overlap tokens (Tokens with 0 position increment)
    are ignored when computing norm. By default this is true, meaning
    overlap tokens do not count when computing norms.

Type name: ``default``

**BM25 similarity**

Another TF/IDF based similarity that has built-in tf normalization and
is supposed to work better for short fields (like names). See
`Okapi\_BM25 <http://en.wikipedia.org/wiki/Okapi_BM25>`__ for more
details. This similarity has the following options:

``k1``
    Controls non-linear term frequency normalization (saturation).

``b``
    Controls to what degree document length normalizes tf values.

``discount_overlaps``
    Determines whether overlap tokens (Tokens with 0 position increment)
    are ignored when computing norm. By default this is true, meaning
    overlap tokens do not count when computing norms.

Type name: ``BM25``

**DFR similarity**

Similarity that implements the `divergence from
randomness <http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/DFRSimilarity.html>`__
framework. This similarity has the following options:

``basic_model``
    Possible values: ``be``, ``d``, ``g``, ``if``, ``in``, ``ine`` and
    ``p``.

``after_effect``
    Possible values: ``no``, ``b`` and ``l``.

``normalization``
    Possible values: ``no``, ``h1``, ``h2``, ``h3`` and ``z``.

All options but the first option need a normalization value.

Type name: ``DFR``

**IB similarity.**

`Information based
model <http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/IBSimilarity.html>`__.
This similarity has the following options:

``distribution``
    Possible values: ``ll`` and ``spl``.

``lambda``
    Possible values: ``df`` and ``ttf``.

``normalization``
    Same as in ``DFR`` similarity.

Type name: ``IB``

**LM Dirichlet similarity.**

`LM Dirichlet
similarity <http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html>`__.
This similarity has the following options:

``mu``
    Defaults to ``2000``.

Type name: ``LMDirichlet``

**LM Jelinek Mercer similarity.**

`LM Jelinek Mercer
similarity <http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html>`__.
This similarity has the following options:

``lambda``
    The optimal value depends on both the collection and the query. The
    optimal value is around ``0.1`` for title queries and ``0.7`` for
    long queries. Defaults to ``0.1``.

Type name: ``LMJelinekMercer``

**Default and Base Similarities**

By default, Elasticsearch will use whatever similarity is configured as
``default``. However, the similarity functions ``queryNorm()`` and
``coord()`` are not per-field. Consequently, for expert users wanting to
change the implementation used for these two methods, while not changing
the ``default``, it is possible to configure a similarity with the name
``base``. This similarity will then be used for the two methods.

You can change the default similarity for all fields like this:

.. code:: js

    index.similarity.default.type: BM25
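
The ``base`` similarity mentioned above can presumably be configured the
same way (a sketch mirroring the ``default`` example):

.. code:: js

    index.similarity.base.type: BM25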

This section is about utilizing elasticsearch as part of your testing
infrastructure.


Java Testing Framework
======================

Testing is a crucial part of your application, and as information
retrieval itself is already a complex topic, there should not be any
additional complexity in setting up a testing infrastructure that uses
elasticsearch. This is the main reason why we decided to ship an
additional artifact with each release which allows you to use the same
testing infrastructure we do in the elasticsearch core. The testing
framework allows you to set up clusters with multiple nodes in order to
check if your code covers everything needed to run in a cluster. The
framework prevents you from writing complex code yourself to start, stop
or manage several test nodes in a cluster. In addition there is another
very important feature called randomized testing, which you get for free
as it is part of the elasticsearch infrastructure.

why randomized testing?
-----------------------

The key concept of randomized testing is not to use the same input
values for every test case, but still be able to reproduce a run in case
of a failure. This allows you to test with vastly different input
variables in order to make sure that your implementation is actually
independent of the provided test data.

If you are interested in the implementation being used, check out the
`RandomizedTesting
webpage <http://labs.carrotsearch.com/randomizedtesting.html>`__.

Using the elasticsearch test classes
------------------------------------

First, you need to include the testing dependency in your project. If
you use Maven and its ``pom.xml`` file, it looks like this:

::

    <dependencies>
      <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-test-framework</artifactId>
        <version>${lucene.version}</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>${elasticsearch.version}</version>
        <scope>test</scope>
        <type>test-jar</type>
      </dependency>
      <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>${elasticsearch.version}</version>
        <scope>test</scope>
      </dependency>
    </dependencies>

Replace the elasticsearch version and the lucene versions with the
current elasticsearch version and its accompanying lucene release.

There are already a couple of classes you can inherit from in your own
test classes. The advantage of doing so is that loggers are already
defined and the whole randomized infrastructure is already set up.

unit tests
----------

In case you only need to execute a unit test, because your
implementation can be isolated that well and does not require an
up-and-running elasticsearch cluster, you can use the
``ElasticsearchTestCase``. If you are testing lucene features, use
``ElasticsearchLuceneTestCase`` and if you are testing concrete token
streams, use the ``ElasticsearchTokenStreamTestCase`` class. Those
specific classes execute additional checks which ensure that no resource
leaks happen after the test has run.

integration tests
-----------------

These kinds of tests require firing up a whole cluster of nodes before
the tests can actually be run. Compared to unit tests they are obviously
way more time consuming, but the test infrastructure tries to minimize
the time cost by only restarting the whole cluster if this is configured
explicitly.

The class your tests have to inherit from is
``ElasticsearchIntegrationTest``. As soon as you inherit, there is no
need for you to start any elasticsearch nodes manually in your test
anymore, though you might need to ensure that at least a certain number
of nodes is up and running.

number of shards
~~~~~~~~~~~~~~~~

The number of shards used for indices created during integration tests
is randomized between ``1`` and ``10`` unless overwritten upon index
creation via index settings. The rule of thumb is not to specify the
number of shards unless needed, so that each test will use a different
one all the time.

generic helper methods
~~~~~~~~~~~~~~~~~~~~~~

There are a couple of helper methods in
``ElasticsearchIntegrationTest``, which will make your tests shorter and
more concise.

``refresh()``
    Refreshes all indices in a cluster

``ensureGreen()``
    Ensures a green health cluster state, waiting for relocations. Waits
    the default timeout of 30 seconds before failing.

``ensureYellow()``
    Ensures a yellow health cluster state, also waits for 30 seconds
    before failing.

``createIndex(name)``
    Creates an index with the specified name

``flush()``
    Flushes all indices in a cluster

``flushAndRefresh()``
    Combines ``flush()`` and ``refresh()`` calls

``optimize()``
    Waits for all relocations and optimizes all indices in the cluster
    to one segment.

``indexExists(name)``
    Checks if the given index exists

``admin()``
    Returns an ``AdminClient`` for administrative tasks

``clusterService()``
    Returns the cluster service java class

``cluster()``
    Returns the test cluster class, which is explained in the next
    paragraphs

test cluster methods
~~~~~~~~~~~~~~~~~~~~

The ``TestCluster`` class is the heart of the cluster functionality in a
randomized test and allows you to configure a specific setting or replay
certain types of outages to check how your custom code reacts.

``ensureAtLeastNumNodes(n)``
    Ensure at least the specified number of nodes is running in the
    cluster

``ensureAtMostNumNodes(n)``
    Ensure at most the specified number of nodes is running in the
    cluster

``getInstance()``
    Get a guice instantiated instance of a class from a random node

``getInstanceFromNode()``
    Get a guice instantiated instance of a class from a specified node

``stopRandomNode()``
    Stop a random node in your cluster to mimic an outage

``stopCurrentMasterNode()``
    Stop the current master node to force a new election

``stopRandomNonMaster()``
    Stop a random non master node to mimic an outage

``buildNode()``
    Create a new elasticsearch node

``startNode(settings)``
    Create and start a new elasticsearch node

Accessing clients
~~~~~~~~~~~~~~~~~

In order to execute any actions, you have to use a client. You can use
the ``ElasticsearchIntegrationTest.client()`` method to get back a
random client. This client can be a ``TransportClient`` or a
``NodeClient`` - and usually you do not need to care as long as the
action gets executed. There are several more methods for client
selection inside of the ``TestCluster`` class, which can be accessed
using the ``ElasticsearchIntegrationTest.cluster()`` method.

``iterator()``
    An iterator over all available clients

``masterClient()``
    Returns a client which is connected to the master node

``nonMasterClient()``
    Returns a client which is not connected to the master node

``clientNodeClient()``
    Returns a client which is running on a client node

``client(String nodeName)``
    Returns a client to a given node

``smartClient()``
    Returns a smart client

Scoping
~~~~~~~

By default the tests are run without restarting the cluster between
tests or test classes in order to be as fast as possible. Of course all
indices and templates are deleted between each test. However, sometimes
you need to start a new cluster for each test or for a whole test suite
- for example, if you load a certain plugin, but you do not want to load
it for every test.

You can use the ``@ClusterScope`` annotation at class level to configure
this behaviour

.. code:: java

    @ClusterScope(scope=SUITE, numNodes=1)
    public class CustomSuggesterSearchTests extends ElasticsearchIntegrationTest {
      // ... tests go here
    }

The above sample configures its own cluster for this test suite, i.e.
the test class. Other values could be ``GLOBAL`` (the default) or
``TEST`` in order to spawn a new cluster for each test. The ``numNodes``
setting allows you to only start a certain number of nodes, which can
speed up test execution, as starting a new node is a costly and time
consuming operation and might not be needed for this test.

Changing node configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~

As elasticsearch is using JUnit 4, using the ``@Before`` and ``@After``
annotations is not a problem. However you should keep in mind that this
does not have any effect on your cluster setup, as the cluster is
already up and running when those methods are run. So in case you want
to configure settings - like loading a plugin on node startup - before
the node is actually running, you should override the
``nodeSettings()`` method from the ``ElasticsearchIntegrationTest``
class and change the cluster scope to ``SUITE``.

.. code:: java

    @Override
    protected Settings nodeSettings(int nodeOrdinal) {
      return ImmutableSettings.settingsBuilder()
               .put("plugin.types", CustomSuggesterPlugin.class.getName())
               .put(super.nodeSettings(nodeOrdinal)).build();
    }

Randomized testing
------------------

The code snippets you have seen so far do not show any trace of the
randomized testing features, as they are carefully hidden under the
hood. However when you are writing your own tests, you should make use
of these features as well. Before starting with that, you should know
how to repeat a failed test with the same setup in which it failed.
Luckily this is quite easy, as the whole mvn call is logged together
with failed tests, which means you can simply copy and paste that line
and run the test again.

Generating random data
~~~~~~~~~~~~~~~~~~~~~~

The next step is to convert your test using static test data into a test
using randomized test data. The kind of data you could randomize varies
a lot with the functionality you are testing against. Take a look at the
following examples (note that this list could go on for pages, as a
distributed system has many, many moving parts):

-  Searching for data using arbitrary UTF8 signs

-  Changing your mapping configuration, index and field names with each
   run

-  Changing your response sizes/configurable limits with each run

-  Changing the number of shards/replicas when creating an index

So, how can you create random data? The most important thing to know is
that you should never instantiate your own ``Random`` instance, but use
the one provided in ``RandomizedTest``, from which all elasticsearch
dependent test classes inherit.

``getRandom()``
    Returns the random instance, which can be recreated when calling the
    test with specific parameters

``randomBoolean()``
    Returns a random boolean

``randomByte()``
    Returns a random byte

``randomShort()``
    Returns a random short

``randomInt()``
    Returns a random integer

``randomLong()``
    Returns a random long

``randomFloat()``
    Returns a random float

``randomDouble()``
    Returns a random double

``randomInt(max)``
    Returns a random integer between 0 and max

``between()``
    Returns a random number within the supplied range

``atLeast()``
    Returns a random integer of at least the specified integer

``atMost()``
    Returns a random integer of at most the specified integer

``randomLocale()``
    Returns a random locale

``randomTimeZone()``
    Returns a random timezone

In addition, there are a couple of helper methods allowing you to
create random ASCII and Unicode strings; see the methods beginning with
``randomAscii``, ``randomUnicode``, and ``randomRealisticUnicode`` in
the random test class. The latter tries to create more realistic
unicode strings by not being arbitrarily random.

If you want to debug a specific problem with a specific random seed, you
can use the ``@Seed`` annotation to configure a specific seed for a
test. If you want to run a test more than once, instead of starting the
whole test suite over and over again, you can use the ``@Repeat``
annotation with an arbitrary value. Each iteration then gets run with a
different seed.

Assertions
----------

As many elasticsearch tests check for similar output, like the number of
hits, the first hit, or specific highlighting, a couple of predefined
assertions have been created. Those have been put into the
``ElasticsearchAssertions`` class.

``assertHitCount()``
    Checks hit count of a search or count request

``assertAcked()``
    Ensures that a request has been acknowledged by the master

``assertSearchHits()``
    Asserts a search response contains specific ids

``assertMatchCount()``
    Asserts a matching count from a percolation response

``assertFirstHit()``
    Asserts the first hit hits the specified matcher

``assertSecondHit()``
    Asserts the second hit hits the specified matcher

``assertThirdHit()``
    Asserts the third hit hits the specified matcher

``assertSearchHit()``
    Asserts a certain element in a search response hits the specified
    matcher

``assertNoFailures()``
    Asserts that no shard failures have occurred in the response

``assertFailures()``
    Asserts that shard failures have happened during a search request

``assertHighlight()``
    Asserts specific highlights matched

``assertSuggestion()``
    Asserts specific suggestions

``assertSuggestionSize()``
    Asserts a specific suggestion count

``assertThrows()``
    Asserts that a specific exception has been thrown

**Common matchers**

``hasId()``
    Matcher to check for a search hit id

``hasType()``
    Matcher to check for a search hit type

``hasIndex()``
    Matcher to check for a search hit index

Usually, you would combine assertions and matchers in your test like
this

.. code:: java

    SearchResponse searchResponse = client().prepareSearch() ...;
    assertHitCount(searchResponse, 4);
    assertFirstHit(searchResponse, hasId("4"));
    assertSearchHits(searchResponse, "1", "2", "3", "4");

Glossary of terms
=================

**analysis**

Analysis is the process of converting `full text <#glossary-text>`__ to
`terms <#glossary-term>`__. Depending on which analyzer is used, these
phrases: ``FOO BAR``, ``Foo-Bar``, ``foo,bar`` will probably all result
in the terms ``foo`` and ``bar``. These terms are what is actually
stored in the index. A full text query (not a `term <#glossary-term>`__
query) for ``FoO:bAR`` will also be analyzed to the terms
``foo``,\ ``bar`` and will thus match the terms stored in the index. It
is this process of analysis (both at index time and at search time) that
allows elasticsearch to perform full text queries. Also see
`text <#glossary-text>`__ and `term <#glossary-term>`__.

**cluster**

A cluster consists of one or more `nodes <#glossary-node>`__ which share
the same cluster name. Each cluster has a single master node which is
chosen automatically by the cluster and which can be replaced if the
current master node fails.

**document**

A document is a JSON document which is stored in elasticsearch. It is
like a row in a table in a relational database. Each document is stored
in an `index <#glossary-index>`__ and has a `type <#glossary-type>`__
and an `id <#glossary-id>`__. A document is a JSON object (also known in
other languages as a hash / hashmap / associative array) which contains
zero or more `fields <#glossary-field>`__, or key-value pairs. The
original JSON document that is indexed will be stored in the
```_source`` field <#glossary-source_field>`__, which is returned by
default when getting or searching for a document.

**id**

The ID of a `document <#glossary-document>`__ identifies a document. The
``index/type/id`` of a document must be unique. If no ID is provided,
then it will be auto-generated. (also see
`routing <#glossary-routing>`__)

**field**

A `document <#glossary-document>`__ contains a list of fields, or
key-value pairs. The value can be a simple (scalar) value (eg a string,
integer, date), or a nested structure like an array or an object. A
field is similar to a column in a table in a relational database. The
`mapping <#glossary-mapping>`__ for each field has a field *type* (not
to be confused with document `type <#glossary-type>`__) which indicates
the type of data that can be stored in that field, eg ``integer``,
``string``, ``object``. The mapping also allows you to define (amongst
other things) how the value for a field should be analyzed.

**index**

An index is like a *database* in a relational database. It has a
`mapping <#glossary-mapping>`__ which defines multiple
`types <#glossary-type>`__. An index is a logical namespace which maps
to one or more `primary shards <#glossary-primary-shard>`__ and can have
zero or more `replica shards <#glossary-replica-shard>`__.

**mapping**

A mapping is like a *schema definition* in a relational database. Each
`index <#glossary-index>`__ has a mapping, which defines each
`type <#glossary-type>`__ within the index, plus a number of index-wide
settings. A mapping can either be defined explicitly, or it will be
generated automatically when a document is indexed.

**node**

A node is a running instance of elasticsearch which belongs to a
`cluster <#glossary-cluster>`__. Multiple nodes can be started on a
single server for testing purposes, but usually you should have one node
per server. At startup, a node will use unicast (or multicast, if
specified) to discover an existing cluster with the same cluster name
and will try to join that cluster.

**primary shard**

Each document is stored in a single primary `shard <#glossary-shard>`__.
When you index a document, it is indexed first on the primary shard,
then on all `replicas <#glossary-replica-shard>`__ of the primary shard.
By default, an `index <#glossary-index>`__ has 5 primary shards. You can
specify fewer or more primary shards to scale the number of
`documents <#glossary-document>`__ that your index can handle. You
cannot change the number of primary shards in an index, once the index
is created. See also `routing <#glossary-routing>`__

**replica shard**

Each `primary shard <#glossary-primary-shard>`__ can have zero or more
replicas. A replica is a copy of the primary shard, and has two
purposes:

1. increase failover: a replica shard can be promoted to a primary shard
   if the primary fails

2. increase performance: get and search requests can be handled by
   primary or replica shards. By default, each primary shard has one
   replica, but the number of replicas can be changed dynamically on an
   existing index. A replica shard will never be started on the same
   node as its primary shard.

**routing**

When you index a document, it is stored on a single `primary
shard <#glossary-primary-shard>`__. That shard is chosen by hashing the
``routing`` value. By default, the ``routing`` value is derived from the
ID of the document or, if the document has a specified parent document,
from the ID of the parent document (to ensure that child and parent
documents are stored on the same shard). This value can be overridden by
specifying a ``routing`` value at index time, or a `routing
field <#mapping-routing-field>`__ in the
`mapping <#glossary-mapping>`__.
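
For instance, a routing value could be supplied explicitly at index time
(a sketch with placeholder index, type, and routing values):

.. code:: json

    curl -XPUT 'localhost:9200/my_index/my_type/1?routing=user_1' -d '
    {
      "user": "user_1",
      "message": "hello"
    }
    '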

**shard**

A shard is a single Lucene instance. It is a low-level “worker” unit
which is managed automatically by elasticsearch. An index is a logical
namespace which points to `primary <#glossary-primary-shard>`__ and
`replica <#glossary-replica-shard>`__ shards. Other than defining the
number of primary and replica shards that an index should have, you
never need to refer to shards directly. Instead, your code should deal
only with an index. Elasticsearch distributes shards amongst all
`nodes <#glossary-node>`__ in the `cluster <#glossary-cluster>`__, and
can move shards automatically from one node to another in the case of
node failure, or the addition of new nodes.

**source field**

By default, the JSON document that you index will be stored in the
``_source`` field and will be returned by all get and search requests.
This allows you access to the original object directly from search
results, rather than requiring a second step to retrieve the object from
an ID. Note: the exact JSON string that you indexed will be returned to
you, even if it contains invalid JSON. The contents of this field do not
indicate anything about how the data in the object has been indexed.

**term**

A term is an exact value that is indexed in elasticsearch. The terms
``foo``, ``Foo``, ``FOO`` are NOT equivalent. Terms (i.e. exact values)
can be searched for using *term* queries. See also
`text <#glossary-text>`__ and `analysis <#glossary-analysis>`__.

**text**

Text (or full text) is ordinary unstructured text, such as this
paragraph. By default, text will be `analyzed <#glossary-analysis>`__
into `terms <#glossary-term>`__, which is what is actually stored in the
index. Text `fields <#glossary-field>`__ need to be analyzed at index
time in order to be searchable as full text, and keywords in full text
queries must be analyzed at search time to produce (and search for) the
same terms that were generated at index time. See also
`term <#glossary-term>`__ and `analysis <#glossary-analysis>`__.

**type**

A type is like a *table* in a relational database. Each type has a list
of `fields <#glossary-field>`__ that can be specified for
`documents <#glossary-document>`__ of that type. The
`mapping <#glossary-mapping>`__ defines how each field in the document
is analyzed.

.. |Windows Service Manager GUI| image:: images/service-manager-win.png
.. |images/percentiles\_error.png| image:: images/percentiles_error.png
.. |images/cardinality\_error.png| image:: images/cardinality_error.png
.. |images/Gaussian.png| image:: images/Gaussian.png
.. |images/sigma.png| image:: images/sigma.png
.. |images/sigma\_calc.png| image:: images/sigma_calc.png
.. |images/Exponential.png| image:: images/Exponential.png
.. |images/lambda.png| image:: images/lambda.png
.. |images/lambda\_calc.png| image:: images/lambda_calc.png
.. |images/Linear.png| image:: images/Linear.png
.. |images/s\_calc.png| image:: images/s_calc.png
.. |images/decay\_2d.png| image:: images/decay_2d.png
.. |https://f.cloud.github.com/assets/4320215/768157/cd0e18a6-e898-11e2-9b3c-f0145078bd6f.png| image:: https://f.cloud.github.com/assets/4320215/768157/cd0e18a6-e898-11e2-9b3c-f0145078bd6f.png
.. |https://f.cloud.github.com/assets/4320215/768160/ec43c928-e898-11e2-8e0d-f3c4519dbd89.png| image:: https://f.cloud.github.com/assets/4320215/768160/ec43c928-e898-11e2-8e0d-f3c4519dbd89.png
.. |https://f.cloud.github.com/assets/4320215/768161/082975c0-e899-11e2-86f7-174c3a729d64.png| image:: https://f.cloud.github.com/assets/4320215/768161/082975c0-e899-11e2-86f7-174c3a729d64.png
.. |https://f.cloud.github.com/assets/4320215/768162/0b606884-e899-11e2-907b-aefc77eefef6.png| image:: https://f.cloud.github.com/assets/4320215/768162/0b606884-e899-11e2-907b-aefc77eefef6.png
.. |https://f.cloud.github.com/assets/4320215/768164/1775b0ca-e899-11e2-9f4a-776b406305c6.png| image:: https://f.cloud.github.com/assets/4320215/768164/1775b0ca-e899-11e2-9f4a-776b406305c6.png
.. |https://f.cloud.github.com/assets/4320215/768165/19d8b1aa-e899-11e2-91bc-6b0553e8d722.png| image:: https://f.cloud.github.com/assets/4320215/768165/19d8b1aa-e899-11e2-91bc-6b0553e8d722.png
