Magento 2 Elasticsearch Cheatsheet

Published: January 9, 2018

This post is a catch-all location for my notes on working with Elasticsearch in Magento 2. I will keep updating it as I spend more time with Elasticsearch.

Get A List Of Indexes

This request will display a list of all indexes.

$ curl 'localhost:9200/_cat/indices?v'
health status index                           pri rep docs.count docs.deleted store.size pri.store.size
yellow open   magento2_product_1_v1             5   1          1            0      7.5kb          7.5kb

This assumes that Elasticsearch is running on localhost on port 9200. The actual host and port for a given project can be confirmed in the core_config_data database table (or potentially in the app/etc/env.php file) at the following paths:

  • Host: catalog/search/elasticsearch_server_hostname
  • Port: catalog/search/elasticsearch_server_port

Determining Which Index is Being Used

The query may return a number of indexes.

Index names use the following format…

magento2_product_<store id>_v<version number>

As you can see, on a multi-store Magento instance a separate Elasticsearch index will be available for each store.

In terms of the version number, there should only be one index per store, but I have seen cases where indexes don’t get deleted cleanly and there are old indexes lying around. Magento will query the highest version number for each store.
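
When old indexes are lying around, it can help to reduce the index list to the highest version per store. Here's a minimal sketch of that, assuming index names end in _<store id>_v<version number> as described above. The function name latest_index_per_store is my own and is not part of Magento or Elasticsearch.

```shell
# Print the highest-versioned index name for each store.
# Reads index names on stdin, one per line (only the first column is used).
# Assumes names end in _<store id>_v<version number>.
latest_index_per_store() {
  awk '
    match($1, /_[0-9]+_v[0-9]+$/) {
      suffix = substr($1, RSTART + 1)   # e.g. "1_v2"
      split(suffix, parts, "_v")        # parts[1] = store id, parts[2] = version
      store = parts[1]
      version = parts[2] + 0            # force a numeric comparison
      if (!(store in best) || version > best[store]) {
        best[store] = version
        name[store] = $1
      }
    }
    END { for (store in name) print name[store] }
  ' | sort
}

# Example with hypothetical leftover indexes for two stores
printf '%s\n' \
  magento2_product_1_v1 magento2_product_1_v2 \
  magento2_product_2_v9 magento2_product_2_v10 | latest_index_per_store
```

Against a live cluster, the function could be fed from the index list, for example with curl -s 'localhost:9200/_cat/indices?h=index' | latest_index_per_store. The version is compared numerically, so v10 correctly beats v9.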

Inspecting Indexed Data

NOTE: All examples will be run against the example "magento2_product_1_v1" index. Replace the index as needed when running your queries...

This request will match any indexed documents…

$ curl 'localhost:9200/magento2_product_1_v1/_search?pretty&q=*:*'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "magento2_product_1_v1",
      "_type" : "document",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "store_id" : "1",
        "created_at" : "2018-01-10T01:53:24+00:00",
        "gift_message_available" : "2",
        "gift_message_available_value" : "Use config",
        "gift_wrapping_available" : "2",
        "gift_wrapping_available_value" : "Use config",
        "is_returnable" : "2",
        "is_returnable_value" : "",
        "meta_description" : "Test ",
        "meta_keyword" : "Test",
        "meta_title" : "Test",
        "name" : "Test",
        "options_container" : "container2",
        "options_container_value" : "Block after Info Column",
        "quantity_and_stock_status_value" : "In Stock",
        "sku" : "Test",
        "status" : "1",
        "status_value" : "Enabled",
        "tax_class_id" : "2",
        "tax_class_id_value" : "Taxable Goods",
        "updated_at" : "2018-01-10T01:53:24+00:00",
        "url_key" : "test",
        "visibility" : "4",
        "visibility_value" : "Catalog, Search",
        "weight" : "1.0000",
        "is_in_stock" : 1,
        "qty" : 999,
        "price_0_1" : "1.000000",
        "price_1_1" : "1.000000",
        "price_2_1" : "1.000000",
        "price_3_1" : "1.000000",
        "category_ids" : "2",
        "position_category_2" : "0",
        "name_category_2" : "Default Category"
      }
    } ]
  }
}

By default it will return 10 documents.

Search By SKU

NOTE: When querying Elasticsearch it's important to understand the concept of analyzers. If your product has a SKU of "SKU", Elasticsearch's analyzers will convert it to lowercase. Therefore, you need to search for "sku", not "SKU".

There are a few ways you can do this. Here’s the simplest (searching across all fields…)

Request

$ curl 'localhost:9200/magento2_product_1_v1/_search?pretty&q=sku'

Response

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "magento2_product_1_v1",
      "_type" : "document",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source" : {
        "store_id" : "1",
        "sku" : "SKU",
        "status_value" : "Enabled",
        "status" : "1",
        "visibility_value" : "Catalog, Search",
        "visibility" : "4",
        "tax_class_id_value" : "Taxable Goods",
        "tax_class_id" : "2",
        "name" : "Product",
        "category_ids" : "2 3",
        "position_category_2" : "0",
        "name_category_2" : "Default Category",
        "position_category_3" : "0",
        "name_category_3" : "My Category",
        "price_0_1" : "1.000000",
        "price_1_1" : "1.000000",
        "price_2_1" : "1.000000",
        "price_3_1" : "1.000000",
        "price_4_1" : "1.000000"
      }
    } ]
  }
}

Search For Products Named T-Shirt

Request

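This bool query is closer to how Magento queries Elasticsearch. Note the double-escaping in t\\-shirt: within the JSON body, \\ encodes a single literal backslash, so Elasticsearch receives t\-shirt. As with the SKU example, the field's analyzer then decides how that text is split into terms.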

$ curl -H 'Content-Type: application/json' "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "t\\-shirt"
            }
          }
        }
      ]
    }
  }
}
'

Queries With Multiple Words

By default Elasticsearch will do an “or” search when receiving a query with multiple words. If an “and” is desired, this can be achieved by setting the operator to “and”.

Request

$ curl -H 'Content-Type: application/json' "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "green t\\-shirt",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}
'

NOTE: Magento does an "or" search. There's no way to change this out of the box...

Returning Specific Fields

By default Elasticsearch will return all the fields for each document. An array of fields can be provided to retrieve only specific ones…

Request

$ curl -H 'Content-Type: application/json' "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "fields": [
    "_id",
    "_score",
    "name",
    "sku"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "green t\\-shirt"
            }
          }
        }
      ]
    }
  }
}
'

should vs. must

So far the bool examples we’ve looked at all use should exclusively. When using should with multiple conditions a minimum_should_match can be used to define how many of the conditions need to match. Here’s an example…

Request

$ curl -H 'Content-Type: application/json' "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "fields": [
    "_id",
    "_score",
    "name",
    "sku",
    "color"
  ],
  "query": {
    "bool": {
      "minimum_should_match": "1",
      "should": [
        {
          "match": {
            "name": {
              "query": "t\\-shirt"
            }
          }
        },
        {
          "match": {
            "color": {
              "query": "16"
            }
          }
        }
      ]
    }
  }
}
'

As minimum_should_match is set to 1, either the name of the product can contain the string “t-shirt” or the color can be option id “16”. Only one of those conditions needs to be true.

Setting the minimum_should_match to “100%” would require all the conditions to match.

Another way to achieve this is via a must. All conditions provided as musts must match (as you’d expect)…

Request

$ curl -H 'Content-Type: application/json' "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "fields": [
    "_id",
    "_score",
    "name",
    "sku",
    "color"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": {
              "query": "t\\-shirt"
            }
          }
        },
        {
          "match": {
            "color": {
              "query": "16"
            }
          }
        }
      ]
    }
  }
}
'

Viewing A Query Log

It appears that the best option for viewing a query log is to decrease the slowlog threshold to “0s”. This can be done at runtime without restarting Elasticsearch.

$ curl -XPUT -H 'Content-Type: application/json' "http://localhost:9200/magento2_product_1_v1/_settings" -d'
{
    "index.search.slowlog.threshold.query.debug": "0s"
}'

The location for log files is defined by the path.logs configuration value in elasticsearch.yml.
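
To check where that is on a given server, the value can be read straight out of the config file. This is a small sketch that assumes path.logs is set as a single flat key; the function name es_log_dir and the file path in the usage comment are my own assumptions. It prints nothing if the key is commented out or written in nested YAML form.

```shell
# Print the value of a flat "path.logs: <dir>" entry from an elasticsearch.yml file.
# Prints nothing when the key is absent, commented out, or nested under "path:".
es_log_dir() {
  sed -n 's/^path\.logs:[[:space:]]*//p' "$1"
}

# Usage (the config file location is an assumption and varies by install):
#   es_log_dir /etc/elasticsearch/elasticsearch.yml
```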

Reference: Logging Requests to Elasticsearch

Logged queries look something like this…

[2018-05-13 11:49:17,789][DEBUG][index.search.slowlog.query] [magento2-2-2-b2b-ee_product_1_v2]took[1.4ms], took_millis[1], types[document], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":"10000","fields":["_id","_score"],"query":{"bool":{"must":[{"term":{"category_ids":"4"}},{"terms":{"visibility":["2","4"]}}],"minimum_should_match":1}},"aggregations":{"price_bucket":{"extended_stats":{"field":"price_0_1"}},"category_bucket":{"terms":{"field":"category_ids"}},"manufacturer_bucket":{"terms":{"field":"manufacturer"}},"color_bucket":{"terms":{"field":"color"}}}}], extra_source[],

The actual query received can be found inside the square brackets that follow the word source.
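
Pulling the query out by hand gets tedious, so here's a rough sketch that extracts it with sed. It assumes the line layout shown above, where the query sits between ", source[" and "], extra_source[". The function name extract_slowlog_query and the log file path in the usage comment are hypothetical.

```shell
# Print only the JSON query body from slow log lines read on stdin.
# Assumes each line contains "..., source[<json>], extra_source[...".
# Lines that don't match that layout are skipped.
extract_slowlog_query() {
  sed -n 's/.*, source\[\(.*\)], extra_source\[.*/\1/p'
}

# Usage (the log file path is an assumption):
#   grep 'index.search.slowlog.query' /path/to/slowlog.log | extract_slowlog_query
```

The output is a JSON document that can be pasted back into a curl request against the _search endpoint to replay the query.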

Note that a full catalog search reindex leaves Magento using an index with a new name (as documented below), so this setting is lost and has to be re-applied afterwards.

Index Version Naming

The version number is incremented when Magento\Elasticsearch\Model\Adapter\Elasticsearch::cleanIndex() is called.

How Magento Queries Elasticsearch

Search Results Page

Below is an example of how Magento queries Elasticsearch when searching for the term “Product”. Fields queried are determined by the “Use in Search” attribute property and the boosts are determined by the “Search Weight” attribute property…

$ curl -H 'Content-Type: application/json' "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "aggregations": {
    "prices": {
      "histogram": {
        "field": "price_0_1",
        "interval": 1
      }
    }
  },
  "fields": [
    "_id",
    "_score"
  ],
  "from": 0,
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "must": [
        {
          "terms": {
            "visibility": [
              "3",
              "4"
            ]
          }
        }
      ],
      "should": [
        {
          "match": {
            "sku": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "_all": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "name": {
              "boost": 6,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "description": {
              "boost": 11,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "short_description": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "manufacturer": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "color": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "status_value": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "tax_class_id_value": {
              "boost": 2,
              "query": "product"
            }
          }
        }
      ]
    }
  },
  "size": "10000"
}
'
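
To experiment with different weights, the should clauses can be generated from a list of field:boost pairs. This is only an illustration of the shape of the query above, not how Magento builds it. The function name build_should_clauses is my own, and the sketch assumes the search term needs no JSON escaping.

```shell
# Print a JSON array of match clauses, one per field:boost pair.
# $1 is the search term; remaining arguments are field:boost pairs.
# Assumes the term and field names contain no characters that need JSON escaping.
build_should_clauses() {
  term=$1
  shift
  sep=''
  printf '['
  for pair in "$@"; do
    field=${pair%%:*}   # text before the first colon
    boost=${pair##*:}   # text after the last colon
    printf '%s{"match":{"%s":{"query":"%s","boost":%s}}}' "$sep" "$field" "$term" "$boost"
    sep=','
  done
  printf ']\n'
}

# Example: weights as they might be set via "Search Weight"
build_should_clauses product sku:2 name:6
```

The resulting array can be dropped in as the value of "should" in a bool query like the one above.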

Category Pages

When Elasticsearch is configured as Magento’s search engine, Elasticsearch will also be consulted to retrieve the product set while browsing category pages. Here’s a query to return all the products in category “3” with color “13”:

$ curl -H 'Content-Type: application/json' "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "aggregations": {
    "prices": {
      "histogram": {
        "field": "price_0_1",
        "interval": 1
      }
    }
  },
  "fields": [
    "_id",
    "_score"
  ],
  "from": 0,
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "must": [
        {
          "term": {
            "category_ids": "3"
          }
        },
        {
          "terms": {
            "visibility": [
              "2",
              "4"
            ]
          }
        },
        {
          "term": {
            "color": "13"
          }
        }
      ]
    }
  },
  "size": "10000"
}
'

Author: Max Chadwick
