Magento 2 Elasticsearch Cheatsheet

Published: January 9, 2018

This post is a catch-all location for my notes on working with Elasticsearch in Magento 2. I'll keep updating it as I spend more time with Elasticsearch.

Get A List Of Indexes

This request will display a list of all indexes.

$ curl 'localhost:9200/_cat/indices?v'
health status index                           pri rep docs.count docs.deleted store.size pri.store.size
yellow open   magento2_product_1_v1             5   1          1            0      7.5kb          7.5kb
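
When the Elasticsearch instance hosts more than Magento, it can help to narrow that list down. A minimal sketch, assuming the column layout shown above (index name in the third column); the `indexes_with_prefix` helper name is made up for illustration:

```shell
# Hypothetical helper: reads `_cat/indices?v` output on stdin and prints
# the names of indexes that start with the given prefix.
# Assumes the index name is the third column, as in the output above.
indexes_with_prefix() {
  awk -v prefix="$1" 'NR > 1 && index($3, prefix) == 1 { print $3 }'
}

# Usage (assumes Elasticsearch is listening on localhost:9200):
#   curl -s 'localhost:9200/_cat/indices?v' | indexes_with_prefix magento2_
```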

Inspecting Indexed Data

NOTE: All examples run against the example "magento2_product_1_v1" index. Replace the index name as needed when running your queries...

This request matches every indexed document…

$ curl 'localhost:9200/magento2_product_1_v1/_search?pretty&q=*:*'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "magento2_product_1_v1",
      "_type" : "document",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "store_id" : "1",
        "created_at" : "2018-01-10T01:53:24+00:00",
        "gift_message_available" : "2",
        "gift_message_available_value" : "Use config",
        "gift_wrapping_available" : "2",
        "gift_wrapping_available_value" : "Use config",
        "is_returnable" : "2",
        "is_returnable_value" : "",
        "meta_description" : "Test ",
        "meta_keyword" : "Test",
        "meta_title" : "Test",
        "name" : "Test",
        "options_container" : "container2",
        "options_container_value" : "Block after Info Column",
        "quantity_and_stock_status_value" : "In Stock",
        "sku" : "Test",
        "status" : "1",
        "status_value" : "Enabled",
        "tax_class_id" : "2",
        "tax_class_id_value" : "Taxable Goods",
        "updated_at" : "2018-01-10T01:53:24+00:00",
        "url_key" : "test",
        "visibility" : "4",
        "visibility_value" : "Catalog, Search",
        "weight" : "1.0000",
        "is_in_stock" : 1,
        "qty" : 999,
        "price_0_1" : "1.000000",
        "price_1_1" : "1.000000",
        "price_2_1" : "1.000000",
        "price_3_1" : "1.000000",
        "category_ids" : "2",
        "position_category_2" : "0",
        "name_category_2" : "Default Category"
      }
    } ]
  }
}

By default it will return 10 documents.
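
The URI search also accepts `size` and `from` parameters for fetching more than those first 10 hits, or for paging through them. A small sketch; the `search_all_url` helper name is hypothetical:

```shell
# Hypothetical helper: builds a match-all URI search for an index with
# explicit paging. Arguments: index name, page size, offset.
search_all_url() {
  printf 'localhost:9200/%s/_search?pretty&q=*:*&size=%s&from=%s\n' "$1" "$2" "$3"
}

# Usage (assumes Elasticsearch is listening on localhost:9200):
#   curl "$(search_all_url magento2_product_1_v1 50 0)"
```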

Search By SKU

NOTE: When querying Elasticsearch it's important to understand the concept of analyzers. If your product has a SKU of "SKU", Elasticsearch's analyzers convert it to lowercase at index time. A term query is not analyzed, so its value is compared against the indexed token as-is. Therefore, you need to search for "sku", not "SKU".

There are a few ways you can do this. Here’s the simplest (searching for a product with a sku value of “sku”)…

Request

$ curl "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "query": {
    "term": {
      "sku": "sku"
    }
  }
}
'

Response

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "magento2_product_1_v1",
      "_type" : "document",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source" : {
        "store_id" : "1",
        "sku" : "SKU",
        "status_value" : "Enabled",
        "status" : "1",
        "visibility_value" : "Catalog, Search",
        "visibility" : "4",
        "tax_class_id_value" : "Taxable Goods",
        "tax_class_id" : "2",
        "name" : "Product",
        "category_ids" : "2 3",
        "position_category_2" : "0",
        "name_category_2" : "Default Category",
        "position_category_3" : "0",
        "name_category_3" : "My Category",
        "price_0_1" : "1.000000",
        "price_1_1" : "1.000000",
        "price_2_1" : "1.000000",
        "price_3_1" : "1.000000",
        "price_4_1" : "1.000000"
      }
    } ]
  }
}
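
The lowercasing step from the note above is easy to forget when pasting SKUs from the admin. A minimal sketch of a helper that does it for you, assuming the SKU is indexed as a single token and contains nothing that needs JSON escaping; `sku_term_query` is a made-up name:

```shell
# Hypothetical helper: lowercases a SKU and prints a term query body.
# Assumes the SKU has no double quotes or backslashes in it.
sku_term_query() {
  sku=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '{"query":{"term":{"sku":"%s"}}}\n' "$sku"
}

# Usage (assumes Elasticsearch is listening on localhost:9200):
#   curl "localhost:9200/magento2_product_1_v1/_search?pretty" -d "$(sku_term_query SKU)"
```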

Search For Products Named T-Shirt

This bool query moves closer to how Magento queries Elasticsearch. Note the double-escaping of t\\-shirt: inside a JSON string, \\ encodes a single backslash, so Elasticsearch receives t\-shirt. This is again due to analyzers.

Request

$ curl "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "t\\-shirt"
            }
          }
        }
      ]
    }
  }
}
'
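
A sketch of producing that escaped form from a raw search term, assuming hyphens are the only characters that need this treatment (an assumption for illustration, not a complete escaping routine). The `escape_hyphens` name is hypothetical:

```shell
# Hypothetical helper: turns each "-" into "\\-" so the result can be
# pasted into a JSON request body, as in the t\\-shirt example above.
escape_hyphens() {
  printf '%s\n' "$1" | sed 's/-/\\\\-/g'
}

# Usage:
#   escape_hyphens 't-shirt'
```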

Queries With Multiple Words

By default Elasticsearch does an “or” search when it receives a query with multiple words. If an “and” is desired, set the operator to "and".

Request

$ curl "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "green t\\-shirt",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}
'

NOTE: Magento does an "or" search. There's no way to change this out of the box...

Returning Specific Fields

By default Elasticsearch returns all the fields for each document. An array of fields can be provided to retrieve only specific ones…

Request

$ curl "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "fields": [
    "_id",
    "_score",
    "name",
    "sku"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "green t\\-shirt"
            }
          }
        }
      ]
    }
  }
}
'
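
The `fields` parameter is what the Elasticsearch 2.x series accepts. Elasticsearch 5.0 and later replaced it with `stored_fields` and `_source` filtering, so on a newer cluster a request along these lines may be needed instead. Treat it as a sketch under that version assumption; it was not run against the index used in this post, and `source_filtered_query` is a made-up helper name:

```shell
# Hypothetical helper: prints a search body that limits each returned
# document to "name" and "sku" via _source filtering.
source_filtered_query() {
  cat <<'EOF'
{
  "_source": ["name", "sku"],
  "query": {
    "match": {
      "name": {
        "query": "green t\\-shirt"
      }
    }
  }
}
EOF
}

# Usage (the Content-Type header is required from Elasticsearch 6.0 on):
#   curl -H 'Content-Type: application/json' \
#     "localhost:9200/magento2_product_1_v1/_search?pretty" \
#     -d "$(source_filtered_query)"
```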

should vs. must

So far the bool examples we’ve looked at all use should exclusively. When using should with multiple conditions a minimum_should_match can be used to define how many of the conditions need to match. Here’s an example…

Request

$ curl "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "fields": [
    "_id",
    "_score",
    "name",
    "sku",
    "color"
  ],
  "query": {
    "bool": {
      "minimum_should_match": "1",
      "should": [
        {
          "match": {
            "name": {
              "query": "t\\-shirt"
            }
          }
        },
        {
          "match": {
            "color": {
              "query": "16"
            }
          }
        }
      ]
    }
  }
}
'

As minimum_should_match is set to 1, either the name of the product can contain the string “t-shirt” or the color can be option id “16”. Only one of those conditions needs to be true.

Setting the minimum_should_match to “100%” would require all the conditions to match.
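
For percentages, Elasticsearch computes the required number of matching clauses from the clause count and rounds down. A sketch of that arithmetic, assuming a non-negative percentage (negative values, which express how many clauses may be missing, are not handled); `required_matches` is a hypothetical name:

```shell
# Hypothetical helper: how many "should" clauses must match.
# Arguments: number of should clauses, percentage (without the % sign).
required_matches() {
  echo $(( $1 * $2 / 100 ))
}

# required_matches 2 100   prints 2 (both conditions in the example above)
# required_matches 3 75    prints 2 (2.25 rounded down)
```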

Another way to achieve this is via a must. All conditions provided as musts must match (as you’d expect)…

Request

$ curl "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "fields": [
    "_id",
    "_score",
    "name",
    "sku",
    "color"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": {
              "query": "t\\-shirt"
            }
          }
        },
        {
          "match": {
            "color": {
              "query": "16"
            }
          }
        }
      ]
    }
  }
}
'

Viewing A Query Log

It appears that the best option for viewing a query log is to decrease the slowlog threshold to “0s”, so that every query counts as slow and gets logged. This can be done at runtime without restarting Elasticsearch.

$ curl -XPUT "http://localhost:9200/magento2_product_1_v1/_settings" -d'
{
    "index.search.slowlog.threshold.query.debug": "0s"
}'

The location for log files is defined by the path.logs configuration value in elasticsearch.yml.

Reference: Logging Requests to Elasticsearch

Logged queries look something like this…

[2018-05-13 11:49:17,789][DEBUG][index.search.slowlog.query] [magento2-2-2-b2b-ee_product_1_v2]took[1.4ms], took_millis[1], types[document], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":"10000","fields":["_id","_score"],"query":{"bool":{"must":[{"term":{"category_ids":"4"}},{"terms":{"visibility":["2","4"]}}],"minimum_should_match":1}},"aggregations":{"price_bucket":{"extended_stats":{"field":"price_0_1"}},"category_bucket":{"terms":{"field":"category_ids"}},"manufacturer_bucket":{"terms":{"field":"manufacturer"}},"color_bucket":{"terms":{"field":"color"}}}}], extra_source[],

The actual query received is the JSON inside the brackets that follow the word source[ (the earlier one, not extra_source[).
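
Pulling that JSON out by eye gets tedious. A minimal sketch that extracts it with sed, assuming the log line format shown above (a `source[...]` section immediately followed by `, extra_source[`); the `slowlog_source` name is made up:

```shell
# Hypothetical helper: reads slowlog lines on stdin and prints only the
# query JSON found between "source[" and "], extra_source[".
slowlog_source() {
  sed -n 's/.*source\[\(.*\)], extra_source\[.*/\1/p'
}

# Usage (substitute the slowlog file from your path.logs directory):
#   slowlog_source < your-slowlog-file.log
```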

Note that Magento automatically renames the Elasticsearch index during a full catalog search reindex (as documented below). The slowlog threshold is a per-index setting, so it can go away with the old index name and may need to be applied again…
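
A sketch of a small helper for setting the threshold again after a reindex, or for switching the logging back off. The `slowlog_settings` name and the `magento2_product_1_v2` index name in the usage lines are assumptions for illustration; `-1` is the value Elasticsearch uses for a disabled threshold:

```shell
# Hypothetical helper: prints the settings body for a given slowlog
# query threshold, e.g. "0s" to log everything or "-1" to disable.
slowlog_settings() {
  printf '{"index.search.slowlog.threshold.query.debug":"%s"}\n' "$1"
}

# Usage (assumes Elasticsearch on localhost:9200 and that the reindex
# produced an index named magento2_product_1_v2):
#   curl -XPUT "http://localhost:9200/magento2_product_1_v2/_settings" -d "$(slowlog_settings 0s)"
#   curl -XPUT "http://localhost:9200/magento2_product_1_v2/_settings" -d "$(slowlog_settings -1)"
```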

Index Version Naming

The version number is incremented when Magento\Elasticsearch\Model\Adapter\Elasticsearch::cleanIndex() is called.
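
To make the naming concrete, here is a sketch of what the next index name would look like, assuming the version is the number after the final `_v` and that it goes up by one each time. The `next_index_name` helper is hypothetical and is not how Magento itself computes the name:

```shell
# Hypothetical helper: given an index name ending in _v<N>, prints the
# same name with N increased by one.
next_index_name() {
  base=${1%_v*}       # everything before the final "_v"
  version=${1##*_v}   # the number after the final "_v"
  echo "${base}_v$(( version + 1 ))"
}

# Usage:
#   next_index_name magento2_product_1_v1
```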

How Magento Queries Elasticsearch

Search Results Page

Below is an example of how Magento queries Elasticsearch when searching for the term “Product”. Fields queried are determined by the “Use in Search” attribute property and the boosts are determined by the “Search Weight” attribute property…

$ curl "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "aggregations": {
    "prices": {
      "histogram": {
        "field": "price_0_1",
        "interval": 1
      }
    }
  },
  "fields": [
    "_id",
    "_score"
  ],
  "from": 0,
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "must": [
        {
          "terms": {
            "visibility": [
              "3",
              "4"
            ]
          }
        }
      ],
      "should": [
        {
          "match": {
            "sku": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "_all": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "name": {
              "boost": 6,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "description": {
              "boost": 11,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "short_description": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "manufacturer": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "color": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "status_value": {
              "boost": 2,
              "query": "product"
            }
          }
        },
        {
          "match": {
            "tax_class_id_value": {
              "boost": 2,
              "query": "product"
            }
          }
        }
      ]
    }
  },
  "size": "10000"
}
'
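
The should list above is the same clause repeated once per searchable field, with only the field name and boost changing. A sketch of generating it from field:boost pairs, which can be handy when experimenting with different weights. The `should_clauses` helper is made up, and it assumes the search term and field names need no JSON escaping:

```shell
# Hypothetical helper: prints one comma-separated "match" clause per
# field:boost pair. Arguments: search term, then field:boost pairs.
should_clauses() {
  term=$1
  shift
  sep=''
  for pair in "$@"; do
    printf '%s{"match":{"%s":{"boost":%s,"query":"%s"}}}' \
      "$sep" "${pair%%:*}" "${pair##*:}" "$term"
    sep=','
  done
  printf '\n'
}

# Usage:
#   should_clauses product sku:2 name:6 description:11
```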

Category Pages

When Elasticsearch is configured as Magento’s search engine, Elasticsearch will also be consulted to retrieve the product set while browsing category pages. Here’s a query to return all the products in category “3” with color “13”…

$ curl "localhost:9200/magento2_product_1_v1/_search?pretty" -d'
{
  "aggregations": {
    "prices": {
      "histogram": {
        "field": "price_0_1",
        "interval": 1
      }
    }
  },
  "fields": [
    "_id",
    "_score"
  ],
  "from": 0,
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "must": [
        {
          "term": {
            "category_ids": "3"
          }
        },
        {
          "terms": {
            "visibility": [
              "2",
              "4"
            ]
          }
        },
        {
          "term": {
            "color": "13"
          }
        }
      ]
    }
  },
  "size": "10000"
}
'

Author: Max Chadwick
