Must, Must_Not, Should: Boolean operators in search engines don’t work how you think they do

Published 19 October, 2023
– Last updated 19 October, 2023

Concepts

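The traditional boolean operators AND, OR and NOT are very effective in programming or in SQL queries, but they are less suited to search relevancy, because they only decide whether a document matches, not how well it matches. The Apache Lucene library, which Elasticsearch and OpenSearch are built on, has its own boolean operators: Must, Must_Not and Should.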

The must operator means that only the documents containing the specified keyword(s) are returned. The must_not operator means that the documents containing the specified keyword(s) are excluded. The should operator means that the documents containing the specified keyword(s) get a higher score; when a query contains only should clauses, a document must match at least one of them to be returned.
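
To make these rules concrete, here is a toy Python model of how the three operators combine. It is a sketch under simplifying assumptions, not how Lucene works internally: documents are plain sets of terms, every clause is a single term, and the "score" is just the number of matching must and should clauses instead of Lucene's real relevance score (BM25). The function name evaluate and the sample documents are hypothetical; the default for minimum_should_match follows the rule documented for the Elasticsearch bool query.

```python
# Toy model of bool-query semantics, NOT Lucene's real scoring (BM25):
# a document is a set of terms, each clause is one term, and the score is
# simply the number of matching must + should clauses.

def evaluate(docs, must=(), must_not=(), should=(), minimum_should_match=None):
    """Return (doc_id, score) pairs for the matching documents, best first."""
    if minimum_should_match is None:
        # Elasticsearch default: 1 when there are should clauses but no
        # must (or filter) clause, otherwise 0.
        minimum_should_match = 1 if should and not must else 0
    results = []
    for doc_id, terms in docs.items():
        if not all(term in terms for term in must):
            continue  # every must clause has to match
        if any(term in terms for term in must_not):
            continue  # one must_not match is enough to exclude the document
        should_hits = sum(1 for term in should if term in terms)
        if should_hits < minimum_should_match:
            continue
        # must_not clauses never add anything to the score
        results.append((doc_id, len(must) + should_hits))
    return sorted(results, key=lambda r: (-r[1], r[0]))


docs = {
    "d1": {"dog", "lazy"},
    "d2": {"dog", "cat", "lazy"},
    "d3": {"dog"},
    "d4": {"fox", "lazy"},
}

print(evaluate(docs, must=["dog"], must_not=["cat"], should=["lazy"]))
# → [('d1', 2), ('d3', 1)]
print(evaluate(docs, should=["dog", "lazy"]))
# → [('d1', 2), ('d2', 2), ('d3', 1), ('d4', 1)]
```

In the first call, d3 is returned even though it lacks lazy: once a must clause is present, should clauses only raise the score. In the second call there is no must clause, so each document needs at least one should match. Calling evaluate(docs, must_not=["cat"]) returns every document without cat, all with a score of 0.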

In the Lucene query syntax, Must is written with the + prefix, Must_Not with the - prefix, and Should with no prefix. So the query “+dog -cat lazy” means that results must contain dog and must not contain cat; among them, the documents that also contain lazy get a higher score and appear nearer the top of the list.
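
The prefix syntax maps directly onto the three clause types. The sketch below (a hypothetical helper, parse_prefixed_query) turns a string such as “+dog -cat lazy” into an Elasticsearch/OpenSearch bool query body. It assumes a single text field named content (also hypothetical) and handles only whitespace-separated terms: no quoted phrases, field prefixes, parentheses or escaping.

```python
import json

def parse_prefixed_query(query, field="content"):
    """Turn '+dog -cat lazy' into a bool query body.

    Simplified sketch: whitespace tokenization only, no phrases, field
    prefixes, grouping or escaping. 'content' is a hypothetical field name.
    """
    clauses = {"must": [], "must_not": [], "should": []}
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            clauses["must"].append({"match": {field: token[1:]}})
        elif token.startswith("-") and len(token) > 1:
            clauses["must_not"].append({"match": {field: token[1:]}})
        else:
            clauses["should"].append({"match": {field: token}})
    # Keep only the clause types that are actually used.
    return {"query": {"bool": {kind: c for kind, c in clauses.items() if c}}}


print(json.dumps(parse_prefixed_query("+dog -cat lazy"), indent=2))
```

In practice you rarely need to do this yourself: Elasticsearch and OpenSearch accept the same + and - prefixes directly in the query_string and simple_query_string queries.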

Elasticsearch/OpenSearch query examples

The following queries were run on OpenSearch, but they should also work in Elasticsearch.

Must query

GET my_index/_search 
{
  "query": {
    "bool": {
      "must": [
        {"match": {"customer_first_name": "Elyssa"}}
      ]
    }
  }
}

This query returns only the documents whose “customer_first_name” field matches “Elyssa”.
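
To send the same query from code rather than in the console syntax shown above, a sketch using only Python's standard library could look like this. The base URL http://localhost:9200 and the helper name build_search_request are assumptions; a cluster with the security plugin enabled would also need HTTPS and credentials. POST is used because the _search endpoint accepts it as well as GET, and some HTTP clients do not send a body with GET.

```python
import json
import urllib.request

def build_search_request(base_url, index, body):
    """Build (but do not send) a POST request to <base_url>/<index>/_search."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


must_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"customer_first_name": "Elyssa"}}
            ]
        }
    }
}

# Assumed local, unsecured cluster.
req = build_search_request("http://localhost:9200", "my_index", must_query)
print(req.get_method(), req.full_url)

# To actually run it against a live cluster:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["hits"]["total"]["value"])
```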

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 348,
      "relation": "eq"
    },
    "max_score": 2.7161815,
    "hits": [
      {
        "_index": "my_index",
        "_id": "uEBFNYsB3YQhEmbfO9IV",
        "_score": 2.7161815,
        "_source": {
          "category": [
            "Women's Clothing"
          ],
          "customer_first_name": "Elyssa",
          "day_of_week": "Tuesday"
        }
      },
      {
        "_index": "my_index",
        "_id": "zEBFNYsB3YQhEmbfO9IW",
        "_score": 2.7161815,
        "_source": {
          "category": [
            "Women's Shoes"
          ],
          "customer_first_name": "Elyssa",
          "day_of_week": "Thursday"
        }
      },
      {
        "_index": "my_index",
        "_id": "8kBFNYsB3YQhEmbfO9IW",
        "_score": 2.7161815,
        "_source": {
          "category": [
            "Women's Accessories"
          ],
          "customer_first_name": "Elyssa",
          "day_of_week": "Monday"
        }
      }
    ]
  }
}

As you can see in the JSON response (shortened here to the three most relevant documents), there were 348 hits (matches), and every returned document has “Elyssa” in its “customer_first_name” field.
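
When the response is consumed by code, the same checks can be automated. The sketch below parses a trimmed copy of the response above (only the fields shown in this article; a real response has more) and reads hits.total.value. A "relation" of "eq" means the count is exact, while "gte" means it is only a lower bound, which by default happens once there are more than 10,000 hits. The function name summarize is hypothetical.

```python
import json

# Trimmed copy of the must response shown above.
response_text = """
{
  "hits": {
    "total": {"value": 348, "relation": "eq"},
    "max_score": 2.7161815,
    "hits": [
      {"_id": "uEBFNYsB3YQhEmbfO9IV", "_score": 2.7161815,
       "_source": {"customer_first_name": "Elyssa", "day_of_week": "Tuesday"}},
      {"_id": "zEBFNYsB3YQhEmbfO9IW", "_score": 2.7161815,
       "_source": {"customer_first_name": "Elyssa", "day_of_week": "Thursday"}},
      {"_id": "8kBFNYsB3YQhEmbfO9IW", "_score": 2.7161815,
       "_source": {"customer_first_name": "Elyssa", "day_of_week": "Monday"}}
    ]
  }
}
"""

def summarize(response):
    """Return (total hits, is the total exact?, first names of returned hits)."""
    total = response["hits"]["total"]
    names = [hit["_source"]["customer_first_name"] for hit in response["hits"]["hits"]]
    return total["value"], total["relation"] == "eq", names


total, exact, names = summarize(json.loads(response_text))
print(total, exact, names)
# → 348 True ['Elyssa', 'Elyssa', 'Elyssa']
print(all(name == "Elyssa" for name in names))
# → True
```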

Must_not query

GET my_index/_search 
{
  "query": {
    "bool": {
      "must_not": [
        {"match": {"category": "Men's Clothing"}}
      ]
    }
  }
}

This query will return any document where the “category” field doesn’t contain the value “Men’s Clothing”.

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 462,
      "relation": "eq"
    },
    "max_score": 0,
    "hits": [
      {
        "_index": "my_index",
        "_id": "p0BFNYsB3YQhEmbfO9IV",
        "_score": 0,
        "_source": {
          "category": [
            "Women's Accessories",
            "Women's Shoes"
          ],
          "customer_first_name": "Selena"
        }
      },
      {
        "_index": "my_index",
        "_id": "qEBFNYsB3YQhEmbfO9IV",
        "_score": 0,
        "_source": {
          "category": [
            "Women's Shoes",
            "Women's Accessories"
          ],
          "customer_first_name": "rania",
          "day_of_week": "Thursday"
        }
      },
      {
        "_index": "my_index",
        "_id": "r0BFNYsB3YQhEmbfO9IV",
        "_score": 0,
        "_source": {
          "category": [
            "Women's Shoes"
          ],
          "customer_first_name": "Brigitte",
          "day_of_week": "Thursday"
        }
      }
    ]
  }
}

There are 462 documents whose “category” field doesn’t contain the value “Men’s Clothing”. Because must_not clauses only exclude documents and never contribute to the score, and this query has no must or should clause to match, every returned document gets a score of 0.
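
One caveat worth checking against your own mapping: match is a full-text query. If “category” is an analyzed text field with a category.keyword sub-field (an assumption about this index, not something the response shows), “Men’s Clothing” is analyzed into separate terms (men's and clothing), and match joins them with OR by default, so the must_not clause would also exclude documents in “Women’s Clothing”. The sketch below approximates the standard analyzer with lowercase-and-split-on-whitespace (a simplification) to compare that behavior with an exact-value term query; the helper names are hypothetical.

```python
import json

def simple_tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on whitespace.
    return set(text.lower().split())

def excluded_by_match(categories, value):
    # match query, default "or" operator: one shared token is enough.
    wanted = simple_tokens(value)
    return any(wanted & simple_tokens(category) for category in categories)

def excluded_by_term(categories, value):
    # term query on a keyword field: the whole value must be equal.
    return value in categories


docs = {
    "a": ["Women's Clothing"],
    "b": ["Men's Clothing"],
    "c": ["Women's Shoes"],
}
for doc_id, categories in docs.items():
    print(doc_id,
          excluded_by_match(categories, "Men's Clothing"),
          excluded_by_term(categories, "Men's Clothing"))
# → a True False
#   b True True
#   c False False

# Exact-value version of the must_not query, assuming a category.keyword sub-field.
exact_query = {
    "query": {
        "bool": {
            "must_not": [{"term": {"category.keyword": "Men's Clothing"}}]
        }
    }
}
print(json.dumps(exact_query))
```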

Should query

GET my_index/_search 
{
  "query": {
    "bool": {
      "should": [
        {"match": {"customer_first_name": "Elyssa"}},
        {"match": {"day_of_week": "Thursday"}}
      ]
    }
  }
}

This query fetches the documents that match at least one of the two clauses. With the Should operator, the more clauses a document matches, the higher its score.
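
Two variations are often useful here, sketched below as query bodies built in Python (the helper name should_query is hypothetical; minimum_should_match is a real bool query parameter). Setting minimum_should_match to 2 requires both clauses to match. Moving the first clause to must instead makes Elyssa mandatory and leaves Thursday as an optional clause that only raises the score.

```python
import json

# The two clauses of the should query above.
clauses = [
    {"match": {"customer_first_name": "Elyssa"}},
    {"match": {"day_of_week": "Thursday"}},
]

def should_query(clauses, minimum_should_match=None):
    """Build a bool query made only of should clauses (hypothetical helper)."""
    bool_query = {"should": list(clauses)}
    # Without minimum_should_match, at least one should clause must match,
    # because the query has no must or filter clause.
    if minimum_should_match is not None:
        bool_query["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": bool_query}}


# Variation 1: both clauses must match.
both = should_query(clauses, minimum_should_match=2)

# Variation 2: Elyssa is mandatory, Thursday only raises the score.
boosted = {
    "query": {
        "bool": {
            "must": [clauses[0]],
            "should": [clauses[1]],
        }
    }
}

print(json.dumps(both, indent=2))
print(json.dumps(boosted, indent=2))
```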

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1060,
      "relation": "eq"
    },
    "max_score": 4.5128717,
    "hits": [
      {
        "_index": "my_index",
        "_id": "zEBFNYsB3YQhEmbfO9IW",
        "_score": 4.5128717,
        "_source": {
          "category": [
            "Women's Shoes"
          ],
          "customer_first_name": "Elyssa",
          "day_of_week": "Thursday"
        }
      },
      {
        "_index": "my_index",
        "_id": "90BFNYsB3YQhEmbfO9IW",
        "_score": 4.5128717,
        "_source": {
          "category": [
            "Women's Clothing"
          ],
          "customer_first_name": "Elyssa",
          "day_of_week": "Thursday"
        }
      },
      {
        "_index": "my_index",
        "_id": "WUBFNYsB3YQhEmbfO9MW",
        "_score": 4.5128717,
        "_source": {
          "category": [
            "Women's Shoes",
            "Women's Clothing"
          ],
          "customer_first_name": "Elyssa",
          "day_of_week": "Thursday"
        }
      }
    ]
  }
}

As you can see, even though there are two clauses to match instead of one, there are many more hits (1060), because a document only needs to match one of the two clauses to be returned. The three most relevant documents match both fields: “customer_first_name” and “day_of_week”.
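
The hit counts can be checked with a little arithmetic, assuming the must and should queries ran against the same, unchanged index. Let A be the set of documents matching Elyssa (348 hits in the must query) and B the set matching Thursday. With no must clause, the should query returns A ∪ B. The document zEBFNYsB3YQhEmbfO9IW appears in both responses, and a bool query adds up the scores of the clauses a document matches, so, assuming its Elyssa clause scored the same in both queries, the difference between its two scores is what the Thursday clause contributed.

```latex
\begin{aligned}
|A \cup B| &= |A| + |B| - |A \cap B| = 1060 \\
|B \setminus A| &= |A \cup B| - |A| = 1060 - 348 = 712 \\
s_{\text{Thursday}} &= 4.5128717 - 2.7161815 = 1.7966902
\end{aligned}
```

In other words, 712 of the returned documents match only the Thursday clause, and the documents matching both clauses rank first because both clause scores are added together.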
