Concepts
The traditional boolean operators AND, OR and NOT are very effective when programming or writing SQL queries, but they are not as well suited to search relevancy. The Apache Lucene library has its own boolean operators: Must, Must_Not and Should.
The must operator means that only the documents containing the specified keyword(s) are returned. The must_not operator means that no document containing the specified keyword(s) is returned. The should operator means that documents containing at least one of the specified keyword(s) are returned; when the query also has a must clause, should clauses become optional and only raise the score of the documents that match them.
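The rules above can be sketched as a toy model in Python. This is a hypothetical helper, not Lucene's implementation: each clause is a single keyword, documents are sets of keywords, and the score is simply the number of matching must and should clauses, whereas Lucene sums a relevance score (BM25 by default) for each matching clause. It also encodes the rule that, when a must clause is present, should clauses are optional and only affect ranking.

```python
from dataclasses import dataclass, field


@dataclass
class BoolQuery:
    """Toy model of Lucene's boolean clauses, one keyword per clause (hypothetical helper)."""
    must: list[str] = field(default_factory=list)
    must_not: list[str] = field(default_factory=list)
    should: list[str] = field(default_factory=list)

    def matches(self, doc: set[str]) -> bool:
        if not all(term in doc for term in self.must):
            return False  # every must clause is required
        if any(term in doc for term in self.must_not):
            return False  # a single must_not hit excludes the document
        if not self.must and self.should:
            # Without a must clause, at least one should clause has to match.
            return any(term in doc for term in self.should)
        return True  # with a must clause, should clauses are optional

    def score(self, doc: set[str]) -> int:
        # Crude relevance proxy: number of matching must and should clauses.
        # Lucene sums per-clause relevance scores instead; must_not never adds to it.
        return sum(term in doc for term in self.must + self.should)


def rank(query: BoolQuery, docs: dict[str, set[str]]) -> list[str]:
    """Return the ids of matching documents, best score first."""
    hits = [doc_id for doc_id, terms in docs.items() if query.matches(terms)]
    return sorted(hits, key=lambda doc_id: query.score(docs[doc_id]), reverse=True)


docs = {
    "a": {"quick", "brown", "fox"},
    "b": {"quick", "fox"},
    "c": {"lazy", "fox"},
    "d": {"brown", "dog"},
}
print(rank(BoolQuery(should=["quick", "brown"]), docs))  # ['a', 'b', 'd']
print(rank(BoolQuery(must=["fox"], should=["quick", "brown"]), docs))  # ['a', 'b', 'c']
```

In the second call, document c is returned even though it matches no should clause, because the must clause alone is enough; it simply ranks last.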
Must is represented by the character +, Must_Not by the character - and Should by no character. So the query “+dog -cat lazy” means that the results must contain dog and must not contain cat; results that also contain lazy get a higher score, so they appear at the top of the list.
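To make the mapping between this syntax and the bool clauses explicit, here is a minimal Python sketch that splits such a string into must, must_not and should lists and turns them into a bool query body. It is deliberately simplified (bare whitespace-separated terms only, no quotes, field prefixes, parentheses or AND/OR keywords), and the field name “text” is a hypothetical placeholder. Elasticsearch and OpenSearch can already parse this syntax themselves through their query_string and simple_query_string queries; the sketch only illustrates the correspondence.

```python
def parse_lucene_simple(query: str) -> dict[str, list[str]]:
    """Split a query like '+dog -cat lazy' into bool clause lists (simplified sketch)."""
    clauses: dict[str, list[str]] = {"must": [], "must_not": [], "should": []}
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            clauses["must"].append(token[1:])      # + prefix: Must
        elif token.startswith("-") and len(token) > 1:
            clauses["must_not"].append(token[1:])  # - prefix: Must_Not
        else:
            clauses["should"].append(token)        # no prefix: Should
    return clauses


def to_bool_query(clauses: dict[str, list[str]], field_name: str) -> dict:
    """Build an Elasticsearch/OpenSearch bool query body, one match clause per term."""
    return {
        "bool": {
            occur: [{"match": {field_name: term}} for term in terms]
            for occur, terms in clauses.items()
            if terms  # leave out empty clause lists
        }
    }


clauses = parse_lucene_simple("+dog -cat lazy")
print(clauses)  # {'must': ['dog'], 'must_not': ['cat'], 'should': ['lazy']}
print(to_bool_query(clauses, "text"))  # "text" is a hypothetical field name
```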
Elasticsearch/OpenSearch query examples
The following queries were run on OpenSearch but should also work in Elasticsearch.
Must query
GET my_index/_search
{
"query": {
"bool": {
"must": [
{"match": {"customer_first_name": "Elyssa"}}
]
}
}
}
This query ensures that only the documents whose “customer_first_name” field matches “Elyssa” are returned.
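The console request above can also be issued from a program. Below is a minimal sketch using only Python's standard library, assuming a cluster reachable at http://localhost:9200 without authentication (a hypothetical setup; an OpenSearch cluster with the security plugin enabled usually needs HTTPS and credentials). It only builds the request; the call that would send it is left commented out. POST is used because the _search endpoint accepts it as well as GET, and some HTTP clients do not send a body with GET.

```python
import json
import urllib.request

# Hypothetical endpoint: adjust scheme, host, port and authentication for your cluster.
BASE_URL = "http://localhost:9200"


def build_search_request(index: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) a _search request carrying a Query DSL body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same must query as the console example above.
must_query = {"bool": {"must": [{"match": {"customer_first_name": "Elyssa"}}]}}
req = build_search_request("my_index", must_query)

# Against a live cluster, this would print the total hit count:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["hits"]["total"]["value"])
```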
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 348,
"relation": "eq"
},
"max_score": 2.7161815,
"hits": [
{
"_index": "my_index",
"_id": "uEBFNYsB3YQhEmbfO9IV",
"_score": 2.7161815,
"_source": {
"category": [
"Women's Clothing"
],
"customer_first_name": "Elyssa",
"day_of_week": "Tuesday"
}
},
{
"_index": "my_index",
"_id": "zEBFNYsB3YQhEmbfO9IW",
"_score": 2.7161815,
"_source": {
"category": [
"Women's Shoes"
],
"customer_first_name": "Elyssa",
"day_of_week": "Thursday"
}
},
{
"_index": "my_index",
"_id": "8kBFNYsB3YQhEmbfO9IW",
"_score": 2.7161815,
"_source": {
"category": [
"Women's Accessories"
],
"customer_first_name": "Elyssa",
"day_of_week": "Monday"
}
}
]
}
}
As you can see in the JSON response (shortened to the three most relevant documents), there were 348 hits (matches), and every returned document has “Elyssa” as its “customer_first_name” value.
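The same check can be done programmatically by walking hits.hits[]._source in the response. The sketch below uses a trimmed copy of the response above (only the fields the check needs) and a hypothetical summarize helper. A “relation” of “eq” means the total is exact; for large result sets both engines report only a lower bound by default, with “relation” set to “gte”.

```python
import json

# Trimmed copy of the must query response above (only the fields used here).
response_text = """
{"hits": {"total": {"value": 348, "relation": "eq"},
          "hits": [
            {"_id": "uEBFNYsB3YQhEmbfO9IV", "_score": 2.7161815,
             "_source": {"customer_first_name": "Elyssa", "day_of_week": "Tuesday"}},
            {"_id": "zEBFNYsB3YQhEmbfO9IW", "_score": 2.7161815,
             "_source": {"customer_first_name": "Elyssa", "day_of_week": "Thursday"}},
            {"_id": "8kBFNYsB3YQhEmbfO9IW", "_score": 2.7161815,
             "_source": {"customer_first_name": "Elyssa", "day_of_week": "Monday"}}
          ]}}
"""


def summarize(response: dict, field_name: str) -> tuple[int, list[str]]:
    """Return the total hit count and the value of field_name for each returned hit."""
    hits = response["hits"]
    values = [hit["_source"][field_name] for hit in hits["hits"]]
    return hits["total"]["value"], values


total, names = summarize(json.loads(response_text), "customer_first_name")
print(total, names)  # 348 ['Elyssa', 'Elyssa', 'Elyssa']
assert all(name == "Elyssa" for name in names)
```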
Must_not query
GET my_index/_search
{
"query": {
"bool": {
"must_not": [
{"match": {"category": "Men's Clothing"}}
]
}
}
}
This query returns every document whose “category” field does not match “Men’s Clothing”.
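Whether a must_not match on “category” excludes only that exact value depends on the field mapping. If “category” is an analyzed text field (an assumption; the mapping is not shown here), match splits “Men’s Clothing” into terms and, with its default OR operator, a document matches as soon as it shares one term, so a “Women’s Clothing” document would be excluded as well. The sketch below approximates this with a hypothetical lowercase-and-split function, which is only a rough stand-in for the real standard analyzer; a term query on a keyword field behaves like the exact comparison.

```python
import re


def approx_analyze(text: str) -> set[str]:
    """Rough stand-in for the standard analyzer (assumption): lowercase, then split
    on anything that is not a word character or an apostrophe."""
    return {tok for tok in re.split(r"[^\w']+", text.lower()) if tok}


def excluded_by_match(category: str, value: str) -> bool:
    # match query with the default OR operator: one shared term is enough to match,
    # so a must_not clause drops the document.
    return bool(approx_analyze(category) & approx_analyze(value))


def excluded_by_exact(category: str, value: str) -> bool:
    # Exact-value comparison, like a term query on a keyword field.
    return category == value


for cat in ["Men's Clothing", "Women's Clothing", "Women's Shoes"]:
    print(cat, excluded_by_match(cat, "Men's Clothing"), excluded_by_exact(cat, "Men's Clothing"))
```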
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 462,
"relation": "eq"
},
"max_score": 0,
"hits": [
{
"_index": "my_index",
"_id": "p0BFNYsB3YQhEmbfO9IV",
"_score": 0,
"_source": {
"category": [
"Women's Accessories",
"Women's Shoes"
],
"customer_first_name": "Selena"
}
},
{
"_index": "my_index",
"_id": "qEBFNYsB3YQhEmbfO9IV",
"_score": 0,
"_source": {
"category": [
"Women's Shoes",
"Women's Accessories"
],
"customer_first_name": "rania",
"day_of_week": "Thursday"
}
},
{
"_index": "my_index",
"_id": "r0BFNYsB3YQhEmbfO9IV",
"_score": 0,
"_source": {
"category": [
"Women's Shoes"
],
"customer_first_name": "Brigitte",
"day_of_week": "Thursday"
}
}
]
}
}
462 documents do not match “Men’s Clothing” in the “category” field. Because the query contains only a must_not clause, there is nothing to score: must_not clauses are executed in filter context and never contribute to the score, so every returned document has a score of 0.
Should query
GET my_index/_search
{
"query": {
"bool": {
"should": [
{"match": {"customer_first_name": "Elyssa"}},
{"match": {"day_of_week": "Thursday"}}
]
}
}
}
This query returns the documents that match at least one of the two clauses. With should clauses, the more clauses a document matches, the higher its score.
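A rough way to see this ranking rule is to count how many of the two should clauses each document satisfies. The sketch below uses three documents that appear in the responses of this post (fields trimmed). Exact string equality stands in for the match query, and the clause count stands in for the summed relevance score; both are simplifications.

```python
# Documents taken from the responses in this post (only the two fields used here).
docs = [
    {"_id": "uEBFNYsB3YQhEmbfO9IV", "customer_first_name": "Elyssa", "day_of_week": "Tuesday"},
    {"_id": "qEBFNYsB3YQhEmbfO9IV", "customer_first_name": "rania", "day_of_week": "Thursday"},
    {"_id": "zEBFNYsB3YQhEmbfO9IW", "customer_first_name": "Elyssa", "day_of_week": "Thursday"},
]

should = [("customer_first_name", "Elyssa"), ("day_of_week", "Thursday")]


def matched_clauses(doc: dict) -> int:
    # Number of should clauses the document satisfies (exact equality, simplified).
    return sum(doc.get(field_name) == value for field_name, value in should)


# Should-only query: keep documents matching at least one clause,
# then rank them by how many clauses they match (stable sort keeps ties in order).
hits = [doc for doc in docs if matched_clauses(doc) >= 1]
hits.sort(key=matched_clauses, reverse=True)
print([doc["_id"] for doc in hits])
```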
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1060,
"relation": "eq"
},
"max_score": 4.5128717,
"hits": [
{
"_index": "my_index",
"_id": "zEBFNYsB3YQhEmbfO9IW",
"_score": 4.5128717,
"_source": {
"category": [
"Women's Shoes"
],
"customer_first_name": "Elyssa",
"day_of_week": "Thursday"
}
},
{
"_index": "my_index",
"_id": "90BFNYsB3YQhEmbfO9IW",
"_score": 4.5128717,
"_source": {
"category": [
"Women's Clothing"
],
"customer_first_name": "Elyssa",
"day_of_week": "Thursday"
}
},
{
"_index": "my_index",
"_id": "WUBFNYsB3YQhEmbfO9MW",
"_score": 4.5128717,
"_source": {
"category": [
"Women's Shoes",
"Women's Clothing"
],
"customer_first_name": "Elyssa",
"day_of_week": "Thursday"
}
}
]
}
}
As you can see, even though there are two clauses to match instead of one, there are many more hits (1060), because a document only needs to match one of them. The three most relevant documents match both clauses: “customer_first_name” and “day_of_week”.
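The numbers from the must and should responses can be cross-checked. Let E be the set of documents matching the “customer_first_name” clause and T the set matching the “day_of_week” clause. A should-only query returns E ∪ T, and the must query reported |E| = 348 with relation “eq”. Elasticsearch and OpenSearch also add up the scores of the matching clauses; assuming the Elyssa clause scores document d = zEBFNYsB3YQhEmbfO9IW the same way in both queries (it scored 2.7161815 in the must query), the share contributed by the Thursday clause follows:

```latex
\begin{aligned}
|E \cup T| &= |E| + |T \setminus E| \\
1060 &= 348 + |T \setminus E| \;\Rightarrow\; |T \setminus E| = 712 \\[4pt]
s_{\text{should}}(d) &= s_E(d) + s_T(d) \\
4.5128717 &= 2.7161815 + s_T(d) \;\Rightarrow\; s_T(d) \approx 1.7966902
\end{aligned}
```

So 712 of the hits match Thursday without matching Elyssa, and they rank below the documents that match both clauses.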