
Manage Elasticsearch and Solr analyser configuration files

Some search engine analysers take a “file path” parameter instead of a simple value parameter (bool, string, list, …). But how and where to upload the parameter file is often a bit of a puzzle, especially if, like me, you use several search engines.

So, I decided to write a small “how to” for Solr and Elasticsearch. Each use case is documented with the cURL or shell commands required to replicate the tutorial in your own environment. The tutorial environment is a Vagrant box running Ubuntu 16.04.6 LTS, with Solr 8.1.1 and Elasticsearch 7.3.

The tutorial will be based on a frequently used Token filter, the synonyms filter.

First, let’s connect to the search engine server with ssh (here, a Vagrant machine on macOS):

Image word-image.png of Manage Elasticsearch and Solr analyser configuration files

Elasticsearch

We will use the Elasticsearch Synonym Token Filter for our tutorial.

Surprisingly, Elasticsearch does not yet provide a REST API to upload our resource file. As a consequence, we must use shell commands.

First, locate the Elasticsearch config directory. You can find useful information at https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html

But the easiest way to find out the Elasticsearch config directory is to create an index without the file installed yet. Elasticsearch is kind enough to display the missing file and config directory in the error message:

curl -X PUT "localhost:9200/test_index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "analysis/synonym.txt"
          }
        }
      }
    }
  }
}'

Image word-image-1.png of Manage Elasticsearch and Solr analyser configuration files
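If you would rather not provoke an error, the usual locations can also be checked directly. The sketch below assumes the standard conventions: the ES_PATH_CONF environment variable when it is set, /etc/elasticsearch for DEB/RPM packages, and $ES_HOME/config for archive installs. The helper name find_es_conf is made up for this example, and your installation may differ.

```shell
# Print the first candidate directory that contains elasticsearch.yml.
# find_es_conf is a hypothetical helper, not an Elasticsearch tool.
find_es_conf() {
  for dir in "$@"; do
    if [ -n "$dir" ] && [ -f "$dir/elasticsearch.yml" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Candidates, in order: explicit override, package default, archive default.
# /etc/elasticsearch is often readable by root only, so this may need sudo.
find_es_conf "${ES_PATH_CONF:-}" /etc/elasticsearch "${ES_HOME:-}/config" \
  || echo "elasticsearch.yml not found in the usual locations"
```

The synonyms_path value in the index settings is resolved relative to the directory this prints.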

Now, install (here, create) the synonyms file:

vagrant@vvv:~$ sudo mkdir /etc/elasticsearch/analysis

vagrant@vvv:~$ sudo vi /etc/elasticsearch/analysis/synonym.txt
(copy your synonyms in the file, then save)

vagrant@vvv:~$ sudo cat /etc/elasticsearch/analysis/synonym.txt
# Blank lines and lines starting with pound are comments.

# Explicit mappings match any token sequence on the LHS of "=>"
# and replace with all alternatives on the RHS. These types of mappings
# ignore the expand parameter in the schema.
# Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit

# Equivalent synonyms may be separated with commas and give
# no explicit mapping. In this case the mapping behavior will
# be taken from the expand parameter in the schema. This allows
# the same synonym file to be used in different synonym handling strategies.
# Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
lol, laughing out loud

# If expand==true, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod

# Multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
# is equivalent to
foo => foo bar, baz

Image word-image-2.png of Manage Elasticsearch and Solr analyser configuration files

Now, create the index with the Synonyms token filter, configured with the installed synonyms file:

Image word-image-3.png of Manage Elasticsearch and Solr analyser configuration files

Now, we can verify that the synonym.txt file is working as expected:

Image word-image-4.png of Manage Elasticsearch and Solr analyser configuration files
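For readers who cannot see the screenshot, one way to run this check from the shell is the _analyze API. This is a sketch, not necessarily the exact command shown above: it assumes Elasticsearch listens on localhost:9200 and that the index is named test_index with an analyzer named synonym, as in the request earlier. The helper names analyze_body and analyze are invented for the example.

```shell
# Build the JSON body for the _analyze API (hypothetical helper).
# Arguments: analyzer name, text to analyze. The text is inserted as-is,
# so it must not contain double quotes or backslashes.
analyze_body() {
  printf '{"analyzer": "%s", "text": "%s"}' "$1" "$2"
}

# Send the body to the index-level _analyze endpoint.
# Assumes a server on localhost:9200 and an index named test_index.
analyze() {
  curl -s -X GET "localhost:9200/test_index/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$1" "$2")"
}

# Usage (requires a running server):
#   analyze synonym "i-pod"
```

If the file was loaded, the returned token list should include ipod for the input i-pod, since the file maps one to the other.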

That’s it. We’ve created a new index that uses the resource file installed in the Elasticsearch config directory.

 

Solr

We will use the Solr Synonym Graph Filter.

Our first task is to create a new core, without any conf files, to get the conf directory from the Solr error message:

Image word-image-5.png of Manage Elasticsearch and Solr analyser configuration files

Image word-image-6.png of Manage Elasticsearch and Solr analyser configuration files

Let’s add the following fieldType to our schema.xml, with the Synonym filter:
<fieldType name="text_synonym" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonym.txt"/>
    <filter class="solr.FlattenGraphFilterFactory"/> <!-- required on index analyzers after graph filters -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonym.txt"/>
  </analyzer>
</fieldType>

Now that we know where Solr expects the conf directory, we can finally install the index files:

cd /solr-8.1.1/server/solr

mkdir -p new_index/conf

(install solrconfig.xml, schema.xml, and synonym.txt: not described here)
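For the synonyms file specifically, one option is to reuse the file created in the Elasticsearch part, since the Elasticsearch synonym filter reads the Solr synonyms format by default. The sketch below assumes the paths used in this tutorial. install_synonyms is a made-up helper name, and the target file name synonym.txt is the one referenced by the fieldType above.

```shell
# Copy a synonyms file into a core's conf directory as synonym.txt,
# creating the directory if needed (hypothetical helper).
# Arguments: source file, target conf directory.
install_synonyms() {
  src=$1
  conf_dir=$2
  mkdir -p "$conf_dir" && cp "$src" "$conf_dir/synonym.txt"
}

# Usage with this tutorial's paths (reading the source may need sudo):
#   install_synonyms /etc/elasticsearch/analysis/synonym.txt \
#     /solr-8.1.1/server/solr/new_index/conf
```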

Image word-image-7.png of Manage Elasticsearch and Solr analyser configuration files

Fine. We are now ready to create the new index:

Image word-image-8.png of Manage Elasticsearch and Solr analyser configuration files

Image word-image-9.png of Manage Elasticsearch and Solr analyser configuration files
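The same step can also be done from the shell with the CoreAdmin API instead of the admin UI. This is a sketch under assumptions: Solr runs in standalone mode on the default localhost:8983, and the core directory is the new_index folder created above. core_create_url is a helper name invented for this example.

```shell
# Build the CoreAdmin CREATE URL (hypothetical helper).
# Arguments: host:port, core name. The core name is also used as
# instanceDir, which is resolved relative to the Solr home directory.
core_create_url() {
  printf 'http://%s/solr/admin/cores?action=CREATE&name=%s&instanceDir=%s' \
    "$1" "$2" "$2"
}

# Usage (requires a running Solr and an existing new_index/conf directory):
#   curl "$(core_create_url localhost:8983 new_index)"
```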

Now, verify that synonym.txt is in place and working, using the analysis page of the Solr admin UI:

Image word-image-10.png of Manage Elasticsearch and Solr analyser configuration files
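A command-line alternative to the analysis page is the field analysis request handler. The sketch below assumes the implicit /analysis/field handler is available on the core, that Solr listens on localhost:8983, and that the core and field type are named new_index and text_synonym as above. analysis_url is a made-up helper name.

```shell
# Build a field analysis URL (hypothetical helper).
# Arguments: host:port, core name, field type, value to analyze.
# The value is not URL-encoded here, so keep it to a single plain word.
analysis_url() {
  printf 'http://%s/solr/%s/analysis/field?analysis.fieldtype=%s&analysis.fieldvalue=%s&wt=json' \
    "$1" "$2" "$3" "$4"
}

# Usage (requires a running Solr with the core created):
#   curl "$(analysis_url localhost:8983 new_index text_synonym universe)"
```

If the synonyms file is loaded, the output of the synonym filter stage for universe should also contain cosmos.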

That’s it. We’ve created a new index that uses the resource file installed in the Solr conf directory.

 

 

wpsolr
