Multilingual search

January 13, 2017 January 14, 2017 admin

World-class WordPress multilingual search

A world-class multilingual search

WPSOLR integrates with the most popular WordPress multilingual plugins: WPML and Polylang. Your search should be no less multilingual than your posts, products, or pages, should it?

True multilingual search is complex because it has to cope with several layers, from the search content itself to static and dynamic UI elements.

The following sections explain each of these layers in detail.

Multilingual search content

Your search content is made of the bits and pieces of your post types: title, text, excerpt, taxonomies, terms, and custom fields. WooCommerce products add attributes and variations.

When a post is saved, its different pieces are sent to a Solr index, which transforms them according to the instructions in the index's schema.xml file.
For instance, text is split into individual words. Punctuation and meaningless words (“stopwords”) are removed. Words are reduced to a common root (“stemming”). Synonyms are matched, and so on.
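
To make these analysis steps concrete, here is a toy Python sketch of such a chain. It is not how Solr implements its filters: the stopword list, the one-rule stemmer, and the synonym map are all invented for illustration.

```python
import re

# Illustrative data only; real Solr filters rely on much richer resources.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are"}
SYNONYMS = {"notebook": "laptop"}


def stem(word):
    """Very naive stemmer: strip a trailing 's' from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def analyze(text):
    """Tokenize, drop stopwords, stem, then map synonyms, in that order."""
    tokens = re.findall(r"\w+", text.lower())           # split into words, dropping punctuation
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove stopwords
    tokens = [stem(t) for t in tokens]                  # reduce words to a common root
    return [SYNONYMS.get(t, t) for t in tokens]         # match synonyms
```

With this toy chain, "The notebooks, and the laptops!" is reduced to the two tokens "laptop" and "laptop", so a search for either word would match the same content.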

But wait a minute… This is true in English, where sentences are made of words separated by white space and punctuation. Is it true everywhere in the world, for languages written in other scripts (Chinese, Japanese, Korean, Russian…)? Chinese and Japanese, for instance, do not put spaces between words.

Each language needs its own text transformations. Fortunately, Apache Solr can manage them thanks to its numerous language-specific filters, which can be inserted in an index's schema.xml.
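
A short Python sketch shows why an English-centric approach breaks down. It has nothing to do with Solr's actual tokenizers; it only illustrates that splitting on white space finds words in an English sentence but not in a Chinese one.

```python
def whitespace_tokens(text):
    """Split on white space, as a naive English-centric tokenizer would."""
    return text.split()


english = whitespace_tokens("I like searching")
chinese = whitespace_tokens("我喜欢搜索")  # roughly "I like searching"

print(len(english))  # 3 tokens, one per word
print(len(chinese))  # 1 token: the whole sentence, since it contains no spaces
```

A keyword search on a single word of the Chinese sentence would find nothing in such an index, which is why a language-aware analysis chain is needed.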

One or several indexes?

But we are dealing with several languages. So should we store content in one index or in several? This is a question WPSOLR has already solved for you.

One index per language is incredibly flexible

Solr multilingual one index by language

When you use WPSOLR with WPML or Polylang, each piece of content is stored in the index of its language. French content is stored in an index with a French-tailored schema.xml, which deals with accents, while Chinese content goes to its own index with a Chinese-tailored schema.xml.

Why this choice? Because it is the most flexible. With one Solr index per language, you can tune each language to your needs as much as you like, not only in schema.xml but also in solrconfig.xml.

Workflow to index a post type

Retrieve the Solr index for a post language
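
The indexing workflow can be sketched in a few lines of Python. This is only an illustration of the routing idea, not WPSOLR's code: the index names, the post dictionary layout, and both function names are hypothetical.

```python
# Hypothetical mapping from a language code to the name of its Solr index.
INDEX_BY_LANGUAGE = {"en": "wp_en", "fr": "wp_fr", "zh": "wp_zh"}


def index_for_language(language_code):
    """Return the Solr index configured for a language, or fail loudly."""
    try:
        return INDEX_BY_LANGUAGE[language_code]
    except KeyError:
        raise ValueError(f"No Solr index configured for language {language_code!r}")


def route_post(post):
    """Return (index name, document) for a post dict carrying a 'lang' key."""
    index = index_for_language(post["lang"])
    document = {"id": post["id"], "title": post["title"], "content": post["content"]}
    return index, document


post = {"id": 42, "lang": "fr", "title": "Chaussures", "content": "Des chaussures rouges"}
index, document = route_post(post)
print(index)  # wp_fr
```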

Workflow to search results for a language

Retrieve the Solr index for a language search
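
The search side follows the same routing. The Python sketch below builds a query URL against the index of the visitor's language, using Solr's standard select handler and q parameter. The host, port, and index names are assumptions made for the example, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed Solr location and index names, for illustration only.
SOLR_BASE_URL = "http://localhost:8983/solr"
INDEX_BY_LANGUAGE = {"en": "wp_en", "fr": "wp_fr", "zh": "wp_zh"}


def search_url(language_code, keywords):
    """Build the select URL of the index that matches the search language."""
    index = INDEX_BY_LANGUAGE[language_code]
    query_string = urlencode({"q": keywords})  # URL-encodes the keywords
    return f"{SOLR_BASE_URL}/{index}/select?{query_string}"


print(search_url("fr", "chaussures rouges"))
# http://localhost:8983/solr/wp_fr/select?q=chaussures+rouges
```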

One shared index is less flexible and more complex

Solr multilingual with one shared index

With a shared index, you can certainly define language-specific fields, each with its own language filters, but at the expense of much more complexity on the client side, which must call field_en, field_fr, or field_cn depending on the language context.
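
The extra client-side work looks roughly like the Python sketch below: every query has to pick the right field suffix for the current language. The helper names and the list of supported languages are hypothetical, and the sketch does not escape special characters in the search term.

```python
# Language suffixes assumed to exist as fields in the shared index.
SUPPORTED_LANGUAGES = {"en", "fr", "cn"}


def localized_field(base_name, language_code):
    """Return the language-specific field name, e.g. field_fr."""
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language {language_code!r}")
    return f"{base_name}_{language_code}"


def build_query(base_name, language_code, term):
    """Build a Solr query restricted to the field of the current language."""
    return f'{localized_field(base_name, language_code)}:"{term}"'


print(build_query("field", "fr", "chaussures"))  # field_fr:"chaussures"
```

Every place that reads or writes the index needs this kind of branching, whereas with one index per language the field names stay the same everywhere.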

You also cannot define a default search field per language, or customize the search handler with a default parameter or Java plugin per language, among many other things.

Multilingual static content

Static content is the text contained in the PHP files that build the search results page displayed to users.

It can be the “Search” label on the search form's submit button, the “Sort by” label of the sort drop-down list, or the “Next page” label in the search navigation bar.

WPSOLR ships with POT and PO files, which you can modify at will. You can also create your own translation for a new language if necessary.

Multilingual dynamic content

Dynamic content is the text stored in the database, related to your search content, that is used to build the search results page displayed to users.

For instance, the list of sort items (“Less expensive first”, “Newest first”, …), or the list of post types (“Post”, “Page”, “Product”, “Knowledge base”, …) in facets.

WPSOLR calls WPML/Polylang actions and filters to retrieve translations of the dynamic content from their respective string translation modules:

wpsolr multilingual - wpml string translation module
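
Conceptually, translating a dynamic label is a lookup keyed by the original string and the target language. The Python sketch below uses a hypothetical in-memory table as a stand-in for a string translation store; it does not reproduce the WPML or Polylang APIs, and falling back to the original label is an assumption of the sketch.

```python
# Hypothetical in-memory stand-in for a string translation store.
TRANSLATIONS = {
    ("Newest first", "fr"): "Les plus récents d'abord",
    ("Less expensive first", "fr"): "Les moins chers d'abord",
}


def translate_label(label, language_code):
    """Return the translated label, or the original when none is registered."""
    return TRANSLATIONS.get((label, language_code), label)


print(translate_label("Newest first", "fr"))  # Les plus récents d'abord
print(translate_label("Newest first", "de"))  # Newest first (no translation registered)
```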

Which languages can be managed by WPSOLR?

This is a list of languages officially supported by Apache Solr:
