Why hybrid vector search could fail, and what would be better?

— Pure vector search is amazing —

After setting up my first WooCommerce demos with Weaviate, I was shocked at how good the results were. And with absolutely no tuning!

Just a few strengths of vector search:

– Misspelling is no longer an issue
It is nearly impossible to fool the search, as it understands phonetic-like queries. It could be improved further with a pre-processing step in which a generative LLM like T5 or GPT fixes the query's spelling or grammar.

– Multi-language is almost no longer an issue
Who has not dreamed of selling to customers worldwide without translating into a hundred languages?

– Products that are usually hidden can be found with semantic search
“Something to sleep on” returns mattresses 🙂

— Hybrid search: a step backwards? —

I asked a client to compare Weaviate with Cohere against the same configuration in hybrid mode.
The hybrid mode lost…
The test was multilingual, which explains a lot: BM25 needs pre-processing to be effective. It requires stemming, N-grams, synonyms, all the sorts of tricks that do not work easily when content in a single language is queried in 5 other languages.
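
To see why keyword scoring struggles here, below is a minimal sketch of Okapi BM25 in plain Python. It is not Weaviate's implementation: the tokenizer is deliberately naive (lowercase, split on spaces), the product titles are hypothetical, and `k1` and `b` are just commonly used defaults. It only illustrates that BM25 scores a document by the exact tokens it shares with the query.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer: no stemming, n-grams, or synonyms.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no shared token, no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical product titles, indexed in English only.
products = [
    "memory foam mattress queen size",
    "wooden bed frame with storage",
    "cotton pillow case set",
]

english = bm25_scores("mattress", products)
french = bm25_scores("matelas", products)     # French for "mattress"
plural = bm25_scores("mattresses", products)  # no stemming, so no match
```

Only the exact English token scores above zero. The French query and the unstemmed plural share no token with any title, so every product scores 0 for them, and the keyword half of a hybrid query has nothing to contribute.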

— So what? —
Hybrid is meant to prevent hallucinations and to cope with highly specialised vocabularies.
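
To make the trade-off concrete, here is a minimal sketch of one way to blend the two signals: a weighted sum of min-max normalised scores. The `alpha` weight, the helper names and the toy scores are all assumptions for illustration; a real engine's fusion may differ.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(keyword, vector, alpha=0.5):
    """Blend scores: alpha=1 is pure vector, alpha=0 is pure keyword."""
    kw, vec = min_max(keyword), min_max(vector)
    return [alpha * v + (1 - alpha) * k for k, v in zip(kw, vec)]


def best(scores):
    # Index of the top-ranked product.
    return max(range(len(scores)), key=scores.__getitem__)


# Toy scores for three products (made-up numbers, for illustration only).
# Product 0 is the closest match by meaning; product 2 merely shares
# a word with the query.
keyword = [0.0, 0.0, 1.3]
vector = [0.82, 0.40, 0.55]

pure_vector = hybrid_scores(keyword, vector, alpha=1.0)
blended = hybrid_scores(keyword, vector, alpha=0.5)
```

With these toy numbers, pure vector ranking puts product 0 first, while the 50/50 blend puts product 2 first. That is what you want when the exact term matters (a part number, say), and what hurts when the keyword signal is mostly noise.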

A solution could be to fine-tune embedding models on your own data and vocabulary instead.

This is similar to a recommender system, which looks at the products and also at external information such as user interactions.

 

Demos:

WooCommerce + Weaviate + Hybrid

WooCommerce + Weaviate + Cohere

WooCommerce + Weaviate + OpenAI

WooCommerce + Weaviate + CLIP image search

 

WPSOLR + Weaviate + Vespa: https://www.wpsolr.com
