— Pure vector search is amazing —
A few strengths of vector search:
– Misspellings are no longer an issue
It is hard to fool the search, as it copes even with phonetically spelled queries. It could be improved further with a pre-processing step in which a generative LLM such as T5 or GPT fixes the spelling or grammar of the query.
– Multilingual search is almost no longer an issue
Who has not dreamed of selling to customers worldwide without translating the catalogue into a hundred languages?
– Products that usually stay hidden can be found through semantic search
“Something to sleep on” returns mattresses 🙂
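The optional query-correction step mentioned above could be wired in roughly as follows. This is only a sketch: `search`, `embed` and `correct` are hypothetical names, the index is a plain dictionary of vectors, and a real setup would call an embedding model and an LLM instead of the injected callables.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: str,
    index: Dict[str, Vector],
    embed: Callable[[str], Vector],
    correct: Optional[Callable[[str], str]] = None,
    top_k: int = 3,
) -> List[str]:
    """Rank the documents in `index` by similarity to the query.

    `embed` stands in for the embedding model and `correct` for the optional
    LLM step that fixes spelling or grammar (both are assumptions).
    """
    if correct is not None:
        query = correct(query)  # pre-processing happens before embedding
    q = embed(query)
    ranked = sorted(index, key=lambda doc: cosine(q, index[doc]), reverse=True)
    return ranked[:top_k]
```

Keeping the correction step as an optional callable makes it easy to measure whether it actually helps before paying for an extra LLM call on every query.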
— Hybrid search: a step backwards? —
I asked a client to compare Weaviate with Cohere in pure vector mode against the same configuration in hybrid mode.
The hybrid mode lost…
The test was multilingual, which explains a lot: BM25 needs pre-processing to be effective. It relies on stemming, n-grams, synonyms and similar tricks, none of which work easily when content in a single language is queried in five other languages.
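To see why, here is a minimal sketch of BM25 scoring with naive whitespace tokenisation and no stemming or synonyms. The function names, the example documents and the parameter defaults (`k1=1.5`, `b=0.75`) are assumptions chosen for illustration, not what Weaviate runs internally.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Naive tokeniser: lowercase and split on whitespace, no stemming."""
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no exact token match, no contribution
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "memory foam mattress for a double bed",
    "wooden dining table with four chairs",
]
print(bm25_scores("mattress", docs))    # only the first document scores above zero
print(bm25_scores("mattresses", docs))  # all zeros: the plural needs stemming to match
print(bm25_scores("matelas", docs))     # all zeros: the French word shares no token
```

Every query term has to match a document token exactly, so each additional query language needs its own stemmer and synonym list before the keyword side contributes anything useful.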
— So what? —
Hybrid search is meant to prevent hallucinations and to cope with highly specialised vocabularies.
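For reference, a hybrid score is typically a weighted blend of the vector score and the keyword score. The sketch below assumes min-max normalisation and a single weight `alpha` (1.0 means pure vector, 0.0 means pure keyword); the function names and the normalisation choice are illustrative assumptions, not Weaviate's actual fusion code.

```python
from typing import List


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to the range [0, 1]; all zeros if they are all equal."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(
    vector_scores: List[float],
    keyword_scores: List[float],
    alpha: float = 0.5,
) -> List[float]:
    """Blend normalised vector and keyword scores, one value per document."""
    v = min_max(vector_scores)
    k = min_max(keyword_scores)
    return [alpha * a + (1 - alpha) * b for a, b in zip(v, k)]
```

With this kind of blend, the keyword side only earns its weight when the rare, specialised terms in the query appear verbatim in the documents.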
A solution could be to fine-tune embedding models on your own data and vocabulary instead.
This is similar to a recommender system, which looks at the products themselves and also at external signals such as user interactions.