Is hybrid search an easy trick to better fine-tuned LLM search?

— When LLMs suck —

LLMs (Large Language Models) do not rank well on recent or specialised corpora, because they are trained on older, general-purpose datasets.

— Don’t understand? Parrot instead —

Hybrid search with BM25 can compensate for that: after all, when you do not understand a concept (vector similarity), you can always recognise some keywords (pattern matching).
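To make the idea concrete, here is a minimal sketch of hybrid scoring in plain Python. It is an illustration under assumptions, not how Weaviate or WPSOLR implement it: the function names (`bm25_scores`, `hybrid_scores`), the `alpha` weighting, the min-max normalisation, and the toy two-dimensional vectors standing in for real embeddings are all made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer, good enough for a toy example.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def min_max(values):
    # Rescale scores to [0, 1] so keyword and vector scores are comparable.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(query, query_vec, docs, doc_vecs, alpha=0.5):
    """Weighted sum of scores: alpha=1 is pure vector, alpha=0 is pure BM25."""
    keyword = min_max(bm25_scores(query, docs))
    vector = min_max([cosine(query_vec, v) for v in doc_vecs])
    return [alpha * v + (1 - alpha) * k for k, v in zip(keyword, vector)]


# Toy corpus. The vectors are hand-written stand-ins for embeddings from a
# model that has never seen "quantum gravity" and confuses it with string theory.
docs = [
    "loop quantum gravity predicts discrete spacetime",
    "how to bake sourdough bread at home",
    "string theory and extra dimensions",
]
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
query = "quantum gravity"
query_vec = [0.8, 0.2]

for alpha in (1.0, 0.5):
    scores = hybrid_scores(query, query_vec, docs, doc_vecs, alpha=alpha)
    best = docs[scores.index(max(scores))]
    print(f"alpha={alpha}: best match is '{best}'")
```

With these toy numbers, pure vector search (alpha=1) prefers the string theory document, while the hybrid score (alpha=0.5) puts the quantum gravity document first, because BM25 recognises the keywords the "model" did not understand. Real systems may fuse rankings differently, so treat the weighted sum as one possible choice.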

— Fine-tuning —

Fine-tuned models will always be better than hybrid or pure BM25 search on a predefined test dataset, almost by definition: they are optimised for exactly that data.

But only if you have time, money, skills, and probably labeled data to do so.

— Conclusion —

Training an LLM on the last two weeks of discoveries in quantum gravity will probably be out of reach for most people.

This is where BM25 and hybrid search can shine.

But on the other hand, if you can train your model, you’re certainly far better off.

— Solutions —

And the good news is that there are lots of cheap LLM fine-tuning methods nowadays:

OpenAI fine-tuning API:

Hugging Face AutoML API:

Cohere fine-tuning API:

You can then use your fine-tuned models with SeMI Technologies Weaviate:

OpenAI module:

Hugging Face module:

Cohere module:

More on WPSOLR, WooCommerce and Weaviate:

#wpsolr #openai #gpt3 #chatgpt #huggingface #cohere #woocommerce #vectorsearch #finetuning #embeddings