Is hybrid search an easy trick to better fine-tuned LLM search?

— When LLMs suck —

LLMs (Large Language Models) do not rank well on recent or specialised corpora, because they are trained on older, general-purpose datasets.

— Don’t understand? Parrot instead —

Hybrid search with BM25 can compensate for that: after all, when you do not understand a concept (vector similarity), you can always recognise some keywords (pattern matching).
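
To make that concrete, here is a minimal sketch of hybrid score fusion in Python: BM25 keyword scores blended with cosine similarity over embeddings. The toy corpus, the alpha weight, and the assumption that you already have document and query vectors are all placeholders for illustration; rank_bm25 is just one convenient BM25 implementation.

```python
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Toy corpus; in practice these are your documents and their embeddings.
docs = [
    "quantum gravity unifies general relativity and quantum mechanics",
    "BM25 is a classic keyword ranking function",
    "vector search ranks documents by embedding similarity",
]
bm25 = BM25Okapi([d.split() for d in docs])

def hybrid_scores(query: str, query_vec: np.ndarray,
                  doc_vecs: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend keyword and vector scores: alpha=0 is pure BM25,
    alpha=1 is pure vector similarity."""
    kw = np.asarray(bm25.get_scores(query.split()), dtype=float)
    if kw.max() > 0:
        kw = kw / kw.max()  # normalise BM25 scores to [0, 1]
    # Cosine similarity between the query vector and each document vector.
    sim = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return alpha * sim + (1 - alpha) * kw  # higher is better
```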

— Fine tuning —

Fine-tuned models will always beat hybrid or pure BM25 search on a predefined test dataset, by definition: that dataset is exactly what they were optimised for.

But only if you have time, money, skills, and probably labeled data to do so.

— Conclusion —

Training an LLM on the last two weeks of discoveries in quantum gravity will probably be out of reach for most people.

This is where BM25 and Hybrid search can shine.

But on the other hand, if you can train your model, you're certainly far better off.

— Solutions —

And the good news is that there are lots of cheap LLM fine-tuning methods nowadays (a sketch of the OpenAI flow follows this list):

OpenAI fine-tuning API: https://beta.openai.com/docs/guides/fine-tuning

Hugging Face AutoTrain: https://huggingface.co/autotrain

Cohere fine-tuning API: https://docs.cohere.ai/reference/finetune
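
For instance, here is a hedged sketch of the OpenAI flow, using the legacy openai Python client that matches the linked guide. The file name, base model, and prompt are placeholders, and the exact API surface varies by client version, so check the guide for the current calls.

```python
import openai  # legacy openai Python client (pip install openai)

openai.api_key = "sk-..."  # placeholder API key

# 1. Upload a JSONL file of {"prompt": ..., "completion": ...} pairs:
#    this is where your labeled data goes.
training_file = openai.File.create(
    file=open("my_labeled_data.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Launch a fine-tuning job on top of a base model.
job = openai.FineTune.create(
    training_file=training_file.id,
    model="curie",  # placeholder base model
)

# 3. When the job succeeds, `fine_tuned_model` is populated and usable
#    like any other model (poll openai.FineTune.retrieve(job.id) until done).
response = openai.Completion.create(
    model=job.fine_tuned_model,
    prompt="Summarise the latest quantum gravity results.",
    max_tokens=100,
)
print(response.choices[0].text)
```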

You can then use your fine-tuned models with SeMI Technologies' Weaviate (see the query sketch after this list):

OpenAI module: https://weaviate.io/developers/weaviate/current/retriever-vectorizer-modules/text2vec-openai.html

Hugging Face module: https://weaviate.io/developers/weaviate/current/retriever-vectorizer-modules/text2vec-huggingface.html

Cohere module: https://weaviate.io/developers/weaviate/current/retriever-vectorizer-modules/text2vec-cohere.html
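
Putting the two ideas together, here is a minimal sketch with the Weaviate Python client (v3 API): a class vectorized by the text2vec-openai module, then a hybrid (BM25 + vector) query. The class name, the localhost URL, and the API key are assumptions, and hybrid queries need Weaviate 1.17 or later.

```python
import weaviate  # Weaviate Python client, v3 API (pip install weaviate-client)

client = weaviate.Client(
    "http://localhost:8080",  # assumed local Weaviate instance
    additional_headers={"X-OpenAI-Api-Key": "sk-..."},  # key for text2vec-openai
)

# A class whose objects are vectorized by the OpenAI module.
client.schema.create_class({
    "class": "Article",  # placeholder class name
    "vectorizer": "text2vec-openai",
    "properties": [{"name": "content", "dataType": ["text"]}],
})

# Hybrid query: alpha=0 is pure BM25, alpha=1 is pure vector search.
result = (
    client.query
    .get("Article", ["content"])
    .with_hybrid(query="latest quantum gravity discoveries", alpha=0.5)
    .with_limit(5)
    .do()
)
print(result)
```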

More on WPSOLR, WooCommerce and Weaviate: https://www.wpsolr.com

#wpsolr #openai #gpt3 #chatgpt #huggingface #cohere #woocommerce #vectorsearch #finetuning #embeddings
