Is hybrid search an easy trick to better fine-tuned LLM search?

[Image: DALL·E — a scale with lots of gold on one side, and a feather on the other]

Table of contents:

— When LLMs suck —

LLMs (Large Language Models) do not rank well on recent or specialised corpora, as they are trained on older, general-purpose datasets.

— Don’t understand? Parrot instead —

Hybrid search with BM25 can compensate for that: after all, when you do not understand a concept (vector similarity), you can still recognise some keywords (pattern matching).
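A minimal sketch of the idea, assuming we already have BM25 scores and vector-similarity scores for the same documents (the `alpha` weighting mirrors what hybrid engines such as Weaviate expose; every name and number below is illustrative, not a real API):

```python
def hybrid_scores(bm25, vectors, alpha=0.5):
    """Blend keyword (BM25) and vector-similarity scores.

    bm25, vectors: dicts mapping doc id -> raw score.
    alpha: 1.0 = pure vector search, 0.0 = pure keyword search.
    Scores are min-max normalised first so the two scales are comparable.
    """
    def normalise(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    b, v = normalise(bm25), normalise(vectors)
    docs = set(b) | set(v)
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0)
            for d in docs}

# Toy example: "d1" wins on keywords, "d2" on semantics,
# "d3" is decent on both -- the blended score arbitrates.
ranked = hybrid_scores(
    bm25={"d1": 12.0, "d2": 3.0, "d3": 7.0},
    vectors={"d1": 0.2, "d2": 0.9, "d3": 0.6},
    alpha=0.5,
)
best = max(ranked, key=ranked.get)  # → "d3"
```

With `alpha=0.5` the balanced document comes out on top, even though it is the outright winner on neither signal; that trade-off is the whole point of hybrid ranking.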

— Fine tuning —

Fine-tuned models will always be better than hybrid or pure BM25 on a predefined test dataset, by definition.

But only if you have time, money, skills, and probably labeled data to do so.
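As a concrete illustration of the labeled-data requirement: fine-tuning services generally ingest supervised examples as a JSONL file, one prompt/completion pair per line. The field names below follow the classic prompt/completion format popularised by OpenAI's fine-tuning API; treat the exact schema as an assumption, and check your provider's docs.

```python
import json

# Hypothetical labeled examples: a query, and the answer we want
# the fine-tuned model to produce for it.
examples = [
    {"prompt": "What is hybrid search?",
     "completion": " A mix of BM25 keyword matching and vector similarity."},
    {"prompt": "When does BM25 shine?",
     "completion": " On recent or specialised corpora the model never saw."},
]

# JSONL: one JSON object per line, the common upload format.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every line parses back into a prompt/completion pair.
with open("train.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
```

Producing and curating hundreds or thousands of such pairs is where the time, money, and skills go.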

— Conclusion —

Training an LLM on the last two weeks of discoveries in quantum gravity will probably be out of reach for most people.

This is where BM25 and Hybrid search can shine.
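To see why keyword matching shines on brand-new vocabulary, here is a toy BM25 scorer (a self-contained sketch of the standard Okapi BM25 formula, not any particular library, with an invented example corpus): a term coined yesterday still matches exactly, with no retraining.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) against a tokenised query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in query}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if df[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "quantum gravity paper from last week mentions gravitino condensates".split(),
    "general relativity overview".split(),
]
# "gravitino condensates" may be absent from any pretrained model's
# training data, yet BM25 matches the exact tokens immediately.
scores = bm25_scores("gravitino condensates".split(), docs)
```

The first document scores positively because it contains the exact query tokens; the second scores zero. No embedding model needs to have ever seen the words.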

But on the other hand, if you can fine-tune your model, you are certainly far better off.

— Solutions —

And the good news is that there are many cheap LLM fine-tuning methods nowadays:

OpenAI fine-tuning API:

Hugging Face AutoML API:

Cohere fine-tuning API:

You can then use your fine-tuned models with SeMI Technologies Weaviate:

OpenAI module:

Hugging Face module:

Cohere module:

More on WPSOLR, WooCommerce and Weaviate:

#wpsolr #openai #gpt3 #chatgpt #huggingface #cohere #woocommerce #vectorsearch #finetuning #embeddings
