Great post from HumanSignal (labelstud.io) on fine-tuning a Cohere reranker with synthetic, LLM-generated data and LLM + human annotations:
- Generate synthetic queries from your documents with an LLM (OpenAI GPT-4o here)
- Extract results from your retrieval system for all synthetic queries
- Create a labeling project for reranking tasks that yields triplet-loss training data (query, positive, hard negative)
- Upload the queries and their results to Label Studio
- Pre-label the query/result pairs with an LLM reranker back end (OpenAI GPT-4o here)
- Let humans review and complete the pre-labels
- Send the labeled query/result pairs to an LLM reranker fine-tuning service (Cohere here)
- Test retrieval with your new fine-tuned reranker
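
A rough sketch of the hand-off between labeling and fine-tuning: the Python below turns reviewed labels into one JSON line per query, with positives and hard negatives separated. The field names `query`, `relevant_passages`, and `hard_negatives` reflect my understanding of Cohere's rerank fine-tuning format, and `LabeledResult`, `to_training_record`, and `to_jsonl` are hypothetical names of my own, so check all of them against the Cohere docs and the original post before relying on this.

```python
import json
from dataclasses import dataclass


@dataclass
class LabeledResult:
    """One retrieved passage with its final label (hypothetical type)."""
    text: str
    relevant: bool  # label after LLM pre-labeling and human review


def to_training_record(query, results):
    """Split one query's labeled results into positives and hard negatives.

    Returns None when the query has no positive passage, since such a
    record gives the reranker nothing to rank above the negatives.
    """
    positives = [r.text for r in results if r.relevant]
    negatives = [r.text for r in results if not r.relevant]
    if not positives:
        return None
    # Field names are an assumption about the fine-tuning input format.
    return {
        "query": query,
        "relevant_passages": positives,
        "hard_negatives": negatives,
    }


def to_jsonl(labeled):
    """Serialize {query: [LabeledResult, ...]} as JSON Lines text."""
    lines = []
    for query, results in labeled.items():
        record = to_training_record(query, results)
        if record is not None:
            lines.append(json.dumps(record))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    labeled = {
        "how do I reset my password?": [
            LabeledResult("Go to Settings > Security to reset it.", True),
            LabeledResult("Passwords must be 12 characters long.", False),
        ],
        "query with no relevant hit": [
            LabeledResult("Unrelated passage.", False),
        ],
    }
    print(to_jsonl(labeled))
```

The negatives here come straight from your own retrieval results, which is what makes them "hard": the retriever ranked them highly even though a human judged them irrelevant.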
Original post: https://labelstud.io/blog/improving-rag-document-search-quality-with-cohere-re-ranking/