Fine-tuning a reranker with synthetic LLM-generated data and LLM+human annotations
Great post from HumanSignal (labelstud.io) on fine-tuning a Cohere reranker with synthetic LLM-generated data and LLM+human annotations:

1. Generate synthetic queries from your documents with an LLM (OpenAI GPT-4o here).
2. Retrieve results from your retrieval system for every synthetic query.
3. Create a labeling project for reranking tasks with triplet loss (positive, hard negative).
4. Upload the queries and their results to Label Studio.
5. Pre-label the query/results with an LLM reranker back end (OpenAI GPT-4o here).
6. Let humans review and complete the pre-labels.
7. Send the labeled query/results to an LLM reranker fine-tuner (Cohere here).
8. Test retrieval with your newly fine-tuned reranker.

Original post: https://labelstud.io/blog/improving-rag-document-search-quality-with-cohere-re-ranking/
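To make the hand-off from labeling to fine-tuning more concrete, here is a minimal sketch of turning labeled query/passage pairs into one training row per query. It is not code from the original post. The input shape (`query`, `passage`, `label`) and the helper names `build_training_rows` and `to_jsonl` are hypothetical, and the output keys `query`, `relevant_passages`, and `hard_negatives` are an assumption about what the fine-tuner expects, so check the provider's current documentation before relying on them.

```python
import json
from collections import defaultdict


def build_training_rows(annotations):
    """Group (query, passage, label) annotations into one row per query.

    Assumed input: dicts with "query", "passage", and a "label" that is
    either "positive" or "hard_negative" (hypothetical label names).
    """
    grouped = defaultdict(lambda: {"relevant_passages": [], "hard_negatives": []})
    for ann in annotations:
        bucket = grouped[ann["query"]]
        if ann["label"] == "positive":
            bucket["relevant_passages"].append(ann["passage"])
        elif ann["label"] == "hard_negative":
            bucket["hard_negatives"].append(ann["passage"])
        # Any other label (for example a skipped task) is ignored.

    rows = []
    for query, passages in grouped.items():
        # A query with no positive passage carries no ranking signal, so drop it.
        if passages["relevant_passages"]:
            rows.append({"query": query, **passages})
    return rows


def to_jsonl(rows):
    """Serialize rows as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


if __name__ == "__main__":
    labeled = [
        {"query": "how do I reset my password",
         "passage": "Open Settings > Security and choose Reset password.",
         "label": "positive"},
        {"query": "how do I reset my password",
         "passage": "Passwords must contain at least 12 characters.",
         "label": "hard_negative"},
        {"query": "refund policy",
         "passage": "Standard shipping takes three days.",
         "label": "hard_negative"},
    ]
    print(to_jsonl(build_training_rows(labeled)))
```

Dropping queries that have no positive passage is a design choice in this sketch: with only hard negatives there is nothing for the reranker to rank above them.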