BEIR benchmarks are great as absolute landmarks for theoretical research.
But for production systems, measuring the improvement over an existing search is far more compelling.
(One cannot tell a WooCommerce owner that his shiny new search is the best on BEIR while the shop's orders have dropped significantly.)
As a dogfooding exercise, I’m wondering how I could:
1. A/B compare search implementations
My clients can tell almost immediately whether a search is better or worse than the previous one. But they cannot prove it, because they cannot measure it without great effort and cost.
2. Automate the fine-tuning of a search in Weaviate, ideally with the push of a button
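To make point 1 concrete, here is a minimal sketch of how an A/B comparison could be measured without heavy tooling: split traffic between the two implementations, count clicks (or orders), and run a two-proportion z-test on the rates. The function name and the numbers are hypothetical, just for illustration; nothing here is a Weaviate API.

```python
from math import sqrt, erf

def two_proportion_z_test(clicks_a, trials_a, clicks_b, trials_b):
    """Two-sided z-test: is variant B's click-through rate different from A's?"""
    p_a = clicks_a / trials_a
    p_b = clicks_b / trials_b
    # pooled rate under the null hypothesis that A and B perform the same
    p_pool = (clicks_a + clicks_b) / (trials_a + trials_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / trials_a + 1 / trials_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# hypothetical numbers: old search vs. new search over one week of queries
z, p = two_proportion_z_test(clicks_a=480, trials_a=10_000,
                             clicks_b=560, trials_b=10_000)
print(f"z={z:.2f}, p={p:.4f}")
```

With counts like these, the test tells the shop owner whether the new search measurably changed behavior, instead of relying on gut feeling.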
As a summary, I’d say Weaviate is already one of the few integrated vector databases: embeddings can be produced, stored, and consumed in a single pipeline.
The next stage could be adding usage feedback, measurements, and auto-tuning. Does this look like a recommender system?