My 5 cents on BEIR benchmarks in a production environment…

BEIR benchmarks are great as absolute landmarks for theoretical research.

But for production systems, measuring improvements over the existing search matters much more.
(One cannot tell a WooCommerce owner that their shiny new search is the best on BEIR while, at the same time, the shop's orders have dropped significantly.)

As a dogfooding exercise, I’m wondering how I could:

1. A/B compare search implementations
My clients can tell almost immediately whether a search is better than the previous one or not. But they cannot prove it, because they cannot measure it without great effort and cost.

2. Introduce event feedback in the model (CTR, #orders, #baskets, order value …)

3. Automate the fine-tuning of a search in Weaviate, ideally with the push of a button
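
For point 1, here is a minimal sketch of what such a measurement could look like. It assumes the shop logs, per search variant, how many searches were run and how many led to a click (or an order). The function name and the numbers are hypothetical; nothing here is a Weaviate or WPSOLR API. It is a plain two-proportion z-test, one common way to check whether a difference in CTR is more than noise.

```python
import math


def compare_search_variants(clicks_a, searches_a, clicks_b, searches_b):
    """Two-proportion z-test on the click-through rates of two search variants.

    Returns (ctr_a, ctr_b, z, p_value), where p_value is two-sided.
    """
    if searches_a <= 0 or searches_b <= 0:
        raise ValueError("each variant needs at least one search")

    ctr_a = clicks_a / searches_a
    ctr_b = clicks_b / searches_b

    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (clicks_a + clicks_b) / (searches_a + searches_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / searches_a + 1 / searches_b))
    if std_err == 0:
        raise ValueError("no variation in the data (all clicks or no clicks)")

    z = (ctr_b - ctr_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_a, ctr_b, z, p_value


# Hypothetical counts: variant A is the old search, variant B the new one.
ctr_a, ctr_b, z, p_value = compare_search_variants(100, 1000, 150, 1000)
print(f"CTR A={ctr_a:.3f}  CTR B={ctr_b:.3f}  z={z:.2f}  p={p_value:.4f}")
```

The same function works for any "successes out of trials" event, so clicks could be swapped for baskets or orders. Order value is not a proportion and would need a different test.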

In summary, I’d say Weaviate is already one of the few integrated vector databases: embeddings can be produced, stored, and consumed in a pipeline.
The next stage could be adding usage feedback, measurements, and auto-tuning. Does this look like a recommender system?

WPSOLR + Weaviate