
I believe vector databases by themselves are an empty shell: the pearl inside is the embeddings.

Weaviate was probably the first to understand that fact, by delivering no-code vectorizer modules inside the database. Vespa too, but with more manual steps (download the model, convert it to ONNX). Suddenly, anyone a bit technical (but not a data scientist) could index their raw data without worrying too much about the cogs behind the scenes.

Vector database + custom Python vectorizer code => hard-core toolbox to build a POC
Vector database + integrated vectorizer => mainstream toolbox to go to prod

More on WooCommerce + Weaviate: https://www.wpsolr.com
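To make "integrated vectorizer" concrete, here is a minimal sketch with the v3-era Weaviate Python client; the Product class, the text2vec-transformers module, and the local URL are illustrative assumptions, not the exact WPSOLR setup.

```python
import weaviate

# Connect to a Weaviate instance with a vectorizer module enabled
# (URL and module choice are assumptions for this sketch).
client = weaviate.Client("http://localhost:8080")

# The "vectorizer" setting is the whole point: Weaviate embeds the text
# fields itself at indexing time, no custom Python vectorizer code needed.
client.schema.create_class({
    "class": "Product",
    "vectorizer": "text2vec-transformers",
    "properties": [
        {"name": "name", "dataType": ["text"]},
        {"name": "description", "dataType": ["text"]},
    ],
})

# Indexing raw data is now a plain object insert; the module
# computes the embedding behind the scenes.
client.data_object.create(
    {"name": "Memory foam mattress", "description": "Queen size, 25cm thick"},
    "Product",
)
```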

VectorOps is the next big thing, and here’s why…

– Chapter 1: embeddings –
After years of exploiting the last layer of ML models, somebody discovered with BERT that the gem was in fact the next-to-last layer, which we now call embeddings. Because embeddings can encapsulate semantics, indeed.

– Chapter 2: vector databases from vector metrics –
But, even deeper: because embeddings are vectors, and vector spaces bring vector metrics. With vector metrics you can compare, and therefore cluster, things that cannot be compared otherwise (see the sketch below). How do you compare 2 sentences, 2 images, a sentence to an image… And so, vector databases were born to store vectors and use vector metrics to find related vectors and concepts.

– Chapter 3: embed embeddings –
It remains difficult to build the embeddings, despite the rise of
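A minimal numpy sketch of the vector-metric idea from Chapter 2: once two things are embedded in the same space, a single number compares them. The toy vectors below are made up for illustration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" (real ones have hundreds of dimensions).
# The point: sentences, images, anything embedded in the same space
# becomes comparable with the same metric.
sentence_vec = np.array([0.9, 0.1, 0.0, 0.3])
image_vec    = np.array([0.8, 0.2, 0.1, 0.4])
unrelated    = np.array([-0.5, 0.9, 0.7, -0.2])

print(cosine_similarity(sentence_vec, image_vec))  # high: related concepts
print(cosine_similarity(sentence_vec, unrelated))  # low: unrelated concepts
```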

Why hybrid vector search could fail, and what would be better

— Pure vector search is amazing —
After setting up my first WooCommerce demos with Weaviate, I was shocked by how good the results were. And with absolutely no tuning! Just a few highlights of the vector search:

– Misspelling is literally not a subject anymore. It's almost impossible to fool the search, as it understands phonetic-like queries. And it could even be improved by asking a generative LLM like T5 or GPT, in a pre-processing step, to fix the query spelling or grammar.
– Multi-language is almost not a subject anymore. Who did not dream of selling to worldwide customers without translating into a hundred languages?
– Usually hidden products can be found with semantics: "Something to sleep on" returns mattresses 🙂 (sketched below)

— Hybrid search: a step backwards? —
I
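Here is a sketch of that "something to sleep on" query with the v3-era Weaviate Python client, assuming a Product class with an integrated vectorizer; URL and class name are assumptions.

```python
import weaviate

client = weaviate.Client("http://localhost:8080")  # URL is an assumption

# nearText searches by meaning, not keywords: no "mattress" in the query,
# yet mattresses come back because their embeddings are the closest.
result = (
    client.query
    .get("Product", ["name", "description"])
    .with_near_text({"concepts": ["something to sleep on"]})
    .with_limit(5)
    .do()
)

for product in result["data"]["Get"]["Product"]:
    print(product["name"])
```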

Fine-tuned query spell correction without model training?

— Typo corrected results —
If you're using vector search, you already get some form of spell tolerance. You will enjoy good results even with a query written phonetically. (If you're not using vector search, well, you should!)

— Generic typo corrected query —
But sometimes, you'd rather let the user choose their preferred typo correction from a list, before starting the search. This is really easy nowadays. Here are the manual steps:
– Open your GPT-3.5/GPT-4/ChatGPT sandbox
– Prompt the AI with a proper sentence to fix your query. Something like:
Fix the typo in the following query: plise fixe the tippo
Answer: Please fix the typo
That's it. But of course, you want to automate corrections in your search (a sketch follows), and it remains
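That manual loop is easy to automate. A minimal sketch with the OpenAI Python client, assuming an OPENAI_API_KEY environment variable; the model name, prompt wording, and fix_query_typos helper are illustrative, not from the original post.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def fix_query_typos(query: str) -> str:
    """Ask the LLM for a corrected query before running the search."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model choice is an assumption
        messages=[
            {"role": "system",
             "content": "Fix the typos in the user's search query. "
                        "Reply with the corrected query only."},
            {"role": "user", "content": query},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(fix_query_typos("plise fixe the tippo"))  # -> "Please fix the typo"
```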

Contrary to popular belief, vector search is mature enough for E-commerce in production!

But while vector search is, and while embeddings are, sometimes the full integration is not… Let's face it, most e-commerce demos start with some notebook Python code (sketched below) to:
– Load data
– Tweak some parameters, like vocabulary size and tokenizer types
– Download and call embedding models
– Store vectors in the vector database

After that, you'll have to build the code to replace your actual E-commerce SQL search with the vector search. But how do you:
– Build facets and filters?
– Index your data in real time, as soon as your data is updated?
– Re-index all your data in batches, from time to time?
– Monitor your conversion rates?
– Compare search results from several search engines?
– Display
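For concreteness, here is roughly what those four notebook steps boil down to, sketched with sentence-transformers and the v3-era Weaviate Python client; the model, class name, and sample catalog are assumptions.

```python
import weaviate
from sentence_transformers import SentenceTransformer

# 1. Load data (hard-coded here; a real demo would read your catalog).
products = [
    {"name": "Memory foam mattress", "description": "Queen size, 25cm thick"},
    {"name": "Linen duvet cover",    "description": "Breathable, 200x200cm"},
]

# 2./3. Download an embedding model and call it (model choice is an assumption).
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode([p["description"] for p in products])

# 4. Store the vectors in the vector database.
client = weaviate.Client("http://localhost:8080")
with client.batch as batch:
    for product, vector in zip(products, vectors):
        batch.add_data_object(product, "Product", vector=vector)
```

Everything the questions above ask about (facets, real-time indexing, monitoring, comparisons) starts where this notebook ends.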

What's missing for open source LLMs is the hosting

This is why closed LLMs developed so fast. There are not many inference hosting providers for OSS LLMs, even fewer for training, and fewer still with full pay-as-you-go billing. For instance, Hugging Face endpoints are currently billed per model per VM: one has to configure a VM, choose a model, then pay by usage. This is great, but to try a model, or switch one off temporarily, you must configure a new endpoint.

Interested in WooCommerce vector search? https://wpsolr.com #wpsolr #woocommerce #vectorsearch #weaviate #huggingface

New Vespa global reranking

Vespa's 2-phase stateful ranking can now be followed by a (stateless/autoscaling/GPU) global reranking phase using cross-encoder models.

New Google PaLM embeddings API

Finally! I was wondering how long before Google would provide an LLM embeddings API. It is already integrated into Weaviate with Google PaLM, and soon into WooCommerce with WPSOLR :)

WPSOLR + Weaviate + WooCommerce: https://wpsolr.com
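On the Weaviate side, this integration is exposed as a vectorizer module. A sketch assuming the text2vec-palm module name; the moduleConfig keys and values below are illustrative placeholders, check the Weaviate docs for the current ones.

```python
import weaviate

client = weaviate.Client("http://localhost:8080")  # URL is an assumption

# Class whose embeddings are delegated to Google's PaLM embeddings API
# through Weaviate's text2vec-palm module (configuration is illustrative).
client.schema.create_class({
    "class": "Article",
    "vectorizer": "text2vec-palm",
    "moduleConfig": {
        "text2vec-palm": {
            "projectId": "your-gcp-project",   # placeholder
            "modelId": "textembedding-gecko",  # placeholder
        }
    },
    "properties": [{"name": "content", "dataType": ["text"]}],
})
```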