(a.k.a. Is it possible to use embeddings on long WooCommerce product descriptions for vector search?)
The context:
I had successfully tested vector search on toy data.
Full of confidence, I decided to show off a little bit with a WooCommerce demo containing real e-commerce products.
It all collapsed:
– The size of tensor a (51625) must match the size of tensor b (512) at non-singleton dimension 1
– Failed with status: 400 error: This model’s maximum context length is 2046 tokens, however you requested 9995 tokens (9995 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
– Token indices sequence length is longer than the specified maximum sequence length for this model (1095 > 512). Running this sequence through the model will result in indexing errors
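All three errors point to the same root cause: the product descriptions exceed the model's maximum input length (512 tokens for the sentence-transformer, 2046 for the OpenAI model). A minimal pre-flight check could catch this before calling the model. The sketch below is an assumption, not part of any framework: it uses a rough words-to-tokens ratio as a proxy, whereas in practice you would measure with the model's own tokenizer (e.g. Hugging Face's AutoTokenizer or OpenAI's tiktoken):

```python
def approx_token_count(text: str) -> int:
    """Rough proxy: English text averages ~1.3 model tokens per word.
    This is only an estimate; use the model's real tokenizer in production."""
    return int(len(text.split()) * 1.3)

def fits_model(text: str, max_tokens: int = 512) -> bool:
    """Return True if the text is likely to fit the model's context window."""
    return approx_token_count(text) <= max_tokens
```

A check like this would at least fail fast with a clear message instead of a tensor-size mismatch deep inside the model.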
The question:
Is this just a matter of choosing the right model, or is the problem deeper, to be solved one level up, in the frameworks that call the models for embedding/inference?
A naive answer:
Naively, I could imagine splitting the long texts into smaller chunks, vectorizing each chunk separately, then querying and aggregating the results.
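That naive chunk-then-aggregate idea could be sketched like this. Everything below is an illustrative assumption (window sizes, the whitespace-based splitting, the max-similarity aggregation are all my choices, not an established API); real systems would chunk by model tokens, not words:

```python
import numpy as np

def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split a long description into overlapping word windows.
    Word counts only approximate model tokens; in practice you would
    split by the embedding model's own tokenizer."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

def best_chunk_score(query_vec: np.ndarray, chunk_vecs: np.ndarray) -> float:
    """Score a product by its best-matching chunk: each chunk is embedded
    separately, and the product inherits the max cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    return float(np.max(c @ q))
```

At query time, each product would carry several vectors (one per chunk), and the aggregation step (max here, but mean or weighted schemes are also plausible) collapses them back to one score per product.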
What do you think?
WPSOLR: https://www.wpsolr.com/
SeMI Technologies Hugging Face Jina AI Pinecone OpenAI PyTorch TensorFlow Zilliz Cohere
#embeddings #retail #ecommerce #woocommerce #vectorsearch