(a.k.a. Is it possible to use embeddings on long WooCommerce product descriptions for vector search?)
I’ve successfully tested vector search on toy data.
Full of confidence, I decided to show off a little with a WooCommerce demo containing real e-commerce products.
It all collapsed:
– The size of tensor a (51625) must match the size of tensor b (512) at non-singleton dimension 1
– Failed with status: 400 error: This model’s maximum context length is 2046 tokens, however you requested 9995 tokens (9995 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
– Token indices sequence length is longer than the specified maximum sequence length for this model (1095 > 512). Running this sequence through the model will result in indexing errors
Is this just a matter of choosing the right model, or is the problem deeper, something that must be solved one level up, inside the frameworks that call the models for embedding/inference?
A naive answer:
Naively, I could imagine splitting the long texts into smaller chunks that fit the model’s maximum sequence length, vectorizing each chunk, then querying the chunks and aggregating the results per product.
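A minimal sketch of that chunk-then-aggregate idea, in pure Python. It assumes a 512-token limit and a 64-token overlap between windows. The whitespace `tokenize` and the hashed bag-of-words `embed` are hypothetical stand-ins: a real pipeline would count tokens with the model’s own tokenizer and call the real embedding model. Each product’s score is the similarity of its best-matching chunk (max aggregation), which is only one possible choice.

```python
import hashlib
import math
from collections import defaultdict

MAX_TOKENS = 512  # assumed model limit (e.g. many BERT-style models)


def tokenize(text):
    # Hypothetical stand-in: whitespace split. Real subword tokenizers
    # usually produce MORE tokens than words, so budget conservatively.
    return text.split()


def chunk_tokens(tokens, max_tokens=MAX_TOKENS, overlap=64):
    """Split a token list into overlapping windows of at most max_tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def embed(text, dim=64):
    # Hypothetical stand-in for an embedding model: hashed bag-of-words,
    # L2-normalised so that a dot product equals cosine similarity.
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    # Both vectors are normalised (or all zeros), so the dot product suffices.
    return sum(x * y for x, y in zip(a, b))


def build_index(products, max_tokens=MAX_TOKENS, overlap=64):
    """Return (product_id, chunk_vector) pairs, one per chunk."""
    index = []
    for product_id, description in products.items():
        for chunk in chunk_tokens(tokenize(description), max_tokens, overlap):
            index.append((product_id, embed(" ".join(chunk))))
    return index


def search(index, query, top_k=3):
    """Score every chunk, keep each product's best chunk, rank products."""
    q = embed(query)
    best = defaultdict(lambda: -1.0)
    for product_id, vec in index:
        best[product_id] = max(best[product_id], cosine(q, vec))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Made-up, deliberately long descriptions (about 900 "tokens" each).
    products = {
        "mug": "ceramic coffee mug " * 300 + "dishwasher safe",
        "tshirt": "organic cotton t-shirt " * 300 + "machine washable",
    }
    index = build_index(products)
    print(search(index, "organic cotton t-shirt", top_k=1))
```

Max aggregation rewards a product if any single passage matches the query well, which suits long descriptions where the relevant detail sits in one paragraph. Averaging chunk vectors per product is a simpler alternative, but it tends to blur specific details.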
And you, what do you think?
SeMI Technologies Hugging Face Jina AI Pinecone OpenAI PyTorch TensorFlow Zilliz Cohere
#embeddings #retail #ecommerce #woocommerce #vectorsearch