1 OpenAI vectorizer
The OpenAI vectorizer uses the OpenAI API to create embeddings (vectors) from data. Vectorizing with large language models is CPU intensive, often requires GPUs, and can be very inefficient without proper optimisations. This module avoids that cost by calling OpenAI's servers through their Embeddings API.
Create a docker compose file with the Weaviate wizard, selecting the OpenAI text vectorizer as described in the steps below. Then start the docker container.
- (1) Select the “Text” vectorizer type
- (2) Select the “OpenAI” vectorizer
- (3) Leave this option as it is. The API key will be set later, in WPSOLR’s settings.
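As a rough illustration of what the wizard produces, the sketch below builds a comparable docker compose definition in Python and prints it as JSON (which docker compose can read, since JSON is valid YAML). The image tag, the port mapping and the function name `weaviate_compose` are assumptions made for this example; the file generated by the wizard is the reference and may differ. No OpenAI API key is included, since it is set later in WPSOLR.

```python
import json


def weaviate_compose(image_tag: str = "1.17.0", port: int = 8080) -> dict:
    """Return a docker compose definition (as a dict) for Weaviate
    with the text2vec-openai module enabled.

    image_tag and port are illustrative defaults, not values from the wizard.
    """
    return {
        "version": "3.4",
        "services": {
            "weaviate": {
                "image": f"semitechnologies/weaviate:{image_tag}",
                "ports": [f"{port}:8080"],
                "environment": {
                    "QUERY_DEFAULTS_LIMIT": "25",
                    "AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED": "true",
                    "PERSISTENCE_DATA_PATH": "/var/lib/weaviate",
                    # Enable the OpenAI vectorizer and make it the default one
                    "ENABLE_MODULES": "text2vec-openai",
                    "DEFAULT_VECTORIZER_MODULE": "text2vec-openai",
                    "CLUSTER_HOSTNAME": "node1",
                },
            }
        },
    }


if __name__ == "__main__":
    # Save the output as docker-compose.yml, then run: docker compose up -d
    print(json.dumps(weaviate_compose(), indent=2))
```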
Let’s set up our model on the OpenAI dashboard:
(Please check the OpenAI Embedding models pricing first)
a) Log in: https://openai.com/api/ (sign up first if necessary)
b) Select the API Keys menu
- (2) Copy your OpenAI API key. You will need it later, when configuring your WPSOLR index
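If you want to check that the key works before configuring WPSOLR, a minimal sketch like the one below can call the OpenAI Embeddings endpoint directly. It assumes the `text-embedding-ada-002` model and uses only the Python standard library; the helper names `build_embedding_request` and `fetch_embedding` are hypothetical and are not part of WPSOLR or Weaviate. Note that a real call is billed according to OpenAI's pricing.

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(api_key: str, text: str,
                            model: str = "text-embedding-ada-002") -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the Embeddings API."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_embedding(api_key: str, text: str) -> list:
    """Send the request and return the embedding vector (a list of floats)."""
    with urllib.request.urlopen(build_embedding_request(api_key, text)) as resp:
        body = json.load(resp)
    return body["data"][0]["embedding"]
```

Calling `fetch_embedding("<your key>", "hello world")` should return a vector if the key is valid, and raise an HTTP error otherwise.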
Now, let’s create our index in WPSOLR:
- (1) Select the text2vec-openai vectorizer module
- (2) Paste the OpenAI API key you copied earlier
- (3) Set a name for your index, visible in WPSOLR admin
- (4) Set a name for your Weaviate class (index)
- (5) Set the URL of your Weaviate docker container
- (6) Set the type: “text” or “code”
- (7) Set the model: “ada”, “babbage”, “curie” or “davinci”. OpenAI replaced all its v1 embedding models with a single v2 “ada” model (10x cheaper)
- (8) Set the version: “001” or “002”. OpenAI’s new “002” version is recommended and 10x cheaper
- (9) Create the index. Done!
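WPSOLR creates the Weaviate class for you, so nothing below is required. For readers curious about how the type, model and version settings map onto Weaviate, here is a sketch of an equivalent class definition, of the kind that could be posted to Weaviate's `/v1/schema` endpoint. The function name and the class name `WpsolrPost` are made up for the example, and the exact payload WPSOLR sends is not shown in this guide.

```python
ALLOWED_TYPES = {"text", "code"}
ALLOWED_MODELS = {"ada", "babbage", "curie", "davinci"}
ALLOWED_VERSIONS = {"001", "002"}


def build_class_definition(class_name: str, doc_type: str = "text",
                           model: str = "ada", version: str = "002") -> dict:
    """Build a Weaviate class definition using the text2vec-openai vectorizer."""
    if doc_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if version not in ALLOWED_VERSIONS:
        raise ValueError(f"version must be one of {sorted(ALLOWED_VERSIONS)}")
    return {
        "class": class_name,
        "vectorizer": "text2vec-openai",
        "moduleConfig": {
            "text2vec-openai": {
                "type": doc_type,
                "model": model,
                "modelVersion": version,
            }
        },
    }


if __name__ == "__main__":
    # "WpsolrPost" is a hypothetical class name
    print(build_class_definition("WpsolrPost"))
```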
Connect to the Weaviate GraphQL console at https://console.semi.technology/console, using the URL https://localhost:8080, to check our new index (class):
2 Select your data
- (1) (2) (3) select the index you just created
- (4) Choose a filter: “Near Text” to perform a vector search (a search on concepts), or “Where” to perform a keyword search (a classic search that matches words)
- (5) Set a similarity for your “Near Text” search. The closer to “1”, the more precise the vector search is.
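The two filter types can also be written as Weaviate GraphQL queries. The sketch below builds a “Near Text” query, where the similarity threshold is passed as `certainty`, and a “Where” query for a keyword match. The class name `WpsolrPost` and the property `title` are hypothetical; use the names from your own index. The `where` filter assumes a property of data type text.

```python
import json


def near_text_query(class_name: str, concepts: list, certainty: float,
                    properties: list, limit: int = 10) -> str:
    """Vector search: return objects whose meaning is close to the concepts."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0 and 1")
    # A JSON array of strings is also a valid GraphQL list literal
    concepts_gql = json.dumps(concepts)
    fields = " ".join(properties)
    return (
        "{ Get { %s(nearText: {concepts: %s, certainty: %s}, limit: %d) "
        "{ %s _additional { certainty } } } }"
        % (class_name, concepts_gql, certainty, limit, fields)
    )


def where_query(class_name: str, prop: str, word: str,
                properties: list, limit: int = 10) -> str:
    """Keyword search: return objects whose property contains the word."""
    fields = " ".join(properties)
    return (
        "{ Get { %s(where: {path: [%s], operator: Equal, valueText: %s}, limit: %d) "
        "{ %s } } }"
        % (class_name, json.dumps(prop), json.dumps(word), limit, fields)
    )


if __name__ == "__main__":
    print(near_text_query("WpsolrPost", ["solar energy"], 0.8, ["title"]))
    print(where_query("WpsolrPost", "title", "solar", ["title"]))
```

Either query string can be pasted into the GraphQL console to compare concept-based and keyword-based results on the same index.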
3 Index your data