Qdrant is a vector database & vector similarity search engine.
You can import your own data into Qdrant, but first you need to create embeddings for it.
We have created a guide and accompanying notebook that walk you through a code example, so you can do it yourself in 30 minutes or less.
This guide will go over:
- Setting up and initializing your Qdrant server.
- Loading the data and creating the corresponding embeddings using the “sentence-transformers” model.
- Importing the embeddings (vectors) into Qdrant.
- Testing that the import finished correctly by sending a query.
Learn how to import your own data into Qdrant using our notebook
You can find it at:
https://colab.research.google.com/drive/1Ix79zyp2v6XOqQDCfRSRavJinwlxCRDY?usp=sharing
Or follow the detailed instructions below.
Start the Qdrant container
Pull the official Qdrant image from the Docker registry:

docker pull qdrant/qdrant

Start a Qdrant container on port 6333:

docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant
The purpose of this guide is not to set up a secure, production-ready Qdrant server, but to learn the basics of importing embedded data. If you are interested in securing your Qdrant installation, please refer to the official documentation (https://qdrant.tech/documentation/guides/security/).
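For reference, if you do enable API-key authentication on the server as described in the security guide, the Python client can pass the key when connecting. This is a minimal sketch; the URL and key below are placeholders, not values used elsewhere in this guide:

from qdrant_client import QdrantClient

# Placeholder host and key: replace with your own values
secure_client = QdrantClient(
    url="https://your-qdrant-host:6333",
    api_key="your-api-key",
)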
Configure and send data to Qdrant
This imports the Qdrant client module and points the client at your Qdrant installation for the following steps (in this case, http://localhost:6333).
from qdrant_client.http.models import Distance, VectorParams
from qdrant_client import QdrantClient
client = QdrantClient("localhost", port=6333)
Create a collection named “collection_name” in Qdrant. A collection stores points, which are records holding a vector and an optional payload. The vector size (384) must match the output dimension of the embedding model used later (all-MiniLM-L6-v2 produces 384-dimensional embeddings), and Distance.DOT sets dot product as the similarity metric.
client.recreate_collection(
collection_name="collection_name",
vectors_config=VectorParams(size=384, distance=Distance.DOT),
)
If you ever need to, you can delete the collection using:
client.delete_collection(collection_name="collection_name")
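Conversely, to confirm that the collection exists and inspect its configuration, you can request its info as a quick sanity check:

# Fetch the collection's status and configuration
collection_info = client.get_collection(collection_name="collection_name")
print(collection_info)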
Next, we’ll import the necessary modules to load the dataset (in your case, this could be your own data) and create the embeddings from it.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from qdrant_client.http.models import PointStruct
In the next section, we select the first 100 sentences from the ms_marco dataset, create their embeddings with the sentence-transformers model, and send them to the server using the “upsert” function.
# Import the "ms_marco" dataset and load the sentences
dataset = load_dataset("ms_marco", 'v1.1')
train_dataset = dataset["train"]['passages']
# Load the sentence-transformers model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
points = []
# Select the first 100 sentences of the dataset
for i in range(100):
sentence = train_dataset[i]['passage_text'][0]
# Create the respective embedding
embedding = model.encode(sentence)
# Append the sentence and embedding to the Qdrant points array
points.append(PointStruct(id=i, vector=embedding, payload={"text": sentence}))
# Send the points to the "collection_name" on the Qdrant server
operation_info = client.upsert(
collection_name="collection_name",
wait=True,
points=points
)
# Display the import status
print(operation_info)
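If you want an extra check beyond the upsert status, you can also ask Qdrant how many points the collection now holds; for this example it should report 100:

# Count the points stored in the collection (exact count)
count_result = client.count(collection_name="collection_name", exact=True)
print(count_result)  # expected: count=100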
Send a Query to Qdrant
You can finally test out Qdrant by sending a query. In the following code, we create an embedding for the query “who is Ronald Reagan?” and send it to Qdrant using the search function of the Qdrant client module. You will receive the 3 most relevant results.
query = "who is Ronald Reagan?"
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embedding = model.encode(query)
client = QdrantClient("localhost", port=6333)
search_result = client.search(
collection_name="collection_name",
query_vector=embedding,
limit=3
)
print(search_result)
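Each result is a scored point that carries the payload stored during the import, so you can print the matched sentences and their scores in a more readable form:

# Print each hit's similarity score and the original sentence from its payload
for hit in search_result:
    print(f"score: {hit.score:.4f} - {hit.payload['text']}")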