How to import your own data into Qdrant

[Image: https://qdrant.tech/content/images/text_search.png]

Qdrant is a vector database & vector similarity search engine.

You can import your own data into Qdrant, but first you need to create embeddings for it.
We have created a guide and notebook that explain the process and walk through a code example, so you can do it yourself in 30 minutes or less.

 

This guide will go over :

  1. Setting up and initializing your Qdrant server.
  2. Loading the data and creating the corresponding embeddings using the “sentence-transformers” model.
  3. Importing the embeddings (vectors) into Qdrant.
  4. Testing that the import finished correctly by sending a query.

 

Learn how to import your own data into Qdrant using our notebook

 

You can find it at :

https://colab.research.google.com/drive/1Ix79zyp2v6XOqQDCfRSRavJinwlxCRDY?usp=sharing

 

Or follow the detailed instructions below.

 

Start the Qdrant container

 

Pull the official Qdrant image from the Docker registry :

docker pull qdrant/qdrant

# Start up a Qdrant container on port 6333
docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant
This guide’s purpose is not to set up a secure, production-ready Qdrant server but to cover the basics of importing embedded data. If you are interested in securing your Qdrant installation, please refer to the official documentation (https://qdrant.tech/documentation/guides/security/).
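As a side note, if you later enable an API key on a secured instance, the Python client can pass it when connecting. The snippet below is only a sketch with placeholder URL and key, not part of this guide’s setup:

from qdrant_client import QdrantClient

# Sketch only: connect to a secured Qdrant instance.
# The URL and API key are placeholders; configure them on your own server first.
secured_client = QdrantClient(
    url="https://your-qdrant-host:6333",  # placeholder
    api_key="your-api-key",               # placeholder
)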

Configure and send data to Qdrant

 

This will import the Qdrant client modules and create a client pointing at your Qdrant installation, to be used in the following lines (in this case, http://localhost:6333).

from qdrant_client.http.models import Distance, VectorParams
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)
Create a collection named “collection_name” in Qdrant. A collection is used to store points: vectors with an optional payload. The vector size must match the output dimension of the embedding model, which is 384 for all-MiniLM-L6-v2.
client.recreate_collection(
    collection_name="collection_name",
    vectors_config=VectorParams(size=384, distance=Distance.DOT),
)
If you ever need to, you can delete the collection using :
client.delete_collection(collection_name="collection_name")
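If you want to double-check that the collection exists (for example right after creating it), you can fetch its description. This is just a quick sketch reusing the client and collection name from above:
# Fetch the collection description to confirm it was created
collection_info = client.get_collection(collection_name="collection_name")
print(collection_info)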
Next, we’ll import the necessary modules to load the dataset (in your case, this could be your own data) and create the embeddings from it.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from qdrant_client.http.models import PointStruct
In the next section, we will select the first 100 sentences from the ms_marco dataset, create the embeddings using the sentence-transformers model, and send them to the server using the “upsert” function.
# Import the "ms_marco" dataset and load the sentences
dataset = load_dataset("ms_marco", 'v1.1')
train_dataset = dataset["train"]['passages']

# Load the sentence-transformers model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

points = []

# Select the first 100 sentences of the dataset
for i in range(100):
    sentence = train_dataset[i]['passage_text'][0]
    # Create the respective embedding
    embedding = model.encode(sentence)
    # Append the sentence and embedding to the Qdrant points array
    points.append(PointStruct(id=i, vector=embedding, payload={"text": sentence}))

# Send the points to the "collection_name" on the Qdrant server
operation_info = client.upsert(
    collection_name="collection_name",
    wait=True,
    points=points
)

# Display the import status
print(operation_info)
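Besides printing the operation status, you can optionally verify how many points ended up in the collection. The sketch below reuses the client and collection name from above; exact=True asks the server for a precise count:
# Count the points stored in the collection (should be 100 after the import)
count_result = client.count(
    collection_name="collection_name",
    exact=True
)
print(count_result.count)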

Send a Query to Qdrant

You can finally test Qdrant by sending a query. In the following code, we create the embedding for the query “who is Ronald Reagan?” and send it to Qdrant using the search function of the Qdrant client module. You will receive the 3 most relevant results.
query = "who is Ronald Reagan?"

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embedding = model.encode(query)

client = QdrantClient("localhost", port=6333)
search_result = client.search(
    collection_name="collection_name",
    query_vector=embedding,
    limit=3
)
print(search_result)
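Each result is a scored point that carries the payload stored during the import, so you can, for example, print the matched sentences together with their similarity scores. A minimal sketch based on the search_result variable above:
# Display the similarity score and text of each returned point
for hit in search_result:
    print(f"score: {hit.score:.4f} - {hit.payload['text']}")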

 
