This tutorial walks through indexing and searching embeddings with Pinecone in Python. Pinecone is a managed vector database built for similarity search at scale, which suits machine learning applications that need to look up nearest neighbors among embeddings.
Prerequisites
Before we begin, ensure you have the following:
- Python installed on your system (version 3.6+ recommended)
- An active Pinecone account
- API key from Pinecone
- Required Python libraries: pinecone-client, numpy
Setting Up the Environment
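First, install the necessary libraries using pip. The code in this tutorial uses the module-level pinecone.init interface from pinecone-client 2.x, which later major versions replaced with a client class, so pin the version below 3:
pip install "pinecone-client<3" numpy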
Initializing Pinecone
Import the library and initialize the connection with your API key:
import pinecone
pinecone.init(api_key='YOUR_API_KEY', environment='us-west1-gcp')
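Hardcoding the key in source makes it easy to leak through version control. A minimal sketch of an alternative, assuming the credentials live in environment variables named PINECONE_API_KEY and PINECONE_ENVIRONMENT (both names, and the helper load_pinecone_config, are chosen here for illustration and are not part of the library):

```python
import os


def load_pinecone_config(default_environment="us-west1-gcp"):
    """Read Pinecone credentials from environment variables.

    The variable names are an assumption made for this sketch.
    """
    api_key = os.environ.get("PINECONE_API_KEY")
    if not api_key:
        raise RuntimeError("Set the PINECONE_API_KEY environment variable first")
    environment = os.environ.get("PINECONE_ENVIRONMENT", default_environment)
    return {"api_key": api_key, "environment": environment}


# Usage with the init call shown above (needs a real key, so left commented out):
# import pinecone
# pinecone.init(**load_pinecone_config())
```

Failing early with a clear message is usually easier to debug than an authentication error from the first request.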
Creating an Index
Next, create a new index or connect to an existing one:
index_name = 'example-index'
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=128)
index = pinecone.Index(index_name)
Creating Embeddings
Generate sample embeddings using numpy:
import numpy as np
embeddings = [np.random.rand(128).tolist() for _ in range(10)]
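Random vectors differ on every run, which makes results hard to compare between sessions. The sketch below is one possible variation: it uses a seeded generator so the same vectors come back each time, and it scales each vector to unit length. The helper name make_embeddings and the default seed are assumptions for illustration. Normalizing is optional here.

```python
import numpy as np


def make_embeddings(count, dimension=128, seed=0):
    """Return `count` random unit-length vectors as plain Python lists."""
    rng = np.random.default_rng(seed)
    raw = rng.random((count, dimension))
    # Divide each row by its Euclidean norm so every vector has length 1
    unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)
    return unit.tolist()


embeddings = make_embeddings(10)
print(len(embeddings), len(embeddings[0]))  # 10 128
```

In a real application these vectors would come from an embedding model rather than a random generator. The dimension must still match the one given when the index was created.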
Indexing Embeddings
Prepare data with unique IDs and upsert into the index:
ids = [f'id_{i}' for i in range(10)]
vectors = list(zip(ids, embeddings))
index.upsert(vectors=vectors)
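Ten vectors fit comfortably in one request, but larger collections are generally sent in batches. The following is a rough sketch of that pattern. The helper chunked and the batch size of 100 are assumptions for illustration, not documented limits.

```python
def chunked(items, batch_size=100):
    """Yield successive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in data shaped like the (id, embedding) pairs above
sample_vectors = [(f"id_{i}", [0.0] * 128) for i in range(250)]
batch_sizes = [len(batch) for batch in chunked(sample_vectors, batch_size=100)]
print(batch_sizes)  # [100, 100, 50]

# With a live index, each batch would be sent separately:
# for batch in chunked(vectors, batch_size=100):
#     index.upsert(vectors=batch)
```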
Searching Embeddings
To perform a similarity search, create a query embedding and search:
query_embedding = np.random.rand(128).tolist()
results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True
)
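To build intuition for what a top_k query returns, here is a brute-force version in NumPy. It assumes the index scores by cosine similarity, which is believed to be the default metric when none is specified but is worth confirming for your index. A managed index does not necessarily scan every vector this way. The function top_k_cosine and the toy data are hypothetical.

```python
import numpy as np


def top_k_cosine(query, vectors, ids, k=3):
    """Return the k (id, score) pairs with the highest cosine similarity."""
    matrix = np.asarray(vectors, dtype=float)
    q = np.asarray(query, dtype=float)
    # Cosine similarity: dot product divided by the product of the norms
    scores = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
    order = np.argsort(-scores)[:k]  # indices of the largest scores first
    return [(ids[i], float(scores[i])) for i in order]


toy_ids = ["a", "b", "c"]
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# The query points mostly along the first axis, so "a" ranks first, then "c"
print(top_k_cosine([1.0, 0.1], toy_vectors, toy_ids, k=2))
```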
Reviewing Results
The search results will include the IDs of the most similar vectors along with their similarity scores:
for match in results['matches']:
    print(f"ID: {match['id']}, Score: {match['score']}")
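A query always returns up to top_k matches, even when none is particularly close, so it can help to drop weak matches before using them. The sketch below assumes the response can be read like the dictionary used in the loop above. The helper filter_matches, the threshold, and the sample values are all invented for illustration.

```python
def filter_matches(results, min_score):
    """Keep only matches whose score is at least `min_score`."""
    return [m for m in results["matches"] if m["score"] >= min_score]


# Hand-written stand-in for a query response; the values are invented
sample_results = {
    "matches": [
        {"id": "id_4", "score": 0.83},
        {"id": "id_7", "score": 0.79},
        {"id": "id_1", "score": 0.52},
    ]
}

for match in filter_matches(sample_results, min_score=0.75):
    print(f"ID: {match['id']}, Score: {match['score']:.2f}")
```

A sensible threshold depends on the metric and on the embedding model, so it is best chosen by inspecting scores on real data.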
Cleanup
When finished, delete the index to free resources:
pinecone.delete_index(index_name)
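If an error occurs before the cleanup line runs, the index stays allocated. One way to guard against that is a context manager that deletes the index on exit. This is a sketch only: temporary_index is a hypothetical helper, and it assumes the object passed as client exposes create_index and delete_index with the call shapes used earlier in this tutorial.

```python
from contextlib import contextmanager


@contextmanager
def temporary_index(client, name, dimension):
    """Create an index, yield its name, and always delete it afterwards."""
    client.create_index(name, dimension=dimension)
    try:
        yield name
    finally:
        # Runs on normal exit and when the body raises
        client.delete_index(name)


# Usage against the real service (commented out because it needs credentials):
# import pinecone
# with temporary_index(pinecone, "example-index", 128) as name:
#     index = pinecone.Index(name)
#     index.upsert(vectors=vectors)
```

This pattern suits throwaway experiments. For an index that should persist between runs, keep the explicit delete call instead.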