Step-by-Step: Indexing and Searching Embeddings with Pinecone in Python
This tutorial walks through indexing and searching embeddings with Pinecone in Python. Pinecone is a managed vector database built for similarity search at scale, which makes it a good fit for applications such as semantic search and recommendations.
Prerequisites
Before we begin, ensure you have the following:
- Python installed on your system (version 3.6+ recommended)
- An active Pinecone account
- API key from Pinecone
- Required Python libraries: pinecone-client, numpy
Setting Up the Environment
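First, install the necessary libraries using pip. The code in this tutorial calls pinecone.init, which was removed in version 3.0 of the client, so pin an earlier release:
pip install "pinecone-client<3" numpy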
Initializing Pinecone
Import the library and initialize the connection with your API key:
import pinecone
pinecone.init(api_key='YOUR_API_KEY', environment='us-west1-gcp')
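Hardcoding the API key makes it easy to leak through version control. A minimal sketch of reading it from the environment instead is shown below. The variable names PINECONE_API_KEY and PINECONE_ENVIRONMENT and the helper load_pinecone_settings are assumptions made for this example, not names the client requires.

```python
import os


def load_pinecone_settings(env=os.environ):
    """Read Pinecone settings from environment variables (names are assumptions)."""
    api_key = env.get("PINECONE_API_KEY")
    if not api_key:
        raise RuntimeError("Set the PINECONE_API_KEY environment variable first")
    # Fall back to the environment used in this tutorial if none is set.
    environment = env.get("PINECONE_ENVIRONMENT", "us-west1-gcp")
    return {"api_key": api_key, "environment": environment}


# Usage (needs the variable to be set and the pre-3.0 client installed):
# import pinecone
# pinecone.init(**load_pinecone_settings())
```

The helper takes the mapping as a parameter so it can be exercised with a plain dictionary in tests, without touching the real environment.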
Creating an Index
Next, create a new index or connect to an existing one:
index_name = 'example-index'
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=128)
index = pinecone.Index(index_name)
Creating Embeddings
Generate sample embeddings using numpy:
import numpy as np
embeddings = [np.random.rand(128).tolist() for _ in range(10)]
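Random embeddings change on every run, which makes results hard to compare. One possible variation, sketched here, uses a seeded NumPy generator and checks that every vector has the dimension the index expects. The DIMENSION constant and the seed value 42 are illustrative choices.

```python
import numpy as np

DIMENSION = 128  # must match the dimension passed to create_index

rng = np.random.default_rng(seed=42)  # seed value is arbitrary
embedding_matrix = rng.random((10, DIMENSION))  # uniform values in [0, 1)

# Every row must have exactly DIMENSION components before upserting.
assert embedding_matrix.shape == (10, DIMENSION)

# Convert to plain Python lists, the format used in the rest of this tutorial.
embeddings = embedding_matrix.tolist()
```

In a real application these vectors would come from an embedding model, and the dimension check guards against upserting vectors that do not match the index.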
Indexing Embeddings
Prepare data with unique IDs and upsert into the index:
ids = [f'id_{i}' for i in range(10)]
vectors = list(zip(ids, embeddings))
index.upsert(vectors=vectors)
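Ten vectors fit comfortably in one request, but larger datasets are usually sent in batches. The sketch below shows one way to split a list of (id, vector) pairs into batches. The batched helper and the batch size of 100 are assumptions for illustration, so check Pinecone's current request limits before choosing a size. The data is a small stand-in so the example runs without a live index.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in data: 250 (id, vector) pairs with tiny 2-dimensional vectors.
vectors = [(f"id_{i}", [float(i), 0.0]) for i in range(250)]

batch_sizes = [len(batch) for batch in batched(vectors, 100)]
print(batch_sizes)  # [100, 100, 50]

# With a live index, each batch would be sent separately:
# for batch in batched(vectors, 100):
#     index.upsert(vectors=batch)
```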
Searching Embeddings
To perform a similarity search, create a query embedding and search:
query_embedding = np.random.rand(128).tolist()
results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True
)
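To build intuition for what the query returns, the following sketch computes a brute-force top-k search locally with NumPy. It assumes the index uses cosine similarity, which is the metric create_index uses when none is specified. It illustrates the ranking only and is not how Pinecone implements search internally. The function name cosine_top_k is made up for this example.

```python
import numpy as np


def cosine_top_k(query, vectors, ids, top_k=3):
    """Return the top_k (id, score) pairs ranked by cosine similarity."""
    matrix = np.asarray(vectors, dtype=float)
    q = np.asarray(query, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms.
    scores = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
    best = np.argsort(scores)[::-1][:top_k]  # indices of the highest scores first
    return [(ids[i], float(scores[i])) for i in best]


rng = np.random.default_rng(0)
vectors = rng.random((10, 128))
ids = [f"id_{i}" for i in range(10)]

# Querying with a stored vector should rank that vector first, with score ~1.0.
for vec_id, score in cosine_top_k(vectors[4], vectors, ids):
    print(f"ID: {vec_id}, Score: {score:.4f}")
```

Comparing this exhaustive ranking with the results from the index on a small dataset can be a useful sanity check.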
Reviewing Results
The search results will include the IDs of the most similar vectors along with their similarity scores:
for match in results['matches']:
    print(f"ID: {match['id']}, Score: {match['score']}")
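In practice, the top matches are not always good matches, so it can help to discard results below a score cutoff. The sketch below filters a hand-written stand-in for the query response. The IDs, scores, and the MIN_SCORE threshold are invented for illustration and are not output from a real index.

```python
# Hand-written stand-in with the same 'matches' / 'id' / 'score' layout
# that the loop above reads; the values are made up for illustration.
results = {
    "matches": [
        {"id": "id_3", "score": 0.91},
        {"id": "id_7", "score": 0.78},
        {"id": "id_1", "score": 0.42},
    ]
}

MIN_SCORE = 0.75  # hypothetical cutoff; tune for your data and metric

strong_matches = [m for m in results["matches"] if m["score"] >= MIN_SCORE]
for match in strong_matches:
    print(f"ID: {match['id']}, Score: {match['score']:.2f}")
```

A suitable threshold depends on the embedding model and the similarity metric, so it is best chosen by inspecting scores on known good and bad pairs.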
Cleanup
When finished, delete the index to free resources:
pinecone.delete_index(index_name)