In this tutorial, we walk through indexing and searching embeddings with Pinecone in Python. Pinecone is a managed vector database built for similarity search at scale, which makes it a good fit for machine learning applications such as semantic search and recommendations.

Prerequisites

Before we begin, ensure you have the following:

  • Python installed on your system (version 3.6+ recommended)
  • An active Pinecone account
  • API key from Pinecone
  • Required Python libraries: pinecone-client, numpy

Setting Up the Environment

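First, install the necessary libraries using pip. The code in this tutorial uses the pinecone.init interface, which is available in pinecone-client releases before 3.0, so the version is pinned:

pip install "pinecone-client<3" numpy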

Initializing Pinecone

Import the library and initialize the connection with your API key and the environment your project runs in:

import pinecone

pinecone.init(api_key='YOUR_API_KEY', environment='us-west1-gcp')
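
Hard-coding the key as above is fine for a quick test, but a common alternative is to read it from environment variables. The sketch below assumes two variable names, PINECONE_API_KEY and PINECONE_ENVIRONMENT, which are illustrative choices for this tutorial rather than names the library requires. The helper load_pinecone_settings is likewise hypothetical.

```python
import os


def load_pinecone_settings(env=None):
    """Return (api_key, environment) read from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("PINECONE_API_KEY")
    if not api_key:
        raise RuntimeError("PINECONE_API_KEY is not set")
    # Fall back to the environment value used in this tutorial.
    environment = env.get("PINECONE_ENVIRONMENT", "us-west1-gcp")
    return api_key, environment
```

The returned pair can then be passed on, for example pinecone.init(api_key=api_key, environment=environment).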

Creating an Index

Next, create a new index or connect to an existing one:

index_name = 'example-index'

if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=128)

index = pinecone.Index(index_name)
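
The index only accepts vectors whose length equals the dimension given at creation (128 here). A minimal sketch of a local check that can run before any data is sent follows. The helper check_dimensions is hypothetical and not part of the Pinecone client.

```python
def check_dimensions(vectors, dimension):
    """Raise ValueError if any (id, values) pair has the wrong length."""
    for vec_id, values in vectors:
        if len(values) != dimension:
            raise ValueError(
                f"{vec_id}: expected {dimension} values, got {len(values)}"
            )
    return True
```

Calling check_dimensions(vectors, 128) on a list of (id, values) pairs surfaces a mismatch locally instead of as a rejected request.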

Creating Embeddings

Generate sample embeddings using numpy:

import numpy as np

embeddings = [np.random.rand(128).tolist() for _ in range(10)]
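
Random vectors stand in for real embeddings here; in an application they would come from an embedding model. The variant below is a sketch with two optional changes: a seeded generator so runs are repeatable, and scaling each vector to unit length, a common preprocessing step when cosine similarity is used. The seed value is arbitrary.

```python
import numpy as np

# Seeded generator so the same "embeddings" are produced on every run.
rng = np.random.default_rng(seed=42)

# 10 vectors of 128 values each, matching the index dimension.
raw = rng.random((10, 128))

# Scale every row to unit length.
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# Plain Python lists, the same shape of data used in this tutorial.
embeddings = unit.tolist()
```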

Indexing Embeddings

Prepare data with unique IDs and upsert into the index:

ids = [f'id_{i}' for i in range(10)]

vectors = list(zip(ids, embeddings))

index.upsert(vectors=vectors)
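
Ten vectors fit comfortably in a single request. For larger collections, upserts are typically split into batches. The helper below is a sketch of that idea; the name batched and the default size of 100 are assumptions for illustration, not limits taken from this tutorial.

```python
from itertools import islice


def batched(items, batch_size=100):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With this in place, each batch can be sent on its own, for example by looping over batched(vectors) and calling index.upsert(vectors=batch) for every batch.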

Searching Embeddings

To perform a similarity search, create a query embedding and search:

query_embedding = np.random.rand(128).tolist()

results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True
)
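
Conceptually, a top_k query ranks the stored vectors by their similarity to the query vector and returns the best few. The sketch below does this by brute force with cosine similarity, assuming the index uses the cosine metric (the tutorial does not set one explicitly). It illustrates the idea only and is not how Pinecone implements search; top_k_cosine is a hypothetical helper.

```python
import numpy as np


def top_k_cosine(query, vectors, k=3):
    """Return (row_index, score) for the k rows most similar to query."""
    q = np.asarray(query, dtype=float)
    m = np.asarray(vectors, dtype=float)
    # Cosine similarity: dot product divided by the product of the norms.
    scores = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    # Sort by descending score and keep the first k positions.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

A managed index avoids comparing the query against every stored vector, which is what makes it practical at scale.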

Reviewing Results

The search results will include the IDs of the most similar vectors along with their similarity scores:

for match in results['matches']:
    print(f"ID: {match['id']}, Score: {match['score']}")
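
Applications often keep only matches above some score. The sketch below filters a response of the same shape as the one read by the loop above. The sample data is made up for illustration, and both matches_above and the 0.75 threshold are hypothetical choices.

```python
def matches_above(results, min_score):
    """Return (id, score) pairs whose score is at least min_score."""
    return [
        (match["id"], match["score"])
        for match in results["matches"]
        if match["score"] >= min_score
    ]


# Made-up response with the same structure as a query result.
sample_results = {
    "matches": [
        {"id": "id_3", "score": 0.91},
        {"id": "id_7", "score": 0.78},
        {"id": "id_1", "score": 0.42},
    ]
}

strong = matches_above(sample_results, min_score=0.75)
```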

Cleanup

When finished, delete the index to free resources:

pinecone.delete_index(index_name)