Weaviate is an open-source vector search engine that stores data objects alongside their vector embeddings and supports similarity and semantic search. Its Python client lets data scientists run these searches from within their existing data workflows. This guide walks through the practical steps: connecting to Weaviate from Python, defining a schema, adding data, and running a semantic search.

Prerequisites

  • Python 3.7 or higher installed on your system
  • Weaviate server running locally or remotely
  • Python libraries: weaviate-client, pandas (optional)

Setting Up the Environment

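Start by installing the Python client library for Weaviate. The code in this guide uses the v3 client API (weaviate.Client); the 4.x client introduces a different API, so pin the version below 4 to run the examples as written:

pip install "weaviate-client<4"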

If you plan to handle dataframes, also install pandas:

pip install pandas

Connecting to the Weaviate Instance

Import the client and establish a connection to your Weaviate server:

import weaviate

client = weaviate.Client(
    url="http://localhost:8080"  # Replace with your Weaviate URL
)

if client.is_ready():
    print("Connected to Weaviate!")
else:
    print("Connection failed. Check your server.")
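A server that has just been started may not answer the first readiness check. The sketch below polls a readiness function a few times before giving up. The helper name wait_until_ready and its parameters are illustrative and not part of the Weaviate client; it accepts any zero-argument callable, such as client.is_ready.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     attempts: int = 5,
                     delay_seconds: float = 1.0) -> bool:
    """Poll a readiness check until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if is_ready():
                return True
        except Exception as exc:  # e.g. a connection error while the server boots
            print(f"Attempt {attempt}: readiness check raised {exc!r}")
        if attempt < attempts:
            time.sleep(delay_seconds)  # wait before the next try
    return False


# Usage with the client created above (assumes a reachable server):
# if not wait_until_ready(client.is_ready, attempts=10, delay_seconds=2.0):
#     raise SystemExit("Weaviate did not become ready in time.")
```

Passing the check as a callable keeps the helper independent of the client version and easy to exercise without a running server.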

Creating a Schema

Define the schema for your data objects. For example, a schema for articles:

schema = {
    "classes": [
        {
            "class": "Article",
            "description": "A scientific article",
            "properties": [
                {
                    "name": "title",
                    "dataType": ["text"]
                },
                {
                    "name": "content",
                    "dataType": ["text"]
                },
                {
                    "name": "publishedDate",
                    "dataType": ["date"]
                }
            ]
        }
    ]
}

client.schema.create(schema)
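Running the schema creation a second time typically fails because the Article class already exists. One way to make the step repeatable is to inspect the current schema first. The sketch below assumes the schema is available as a dictionary with a "classes" list, which is the shape used in the definition above; class_exists is a hypothetical helper, not a client method.

```python
from typing import Any, Dict


def class_exists(existing_schema: Dict[str, Any], class_name: str) -> bool:
    """Return True if a class with this name appears in the schema dictionary."""
    classes = existing_schema.get("classes") or []
    return any(entry.get("class") == class_name for entry in classes)


# Usage with the v3 client (assumes `client` and `schema` from the steps above):
# existing = client.schema.get()
# if not class_exists(existing, "Article"):
#     client.schema.create(schema)
```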

Adding Data to Weaviate

Create data objects and upload them to Weaviate:

article_data = {
    "title": "The Renaissance Era",
    "content": "The Renaissance was a fervent period of European cultural, artistic, political and economic “rebirth” following the Middle Ages.",
    # Weaviate "date" properties expect a full RFC 3339 timestamp, not a bare date
    "publishedDate": "1500-01-01T00:00:00Z"
}

client.data_object.create(article_data, class_name="Article")
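Weaviate's date data type expects RFC 3339 timestamps (for example 2024-01-31T00:00:00Z) rather than bare calendar dates. If your source data holds Python date objects, a small converter keeps the formatting in one place. The function name to_rfc3339 and the choice of midnight UTC are assumptions made for this sketch.

```python
from datetime import date, datetime, time, timezone


def to_rfc3339(day: date) -> str:
    """Format a calendar date as an RFC 3339 timestamp at midnight UTC."""
    midnight_utc = datetime.combine(day, time.min, tzinfo=timezone.utc)
    # isoformat() renders UTC as "+00:00"; swap in the shorter "Z" suffix
    return midnight_utc.isoformat().replace("+00:00", "Z")


print(to_rfc3339(date(1500, 1, 1)))  # 1500-01-01T00:00:00Z
```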

Performing a Semantic Search

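Use a nearText query to find articles whose content is semantically similar to a set of concepts. This requires a text vectorizer module (such as text2vec-transformers) to be enabled on your Weaviate instance: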

query = {
    "nearText": {
        "concepts": ["European cultural rebirth"]
    }
}

result = client.query.get("Article", ["title", "content"]).with_near_text(query["nearText"]).do()

print(result)

Retrieving and Processing Data

Extract and handle search results for analysis or display:

articles = result['data']['Get']['Article']
for article in articles:
    print(f"Title: {article['title']}")
    print(f"Content: {article['content']}\n")
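Indexing directly into result['data']['Get']['Article'] raises a KeyError or TypeError when the query fails or returns nothing. The sketch below assumes the response follows the usual GraphQL shape, with an optional "errors" key next to "data"; extract_objects is a hypothetical helper written for this guide.

```python
from typing import Any, Dict, List


def extract_objects(result: Dict[str, Any], class_name: str) -> List[Dict[str, Any]]:
    """Return the objects for class_name, or raise if the response reports errors."""
    if result.get("errors"):
        raise RuntimeError(f"Weaviate query failed: {result['errors']}")
    # Any missing or null level falls back to an empty container
    get_section = (result.get("data") or {}).get("Get") or {}
    return get_section.get(class_name) or []


# Usage (assumes `result` from the search step above):
# for article in extract_objects(result, "Article"):
#     print(article["title"])
```

Returning an empty list for a query with no matches lets the display loop run unchanged, while real query errors surface as exceptions.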

Conclusion

Integrating Weaviate with Python adds semantic search and vector-based retrieval to your data science toolkit. With the steps above, you can connect to a Weaviate instance, define a schema, load objects, and query them by meaning rather than by exact keywords.