Weaviate is an open-source vector database that stores data objects together with their vector embeddings and supports similarity (semantic) search over them. Its Python client lets data scientists run these searches directly from their existing data workflows. This guide walks through the practical steps: connecting to a Weaviate server from Python, defining a schema, adding data, and running and processing a semantic search.
Prerequisites
- Python 3.7 or higher installed on your system
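- A Weaviate server running locally or remotely, with a text vectorizer module enabled (for example text2vec-transformers or text2vec-openai) so that semantic nearText queries work
- Python libraries: weaviate-client (version 3.x, whose weaviate.Client API this guide uses) and, optionally, pandas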
Setting Up the Environment
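Start by installing the Python client. The examples in this guide use the v3 client API (weaviate.Client). Version 4 of the client replaced it with a different interface, so pin the major version:
pip install "weaviate-client<4"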
If you plan to handle dataframes, also install pandas:
pip install pandas
Connecting to the Weaviate Instance
Import the client and establish a connection to your Weaviate server:
import weaviate
client = weaviate.Client(
    url="http://localhost:8080"  # Replace with your Weaviate URL
)

if client.is_ready():
    print("Connected to Weaviate!")
else:
    print("Connection failed. Check your server.")
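Hard-coding the server address means editing the script for every environment. A minimal sketch of reading it from an environment variable instead; the variable name WEAVIATE_URL and the helper weaviate_url_from_env are hypothetical choices for this example, not something the Weaviate client reads on its own:

```python
import os
from urllib.parse import urlparse

DEFAULT_URL = "http://localhost:8080"


def weaviate_url_from_env(var_name: str = "WEAVIATE_URL") -> str:
    """Return the Weaviate URL from an environment variable, or the local default.

    WEAVIATE_URL is a hypothetical variable name chosen for this example.
    """
    url = os.environ.get(var_name, DEFAULT_URL).strip()
    parsed = urlparse(url)
    # Require a full URL with an http(s) scheme and a host, e.g. http://localhost:8080
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid Weaviate URL in {var_name}: {url!r}")
    return url


if __name__ == "__main__":
    print(weaviate_url_from_env())
```

You would then connect with weaviate.Client(url=weaviate_url_from_env()).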
Creating a Schema
Define the schema for your data objects. For example, a schema for articles:
schema = {
    "classes": [
        {
            "class": "Article",
            "description": "A scientific article",
            "properties": [
                {
                    "name": "title",
                    "dataType": ["text"]
                },
                {
                    "name": "content",
                    "dataType": ["text"]
                },
                {
                    "name": "publishedDate",
                    "dataType": ["date"]
                }
            ]
        }
    ]
}
client.schema.create(schema)
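Creating a class that already exists makes client.schema.create fail, so rerunning the script raises an error. A minimal sketch of one workaround, assuming the v3 client's client.schema.get(), which returns the current schema as a dict with a "classes" list: compare it with the desired schema and create only the missing classes. The helper name missing_classes is illustrative:

```python
from typing import Any, Dict, List


def missing_classes(desired: Dict[str, Any], existing: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return class definitions from `desired` whose names are absent from `existing`.

    Both arguments use the {"classes": [...]} shape of the schema dict above;
    `existing` is assumed to come from client.schema.get().
    """
    existing_names = {c["class"] for c in existing.get("classes", [])}
    return [c for c in desired.get("classes", []) if c["class"] not in existing_names]


if __name__ == "__main__":
    desired = {"classes": [{"class": "Article", "properties": []}]}
    print(missing_classes(desired, {"classes": []}))  # [{'class': 'Article', 'properties': []}]
    print(missing_classes(desired, {"classes": [{"class": "Article"}]}))  # []
```

With a connected client, you would call client.schema.create_class(cls) for each class the helper returns.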
Adding Data to Weaviate
Create data objects and upload them to Weaviate:
article_data = {
    "title": "The Renaissance Era",
    "content": "The Renaissance was a fervent period of European cultural, artistic, political and economic “rebirth” following the Middle Ages.",
    # Date properties must be RFC 3339 timestamps, not bare calendar dates
    "publishedDate": "1500-01-01T00:00:00Z",
}

# Returns the UUID of the newly created object
article_id = client.data_object.create(article_data, class_name="Article")
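Weaviate's date data type expects RFC 3339 timestamps such as 1500-01-01T00:00:00Z, so source data holding plain dates needs a normalization step before import. A minimal sketch, assuming your source dates are ISO YYYY-MM-DD strings that should be read as midnight UTC; the helper names to_rfc3339 and prepare_articles are illustrative:

```python
from datetime import date, datetime, time, timezone
from typing import Dict, Iterable, List


def to_rfc3339(day: str) -> str:
    """Convert an ISO calendar date ("YYYY-MM-DD") to an RFC 3339 UTC timestamp."""
    d = date.fromisoformat(day)
    dt = datetime.combine(d, time(0, 0), tzinfo=timezone.utc)
    # isoformat() renders UTC as +00:00; "Z" is the equivalent RFC 3339 suffix
    return dt.isoformat().replace("+00:00", "Z")


def prepare_articles(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Build Article payloads with normalized dates from plain dict rows."""
    return [
        {
            "title": row["title"],
            "content": row["content"],
            "publishedDate": to_rfc3339(row["publishedDate"]),
        }
        for row in rows
    ]


if __name__ == "__main__":
    print(to_rfc3339("1500-01-01"))  # 1500-01-01T00:00:00Z
```

If your rows live in a pandas DataFrame, df.to_dict(orient="records") yields this list-of-dicts shape. Each payload can then go to client.data_object.create(payload, class_name="Article"), or, for larger loads, to the v3 client's batch interface.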
Performing a Semantic Search
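Use a nearText query to find articles whose content is semantically close to a phrase. Weaviate turns the query text into a vector with the configured vectorizer module and returns the nearest objects: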
query = {
    "nearText": {
        "concepts": ["European cultural rebirth"]
    }
}
result = client.query.get("Article", ["title", "content"]).with_near_text(query["nearText"]).do()
print(result)
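Conceptually, a nearText query ranks objects by the distance between their vectors and the query vector. With Weaviate's default cosine metric, distance is 1 minus cosine similarity. The toy sketch below reproduces that ranking in plain Python. The three-dimensional vectors are made up for illustration: real embeddings come from the vectorizer module and are much larger, and Weaviate searches an approximate HNSW index rather than scanning every object as this brute-force version does.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 means same direction, 2 opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def rank_by_distance(
    query: Sequence[float], docs: Dict[str, Sequence[float]], limit: int = 2
) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours: sort every document by distance to the query."""
    scored = [(title, cosine_distance(query, vec)) for title, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1])[:limit]


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for vectorizer output
    docs = {
        "The Renaissance Era": [0.9, 0.1, 0.0],
        "Baroque Music": [0.6, 0.4, 0.1],
        "Quantum Computing": [0.0, 0.2, 0.9],
    }
    query_vec = [1.0, 0.0, 0.0]  # stands in for "European cultural rebirth"
    for title, dist in rank_by_distance(query_vec, docs):
        print(f"{title}: distance={dist:.3f}")
```

In the real query, the v3 client's .with_limit(n) caps the number of results, and .with_additional(["distance"]) returns each hit's distance under _additional.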
Retrieving and Processing Data
Extract and handle search results for analysis or display:
articles = result['data']['Get']['Article']
for article in articles:
    print(f"Title: {article['title']}")
    print(f"Content: {article['content']}\n")
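In the v3 client, .do() returns the raw GraphQL response, and query problems (for example a nearText query against a class without a vectorizer) typically show up under an "errors" key rather than as a raised exception, so indexing result['data'] directly can fail or yield None. A minimal sketch of a defensive extraction step, assuming that response shape; the function name extract_objects is illustrative:

```python
from typing import Any, Dict, List


def extract_objects(result: Dict[str, Any], class_name: str = "Article") -> List[Dict[str, Any]]:
    """Return the list of objects for class_name from a Get query response.

    Raises RuntimeError if the response carries GraphQL errors.
    """
    if result.get("errors"):
        messages = "; ".join(e.get("message", str(e)) for e in result["errors"])
        raise RuntimeError(f"Weaviate query failed: {messages}")
    data = result.get("data") or {}
    # Get.<Class> may be missing or null when nothing matched
    return (data.get("Get") or {}).get(class_name) or []


if __name__ == "__main__":
    sample = {"data": {"Get": {"Article": [{"title": "The Renaissance Era", "content": "sample text"}]}}}
    for article in extract_objects(sample):
        print(article["title"])
```

If you installed pandas, the returned list of dicts can be passed straight to pandas.DataFrame for further analysis.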
Conclusion
With the Python client, you can define a schema, load data objects, and run semantic searches against Weaviate from the same scripts and notebooks you already use for analysis. The steps above cover the core workflow: connecting to the server, creating a schema, adding data, querying with nearText, and processing the results.