Weaviate is a powerful open-source vector search engine that allows organizations to manage and search large-scale vector data efficiently. Setting up Weaviate properly is essential for leveraging its full capabilities. This guide provides a step-by-step process to configure Weaviate for optimal vector data management.
Prerequisites
- Basic knowledge of Docker and Docker Compose
- Access to a Linux or Windows machine with Docker installed
- Familiarity with the command-line interface
Step 1: Install Docker and Docker Compose
Ensure Docker and Docker Compose are installed on your system. You can download them from the official Docker website and follow the installation instructions for your operating system.
Step 2: Create a Docker Compose File
Create a directory for your Weaviate setup and inside it, create a file named docker-compose.yml. Add the following configuration:
version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
    environment:
      - QUERY_DEFAULTS_LIMIT=20
      - AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true
      - PERSISTENCE_DATA_PATH=/var/lib/weaviate
      - DEFAULT_VECTORIZER_MODULE=text2vec-contextionary
      - ENABLE_MODULES=text2vec-contextionary
    volumes:
      - ./data:/var/lib/weaviate
Step 3: Launch Weaviate
Navigate to the directory containing your docker-compose.yml file and run the following command:
docker-compose up -d
This command downloads the Weaviate image and starts the container in detached mode. To verify that it is running, request http://localhost:8080/v1/meta, which returns the server version and the enabled modules as JSON.
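The container can take a few seconds to accept requests after it starts, so a script that depends on it may want to wait first. The following is a minimal sketch that polls Weaviate's readiness endpoint (/v1/.well-known/ready) using only the Python standard library. The function names, the retry count, and the delay are illustrative assumptions, not part of Weaviate.

```python
import time
import urllib.error
import urllib.request


def ready_url(base_url: str) -> str:
    """Build the readiness-probe URL for a Weaviate instance."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def http_probe(url: str) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(base_url, attempts=30, delay=1.0, probe=http_probe):
    """Poll the readiness endpoint until it succeeds or attempts run out."""
    url = ready_url(base_url)
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

Calling wait_until_ready("http://localhost:8080") after docker-compose up -d returns True once the instance reports ready, or False if it never does within the allotted attempts. The probe parameter exists only so the loop can be exercised without a running server.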
Step 4: Configure the Vectorizer Module
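Weaviate supports multiple vectorizer modules. The configuration above uses text2vec-contextionary, which depends on a separate contextionary inference container that Weaviate reaches through the CONTEXTIONARY_URL variable; see the module's documentation for the matching service definition. To use another module such as OpenAI or Hugging Face, update DEFAULT_VECTORIZER_MODULE and ENABLE_MODULES in your Docker Compose file accordingly.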
Example: Using OpenAI for Vectorization
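Set the following environment variables on the weaviate service. The module must also be listed in ENABLE_MODULES, and the key is read from OPENAI_APIKEY:
    environment:
      - QUERY_DEFAULTS_LIMIT=20
      - AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true
      - PERSISTENCE_DATA_PATH=/var/lib/weaviate
      - DEFAULT_VECTORIZER_MODULE=text2vec-openai
      - ENABLE_MODULES=text2vec-openai
      - OPENAI_APIKEY=your-openai-api-key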
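Besides the server-wide default, a vectorizer can be named per class in the schema. As a rough sketch, the snippet below builds a class definition that selects text2vec-openai and prepares a POST request to the /v1/schema endpoint. The class name Article, the property names, and the helper functions are hypothetical and only serve the illustration.

```python
import json
import urllib.request


def build_class_definition(class_name, vectorizer, text_properties):
    """Return a schema class definition with plain-text properties."""
    return {
        "class": class_name,
        "vectorizer": vectorizer,
        "properties": [
            {"name": name, "dataType": ["text"]} for name in text_properties
        ],
    }


def build_schema_request(base_url, class_definition):
    """Prepare (but do not send) the POST request that creates the class."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/schema",
        data=json.dumps(class_definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical class used for illustration only.
article_class = build_class_definition(
    "Article", "text2vec-openai", ["title", "body"]
)
schema_request = build_schema_request("http://localhost:8080", article_class)
```

Once the container is running, passing schema_request to urllib.request.urlopen sends it. Nothing is sent by the snippet itself.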
Step 5: Indexing Data into Weaviate
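Once Weaviate is running, you can start indexing data through its RESTful API. Use a tool like cURL or Postman to send a POST request to /v1/objects for a single data object, or to /v1/batch/objects for many objects at once. Each object names its class and carries its properties as JSON.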
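To make the request shape concrete, here is a small sketch that builds a single-object payload and the matching POST request to /v1/objects with the Python standard library. The class name Article, the property values, and the helper names are assumptions chosen for the example. Supplying a vector is optional when a vectorizer module is enabled.

```python
import json
import urllib.request


def build_object(class_name, properties, vector=None):
    """Return the JSON body for one data object."""
    payload = {"class": class_name, "properties": properties}
    if vector is not None:
        # Only include a vector when you bring your own embeddings.
        payload["vector"] = list(vector)
    return payload


def build_create_request(base_url, obj):
    """Prepare (but do not send) the POST request that stores the object."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/objects",
        data=json.dumps(obj).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical object used for illustration only.
article = build_object("Article", {"title": "Vector search basics"})
create_request = build_create_request("http://localhost:8080", article)
```

With the container up, urllib.request.urlopen(create_request) submits the object. The same body works as the payload of a cURL or Postman request.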
Step 6: Querying Vector Data
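To perform vector searches, send a GraphQL Get query to /v1/graphql, using the nearVector argument with your own query vector or, when a text vectorizer is enabled, nearText with a search phrase. Weaviate returns the most similar objects according to the distance metric configured for the class, which is cosine distance by default.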
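As an illustration of what such a request looks like, the sketch below assembles a GraphQL Get query with a nearVector argument and wraps it in a POST request to /v1/graphql. The class name Article, the field title, the two-dimensional vector, and the helper names are placeholders; a real query vector must have the same dimensionality as the stored vectors.

```python
import json
import urllib.request


def build_near_vector_query(class_name, vector, fields, limit=5):
    """Return a GraphQL Get query that ranks objects by vector distance."""
    vector_literal = json.dumps(list(vector))
    field_list = " ".join(fields)
    return (
        "{ Get { %s(nearVector: {vector: %s}, limit: %d) "
        "{ %s _additional { distance } } } }"
        % (class_name, vector_literal, limit, field_list)
    )


def build_graphql_request(base_url, query):
    """Prepare (but do not send) the POST request carrying the query."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values for illustration only.
query = build_near_vector_query("Article", [0.1, 0.2], ["title"], limit=3)
search_request = build_graphql_request("http://localhost:8080", query)
```

Sending search_request with urllib.request.urlopen returns a JSON document whose data.Get.Article list holds the matches, each with its distance to the query vector.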
Additional Tips
- Regularly back up your data directory (./data) to prevent data loss.
- Monitor container logs using docker logs [container_id] for troubleshooting.
- Explore Weaviate's documentation for advanced configurations like schema setup and multi-tenancy.
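For the backup tip, one simple approach is to archive the mounted data directory on a schedule. The sketch below does this with the Python standard library. The archive naming scheme and the function name are assumptions, and copying files while the container is writing may capture an inconsistent state, so consider stopping the container first. Keep the backup directory outside the data directory so the archive does not include itself.

```python
import shutil
import time
from pathlib import Path


def backup_data_dir(data_dir, backup_root, stamp=None):
    """Archive data_dir into backup_root as a timestamped .tar.gz file.

    Returns the path of the archive that was written.
    """
    data_path = Path(data_dir)
    if not data_path.is_dir():
        raise FileNotFoundError(f"data directory not found: {data_path}")
    # A timestamp keeps successive backups from overwriting each other.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    backup_path = Path(backup_root)
    backup_path.mkdir(parents=True, exist_ok=True)
    archive_base = backup_path / f"weaviate-data-{stamp}"
    return shutil.make_archive(
        str(archive_base), "gztar", root_dir=str(data_path)
    )
```

For example, backup_data_dir("./data", "./backups") writes a file such as ./backups/weaviate-data-<timestamp>.tar.gz.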
By following these steps, you can effectively configure Weaviate to manage and search your vector data at scale. Happy indexing!