
Haiku RAG
Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling.
haiku.rag is an opinionated agentic RAG system that uses LanceDB for vector storage, Pydantic AI for multi-agent workflows, and Docling for document processing. It supports hybrid search (vector + full-text) with Reciprocal Rank Fusion and multiple embedding providers (Ollama, LM Studio, vLLM, OpenAI, VoyageAI), and it includes research agents that plan, search, evaluate, and synthesize answers.
Features
- Local LanceDB: No external servers required; also supports LanceDB Cloud, S3, Google Cloud & Azure storage
- Multiple embedding providers: Ollama, LM Studio, VoyageAI, OpenAI, vLLM
- Multiple QA providers: Any provider/model supported by Pydantic AI (Ollama, LM Studio, OpenAI, Anthropic, etc.)
- Native hybrid search: Vector + full-text search with native LanceDB RRF reranking
- Reranking: Default search result reranking with MixedBread AI, Cohere, Zero Entropy, or vLLM
- Question answering: Built-in QA agents on your documents
- Research graph (multi‑agent): Plan → Search → Evaluate → Synthesize with agentic AI
- File monitoring: Auto-index files when run as server
- CLI & Python API: Use from command line or Python
- MCP server: Expose as tools for AI assistants
- Flexible document processing: Local (docling) or remote (docling-serve) processing
Installation
Python 3.12 or newer is required.
Full Package (Recommended)
uv pip install haiku.rag
Includes all features: document processing, all embedding providers, and rerankers.
Slim Package (Minimal Dependencies)
uv pip install haiku.rag-slim
Install only the extras you need. See the Installation documentation for available options.
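For example, to add a single provider to the slim package you can use the standard extras syntax (the extra name below is illustrative, not a confirmed extra; check the Installation documentation for the names that actually exist):
uv pip install "haiku.rag-slim[ollama]"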
Quick Start
# Add documents
haiku-rag add "Your content here"
haiku-rag add "Your content here" --meta author=alice --meta topic=notes
haiku-rag add-src document.pdf --meta source=manual
# Search
haiku-rag search "query"
# Search with filters
haiku-rag search "query" --filter "uri LIKE '%.pdf' AND title LIKE '%paper%'"
# Ask questions
haiku-rag ask "Who is the author of haiku.rag?"
# Ask questions with citations
haiku-rag ask "Who is the author of haiku.rag?" --cite
# Deep QA (multi-agent question decomposition)
haiku-rag ask "Who is the author of haiku.rag?" --deep --cite
# Deep QA with verbose output
haiku-rag ask "Who is the author of haiku.rag?" --deep --verbose
# Multi‑agent research (iterative plan/search/evaluate)
haiku-rag research \
  "What are the main drivers and trends of global temperature anomalies since 1990?" \
  --max-iterations 2 \
  --confidence-threshold 0.8 \
  --max-concurrency 3 \
  --verbose
# Rebuild database (re-chunk and re-embed all documents)
haiku-rag rebuild
# Start server with file monitoring
haiku-rag serve --monitor
To customize settings, create a haiku.rag.yaml config file (see Configuration).
Python Usage
from haiku.rag.client import HaikuRAG
from haiku.rag.config import Config
from haiku.rag.graph.agui import stream_graph
from haiku.rag.graph.research import (
    ResearchContext,
    ResearchDeps,
    ResearchState,
    build_research_graph,
)

async with HaikuRAG("database.lancedb") as client:
    # Add document
    doc = await client.create_document("Your content")

    # Search (reranking enabled by default)
    results = await client.search("query")
    for chunk, score in results:
        print(f"{score:.3f}: {chunk.content}")

    # Ask questions
    answer = await client.ask("Who is the author of haiku.rag?")
    print(answer)

    # Ask questions with citations
    answer = await client.ask("Who is the author of haiku.rag?", cite=True)
    print(answer)

    # Multi‑agent research pipeline (Plan → Search → Evaluate → Synthesize)
    # Graph settings (provider, model, max_iterations, etc.) come from config
    graph = build_research_graph(config=Config)
    question = (
        "What are the main drivers and trends of global temperature "
        "anomalies since 1990?"
    )
    context = ResearchContext(original_question=question)
    state = ResearchState.from_config(context=context, config=Config)
    deps = ResearchDeps(client=client)

    # Blocking run (final result only)
    report = await graph.run(state=state, deps=deps)
    print(report.title)

    # Streaming progress (AG-UI events)
    async for event in stream_graph(graph, state, deps):
        if event["type"] == "STEP_STARTED":
            print(f"Starting step: {event['stepName']}")
        elif event["type"] == "ACTIVITY_SNAPSHOT":
            print(f" {event['content']}")
        elif event["type"] == "RUN_FINISHED":
            print("\nResearch complete!\n")
            result = event["result"]
            print(result["title"])
            print(result["executive_summary"])
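The snippet above uses async with at the top level for brevity. In a standalone script you would typically wrap the calls in a coroutine and drive it with asyncio; a minimal sketch (the main wrapper and database path are illustrative, and only client calls shown above are used):

import asyncio

from haiku.rag.client import HaikuRAG


async def main() -> None:
    # Open (or create) the LanceDB database, index one document, run a search.
    async with HaikuRAG("database.lancedb") as client:
        await client.create_document("Your content")
        results = await client.search("query")
        for chunk, score in results:
            print(f"{score:.3f}: {chunk.content}")


if __name__ == "__main__":
    asyncio.run(main())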
MCP Server
Use with AI assistants like Claude Desktop:
haiku-rag serve --mcp --stdio
Add to your Claude Desktop configuration:
{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}
Provides tools for document management, search, QA, and research directly in your AI assistant.
Examples
See the examples directory for working examples:
- Interactive Research Assistant - Full-stack research assistant with Pydantic AI and AG-UI featuring human-in-the-loop approval and real-time state synchronization
- Docker Setup - Complete Docker deployment with file monitoring and MCP server
- A2A Server - Self-contained A2A protocol server package with conversational agent interface
Documentation
Full documentation at: https://ggozad.github.io/haiku.rag/
- Installation - Provider setup
- Configuration - YAML configuration
- CLI - Command reference
- Python API - Complete API docs
- Agents - QA agent and multi-agent research
- Server - File monitoring, MCP, and AG-UI
- MCP - Model Context Protocol integration
- Inspector - Database browser TUI
- Benchmarks - Performance benchmarks
- Changelog - Version history
mcp-name: io.github.ggozad/haiku-rag