
docs-mcp
com.vaadin/docs-mcp
Provides Vaadin Documentation and help with development tasks
Vaadin Documentation RAG Service
A sophisticated, hierarchically-aware Retrieval-Augmented Generation (RAG) system for Vaadin documentation that understands document structure, provides framework-specific filtering, and enables intelligent parent-child navigation through documentation sections.
Project Overview
This project provides an advanced RAG system with enhanced hybrid search that:
- Understands Hierarchical Structure: Navigates parent-child relationships within and across documentation files
- Enhanced Hybrid Search: Combines semantic and intelligent keyword search with native Pinecone reranking for superior relevance
- Framework Filtering: Intelligently filters content for Vaadin Flow (Java) vs Hilla (React) frameworks
- Agent-Friendly: Provides MCP (Model Context Protocol) server for seamless IDE assistant integration
- Production Ready: Clean architecture with dependency injection, comprehensive testing, and error handling
Architecture
vaadin-documentation-services/
├── packages/
│   ├── core-types/             # Shared TypeScript interfaces
│   ├── 1-asciidoc-converter/   # AsciiDoc → Markdown + metadata extraction
│   ├── 2-embedding-generator/  # Markdown → Vector database with hierarchical chunking
│   ├── rest-server/            # Enhanced REST API with hybrid search + reranking
│   └── mcp-server/             # MCP server with hierarchical navigation
├── package.json                # Bun workspace configuration
└── PROJECT_PLAN.md             # Complete project documentation
Data Flow
flowchart TD
subgraph "Step 1: Documentation Processing"
VaadinDocs["Vaadin Docs<br/>(AsciiDoc)"]
Converter["AsciiDoc Converter<br/>• Framework detection<br/>• URL generation<br/>• Markdown output"]
Processor["Embedding Generator<br/>• Hierarchical chunking<br/>• Parent-child relationships<br/>• OpenAI embeddings"]
end
subgraph "Step 2: Enhanced Retrieval"
Pinecone["Pinecone Vector DB<br/>• Rich metadata<br/>• Hierarchical relationships<br/>• Framework tags"]
RestAPI["REST API<br/>• Enhanced hybrid search<br/>• Native Pinecone reranking<br/>• Framework filtering"]
end
subgraph "Step 3: Agent Integration"
MCP["MCP Server<br/>• search_vaadin_docs<br/>• get_full_document<br/>• Full document retrieval"]
IDEs["IDE Assistants<br/>• Context-aware search<br/>• Hierarchical exploration<br/>• Framework-specific help"]
end
VaadinDocs --> Converter
Converter --> Processor
Processor --> Pinecone
Pinecone <--> RestAPI
RestAPI <--> MCP
MCP <--> IDEs
classDef processing fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef storage fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef api fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
classDef agent fill:#fff3e0,stroke:#e65100,stroke-width:2px
class VaadinDocs,Converter,Processor processing
class Pinecone,RestAPI storage
class MCP api
class IDEs agent
Key Features
Intelligent Search
- Enhanced Hybrid Search: Combines semantic similarity with intelligent keyword extraction and scoring
- Native Pinecone Reranking: Uses Pinecone's bge-reranker-v2-m3 for optimal result ranking
- Framework Awareness: Filters Flow vs Hilla content with common content inclusion
- Query Preprocessing: Smart keyword extraction with stopword filtering for better search quality
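The query preprocessing step can be pictured with a small sketch. This is not the project's implementation: the stopword list and the `extractKeywords` name are assumptions made for illustration, and the real extraction and scoring logic lives in the REST server package.

```typescript
// Small illustrative stopword list; the project's actual list is not shown in this README.
const STOPWORDS = new Set(["a", "an", "the", "how", "do", "i", "to", "in", "of", "is", "for", "with"]);

// Lowercase the query, split on non-alphanumerics, then drop stopwords and duplicates.
function extractKeywords(query: string): string[] {
  const tokens = query.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  const seen = new Set<string>();
  const keywords: string[] = [];
  for (const token of tokens) {
    if (STOPWORDS.has(token) || seen.has(token)) continue;
    seen.add(token);
    keywords.push(token);
  }
  return keywords;
}

console.log(extractKeywords("How do I add a Grid to a view in Flow?"));
// [ "add", "grid", "view", "flow" ]
```

The surviving keywords are what a keyword search would match on, while the original query text can still be used unchanged for the semantic side.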
Hierarchical Navigation
- Parent-Child Relationships: Navigate from specific details to broader context
- Cross-File Links: Understand relationships between different documentation files
- Context Breadcrumbs: Maintain navigation context for better user experience
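As a rough illustration of parent-child navigation, the sketch below builds a breadcrumb trail by following parent links upward. The `ChunkNode` shape and its `parentId` field are hypothetical; the actual metadata stored with each chunk may be named and structured differently.

```typescript
// Hypothetical node shape: each chunk records the id of its parent chunk, or null at the root.
interface ChunkNode {
  id: string;
  title: string;
  parentId: string | null;
}

// Walk parent links upward and return titles ordered from the root down to the given chunk.
function breadcrumbs(index: Map<string, ChunkNode>, chunkId: string): string[] {
  const trail: string[] = [];
  const visited = new Set<string>(); // guards against accidental cycles in the data
  let current = index.get(chunkId);
  while (current && !visited.has(current.id)) {
    visited.add(current.id);
    trail.unshift(current.title);
    current = current.parentId === null ? undefined : index.get(current.parentId);
  }
  return trail;
}

const nodes: ChunkNode[] = [
  { id: "components", title: "Components", parentId: null },
  { id: "grid", title: "Grid", parentId: "components" },
  { id: "grid-sorting", title: "Sorting", parentId: "grid" },
];
const chunkIndex = new Map<string, ChunkNode>();
for (const n of nodes) chunkIndex.set(n.id, n);

console.log(breadcrumbs(chunkIndex, "grid-sorting").join(" > ")); // Components > Grid > Sorting
```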
Developer Experience
- MCP Integration: Standardized protocol for IDE assistant integration
- TypeScript: Full type safety across all packages
- Comprehensive Testing: Unit tests, integration tests, and hierarchical workflow validation
- Clean Architecture: Dependency injection and interface-based design
Quick Start
Prerequisites
- Bun runtime
- OpenAI API key (for embeddings)
- Pinecone API key and index
Installation
# Clone and install dependencies
git clone https://github.com/vaadin/vaadin-documentation-services
cd vaadin-documentation-services
bun install
Environment Setup
# Create .env file with your API keys
echo "OPENAI_API_KEY=your_openai_api_key" > .env
echo "PINECONE_API_KEY=your_pinecone_api_key" >> .env
echo "PINECONE_INDEX=your_pinecone_index" >> .env
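Before running the pipeline it can help to check that all three variables are actually set. The helper below is a minimal sketch; `missingEnv` is a hypothetical name and not part of the project.

```typescript
// The three variables written to .env in the setup step above.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"] as const;

// Return the names of required variables that are missing or blank.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => (env[key] ?? "").trim() === "");
}

// In Bun or Node this would typically be called as missingEnv(process.env).
console.log(missingEnv({ OPENAI_API_KEY: "sk-example", PINECONE_INDEX: "" }));
// [ "PINECONE_API_KEY", "PINECONE_INDEX" ]
```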
Running the System
1. Process Documentation (One-time setup)
# Convert AsciiDoc to Markdown with metadata
cd packages/1-asciidoc-converter
bun run convert
# Generate embeddings and populate vector database
cd ../2-embedding-generator
bun run generate
2. Start REST API Server
cd packages/rest-server
bun run start
# Server runs at http://localhost:3001
3. Use MCP Server with IDE Assistant
The MCP server is deployed and available remotely via HTTP transport at:
https://vaadin-mcp.fly.dev/mcp
Configure your IDE assistant to use the Streamable HTTP transport:
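import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
// Point the transport at the hosted MCP endpoint
const transport = new StreamableHTTPClientTransport(
  new URL("https://vaadin-mcp.fly.dev/mcp")
);
const client = new Client({ name: "example-client", version: "1.0.0" }); // placeholder name and version
await client.connect(transport);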
Package Details
Core Types (packages/core-types/)
Shared TypeScript interfaces used across all packages:
- DocumentChunk: Core documentation chunk structure
- RetrievalResult: Search result with relevance scoring
- Framework: Type-safe framework definitions
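To make the framework filtering concrete, here is a minimal sketch of a framework type and a filter that keeps shared content. The lowercase tag values and the `TaggedChunk` shape are assumptions for illustration; the real definitions are the ones in packages/core-types/.

```typescript
// Framework tags as described in this README: Flow (Java), Hilla (React), or shared content.
// The string values are assumed, not taken from the project's source.
type Framework = "flow" | "hilla" | "common";

// Hypothetical chunk shape; the real DocumentChunk interface may differ.
interface TaggedChunk {
  id: string;
  framework: Framework;
  text: string;
}

// Keep chunks for the requested framework plus any shared ("common") content.
function filterByFramework(chunks: TaggedChunk[], wanted: "flow" | "hilla"): TaggedChunk[] {
  return chunks.filter((c) => c.framework === wanted || c.framework === "common");
}

const sample: TaggedChunk[] = [
  { id: "a", framework: "flow", text: "Server-side Java views" },
  { id: "b", framework: "hilla", text: "React views" },
  { id: "c", framework: "common", text: "Theming basics" },
];

console.log(filterByFramework(sample, "flow").map((c) => c.id)); // [ "a", "c" ]
```

Using a string-literal union lets the compiler reject an unknown framework name at the call site instead of silently returning no results.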
AsciiDoc Converter (packages/1-asciidoc-converter/)
Converts Vaadin AsciiDoc documentation to Markdown with metadata:
- Framework Detection: Automatically detects Flow/Hilla/common content
- URL Generation: Creates proper Vaadin.com documentation links
- Include Processing: Handles AsciiDoc include directives
- Metadata Extraction: Preserves semantic information in frontmatter
cd packages/1-asciidoc-converter
bun run convert # Convert all documentation
bun run test # Run framework detection tests
Embedding Generator (packages/2-embedding-generator/)
Creates vector embeddings with hierarchical relationships:
- Hierarchical Chunking: Preserves document structure and relationships
- Parent-Child Links: Creates cross-file and intra-file relationship mapping
- LangChain Integration: Uses MarkdownHeaderTextSplitter for intelligent chunking
- Batch Processing: Efficient embedding generation and Pinecone upsertion
cd packages/2-embedding-generator
bun run generate # Generate embeddings from Markdown
bun run test # Run chunking and relationship tests
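Hierarchical chunking can be sketched without any library. The hand-rolled splitter below stands in for the LangChain splitter the package uses and only illustrates the idea of linking each chunk to its nearest shallower heading. The `HeadingChunk` shape and `chunkByHeadings` name are hypothetical.

```typescript
// Hypothetical chunk shape; the field names are illustrative, not the project's schema.
interface HeadingChunk {
  id: number;
  level: number; // 1 for "#", 2 for "##", and so on
  title: string;
  parentId: number | null; // null for top-level headings
  body: string;
}

// Split Markdown on ATX headings and link each chunk to the nearest shallower heading.
// Text before the first heading is ignored, and "#" lines inside code fences are not special-cased.
function chunkByHeadings(markdown: string): HeadingChunk[] {
  const chunks: HeadingChunk[] = [];
  const open: HeadingChunk[] = []; // stack of headings that enclose the current line
  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      const level = (match[1] ?? "").length;
      const title = (match[2] ?? "").trim();
      // Pop headings at the same or deeper level; the remaining top of the stack is the parent.
      while (open.length > 0 && (open[open.length - 1]?.level ?? 0) >= level) open.pop();
      const parent = open[open.length - 1];
      const chunk: HeadingChunk = { id: chunks.length, level, title, parentId: parent ? parent.id : null, body: "" };
      chunks.push(chunk);
      open.push(chunk);
    } else {
      const current = open[open.length - 1];
      if (current) current.body += (current.body ? "\n" : "") + line;
    }
  }
  return chunks;
}

const doc = "# Grid\nIntro\n## Sorting\nText\n## Filtering\n### Lazy loading\n# Forms";
for (const c of chunkByHeadings(doc)) console.log(c.id, c.title, "parent:", c.parentId);
```

A production splitter would also enforce token limits per chunk and carry cross-file parents, which this sketch leaves out.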
REST Server (packages/rest-server/)
Enhanced API server with hybrid search capabilities:
- Hybrid Search: Semantic + keyword search with RRF fusion
- Framework Filtering: Flow/Hilla/common content filtering
- Document Navigation: /chunk/:chunkId endpoint for parent-child navigation
- Backward Compatibility: Maintains existing API contracts
cd packages/rest-server
bun run start # Start production server
bun run test # Run comprehensive test suite
bun run test:verbose # Detailed test output
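The RRF fusion mentioned in the feature list refers to Reciprocal Rank Fusion, which merges ranked lists by summing 1 / (k + rank) for each document. The sketch below shows the general technique, not the server's code. The constant k = 60 is the commonly used default and is an assumption here, as are the function and variable names.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), with rank starting at 1.
// k = 60 is a commonly used default, not a value taken from this project.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const semanticHits = ["grid", "forms", "routing"];
const keywordHits = ["routing", "grid", "security"];
console.log(reciprocalRankFusion([semanticHits, keywordHits]));
// [ "grid", "routing", "forms", "security" ]
```

Documents that appear in both lists rise to the top even when neither list ranks them first, which is why rank fusion suits combining semantic and keyword results with incomparable raw scores.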
API Endpoints:
- POST /search - Hybrid search with framework filtering
- GET /chunk/:chunkId - Retrieve specific document chunk
- POST /ask - AI-generated answers (with streaming support)
- GET /health - Health check
- GET /vaadin-version - Get latest Vaadin version from GitHub releases
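A request to POST /search might be assembled as in the sketch below. The request body is not documented in this README, so the `query` and `framework` field names are assumptions; check the rest-server package for the actual contract before relying on them.

```typescript
// Hypothetical request body; the real field names are defined by packages/rest-server.
interface SearchRequestBody {
  query: string;
  framework?: "flow" | "hilla";
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the URL and fetch options for POST /search without sending anything.
function buildSearchRequest(baseUrl: string, body: SearchRequestBody): PreparedRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/search",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const searchRequest = buildSearchRequest("http://localhost:3001", { query: "grid sorting", framework: "flow" });
console.log(searchRequest.url); // http://localhost:3001/search
// To send it: const response = await fetch(searchRequest.url, searchRequest.init);
```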
MCP Server (packages/mcp-server/)
Model Context Protocol server for IDE assistant integration:
- Document Tools: search_vaadin_docs and get_full_document
- Full Document Retrieval: Complete documentation pages with context
- Framework Awareness: Intelligent framework detection and filtering
- Error Handling: Graceful degradation for missing content
cd packages/mcp-server
bun run build # Build for distribution
bun run test # Run document-based tests
Available Tools:
- search_vaadin_docs: Search with semantic and keyword matching
- get_full_document: Retrieve complete documentation pages
- get_vaadin_version: Get latest Vaadin version and release timestamp
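MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for `search_vaadin_docs`. The envelope follows the MCP specification, but the argument names (`query`, `framework`) are assumptions, since the tool's input schema is not shown here. A client would normally discover that schema through `tools/list`.

```typescript
// JSON-RPC 2.0 envelope for an MCP "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Tool name from this README; the argument names are illustrative assumptions.
const toolCall = buildToolCall(1, "search_vaadin_docs", { query: "How do I sort a Grid?", framework: "flow" });
console.log(JSON.stringify(toolCall, null, 2));
```

In practice an MCP SDK client builds and sends this message for you; the sketch only shows what travels over the transport.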
Testing
Each package includes comprehensive test suites:
# Test individual packages
cd packages/1-asciidoc-converter && bun run test
cd packages/2-embedding-generator && bun run test
cd packages/rest-server && bun run test
cd packages/mcp-server && bun run test
# Run REST server against live endpoint
cd packages/rest-server && bun run test:server
Performance & Metrics
Search Quality
- 100% Framework Detection Accuracy: Flow, Hilla, and common content correctly identified
- Enhanced Hybrid Search: Semantic + keyword search with native Pinecone reranking dramatically improves relevance
- Contextual Navigation: Parent-child relationships enable better result exploration
- 4,982 Document Chunks: Complete coverage of 378 Vaadin documentation files with 5-level hierarchy
System Performance
- Parallel Processing: Semantic and keyword search executed in parallel with intelligent merging
- Native Reranking: Pinecone's bge-reranker-v2-m3 provides superior result ranking
- Query Preprocessing: Smart keyword extraction with stopword filtering improves search quality
- Efficient Chunking: Optimized token limits with intelligent content splitting
- Clean Architecture: Dependency injection enables easy performance optimization
Production Readiness
- 100% API Backward Compatibility: All existing integrations continue to work
- Robust Error Handling: Graceful fallbacks ensure system reliability
- Fresh Data: Recently updated with complete Vaadin documentation coverage
Deployment
REST Server (fly.io)
The REST server is deployed to fly.io and available at:
- Production: https://vaadin-docs-search.fly.dev
- Health Check: https://vaadin-docs-search.fly.dev/health
MCP Server (fly.io)
The MCP server is deployed to fly.io and available at:
- Production: https://vaadin-mcp.fly.dev/mcp
- Health Check: https://vaadin-mcp.fly.dev/health
Documentation Processing
Automated via GitHub Actions:
- Daily Updates: Documentation re-processed automatically
- Manual Triggers: Can be triggered via GitHub Actions UI
- Error Notifications: Automated alerts for processing failures
Development
Workspace Structure
This project uses Bun workspaces for package management:
bun install # Install all dependencies
bun run build # Build all packages
bun run test # Test all packages
Adding New Features
- Core Types: Add interfaces to packages/core-types/
- Processing: Extend converters in packages/1-asciidoc-converter/ or packages/2-embedding-generator/
- API: Enhance search in packages/rest-server/
- Integration: Update MCP tools in packages/mcp-server/
Architecture Principles
- Single Responsibility: Each package has a clear, focused purpose
- Interface-Based Design: Clean contracts between components
- Dependency Injection: Testable and swappable implementations
- Type Safety: Full TypeScript coverage with strict configuration
Documentation
- Project Plan: Complete project breakdown and progress tracking
- Project Brief: Original requirements and problem definition
- Package READMEs: Detailed documentation for each package
Project Success
This project successfully delivered:
- Sophisticated RAG System: Replaced naive implementation with hierarchically-aware search
- Enhanced User Experience: Agents can now navigate from specific details to broader context
- Production Quality: Clean architecture, comprehensive testing, and error handling
- Framework Intelligence: Accurate Flow/Hilla content separation with common content inclusion
- Developer Integration: Seamless IDE assistant integration via MCP
The system now provides intelligent, context-aware documentation search that understands the hierarchical structure of Vaadin documentation and enables sophisticated agent interactions.
License
MIT - See license file for details.
Built with ❤️ for the Vaadin developer community