Repository avatar
Other Tools
v1.0.0
active

docs-mcp

com.vaadin/docs-mcp

Provides Vaadin Documentation and help with development tasks

Documentation

Vaadin Documentation RAG Service

A sophisticated, hierarchically-aware Retrieval-Augmented Generation (RAG) system for Vaadin documentation that understands document structure, provides framework-specific filtering, and enables intelligent parent-child navigation through documentation sections.

๐ŸŽฏ Project Overview

This project provides an advanced RAG system with enhanced hybrid search that:

  • Understands Hierarchical Structure: Navigates parent-child relationships within and across documentation files
  • Enhanced Hybrid Search: Combines semantic and intelligent keyword search with native Pinecone reranking for superior relevance
  • Framework Filtering: Intelligently filters content for Vaadin Flow (Java) vs Hilla (React) frameworks
  • Agent-Friendly: Provides MCP (Model Context Protocol) server for seamless IDE assistant integration
  • Production Ready: Clean architecture with dependency injection, comprehensive testing, and error handling

๐Ÿ—๏ธ Architecture

vaadin-documentation-services/
โ”œโ”€โ”€ packages/
โ”‚   โ”œโ”€โ”€ core-types/              # Shared TypeScript interfaces
โ”‚   โ”œโ”€โ”€ 1-asciidoc-converter/    # AsciiDoc โ†’ Markdown + metadata extraction
โ”‚   โ”œโ”€โ”€ 2-embedding-generator/   # Markdown โ†’ Vector database with hierarchical chunking
โ”‚   โ”œโ”€โ”€ rest-server/             # Enhanced REST API with hybrid search + reranking
โ”‚   โ””โ”€โ”€ mcp-server/              # MCP server with hierarchical navigation
โ”œโ”€โ”€ package.json                 # Bun workspace configuration
โ””โ”€โ”€ PROJECT_PLAN.md             # Complete project documentation

Data Flow

flowchart TD
    subgraph "Step 1: Documentation Processing"
        VaadinDocs["๐Ÿ“š Vaadin Docs<br/>(AsciiDoc)"] 
        Converter["๐Ÿ”„ AsciiDoc Converter<br/>โ€ข Framework detection<br/>โ€ข URL generation<br/>โ€ข Markdown output"]
        Processor["โšก Embedding Generator<br/>โ€ข Hierarchical chunking<br/>โ€ข Parent-child relationships<br/>โ€ข OpenAI embeddings"]
    end

    subgraph "Step 2: Enhanced Retrieval"
        Pinecone["๐Ÿ—„๏ธ Pinecone Vector DB<br/>โ€ข Rich metadata<br/>โ€ข Hierarchical relationships<br/>โ€ข Framework tags"]
        RestAPI["๐ŸŒ REST API<br/>โ€ข Enhanced hybrid search<br/>โ€ข Native Pinecone reranking<br/>โ€ข Framework filtering"]
    end

    subgraph "Step 3: Agent Integration"
        MCP["๐Ÿค– MCP Server<br/>โ€ข search_vaadin_docs<br/>โ€ข get_full_document<br/>โ€ข Full document retrieval"]
        IDEs["๐Ÿ’ป IDE Assistants<br/>โ€ข Context-aware search<br/>โ€ข Hierarchical exploration<br/>โ€ข Framework-specific help"]
    end

    VaadinDocs --> Converter
    Converter --> Processor
    Processor --> Pinecone
    Pinecone <--> RestAPI
    RestAPI <--> MCP
    MCP <--> IDEs

    classDef processing fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef storage fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef api fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    classDef agent fill:#fff3e0,stroke:#e65100,stroke-width:2px

    class VaadinDocs,Converter,Processor processing
    class Pinecone,RestAPI storage
    class MCP api
    class IDEs agent

โœจ Key Features

๐Ÿ” Intelligent Search

  • Enhanced Hybrid Search: Combines semantic similarity with intelligent keyword extraction and scoring
  • Native Pinecone Reranking: Uses Pinecone's bge-reranker-v2-m3 for optimal result ranking
  • Framework Awareness: Filters Flow vs Hilla content with common content inclusion
  • Query Preprocessing: Smart keyword extraction with stopword filtering for better search quality

๐ŸŒณ Hierarchical Navigation

  • Parent-Child Relationships: Navigate from specific details to broader context
  • Cross-File Links: Understand relationships between different documentation files
  • Context Breadcrumbs: Maintain navigation context for better user experience

๐ŸŽ›๏ธ Developer Experience

  • MCP Integration: Standardized protocol for IDE assistant integration
  • TypeScript: Full type safety across all packages
  • Comprehensive Testing: Unit tests, integration tests, and hierarchical workflow validation
  • Clean Architecture: Dependency injection and interface-based design

๐Ÿš€ Quick Start

Prerequisites

  • Bun runtime
  • OpenAI API key (for embeddings)
  • Pinecone API key and index

Installation

# Clone and install dependencies
git clone https://github.com/vaadin/vaadin-documentation-services
cd vaadin-documentation-services
bun install

Environment Setup

# Create .env file with your API keys
echo "OPENAI_API_KEY=your_openai_api_key" > .env
echo "PINECONE_API_KEY=your_pinecone_api_key" >> .env
echo "PINECONE_INDEX=your_pinecone_index" >> .env

Running the System

1. Process Documentation (One-time setup)

# Convert AsciiDoc to Markdown with metadata
cd packages/1-asciidoc-converter
bun run convert

# Generate embeddings and populate vector database
cd ../2-embedding-generator
bun run generate

2. Start REST API Server

cd packages/rest-server
bun run start
# Server runs at http://localhost:3001

3. Use MCP Server with IDE Assistant

The MCP server is deployed and available remotely via HTTP transport at: https://vaadin-mcp.fly.dev/mcp

Configure your IDE assistant to use the Streamable HTTP transport:

import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
const transport = new StreamableHTTPClientTransport(
  new URL("https://vaadin-mcp.fly.dev/mcp")
);

๐Ÿ“ฆ Package Details

Core Types (packages/core-types/)

Shared TypeScript interfaces used across all packages:

  • DocumentChunk: Core documentation chunk structure
  • RetrievalResult: Search result with relevance scoring
  • Framework: Type-safe framework definitions

AsciiDoc Converter (packages/1-asciidoc-converter/)

Converts Vaadin AsciiDoc documentation to Markdown with metadata:

  • Framework Detection: Automatically detects Flow/Hilla/common content
  • URL Generation: Creates proper Vaadin.com documentation links
  • Include Processing: Handles AsciiDoc include directives
  • Metadata Extraction: Preserves semantic information in frontmatter
cd packages/1-asciidoc-converter
bun run convert          # Convert all documentation
bun run test            # Run framework detection tests

Embedding Generator (packages/2-embedding-generator/)

Creates vector embeddings with hierarchical relationships:

  • Hierarchical Chunking: Preserves document structure and relationships
  • Parent-Child Links: Creates cross-file and intra-file relationship mapping
  • LangChain Integration: Uses MarkdownHeaderTextSplitter for intelligent chunking
  • Batch Processing: Efficient embedding generation and Pinecone upsertion
cd packages/2-embedding-generator
bun run generate        # Generate embeddings from Markdown
bun run test           # Run chunking and relationship tests

REST Server (packages/rest-server/)

Enhanced API server with hybrid search capabilities:

  • Hybrid Search: Semantic + keyword search with RRF fusion
  • Framework Filtering: Flow/Hilla/common content filtering
  • Document Navigation: /chunk/:chunkId endpoint for parent-child navigation
  • Backward Compatibility: Maintains existing API contracts
cd packages/rest-server
bun run start          # Start production server
bun run test           # Run comprehensive test suite
bun run test:verbose   # Detailed test output

API Endpoints:

  • POST /search - Hybrid search with framework filtering
  • GET /chunk/:chunkId - Retrieve specific document chunk
  • POST /ask - AI-generated answers (with streaming support)
  • GET /health - Health check
  • GET /vaadin-version - Get latest Vaadin version from GitHub releases

MCP Server (packages/mcp-server/)

Model Context Protocol server for IDE assistant integration:

  • Document Tools: search_vaadin_docs and get_full_document
  • Full Document Retrieval: Complete documentation pages with context
  • Framework Awareness: Intelligent framework detection and filtering
  • Error Handling: Graceful degradation for missing content
cd packages/mcp-server
bun run build          # Build for distribution
bun run test           # Run document-based tests

Available Tools:

  • search_vaadin_docs: Search with semantic and keyword matching
  • get_full_document: Retrieve complete documentation pages
  • get_vaadin_version: Get latest Vaadin version and release timestamp

๐Ÿงช Testing

Each package includes comprehensive test suites:

# Test individual packages
cd packages/1-asciidoc-converter && bun run test
cd packages/2-embedding-generator && bun run test  
cd packages/rest-server && bun run test
cd packages/mcp-server && bun run test

# Run REST server against live endpoint
cd packages/rest-server && bun run test:server

๐Ÿ“ˆ Performance & Metrics

Search Quality

  • 100% Framework Detection Accuracy: Flow, Hilla, and common content correctly identified
  • Enhanced Hybrid Search: Semantic + keyword search with native Pinecone reranking dramatically improves relevance
  • Contextual Navigation: Parent-child relationships enable better result exploration
  • 4,982 Document Chunks: Complete coverage of 378 Vaadin documentation files with 5-level hierarchy

System Performance

  • Parallel Processing: Semantic and keyword search executed in parallel with intelligent merging
  • Native Reranking: Pinecone's bge-reranker-v2-m3 provides superior result ranking
  • Query Preprocessing: Smart keyword extraction with stopword filtering improves search quality
  • Efficient Chunking: Optimized token limits with intelligent content splitting
  • Clean Architecture: Dependency injection enables easy performance optimization

Production Readiness

  • 100% API Backward Compatibility: All existing integrations continue to work
  • Robust Error Handling: Graceful fallbacks ensure system reliability
  • Fresh Data: Recently updated with complete Vaadin documentation coverage

๐ŸŒ Deployment

REST Server (fly.io)

The REST server is deployed to fly.io and available at:

  • Production: https://vaadin-docs-search.fly.dev
  • Health Check: https://vaadin-docs-search.fly.dev/health

MCP Server (fly.io)

The MCP server is deployed to fly.io and available at:

  • Production: https://vaadin-mcp.fly.dev/mcp
  • Health Check: https://vaadin-mcp.fly.dev/health

Documentation Processing

Automated via GitHub Actions:

  • Daily Updates: Documentation re-processed automatically
  • Manual Triggers: Can be triggered via GitHub Actions UI
  • Error Notifications: Automated alerts for processing failures

๐Ÿ”ง Development

Workspace Structure

This project uses Bun workspaces for package management:

bun install           # Install all dependencies
bun run build         # Build all packages
bun run test          # Test all packages

Adding New Features

  1. Core Types: Add interfaces to packages/core-types/
  2. Processing: Extend converters in packages/1-asciidoc-converter/ or packages/2-embedding-generator/
  3. API: Enhance search in packages/rest-server/
  4. Integration: Update MCP tools in packages/mcp-server/

Architecture Principles

  • Single Responsibility: Each package has a clear, focused purpose
  • Interface-Based Design: Clean contracts between components
  • Dependency Injection: Testable and swappable implementations
  • Type Safety: Full TypeScript coverage with strict configuration

๐Ÿ“š Documentation

๐Ÿ† Project Success

This project successfully delivered:

โœ… Sophisticated RAG System: Replaced naive implementation with hierarchically-aware search
โœ… Enhanced User Experience: Agents can now navigate from specific details to broader context
โœ… Production Quality: Clean architecture, comprehensive testing, and error handling
โœ… Framework Intelligence: Accurate Flow/Hilla content separation with common content inclusion
โœ… Developer Integration: Seamless IDE assistant integration via MCP protocol

The system now provides intelligent, context-aware documentation search that understands the hierarchical structure of Vaadin documentation and enables sophisticated agent interactions.

๐Ÿ“„ License

MIT - See license file for details.


Built with โค๏ธ for the Vaadin developer community