<div class="callout" data-callout="info">
<div class="callout-title">Overview</div>
<div class="callout-content">
Learn how to build a lightweight yet powerful RAG system that enables natural language interactions with markdown documentation. This implementation focuses on simplicity and effectiveness, avoiding the complexity of vector databases while maintaining high-quality responses.
</div>
</div>
<div class="topic-area">
## System Architecture
The RAG system consists of four main components working together to provide document-grounded AI responses:
1. **Document Processor**: Parses and chunks markdown files
2. **Search Engine**: Finds relevant document sections
3. **LLM Integration**: Generates contextual responses
4. **Chat Interface**: Handles user interactions
```mermaid
flowchart TD
    User[User] --> |Question| Chat[Chat Interface]
    Chat --> |Query| Search[Search Engine]
    Search --> |Retrieves| Docs[Markdown Documents]
    Search --> |Results| LLM[LLM Processor]
    LLM --> |Response| Chat
```
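End to end, a question flows through these components as a single pipeline. The glue code below is an illustrative sketch, not code from the implementation: `answerQuestion` is a hypothetical name, while `getProcessedDocuments`, `SearchEngine`, `buildSystemPrompt`, and `callLLM` are covered in the sections that follow.
```typescript
// Hypothetical top-level handler tying the four components together
async function answerQuestion(question: string): Promise<string> {
  const documents = await getProcessedDocuments();   // Document Processor (cached)
  const searchEngine = new SearchEngine(documents);  // Search Engine
  const results = searchEngine.search(question);     // top-ranked chunks
  const systemPrompt = buildSystemPrompt(results.map(r => r.document));
  // LLM Integration (sketched in the LLM section below)
  return callLLM(systemPrompt, [{ role: 'user', content: question }]);
}
```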
</div>
<div class="topic-area">
## Document Processing
### Intelligent Chunking
The system uses a paragraph-aware chunking strategy that keeps each chunk within a size budget while carrying overlapping paragraphs across boundaries to preserve context:
```typescript
interface ChunkOptions {
  maxChunkSize: number;
  overlapSize: number;
}

function chunkDocument(content: string, options: ChunkOptions): string[] {
  const { maxChunkSize, overlapSize } = options;
  const chunks: string[] = [];

  // Split on blank lines so chunk boundaries fall between paragraphs
  const paragraphs = content.split(/\n\s*\n/);
  let currentChunk = '';

  for (const paragraph of paragraphs) {
    if (currentChunk.length + paragraph.length > maxChunkSize && currentChunk.length > 0) {
      chunks.push(currentChunk.trim());

      // Carry the last few paragraphs forward so adjacent chunks share context
      const lastParagraphs = currentChunk
        .trim()
        .split(/\n\s*\n/)
        .slice(-3)
        .join('\n\n');
      currentChunk = lastParagraphs.length <= overlapSize
        ? lastParagraphs + '\n\n'
        : '';
    }
    currentChunk += paragraph + '\n\n';
  }

  // Don't drop the final partial chunk
  if (currentChunk.trim().length > 0) {
    chunks.push(currentChunk.trim());
  }

  return chunks;
}
```
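For example (the size values here are illustrative, not tuned defaults from the system):
```typescript
const chunks = chunkDocument(markdownSource, {
  maxChunkSize: 1500, // soft cap on characters per chunk
  overlapSize: 300,   // overlap is only carried forward if it fits this budget
});
```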
### Metadata Extraction
Each document chunk includes rich metadata for better context:
```typescript
interface DocumentChunk {
  id: string;
  path: string;
  title: string;
  content: string;
  metadata: Record<string, any>;
  headings: string[];
  tokens: string[];
  createdAt: Date;
  updatedAt: Date;
}
```
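One way to populate this shape is a small adapter per file. The sketch below assumes the gray-matter package for frontmatter parsing; the `toChunks` helper and its title fallback are illustrative, and `tokenize` is the same tokenizer sketched in the search section below.
```typescript
import { randomUUID } from 'crypto';
import matter from 'gray-matter';

// Hypothetical adapter: turn one markdown file into DocumentChunk records
function toChunks(path: string, raw: string, options: ChunkOptions): DocumentChunk[] {
  const { data, content } = matter(raw); // frontmatter becomes chunk metadata
  const headings = [...content.matchAll(/^#{1,6}\s+(.+)$/gm)].map(m => m[1].trim());
  const now = new Date();

  return chunkDocument(content, options).map(chunk => ({
    id: randomUUID(),
    path,
    title: data.title ?? headings[0] ?? path, // illustrative fallback chain
    content: chunk,
    metadata: data,
    headings,
    tokens: tokenize(chunk),
    createdAt: now,
    updatedAt: now,
  }));
}
```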
</div>
<div class="topic-area">
## Search Implementation
Instead of using complex vector embeddings, we implement a hybrid search approach combining:
1. **Fuzzy Matching**: Using Fuse.js for typo tolerance
2. **Term Frequency**: For content relevance scoring
3. **Phrase Matching**: For exact matches
```typescript
interface SearchResult {
  document: DocumentChunk;
  score: number;
}

class SearchEngine {
  constructor(private documents: DocumentChunk[]) {}

  private calculateRelevance(query: string[], document: DocumentChunk): number {
    const tfScore = this.calculateTF(query, document);
    const phraseScore = this.phraseMatchBoost(query.join(' '), document);
    const headingScore = this.headingMatchScore(query, document);

    // Weighted blend: term frequency dominates; exact phrase and
    // heading hits act as boosts
    return (tfScore * 0.6) + (phraseScore * 0.3) + (headingScore * 0.1);
  }

  search(query: string, limit: number = 5): SearchResult[] {
    const queryTokens = tokenize(query);
    const results = this.documents.map(doc => ({
      document: doc,
      score: this.calculateRelevance(queryTokens, doc)
    }));

    return results
      .filter(result => result.score > THRESHOLD)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
}
```
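The class above leaves `tokenize`, `calculateTF`, and `THRESHOLD` to the reader. One plausible implementation is sketched below as standalone functions; the token-length filter and the 0.2 cutoff are illustrative choices, not values specified by the system.
```typescript
const THRESHOLD = 0.2; // illustrative cutoff; tune against your corpus

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(token => token.length > 2); // drop empty and very short tokens
}

// Average relative frequency of each query term within the chunk
function calculateTF(query: string[], document: DocumentChunk): number {
  if (query.length === 0 || document.tokens.length === 0) return 0;

  const counts = new Map<string, number>();
  for (const token of document.tokens) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }

  const total = query.reduce(
    (sum, term) => sum + (counts.get(term) ?? 0) / document.tokens.length,
    0
  );
  return total / query.length;
}
```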
</div>
<div class="topic-area">
## LLM Integration
The system uses a carefully crafted prompt structure to ensure high-quality, grounded responses:
```typescript
function buildSystemPrompt(relevantDocs: DocumentChunk[]): string {
  const contextSections = relevantDocs.map(doc => `
Source: ${doc.path}
Title: ${doc.title}
Content:
${doc.content}
---`).join('\n\n');

  return `You are an AI assistant specialized in answering questions about the documentation.
Answer questions based ONLY on the following information:
${contextSections}
Guidelines:
1. Use ONLY the provided information
2. If information is not in context, acknowledge the limitation
3. Cite sources when possible
4. Format responses in Markdown
5. Be concise but thorough
6. Explain technical terms`;
}
```
### Context Management
The system maintains conversation history while keeping context relevant:
```typescript
function buildPromptWithRecentContext(
  messages: Message[],
  relevantDocs: DocumentChunk[]
): { systemPrompt: string; userMessages: Message[] } {
  const systemPrompt = buildSystemPrompt(relevantDocs);
  const recentMessages = messages.slice(-6); // Keep conversation focused
  return { systemPrompt, userMessages: recentMessages };
}
```
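To complete the loop, the assembled prompt and trimmed history go to the model. Below is a minimal sketch using the official openai package; the `Message` shape, the model name, and `callLLM` itself are assumptions, since the article does not pin down a provider.
```typescript
import OpenAI from 'openai';

interface Message {
  role: 'user' | 'assistant'; // assumed shape for chat history
  content: string;
}

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function callLLM(systemPrompt: string, userMessages: Message[]): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // illustrative model choice
    messages: [
      { role: 'system', content: systemPrompt },
      ...userMessages,
    ],
  });
  return completion.choices[0].message.content ?? '';
}
```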
</div>
<div class="callout" data-callout="tip">
<div class="callout-title">Performance Optimization</div>
<div class="callout-content">
The system keeps processed documents in an in-memory cache with time-based invalidation, reprocessing the markdown sources at most once per hour. This maintains fast response times without serving stale content:
```typescript
let documentsCache: DocumentChunk[] = [];
let lastProcessed: Date | null = null;

async function getProcessedDocuments() {
  const shouldReprocess =
    documentsCache.length === 0 ||
    !lastProcessed ||
    (new Date().getTime() - lastProcessed.getTime() > 3600000); // 1 hour TTL

  if (shouldReprocess) {
    documentsCache = await processMarkdownDocuments(MARKDOWN_DIRECTORIES);
    lastProcessed = new Date();
  }
  return documentsCache;
}
```
</div>
</div>
<div class="topic-area">
## User Interface
The system provides a clean, responsive chat interface with:
1. **Markdown Rendering**: For formatted responses
2. **Source Citations**: Linking to original documents
3. **Context Awareness**: Understanding conversation flow
```tsx
function ChatMessage({ message }: ChatMessageProps) {
  return (
    <div className="flex justify-start">
      <div className="max-w-[80%] rounded-lg p-3 bg-gray-100">
        <MarkdownRenderer content={message.content} />
        {message.sources && (
          <div className="mt-2 pt-2 border-t">
            <div className="text-xs font-medium">Sources:</div>
            {message.sources.map(source => (
              <SourceCitation key={source.chunkId} source={source} />
            ))}
          </div>
        )}
      </div>
    </div>
  );
}
```
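The `SourceCitation` component is left undefined above; a minimal sketch follows, where the `Source` shape and the `/docs/` route are assumptions rather than part of the described system.
```tsx
interface Source {
  chunkId: string;
  path: string;
  title: string;
}

// Hypothetical citation link back to the original document
function SourceCitation({ source }: { source: Source }) {
  return (
    <a
      href={`/docs/${source.path}`}
      className="block text-xs text-blue-600 hover:underline"
    >
      {source.title}
    </a>
  );
}
```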
</div>
<div class="topic-area">
## Key Benefits
1. **Simplicity**: No complex vector database setup
2. **Performance**: Fast in-memory search and caching
3. **Accuracy**: Document-grounded responses
4. **Maintainability**: Easy to update and extend
5. **Integration**: Works with existing markdown docs
</div>
<div class="callout" data-callout="warning">
<div class="callout-title">Limitations & Considerations</div>
<div class="callout-content">
- Search is keyword-based and may miss semantic relationships
- In-memory processing limits document scale
- No persistent conversation storage
- Requires careful prompt engineering
</div>
</div>
<div class="topic-area">
## Future Enhancements
1. **Vector Search**: Add optional embedding-based search
2. **Conversation Storage**: Implement persistence
3. **Streaming Responses**: Improve response UX
4. **Advanced Context**: Add semantic understanding
5. **Multi-Modal**: Support for images and diagrams
</div>
<div class="topic-area">
## Conclusion
This lightweight RAG implementation demonstrates that you can build a powerful document-grounded AI system without complex infrastructure. By focusing on practical search strategies and careful prompt engineering, we achieve high-quality responses while maintaining simplicity and performance.
</div>