<div class="callout" data-callout="info">
<div class="callout-title">Overview</div>
<div class="callout-content">
Learn how to build a lightweight yet powerful RAG system that enables natural language interactions with markdown documentation. This implementation focuses on simplicity and effectiveness, avoiding the complexity of vector databases while maintaining high-quality responses.
</div>
</div>
<div class="topic-area">
## System Architecture
The RAG system consists of four main components working together to provide document-grounded AI responses:
1. **Document Processor**: Parses and chunks markdown files
2. **Search Engine**: Finds relevant document sections
3. **LLM Integration**: Generates contextual responses
4. **Chat Interface**: Handles user interactions
```mermaid
flowchart TD
    User[User] --> |Question| Chat[Chat Interface]
    Chat --> |Query| Search[Search Engine]
    Search --> |Retrieves| Docs[Markdown Documents]
    Search --> |Results| LLM[LLM Processor]
    LLM --> |Response| Chat
```
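End to end, a question flows through these components as a single pipeline. The glue code below is an illustrative sketch, not code from the implementation: `answerQuestion` is a hypothetical name, while `getProcessedDocuments`, `SearchEngine`, `buildSystemPrompt`, and `callLLM` are covered in the sections that follow.
```typescript
// Hypothetical top-level handler tying the four components together
async function answerQuestion(question: string): Promise<string> {
  const documents = await getProcessedDocuments();   // Document Processor (cached)
  const searchEngine = new SearchEngine(documents);  // Search Engine
  const results = searchEngine.search(question);     // top-ranked chunks
  const systemPrompt = buildSystemPrompt(results.map(r => r.document));
  // LLM Integration (sketched in the LLM section below)
  return callLLM(systemPrompt, [{ role: 'user', content: question }]);
}
```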
</div>
<div class="topic-area">
## Document Processing
### Intelligent Chunking
The system uses a paragraph-aware chunking strategy that keeps each chunk within a size budget while carrying overlapping paragraphs across boundaries to preserve context:
```typescript
interface ChunkOptions {
  maxChunkSize: number;
  overlapSize: number;
}

function chunkDocument(content: string, options: ChunkOptions): string[] {
  const { maxChunkSize, overlapSize } = options;
  const chunks: string[] = [];

  // Split on blank lines so chunk boundaries fall between paragraphs
  const paragraphs = content.split(/\n\s*\n/);
  let currentChunk = '';

  for (const paragraph of paragraphs) {
    if (currentChunk.length + paragraph.length > maxChunkSize && currentChunk.length > 0) {
      chunks.push(currentChunk.trim());

      // Carry the last few paragraphs forward so adjacent chunks share context
      const lastParagraphs = currentChunk
        .trim()
        .split(/\n\s*\n/)
        .slice(-3)
        .join('\n\n');
      currentChunk = lastParagraphs.length <= overlapSize
        ? lastParagraphs + '\n\n'
        : '';
    }
    currentChunk += paragraph + '\n\n';
  }

  // Don't drop the final partial chunk
  if (currentChunk.trim().length > 0) {
    chunks.push(currentChunk.trim());
  }

  return chunks;
}
```
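For example (the size values here are illustrative, not tuned defaults from the system):
```typescript
const chunks = chunkDocument(markdownSource, {
  maxChunkSize: 1500, // soft cap on characters per chunk
  overlapSize: 300,   // overlap is only carried forward if it fits this budget
});
```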
### Metadata Extraction
Each document chunk includes rich metadata for better context:
```typescript
interface DocumentChunk {
  id: string;
  path: string;
  title: string;
  content: string;
  metadata: Record<string, any>;
  headings: string[];
  tokens: string[];
  createdAt: Date;
  updatedAt: Date;
}
```
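One way to populate this shape is a small adapter per file. The sketch below assumes the gray-matter package for frontmatter parsing; the `toChunks` helper and its title fallback are illustrative, and `tokenize` is the same tokenizer sketched in the search section below.
```typescript
import { randomUUID } from 'crypto';
import matter from 'gray-matter';

// Hypothetical adapter: turn one markdown file into DocumentChunk records
function toChunks(path: string, raw: string, options: ChunkOptions): DocumentChunk[] {
  const { data, content } = matter(raw); // frontmatter becomes chunk metadata
  const headings = [...content.matchAll(/^#{1,6}\s+(.+)$/gm)].map(m => m[1].trim());
  const now = new Date();

  return chunkDocument(content, options).map(chunk => ({
    id: randomUUID(),
    path,
    title: data.title ?? headings[0] ?? path, // illustrative fallback chain
    content: chunk,
    metadata: data,
    headings,
    tokens: tokenize(chunk),
    createdAt: now,
    updatedAt: now,
  }));
}
```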
</div>
<div class="topic-area">
## Search Implementation
Instead of using complex vector embeddings, we implement a hybrid search approach combining:
1. **Fuzzy Matching**: Using Fuse.js for typo tolerance
2. **Term Frequency**: For content relevance scoring
3. **Phrase Matching**: For exact matches
```typescript
interface SearchResult {
  document: DocumentChunk;
  score: number;
}

class SearchEngine {
  constructor(private documents: DocumentChunk[]) {}

  private calculateRelevance(query: string[], document: DocumentChunk): number {
    const tfScore = this.calculateTF(query, document);
    const phraseScore = this.phraseMatchBoost(query.join(' '), document);
    const headingScore = this.headingMatchScore(query, document);

    // Weighted blend: term frequency dominates; exact phrase and
    // heading hits act as boosts
    return (tfScore * 0.6) + (phraseScore * 0.3) + (headingScore * 0.1);
  }

  search(query: string, limit: number = 5): SearchResult[] {
    const queryTokens = tokenize(query);
    const results = this.documents.map(doc => ({
      document: doc,
      score: this.calculateRelevance(queryTokens, doc)
    }));

    return results
      .filter(result => result.score > THRESHOLD)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
}
```
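The class above leaves `tokenize`, `calculateTF`, and `THRESHOLD` to the reader. One plausible implementation is sketched below as standalone functions; the token-length filter and the 0.2 cutoff are illustrative choices, not values specified by the system.
```typescript
const THRESHOLD = 0.2; // illustrative cutoff; tune against your corpus

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(token => token.length > 2); // drop empty and very short tokens
}

// Average relative frequency of each query term within the chunk
function calculateTF(query: string[], document: DocumentChunk): number {
  if (query.length === 0 || document.tokens.length === 0) return 0;

  const counts = new Map<string, number>();
  for (const token of document.tokens) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }

  const total = query.reduce(
    (sum, term) => sum + (counts.get(term) ?? 0) / document.tokens.length,
    0
  );
  return total / query.length;
}
```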
</div>
<div class="topic-area">
## LLM Integration
The system uses a carefully crafted prompt structure to ensure high-quality, grounded responses:
```typescript
function buildSystemPrompt(relevantDocs: DocumentChunk[]): string {
  const contextSections = relevantDocs.map(doc => `
Source: ${doc.path}
Title: ${doc.title}
Content:
${doc.content}
---`).join('\n\n');

  return `You are an AI assistant specialized in answering questions about the documentation.
Answer questions based ONLY on the following information:
${contextSections}
Guidelines:
1. Use ONLY the provided information
2. If information is not in context, acknowledge the limitation
3. Cite sources when possible
4. Format responses in Markdown
5. Be concise but thorough
6. Explain technical terms`;
}
```
### Context Management
The system maintains conversation history while keeping context relevant:
```typescript
function buildPromptWithRecentContext(
  messages: Message[],
  relevantDocs: DocumentChunk[]
): { systemPrompt: string; userMessages: Message[] } {
  const systemPrompt = buildSystemPrompt(relevantDocs);
  const recentMessages = messages.slice(-6); // Keep conversation focused
  return { systemPrompt, userMessages: recentMessages };
}
```
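To complete the loop, the assembled prompt and trimmed history go to the model. Below is a minimal sketch using the official openai package; the `Message` shape, the model name, and `callLLM` itself are assumptions, since the article does not pin down a provider.
```typescript
import OpenAI from 'openai';

interface Message {
  role: 'user' | 'assistant'; // assumed shape for chat history
  content: string;
}

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function callLLM(systemPrompt: string, userMessages: Message[]): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // illustrative model choice
    messages: [
      { role: 'system', content: systemPrompt },
      ...userMessages,
    ],
  });
  return completion.choices[0].message.content ?? '';
}
```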
</div>
<div class="callout" data-callout="tip">
<div class="callout-title">Performance Optimization</div>
<div class="callout-content">
The system keeps processed documents in an in-memory cache with time-based invalidation, reprocessing the markdown sources at most once per hour. This maintains fast response times without serving stale content:
```typescript
let documentsCache: DocumentChunk[] = [];
let lastProcessed: Date | null = null;

async function getProcessedDocuments() {
  const shouldReprocess =
    documentsCache.length === 0 ||
    !lastProcessed ||
    (new Date().getTime() - lastProcessed.getTime() > 3600000); // 1 hour TTL

  if (shouldReprocess) {
    documentsCache = await processMarkdownDocuments(MARKDOWN_DIRECTORIES);
    lastProcessed = new Date();
  }
  return documentsCache;
}
```
</div>
</div>
<div class="topic-area">
## User Interface
The system provides a clean, responsive chat interface with:
1. **Markdown Rendering**: For formatted responses
2. **Source Citations**: Linking to original documents
3. **Context Awareness**: Understanding conversation flow
```tsx
function ChatMessage({ message }: ChatMessageProps) {
  return (
    <div className="flex justify-start">
      <div className="max-w-[80%] rounded-lg p-3 bg-gray-100">
        <MarkdownRenderer content={message.content} />
        {message.sources && (
          <div className="mt-2 pt-2 border-t">
            <div className="text-xs font-medium">Sources:</div>
            {message.sources.map(source => (
              <SourceCitation key={source.chunkId} source={source} />
            ))}
          </div>
        )}
      </div>
    </div>
  );
}
```
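The `SourceCitation` component is left undefined above; a minimal sketch follows, where the `Source` shape and the `/docs/` route are assumptions rather than part of the described system.
```tsx
interface Source {
  chunkId: string;
  path: string;
  title: string;
}

// Hypothetical citation link back to the original document
function SourceCitation({ source }: { source: Source }) {
  return (
    <a
      href={`/docs/${source.path}`}
      className="block text-xs text-blue-600 hover:underline"
    >
      {source.title}
    </a>
  );
}
```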
</div>
<div class="topic-area">
## Key Benefits
1. **Simplicity**: No complex vector database setup
2. **Performance**: Fast in-memory search and caching
3. **Accuracy**: Document-grounded responses
4. **Maintainability**: Easy to update and extend
5. **Integration**: Works with existing markdown docs
</div>
<div class="callout" data-callout="warning">
<div class="callout-title">Limitations & Considerations</div>
<div class="callout-content">
- Search is keyword-based and may miss semantic relationships
- In-memory processing limits document scale
- No persistent conversation storage
- Requires careful prompt engineering
</div>
</div>
<div class="topic-area">
## Future Enhancements
1. **Vector Search**: Add optional embedding-based search
2. **Conversation Storage**: Implement persistence
3. **Streaming Responses**: Improve response UX
4. **Advanced Context**: Add semantic understanding
5. **Multi-Modal**: Support for images and diagrams
</div>
<div class="topic-area">
## Conclusion
This lightweight RAG implementation demonstrates that you can build a powerful document-grounded AI system without complex infrastructure. By focusing on practical search strategies and careful prompt engineering, we achieve high-quality responses while maintaining simplicity and performance.
</div>