Disclaimer: This blog article has been generated with the assistance of AI. While the content is AI-generated, the software itself and the ideas behind it are the result of real development work and genuine user needs.
Head over to the complete source code on GitHub.
Ever wanted to create a smart assistant that can answer questions specifically about your blog content? In this post, I’ll walk you through building a Retrieval Augmented Generation (RAG) system that indexes all your Jekyll blog posts in a vector database and uses AWS Bedrock Claude to provide intelligent answers based solely on your content.
At a high level, the system works like this:
```
Jekyll Blog Posts → Content Processing → Embeddings → Vector DB (FAISS)
                                                           ↓
User Question → Query Embeddings → Similarity Search → Context Retrieval
                                                           ↓
            Context + Question → AWS Bedrock Claude → Contextualized Answer
```
Traditional Search Limitations: Standard blog search relies on keyword matching, missing semantic relationships and context.
RAG Advantages: Instead of matching keywords, a RAG system retrieves passages by semantic similarity, grounds its answers in your actual content, and can cite which posts it drew from.
Before we dive in, ensure you have:

- A Jekyll blog with posts in `_posts/` (or a similar directory)
- Python 3 installed
- An AWS account with Bedrock access enabled for Claude models
The system automatically finds Jekyll posts in common locations:
```python
post_patterns = [
    self.blog_path / "_posts" / "*.md",
    self.blog_path / "_posts" / "*.markdown",
    self.blog_path / "blog" / "*.md",
    self.blog_path / "posts" / "*.md",
]
```
It then parses the frontmatter and content, extracting metadata like titles, dates, tags, and categories while cleaning the markdown for better indexing.
Long blog posts are split into overlapping chunks to improve retrieval accuracy:
```python
def chunk_content(self, content: str, max_tokens: int = 500, overlap: int = 50):
    """Split content into overlapping chunks for better retrieval."""
    tokens = self.tokenizer.encode(content)
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunk_tokens = tokens[start:end]
        chunk_text = self.tokenizer.decode(chunk_tokens)
        chunks.append(chunk_text)
        if end == len(tokens):
            break
        start = end - overlap
    return chunks
```
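To make the overlap concrete, here is a standalone version of the same sliding-window idea over whitespace tokens (a deliberate simplification; the real implementation tokenizes with tiktoken):

```python
def chunk_words(text: str, max_tokens: int = 8, overlap: int = 2) -> list:
    """Sliding-window chunking over whitespace-separated tokens."""
    tokens = text.split()
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(' '.join(tokens[start:end]))
        if end == len(tokens):
            break
        start = end - overlap  # step back so adjacent chunks share `overlap` tokens
    return chunks

chunks = chunk_words(' '.join(str(i) for i in range(20)), max_tokens=8, overlap=2)
```

Each chunk repeats the last two tokens of the previous one, so a sentence split at a boundary still appears intact in at least one chunk.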
Why Chunking Matters: Embedding models have limited input lengths, and a single vector for a long post blurs its topics together. Smaller chunks give the retriever precise targets, and the overlap keeps sentences that straddle a chunk boundary intact in at least one chunk.
We use Sentence Transformers for creating semantic embeddings:
```python
self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# Combine title and content for richer context
text_to_embed = f"Title: {post.title}\n\nContent: {chunk}"
embeddings = self.embedding_model.encode(all_chunks)
```
The all-MiniLM-L6-v2 model produces compact 384-dimensional embeddings that capture semantic similarity well while staying fast enough to run comfortably on a CPU.
FAISS (Facebook AI Similarity Search) provides lightning-fast similarity search:
```python
# Create index optimized for cosine similarity
dimension = embeddings.shape[1]
self.faiss_index = faiss.IndexFlatIP(dimension)

# Normalize for cosine similarity
faiss.normalize_L2(embeddings)
self.faiss_index.add(embeddings.astype('float32'))
```
The system uses Claude 3.5 Sonnet via Bedrock for intelligent responses:
```python
def query_bedrock_claude(self, prompt: str, context: str) -> str:
    system_prompt = f"""You are an AI assistant that answers questions based ONLY on the provided blog post content.

Rules:
1. ONLY use information from the provided blog post excerpts
2. If the blog posts don't contain enough information, say so explicitly
3. Always cite which blog post(s) you're referencing by title
4. Do not use any external knowledge beyond what's in the blog posts
5. Be concise but thorough in your responses

Blog post content:
{context}"""

    response = self.bedrock_client.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "system": system_prompt,
            "messages": [{"role": "user", "content": f"Question: {prompt}"}]
        })
    )
    return json.loads(response['body'].read())['content'][0]['text']
```
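Tying retrieval and generation together, the retrieved chunks get formatted into the context string that fills the `{context}` slot of the system prompt above. This is a sketch: the chunk dict keys and the `ask`/`search_similar_chunks` wiring are illustrative, and `ask` would live on the RAG class:

```python
def build_context(chunks: list) -> str:
    """Format retrieved chunks into the context block for the system prompt."""
    sections = []
    for c in chunks:
        sections.append(f"--- From \"{c['title']}\" ---\n{c['text']}")
    return '\n\n'.join(sections)

def ask(self, question: str, top_k: int = 5) -> str:
    """Full RAG loop (sketch): retrieve, build context, generate."""
    chunks = self.search_similar_chunks(question, top_k=top_k)
    context = build_context(chunks)
    return self.query_bedrock_claude(question, context)
```

Labeling each excerpt with its post title is what lets Claude satisfy rule 3 (citing posts by title) without any extra bookkeeping.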
Install the dependencies:

```shell
pip install boto3 sentence-transformers faiss-cpu python-frontmatter tiktoken numpy
```
Choose your preferred method:
```shell
# Option A: AWS CLI
aws configure

# Option B: Environment variables
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key

# Option C: IAM roles (if running on EC2)
```
Modify the blog path in the script to match your Jekyll site:
```python
rag = JekyllBlogRAG(
    blog_path="/your/jekyll/blog/path",
    aws_region="ap-south-1"
)
```
Once set up, you can ask sophisticated questions:
Technical Implementation Questions:
Q: "How do I implement authentication in React applications?"
A: Based on your blog post "React Authentication Best Practices", you can implement authentication using JWT tokens with...
Architecture and Design:
Q: "What are the pros and cons of microservices architecture?"
A: In your post "Microservices vs Monoliths: A Practical Guide", you outlined several key advantages of microservices...
Performance Optimization:
Q: "How can I optimize database queries for better performance?"
A: Your blog post "Database Optimization Techniques" covers several strategies including indexing, query optimization...
Track file modification times to only re-index changed posts:
```python
def needs_reindexing(self, file_path: Path) -> bool:
    """Check if file needs re-indexing based on modification time."""
    last_indexed = self.get_last_indexed_time(file_path)
    file_mtime = file_path.stat().st_mtime
    return file_mtime > last_indexed
```
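One simple way to back `get_last_indexed_time` is a JSON file mapping each post path to the mtime it had when last indexed. The class, file name, and helpers below are illustrative, not the exact implementation:

```python
import json
from pathlib import Path

class IndexState:
    """Persist last-indexed mtimes so unchanged posts can be skipped."""

    def __init__(self, state_file: str = '.rag_index_state.json'):
        self.state_file = Path(state_file)
        self.mtimes = {}
        if self.state_file.exists():
            self.mtimes = json.loads(self.state_file.read_text())

    def get_last_indexed_time(self, file_path: Path) -> float:
        return self.mtimes.get(str(file_path), 0.0)

    def mark_indexed(self, file_path: Path) -> None:
        self.mtimes[str(file_path)] = file_path.stat().st_mtime
        self.state_file.write_text(json.dumps(self.mtimes))

    def needs_reindexing(self, file_path: Path) -> bool:
        return file_path.stat().st_mtime > self.get_last_indexed_time(file_path)
```

Unknown paths default to 0.0, so new posts always look stale and get indexed on the first run.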
Add filters for categories, tags, or date ranges:
```python
def search_with_filters(self, query: str, categories: List[str] = None,
                        date_range: tuple = None) -> List[Dict]:
    """Search with additional filtering criteria."""
    results = self.search_similar_chunks(query, top_k=20)
    if categories:
        results = [r for r in results if any(cat in r['categories'] for cat in categories)]
    if date_range:
        results = [r for r in results if date_range[0] <= r['date'] <= date_range[1]]
    return results[:5]  # Return top 5 after filtering
```
Extend to handle images and code snippets:
```python
def extract_code_blocks(self, content: str) -> List[Dict]:
    """Extract and index code blocks separately."""
    code_pattern = r'```(\w+)?\n(.*?)\n```'
    matches = re.findall(code_pattern, content, re.DOTALL)
    return [{'language': lang, 'code': code} for lang, code in matches]
```
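As a quick sanity check, here is the same pattern run standalone against a small markdown string (the backticks are built with `chr(96)` purely so the example nests cleanly inside this post):

```python
import re

FENCE = chr(96) * 3  # a literal triple backtick
CODE_PATTERN = re.compile(FENCE + r'(\w+)?\n(.*?)\n' + FENCE, re.DOTALL)

def extract_code_blocks(content: str) -> list:
    """Extract (language, code) pairs from fenced markdown blocks."""
    return [{'language': lang or 'text', 'code': code}
            for lang, code in CODE_PATTERN.findall(content)]

sample = f"Intro\n{FENCE}python\nprint('hi')\n{FENCE}\nOutro"
blocks = extract_code_blocks(sample)
```

Blocks with no language tag fall back to `'text'`, which makes downstream filtering by language simpler.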
```python
# Debug post discovery
post_files = self.find_posts()
print(f"Checking paths: {[str(p) for p in post_patterns]}")
print(f"Found files: {[str(f) for f in post_files]}")
```
Ensure sensitive information isn’t indexed:
```python
def should_index_post(self, post: BlogPost) -> bool:
    """Check if post should be indexed (skip drafts, private posts)."""
    if 'draft' in post.categories or 'private' in post.tags:
        return False
    return True
```
Validate and sanitize user inputs:
```python
def sanitize_query(self, query: str) -> str:
    """Sanitize user query to prevent injection attacks."""
    # Remove potentially harmful characters
    return re.sub(r'[<>\"\'%;()&+]', '', query).strip()
```
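Running the sanitizer over a hostile-looking query shows what survives. Note this is a blunt character blocklist, a sketch rather than a complete defense; prompt-injection in particular needs more than character stripping:

```python
import re

def sanitize_query(query: str) -> str:
    """Strip characters commonly used in injection payloads."""
    return re.sub(r'[<>\"\'%;()&+]', '', query).strip()

cleaned = sanitize_query('<script>alert("x")</script>  ')
# cleaned == 'scriptalertx/script'
```

Ordinary punctuation like `?` passes through untouched, so normal questions are unaffected.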
Implement rate limiting for production use:
```python
from time import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests: int = 10, time_window: int = 60):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = defaultdict(list)

    def allow_request(self, client_id: str) -> bool:
        now = time()
        client_requests = self.requests[client_id]
        # Remove requests that have aged out of the window
        client_requests[:] = [req_time for req_time in client_requests
                              if now - req_time < self.time_window]
        if len(client_requests) < self.max_requests:
            client_requests.append(now)
            return True
        return False
```
Build a React/Vue frontend:
```javascript
// Example API integration
const askQuestion = async (question) => {
  const response = await fetch('/api/ask', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({question})
  });
  return response.json();
};
```
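The `/api/ask` endpoint that frontend calls could be a thin wrapper around the RAG class. Flask is an assumption here (any web framework works), and `rag.ask` stands in for whatever query method your RAG instance exposes:

```python
from flask import Flask, jsonify, request

def create_app(rag) -> Flask:
    """Wrap a RAG instance in a minimal JSON API."""
    app = Flask(__name__)

    @app.route('/api/ask', methods=['POST'])
    def ask():
        payload = request.get_json(silent=True) or {}
        question = (payload.get('question') or '').strip()
        if not question:
            return jsonify({'error': 'question is required'}), 400
        return jsonify({'answer': rag.ask(question)})

    return app
```

Run it with `create_app(rag).run(port=5000)`; this is also where the rate limiter and query sanitizer from above would slot in.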
Track popular queries and content performance:
```python
def log_query_analytics(self, query: str, results: List[Dict],
                        response_time: float):
    """Log query analytics for insights."""
    analytics_data = {
        'timestamp': datetime.now(),
        'query': query,
        'num_results': len(results),
        'response_time': response_time,
        'top_posts': [r['title'] for r in results[:3]]
    }
    # Store in database or analytics service
    self.store_analytics(analytics_data)
```
Extend to support multiple languages:
```python
def detect_language(self, content: str) -> str:
    """Detect content language for appropriate processing."""
    from langdetect import detect
    return detect(content)

def get_embedding_model(self, language: str) -> SentenceTransformer:
    """Select appropriate embedding model for language."""
    models = {
        'en': 'all-MiniLM-L6-v2',
        'hi': 'sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2',
        # Add more language models as needed
    }
    return SentenceTransformer(models.get(language, models['en']))
```
Building a RAG system for your Jekyll blog creates a powerful way to make your content more accessible and useful. The combination of semantic search with Claude’s reasoning capabilities provides responses that are both accurate and contextually rich.
The system we’ve built offers:

- Semantic search over your posts instead of keyword matching
- Answers grounded strictly in your own content, with post titles cited
- A local FAISS index that is cheap to build and fast to query
- Clear extension points: incremental indexing, filtered search, rate limiting, and a web frontend
Whether you’re building this for personal use or considering it for a larger content site, the principles and code provided give you a solid foundation to build upon.