📌 Overview
The Agentic Blogging Assistant is a sophisticated AI-powered content transformation platform that converts technical materials (Jupyter notebooks, Markdown files, Python scripts) into professionally structured blog posts. Built on a hierarchical agent-based architecture, it leverages advanced semantic processing, iterative refinement, and multi-provider LLM integration to produce high-quality technical content.
🎯 Key Features
- Multi-Format Content Processing: Supports .ipynb, .md, and .py file uploads with intelligent parsing
- Hierarchical Agent Architecture: Specialized agents for content parsing, outline generation, and blog drafting
- Advanced Semantic Search: ChromaDB vector storage with HyDE (Hypothetical Document Embeddings) for enhanced retrieval
- Multi-LLM Provider Support: OpenAI, Claude, Gemini, Deepseek, and OpenRouter integration with factory pattern
- Project Management System: Persistent project tracking with milestone-based workflow management
- Iterative Quality Refinement: Multi-criteria scoring with feedback loops (completeness, accuracy, clarity, engagement)
- LangGraph Workflow Orchestration: Complex logic implemented as directed graphs with conditional routing
- Section-Level Caching: Granular performance optimization with hash-based storage
- Type-Safe State Management: Pydantic models throughout the system for validation and consistency
- Content-Aware Chunking: Syntax-aware text splitting for different content types
- Social Media Content Generation: Platform-specific promotional content with structured analysis
- Interactive Streamlit Interface: Tabbed workflow with real-time progress tracking
- Version Control & Feedback: Section-level approval, editing, and regeneration capabilities
- SEO Optimization: Intelligent title and summary generation
- Export Capabilities: Multiple format support with professional formatting
The system employs a sophisticated multi-tier architecture designed for scalability, maintainability, and performance:
┌─────────────────────────────────────────┐
│ Frontend (Streamlit) │
│ ┌─────────────────────────────────┐ │
│ │ Project Management UI │ │
│ │ Workflow Components │ │
│ │ State Management │ │
│ └─────────────────────────────────┘ │
└─────────────────┬───────────────────────┘
│ REST API
┌─────────────────┴───────────────────────┐
│ Backend (FastAPI) │
│ ┌─────────────────────────────────┐ │
│ │ Agent Orchestration │ │
│ │ ┌─────────────────────────┐ │ │
│ │ │ ContentParsingAgent │ │ │
│ │ │ OutlineGeneratorAgent │ │ │
│ │ │ BlogDraftGenerator │ │ │
│ │ │ BlogRefinementAgent │ │ │
│ │ │ SocialMediaAgent │ │ │
│ │ └─────────────────────────┘ │ │
│ └─────────────────────────────────┘ │
└─────────────────┬───────────────────────┘
│
┌─────────────────┴───────────────────────┐
│ Infrastructure Layer │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ ChromaDB │ │ Multi-LLM │ │
│ │ Vector │ │ Provider │ │
│ │ Storage │ │ Abstraction │ │
│ └─────────────┘ └─────────────────┘ │
└─────────────────────────────────────────┘
Purpose: Processes and semantically indexes technical content files
LangGraph Workflow:
validate_file → parse_content → chunk_content → prepare_metadata → store_content
Key Technical Decisions:
- Content Deduplication: SHA256 hashing prevents redundant processing
- Semantic Chunking: 1000-character chunks with 200-character overlap for optimal retrieval
- Parser Factory Pattern: Extensible architecture supporting multiple file formats
- Rich Metadata: Project scoping with timestamps and content type classification
Purpose: Analyzes content and generates structured blog outlines with difficulty assessment
LangGraph Workflow:
analyze_content → assess_difficulty → identify_prerequisites → structure_outline → generate_final
Key Technical Decisions:
- Multi-stage Analysis: Separates topic extraction from complexity assessment
- Difficulty Classification: Automatic Beginner/Intermediate/Advanced categorization
- Cache Optimization: Hash-based outline storage for performance
- User Guidelines Integration: Flexible constraint system for customization
Purpose: Generates complete blog drafts with iterative quality refinement
LangGraph Workflow:
semantic_content_mapper → section_generator → content_enhancer → code_example_extractor → quality_validator
↓
section_finalizer ← [auto_feedback_generator → feedback_incorporator → quality_validator (loop)]
Key Technical Decisions:
- HyDE RAG Implementation: Uses
generate_hypothetical_documentandretrieve_context_with_hydefor enhanced semantic matching - Quality-Driven Iteration: Multi-dimensional scoring with feedback loops
- Section-by-Section Processing: Individual section generation with validation
- Code Example Extraction: Dedicated node for extracting and formatting code blocks
- Persistent Storage: SQLite backend in
root/data/vector_store/ - Embedding Functions: Uses
EmbeddingFactoryto support multiple providers - Single Collection Design: "content" collection with metadata filtering
- Content Hash Deduplication: SHA256 hashing prevents duplicate storage
- Content Chunking: Processes text into manageable chunks for embedding
- Metadata Filtering: Project scoping and content type organization
- Hash-Based Validation: Prevents redundant processing of identical content
- Vector Similarity Search: Semantic matching for relevant content retrieval
- Content Hash Caching: Prevents reprocessing of duplicate files
- Outline Caching: TTL-based storage using FastAPI's TTLCache
- Section Caching: Generated sections cached based on outline hashes
- Job State Caching: Session management with in-memory state storage
- Dependency Injection: Agents initialized with shared vector store and model instances
- Async Operations: All agents support async processing
- Resource Sharing: VectorStoreService and PersonaService shared across agents
- Pytest Framework: Test suite located in
root/backend/tests/ - API Testing: Endpoint validation in
tests/api/ - Service Layer Testing: Unit tests in
tests/services/ - Test Configuration:
pytest.iniwith test discovery settings - Test Requirements: Separate
test-requirements.txtfor testing dependencies
- Quality Validator Node: Automated section quality assessment
- Feedback Generation: Auto-feedback generator for content improvement
- Iterative Refinement: Feedback incorporation loops in blog draft generation
- Content Enhancement: Dedicated enhancer node for improving section quality
- Python 3.8+
- Node.js (for frontend dependencies)
- At least one LLM provider API key (OpenAI, Claude, Gemini, etc.)
Create a .env file in the root/ directory:
# LLM Provider Configuration (configure at least one)
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_claude_key_here
GOOGLE_API_KEY=your_gemini_key_here
DEEPSEEK_API_KEY=your_deepseek_key_here
OPENROUTER_API_KEY=your_openrouter_key_here
# Vector Storage Configuration
CHROMA_DB_PATH=./data/vector_store
# Embedding Provider (optional - defaults to sentence-transformers)
AZURE_OPENAI_ENDPOINT=your_azure_endpoint
AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_API_VERSION=2023-05-15# Make launch script executable
chmod +x launch-parallel.sh
# Launch both backend and frontend
./launch-parallel.sh# Backend Setup
cd root
pip install -r requirements.txt
cd backend
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Frontend Setup (separate terminal)
cd root/frontend
pip install -r requirements.txt
streamlit run new_app_api.py# Run comprehensive test suite
cd root/backend
pytest -v
# Run specific test categories
pytest tests/api/ -v # API endpoint tests
pytest tests/services/ -v # Service layer tests
pytest tests/utils/ -v # Utility function tests- Parallel Launch:
launch-parallel.shruns both services in the same terminal - Individual Launch:
launch.shfor separate terminal deployment - Process Management: Built-in cleanup and signal handling
- Vector Storage:
root/data/vector_store/for ChromaDB persistence - Project Data:
root/data/projects/for project management storage - Upload Directory:
root/data/uploads/for file processing
- Complex Workflow Management: Blog generation requires conditional routing, iterative refinement, and state management
- Debuggability: Graph visualization and step-by-step execution tracking
- Extensibility: Easy addition of new nodes and workflows without architectural changes
- Lightweight Deployment: SQLite backend for simple deployment scenarios
- Embedding Flexibility: Support for multiple embedding providers
- Developer Experience: Easy local development and debugging
- Separation of Concerns: Clean API layer decoupled from UI implementation
- Development Velocity: Rapid prototyping and iteration capabilities
- Production Ready: Async support, automatic API documentation, and robust error handling
The system emphasizes content quality through validation nodes and feedback loops rather than speed-only generation. This reflects the goal of producing high-quality technical content.
root/
├── backend/ # FastAPI application
│ ├── agents/ # Specialized agents with LangGraph workflows
│ │ ├── content_parsing/ # File processing agent
│ │ ├── outline_generator/ # Blog outline generation
│ │ ├── blog_draft_generator/ # Section-by-section drafting
│ │ └── blog_refinement/ # Content refinement and titles
│ ├── models/ # LLM providers and embeddings
│ ├── services/ # Vector store, personas, project management
│ ├── parsers/ # File format parsers (.ipynb, .md, .py)
│ └── tests/ # Comprehensive test suite
├── frontend/ # Streamlit interface
│ ├── components/ # UI components and project management
│ └── services/ # Frontend service layer
└── data/ # Storage directories
├── vector_store/ # ChromaDB persistence
├── projects/ # Project management data
└── uploads/ # Uploaded files