Agentic Blogging Assistant

📌 Overview

The Agentic Blogging Assistant is a sophisticated AI-powered content transformation platform that converts technical materials (Jupyter notebooks, Markdown files, Python scripts) into professionally structured blog posts. Built on a hierarchical agent-based architecture, it leverages advanced semantic processing, iterative refinement, and multi-provider LLM integration to produce high-quality technical content.

🎯 Key Features

Core Functionality

Multi-Format Content Processing: Supports .ipynb, .md, and .py file uploads with intelligent parsing
Hierarchical Agent Architecture: Specialized agents for content parsing, outline generation, and blog drafting
Advanced Semantic Search: ChromaDB vector storage with HyDE (Hypothetical Document Embeddings) for enhanced retrieval
Multi-LLM Provider Support: OpenAI, Claude, Gemini, Deepseek, and OpenRouter integration with factory pattern
Project Management System: Persistent project tracking with milestone-based workflow management
Iterative Quality Refinement: Multi-criteria scoring with feedback loops (completeness, accuracy, clarity, engagement)

Technical Capabilities

LangGraph Workflow Orchestration: Complex logic implemented as directed graphs with conditional routing
Section-Level Caching: Granular performance optimization with hash-based storage
Type-Safe State Management: Pydantic models throughout the system for validation and consistency
Content-Aware Chunking: Syntax-aware text splitting for different content types
Social Media Content Generation: Platform-specific promotional content with structured analysis

User Experience

Interactive Streamlit Interface: Tabbed workflow with real-time progress tracking
Version Control & Feedback: Section-level approval, editing, and regeneration capabilities
SEO Optimization: Intelligent title and summary generation
Export Capabilities: Multiple format support with professional formatting

🏗️ Technical Architecture

System Overview

The system employs a sophisticated multi-tier architecture designed for scalability, maintainability, and performance:

┌─────────────────────────────────────────┐
│           Frontend (Streamlit)          │
│   ┌─────────────────────────────────┐   │
│   │    Project Management UI        │   │
│   │    Workflow Components         │   │
│   │    State Management            │   │
│   └─────────────────────────────────┘   │
└─────────────────┬───────────────────────┘
                  │ REST API
┌─────────────────┴───────────────────────┐
│           Backend (FastAPI)             │
│   ┌─────────────────────────────────┐   │
│   │      Agent Orchestration        │   │
│   │   ┌─────────────────────────┐   │   │
│   │   │  ContentParsingAgent    │   │   │
│   │   │  OutlineGeneratorAgent  │   │   │
│   │   │  BlogDraftGenerator     │   │   │
│   │   │  BlogRefinementAgent    │   │   │
│   │   │  SocialMediaAgent       │   │   │
│   │   └─────────────────────────┘   │   │
│   └─────────────────────────────────┘   │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────┴───────────────────────┐
│        Infrastructure Layer             │
│   ┌─────────────┐  ┌─────────────────┐  │
│   │   ChromaDB  │  │  Multi-LLM      │  │
│   │   Vector    │  │  Provider       │  │
│   │   Storage   │  │  Abstraction    │  │
│   └─────────────┘  └─────────────────┘  │
└─────────────────────────────────────────┘

Agent Architecture Deep Dive

1. ContentParsingAgent

Purpose: Processes and semantically indexes technical content files

LangGraph Workflow:

validate_file → parse_content → chunk_content → prepare_metadata → store_content

Key Technical Decisions:

Content Deduplication: SHA256 hashing prevents redundant processing
Semantic Chunking: 1000-character chunks with 200-character overlap for optimal retrieval
Parser Factory Pattern: Extensible architecture supporting multiple file formats
Rich Metadata: Project scoping with timestamps and content type classification

2. OutlineGeneratorAgent

Purpose: Analyzes content and generates structured blog outlines with difficulty assessment

LangGraph Workflow:

analyze_content → assess_difficulty → identify_prerequisites → structure_outline → generate_final

Key Technical Decisions:

Multi-stage Analysis: Separates topic extraction from complexity assessment
Difficulty Classification: Automatic Beginner/Intermediate/Advanced categorization
Cache Optimization: Hash-based outline storage for performance
User Guidelines Integration: Flexible constraint system for customization

3. BlogDraftGeneratorAgent

Purpose: Generates complete blog drafts with iterative quality refinement

LangGraph Workflow:

semantic_content_mapper → section_generator → content_enhancer → code_example_extractor → quality_validator
                                                                                              ↓
section_finalizer ← [auto_feedback_generator → feedback_incorporator → quality_validator (loop)]

Key Technical Decisions:

HyDE RAG Implementation: Uses generate_hypothetical_document and retrieve_context_with_hyde for enhanced semantic matching
Quality-Driven Iteration: Multi-dimensional scoring with feedback loops
Section-by-Section Processing: Individual section generation with validation
Code Example Extraction: Dedicated node for extracting and formatting code blocks

Vector Storage & Semantic Intelligence

ChromaDB Implementation

Persistent Storage: SQLite backend in root/data/vector_store/
Embedding Functions: Uses EmbeddingFactory to support multiple providers
Single Collection Design: "content" collection with metadata filtering
Content Hash Deduplication: SHA256 hashing prevents duplicate storage

Retrieval Techniques

Content Chunking: Processes text into manageable chunks for embedding
Metadata Filtering: Project scoping and content type organization
Hash-Based Validation: Prevents redundant processing of identical content
Vector Similarity Search: Semantic matching for relevant content retrieval

Performance Optimization

Caching Implementation

Content Hash Caching: Prevents reprocessing of duplicate files
Outline Caching: TTL-based storage using FastAPI's TTLCache
Section Caching: Generated sections cached based on outline hashes
Job State Caching: Session management with in-memory state storage

Agent Coordination

Dependency Injection: Agents initialized with shared vector store and model instances
Async Operations: All agents support async processing
Resource Sharing: VectorStoreService and PersonaService shared across agents

Quality Assurance & Testing

Testing Infrastructure

Pytest Framework: Test suite located in root/backend/tests/
API Testing: Endpoint validation in tests/api/
Service Layer Testing: Unit tests in tests/services/
Test Configuration: pytest.ini with test discovery settings
Test Requirements: Separate test-requirements.txt for testing dependencies

Quality Features

Quality Validator Node: Automated section quality assessment
Feedback Generation: Auto-feedback generator for content improvement
Iterative Refinement: Feedback incorporation loops in blog draft generation
Content Enhancement: Dedicated enhancer node for improving section quality

🚀 Setup & Deployment

Prerequisites

Python 3.8+
Node.js (for frontend dependencies)
At least one LLM provider API key (OpenAI, Claude, Gemini, etc.)

Environment Configuration

Create a .env file in the root/ directory:

# LLM Provider Configuration (configure at least one)
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_claude_key_here
GOOGLE_API_KEY=your_gemini_key_here
DEEPSEEK_API_KEY=your_deepseek_key_here
OPENROUTER_API_KEY=your_openrouter_key_here

# Vector Storage Configuration
CHROMA_DB_PATH=./data/vector_store

# Embedding Provider (optional - defaults to sentence-transformers)
AZURE_OPENAI_ENDPOINT=your_azure_endpoint
AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_API_VERSION=2023-05-15

Installation & Launch

Quick Start (Parallel Launch)

# Make launch script executable
chmod +x launch-parallel.sh

# Launch both backend and frontend
./launch-parallel.sh

Manual Setup

# Backend Setup
cd root
pip install -r requirements.txt
cd backend
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Frontend Setup (separate terminal)
cd root/frontend
pip install -r requirements.txt
streamlit run new_app_api.py

Testing

# Run comprehensive test suite
cd root/backend
pytest -v

# Run specific test categories
pytest tests/api/ -v          # API endpoint tests
pytest tests/services/ -v     # Service layer tests
pytest tests/utils/ -v        # Utility function tests

Development Features

Launch Scripts

Parallel Launch: launch-parallel.sh runs both services in the same terminal
Individual Launch: launch.sh for separate terminal deployment
Process Management: Built-in cleanup and signal handling

Storage Structure

Vector Storage: root/data/vector_store/ for ChromaDB persistence
Project Data: root/data/projects/ for project management storage
Upload Directory: root/data/uploads/ for file processing

📊 Key Technical Decisions & Rationale

Why LangGraph?

Complex Workflow Management: Blog generation requires conditional routing, iterative refinement, and state management
Debuggability: Graph visualization and step-by-step execution tracking
Extensibility: Easy addition of new nodes and workflows without architectural changes

Why ChromaDB?

Lightweight Deployment: SQLite backend for simple deployment scenarios
Embedding Flexibility: Support for multiple embedding providers
Developer Experience: Easy local development and debugging

Why FastAPI + Streamlit?

Separation of Concerns: Clean API layer decoupled from UI implementation
Development Velocity: Rapid prototyping and iteration capabilities
Production Ready: Async support, automatic API documentation, and robust error handling

Quality-First Design

The system emphasizes content quality through validation nodes and feedback loops rather than speed-only generation. This reflects the goal of producing high-quality technical content.

📁 File Structure

Core Components

root/
├── backend/                 # FastAPI application
│   ├── agents/             # Specialized agents with LangGraph workflows
│   │   ├── content_parsing/        # File processing agent
│   │   ├── outline_generator/      # Blog outline generation
│   │   ├── blog_draft_generator/   # Section-by-section drafting
│   │   └── blog_refinement/        # Content refinement and titles
│   ├── models/             # LLM providers and embeddings
│   ├── services/           # Vector store, personas, project management
│   ├── parsers/            # File format parsers (.ipynb, .md, .py)
│   └── tests/              # Comprehensive test suite
├── frontend/               # Streamlit interface
│   ├── components/         # UI components and project management
│   └── services/           # Frontend service layer
└── data/                   # Storage directories
    ├── vector_store/       # ChromaDB persistence
    ├── projects/           # Project management data
    └── uploads/            # Uploaded files

Name		Name	Last commit message	Last commit date
Latest commit History 130 Commits
.claude		.claude
.github/workflows		.github/workflows
.vscode		.vscode
docs		docs
root		root
.DS_Store		.DS_Store
.env.template		.env.template
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
launch-parallel.sh		launch-parallel.sh
launch.sh		launch.sh
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Agentic Blogging Assistant

Core Functionality

Technical Capabilities

User Experience

🏗️ Technical Architecture

System Overview

Agent Architecture Deep Dive

1. ContentParsingAgent

2. OutlineGeneratorAgent

3. BlogDraftGeneratorAgent

Vector Storage & Semantic Intelligence

ChromaDB Implementation

Retrieval Techniques

Performance Optimization

Caching Implementation

Agent Coordination

Quality Assurance & Testing

Testing Infrastructure

Quality Features

🚀 Setup & Deployment

Prerequisites

Environment Configuration

Installation & Launch

Quick Start (Parallel Launch)

Manual Setup

Testing

Development Features

Launch Scripts

Storage Structure

📊 Key Technical Decisions & Rationale

Why LangGraph?

Why ChromaDB?

Why FastAPI + Streamlit?

Quality-First Design

📁 File Structure

Core Components

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages