Refabric Brand Intelligence Agent

<This part is not AI generated>

I approached this project as a pipeline that can be broken into stages, mainly: scraping, processing (text + image), and output generation. A shared context object is passed through the stages. It runs locally; only ANTHROPIC_API_KEY is required.

Scraping. No brand-specific knowledge in the crawler. Start with a large pool, extract images, and let the next stage filter them. JS-rendered pages are ignored for now; I would add Playwright in the next iteration.

Processing (Text + Image). Text and image processing run as separate stages. Text is handled by LLMs (Haiku summarizes chunks, Sonnet extracts brand voice). For images, CLIP and pHash handle filtering, and clustering uses UMAP + k-selection.

Output. Build a JSON object and render it into a PDF.

AI Usage

I used Claude Code to accelerate the process.

Turning architectural notes into implementation
Boilerplate, tests, fixtures
Optimization and edge case hunting

What I manually verified:

Full pipeline on 10+ brands
Tweaked and fine-tuned parameters
Error boundaries

If I had time

I would add

FashionCLIP support
Better color palette matching (skin/tan fabric issues, object/background detection, etc.)
Cross-modal coherence check (visual cluster labels vs brand voice embeddings)
Playwright auto-trigger when crawl yield is below threshold (on JS-heavy storefronts)
GPU auto-detection (CPU is fine at ~150 images, breaks above ~2000)
Stage-per-worker architecture with a job queue between them (each scaled independently)

</This part is not AI generated>

The brief: give it a brand name and URL, get back a structured Brand DNA (color palette, garment mix, aesthetic clusters, brand voice) as a PDF and a JSON sidecar.

Approach

The interesting constraint here was that the system had to work on any fashion brand without code changes. No per-site selectors, no scrapers tuned to a specific DOM. That ruled out the obvious approach and forced a more principled one.

The solution is a brand-agnostic pipeline: crawl generically, let CLIP zero-shot classify what's fashion, deduplicate in embedding space, cluster aesthetically, then use LLMs to synthesize brand voice from the text corpus. Each stage reads from and writes to a shared BrandContext. If one stage fails, the run continues and you get a partial dossier instead of a crash.

Quick Start

```bash
pip install -e ".[dev]"
export ANTHROPIC_API_KEY="your-key"

make run BRAND=cos             # full pipeline
make render-pdf BRAND=cos      # re-render PDF from existing run, no crawl
make test                      # unit tests + eval regression gate
make eval                      # offline quality report on existing dossiers
```

Adding a Brand

Add an entry to the brands list in brands.yaml:

```yaml
brands:
  - id: my_brand
    name: My Brand
    domain: https://my-brand.com
    social:
      instagram: my_brand_handle # optional
```

No code changes are needed.
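A minimal sketch of the checks that config validation could run on an entry like the one above. It assumes the YAML has already been parsed into plain dicts (for example with PyYAML's `yaml.safe_load`); `BrandConfig` and `parse_brands` are hypothetical names, not the project's actual API.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

REQUIRED_KEYS = ("id", "name", "domain")


@dataclass(frozen=True)
class BrandConfig:
    # Hypothetical config object mirroring the keys of a brands.yaml entry.
    id: str
    name: str
    domain: str
    social: dict = field(default_factory=dict)


def parse_brands(raw: dict) -> list:
    """Validate a parsed brands.yaml mapping and return BrandConfig objects."""
    brands, seen = [], set()
    for entry in raw.get("brands") or []:
        missing = [k for k in REQUIRED_KEYS if not entry.get(k)]
        if missing:
            raise ValueError(f"brand entry {entry!r} is missing {missing}")
        url = urlparse(entry["domain"])
        if url.scheme not in ("http", "https") or not url.netloc:
            raise ValueError(f"{entry['id']}: domain must be an absolute http(s) URL")
        if entry["id"] in seen:
            raise ValueError(f"duplicate brand id {entry['id']!r}")
        seen.add(entry["id"])
        brands.append(
            BrandConfig(
                id=entry["id"],
                name=entry["name"],
                domain=entry["domain"],
                social=dict(entry.get("social") or {}),  # the social block is optional
            )
        )
    return brands
```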

Pipeline Architecture

```
brands.yaml → CLI → Pipeline
                     ├── CrawlStage        sitemap + BFS → images + text corpus
                     ├── SocialStage       public Instagram OG metadata, graceful
                     ├── VisionStage       CLIP filter → dedup → color → embed → hero → garment+patterns → cluster
                     ├── TextAnalysisStage Haiku map → Sonnet synthesis + cluster labels
                     └── PDFStage          dossier.json + dossier.pdf
```

Every stage implements a two-method protocol: name and run(ctx) → ctx. The pipeline wraps each in try/except. Failures land in ctx.failures and manifest.json. State flows through a single BrandContext object.
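The stage contract and failure isolation can be sketched as follows. This is an illustration under assumptions, not the project's code: `name` is modeled as a class attribute, and the `BrandContext` fields shown (`data`, `failures`, `timings`) plus the two toy stages are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class BrandContext:
    # Hypothetical fields; the real context carries images, corpus, clusters, etc.
    brand_id: str
    data: dict = field(default_factory=dict)
    failures: list = field(default_factory=list)
    timings: dict = field(default_factory=dict)


class Stage(Protocol):
    name: str

    def run(self, ctx: BrandContext) -> BrandContext: ...


class Pipeline:
    def __init__(self, stages: list) -> None:
        self.stages = list(stages)

    def run(self, ctx: BrandContext) -> BrandContext:
        for stage in self.stages:
            start = time.perf_counter()
            try:
                ctx = stage.run(ctx)
            except Exception as exc:  # record the failure and continue with the next stage
                ctx.failures.append({"stage": stage.name, "error": repr(exc)})
            finally:
                ctx.timings[stage.name] = time.perf_counter() - start
        return ctx


# Two toy stages to show that one failure doesn't stop the run.
class FailingStage:
    name = "social"

    def run(self, ctx: BrandContext) -> BrandContext:
        raise RuntimeError("login wall")


class MarkerStage:
    name = "vision"

    def run(self, ctx: BrandContext) -> BrandContext:
        ctx.data["vision_ran"] = True
        return ctx
```

Running `Pipeline([FailingStage(), MarkerStage()]).run(BrandContext("cos"))` still executes the second stage and leaves one entry in `ctx.failures`, which is what a partial dossier relies on.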

Key Decisions

Dumb crawler, smart filter. No per-site selectors. Discovery is sitemap-first (sitemap indexes are recursed, locale duplicates collapsed to one canonical path) with a BFS fallback; everything is collected, and CLIP zero-shot classification decides what's fashion. You pull more than you need, but inference is local and fast.
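A minimal sketch of sitemap recursion and locale collapsing, using only the standard library. Assumptions: `fetch` is an injected function returning the XML text (in the crawler it would be a rate-limited HTTP GET), and the locale heuristic (a leading two-letter segment, optionally with a region such as `en-us` or `en_GB`) is illustrative and can misfire on short non-locale segments.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set
from urllib.parse import urlparse, urlunparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Heuristic: a leading path segment like /en, /fr, /en-us or /en_GB is a locale.
LOCALE_PREFIX = re.compile(r"^/[a-z]{2}(?:[-_][A-Za-z]{2})?(?=/|$)")


def canonical_path(url: str) -> str:
    """Drop the locale prefix, query and fragment so locale variants compare equal."""
    parts = urlparse(url)
    path = LOCALE_PREFIX.sub("", parts.path) or "/"
    return urlunparse((parts.scheme, parts.netloc, path, "", "", ""))


def sitemap_urls(sitemap_url: str, fetch: Callable[[str], str],
                 seen: Optional[Set[str]] = None) -> List[str]:
    """Collect page URLs, recursing into <sitemapindex> documents."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against cycles between index files
        return []
    seen.add(sitemap_url)
    root = ET.fromstring(fetch(sitemap_url))
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    if root.tag == SITEMAP_NS + "sitemapindex":
        return [url for child in locs for url in sitemap_urls(child, fetch, seen)]
    return locs


def discover_pages(sitemap_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Keep one real URL (the first seen) per canonical path."""
    by_canonical = {}
    for url in sitemap_urls(sitemap_url, fetch):
        by_canonical.setdefault(canonical_path(url), url)
    return list(by_canonical.values())
```

Keeping the first real URL, rather than the stripped canonical path, avoids requesting a path that may not exist without its locale prefix.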

One embedding space. OpenCLIP ViT-B/32 handles filtering, dedup, garment scoring, and clustering. A cosine threshold only means something in the space it was calibrated in: bringing in FashionCLIP would silently shift the geometry, and every downstream constant (0.95 dedup, 0.90 cluster merge, logit scale) would need recalibrating. Not worth it at this scale. Dedup reuses the embeddings already captured during filtering, so nothing is re-encoded. pHash (Hamming distance ≤ 8) catches resized/recompressed duplicates first; CLIP cosine similarity catches semantic ones.
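The two-tier dedup described above could look like this greedy sketch: the cheap pHash Hamming check runs first and the CLIP cosine check second, both against images already kept. Assumptions: `ImageCandidate` and its fields are hypothetical, embeddings are non-zero, and preferring the largest rendition is an illustrative choice, not necessarily what the project does.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ImageCandidate:
    key: str
    phash: int               # 64-bit perceptual hash
    embedding: List[float]   # CLIP embedding captured during filtering (not re-encoded)
    pixels: int              # width * height; used only to prefer the largest rendition


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def dedup(cands: List[ImageCandidate], max_hamming: int = 8,
          min_cosine: float = 0.95) -> List[ImageCandidate]:
    kept: List[ImageCandidate] = []
    for cand in sorted(cands, key=lambda c: c.pixels, reverse=True):
        is_dup = any(
            hamming(cand.phash, k.phash) <= max_hamming            # pixel-level duplicate
            or cosine(cand.embedding, k.embedding) >= min_cosine   # semantic duplicate
            for k in kept
        )
        if not is_dup:
            kept.append(cand)
    return kept
```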

Support-capped clustering with post-merge. UMAP reduces to 50 dimensions, then k-means runs with k selected by silhouette score in [3, 6]. k is also bounded by N // min_images_per_cluster, so you can't get 6 clusters of 2 from 13 images. After k-selection, clusters whose centroids are cosine-similar (≥ 0.90) in the original CLIP space are merged: UMAP sometimes splits one aesthetic into two separable blobs, and silhouette rewards both. A silhouette of 0.3–0.5 at ~100 images is normal for a brand with a unified look, not a failure. Confidence is gated on it.
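The two pieces around k-means (the support cap on k and the centroid post-merge) can be sketched in plain Python. The function names are hypothetical, UMAP and k-means themselves are omitted, and merging here is transitive via union-find, which is an assumption about how chains of similar centroids are handled.

```python
import math
from typing import List, Sequence


def candidate_ks(n_images: int, min_images_per_cluster: int,
                 k_min: int = 3, k_max: int = 6) -> List[int]:
    """k values to try with silhouette, capped so each cluster can have enough support.
    An empty list means there are too few images to cluster at this support level."""
    upper = min(k_max, n_images // min_images_per_cluster)
    return list(range(k_min, upper + 1))


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def merge_similar_clusters(labels: List[int], centroids: List[Sequence[float]],
                           threshold: float = 0.90) -> List[int]:
    """Merge clusters whose CLIP-space centroids have cosine >= threshold,
    then renumber labels 0..m-1 in order of cluster index."""
    parent = list(range(len(centroids)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if _cosine(centroids[i], centroids[j]) >= threshold:
                parent[find(j)] = find(i)

    compact = {}
    for i in range(len(centroids)):
        compact.setdefault(find(i), len(compact))
    return [compact[find(label)] for label in labels]
```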

Adaptive LLM tiering. A corpus under ~12k characters goes straight to Sonnet. Larger corpora go through a Haiku map step first (per-page summaries, 6 concurrent), then Sonnet reduces the summaries. Cluster labels come from concurrent Sonnet vision calls (4 workers). Threads, not asyncio: the sync Anthropic client is thread-safe and the pipeline is synchronous end-to-end, so ThreadPoolExecutor is the right tool.
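The tiering logic reduces to a size check and a thread-pool map. In this sketch `summarize` and `synthesize` are hypothetical stand-ins for the Haiku and Sonnet calls, injected as plain functions so the control flow can be shown without an API key; joining pages with blank lines is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

DIRECT_LIMIT_CHARS = 12_000  # "~12k chars" threshold from the text above


def synthesize_voice(
    pages: List[str],
    summarize: Callable[[str], str],   # stand-in for a per-page Haiku call
    synthesize: Callable[[str], str],  # stand-in for the Sonnet synthesis call
    map_workers: int = 6,
) -> str:
    corpus = "\n\n".join(pages)
    if len(corpus) < DIRECT_LIMIT_CHARS:
        return synthesize(corpus)  # small corpus: a single Sonnet call
    # Map: summarize pages concurrently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=map_workers) as pool:
        summaries = list(pool.map(summarize, pages))
    # Reduce: one synthesis call over the summaries.
    return synthesize("\n\n".join(summaries))
```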

Soft garment voting. Hard argmax discards confidence: a barely-won vote and a certain one look the same. Instead, each image contributes temperature-scaled probability mass across all categories, and the pipeline sums the expected counts. The logit scale is 100.0, matching CLIP's training temperature. Without it, softmax over raw cosines is near-uniform and every styling axis collapses to ~0.5.
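A small sketch of soft voting: each image's cosine similarities to the category prompts are scaled by 100, passed through softmax, and summed per category. The function names and category list are hypothetical; only the logit scale comes from the text. Comparing `softmax([0.30, 0.25])` (about 0.993 for the first category) with `softmax([0.30, 0.25], scale=1.0)` (about 0.512) shows the collapse toward 0.5 when the scale is omitted.

```python
import math
from typing import Dict, List, Sequence

LOGIT_SCALE = 100.0  # matches CLIP's training temperature, per the text above


def softmax(cosines: Sequence[float], scale: float = LOGIT_SCALE) -> List[float]:
    logits = [scale * c for c in cosines]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_garment_counts(per_image_cosines: List[Sequence[float]],
                        categories: List[str],
                        scale: float = LOGIT_SCALE) -> Dict[str, float]:
    """Expected count per category: sum of each image's probability mass."""
    counts = dict.fromkeys(categories, 0.0)
    for cosines in per_image_cosines:
        for category, p in zip(categories, softmax(cosines, scale)):
            counts[category] += p
    return counts
```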

Pantone matching. Each extracted color is matched to the nearest Pantone in the full library by ΔE in LAB space, because RGB distance isn't perceptually uniform. 120+ fashion-specific color names reduce how often common colors fall back to a Pantone name. Each palette entry also links to the catalog image that most prominently features that color.
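Nearest-color matching in LAB can be sketched with the standard sRGB (D65) to CIELAB conversion and the CIE76 ΔE (Euclidean distance in LAB). Which ΔE formula the project uses isn't stated, so CIE76 is an assumption here, and `DEMO_LIBRARY` holds made-up stand-in swatches, not real Pantone data.

```python
import math
from typing import Dict, Tuple

Lab = Tuple[float, float, float]
WHITE_D65 = (0.95047, 1.00000, 1.08883)


def _linearize(channel: int) -> float:
    c = channel / 255.0  # undo the sRGB transfer curve
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(rgb: Tuple[int, int, int]) -> Lab:
    r, g, b = (_linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ (D65 white point)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), WHITE_D65))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(p: Lab, q: Lab) -> float:
    return math.dist(p, q)  # CIE76: Euclidean distance in LAB


def nearest_color(rgb: Tuple[int, int, int], library: Dict[str, Lab]) -> Tuple[str, float]:
    lab = rgb_to_lab(rgb)
    name = min(library, key=lambda n: delta_e76(lab, library[n]))
    return name, delta_e76(lab, library[name])


# Stand-in swatches for illustration only; NOT real Pantone values.
DEMO_LIBRARY = {
    "demo-black": rgb_to_lab((16, 16, 16)),
    "demo-ecru": rgb_to_lab((240, 234, 214)),
    "demo-navy": rgb_to_lab((31, 42, 68)),
}
```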

Honest confidence. HIGH / MEDIUM / LOW are derived from the signals the run actually produced (silhouette, image count, evidence-quote count), not from proxies like cluster count or corpus length.
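A sketch of deriving the confidence tier from run signals, plus the over-claim check that a calibration pass could use. The thresholds below are hypothetical placeholders chosen only for illustration; the project's actual cut-offs are not stated here, and `RunSignals` is an assumed name.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
HIGH = {"silhouette": 0.30, "images": 100, "quotes": 5}
MEDIUM = {"silhouette": 0.15, "images": 30, "quotes": 2}
RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


@dataclass(frozen=True)
class RunSignals:
    silhouette: float
    image_count: int
    evidence_quotes: int


def _meets(s: RunSignals, t: dict) -> bool:
    return (s.silhouette >= t["silhouette"]
            and s.image_count >= t["images"]
            and s.evidence_quotes >= t["quotes"])


def derive_confidence(s: RunSignals) -> str:
    if _meets(s, HIGH):
        return "HIGH"
    if _meets(s, MEDIUM):
        return "MEDIUM"
    return "LOW"


def overclaims(stored: str, s: RunSignals) -> bool:
    """True when a dossier claims more confidence than its own signals support."""
    return RANK[stored] > RANK[derive_confidence(s)]
```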

Evaluation

The pipeline isn't open-loop. There is no gold label for "accuracy" in brand strategy, so quality is broken down into separately measurable properties:

Regression gate (make test, free, no API):

Cluster determinism: seeded pipeline → byte-identical output
Cross-seed stability (ARI) and permutation invariance: reported, not hard-gated, because low ARI at small N is an honest finding
Synthetic palette recovery: known-color images in, known color recovered; black garments survive background removal
Dedup correctness: the same image at three sizes merges; three distinct images don't
Calibration consistency: re-derives each stored dossier's confidence from its signals and flags mismatches
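Cross-seed stability can be measured with the adjusted Rand index, which is invariant to how cluster labels are numbered (so a relabeled but identical clustering scores 1.0). Below is a standard pair-counting ARI in plain Python; `cross_seed_stability` (mean pairwise ARI across seeds) is a hypothetical aggregation, not necessarily the one the project reports.

```python
from collections import Counter
from itertools import combinations
from math import comb
from typing import List, Sequence


def adjusted_rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    total = comb(len(a), 2)
    if total == 0:
        return 1.0
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())  # contingency cells
    sum_a = sum(comb(c, 2) for c in Counter(a).values())           # row sums
    sum_b = sum(comb(c, 2) for c in Counter(b).values())           # column sums
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both runs put everything in one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def cross_seed_stability(runs: List[Sequence[int]]) -> float:
    """Mean pairwise ARI across runs with different seeds (needs at least two runs)."""
    pairs = list(combinations(runs, 2))
    return sum(adjusted_rand_index(x, y) for x, y in pairs) / len(pairs)
```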

**refabric eval** (offline, free):

Confidence calibration: flags dossiers claiming more than their signals support
Discriminability: voice-token Jaccard across brands; catches near-boilerplate output
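The discriminability check can be sketched as pairwise Jaccard similarity over brand-voice token sets, reporting the most similar pair. The lowercase-alphabetic tokenizer and the function names are assumptions for illustration.

```python
import re
from itertools import combinations
from typing import Dict, Optional, Set, Tuple


def voice_tokens(text: str) -> Set[str]:
    # Assumption: lowercase alphabetic tokens; the real tokenizer may differ.
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar_pair(voices: Dict[str, str]) -> Tuple[float, Optional[Tuple[str, str]]]:
    """Highest cross-brand overlap; a value near 1.0 suggests boilerplate output."""
    tokens = {brand: voice_tokens(text) for brand, text in voices.items()}
    best: Tuple[float, Optional[Tuple[str, str]]] = (0.0, None)
    for x, y in combinations(sorted(tokens), 2):
        score = jaccard(tokens[x], tokens[y])
        if score > best[0]:
            best = (score, (x, y))
    return best
```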

Scalability

At current scale (~150 images/brand):

CLIP inference runs locally, fast on CPU
LLM cost scales with site size, not brand count
Content-addressed storage deduplicates for free
Brand runs share no state, so wrapping Pipeline.run() in a process pool gives full horizontal scale-out
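Content-addressed storage can be sketched as "name the file by the SHA-256 of its bytes": writing the same image twice resolves to the same path, so duplicates cost nothing. The flat `{digest}{suffix}` layout and `store_blob` name are assumptions; the write-to-temp-then-rename step is one way to keep concurrent writers of the same bytes from exposing a half-written file.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def store_blob(root: Path, data: bytes, suffix: str = ".jpg") -> Path:
    """Write data under its SHA-256 digest; identical bytes map to the same file."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{hashlib.sha256(data).hexdigest()}{suffix}"
    if path.exists():
        return path  # already stored: dedup for free
    # Write to a unique temp file, then atomically rename it into place.
    fd, tmp = tempfile.mkstemp(dir=root, suffix=".tmp")
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    os.replace(tmp, path)
    return path
```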

What breaks first: at 2000+ images, CPU CLIP inference becomes slow (fix: a GPU device check and larger batches; the batch loop already exists in clip_filter.py). Parallel brand runs can race on the brand-level content store (fix: per-run storage paths). Single-process orchestration doesn't scale to 100+ brands (fix: a job queue with workers running the same Pipeline code unchanged).

Production sketch:

```
brands queue → worker pool → object store (S3/GCS, same SHA256 keys)
                           → metadata store (Postgres)
                           → eval gate (CI, frozen fixtures)
```

The pipeline code, Stage protocol, and BrandContext transfer unchanged.
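An in-process stand-in for the queue-plus-workers shape, using only the standard library. It is a sketch under assumptions: `run_pipeline` is a hypothetical callable (for example, one that builds a BrandContext and calls Pipeline.run()), and threads here stand in for what would be separate worker processes or machines reading from a real queue in production.

```python
import queue
import threading
from typing import Any, Callable, Dict, Iterable, Tuple


def run_brands(
    brand_ids: Iterable[str],
    run_pipeline: Callable[[str], Any],  # hypothetical per-brand entry point
    workers: int = 4,
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    jobs: "queue.Queue[str]" = queue.Queue()
    for brand_id in brand_ids:
        jobs.put(brand_id)
    results: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                brand_id = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker exits
            try:
                outcome = run_pipeline(brand_id)
                with lock:
                    results[brand_id] = outcome
            except Exception as exc:  # one brand failing doesn't stop the others
                with lock:
                    failures[brand_id] = repr(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failures
```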

Data Collection & ToS

The crawler identifies itself with User-Agent: RefabricBrandAgent/1.0 and is rate-limited to 2 req/s per host
Sitemap-first discovery, BFS fallback, same-origin links only
No IP rotation, no browser spoofing, no CAPTCHA solving. Bot-protection pages are detected, logged to ctx.crawl_blocked_by, and surfaced on the methodology page
Instagram: public Open Graph metadata only (profile image, bio, follower count). No login, no gallery scraping. A login wall is documented as blocked, not retried
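The per-host rate limit and same-origin rule could be sketched as below. `HostRateLimiter` and `same_origin` are hypothetical names; the clock and sleep functions are injectable so the behaviour can be checked without waiting, and the limiter assumes a single-threaded crawl loop (it holds no lock).

```python
import time
from typing import Callable, Dict
from urllib.parse import urlparse

USER_AGENT = "RefabricBrandAgent/1.0"  # sent with every request, per the text above


class HostRateLimiter:
    """Spaces requests to the same host at least 1/rate seconds apart."""

    def __init__(self, rate_per_sec: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate_per_sec
        self.clock, self.sleep = clock, sleep
        self._next_allowed: Dict[str, float] = {}

    def wait(self, url: str) -> None:
        host = urlparse(url).netloc
        now = self.clock()
        ready = self._next_allowed.get(host, now)
        if ready > now:
            self.sleep(ready - now)
            now = ready
        self._next_allowed[host] = now + self.interval


def same_origin(url: str, origin: str) -> bool:
    a, b = urlparse(url), urlparse(origin)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)
```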

Output

```
runs/{brand_id}/{timestamp}/
├── manifest.json                    # stage timings, image counts, failures
├── images/                          # content-addressed image files
├── metadata/                        # per-image metadata JSON
├── analysis/
│   ├── clusters.json                # cluster assignments and metadata
│   ├── embeddings.npy               # OpenCLIP embeddings (persisted for PDF re-render)
│   └── embedding_index.json         # path list matching embeddings.npy rows
├── dossier.json                     # structured Brand DNA
└── dossier.pdf                      # human-readable report
```

Embeddings are persisted after VisionStage so that render-pdf can re-select the hero image and layout images without re-running the pipeline.

Docker

```bash
docker build -t refabric .
docker run -e ANTHROPIC_API_KEY=your-key refabric run cos
```

Tests

```bash
pytest tests/ -v
```

~85 tests: config validation, URL filtering, dedup logic (pHash + CLIP), color extraction and Pantone matching, clustering (including cluster merging), CLIP filter, garment/pattern/fabric analysis, PDF generation, pipeline failure isolation, and the eval regression gate.
