confxsd/refabric

Refabric Brand Intelligence Agent

<This part is not AI generated>

I approached this project as a pipeline that can be broken into stages. Mainly: Scraping, Processing (Text + Image) and Output generation. Passed a shared context object through stages. Runs locally, only ANTHROPIC_API_KEY is required.

Scraping. No brand-specific knowledge in the crawler. Start with a large pool, extract images and let the next stage filter out. Ignored JS-rendered pages for now, but would add Playwright for the next iteration.

Processing (Text + Image). Text and Image processing as separate stages. Text with LLMs (Summarize chunks by Haiku, extract brand voice by Sonnet). For images, CLIP and phash for filtering, clustering with UMAP + k-selection, basically.

Output. JSON obj and turn it into PDF.

AI Usage

I used Claude Code to accelerate the process.

  • Turning architectural notes into implementation
  • Boilerplate, tests, fixtures
  • Optimization and edge case hunting

What I manually verified:

  • Full pipeline on 10+ brands
  • Tweaked params and finetuned
  • Error boundaries

If I had time

I would add

  • FashionCLIP support
  • Better color palette matching (skin/tan fabric issues, object bg detection etc)
  • Cross-modal coherence check (visual cluster labels vs brand voice embeddings)
  • Playwright auto-trigger when crawl yield is below threshold (on JS-heavy storefronts)
  • GPU auto-detection (CPU is fine at ~150 images, breaks above ~2000)
  • Stage-per-worker architecture with a job queue between them (each scaled independently)

</This part is not AI generated>

The brief: give it a brand name and URL, get back a structured Brand DNA (color palette, garment mix, aesthetic clusters, brand voice) as a PDF and a JSON sidecar.

Approach

The interesting constraint here was that the system had to work on any fashion brand without code changes. No per-site selectors, no scrapers tuned to a specific DOM. That ruled out the obvious approach and forced a more principled one.

The solution is a brand-agnostic pipeline: crawl generically, let CLIP zero-shot classify what's fashion, deduplicate in embedding space, cluster aesthetically, then use LLMs to synthesize brand voice from the text corpus. Each stage reads from and writes to a shared BrandContext. One stage failing doesn't crash the run; you get a partial dossier instead.

Quick Start

pip install -e ".[dev]"
export ANTHROPIC_API_KEY="your-key"

make run BRAND=cos             # full pipeline
make render-pdf BRAND=cos      # re-render PDF from existing run, no crawl
make test                      # unit tests + eval regression gate
make eval                      # offline quality report on existing dossiers

Adding a Brand

brands:
  - id: my_brand
    name: My Brand
    domain: https://my-brand.com
    social:
      instagram: my_brand_handle # optional

No code changes.

Pipeline Architecture

brands.yaml → CLI → Pipeline
                     ├── CrawlStage        sitemap + BFS → images + text corpus
                     ├── SocialStage       public Instagram OG metadata, graceful
                     ├── VisionStage       CLIP filter → dedup → color → embed → hero → garment+patterns → cluster
                     ├── TextAnalysisStage Haiku map → Sonnet synthesis + cluster labels
                     └── PDFStage          dossier.json + dossier.pdf

Every stage implements a two-method protocol: name and run(ctx) → ctx. The pipeline wraps each in try/except. Failures land in ctx.failures and manifest.json. State flows through a single BrandContext object.
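
The stage contract and failure isolation described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the BrandContext fields, the stub stages, and the choice to model name as a plain attribute are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class BrandContext:
    # Hypothetical fields; the real BrandContext carries far more state.
    brand_id: str
    images: list[str] = field(default_factory=list)
    failures: dict[str, str] = field(default_factory=dict)


class Stage(Protocol):
    name: str

    def run(self, ctx: BrandContext) -> BrandContext: ...


class Pipeline:
    def __init__(self, stages: list[Stage]) -> None:
        self.stages = stages

    def run(self, ctx: BrandContext) -> BrandContext:
        for stage in self.stages:
            try:
                ctx = stage.run(ctx)
            except Exception as exc:
                # Record the failure and keep going: later stages still run.
                ctx.failures[stage.name] = f"{type(exc).__name__}: {exc}"
        return ctx


class CrawlStub:
    """Illustrative stage that adds one image to the context."""
    name = "crawl"

    def run(self, ctx: BrandContext) -> BrandContext:
        ctx.images.append("a.jpg")
        return ctx


class BrokenStub:
    """Illustrative stage that always fails."""
    name = "social"

    def run(self, ctx: BrandContext) -> BrandContext:
        raise RuntimeError("login wall")


result = Pipeline([CrawlStub(), BrokenStub(), CrawlStub()]).run(BrandContext("cos"))
```

Here the stage after the broken one still runs, and the failure is recorded under the failing stage's name instead of aborting the run.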

Key Decisions

Dumb crawler, smart filter. No per-site selectors. Sitemap (indexes recursed, locale duplicates collapsed to one canonical path) → BFS fallback, collect everything, CLIP zero-shot decides what's fashion. You pull more than you need, but inference is local and fast.

One embedding space. OpenCLIP ViT-B/32 runs filtering, dedup, garment scoring, and clustering. A cosine threshold belongs to the space it was calibrated in: bringing in FashionCLIP would silently shift the geometry, and every downstream constant (0.95 dedup, 0.90 cluster-merge, logit scale) would need recalibrating. Not worth it at this scale. Dedup reuses the embeddings already computed during filtering, so nothing is re-encoded. pHash (Hamming ≤ 8) handles resized/recompressed duplicates first; CLIP cosine catches semantic ones.
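
A rough sketch of the two-tier duplicate test, assuming 64-bit perceptual hashes stored as integers and embeddings as a NumPy array. The function names and the greedy first-seen-wins policy are assumptions; the two thresholds come from the text. For brevity both checks run per pair here, whereas the text describes pHash as an earlier pass.

```python
import numpy as np

PHASH_MAX_HAMMING = 8          # from the text
COSINE_DEDUP_THRESHOLD = 0.95  # from the text


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedup(hashes: list[int], embeddings: np.ndarray) -> list[int]:
    """Return indices of images to keep (hypothetical greedy policy)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept: list[int] = []
    for i in range(len(hashes)):
        duplicate = any(
            hamming(hashes[i], hashes[j]) <= PHASH_MAX_HAMMING
            or float(unit[i] @ unit[j]) >= COSINE_DEDUP_THRESHOLD
            for j in kept
        )
        if not duplicate:
            kept.append(i)
    return kept


# Toy data: image 1 is a pHash near-duplicate of image 0,
# image 2 is a semantic (cosine) duplicate of image 0, image 3 is distinct.
hashes = [0x0F0F0F0F0F0F0F0F, 0x0F0F0F0F0F0F0F0E,
          0xF0F0F0F0F0F0F0F0, 0xFFFF0000FFFF0000]
embeddings = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.99, 0.05, 0.0],
                       [0.0, 0.0, 1.0]])
kept = dedup(hashes, embeddings)
```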

Support-capped clustering with post-merge. UMAP reduces the embeddings to 50 dimensions, then k-means runs with silhouette-selected k in [3, 6]. k is also bounded by N // min_images_per_cluster, so you can't get 6 clusters of 2 from 13 images. After k-selection, clusters whose centroids are cosine-similar in the original CLIP space (≥ 0.90) are merged, because UMAP sometimes splits one aesthetic into two separable blobs and silhouette rewards both. Silhouette 0.3–0.5 at ~100 images is normal for a brand with a unified look, not a failure. The confidence level is gated on it.
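
The support cap and the post-merge step can be illustrated without UMAP or k-means. The sketch below only covers the two pieces of arithmetic around them; the function names, the fallback when too few images exist for k = 3, and the union-find merge are assumptions, not the repository's implementation.

```python
import numpy as np


def candidate_ks(n_images: int, min_images_per_cluster: int,
                 k_min: int = 3, k_max: int = 6) -> list[int]:
    """Values of k worth scoring: [k_min, k_max] capped by available support."""
    upper = max(1, min(k_max, n_images // min_images_per_cluster))
    lower = min(k_min, upper)  # assumed fallback when support is too small
    return list(range(lower, upper + 1))


def merge_similar_clusters(labels: np.ndarray, embeddings: np.ndarray,
                           threshold: float = 0.90) -> np.ndarray:
    """Merge clusters whose centroids are cosine-similar in the original space."""
    ids = sorted(set(labels.tolist()))
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in ids])
    centroids = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = centroids @ centroids.T

    parent = list(range(len(ids)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            if sims[a, b] >= threshold:
                parent[find(b)] = find(a)

    roots = sorted({find(i) for i in range(len(ids))})
    remap = {ids[i]: roots.index(find(i)) for i in range(len(ids))}
    return np.array([remap[label] for label in labels.tolist()])


# Clusters 0 and 1 point the same way; cluster 2 is orthogonal to both.
embeddings = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, 0.05],
                       [0.95, 0.1], [0.0, 1.0], [0.1, 1.0]])
labels = np.array([0, 0, 1, 1, 2, 2])
merged = merge_similar_clusters(labels, embeddings)
```

With 13 images and a hypothetical minimum of 4 images per cluster, candidate_ks(13, 4) leaves only k = 3 to score.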

Adaptive LLM tiering. A corpus under ~12k chars goes straight to Sonnet. A larger one runs a Haiku map step first (per-page summaries, 6 concurrent), then Sonnet reduces. Cluster labels are concurrent Sonnet vision calls (4 workers). Threads, not asyncio: the sync Anthropic client is thread-safe and the pipeline is synchronous end-to-end, so ThreadPoolExecutor is the right tool.
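
The tiering logic can be sketched as below. The two model calls are replaced by local stand-in functions so the example runs offline; summarize_page, synthesize_voice, and analyze_corpus are hypothetical names, and only the ~12k threshold and the worker count come from the text.

```python
from concurrent.futures import ThreadPoolExecutor

DIRECT_SYNTHESIS_MAX_CHARS = 12_000  # "~12k chars" from the text
MAP_WORKERS = 6                      # from the text


def summarize_page(page: str) -> str:
    """Stand-in for the per-page Haiku call (hypothetical)."""
    return page[:40]


def synthesize_voice(text: str) -> str:
    """Stand-in for the Sonnet synthesis call (hypothetical)."""
    return f"voice from {len(text)} chars"


def analyze_corpus(pages: list[str]) -> str:
    corpus = "\n\n".join(pages)
    if len(corpus) < DIRECT_SYNTHESIS_MAX_CHARS:
        # Small corpus: skip the map step entirely.
        return synthesize_voice(corpus)
    with ThreadPoolExecutor(max_workers=MAP_WORKERS) as pool:
        # pool.map preserves input order, so summaries line up with pages.
        summaries = list(pool.map(summarize_page, pages))
    return synthesize_voice("\n\n".join(summaries))


small = analyze_corpus(["a" * 100, "b" * 100])
large = analyze_corpus(["x" * 5000] * 4)
```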

Soft garment voting. Hard argmax discards confidence — a barely-won vote and a certain one look the same. Each image contributes temperature-scaled probability mass across all categories instead; the pipeline sums expected counts. Logit scale is 100.0, matching CLIP's training temperature. Without it, softmax over raw cosines is near-uniform and every styling axis collapses to ~0.5.
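
A small numerical sketch of the difference between hard and soft voting. The cosine values are made up for illustration and expected_counts is a hypothetical name; the logit scale of 100.0 is the one given in the text.

```python
import numpy as np

LOGIT_SCALE = 100.0  # from the text


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def expected_counts(cosines: np.ndarray,
                    logit_scale: float = LOGIT_SCALE) -> np.ndarray:
    """Sum per-image category probabilities into expected counts."""
    return softmax(cosines * logit_scale, axis=1).sum(axis=0)


# Rows are images, columns are categories (illustrative cosines).
cosines = np.array([
    [0.30, 0.24, 0.20],  # clear win for category 0
    [0.26, 0.25, 0.20],  # category 0 barely wins
])

hard = np.bincount(cosines.argmax(axis=1), minlength=3)
soft = expected_counts(cosines)
unscaled = expected_counts(cosines, logit_scale=1.0)
```

Hard voting gives category 0 both images. Soft voting keeps part of the second image's mass on category 1, and the unscaled variant shows the near-uniform collapse the text warns about.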

Pantone matching. Nearest Pantone by ΔE in LAB space, matched against the full library. RGB distance isn't perceptually uniform. 120+ fashion-specific color names reduce Pantone fallback for common colors. Each palette entry also links to the catalog image that most prominently features that color.
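
A sketch of nearest-color lookup in LAB space. The text does not say which ΔE formula is used, so this assumes the simplest one, CIE76 (Euclidean distance in LAB). The three library entries are illustrative placeholders, not real Pantone data, and the function names are hypothetical.

```python
import numpy as np

# Placeholder library: names and RGB values are made up for illustration.
LIBRARY = {
    "Illustrative Navy": (30, 40, 90),
    "Illustrative Camel": (193, 154, 107),
    "Illustrative Ivory": (250, 245, 230),
}

# Standard sRGB -> XYZ matrix and D65 reference white.
_M = np.array([[0.4124564, 0.3575761, 0.1804375],
               [0.2126729, 0.7151522, 0.0721750],
               [0.0193339, 0.1191920, 0.9503041]])
_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb) -> np.ndarray:
    c = np.asarray(rgb, dtype=float) / 255.0
    # Undo the sRGB gamma curve.
    linear = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    xyz = (_M @ linear) / _D65
    delta = 6 / 29
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4 / 29)
    return np.array([116 * f[1] - 16, 500 * (f[0] - f[1]), 200 * (f[1] - f[2])])


def delta_e76(rgb_a, rgb_b) -> float:
    return float(np.linalg.norm(srgb_to_lab(rgb_a) - srgb_to_lab(rgb_b)))


def nearest_color(rgb, library=LIBRARY) -> str:
    return min(library, key=lambda name: delta_e76(rgb, library[name]))


match = nearest_color((200, 160, 110))
```

A newer formula such as CIEDE2000 would slot in by replacing delta_e76 only.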

Honest confidence. HIGH / MEDIUM / LOW derive from the signals the run actually produced: silhouette, image count, evidence-quote count. Not proxies like cluster count or corpus length.
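
One way such a derivation could look is sketched below. The three signals are the ones named above, but every threshold is a hypothetical placeholder, as are the RunSignals, derive_confidence, and is_calibrated names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSignals:
    silhouette: float
    image_count: int
    evidence_quotes: int


def derive_confidence(s: RunSignals) -> str:
    # All cut-offs below are illustrative assumptions, not the project's values.
    if s.silhouette >= 0.3 and s.image_count >= 100 and s.evidence_quotes >= 5:
        return "HIGH"
    if s.silhouette >= 0.15 and s.image_count >= 30 and s.evidence_quotes >= 2:
        return "MEDIUM"
    return "LOW"


def is_calibrated(stored_confidence: str, signals: RunSignals) -> bool:
    """True when a stored label matches what the signals re-derive."""
    return stored_confidence == derive_confidence(signals)


strong = derive_confidence(RunSignals(0.42, 120, 8))
thin = derive_confidence(RunSignals(0.42, 40, 3))
```

Keeping the derivation a pure function of the recorded signals means a stored label can be re-derived later and compared with what was written.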

Evaluation

The pipeline isn't open-loop. "Accuracy" has no gold label in brand strategy, so quality breaks into separately measurable properties:

Regression gate (make test, free, no API):

  • Cluster determinism: seeded pipeline → byte-identical output
  • Cross-seed stability (ARI) and permutation invariance: reported, not hard-gated, because low ARI at small N is an honest finding
  • Synthetic palette recovery: known-color images in, known color recovered; black garments survive background removal
  • Dedup correctness: same image at three sizes merges; three distinct images don't
  • Calibration consistency: re-derives each stored dossier's confidence from its signals, flags mismatches

refabric eval (offline, free):

  • Confidence calibration: flags dossiers claiming more than their signals support
  • Discriminability: voice-token Jaccard across brands; catches near-boilerplate output
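
The discriminability check can be sketched as pairwise Jaccard similarity over each brand's voice tokens. The token sets, the 0.5 flagging threshold, and the function names below are illustrative assumptions.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection over union; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_similar_voices(voice_tokens: dict[str, set[str]],
                        threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return brand pairs whose voice vocabularies overlap suspiciously."""
    flagged = []
    for left, right in combinations(sorted(voice_tokens), 2):
        score = jaccard(voice_tokens[left], voice_tokens[right])
        if score >= threshold:
            flagged.append((left, right, score))
    return flagged


# Made-up token sets for three hypothetical brands.
tokens = {
    "brand_a": {"minimal", "quiet", "tailored", "modern"},
    "brand_b": {"minimal", "quiet", "tailored", "timeless"},
    "brand_c": {"bold", "loud", "playful", "neon"},
}
flagged = flag_similar_voices(tokens)
```

A high score between two unrelated brands suggests the voice output is generic rather than brand-specific.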

Scalability

At current scale (~150 images/brand):

  • CLIP inference runs locally, fast on CPU
  • LLM cost scales with site size, not brand count
  • Content-addressed storage deduplicates for free
  • Brand runs share no state; wrapping Pipeline.run() in a process pool is full horizontal scale-out

What breaks first: at 2000+ images, CPU CLIP becomes slow (fix: GPU device check + larger batches, the batch loop is already in clip_filter.py). Parallel brand runs can race on the brand-level content store (fix: per-run storage paths). Single-process orchestration doesn't scale to 100+ brands (fix: job queue with workers running the same Pipeline code unchanged).

Production sketch:

brands queue → worker pool → object store (S3/GCS, same SHA256 keys)
                           → metadata store (Postgres)
                           → eval gate (CI, frozen fixtures)

The pipeline code, Stage protocol, and BrandContext transfer unchanged.

Data Collection & ToS

  • Crawler identifies itself: User-Agent: RefabricBrandAgent/1.0, rate-limited to 2 req/s per host
  • Sitemap-first discovery, BFS fallback, same-origin links only
  • No IP rotation, no browser spoofing, no CAPTCHA solving. Bot-protection pages are detected, logged to ctx.crawl_blocked_by, and surfaced on the methodology page
  • Instagram: public Open Graph metadata only (profile image, bio, follower count). No login, no gallery scraping. A login wall is documented as blocked, not retried
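
The per-host rate limit can be sketched as a small helper that tracks when each host may next be contacted. HostRateLimiter is a hypothetical name, and the injectable clock and sleep arguments are assumptions added so the behavior can be checked without waiting; the User-Agent string and the 2 req/s limit come from the list above.

```python
import time
from urllib.parse import urlsplit

USER_AGENT = "RefabricBrandAgent/1.0"  # from the text
MAX_REQUESTS_PER_SECOND = 2.0          # from the text


class HostRateLimiter:
    def __init__(self, max_per_second: float = MAX_REQUESTS_PER_SECOND,
                 clock=time.monotonic, sleep=time.sleep) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until the URL's host may be hit again; return the delay."""
        host = urlsplit(url).netloc
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot one interval after this request goes out.
        self._next_allowed[host] = max(now, ready_at) + self.min_interval
        return delay


# Frozen clock and recorded sleeps, so the example runs instantly.
slept: list[float] = []
limiter = HostRateLimiter(clock=lambda: 0.0, sleep=slept.append)
delays = [
    limiter.wait("https://a.example/page-1"),
    limiter.wait("https://a.example/page-2"),
    limiter.wait("https://b.example/page-1"),
]
```

A crawler would call wait(url) before each request and send USER_AGENT in the request headers.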

Output

runs/{brand_id}/{timestamp}/
├── manifest.json                    # stage timings, image counts, failures
├── images/                          # content-addressed image files
├── metadata/                        # per-image metadata JSON
├── analysis/
│   ├── clusters.json                # cluster assignments and metadata
│   ├── embeddings.npy               # OpenCLIP embeddings (persisted for PDF re-render)
│   └── embedding_index.json         # path list matching embeddings.npy rows
├── dossier.json                     # structured Brand DNA
└── dossier.pdf                      # human-readable report

Embeddings are persisted after VisionStage so that render-pdf can re-select the hero image and layout images without re-running the pipeline.

Docker

docker build -t refabric .
docker run -e ANTHROPIC_API_KEY=your-key refabric run cos

Tests

pytest tests/ -v

~85 tests: config validation, URL filtering, dedup logic (pHash + CLIP), color extraction and Pantone matching, clustering (including cluster merging), CLIP filter, garment/pattern/fabric analysis, PDF generation, pipeline failure isolation, and the eval regression gate.

About

Refabric AI Brand Intelligence case study
