Summary
Implement the llms.txt standard to make PolicyEngine research articles more token-efficient for AI extraction. This follows the pattern used by Bun, Svelte, and other projects.
Problem
When AIs extract our articles, they consume excessive tokens due to:
- Embedded Plotly chart JSON (40-100 lines of styling per chart)
- The need to crawl multiple pages
- The lack of a machine-readable summary format
Proposed Solution
Files to Generate
| File | Contents | Size Est. |
|---|---|---|
| /llms.txt | Index with links to sections | ~2KB |
| /llms-full.txt | All articles combined, charts replaced with summaries | ~500KB |
| /llms-research-us.txt | US articles only | ~200KB |
| /llms-research-uk.txt | UK articles only | ~200KB |
Format
# PolicyEngine Research
> PolicyEngine analyzes tax and benefit policy impacts through microsimulation modeling for the US and UK.
## Recent Research
- [Article Title](slug): One-line description
## Docs
- [API Documentation](/docs/api.md)
- [Python Package](/docs/python.md)
## Full Articles
- [US Research](/llms-research-us.txt)
- [UK Research](/llms-research-uk.txt)
Article Format in llms-full.txt
---
# Article Title
Date: 2025-01-15
Authors: Max Ghenis
Tags: us, tax, reform
---
Article content here...
**Figure 1: Distributional Impact by Income Decile**
<!-- Chart showing: Bottom decile gains $X, top decile loses $Y. Progressive overall. -->
More content...
---
Key changes from raw articles:
- Plotly JSON replaced with text descriptions of what charts show
- Consistent YAML-style header
- Delimiter between articles
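A minimal sketch of how one article could be rendered into the format above. The `ArticleMeta` shape and the function names are hypothetical (they mirror the header fields shown in the example, not a confirmed posts.json schema), and the choice to open each article with a delimiter and close the file with a single trailing one is an assumption about how the example concatenates.

```typescript
// Hypothetical metadata shape; field names mirror the example header,
// not a confirmed posts.json schema.
interface ArticleMeta {
  title: string;
  date: string; // ISO date, e.g. "2025-01-15"
  authors: string[];
  tags: string[];
}

const DELIMITER = "---";

// Render one article: delimiter, title, header fields, delimiter, body.
function formatArticle(meta: ArticleMeta, body: string): string {
  return [
    DELIMITER,
    `# ${meta.title}`,
    `Date: ${meta.date}`,
    `Authors: ${meta.authors.join(", ")}`,
    `Tags: ${meta.tags.join(", ")}`,
    DELIMITER,
    body.trim(),
  ].join("\n");
}

// Concatenate articles; a single closing delimiter ends the file (assumption).
function formatArticles(
  articles: Array<{ meta: ArticleMeta; body: string }>
): string {
  const rendered = articles.map((a) => formatArticle(a.meta, a.body));
  return rendered.join("\n") + "\n" + DELIMITER + "\n";
}
```

Keeping the renderer a pure string function would let the same code produce llms-full.txt and the per-region files from different article lists.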
Implementation
Build Script
Create scripts/generate-llms-txt.ts that:
- Reads all articles from app/src/data/posts/articles/
- Reads metadata from posts.json
- Strips Plotly JSON, replaces with chart caption/summary
- Concatenates into single files by region
- Generates index llms.txt
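The grouping and index steps above might look roughly like the sketch below. It assumes each post carries a `tags` array containing "us" or "uk" (as in the example header) plus a `slug` and a one-line `description`; these field names, and the function names, are illustrative rather than the actual posts.json schema. File reading and writing are left out so the logic stays testable.

```typescript
type Region = "us" | "uk";

// Hypothetical post shape; the real posts.json fields may differ.
interface Post {
  title: string;
  slug: string;
  description: string;
  tags: string[];
}

// Bucket posts by region tag; a post tagged with both appears in both.
function groupByRegion(posts: Post[]): Record<Region, Post[]> {
  const groups: Record<Region, Post[]> = { us: [], uk: [] };
  for (const post of posts) {
    for (const region of ["us", "uk"] as const) {
      if (post.tags.indexOf(region) !== -1) groups[region].push(post);
    }
  }
  return groups;
}

// Render the llms.txt index in the layout shown under "Format".
function renderIndex(posts: Post[]): string {
  const lines = [
    "# PolicyEngine Research",
    "> PolicyEngine analyzes tax and benefit policy impacts through microsimulation modeling for the US and UK.",
    "## Recent Research",
    ...posts.map((p) => `- [${p.title}](${p.slug}): ${p.description}`),
    "## Docs",
    "- [API Documentation](/docs/api.md)",
    "- [Python Package](/docs/python.md)",
    "## Full Articles",
    "- [US Research](/llms-research-us.txt)",
    "- [UK Research](/llms-research-uk.txt)",
  ];
  return lines.join("\n") + "\n";
}
```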
CI Integration
Add to build process so files are regenerated on each deploy.
Workflow for PRs Adding New Articles
When adding a new article:
- No extra work required - the build script auto-generates llms.txt files
- Optional: Add an ai_summary field to your post in posts.json:
  { "title": "Rail Fares Freeze Analysis", "ai_summary": "Analyzes 2025 rail fares freeze: costs £X, benefits higher earners disproportionately, top decile receives Y% of benefit." }
- For charts: Use descriptive captions that explain the key takeaway:
  **Figure 1: Winners and losers by income decile**
  The caption becomes the chart summary in llms.txt.
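One way the optional ai_summary field could feed the index's one-line description is sketched below. The `description` fallback field is an assumption (the issue does not say what posts.json already contains), and `indexDescription` is a hypothetical helper name.

```typescript
// Hypothetical posts.json entry; only ai_summary and title come from the issue.
interface PostEntry {
  title: string;
  description?: string; // assumed existing field, may not exist
  ai_summary?: string;
}

// Prefer ai_summary, then description, then the title, and collapse
// whitespace so the result fits on a single index line.
function indexDescription(post: PostEntry): string {
  const text =
    post.ai_summary?.trim() || post.description?.trim() || post.title;
  return text.replace(/\s+/g, " ");
}
```

Falling back to existing fields keeps ai_summary optional, which matches the "no extra work required" goal for contributors.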
Chart Summary Generation
For Plotly charts without good captions, the script will:
- Use the **Figure X:** caption if present
- Fall back to extracting axis labels from the JSON
- Mark as [Chart: see original article] if no info available
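The fallback order above can be sketched as a single function. This assumes the caption is the text line immediately preceding the chart and that the chart JSON has already been parsed; `summarizeChart`, `axisTitle`, and the `[Chart: Y by X]` wording for the axis-label case are illustrative choices, not part of the proposal. Plotly allows an axis title to be either a plain string or an object with a `text` property, so both are handled.

```typescript
type AxisTitle = string | { text?: string };

// Minimal subset of a Plotly figure; only layout axis titles are read.
interface PlotlyFigure {
  layout?: {
    xaxis?: { title?: AxisTitle };
    yaxis?: { title?: AxisTitle };
  };
}

const FALLBACK = "[Chart: see original article]";

// Return a non-empty axis title, or undefined if none is set.
function axisTitle(axis?: { title?: AxisTitle }): string | undefined {
  const title = axis?.title;
  const text = typeof title === "string" ? title : title?.text;
  return text && text.trim() ? text.trim() : undefined;
}

function summarizeChart(captionLine: string, figure: PlotlyFigure): string {
  // 1. Prefer an explicit "**Figure X: ...**" caption.
  const caption = captionLine.match(/\*\*Figure \d+:[^*]*\*\*/);
  if (caption) return caption[0];

  // 2. Fall back to axis labels from the chart JSON.
  const x = axisTitle(figure.layout?.xaxis);
  const y = axisTitle(figure.layout?.yaxis);
  if (x && y) return `[Chart: ${y} by ${x}]`;
  if (x || y) return `[Chart: ${x || y}]`;

  // 3. Nothing usable: point readers at the original article.
  return FALLBACK;
}
```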
Tasks
- Create scripts/generate-llms-txt.ts
- Add chart-to-summary extraction logic
- Add optional ai_summary field to posts.json schema
- Integrate into build process
- Add to CI workflow
- Document in CONTRIBUTING.md or similar