## Context
The spam analyzer models 45+ signals based on SpamAssassin rules, CAN-SPAM, and GDPR patterns, but it hasn't yet been validated against a large corpus of real emails, so the scoring weights are uncalibrated.
## What needs to happen
- Build a test corpus — collect ~100 emails across categories:
  - Legitimate transactional (receipts, shipping, password resets)
  - Legitimate marketing (newsletters, promotions)
  - Known spam/phishing examples (available from public datasets)
- Run the spam analyzer against each and compare scores to the expected classification
- Tune weights — adjust signal weights so that:
  - Legitimate emails score 80+
  - Spam emails score below 40
  - Edge cases (aggressive marketing) land in the 40-70 range
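The score bands above can be encoded as a small helper that a calibration harness could reuse when comparing analyzer output to expectations. This is only a sketch: the `Expected` labels, `CorpusEntry` shape, and `misclassified` helper are hypothetical names, not part of the existing analyzer in `src/analyzers/spam/` — the real analyzer would supply the scores.

```typescript
// Hypothetical calibration helpers; names are illustrative, not the repo's API.

type Expected = "legit" | "edge" | "spam";

// Score bands from this issue: legitimate 80+, spam below 40, edge cases 40-70.
function inExpectedBand(expected: Expected, score: number): boolean {
  if (expected === "legit") return score >= 80;
  if (expected === "spam") return score < 40;
  return score >= 40 && score <= 70; // "edge"
}

// A corpus entry pairs a raw email file with its expected band.
interface CorpusEntry {
  file: string; // e.g. "corpus/spam/phish-001.eml" (path is illustrative)
  expected: Expected;
}

// Given scores produced by the analyzer, report entries that fall outside
// their expected band, so weight tuning can focus on them.
function misclassified(
  entries: CorpusEntry[],
  scores: Map<string, number>,
): CorpusEntry[] {
  return entries.filter((e) => !inExpectedBand(e.expected, scores.get(e.file) ?? 0));
}
```

Rerunning the analyzer over the corpus and printing `misclassified(...)` after each weight adjustment would make the tuning loop quick to iterate.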
## How to contribute
This is a great contribution for someone interested in email deliverability. You don't need to write much code — mostly curating test data and running the existing analyzer.
```sh
bun install
bun test -- --grep "spam"  # Run existing spam tests
```
The spam analyzer lives in `src/analyzers/spam/`.
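One possible way to organize the corpus — the directory names and `expected.json` file here are only a suggestion, nothing in the repo prescribes them:

```
corpus/
  transactional/   # receipts, shipping confirmations, password resets
  marketing/       # legitimate newsletters and promotions
  spam/            # known spam/phishing from public datasets
  expected.json    # maps each file to its expected score band
```

Keeping expectations in one file alongside the emails makes it easy to rerun the analyzer over the whole corpus after each weight change.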