OCR-based pipeline to extract and analyse call records from mobile screenshots, with a focus on spam detection and correlation with French personal data leaks.
Tailored for Google Phone app screenshots in French — it understands French date formats (11 déc, Lun, …) and the layout produced by the app.
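As a rough illustration of the French date handling mentioned above, the sketch below resolves labels such as `11 déc` or `Lun` against a reference date. The function name `parse_french_date`, the abbreviation tables, and the rule that a bare weekday refers to the most recent past occurrence are all assumptions made for this example, not the project's actual parser.

```python
import re
from datetime import date, timedelta

# Assumed abbreviation prefixes; the exact strings shown by the app may differ.
MONTH_PREFIXES = [
    ("jan", 1), ("fév", 2), ("fev", 2), ("mar", 3), ("avr", 4), ("mai", 5),
    ("juin", 6), ("juil", 7), ("aoû", 8), ("aou", 8), ("sep", 9),
    ("oct", 10), ("nov", 11), ("déc", 12), ("dec", 12),
]
WEEKDAYS = {"lun": 0, "mar": 1, "mer": 2, "jeu": 3, "ven": 4, "sam": 5, "dim": 6}


def parse_french_date(label, today):
    """Resolve a short French date label to a date, or return None."""
    text = label.strip().lower().rstrip(".")

    # Case 1: "<day> <month abbreviation>", e.g. "11 déc".
    match = re.fullmatch(r"(\d{1,2})\s+([^\W\d_]+)", text)
    if match:
        day, name = int(match.group(1)), match.group(2)
        for prefix, month in MONTH_PREFIXES:
            if name.startswith(prefix):
                candidate = date(today.year, month, day)
                # Call logs only contain past calls, so roll back one year
                # when the date would otherwise land in the future.
                if candidate > today:
                    candidate = date(today.year - 1, month, day)
                return candidate
        return None

    # Case 2: bare weekday, e.g. "Lun" (assumed: most recent past occurrence,
    # and never today itself).
    if text.isalpha() and text[:3] in WEEKDAYS:
        delta = (today.weekday() - WEEKDAYS[text[:3]]) % 7 or 7
        return today - timedelta(days=delta)

    return None
```

With `today = date(2026, 1, 15)`, the label `11 déc` resolves to 11 December 2025 and `Lun` to 12 January 2026.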
Late 2025 and early 2026 saw a wave of massive personal data breaches in France exposing many phone numbers, email addresses, and identity records.
Shortly after, many people (myself included) noticed a sharp rise in unsolicited calls. The timing is hard to ignore.
This project attempts to visualise that correlation: plotting my incoming spam calls per month alongside the French data leak timeline, to see whether the spikes line up.
- LeaksFetcher: fetches the bonjour-la-fuite YAML database, counts French data leak events per month, and writes data/leaks.json.
- OcrPipeline: iterates over images in screenshots/, runs Tesseract OCR on each, parses phone numbers and call metadata (number, datetime, spam flag) via regex, and writes data/calls.csv.
- ChartRenderer: reads calls.csv and leaks.json, then renders an additive stacked bar chart (unknown vs spam calls per month) with a leaks overlay line on a twin Y axis, saved to data/calls_chart.png.
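To make the regex step of the OCR stage more concrete, here is a minimal sketch that pulls French phone numbers out of already-recognised text and flags lines that mention spam. The patterns, the `parse_ocr_text` helper, and the assumption that a spam call carries the word "spam" on the same line are illustrative guesses, not the project's real implementation.

```python
import re

# French numbers in national ("01 99 00 12 34") or international
# ("+33 6 39 98 12 34") form, with optional space or dot separators.
PHONE_RE = re.compile(r"(?:\+33\s?|0)[1-9](?:[\s.]?\d{2}){4}")
# Assumed spam marker: the word "spam" anywhere on the same OCR line.
SPAM_RE = re.compile(r"spam", re.IGNORECASE)


def normalize(raw):
    """Convert a matched number to the compact +33XXXXXXXXX form."""
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("33"):
        return "+" + digits
    return "+33" + digits[1:]  # drop the leading national 0


def parse_ocr_text(text):
    """Return one record per OCR line that contains a phone number."""
    records = []
    for line in text.splitlines():
        match = PHONE_RE.search(line)
        if match:
            records.append({
                "number": normalize(match.group()),
                "spam": bool(SPAM_RE.search(line)),
            })
    return records
```

Normalising every number to a single form before writing it out keeps the same caller from being counted twice when it appears in both national and international notation.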
Tesseract must be installed on the host:
# Debian / Ubuntu
sudo apt install tesseract-ocr tesseract-ocr-fra
Python and uv are required.
uv sync
Copy .env.example to .env and adjust as needed:
TESSERACT_CMD=/usr/bin/tesseract
OCR_LANG=fra
SHOW_CHART=true
The program assumes data and screenshots are located in the current working directory.
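For readers wondering how these three variables might be consumed, the following sketch reads them from the environment into a small settings object. The `Settings` class, the `load_settings` function, and the fallback defaults are hypothetical; the project may load its configuration differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    tesseract_cmd: str
    ocr_lang: str
    show_chart: bool


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    return Settings(
        # Assumed defaults, used only when a variable is missing.
        tesseract_cmd=env.get("TESSERACT_CMD", "tesseract"),
        ocr_lang=env.get("OCR_LANG", "fra"),
        # Environment values are strings, so parse the boolean explicitly.
        show_chart=env.get("SHOW_CHART", "false").strip().lower()
        in {"1", "true", "yes"},
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.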
Run the full pipeline in one command:
uv run phone-calls-analysis
Or run each step individually:
# Fetch French data leaks → data/leaks.json
uv run fetch_leaks
# Extract calls from screenshots → data/calls.csv
uv run ocr
# Render the chart → data/calls_chart.png
uv run graph

| File | Description |
|---|---|
| data/calls.csv | Extracted call records: number, ISO 8601 datetime, spam flag |
| data/leaks.json | French data leak counts per month (YYYY-MM → count) |
| data/calls_chart.png | Stacked bar chart with leaks overlay |
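The two data files can be lined up month by month before plotting. The sketch below shows one way to do that. It assumes `calls.csv` has a header row with columns named `number`, `datetime`, and `spam`, and that `leaks.json` is a flat object keyed by `YYYY-MM`; the column names and the `monthly_summary` helper are assumptions, so check them against the real files.

```python
import csv
import io
import json
from collections import Counter


def monthly_summary(calls_csv, leaks_json):
    """Return (month, spam_calls, leaks) tuples sorted by month."""
    spam_per_month = Counter()
    for row in csv.DictReader(io.StringIO(calls_csv)):
        if row["spam"].strip().lower() in {"1", "true", "yes"}:
            # An ISO 8601 datetime starts with YYYY-MM, so slicing gives the month.
            spam_per_month[row["datetime"][:7]] += 1

    leaks = json.loads(leaks_json)
    months = sorted(set(spam_per_month) | set(leaks))
    return [(m, spam_per_month.get(m, 0), leaks.get(m, 0)) for m in months]
```

To run it on the generated files, read their contents first, for example with `pathlib.Path("data/calls.csv").read_text()`, and pass the resulting strings in.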
Example chart with fake data:
This project is licensed under the AGPL 3.0 License. See the LICENSE file for details.
