
LockBlock-dev/phone-calls-analysis


phone-calls-analysis

OCR-based pipeline to extract and analyse call records from mobile screenshots, with a focus on spam detection and correlation with French personal data leaks.

Tailored for Google Phone app screenshots in French: it understands French date formats (e.g. 11 déc, Lun, …) and the screen layout produced by the app.
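Labels such as 11 déc carry no year, and Lun carries only a weekday, so each label has to be resolved against a reference day. The sketch below shows one way to do that; it is not the project's parser. The month and weekday abbreviation tables, the optional trailing dot, and the rule that a bare weekday means its most recent past occurrence (never today) are all assumptions about the Google Phone layout.

```python
import re
from datetime import date, timedelta

# Assumed French month abbreviations (lower-case, without trailing dot).
FR_MONTHS = {
    "janv": 1, "févr": 2, "mars": 3, "avr": 4, "mai": 5, "juin": 6,
    "juil": 7, "août": 8, "sept": 9, "oct": 10, "nov": 11, "déc": 12,
}
# Assumed French weekday abbreviations; Monday = 0, as in date.weekday().
FR_WEEKDAYS = {"lun": 0, "mar": 1, "mer": 2, "jeu": 3, "ven": 4, "sam": 5, "dim": 6}

# "11 déc" or "11 déc." -> day number followed by a month abbreviation.
DAY_MONTH = re.compile(r"^(\d{1,2})\s+([a-zéû]+)\.?$")


def parse_fr_date(label: str, today: date) -> date:
    """Resolve a yearless French call-log label relative to `today`."""
    text = label.strip().lower()

    match = DAY_MONTH.match(text)
    if match and match.group(2) in FR_MONTHS:
        candidate = date(today.year, FR_MONTHS[match.group(2)], int(match.group(1)))
        # No year on screen: a date later than today must belong to last year.
        if candidate > today:
            candidate = candidate.replace(year=today.year - 1)
        return candidate

    weekday = FR_WEEKDAYS.get(text.rstrip("."))
    if weekday is not None:
        # Bare weekday: most recent past occurrence, 1 to 7 days back
        # (assumption: the same weekday as today means one week ago).
        days_back = (today.weekday() - weekday) % 7 or 7
        return today - timedelta(days=days_back)

    raise ValueError(f"unrecognised date label: {label!r}")
```

Passing `today` explicitly, rather than calling `date.today()` inside, keeps the function testable and lets old screenshots be resolved against the day they were taken.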

Project goal

Late 2025 and early 2026 saw a wave of massive personal data breaches in France exposing many phone numbers, email addresses, and identity records.

Shortly after, many people (myself included) noticed a sharp rise in unsolicited calls. The timing is hard to ignore.

This project attempts to visualise that correlation: plotting my incoming spam calls per month alongside the French data leak timeline, to see whether the spikes line up.

How it works

  1. LeaksFetcher: fetches the bonjour-la-fuite YAML database, counts French data leak events per month, and writes data/leaks.json.
  2. OcrPipeline: iterates over the images in screenshots/, runs Tesseract OCR on each one, extracts the call metadata (phone number, datetime, spam flag) with regular expressions, and writes data/calls.csv.
  3. ChartRenderer: reads data/calls.csv and data/leaks.json, then renders a stacked bar chart of unknown vs spam calls per month, with the monthly leak count drawn as an overlay line on a twin Y axis, and saves it to data/calls_chart.png.
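Once Tesseract has turned a screenshot into text (for example with pytesseract's `image_to_string(image, lang="fra")`, if the project uses pytesseract), step 2 boils down to pattern matching. A minimal sketch, assuming French numbers in national (06 12 34 56 78) or international (+33 6 12 34 56 78) form, a spam marker containing the word "spam" on the number's line or the line right after it, and output normalised to E.164. The regex, the marker and the helper names `normalise` and `extract_calls` are hypothetical, not taken from the repository, and datetime extraction is left out.

```python
import re

# Hypothetical pattern: French numbers in national (06 12 34 56 78) or
# international (+33 6 12 34 56 78) form, with optional spaces or dots.
FR_NUMBER = re.compile(r"(?:\+33\s?|\b0)[1-9](?:[\s.]?\d{2}){4}\b")
# Hypothetical spam marker: any line mentioning "spam", case-insensitive.
SPAM_MARKER = re.compile(r"spam", re.IGNORECASE)


def normalise(raw: str) -> str:
    """Convert a matched French number to E.164 (+33XXXXXXXXX)."""
    digits = re.sub(r"\D", "", raw)
    # International form starts with "+33"; national form starts with "0".
    national = digits[2:] if raw.lstrip().startswith("+") else digits[1:]
    return "+33" + national


def extract_calls(ocr_text: str) -> list[dict]:
    """Return one {"number", "spam"} record per phone number in the OCR text."""
    lines = [line for line in ocr_text.splitlines() if line.strip()]
    calls = []
    for i, line in enumerate(lines):
        match = FR_NUMBER.search(line)
        if match is None:
            continue
        context = line
        # Also look at the next line, unless it already holds another number.
        if i + 1 < len(lines) and not FR_NUMBER.search(lines[i + 1]):
            context += "\n" + lines[i + 1]
        calls.append({
            "number": normalise(match.group()),
            "spam": SPAM_MARKER.search(context) is not None,
        })
    return calls
```

Normalising every number to one canonical form means the same caller seen in two screenshots, once as 06… and once as +33 6…, is counted as the same number.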

Requirements

System

Tesseract must be installed on the host:

# Debian / Ubuntu
sudo apt install tesseract-ocr tesseract-ocr-fra
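To confirm that both the binary and the French language data (the tesseract-ocr-fra package) are available before running the OCR step, one option is to ask Tesseract for its installed languages. This sketch shells out to `tesseract --list-langs`. The helper names are hypothetical, and the header filtering assumes the "List of available languages" heading that Tesseract prints before the codes.

```python
import shutil
import subprocess


def installed_languages(listing: str) -> set[str]:
    """Parse `tesseract --list-langs` output into a set of language codes."""
    return {
        line.strip()
        for line in listing.splitlines()
        if line.strip() and not line.startswith("List of available languages")
    }


def tesseract_supports(lang: str = "fra", cmd: str = "tesseract") -> bool:
    """True if the Tesseract binary is found and has `lang` installed."""
    path = shutil.which(cmd)  # also accepts an absolute path such as /usr/bin/tesseract
    if path is None:
        return False
    result = subprocess.run(
        [path, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Depending on the version, the header may go to stdout or stderr.
    return lang in installed_languages(result.stdout + "\n" + result.stderr)
```

Calling `tesseract_supports("fra")` should return True on a Debian/Ubuntu host after the command above.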

Python

Python and uv are required. Install the dependencies with:

uv sync

Configuration

Copy .env.example to .env and adjust as needed:

TESSERACT_CMD=/usr/bin/tesseract
OCR_LANG=fra
SHOW_CHART=true
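A sketch of how these three settings could be read, assuming the .env values are either already loaded into the process environment (for example with python-dotenv's `load_dotenv`) or parsed with the small helper below. The `Settings` class, the helper names, the accepted truthy spellings and the fallback defaults are assumptions, not the project's actual code.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    tesseract_cmd: str
    ocr_lang: str
    show_chart: bool


def parse_dotenv(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from `env` (defaults to os.environ) with assumed fallbacks."""
    env = os.environ if env is None else env
    return Settings(
        tesseract_cmd=env.get("TESSERACT_CMD", "tesseract"),
        ocr_lang=env.get("OCR_LANG", "fra"),
        show_chart=env.get("SHOW_CHART", "false").strip().lower()
        in {"1", "true", "yes", "on"},
    )
```

Accepting an explicit mapping keeps `load_settings` testable without touching the real environment.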

Usage

The program expects the data/ and screenshots/ directories to be in the current working directory.

Run the full pipeline in one command:

uv run phone-calls-analysis

Or run each step individually:

# Fetch French data leaks → data/leaks.json
uv run fetch_leaks

# Extract calls from screenshots → data/calls.csv
uv run ocr

# Render the chart → data/calls_chart.png
uv run graph

Output

File                   Description
data/calls.csv         Extracted call records: number, ISO 8601 datetime, spam flag
data/leaks.json        French data leak counts per month (YYYY-MM → count)
data/calls_chart.png   Stacked bar chart with leaks overlay
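The chart only needs per-month totals from the first two files. A minimal sketch of that aggregation, assuming calls.csv has a header row `number,datetime,spam`, the spam flag is written as true/false (or 1/0), and every non-spam row counts as an "unknown" call. The function names are hypothetical.

```python
import csv
import io
import json
from collections import Counter
from datetime import datetime


def monthly_call_counts(calls_csv: str) -> dict[str, dict[str, int]]:
    """Count spam vs other ("unknown") calls per YYYY-MM month."""
    spam, unknown = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(calls_csv)):
        month = datetime.fromisoformat(row["datetime"]).strftime("%Y-%m")
        if row["spam"].strip().lower() in {"true", "1"}:
            spam[month] += 1
        else:
            unknown[month] += 1
    return {
        month: {"unknown": unknown[month], "spam": spam[month]}
        for month in sorted(spam.keys() | unknown.keys())
    }


def align_with_leaks(
    counts: dict[str, dict[str, int]], leaks_json: str
) -> list[tuple[str, int, int, int]]:
    """Merge call counts with leaks.json on a shared, sorted month axis."""
    leaks = json.loads(leaks_json)
    rows = []
    for month in sorted(counts.keys() | leaks.keys()):
        calls = counts.get(month, {"unknown": 0, "spam": 0})
        rows.append((month, calls["unknown"], calls["spam"], leaks.get(month, 0)))
    return rows
```

Months present in only one of the files get a count of 0, so the bars and the leaks line share the same X axis. With real files, the inputs would be read with something like `Path("data/calls.csv").read_text(encoding="utf-8")`.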

Example

Example chart with fake data:

[Image: example chart]

Credits

License

This project is licensed under the AGPL 3.0 License. See the LICENSE file for details.

