Senior Data Scientist/Engineer building production Reproducible Analytical Pipelines (RAP) and ML/GenAI systems at the Food Standards Agency. Previously led open-source data engineering at the Office for National Statistics.
- LangChain Agent - Intelligent data standardisation for 360+ local authority data sources with extreme format variance
- NLP Classification - DistilBERT transformer model (82% accuracy across 240 classes) on Azure
- Platform Engineering - Migrating enterprise data to Databricks Medallion architecture (Azure/Databricks)
- ML Production Systems - End-to-end deployment, monitoring and MLOps best practices
ML/GenAI: LangChain • Transformers (DistilBERT, FastText) • Scikit-learn • PyTorch • MLflow
Data Engineering: Databricks • PySpark • Apache Spark • Python • SQL
Cloud & DevOps: Azure • GCP • GitHub Actions • CI/CD
Databases: BigQuery • DuckDB
A thread running through my open-source work:
| Project | What it does | SDG |
|---|---|---|
| SDG 11.2.1 — Transport Access | Measures % of UK population with convenient access to public transport, disaggregated by age, sex and disability | SDG 11 |
| Public Transport Efficiency | Ranks UK cities by public vs private transport journey time ratio | SDG 11 |
| EV Charging Demand Optimisation | Forecasts grid carbon intensity and optimises EV charging schedules to minimise emissions | SDG 7, SDG 13 |
As a member of the ONS Sustainable Development Goals team, I helped maintain the UK Sustainable Indicators website and built data pipelines to measure the UK's progress towards the UN SDGs, tracking public transport access and efficiency across UK cities. I was measuring the gap between where we are and where we need to be. Now I'm building systems to close it.
Unfortunately, most of my work is closed-source, but I pioneered an open-from-the-start approach that adheres to government guidelines.
Production system calculating UK national accounts R&D expenditure statistics. Pioneered an open-source approach within ONS, establishing a pattern for transparency in government data science.
Scale & Impact:
- 4,680+ commits from 20 contributors
- 94 production releases serving national statistics
- Comprehensive CI/CD pipeline with automated testing (61% coverage), plus detailed technical and non-technical user documentation
- Set precedent for open-source government data projects at ONS
Role: Tech lead and open-source advocate; liaised with stakeholders to secure approval for public release from the project's inception.
Sustainability • Food Systems • Production ML systems • Data engineering best practices • Practical AI applications • Open government software