New models and refactor around `proc_data.py` by Cloud0310 · Pull Request #15 · QuarticCat/detypify

Cloud0310 · 2026-05-05T09:14:50Z

Summary

This PR refactors the Python data pipeline and adds configurable MobileNet model training.

Data Pipeline

Split the previous proc_data.py implementation into packaged modules under python/detypify.
Store raw LaTeX-labelled datasets separately, then map labels to Typst symbols locally with datasets caching.
Generate frontend metadata under build/generated.
Update the weekly Typst metadata workflow to compare effective mapping digests before regenerating metadata.
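
The digest comparison in the last item can be pictured with a minimal sketch. It assumes the effective mapping is a plain label-to-symbol dictionary and borrows BLAKE2b with a 64-bit digest from a later commit title in this PR. The function names `mapping_digest` and `needs_regeneration` and the sample symbols are hypothetical and are not taken from the repository.

```python
from __future__ import annotations

import hashlib
import json


def mapping_digest(mapping: dict[str, str]) -> str:
    """Return a short, order-independent digest of a label-to-symbol mapping."""
    # Serialize canonically so key order and whitespace cannot change the digest.
    payload = json.dumps(mapping, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    # digest_size=8 yields a 64-bit digest, i.e. 16 hex characters (assumed size).
    return hashlib.blake2b(payload.encode("utf-8"), digest_size=8).hexdigest()


def needs_regeneration(previous_digest: str | None, mapping: dict[str, str]) -> bool:
    """Regenerate metadata only when the effective mapping actually changed."""
    return previous_digest != mapping_digest(mapping)


# Made-up sample mappings, for illustration only.
old = {"\\alpha": "alpha", "\\beta": "beta"}
reordered = {"\\beta": "beta", "\\alpha": "alpha"}
changed = {"\\alpha": "alpha", "\\beta": "beta.alt"}

stored = mapping_digest(old)
print(needs_regeneration(stored, reordered))  # False: same content, different order
print(needs_regeneration(stored, changed))    # True: one symbol changed
```

Hashing a canonical serialization rather than the raw file means that reordering or reformatting the mapping source would not trigger a regeneration on its own.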

Training And Models

Replace the fixed model enum with mobilenet_{v4|v5}_{size} model names.
Add MobileNetV5 support as experimental.
Add --compile/--no-compile for torch.compile.
Add CPU/CUDA/ROCm dependency extras for training environments.
Remove TensorRT dependencies and options.
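
A paired `--compile/--no-compile` flag can be sketched as follows. The commit list suggests the project's CLI is built on Typer; this example swaps in the standard library's `argparse.BooleanOptionalAction` (Python 3.9+) so that it runs without extra dependencies. The default value, `build_parser`, and `maybe_compile` are assumptions for illustration, not the PR's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="train")
    # BooleanOptionalAction registers both --compile and --no-compile.
    parser.add_argument(
        "--compile",
        action=argparse.BooleanOptionalAction,
        default=True,  # assumed default; the PR does not state it
        help="wrap the model with torch.compile before training",
    )
    return parser


def maybe_compile(model, use_compile: bool):
    """Return the compiled model when requested, otherwise the model unchanged."""
    if not use_compile:
        return model
    import torch  # imported lazily; only the compile path of this sketch needs it

    return torch.compile(model)


args = build_parser().parse_args(["--no-compile"])
print(args.compile)  # False
```

Keeping the decision in one helper makes it straightforward to gate other compile-dependent steps on the same flag.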

Frontend And CI

Update frontend service metadata imports to use generated metadata.
Add frontend dependency security updates.
Move Ruff configuration into ruff.toml.

Notes

MobileNetV5 support is experimental and needs training validation.
Training should use explicit accelerator extras, for example uv run --extra cuda python/train.py.

Cloud0310 · 2026-05-05T09:21:26Z

@QuarticCat Please try training the new models and send me the test-set result images and the training logs. Both should be under the build directory.

A few things to note when training:
Add the --no-ema CLI option, since I only want the raw experiment data.
Models to train: v4_035, v5_010, v5_005. (The number is a percentage of the full-size model: v4_035 is 35% of the small V4 variant, and v5_010 is 10% of the full-size V5.)

Cloud0310 added 22 commits April 30, 2026 04:35

remove(dep): remove tensorrt

1eb6535

The tensorrt library is large, and the current model is already fast enough without it, so we remove the dependency for a better installation experience.

chore(deps): clean up implicit dep: huggingface-hub.

13bf97c

fix typing issues with pyrefly

d00c813

add(deps): support for cpu training

a407083

add(deps): support for rocm training

f15c6e2

chore: update pyproject.toml comments

6c98171

fix: remove tensorrt option

89ab3c6

fix: wrongly platform mark

bf4c2a2

chore: split ruff config from pyproject.toml

055469a

chore: ignore B008 lint rule in typer.Option use

f87553a

fix: use local logger in python/callbacks.py

2dbdc35

refactor(python): package data and training modules

963b43f

ci(data): detect effective Typst metadata changes

92fc3a7

docs(python): document packaged data pipeline

70c2b73

chore(gitignore): ignore local data artifacts

fc40672

fix(frontend): load generated Python metadata

04b0ca8

docs(python): refresh usage instructions

6553e1e

fix(deps): resolve frontend security advisories

b746b81

fix(deps): fix rocm platform deps missing problem

d18ebe6

feat(python): add canonical mobilenet model name parser

6f16c5d

Replace hardcoded ModelName enum with regex-based parser supporting mobilenet_{v4|v5}_{size} naming convention. Refactor TimmModel to MobileNetModel using create_project_model factory with V4 support.
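
A regex-based parser for this naming convention might look like the sketch below. It assumes the size field is always three digits and, following the author's comment in this thread, reads it as a percentage of the full-size model. `ModelSpec`, `parse_model_name`, and the exact pattern are hypothetical and do not reproduce the PR's implementation.

```python
import re
from dataclasses import dataclass

# Assumed pattern: "mobilenet_", a version (v4 or v5), "_", then three digits.
_MODEL_NAME = re.compile(r"mobilenet_(?P<version>v4|v5)_(?P<size>\d{3})")


@dataclass(frozen=True)
class ModelSpec:
    version: str       # "v4" or "v5"
    multiplier: float  # width as a fraction of the full-size model


def parse_model_name(name: str) -> ModelSpec:
    """Parse a canonical model name into its version and width multiplier."""
    match = _MODEL_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"expected mobilenet_{{v4|v5}}_{{size}}, got {name!r}")
    size = int(match.group("size"))
    if size == 0:
        raise ValueError("size must be greater than 000")
    # "035" is read as 35%, giving a multiplier of 0.35.
    return ModelSpec(match.group("version"), size / 100)


print(parse_model_name("mobilenet_v4_035"))
print(parse_model_name("mobilenet_v5_010"))
```

Compared with a fixed enum, a parser of this kind accepts new sizes without a code change, while still rejecting names outside the convention.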

feat(python): add mobilenetv5 model support

cc68376

Add V5 model path to create_project_model using timm's mobilenetv5_base with configurable channel multiplier. Update default training models to include mobilenet_v5_010 and mobilenet_v5_005.

docs(python): document MobileNet training aliases

bdb3509

Cloud0310 requested a review from QuarticCat May 5, 2026 09:18

Cloud0310 added 6 commits May 8, 2026 12:00

fix: migrate to blake2b checksum, and trim datasets fingerprint to first 64 bit

38c23ab

fix: regression of checksum error

e46fea7

fix: add back exportable=True for onnx export

113c15a

add: warning for compability

2b22183

fix: move bs4 to required deps

4dcb35d

fix: mobilenet_v3 creatation don't have a corresponding exportflag

8f473ba

Cloud0310 and others added 15 commits May 9, 2026 18:10

refactor: use is_bf16_available for compatibily checking

a4d2a3e

add: cli option for control whether or not to use torch.compile

40a14e5

chore: apply lazy import

65e12ec

fix: potential infer.json mis-alignment between model and frontend

26f1f5b

chore: remove redundant checking

1a8c17d

chore(docs): extra training arg for README.md train docs

d34a57d

chore: simplify DataFrame operations

3367890

chore: remove unused noqa: B008 directives

c917371

chore: omit lxml parser arg

3743c98

remove double struck mismappings

55fbedd

use separate metrics for train/test/val

ea6a4db

fix: update supplement symbol mapping

3f0ecd3

fix: save_weights_only causing EMA export ONNX is not working

7bb5b29

fix: update symbol mapping again

257fe0e

fix: enable to_onnx optimization only when use_compile

628b13e

QuarticCat merged commit 08c2249 into main Jun 2, 2026
6 checks passed

QuarticCat deleted the cloud-dev branch June 2, 2026 18:26

QuarticCat merged 43 commits into main from cloud-dev

2 participants
