This repository contains scripts for training and using YOLO11 models for text line segmentation in historical documents.
`convert_page_to_yolo.py` converts PAGE-XML annotations to YOLO format for segmentation training.
python convert_page_to_yolo.py input_dir output_dir --target-height 640 --element-type textline
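A YOLO segmentation label file has one line per polygon: a class id followed by normalized x y pairs. Below is a minimal sketch of the PAGE-XML side of the conversion, assuming a standard PAGE layout (`Page` with `imageWidth`/`imageHeight`, `TextLine` elements with a `Coords` child); the real script additionally handles resizing to `--target-height` and other `--element-type` values.

```python
# Minimal sketch of PAGE-XML -> YOLO segmentation labels (single class 0 = textline).
# The actual convert_page_to_yolo.py also handles resizing and other element types.
import xml.etree.ElementTree as ET
from pathlib import Path

def _local(tag: str) -> str:
    """Strip the versioned PAGE namespace, e.g. '{...}TextLine' -> 'TextLine'."""
    return tag.split("}")[-1]

def page_to_yolo(page_xml: Path, label_out: Path, class_id: int = 0) -> None:
    root = ET.parse(page_xml).getroot()
    page = next(e for e in root.iter() if _local(e.tag) == "Page")
    w, h = int(page.get("imageWidth")), int(page.get("imageHeight"))

    lines = []
    for textline in (e for e in root.iter() if _local(e.tag) == "TextLine"):
        coords = next(c for c in textline if _local(c.tag) == "Coords")
        points = [tuple(map(float, p.split(","))) for p in coords.get("points").split()]
        # YOLO segmentation format: "class x1 y1 x2 y2 ..." with coordinates in [0, 1]
        flat = " ".join(f"{x / w:.6f} {y / h:.6f}" for x, y in points)
        lines.append(f"{class_id} {flat}")

    label_out.write_text("\n".join(lines) + "\n")
```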
`convert_alto_to_yolo.py` converts ALTO-XML annotations to YOLO format for segmentation training.

python convert_alto_to_yolo.py input_dir output_dir --target-height 640 --element-type textline
`visualize_masks.py` visualizes YOLO segmentation masks on images.

python visualize_masks.py --dataset /path/to/dataset --output-dir /path/to/output
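For reference, this is roughly what such an overlay amounts to for a single image; the actual `visualize_masks.py` walks the whole dataset and has its own options.

```python
# Rough sketch of overlaying one image's YOLO segmentation labels
# (the real visualize_masks.py iterates a whole dataset and has its own options).
import cv2
import numpy as np

def overlay_labels(image_path: str, label_path: str, out_path: str, alpha: float = 0.4) -> None:
    image = cv2.imread(image_path)
    h, w = image.shape[:2]
    overlay = image.copy()

    with open(label_path) as f:
        for line in f:
            parts = line.split()
            coords = np.array(parts[1:], dtype=float).reshape(-1, 2)  # normalized x, y pairs
            polygon = (coords * [w, h]).astype(np.int32)              # back to pixel coordinates
            cv2.fillPoly(overlay, [polygon], color=(0, 255, 0))       # green mask

    cv2.imwrite(out_path, cv2.addWeighted(overlay, alpha, image, 1 - alpha, 0))
```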
`train.py` is the basic training script for YOLO11 segmentation models.

python train.py \
--dataset /path/to/dataset \
--model-size m \
--batch-size 8 \
--epochs 100 \
--pretrained \
--val \
--plots
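Both training scripts presumably wrap the Ultralytics Python API; a minimal call roughly equivalent to the command above (the exact arguments set by train.py may differ):

```python
# Minimal Ultralytics call roughly equivalent to the train.py command above;
# the actual script parses CLI flags and may set more arguments.
from ultralytics import YOLO

model = YOLO("yolo11m-seg.pt")              # pretrained medium segmentation checkpoint
model.train(
    data="/path/to/dataset/dataset.yaml",
    epochs=100,
    batch=8,
    imgsz=640,
    val=True,                               # validate during training
    plots=True,                             # save training curves and example batches
)
```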
`train_improved.py` is an enhanced training script with improved augmentation and training parameters.

python train_improved.py \
--dataset /path/to/dataset \
--model-size m \
--batch-size 12 \
--epochs 100 \
--pretrained \
--val \
--plots

Key improvements in train_improved.py (a minimal sketch of these settings follows the list):
- Enhanced augmentation (mosaic, mixup, copy-paste)
- Better learning rate scheduling
- Improved regularization
- Optimized for segmentation performance
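As an illustration of how such settings map onto Ultralytics `train()` arguments (the concrete values used by train_improved.py may differ):

```python
# Illustrative mapping of the improvements above onto Ultralytics train() arguments;
# the concrete values used by train_improved.py may differ.
from ultralytics import YOLO

model = YOLO("yolo11m-seg.pt")
model.train(
    data="/path/to/dataset/dataset.yaml",
    epochs=100,
    batch=12,
    imgsz=640,
    mosaic=1.0,           # mosaic augmentation
    mixup=0.1,            # mixup augmentation
    copy_paste=0.1,       # copy-paste augmentation (segmentation tasks)
    cos_lr=True,          # cosine learning rate schedule
    weight_decay=0.0005,  # regularization
)
```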
`app.py` provides an interactive Gradio web interface for model inference (a stripped-down sketch follows the feature list).
python app.py

Features:
- Lists all available model checkpoints from runs/train/
- Upload images for prediction
- Toggle between mask and bounding box visualization
- Adjust confidence threshold
- Real-time visualization
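A stripped-down version of such an interface could look like the following; the checkpoint path is a placeholder, and the real app.py discovers checkpoints under runs/train/ and adds the mask/box toggle.

```python
# Stripped-down Gradio interface for YOLO11 segmentation inference;
# the checkpoint path is a placeholder, and the real app.py adds checkpoint
# selection, a mask/box toggle, and more.
import gradio as gr
from ultralytics import YOLO

model = YOLO("runs/train/exp/weights/best.pt")  # placeholder checkpoint path

def predict(image, conf):
    result = model.predict(image, conf=conf)[0]
    return result.plot()[..., ::-1]             # BGR -> RGB for display

demo = gr.Interface(
    fn=predict,
    inputs=[gr.Image(type="numpy"),
            gr.Slider(0.05, 0.95, value=0.25, label="Confidence threshold")],
    outputs=gr.Image(label="Prediction"),
    title="YOLO11 text line segmentation",
)

if __name__ == "__main__":
    demo.launch()
```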
`plot_yolo_metrics.py` creates a plot from the metrics in a YOLO training `results.csv` file (see the output in sections 4-5).
python plot_yolo_metrics.py runs/segment/sam_yolo11-seg/results.csv plot.png
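If you need a custom plot, results.csv can also be read directly. A minimal sketch, assuming the standard Ultralytics column names (the real plot_yolo_metrics.py may plot other metrics):

```python
# Minimal sketch of plotting metrics from an Ultralytics results.csv;
# column names assume standard Ultralytics segmentation output.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("runs/segment/sam_yolo11-seg/results.csv")
df.columns = df.columns.str.strip()   # some Ultralytics versions pad column names with spaces

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(df["epoch"], df["train/box_loss"], label="box loss")
ax.plot(df["epoch"], df["train/seg_loss"], label="seg loss")
ax.plot(df["epoch"], df["metrics/mAP50(M)"], label="mAP50 (mask)")
ax.set_xlabel("epoch")
ax.legend()
fig.savefig("plot.png", dpi=150)
```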
The dataset should be organized as follows:

dataset/
├── images/
│ ├── train/
│ └── val/
├── labels/
│ ├── train/
│ └── val/
└── dataset.yaml
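As a convenience (not part of the repository), a quick check that this layout is in place before training:

```python
# Convenience sketch (not part of the repository): verify the expected layout.
from pathlib import Path

def check_dataset(root: str) -> None:
    base = Path(root)
    for rel in ("images/train", "images/val", "labels/train", "labels/val", "dataset.yaml"):
        path = base / rel
        print(f"{'ok' if path.exists() else 'MISSING':8s} {path}")

check_dataset("/path/to/dataset")
```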
The dataset.yaml file should contain:
path: /path/to/dataset
train: images/train
val: images/val
names:
  0: textline

Training reports the following metrics:

- Box Loss: Detection accuracy
- Mask Loss: Segmentation quality
- Precision: Accuracy of detections
- Recall: Coverage of text lines
- mAP50: Mean Average Precision at IoU 0.50
- mAP50-95: Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95
Visualization features (a standalone sketch of the mask/box overlay follows this list):

- Green masks for text lines
- Red bounding boxes (optional)
- Confidence scores
- Interactive web interface
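The same overlay can be reproduced outside the web interface from the raw prediction results; a sketch with an illustrative checkpoint path and threshold:

```python
# Sketch: reproduce the green-mask / red-box overlay from raw predictions.
# Checkpoint path and confidence threshold are illustrative.
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("runs/train/exp/weights/best.pt")      # placeholder checkpoint path
result = model.predict("page.jpg", conf=0.25)[0]

image = result.orig_img.copy()
overlay = image.copy()
if result.masks is not None:
    for polygon in result.masks.xy:                 # one (N, 2) pixel-space polygon per line
        cv2.fillPoly(overlay, [polygon.astype(np.int32)], color=(0, 255, 0))   # green mask
image = cv2.addWeighted(overlay, 0.4, image, 0.6, 0)

for box, conf in zip(result.boxes.xyxy, result.boxes.conf):
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 0, 255), 2)                   # red box
    cv2.putText(image, f"{float(conf):.2f}", (x1, max(y1 - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)                 # confidence score
cv2.imwrite("overlay.jpg", image)
```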
- The model is trained for single-class text line segmentation
- Supports various YOLO11 model sizes (n, s, m, l, x)
- Automatic mixed precision training is enabled
- Cosine learning rate scheduling is used
- Data augmentation is optimized for document images
- NVIDIA GPU with at least 12GB VRAM recommended
- Batch size should be adjusted based on available GPU memory
- For an RTX 3060 (12 GB), the recommended batch size is 8-12 for YOLO11m
The model achieves high accuracy in text line segmentation with:
- High precision and recall
- Accurate mask boundaries
- Good handling of various text line orientations
- Robust performance on different document styles
- The conversion script preserves original polygon shapes without padding
- Training uses single-class segmentation for text lines
- The model supports various sizes (nano to xlarge) for different performance requirements
We compared different training configurations to find the optimal setup for text line segmentation. Here are the results:
Experiment 1:

- Configuration:
- Model: YOLO11m
- Batch size: 8
- Optimizer: AdamW
- Final metrics:
- Box Loss: 0.713
- Seg Loss: 2.057
- mAP50(B): 0.992
- mAP50(M): 0.912
- Training time: ~3751 seconds
Experiment 2:

- Configuration:
- Model: YOLO11m
- Batch size: 12
- Optimizer: AdamW
- Enhanced augmentation
- Final metrics:
- Box Loss: 0.314
- Seg Loss: 0.915
- mAP50(B): 0.989
- mAP50(M): 0.911
- Training time: ~14468 seconds
Experiment 3 (exp_improved3):

- Configuration:
- Model: YOLO11s
- Batch size: 12
- Optimizer: AdamW
- Enhanced augmentation
- Final metrics:
- Box Loss: 0.291
- Seg Loss: 0.891
- mAP50(B): 0.991
- mAP50(M): 0.913
- Training time: ~12000 seconds
Experiment 4 (exp_improved4):

- Configuration:
- Model: YOLO11s
- Batch size: 12
- Optimizer: SGD
- Enhanced augmentation
- Final metrics:
- Box Loss: 0.285
- Seg Loss: 0.887
- mAP50(B): 0.992
- mAP50(M): 0.914
- Training time: ~11000 seconds
Key findings:

- Model Size: YOLO11s performed better than YOLO11m for this small dataset, suggesting that smaller models can be more effective for limited data.
- Optimizer: SGD provided slightly better results than AdamW for the segmentation task, with:
- 2% better box loss
- 0.4% better segmentation loss
- 0.1% better mAP50 scores
- Training Efficiency: SGD training was faster and more stable than AdamW.
- Best Configuration: YOLO11s with SGD optimizer and batch size 12 achieved the best overall performance.
Recommendations:

- For small datasets (<1000 images): Use YOLO11s
- For segmentation tasks: Prefer SGD over AdamW (see the sketch after this list)
- Use batch size 12 for optimal performance
- Apply enhanced augmentation techniques for better generalization
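For reference, the optimizer is a single argument in the Ultralytics `train()` call; this mirrors the recommended YOLO11s + SGD + batch 12 setup (other arguments of the actual runs may have differed):

```python
# The optimizer is a single Ultralytics train() argument; this mirrors the
# recommended YOLO11s + SGD + batch 12 setup (other arguments of the actual
# runs may have differed).
from ultralytics import YOLO

model = YOLO("yolo11s-seg.pt")
model.train(
    data="/path/to/dataset/dataset.yaml",
    epochs=100,
    batch=12,
    imgsz=640,
    optimizer="SGD",      # instead of "AdamW"
    cos_lr=True,
)
```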
| Metric | AdamW (exp_improved3) | SGD (exp_improved4) | Improvement |
|---|---|---|---|
| Box Loss | 0.291 | 0.285 | +2.1% |
| Seg Loss | 0.891 | 0.887 | +0.4% |
| mAP50(B) | 0.991 | 0.992 | +0.1% |
| mAP50(M) | 0.913 | 0.914 | +0.1% |
| Training Time | ~12000s | ~11000s | -8.3% |
Summary: SGD optimizer provided marginal but consistent improvements across all metrics while being faster to train. The differences, though small, suggest SGD is better suited for segmentation tasks.

