
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

(Figure 1: safevla_fig_1)


Demo comparisons for four prompts: "navigate to a basketball", "find to a basketball", "locate a vase", and "find a spray bottle and pick up that spray bottle". For each prompt, the repository shows paired Baseline and SafeVLA rollouts (videos omitted here).

These demos show how SafeVLA ensures safety while still optimizing task performance.


Latest Updates


Quick Start

1. Setting up the Docker Python environment (Recommended)

First, clone the repository:

git clone https://github.com/PKU-Alignment/SafeVLA.git
cd SafeVLA

Please use the pre-built image from Docker Hub:

docker pull safevla/safevla:v1

Then configure scripts/run_docker.sh:

export CODE_PATH=/path/to/this/repo
export DATA_PATH=/path/to/data_dir
export DOCKER_IMAGE=safevla/safevla:v1
docker run \
    --gpus all \
    --device /dev/dri \
    --mount type=bind,source=${CODE_PATH},target=/root/SafeVLA \
    --mount type=bind,source=${DATA_PATH},target=/root/data \
    --shm-size 50G \
    --runtime=nvidia \
    --network=host \
    --name safevla \
    -it ${DOCKER_IMAGE}

DATA_PATH: the directory that stores training data, assets, checkpoints, etc.

Finally, start the container:

bash scripts/run_docker.sh
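
Inside the container, DATA_PATH is mounted at /root/data. Based on the paths used in the examples later in this README, a layout along the following lines is expected (a sketch only; the objaverse_* locations are a suggestion, and any paths work as long as the environment variables in Section 3 point to them):

/root/data                        # DATA_PATH as seen inside the container
    il_ckpt/spoc_IL/model.ckpt    # pretrained IL checkpoint (Section 4)
    data/astar/ObjectNavType/     # downloaded training data (Section 2.2; exact layout depends on --save_dir)
    results/                      # training outputs (--output_dir)
    objaverse_assets/             # optimized Objaverse assets and annotations (suggested location)
    objaverse_houses/             # ProcTHOR-Objaverse houses (suggested location)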

Or create the Python environment from scratch

Create the environment:

conda create -n safevla python=3.10
conda activate safevla
pip install torch==2.4.1+cu121 torchvision==0.19.1+cu121 torchaudio==2.4.1+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
bash scripts/install.sh

Then install AllenAct, AllenAct-plugins, and AI2-THOR:

pip install --no-deps "git+https://github.com/allenai/allenact.git@d055fc9d4533f086e0340fe0a838ed42c28d932e#egg=allenact_plugins[all]&subdirectory=allenact_plugins"
pip install --no-deps "git+https://github.com/Ethyn13/allenact.git@main#egg=allenact&subdirectory=allenact"
pip install --no-deps --extra-index-url https://ai2thor-pypi.allenai.org ai2thor==0+966bd7758586e05d18f6181f459c0e90ba318bec
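
As an optional sanity check (a minimal sketch, assuming the packages installed above), you can confirm that PyTorch sees the GPU and that the simulator and training frameworks import cleanly:

python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
python -c "import ai2thor; print('ai2thor:', ai2thor.__version__)"
python -c "import allenact, allenact_plugins; print('allenact imports OK')"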

Due to occasional instability in the AI2-THOR simulator, terminated evaluation or training runs may leave behind zombie processes that keep the GPU occupied, or may cause NCCL failures. You can clean up the zombie processes with:

pkill -f thor-CloudRendering

NCCL failures, however, can only be resolved by a full system reboot, which is why using Docker is strongly recommended.
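
To check whether any simulator processes are still around before or after cleanup, standard tools are enough (nothing here is SafeVLA-specific):

pgrep -af thor-CloudRendering   # list any remaining AI2-THOR renderer processes
nvidia-smi                      # confirm no stale processes are still holding GPU memory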

2. Training Data and Assets Config

In order to run training and evaluation you'll need:

  1. The processed/optimized Objaverse assets along with their annotations.
  2. The set of ProcTHOR-Objaverse houses you'd like to train/evaluate on.
  3. For evaluation only, a trained model checkpoint.

Below we describe how to download the assets, annotations, and the ProcTHOR-Objaverse houses. We also describe how you can use one of our pre-trained models (the IL model) to run evaluation.

2.1 Downloading assets, annotations, and houses

Downloading optimized Objaverse assets and annotations

Pick a directory /path/to/objaverse_assets where you'd like to save the assets and annotations. Then run the following commands:

python -m objathor.dataset.download_annotations --version 2023_07_28 --path /path/to/objaverse_assets
python -m objathor.dataset.download_assets --version 2023_07_28 --path /path/to/objaverse_assets

These will create the directory structure:

/path/to/objaverse_assets
    2023_07_28
        annotations.json.gz                              # The annotations for each object
        assets
            000074a334c541878360457c672b6c2e             # asset id
                000074a334c541878360457c672b6c2e.pkl.gz
                albedo.jpg
                emission.jpg
                normal.jpg
                thor_metadata.json
            ... #  39663 more asset directories
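
As a quick optional check that the download completed (the expected count is taken from the listing above, and the annotation file is assumed to be gzipped JSON, as its name suggests):

ls /path/to/objaverse_assets/2023_07_28/assets | wc -l    # expect roughly 39664 asset directories
python -c "import gzip, json; anns = json.load(gzip.open('/path/to/objaverse_assets/2023_07_28/annotations.json.gz', 'rt')); print(len(anns), 'annotations')"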

Downloading ProcTHOR-Objaverse houses

Pick a directory /path/to/objaverse_houses where you'd like to save ProcTHOR-Objaverse houses. Then run:

python -m scripts.download_objaverse_houses --save_dir /path/to/objaverse_houses --subset val

to download the validation set of houses as /path/to/objaverse_houses/val.jsonl.gz. You can also change val to train to download the training set of houses.
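
For example, to fetch both splits into the same directory (by analogy with val.jsonl.gz, the training split presumably lands in train.jsonl.gz):

python -m scripts.download_objaverse_houses --save_dir /path/to/objaverse_houses --subset val     # -> val.jsonl.gz
python -m scripts.download_objaverse_houses --save_dir /path/to/objaverse_houses --subset train   # -> train.jsonl.gz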

2.2 Downloading training data

Pick a directory /path/to/training_data where you'd like to save the IL training data. Then run:

python -m scripts.download_training_data --save_dir /path/to/training_data --task_types TASK_TYPES

TASK_TYPES: FetchType | PickupType | ObjectNavType
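
For example, to download only the ObjectNav IL data (one of the task types listed above), which is also the dataset used in the training example in Section 4:

python -m scripts.download_training_data --save_dir /path/to/training_data --task_types ObjectNavType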

3. Evaluation

Setting environment variables

Next you need to set the following environment variables:

export PYTHONPATH=/path/to/safevla_code
export OBJAVERSE_HOUSES_DIR=/path/to/objaverse_houses
export OBJAVERSE_DATA_DIR=/path/to/objaverse_assets

For training, we recommend setting two more environment variables to avoid timeout issues from AllenAct:

export ALLENACT_DEBUG=True
export ALLENACT_DEBUG_VST_TIMEOUT=2000
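
One convenient pattern (a workflow suggestion, not something the repository requires) is to collect all of these exports in a small file and source it (source env.sh) in every new shell before running evaluation or training:

# env.sh -- hypothetical helper script; adjust the paths to your setup
export PYTHONPATH=/path/to/safevla_code
export OBJAVERSE_HOUSES_DIR=/path/to/objaverse_houses
export OBJAVERSE_DATA_DIR=/path/to/objaverse_assets
export ALLENACT_DEBUG=True
export ALLENACT_DEBUG_VST_TIMEOUT=2000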

For the baseline model (IL model):

python scripts/download_baseline_ckpt.py --ckpt_ids spoc_IL --save_dir PATH_TO_SAVE_DIR
bash scripts/eval.sh --task_type TASK_TYPE --ckpt_path IL_CKPT_PATH

TASK_TYPE: spoc_IL | fetch | pickup | objectnav
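
For instance, to evaluate the IL baseline on ObjectNav (task type objectnav from the list above; the --ckpt_path assumes the checkpoint is saved as <save_dir>/spoc_IL/model.ckpt, matching the layout used in the training example in Section 4, so adjust it to wherever the download script actually places the file):

python scripts/download_baseline_ckpt.py --ckpt_ids spoc_IL --save_dir /root/data/baseline_ckpt
bash scripts/eval.sh --task_type objectnav --ckpt_path /root/data/baseline_ckpt/spoc_IL/model.ckpt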

For the baseline model (RL model):

python scripts/download_baseline_ckpt.py --ckpt_ids TASK_TYPE --save_dir PATH_TO_SAVE_DIR
bash scripts/eval.sh --task_type TASK_TYPE --ckpt_path RL_CKPT_PATH

For the safety-aligned model:

python scripts/download_aligned_ckpt.py --ckpt_ids TASK_TYPE --save_dir PATH_TO_SAVE_DIR
bash scripts/eval.sh --task_type TASK_TYPE --ckpt_path CKPT_PATH
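
For example, for the safety-aligned ObjectNav checkpoint (again assuming a <save_dir>/<ckpt_id>/model.ckpt layout; check the path printed by the download script):

python scripts/download_aligned_ckpt.py --ckpt_ids objectnav --save_dir /root/data/aligned_ckpt
bash scripts/eval.sh --task_type objectnav --ckpt_path /root/data/aligned_ckpt/objectnav/model.ckpt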

4. Training

Running Safe RL fine-tuning

Download the pretrained IL checkpoint:

python scripts/download_il_ckpt.py --ckpt_ids spoc_IL --save_dir PATH_TO_SAVE_DIR

Run Safe RL training:

python training/online/dinov2_vits_tsfm_base.py train \
  --il_ckpt_path IL_CKPT_PATH \
  --num_train_processes NUM_OF_TRAIN_PROCESSES \
  --output_dir PATH_TO_SAVE_CKPT \
  --dataset_dir PATH_TO_DATASET \
  --cost_limit COST_LIMIT \
  --tag EXP_NAME

For example,

python training/online/dinov2_vits_tsfm_base.py train \
    --il_ckpt_path /root/data/il_ckpt/spoc_IL/model.ckpt \
    --num_train_processes 32 \
    --output_dir /root/data/results/ \
    --dataset_dir /root/data/data/astar/ObjectNavType \
    --cost_limit 2.31964 \
    --tag SafeVLA2.31964-ObjectNavType-RL-DinoV2-ViTS-TSFM

Alternatively, you can run:

bash scripts/train.sh --task_type TASK_TYPE_ID --il_ckpt_path CKPT_PATH
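
For example (assuming TASK_TYPE_ID accepts the same task-type names as the evaluation script; check scripts/train.sh for the exact values it expects):

bash scripts/train.sh --task_type objectnav --il_ckpt_path /root/data/il_ckpt/spoc_IL/model.ckpt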

Citation

If you find our code or models useful in your work, please cite our paper:

@article{zhang25safevla,
    title={SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning},
    author={Borong Zhang and Yuhao Zhang and Jiaming Ji and Yingshan Lei and Josef Dai and Yuanpei Chen and Yaodong Yang},
    journal = {arXiv preprint arXiv:2503.03480},
    year={2025}
} 

Acknowledgment

This repository benefits from AllenAct, AI2THOR, ProcTHOR, SPOC, FLaRe, and Align-Anything.

Thanks for their wonderful work and their efforts to further promote VLA research. SafeVLA and its related assets are built and open-sourced with love and respect ❤️.

About

[NeurIPS 2025 Spotlight] Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning.
