Can We Trust Embodied Agents?
Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems
Ruochen Jiao*1
Shaoyuan Xie*2
Justin Yue2
Takami Sato2
Lixu Wang1
Yixuan Wang1
Qi Alfred Chen2
Qi Zhu1
1Northwestern University
2University of California, Irvine
*Equal contribution
Large Language Models (LLMs) are promising for decision-making in embodied AI but pose safety and security risks. We introduce BALD, a framework for Backdoor Attacks on LLM-based systems, exploring attack surfaces and triggers. We propose three attack mechanisms: word injection, scenario manipulation, and knowledge injection. Our experiments on GPT-3.5, LLaMA2, and PaLM2 in autonomous driving and home robot tasks show high success rates and stealthiness. Our findings highlight critical vulnerabilities and the need for robust defenses in embodied LLM systems.
If you find our work or dataset useful, please cite:
@inproceedings{jiao2025canwe,
title = {Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied {LLM}-Based Decision-Making Systems},
author = {Ruochen Jiao and Shaoyuan Xie and Justin Yue and Takami Sato and Lixu Wang and Yixuan Wang and Qi Alfred Chen and Qi Zhu},
booktitle = {The Thirteenth International Conference on Learning Representations (ICLR)},
year = {2025}
}conda create -y -n bald python=3.11
conda activate bald
pip install -r requirements.txtPlease refer to dataset/README.md for the dataset structure.
Please refer to eval/README.md for the evaluation code.
Please refer to defenses/README.md for the defense code.
- Add HighWayEnv dataset and evaluation
- Add VirtualHome dataset and evaluation
