11 changes: 11 additions & 0 deletions .gitignore
@@ -165,3 +165,14 @@ out/

website/_site
website/.jekyll-cache

#uv
uv.lock

# Huggingface
HF_TOKEN

# Notebooks
notebooks/

# Data
32 changes: 0 additions & 32 deletions .pre-commit-config.yaml

This file was deleted.

5 changes: 0 additions & 5 deletions .pylintrc

This file was deleted.

2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
3.8
3.12
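
The pin above moves from 3.8 to 3.12. A minimal sketch of checking that the running interpreter matches a pin of this form is shown below. It assumes the file holds a bare dotted version such as `3.12`; `parse_pin` and `interpreter_matches` are hypothetical helper names introduced for illustration, not part of the repository.

```python
import sys
from typing import Sequence, Tuple


def parse_pin(text: str) -> Tuple[int, ...]:
    """Parse a dotted pin such as '3.12' or '3.8.10' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def interpreter_matches(pin: Sequence[int], version_info: Sequence = sys.version_info) -> bool:
    """Return True when the leading version components equal the pin.

    Only as many components as the pin specifies are compared, so a pin of
    (3, 12) accepts any 3.12.x interpreter.
    """
    return tuple(version_info[: len(pin)]) == tuple(pin)


# Assumption: the pin text would normally be read from `.python-version`.
pin = parse_pin("3.12\n")
if not interpreter_matches(pin):
    print(f"Interpreter {sys.version_info[0]}.{sys.version_info[1]} does not match pin {pin}")
```

Comparing only the components present in the pin keeps the check tolerant of patch releases, which is usually what a `major.minor` pin intends.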
44 changes: 0 additions & 44 deletions Dockerfile

This file was deleted.

141 changes: 17 additions & 124 deletions README.md
@@ -1,23 +1,21 @@
# TabularBench

TabularBench: Adversarial robustness benchmark for tabular data
TabularBench: Adversarial Robustness Benchmark for Tabular Data

**Leaderboard**: [https://serval-uni-lu.github.io/tabularbench/](https://serval-uni-lu.github.io/tabularbench/)

**Documentation**: [https://serval-uni-lu.github.io/tabularbench/doc](https://serval-uni-lu.github.io/tabularbench/doc)
This is a fork of the [original benchmark repository](https://github.com/serval-uni-lu/tabularbench).
This version has been updated to work with Python 3.12 and the latest versions of the dependencies.
It has also been refactored to be more lightweight, so the original leaderboard and documentation have been removed.
To access the original leaderboard and documentation, please refer to the original repository.

**Research papers**:

- Benchmark: [TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases](https://arxiv.org/abs/2408.07579)
- CAA attack: [Constrained Adaptive Attack: Effective Adversarial Attack Against Deep Neural Networks for Tabular Data](https://arxiv.org/abs/2406.00775)
- CAPGD attack: [Towards Adaptive Attacks on Constrained Tabular Machine Learning](https://openreview.net/forum?id=DnvYdmR9OB)
- MOEVA attack: [A Unified Framework for Adversarial Attack and Defense in Constrained Feature Space](https://arxiv.org/abs/2112.01156)

**How to cite**:

Would you like to reference the CAA attack?

Then consider citing our paper, to appear in NeurIPS 2024 (spotlight):
To reference the *CAA attack*, consider citing the following paper:

```bibtex
@misc{simonetto2024caa,
@@ -29,9 +27,7 @@ Then consider citing our paper, to appear in NeurIPS 2024 (spotlight):
}
```

Would you like to reference the benchmark, the leaderboard or the model zoo?

Then consider citing our paper, to appear in NeurIPS 2024 Datasets and Benchmarks:
To reference the benchmark, consider citing the following paper:

```bibtex
@misc{simonetto2024tabularbench,
@@ -43,128 +39,25 @@ Then consider citing our paper, to appear in NeurIPS 2024 Datasets and Benchmark
}
```

## Installation

### Using Docker (recommended)

1. Clone the repository

2. Build the Docker image

```bash
./tasks/docker_build.sh
```

3. Run the Docker container

```bash
./tasks/run_benchmark.sh
```

Note: The `./tasks/run_benchmark.sh` script mounts the current directory to the `/workspace` directory in the Docker container.
This allows you to edit the code on your host machine and run the code in the Docker container without rebuilding.

### Using Pip

We recommend using Python 3.8.10.

1. Install the package from PyPI

```bash
pip install tabularbench
```

### With Pyenv and Poetry

1. Clone the repository

2. Create a virtual environment using [Pyenv](https://github.com/pyenv/pyenv) with Python 3.8.10.

3. Install the dependencies using [Poetry](https://python-poetry.org/).

```bash
poetry install
```
## Installation

### Using conda
### UV

1. Clone the repository

2. Create a virtual environment using [Conda](https://docs.anaconda.com/free/miniconda/) with Python 3.8.10.

```bash
conda create -n tabularbench python=3.8.10
```

3. Activate the conda environment.

```bash
conda activate tabularbench
```
```bash
git clone https://github.com/serval-uni-lu/tabularbench-cf.git
```

4. Install the dependencies using Pip.

```bash
pip install -r requirements.txt
```

## How to use

### Run the benchmark

You can run the benchmark with the following command:

```bash
python -m tasks.run_benchmark
```

or with Docker:
2. Navigate to the project directory

```bash
docker_run_benchmark
```

### Using the API

You can also use the API to run the benchmark. See `tasks/run_benchmark.py` for an example.

```python
clean_acc, robust_acc = benchmark(
dataset="URL",
model="STG_Default",
distance="L2",
constraints=True,
)
cd tabularbench-cf
```
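
The `benchmark(...)` call in the README text above returns a clean/robust accuracy pair for a single setting. A minimal sketch of sweeping several settings follows. It relies only on the keyword arguments shown in that call; `sweep` and `fake_benchmark` are hypothetical names introduced for illustration, the accuracy values in the stand-in are placeholders rather than results, and the real `benchmark` function (whose import path is not shown here) would be passed in its place.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

# Mirrors the README call: keyword arguments in, (clean_acc, robust_acc) out.
BenchmarkFn = Callable[..., Tuple[float, float]]


def sweep(
    benchmark_fn: BenchmarkFn,
    datasets: Iterable[str],
    models: Iterable[str],
    distance: str = "L2",
    constraints: bool = True,
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Call benchmark_fn once per (dataset, model) pair and collect the results."""
    results = {}
    for dataset, model in product(datasets, models):
        results[(dataset, model)] = benchmark_fn(
            dataset=dataset,
            model=model,
            distance=distance,
            constraints=constraints,
        )
    return results


def fake_benchmark(dataset, model, distance, constraints):
    """Hypothetical stand-in so the sketch runs without the package installed.

    The returned numbers are placeholders, not measured accuracies.
    """
    robust_acc = 0.5 if constraints else 0.3
    return 0.9, robust_acc


results = sweep(fake_benchmark, ["URL"], ["STG_Default"])
for (dataset, model), (clean_acc, robust_acc) in results.items():
    print(f"{dataset} / {model}: clean={clean_acc:.2f} robust={robust_acc:.2f}")
```

Passing the benchmark function as an argument keeps the sweep independent of where the function lives in the package, which is an assumption-free way to reuse it across forks.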

### Retrain the models

We provide the models and parameters used in the paper.
You can retrain the models with the following command:
3. Install the dependencies using [uv](https://docs.astral.sh/uv/).

```bash
python -m tasks.train_model
```

Edit the `tasks/train_model.py` file to change the model, dataset, and training method.

## Data availability

Datasets, pretrained models, and synthetic data are publicly available [here](https://uniluxembourg-my.sharepoint.com/:f:/g/personal/thibault_simonetto_uni_lu/EvkG4BI0EqJFu436biA2C_sBpkEKTTjA5PgZU_Z9jwNNSA?e=62a4Dm).
The folder structure of the shared folder should be mirrored locally so that the code runs correctly.

> [!NOTE]
> We are transitioning to Hugging Face for data storage. The model data is now available on Hugging Face [here](https://huggingface.co/serval-uni-lu/tabularbench/tree/main).

**Datasets**: Datasets are downloaded automatically in `data/datasets` when used.

**Models** (HuggingFace): Models are now downloaded automatically as needed when running the benchmark. Only the required model for a specific setting will be downloaded. Pretrained models remain available in the `data/models` folder on OneDrive.

**Model parameters**: Optimal parameters (from the hyperparameter search) are required to train models and are in `data/model_parameters`.

**Synthetic data**: The synthetic data generated by GANs is available in the folder `data/synthetic`.

## Naming

For technical reasons, the names of datasets, models, and training methods are different from the paper.
The mapping can be found in [docs/naming.md](docs/naming.md).
uv sync
```
20 changes: 0 additions & 20 deletions docs/Makefile

This file was deleted.

35 changes: 0 additions & 35 deletions docs/make.bat

This file was deleted.

3 changes: 0 additions & 3 deletions docs/requirements.txt

This file was deleted.

24 changes: 0 additions & 24 deletions docs/source/about.md

This file was deleted.
