Benchmarking RL-enhanced Spatial Indices

Setup

1. Libraries

To run the experiments, you need to have LibTorch installed. Download it from the following link:

LibTorch v2.4.0 (CPU version)

2. Datasets

The datasets required for the experiments can be downloaded from the following Dropbox link:

Download Datasets

After downloading, follow these steps:

- Create a data folder in the root directory of your project (./).
- Move all downloaded files to ./data/.
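The two steps above can be scripted. The following is a minimal sketch that assumes all downloaded files sit in a single folder. prepare_data_dir is a hypothetical helper for illustration and is not part of the repository.

```python
import shutil
from pathlib import Path


def prepare_data_dir(download_dir, project_root="."):
    """Create <project_root>/data and move every file from download_dir into it.

    Hypothetical helper (not part of the repository). Returns the list of
    destination paths in sorted order.
    """
    data_dir = Path(project_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)  # step 1: create ./data
    moved = []
    for src in sorted(Path(download_dir).iterdir()):
        if src.is_file():
            dst = data_dir / src.name
            shutil.move(str(src), str(dst))  # step 2: move the file into ./data/
            moved.append(dst)
    return moved
```

For example, prepare_data_dir("/path/to/downloads") run from the project root moves the files into ./data/. The download path here is a placeholder for wherever the Dropbox files were saved.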

3. Dataset Distributions

Here are some visualizations of the dataset distributions used in the experiments:

Real Data

Synthetic Data

4. Configuration

Configs

Check the experiment configurations in the exp_config folder and adjust them as necessary for your experiments. For example:

```json
{
  "experiments": [
    {
      "available": true,
      "data": {
        "size": 100000000,
        "dimensions": 2,
        "distribution": "us",
        "skewness": 1,
        "bounds": [
          [0, 1],
          [0, 1]
        ]
      },
      "workloads": [
        "point_query_only.json",
        "range_query_only.json",
        "knn_query_only.json"
      ],
      "baseline": [
        {
          "name": "rankspace",
          "available": false,
          "config": {
            "fill_factor": 1.0,
            "page_size": 100,
            "bit_num": 32
          }
        },
        {
          "name": "kdgreedy",
          "available": true,
          "config": {
            "page_size": 100
          }
        }
      ]
    }
  ]
}
```

Explanation:

- experiments: An array of experiment configurations. Each entry has the following fields:
  - available: A boolean indicating whether the experiment should be run.
  - data: Describes the dataset used in the experiment.
    - size: The number of data points in the dataset.
    - dimensions: The number of dimensions (features) in the dataset.
    - distribution: The distribution type of the dataset (e.g., "us" for U.S. region-based distribution).
    - skewness: The skewness level of the data distribution (set to 1 in the example above).
    - bounds: The range of values for each dimension, given as an array of [min, max] pairs.
  - workloads: A list of workload files specifying the types of queries to execute (e.g., point, range, and k-NN queries).
  - baseline: An array of baseline methods used for comparison in the experiment.
    - name: The name of the baseline method.
    - available: A boolean indicating whether the baseline method should be run.
    - config: Configuration parameters specific to the baseline method.
      - fill_factor: (For rankspace) The fill factor of the index structure.
      - page_size: The size of each page (node) in the index.
      - bit_num: (For rankspace) The number of bits used in the rank space method.
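To see which runs a configuration file enables, it can help to walk the fields listed above. The following is a minimal sketch based only on that field layout; load_config and enabled_runs are hypothetical helpers, and the repository's own runner may interpret the fields differently.

```python
import json


def load_config(path):
    """Read an experiment configuration file such as the example above."""
    with open(path) as f:
        return json.load(f)


def enabled_runs(config):
    """Yield (distribution, size, workloads, baseline_name, baseline_config)
    for every experiment AND baseline whose "available" flag is true.

    Hypothetical helper that assumes the field names documented above.
    """
    for exp in config.get("experiments", []):
        if not exp.get("available", False):
            continue  # skip disabled experiments
        data = exp["data"]
        for baseline in exp.get("baseline", []):
            if baseline.get("available", False):
                yield (data["distribution"], data["size"], exp["workloads"],
                       baseline["name"], baseline.get("config", {}))
```

With the example configuration above, only kdgreedy is yielded, because rankspace has "available": false.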

Prerequisites Before Running Experiments

Install the extended libspatialindex:
- Follow the instructions in the Installation Guide to install the extended version of libspatialindex.
Verify the installation:
- Run check_env.sh to verify that libspatialindex is correctly installed.
Update environment variables:
- In your environment setup, replace the LibTorch path in the following line with the lib directory of your own LibTorch installation:
```
export LD_LIBRARY_PATH=/home/liuguanli/Documents/libtorch/lib:$LD_LIBRARY_PATH
```
Configure and run experiments:
- In run_exp_from_config.py, set RUN_EXAMPLE=True to run the example configurations instead of the full experiments.
- With RUN_EXAMPLE=False, the script collects the JSON configs from subfolders of CONFIG_DIR, selected by folder name:
  - point_range_knn_queries for all query-only workloads. These are collected first so that the queries are generated before the RL-based indices need them.
  - write_only, read_heavy_only, and write_heavy_only for insertion-related workloads.

```python
import os

# RUN_EXAMPLE, RUN_ALL_BASELINE_EXAMPLE, CONFIG_DIR and logger are
# module-level settings defined outside this excerpt.


def main():
    global logger
    configs = []
    if RUN_EXAMPLE:
        if RUN_ALL_BASELINE_EXAMPLE:
            configs = ["example_config_all_baselines.json",
                       "example_config_all_baselines_insert.json",
                       "example_config_all_baselines_read_heavy.json",
                       "example_config_all_baselines_write_heavy.json"]
            # To run only the 100M-point rank-space example instead, use:
            # configs = ["example_config_all_baselines_point_rank_space_100m.json"]
        else:  # debug a specific index
            configs = ["example_config_debug_bmtree.json"]
    else:
        directory = CONFIG_DIR
        # Collect point_range_knn_queries first so that the queries are
        # generated before the RL-based indices need them.
        special_candidate = "point_range_knn_queries"
        for root, dirs, files in os.walk(directory):
            if os.path.basename(root) == special_candidate:
                for file in files:
                    if file.endswith(".json"):
                        configs.append(os.path.join(root, file))

        # Then collect the insertion-related workloads.
        candidates = ["write_only", "read_heavy_only", "write_heavy_only"]
        for root, dirs, files in os.walk(directory):
            if os.path.basename(root) not in candidates:
                continue
            for file in files:
                if file.endswith(".json"):
                    configs.append(os.path.join(root, file))
    # ... (rest of main() omitted)
```
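The folder-based selection in main() can also be written as a standalone, testable function. The sketch below is illustrative only: collect_configs is a hypothetical name, not part of the repository. It sorts each group because os.walk does not guarantee any particular file order, which keeps the run order reproducible across file systems.

```python
import os


def collect_configs(config_dir,
                    first="point_range_knn_queries",
                    then=("write_only", "read_heavy_only", "write_heavy_only")):
    """Return JSON config paths under folders named `first`, followed by
    those under folders named in `then`. Each group is sorted by path."""

    def json_files_in(folder_names):
        found = []
        for root, _dirs, files in os.walk(config_dir):
            if os.path.basename(root) in folder_names:
                found.extend(os.path.join(root, f)
                             for f in files if f.endswith(".json"))
        return sorted(found)

    # Query-only configs first, so queries exist before the RL-based runs.
    return json_files_in({first}) + json_files_in(set(then))
```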

Run experiments

To run all the experiments, simply execute the following command in your terminal:

```
bash run_all.sh
```

