Liuguanli/rl_spatial_benchmark

Benchmarking RL-enhanced Spatial Indices

Setup

1. Libraries

To run the experiments, you need to have LibTorch installed. Download it from the following link:

2. Datasets

The datasets required for the experiments can be downloaded from the following Dropbox link:

After downloading, follow these steps:

  1. Create a data folder in the root directory of your project (./).
  2. Move all downloaded files to ./data/.
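
The two steps above can also be scripted. The following is a minimal sketch, assuming the downloaded files sit together in a single folder; the function name `stage_datasets` and the `download_dir` argument are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def stage_datasets(download_dir, project_root="."):
    """Create <project_root>/data and move every file from download_dir into it.

    Hypothetical helper: download_dir is wherever the Dropbox files were saved.
    Returns the names of the files that were moved, in sorted order.
    """
    data_dir = Path(project_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)  # step 1: create ./data
    moved = []
    for item in sorted(Path(download_dir).iterdir()):
        if item.is_file():
            # step 2: move each downloaded file into ./data/
            shutil.move(str(item), str(data_dir / item.name))
            moved.append(item.name)
    return moved
```

Calling `stage_datasets("/path/to/downloads")` from the project root would leave the files under ./data/, where the experiments expect them.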

3. Dataset Distributions

Here are some visualizations of the dataset distributions used in the experiments:

Real Data

  • Real data

  • Real data point distribution

  • Real data range distribution

Synthetic Data

  • Synthetic data

  • Synthetic data point distribution

  • Synthetic data range distribution

4. Configuration

Configs

Check that the experiment configurations in the exp_config folder are set up correctly, and adjust them as needed for your experiments. For example:

{
  "experiments": [
    {
      "available": true,
      "data": {
        "size": 100000000,
        "dimensions": 2,
        "distribution": "us",
        "skewness": 1,
        "bounds": [
          [0, 1],
          [0, 1]
        ]
      },
      "workloads": [
        "point_query_only.json",
        "range_query_only.json",
        "knn_query_only.json"
      ],
      "baseline": [
        {
          "name": "rankspace",
          "available": false,
          "config": {
            "fill_factor": 1.0,
            "page_size": 100,
            "bit_num": 32
          }
        },
        {
          "name": "kdgreedy",
          "available": true,
          "config": {
            "page_size": 100
          }
        }
      ]
    }
  ]
}

Explanation:

  • experiments: An array containing experiment configurations.
    • available: A boolean indicating whether the experiment is available to run.
    • data: Describes the dataset used in the experiment.
      • size: The number of data points in the dataset.
      • dimensions: The number of dimensions (features) in the dataset.
      • distribution: The distribution type of the dataset (e.g., "us" for U.S. region-based distribution).
      • skewness: The skewness parameter of the data distribution (1 in the example above).
      • bounds: The range of values for each dimension in the dataset, given as an array of min-max pairs.
    • workloads: A list of workload files specifying the types of queries to be executed (e.g., point, range, k-NN queries).
    • baseline: An array of baseline methods used for comparison in the experiment.
      • name: The name of the baseline method.
      • available: A boolean indicating whether the baseline method is available for the experiment.
      • config: Configuration parameters specific to the baseline method.
        • fill_factor: (For rankspace) The fill factor of the index structure.
        • page_size: The size of each page (node) in the index.
        • bit_num: (For rankspace) The number of bits used in the rank space method.
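
To make the field descriptions concrete, here is a minimal sketch that parses the example configuration and reports which baselines would run. It only illustrates the JSON layout described above; the function `enabled_baselines` and the bounds check are assumptions for illustration, not code from this repository.

```python
import json

# The example configuration from this README, embedded so the sketch is self-contained.
EXAMPLE_CONFIG = """
{
  "experiments": [
    {
      "available": true,
      "data": {
        "size": 100000000,
        "dimensions": 2,
        "distribution": "us",
        "skewness": 1,
        "bounds": [[0, 1], [0, 1]]
      },
      "workloads": [
        "point_query_only.json",
        "range_query_only.json",
        "knn_query_only.json"
      ],
      "baseline": [
        {"name": "rankspace", "available": false,
         "config": {"fill_factor": 1.0, "page_size": 100, "bit_num": 32}},
        {"name": "kdgreedy", "available": true,
         "config": {"page_size": 100}}
      ]
    }
  ]
}
"""


def enabled_baselines(config):
    """Summarize each available experiment as (distribution, size, workloads, baselines)."""
    summary = []
    for exp in config["experiments"]:
        if not exp.get("available", False):
            continue  # experiment is switched off
        data = exp["data"]
        # Assumed sanity check: one [min, max] pair per dimension.
        if len(data["bounds"]) != data["dimensions"]:
            raise ValueError("bounds must list one [min, max] pair per dimension")
        names = [b["name"] for b in exp["baseline"] if b.get("available", False)]
        summary.append((data["distribution"], data["size"], exp["workloads"], names))
    return summary


config = json.loads(EXAMPLE_CONFIG)
for dist, size, workloads, names in enabled_baselines(config):
    print(dist, size, workloads, names)
```

With the example above, only kdgreedy is reported, because rankspace has "available": false.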

Prerequisites Before Running Experiments

  1. Install Extended Libspatialindex:

    • Follow the instructions in the Installation Guide to install the extended version of libspatialindex.
  2. Verify Installation:

    • Run check_env.sh to verify that libspatialindex is correctly installed.
  3. Update Environment Variables:

    • Replace the following line in your environment setup:
      export LD_LIBRARY_PATH=/home/liuguanli/Documents/libtorch/lib:$LD_LIBRARY_PATH
    • with the path to your own installed libtorch library.
  4. Configure and Run Experiments:

    • In run_exp_from_config.py, set RUN_EXAMPLE=True if you want to run the example configurations.
    • To run experiments:
      • Use point_range_knn_queries for all query-only workloads.
      • Use ["write_only", "read_heavy_only", "write_heavy_only"] for insertion-related workloads.

import os

# Excerpt from main() in run_exp_from_config.py.
# RUN_EXAMPLE, RUN_ALL_BASELINE_EXAMPLE, and CONFIG_DIR are defined elsewhere in the script.
def main():

    global logger
    configs = []
    if RUN_EXAMPLE:
        if RUN_ALL_BASELINE_EXAMPLE:
            configs = ["example_config_all_baselines.json",
                       "example_config_all_baselines_insert.json",
                       "example_config_all_baselines_read_heavy.json",
                       "example_config_all_baselines_write_heavy.json"]
            # NOTE: this assignment overrides the list above.
            # Remove it to run all four example configurations.
            configs = ["example_config_all_baselines_point_rank_space_100m.json"]
        else:  # debug a specific index
            configs = ["example_config_debug_bmtree.json"]
    else:
        directory = CONFIG_DIR
        # Collect point_range_knn_queries first so that the queries
        # are generated before the RL-based indices need them.
        special_candidate = "point_range_knn_queries"
        for root, dirs, files in os.walk(directory):
            if os.path.basename(root) == special_candidate:
                for file in files:
                    if file.endswith(".json"):
                        config_file_path = os.path.join(root, file)
                        configs.append(config_file_path)

        # Then collect the insertion-related workloads.
        candidates = ["write_only", "read_heavy_only", "write_heavy_only"]
        for root, dirs, files in os.walk(directory):
            if os.path.basename(root) not in candidates:
                continue
            for file in files:
                if file.endswith(".json"):
                    config_file_path = os.path.join(root, file)
                    configs.append(config_file_path)
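
To see the ordering this discovery logic produces without touching the real configuration folder, the same idea can be wrapped in a function and run against a temporary directory. This is a sketch only: `discover_configs` and the folder contents below are made up for illustration, and file names are sorted within each folder, which the original excerpt does not do.

```python
import os
import tempfile

QUERY_DIR = "point_range_knn_queries"
INSERT_DIRS = ["write_only", "read_heavy_only", "write_heavy_only"]


def discover_configs(config_dir):
    """Return query-only configs first, followed by insertion-related configs."""
    query_configs, insert_configs = [], []
    for root, _dirs, files in os.walk(config_dir):
        folder = os.path.basename(root)
        json_files = [os.path.join(root, f) for f in sorted(files) if f.endswith(".json")]
        if folder == QUERY_DIR:
            query_configs.extend(json_files)
        elif folder in INSERT_DIRS:
            insert_configs.extend(json_files)
        # Any other folder is ignored.
    return query_configs + insert_configs


# Demo on a throwaway directory tree with one config per folder.
with tempfile.TemporaryDirectory() as tmp:
    for folder in ["write_only", QUERY_DIR, "unrelated"]:
        os.makedirs(os.path.join(tmp, folder))
        with open(os.path.join(tmp, folder, "exp.json"), "w") as fh:
            fh.write("{}")
    ordered = [os.path.relpath(p, tmp) for p in discover_configs(tmp)]
    print(ordered)
```

The query-only config is listed before the write_only one regardless of the order in which the folders were created, and the config in the unrelated folder is skipped.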

Run experiments

To run all the experiments, simply execute the following command in your terminal:

bash run_all.sh
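
Because the experiments need LibTorch at run time, it can help to confirm that LD_LIBRARY_PATH contains your LibTorch lib directory before launching. The check below is a minimal sketch; the function name is hypothetical and /opt/libtorch/lib is a placeholder for your own installation path.

```python
import os


def libtorch_on_library_path(libtorch_lib_dir, env=None):
    """Return True if libtorch_lib_dir appears as an entry in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    # LD_LIBRARY_PATH is a colon-separated list of directories on Linux.
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    target = os.path.normpath(libtorch_lib_dir)
    return any(entry and os.path.normpath(entry) == target for entry in entries)


# Placeholder path: replace with the lib directory of your own LibTorch install.
if not libtorch_on_library_path("/opt/libtorch/lib"):
    print("LD_LIBRARY_PATH does not include /opt/libtorch/lib")
```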

Index building

  • Index build time
  • Index size
  • Node number

Read-only workloads

Point query

  • Point query time
  • Point I/O
  • Point query P50
  • Point query P99
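
P50 and P99 in these plots denote the 50th and 99th percentile latencies. How the benchmark computes them is not stated here; the sketch below assumes the nearest-rank definition and uses made-up latency values purely for illustration.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up per-query latencies in microseconds: 1, 2, ..., 100.
latencies_us = list(range(1, 101))
p50 = percentile_nearest_rank(latencies_us, 50)
p99 = percentile_nearest_rank(latencies_us, 99)
print(p50, p99)
```

Other percentile definitions (for example, linear interpolation) can give slightly different values on small samples.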

Range query

  • Range query time
  • Range query I/O
  • Range query P50
  • Range query P99

k-NN query

  • k-NN query time
  • k-NN query I/O
  • k-NN query P50
  • k-NN query P99

k-NN query (varying k)

  • k-NN query time, varying k
  • k-NN query I/O, varying k
  • k-NN query P50, varying k
  • k-NN query P99, varying k

Write-only workload

  • Write only
  • Write only P50
  • Write only P99
  • Write only reads
  • Write only writes
  • Write only splits

Write-heavy workload

  • Write heavy query time
  • Write heavy insert time
  • Write heavy query P50
  • Write heavy query P99
  • Write heavy insert P50
  • Write heavy insert P99
  • Write heavy splits

Read-heavy workload

  • Read heavy query time
  • Read heavy insert time
  • Read heavy query P50
  • Read heavy query P99
  • Read heavy insert P50
  • Read heavy insert P99
