diff --git a/README.md b/README.md index 462bb65..d4fd99d 100644 --- a/README.md +++ b/README.md @@ -3,16 +3,74 @@ [![Build](https://github.com/cacheMon/libCacheSim-python/actions/workflows/build.yml/badge.svg)](https://github.com/cacheMon/libCacheSim-python/actions/workflows/build.yml) [![Documentation](https://github.com/cacheMon/libCacheSim-python/actions/workflows/docs.yml/badge.svg)](docs.libcachesim.com/python) -Python bindings for [libCacheSim](https://github.com/1a1a11a/libCacheSim), a high-performance cache simulator and analysis library. + +libCacheSim is fast with the features from [underlying libCacheSim lib](https://github.com/1a1a11a/libCacheSim): + +- **High performance** - over 20M requests/sec for a realistic trace replay +- **High memory efficiency** - predictable and small memory footprint +- **Parallelism out-of-the-box** - uses the many CPU cores to speed up trace analysis and cache simulations + +libCacheSim is flexible and easy to use with: + +- **Seamless integration** with [open-source cache dataset](https://github.com/cacheMon/cache_dataset) consisting of thousands traces hosted on S3 +- **High-throughput simulation** with the [underlying libCacheSim lib](https://github.com/1a1a11a/libCacheSim) +- **Detailed cache requests** and other internal data control +- **Customized plugin cache development** without any compilation + +## Prerequisites + +- OS: Linux / macOS +- Python: 3.9 -- 3.13 ## Installation +### Quick Install + Binary installers for the latest released version are available at the [Python Package Index (PyPI)](https://pypi.org/project/libcachesim). ```bash pip install libcachesim ``` +### Recommended Installation with uv + +It's recommended to use [uv](https://docs.astral.sh/uv/), a very fast Python environment manager, to create and manage Python environments: + +```bash +uv venv --python 3.12 --seed +source .venv/bin/activate +uv pip install libcachesim +``` + +### Advanced Features Installation + +For users who want to run LRB, ThreeLCache, and GLCache eviction algorithms: + +!!! important + If `uv` cannot find built wheels for your machine, the building system will skip these algorithms by default. + +To enable them, you need to install all third-party dependencies first: + +```bash +git clone https://github.com/cacheMon/libCacheSim-python.git +cd libCacheSim-python +bash scripts/install_deps.sh + +# If you cannot install software directly (e.g., no sudo access) +bash scripts/install_deps_user.sh +``` + +Then, you can reinstall libcachesim using the following commands (may need to add `--no-cache-dir` to force it to build from scratch): + +```bash +# Enable LRB +CMAKE_ARGS="-DENABLE_LRB=ON" uv pip install libcachesim +# Enable ThreeLCache +CMAKE_ARGS="-DENABLE_3L_CACHE=ON" uv pip install libcachesim +# Enable GLCache +CMAKE_ARGS="-DENABLE_GLCACHE=ON" uv pip install libcachesim +``` + ### Installation from sources If there are no wheels suitable for your environment, consider building from source. @@ -29,6 +87,42 @@ python -m pytest tests/ ## Quick Start +### Cache Simulation + +With libcachesim installed, you can start cache simulation for some eviction algorithm and cache traces: + +```python +import libcachesim as lcs + +# Step 1: Get one trace from S3 bucket +URI = "cache_dataset_oracleGeneral/2007_msr/msr_hm_0.oracleGeneral.zst" +dl = lcs.DataLoader() +dl.load(URI) + +# Step 2: Open trace and process efficiently +reader = lcs.TraceReader( + trace = dl.get_cache_path(URI), + trace_type = lcs.TraceType.ORACLE_GENERAL_TRACE, + reader_init_params = lcs.ReaderInitParam(ignore_obj_size=False) +) + +# Step 3: Initialize cache +cache = lcs.S3FIFO(cache_size=1024*1024) + +# Step 4: Process entire trace efficiently (C++ backend) +obj_miss_ratio, byte_miss_ratio = cache.process_trace(reader) +print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}") + +# Step 4.1: Process with limited number of requests +cache = lcs.S3FIFO(cache_size=1024*1024) +obj_miss_ratio, byte_miss_ratio = cache.process_trace( + reader, + start_req=0, + max_req=1000 +) +print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}") +``` + ### Basic Usage ```python @@ -46,7 +140,9 @@ print(cache.get(req)) # False (first access) print(cache.get(req)) # True (second access) ``` -### Trace Processing +### Trace Analysis + +Here is an example demonstrating how to use `TraceAnalyzer`: ```python import libcachesim as lcs @@ -56,25 +152,40 @@ URI = "cache_dataset_oracleGeneral/2007_msr/msr_hm_0.oracleGeneral.zst" dl = lcs.DataLoader() dl.load(URI) -# Step 2: Open trace and process efficiently -reader = lcs.TraceReader(dl.get_cache_path(URI)) +reader = lcs.TraceReader( + trace = dl.get_cache_path(URI), + trace_type = lcs.TraceType.ORACLE_GENERAL_TRACE, + reader_init_params = lcs.ReaderInitParam(ignore_obj_size=False) +) -# Step 3: Initialize cache -cache = lcs.S3FIFO(cache_size=1024*1024) +analysis_option = lcs.AnalysisOption( + req_rate=True, # Keep basic request rate analysis + access_pattern=False, # Disable access pattern analysis + size=True, # Keep size analysis + reuse=False, # Disable reuse analysis for small datasets + popularity=False, # Disable popularity analysis for small datasets (< 200 objects) + ttl=False, # Disable TTL analysis + popularity_decay=False, # Disable popularity decay analysis + lifetime=False, # Disable lifetime analysis + create_future_reuse_ccdf=False, # Disable experimental features + prob_at_age=False, # Disable experimental features + size_change=False, # Disable size change analysis + ) + +analysis_param = lcs.AnalysisParam() + +analyzer = lcs.TraceAnalyzer( + reader, "example_analysis", analysis_option=analysis_option, analysis_param=analysis_param +) -# Step 4: Process entire trace efficiently (C++ backend) -obj_miss_ratio, byte_miss_ratio = cache.process_trace(reader) -print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}") +analyzer.run() ``` -> [!NOTE] -> We DO NOT ignore the object size by defaults, you can add `reader_init_params = lcs.ReaderInitParam(ignore_obj_size=False)` to the initialization of `TraceReader` if needed. - -## Custom Cache Policies +## Plugin System -Implement custom cache replacement algorithms using pure Python functions - **no C/C++ compilation required**. +libCacheSim allows you to develop your own cache eviction algorithms and test them via the plugin system without any C/C++ compilation required. -### Python Hook Cache Overview +### Plugin Cache Overview The `PluginCache` allows you to define custom caching behavior through Python callback functions. You need to implement these callback functions: @@ -87,74 +198,51 @@ The `PluginCache` allows you to define custom caching behavior through Python ca | `remove_hook` | `(data: Any, obj_id: int) -> None` | Clean up when object removed | | `free_hook` | `(data: Any) -> None` | [Optional] Final cleanup | -
-An example for LRU +### Example: Implementing LRU via Plugin System ```python from collections import OrderedDict -from libcachesim import PluginCache, CommonCacheParams, Request, SyntheticReader, LRU - - -class StandaloneLRU: - def __init__(self): - self.cache_data = OrderedDict() - - def cache_hit(self, obj_id): - if obj_id in self.cache_data: - obj_size = self.cache_data.pop(obj_id) - self.cache_data[obj_id] = obj_size - - def cache_miss(self, obj_id, obj_size): - self.cache_data[obj_id] = obj_size - - def cache_eviction(self): - evicted_id, _ = self.cache_data.popitem(last=False) - return evicted_id - - def cache_remove(self, obj_id): - if obj_id in self.cache_data: - del self.cache_data[obj_id] - +from typing import Any -def cache_init_hook(common_cache_params: CommonCacheParams): - return StandaloneLRU() +from libcachesim import PluginCache, LRU, CommonCacheParams, Request +def init_hook(_: CommonCacheParams) -> Any: + return OrderedDict() -def cache_hit_hook(cache, request: Request): - cache.cache_hit(request.obj_id) +def hit_hook(data: Any, req: Request) -> None: + data.move_to_end(req.obj_id, last=True) +def miss_hook(data: Any, req: Request) -> None: + data.__setitem__(req.obj_id, req.obj_size) -def cache_miss_hook(cache, request: Request): - cache.cache_miss(request.obj_id, request.obj_size) +def eviction_hook(data: Any, _: Request) -> int: + return data.popitem(last=False)[0] +def remove_hook(data: Any, obj_id: int) -> None: + data.pop(obj_id, None) -def cache_eviction_hook(cache, request: Request): - return cache.cache_eviction() - - -def cache_remove_hook(cache, obj_id): - cache.cache_remove(obj_id) - - -def cache_free_hook(cache): - cache.cache_data.clear() - +def free_hook(data: Any) -> None: + data.clear() plugin_lru_cache = PluginCache( - cache_size=1024, - cache_init_hook=cache_init_hook, - cache_hit_hook=cache_hit_hook, - cache_miss_hook=cache_miss_hook, - cache_eviction_hook=cache_eviction_hook, - cache_remove_hook=cache_remove_hook, - cache_free_hook=cache_free_hook, - cache_name="CustomizedLRU", + cache_size=128, + cache_init_hook=init_hook, + cache_hit_hook=hit_hook, + cache_miss_hook=miss_hook, + cache_eviction_hook=eviction_hook, + cache_remove_hook=remove_hook, + cache_free_hook=free_hook, + cache_name="Plugin_LRU", ) -``` -
+reader = lcs.SyntheticReader(num_objects=1000, num_of_req=10000, obj_size=1) +req_miss_ratio, byte_miss_ratio = plugin_lru_cache.process_trace(reader) +ref_req_miss_ratio, ref_byte_miss_ratio = LRU(128).process_trace(reader) +print(f"plugin req miss ratio {req_miss_ratio}, ref req miss ratio {ref_req_miss_ratio}") +print(f"plugin byte miss ratio {byte_miss_ratio}, ref byte miss ratio {ref_byte_miss_ratio}") +``` -Another simple implementation via hook functions for S3FIFO respectively is given in [examples](examples/plugin_cache/s3fifo.py). +By defining custom hook functions for cache initialization, hit, miss, eviction, removal, and cleanup, users can easily prototype and test their own cache eviction algorithms. ### Getting Help @@ -208,7 +296,6 @@ If you used libCacheSim in your research, please cite the above papers. --- - ## License See [LICENSE](LICENSE) for details. diff --git a/docs/src/en/getting_started/quickstart.md b/docs/src/en/getting_started/quickstart.md index b913a9d..70bfe22 100644 --- a/docs/src/en/getting_started/quickstart.md +++ b/docs/src/en/getting_started/quickstart.md @@ -37,7 +37,7 @@ To enable them, you need to install all third-party dependencies first. bash scripts/install_deps_user.sh ``` -Then, you can reinstall libcachesim using the following commands: +Then, you can reinstall libcachesim using the following commands (may need to add `--no-cache-dir` to force it to build from scratch): ```bash # Enable LRB