A graph neural network model for predicting adsorbate probability distributions (APDs) in MOFs, based on the cdm and ocpmodels (now FairChem) packages. For now, this package relies heavily on the deprecated ocpmodels package; work is ongoing to make it a standalone package with fewer dependencies. Setting up an appropriate environment to run the code can be a bit of a challenge, but we have done our best to make it as straightforward as possible.
This package requires several dependencies, primarily the Charge Density Models (cdm) package and ocpmodels. Both are already included in this repository as submodules. To obtain them together with this package, run the following command:
git clone --recursive https://github.com/uowoolab/DeepAPD
Next, install the dependencies listed in requirements.txt. They can be installed into a separate conda environment using the following steps:
conda create -n DeepAPD python=3.10
conda activate DeepAPD
pip install -r requirements.txt
The final dependencies are torch-scatter and torch-sparse, which must be installed separately using the commands below. If installation fails, refer to the documentation of the corresponding projects, since some systems require a build from source (https://pypi.org/project/torch-scatter, https://pypi.org/project/torch-sparse/).
pip install torch-scatter -f https://data.pyg.org/whl/torch-$(python -c "import torch; print(torch.__version__)").html
pip install torch-sparse
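To confirm that the installation steps above succeeded, a short check like the following can help. It is only a minimal sketch: the helper name missing_modules is hypothetical and not part of this package, and the list of import names (torch, torch_scatter, torch_sparse) is an assumption based on the pip packages named above.

```python
import importlib.util

# Import names for the packages discussed above (assumed; pip names use
# hyphens, import names use underscores).
REQUIRED = ["torch", "torch_scatter", "torch_sparse"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated DeepAPD environment. The check only looks the modules up without importing them, so it will not catch a binary that is present but built against a different torch version.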
The primary script in this package is run_inference.py. Run it with the --help flag to see the available options. It generates a cube file from a CIF file for a given adsorbate and set of conditions. If you are interested in binding sites, consider using this package with our binding site identification code, GALA. As long as this package is in the source directory of GALA, you can run the inference and binding site identification in a single command, handled entirely by GALA. See the GALA repository for more details.
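To inspect a generated cube file, the header and grid values can be read with a few lines of Python. This is a sketch that assumes the output follows the common Gaussian cube layout (two comment lines, an atom-count/origin line, three grid-axis lines, one line per atom, then the volumetric values) and that the atom count is positive; the exact layout written by run_inference.py is not confirmed here. The function names and the SAMPLE data are hypothetical and for illustration only.

```python
def read_cube_header(lines):
    """Parse the header of a cube file supplied as a list of text lines."""
    natoms_field, *origin = lines[2].split()[:4]
    natoms = int(natoms_field)  # assumed positive (no orbital block)
    shape, axes = [], []
    for line in lines[3:6]:
        n, *vec = line.split()[:4]
        shape.append(abs(int(n)))  # the sign only encodes the length unit
        axes.append(tuple(float(v) for v in vec))
    return {
        "natoms": natoms,
        "origin": tuple(float(v) for v in origin),
        "shape": tuple(shape),
        "axes": axes,
        "data_start": 6 + natoms,  # first line of volumetric values
    }


def read_cube_values(lines, header):
    """Collect all grid values that follow the atom block, in file order."""
    values = []
    for line in lines[header["data_start"]:]:
        values.extend(float(tok) for tok in line.split())
    return values


# Tiny synthetic 2x2x2 cube, used only to illustrate the layout.
SAMPLE = """comment line 1
comment line 2
    1    0.000000    0.000000    0.000000
    2    1.000000    0.000000    0.000000
    2    0.000000    1.000000    0.000000
    2    0.000000    0.000000    1.000000
    6    0.000000    0.500000    0.500000    0.500000
 0.0 0.1 0.2 0.3 0.4 0.5
 0.6 0.7
""".splitlines()

if __name__ == "__main__":
    header = read_cube_header(SAMPLE)
    values = read_cube_values(SAMPLE, header)
    print(header["shape"], len(values))
```

For a real file, pass open(path).read().splitlines() in place of SAMPLE. The number of values read should equal the product of the three grid dimensions.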
The GPU is used for inference if PyTorch detects one. Be sure to set the OMP_NUM_THREADS environment variable to the number of CPU cores you wish to use; for example, the following uses at most 12 cores:
export OMP_NUM_THREADS=12
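Before starting a long inference run, it can be useful to confirm which device and thread settings will be picked up. The sketch below is not part of this package; requested_threads and describe_runtime are hypothetical helper names. It uses only torch.cuda.is_available() and torch.get_num_threads(), and falls back gracefully if torch is not installed.

```python
import os


def requested_threads(env=os.environ):
    """Return OMP_NUM_THREADS as a positive int, or None if unset or not a plain integer."""
    raw = env.get("OMP_NUM_THREADS", "")
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if value > 0 else None


def describe_runtime():
    """Summarize the device and thread settings visible to this process."""
    threads = requested_threads()
    try:
        import torch
    except ImportError:
        return f"torch not installed; OMP_NUM_THREADS={threads}"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return (
        f"device={device}, torch threads={torch.get_num_threads()}, "
        f"OMP_NUM_THREADS={threads}"
    )


if __name__ == "__main__":
    print(describe_runtime())
```

Run it in the same shell session in which OMP_NUM_THREADS was exported, since the variable is only inherited by processes started from that shell.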
The data used to train the models and the data generated by the ML models can be found in the following Zenodo repositories:
CH4@1bar simulation data (APDs and binding sites): https://zenodo.org/uploads/16800893
CH4@65bar simulation data (APDs and binding sites): https://zenodo.org/uploads/16801034
Xe@1bar simulation data (APDs and binding sites): https://zenodo.org/uploads/16801181
ML results (APDs and binding sites): https://zenodo.org/uploads/16808817