uowoolab/MOSAEC-DB

MOSAEC Database (v1.0.0-release)

Warning

**The database Zenodo record(s) will be taken down pending confirmation of, or amendments to, the licensing agreement pertaining to the enclosed crystallographic data.**

MOSAEC-DB is a database of metal-organic framework (MOF) and coordination polymer crystallographic information files (.cif) processed for atomistic simulations.

This repository collects the software applied during database construction, validation, and analysis.

This repository will also document how to use the database and record any changes made to it.

Download

The MOSAEC-DB files, including all publicly available crystal structures, scripts, and supplemental data, can be downloaded from the Zenodo repository.

Database Construction

Structure Retrieval & Symmetry Conversion

Initial crystal structures may be retrieved directly from the Cambridge Structural Database (CSD) through the ConQuest program or the CSD Python API.

Retrieved CSD crystal structures were converted to P1 symmetry using our CSD-Cleaner code.

Solvent Removal

The SAMOSA solvent removal method was utilized in database construction. A publication outlining the details of this method is available. All *_full MOSAEC-DB structures were generated with default settings, while *_partial structures were generated with the --keep_bound option.

Structure Error Analysis

The results of the MOSAEC error checking algorithm were used to determine whether to include a structure in MOSAEC-DB. A preprint article outlining the details of this method is available, and the GitHub repository will be available to the public shortly (following its publication). Only structures which passed the MOSAEC error flagging routine were included in the final database.

Structure Validation

Additional tools to check for problematic structures which may not have been caught by the structure error analyses are provided in structure_validation. This includes codes to search for overlapping and hypervalent atom sites.
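As a rough illustration of what an overlapping-site check can look like, the sketch below flags atom pairs that sit closer than a cutoff under periodic boundary conditions. It is not the code shipped in structure_validation: the function name `find_overlapping_sites`, the 0.5 Å default cutoff, and the input layout (a 3x3 lattice matrix plus fractional coordinates) are assumptions made for this example. It also relies on a simple minimum-image wrap of the fractional coordinates, which is only reliable for cells that are not strongly skewed.

```python
import numpy as np


def find_overlapping_sites(lattice, frac_coords, min_dist=0.5):
    """Return (i, j, distance) for atom pairs closer than min_dist (in angstrom).

    lattice:     3x3 array whose rows are the cell vectors (angstrom).
    frac_coords: N x 3 array of fractional coordinates.
    min_dist:    hypothetical cutoff chosen for illustration only.
    """
    lattice = np.asarray(lattice, dtype=float)
    frac = np.asarray(frac_coords, dtype=float)

    # Pairwise fractional separations, wrapped to the nearest periodic image.
    delta = frac[:, None, :] - frac[None, :, :]
    delta -= np.round(delta)

    # Convert to Cartesian space and take the vector lengths.
    dist = np.linalg.norm(delta @ lattice, axis=-1)

    # Inspect each unordered pair once (upper triangle, no self-pairs).
    i_idx, j_idx = np.triu_indices(len(frac), k=1)
    close = dist[i_idx, j_idx] < min_dist
    return [
        (int(i), int(j), float(dist[i, j]))
        for i, j in zip(i_idx[close], j_idx[close])
    ]
```

For a 10 Å cubic cell with atoms at fractional positions (0, 0, 0) and (0.99, 0, 0), the two sites are 0.1 Å apart across the cell boundary and would be reported as overlapping.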

Duplicate Structure Analysis

Criteria based on pointwise distance distribution (PDD) scores were applied to identify duplicated and/or highly similar crystal structures with shared empirical formulas. The codes used to complete this analysis are provided in duplicates.
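The workflow described above (group structures by empirical formula, then compare a distance-based fingerprint within each group) can be sketched as follows. This is a simplified stand-in and not the code in duplicates: instead of comparing full PDD matrices, it averages the PDD rows into a single vector (an average-minimum-distance style descriptor) and compares vectors by their largest element-wise difference. The function names, the neighbour count `k=4`, the tolerance `tol=0.01`, and the use of one shell of periodic images are all assumptions chosen for illustration.

```python
import itertools
from collections import defaultdict

import numpy as np


def pdd_rows(lattice, frac_coords, k=4, images=1):
    """Sorted distances from each motif atom to its k nearest periodic neighbours."""
    lattice = np.asarray(lattice, dtype=float)
    frac = np.asarray(frac_coords, dtype=float)
    shifts = np.array(
        list(itertools.product(range(-images, images + 1), repeat=3)), dtype=float
    )
    # Every periodic copy of the motif within the surrounding block of cells.
    all_frac = (frac[None, :, :] + shifts[:, None, :]).reshape(-1, 3)
    diff = (all_frac[None, :, :] - frac[:, None, :]) @ lattice
    dist = np.sort(np.linalg.norm(diff, axis=-1), axis=1)
    return dist[:, 1 : k + 1]  # column 0 is the zero self-distance


def averaged_pdd(lattice, frac_coords, k=4):
    """Average the PDD rows over the motif (a simplified, AMD-style vector)."""
    return pdd_rows(lattice, frac_coords, k).mean(axis=0)


def flag_duplicates(entries, k=4, tol=0.01):
    """entries: iterable of (name, empirical_formula, lattice, frac_coords).

    Returns pairs of names that share a formula and whose averaged PDD
    vectors differ by less than tol (hypothetical threshold, in angstrom).
    """
    groups = defaultdict(list)
    for name, formula, lattice, frac in entries:
        groups[formula].append((name, averaged_pdd(lattice, frac, k)))

    pairs = []
    for members in groups.values():
        for (n1, v1), (n2, v2) in itertools.combinations(members, 2):
            if np.max(np.abs(v1 - v2)) < tol:
                pairs.append((n1, n2))
    return pairs
```

Because the descriptor depends only on interatomic distances, a structure and its supercell give the same vector, so both are flagged as duplicates even though their lattices differ. A production analysis would need a neighbour search that is robust for small or skewed cells and a proper PDD comparison metric.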

Descriptor Calculation

Global Structure Features

Descriptors characterizing the geometric and chemical environments of the database structures were generated using a number of standard libraries.

The codes generating the atomic property-weighted radial distribution function (AP-RDF) and revised autocorrelation function (RAC) are identical to those used in the ARC-MOF database.
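For readers unfamiliar with the AP-RDF descriptor, the sketch below shows its general form: a sum over atom pairs of the product of an atomic property, smeared by a Gaussian centred on each sampled distance R. It is not the ARC-MOF implementation. The function name `ap_rdf`, the smoothing value, the 1/N normalization, and the simple minimum-image wrap (adequate only for cells that are not strongly skewed) are assumptions made for this example.

```python
import numpy as np


def ap_rdf(lattice, frac_coords, props, r_grid, smoothing=10.0):
    """Property-weighted RDF sampled on r_grid (angstrom).

    Computes sum_{i<j} P_i * P_j * exp(-smoothing * (r_ij - R)^2) / N
    for every R in r_grid, where N is the number of atoms in the cell.
    The smoothing value and the 1/N factor are illustrative choices.
    """
    lattice = np.asarray(lattice, dtype=float)
    frac = np.asarray(frac_coords, dtype=float)
    props = np.asarray(props, dtype=float)
    r_grid = np.asarray(r_grid, dtype=float)

    # Minimum-image pair distances from wrapped fractional separations.
    delta = frac[:, None, :] - frac[None, :, :]
    delta -= np.round(delta)
    dist = np.linalg.norm(delta @ lattice, axis=-1)

    # Each unordered atom pair contributes once.
    i_idx, j_idx = np.triu_indices(len(frac), k=1)
    r_ij = dist[i_idx, j_idx]
    weights = props[i_idx] * props[j_idx]

    # Gaussian smearing of every pair distance onto every grid point.
    gauss = np.exp(-smoothing * (r_ij[None, :] - r_grid[:, None]) ** 2)
    return (gauss * weights[None, :]).sum(axis=1) / len(frac)
```

Passing a vector of ones as `props` gives an unweighted RDF, while passing a property such as electronegativity or polarizability gives the corresponding weighted variant.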

Geometric properties were generated using the Zeo++ v0.3.0 software with default settings.

Any descriptor-generation code that was not previously made available is provided in descriptors, including the code for the atom-specific persistent homology descriptors included in the Zenodo record.

Partial Atomic Charges

Electrostatic potential-derived partial atomic charges were computed for as many MOSAEC-DB structures as possible using the previously reported REPEAT method. The most recent version of this code is available at the following repository.

Additionally, a broader set of MOSAEC-DB structures was assigned ML-predicted partial atomic charges using the MEPO-ML models that we reported in a recent publication.

Framework Dimensionality

Each crystal structure's dimensionality was calculated using a previously reported algorithm.

Topology

The CrystalNets package was applied to compute the net topology. A simple Julia script was used to characterize MOSAEC-DB crystal structures.

File Management Utilities

Additional utilities unrelated to the MOSAEC-DB construction and characterization processes are also provided in the Zenodo record to facilitate simple file manipulations. Their functions are described below.

Unchanged Crystal Structure Retrieval

Crystal structures that were unchanged by the database construction protocol due to their lack of solvent are listed in the corresponding text files (.gcd). Access to these structures is subject to the user's CSD license status. However, the computation-ready structures can be regenerated by applying the provided structure processing codes to the relevant CSD refcodes. This script makes use of a CSD-Cleaner code that depends on the CSD Python API and pymatgen packages.

python get_unchanged_mofs.py --remove_disorder

Additionally, these scripts provide experimental functionality to regenerate the structure files containing partial atomic charges (REPEAT/MEPO-ML). The behavior of these functions may change between versions of pymatgen and the CSD Python API, so we cannot guarantee that the regenerated files reproduce the originally computed partial atomic charges. Testing was performed using Python 3.9.x, csd-python-api 3.1.0, and pymatgen 2024.5.31.

python get_unchanged_mofs.py --remove_disorder --write_repeat --write_mepoml

Subset Preparation

Subsets of MOSAEC-DB are arranged according to common conventions of porosity in prior databases, as well as a diverse sampling of several chemical and geometric descriptors. Sampling was achieved using farthest point sampling of the desired descriptor vector.

python get_subset.py ../subsets/______.txt
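The farthest point sampling mentioned above can be sketched in a few lines. This is a generic greedy version written for illustration, not the routine used to build the published subsets: the function name `farthest_point_sampling`, the Euclidean metric, and the choice of starting index are assumptions, and in practice the descriptor columns would usually be scaled before sampling.

```python
import numpy as np


def farthest_point_sampling(features, n_samples, start=0):
    """Greedily pick indices so that each new point is as far as possible
    from the points already selected.

    features:  N x D array of descriptor vectors.
    n_samples: number of indices to return (capped at N).
    start:     index of the first selected point (hypothetical default).
    """
    features = np.asarray(features, dtype=float)
    n_samples = min(n_samples, len(features))
    if n_samples <= 0:
        return []

    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(features - features[start], axis=1)

    while len(selected) < n_samples:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected
```

Each iteration costs one pass over the data, so selecting m points from N structures scales as O(mN) distance evaluations, which stays manageable for database-sized inputs.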

Updates

Information regarding future updates and additions to the database will be outlined in the GitHub repository established at the time of publication.

Licensing

The CC BY 4.0 license applies to the utilization of the MOSAEC database. Follow the license guidelines regarding the use, sharing, adaptation, and attribution of this data.

Citation

Please cite the following article when using MOSAEC-DB.

Contact

Reach out to any of the following authors with any questions:

Marco Gibaldi: marco.gibaldi@uottawa.ca

Tom Woo: tom.woo@uottawa.ca
