Hi MACE developers,
I am trying to use a locally trained MACE model with OpenMM-ML.
The top-level model is not a TorchScript model. It loads as:
<class 'mace.modules.models.ScaleShiftMACE'>
However, when I inspect the model internals, even the normal non-compiled .model files contain internal compiled/JIT/FX submodules.
For example:
node_embedding.linear._compiled_main
interactions.0.linear_up._compiled_main
interactions.0.conv_tp._compiled_main_left_right
interactions.1.conv_tp._compiled_main_left_right
products.0.linear._compiled_main
readouts.1.linear_2._compiled_main
These are instances of:
<class 'torch.jit._script.RecursiveScriptModule'>
and some symmetric contraction modules are:
<class 'torch.fx.graph_module.GraphModule...'>.
I scanned all available model files:
Mg_Water_MACE.model: top=ScaleShiftMACE, compiled_internals=True, count=33
Mg_Water_MACE_stagetwo.model: top=ScaleShiftMACE, compiled_internals=True, count=33
checkpoints/Mg_Water_MACE_run-123.model: top=ScaleShiftMACE, compiled_internals=True, count=33
checkpoints/Mg_Water_MACE_run-123_stagetwo.model: top=ScaleShiftMACE, compiled_internals=True, count=33
The explicitly compiled model files load as top-level RecursiveScriptModule:
Mg_Water_MACE_compiled.model: top=RecursiveScriptModule, compiled_internals=True, count=101
Mg_Water_MACE_stagetwo_compiled.model: top=RecursiveScriptModule, compiled_internals=True, count=101
So the normal .model files are not top-level TorchScript models, but they still contain internal TorchScript/FX-compiled e3nn/MACE submodules.
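A scan like the one above can be reproduced with a short script along these lines. This is a minimal sketch, not the exact script used for the numbers above: the helper names (`is_compiled`, `summarize`, `scan`) are hypothetical, the count may differ depending on what is treated as "compiled", and it assumes the `.model` files are pickled `nn.Module` objects that `torch.load` can restore.

```python
# Class names treated as "compiled". Matching on names over the whole MRO
# also catches the per-instance subclasses that torch.fx generates for
# GraphModule, and keeps these helpers importable without torch.
COMPILED_CLASS_NAMES = {"ScriptModule", "RecursiveScriptModule", "GraphModule"}


def is_compiled(module):
    """True if any class in the module's MRO is a TorchScript or FX class."""
    return any(cls.__name__ in COMPILED_CLASS_NAMES for cls in type(module).__mro__)


def summarize(model):
    """Return the top-level class name and the compiled submodules of a model."""
    compiled = [
        (name, type(sub).__name__)
        for name, sub in model.named_modules()
        if name and is_compiled(sub)  # the empty name is the model itself
    ]
    return {"top": type(model).__name__, "count": len(compiled), "compiled": compiled}


def scan(paths):
    """Load each model file on the CPU and print a one-line summary."""
    import torch  # imported lazily so the helpers above work without torch

    for path in paths:
        # Assumption: the file is a full pickled model, hence weights_only=False.
        model = torch.load(path, map_location="cpu", weights_only=False)
        info = summarize(model)
        print(f"{path}: top={info['top']}, "
              f"compiled_internals={info['count'] > 0}, count={info['count']}")
```

Calling `scan(["Mg_Water_MACE.model", "Mg_Water_MACE_stagetwo.model"])` prints one summary line per file, and `summarize(model)["compiled"]` lists the qualified names of the matching submodules.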
This causes trouble when running the direct MACE backend of OpenMM-ML on CUDA.
Direct OpenMM-ML MACE backend:
from openmmml import MLPotential
potential = MLPotential("mace", modelPath="Mg_Water_MACE_stagetwo.model")
system = potential.createSystem(topology, device="cuda")
fails during minimization/energy evaluation with:
RuntimeError: Expected all tensors to be on the same device,
but got mat2 is on cpu, different from other tensors on cuda:0
The failing operation is inside:
torch.functional.tensordot
and involves:
However, the same model works correctly in native MACECalculator:
MACECalculator(model_paths="Mg_Water_MACE_stagetwo.model", device="cuda")
I also tested the OpenMM-ML ASE bridge:
from mace.calculators import MACECalculator
from openmmml import MLPotential
calc = MACECalculator(
    model_paths="Mg_Water_MACE_stagetwo.model",
    device="cuda",
    default_dtype="float32",
)
potential = MLPotential("ase")
system = potential.createSystem(topology, calculator=calc)
This works and returns an energy:
-15864308.38803956 kJ/mol
So the model itself works in CUDA through MACECalculator, but the direct OpenMM-ML MACE backend has trouble with the internal compiled/JIT/FX submodules.
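To help narrow down which tensor stays on the CPU, a diagnostic along these lines could be run on the loaded model after `model.to("cuda")`. It is only a sketch: `tensors_off_device` is a hypothetical helper, and it relies on the fact that `Module.to()` moves registered parameters and buffers but not plain tensor attributes. It cannot see constants stored inside TorchScript or FX code, so an empty result does not rule those out.

```python
def tensors_off_device(model, expected="cuda"):
    """List tensors reachable from the model whose device type is not `expected`.

    Returns (qualified_name, kind, device_string) tuples.
    """
    offenders = []
    seen = set()  # ids of tensors already checked, to avoid duplicate reports

    def check(name, kind, tensor):
        if id(tensor) in seen:
            return
        seen.add(id(tensor))
        if tensor.device.type != expected:
            offenders.append((name, kind, str(tensor.device)))

    # Registered state: this is what Module.to() is expected to move.
    for name, param in model.named_parameters():
        check(name, "parameter", param)
    for name, buf in model.named_buffers():
        check(name, "buffer", buf)

    # Plain tensor attributes are not registered, so Module.to() skips them.
    for mod_name, mod in model.named_modules():
        for attr, value in vars(mod).items():
            # Duck-typed tensor test: anything with .device and .dtype.
            if hasattr(value, "device") and hasattr(value, "dtype"):
                check(f"{mod_name}.{attr}".lstrip("."), "attribute", value)

    return offenders
```

Any entry reported with kind `attribute` would be a candidate for the `mat2 is on cpu` operand in the error above, although this has not been confirmed for this model.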
My question:
How can I save/export a MACE model with no internal _compiled_main, no torch.jit._script.RecursiveScriptModule, and no torch.fx.GraphModule submodules, for maximum compatibility with OpenMM-ML direct MACE backend?
Is there a recommended flag or export pathway to create a fully eager PyTorch MACE model for OpenMM-ML?
Thanks.