MACE model with OpenMM-ML (issue #1476)

Description

@jagriti-iisc

Hi MACE developers,

I am trying to use a locally trained MACE model with OpenMM-ML.

The top-level model is not a TorchScript model. It loads as:

<class 'mace.modules.models.ScaleShiftMACE'>

However, when I inspect the model internals, even the normal non-compiled .model files contain internal compiled/JIT/FX submodules.

For example:

node_embedding.linear._compiled_main
interactions.0.linear_up._compiled_main
interactions.0.conv_tp._compiled_main_left_right
interactions.1.conv_tp._compiled_main_left_right
products.0.linear._compiled_main
readouts.1.linear_2._compiled_main

These are instances of:

<class 'torch.jit._script.RecursiveScriptModule'>

and some symmetric contraction modules are:

<class 'torch.fx.graph_module.GraphModule...'>.

I scanned all available model files:

Mg_Water_MACE.model: top=ScaleShiftMACE, compiled_internals=True, count=33
Mg_Water_MACE_stagetwo.model: top=ScaleShiftMACE, compiled_internals=True, count=33
checkpoints/Mg_Water_MACE_run-123.model: top=ScaleShiftMACE, compiled_internals=True, count=33
checkpoints/Mg_Water_MACE_run-123_stagetwo.model: top=ScaleShiftMACE, compiled_internals=True, count=33

The explicitly compiled model files load as top-level RecursiveScriptModule:

Mg_Water_MACE_compiled.model: top=RecursiveScriptModule, compiled_internals=True, count=101
Mg_Water_MACE_stagetwo_compiled.model: top=RecursiveScriptModule, compiled_internals=True, count=101

So the normal .model files are not top-level TorchScript models, but they still contain internal TorchScript/FX-compiled e3nn/MACE submodules.
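A scan like the one above can be sketched as follows. This is a minimal illustration, not the exact script used for the counts listed here: the helper names `find_compiled_submodules` and `summarize` are hypothetical, and matching on the class's module path (`torch.jit.` / `torch.fx.`) is an assumption based on the class names reported above. It deliberately avoids importing torch and only needs an iterable of `(name, module)` pairs such as the one returned by `model.named_modules()`.

```python
from collections import Counter

# Module-path prefixes treated as "compiled". Assumption: derived from the
# class names quoted in this issue (torch.jit._script.RecursiveScriptModule
# and torch.fx.graph_module.GraphModule...).
COMPILED_PREFIXES = ("torch.jit.", "torch.fx.")


def find_compiled_submodules(named_modules):
    """Return [(name, class path)] for submodules defined under torch.jit or torch.fx.

    `named_modules` is any iterable of (name, module) pairs,
    for example model.named_modules() on a loaded model.
    """
    hits = []
    for name, module in named_modules:
        cls = type(module)
        # str.startswith accepts a tuple of prefixes.
        if cls.__module__.startswith(COMPILED_PREFIXES):
            hits.append((name, f"{cls.__module__}.{cls.__qualname__}"))
    return hits


def summarize(hits):
    """Count how many flagged submodules there are per class path."""
    return Counter(path for _, path in hits)
```

For a loaded model object `model`, the call would be `find_compiled_submodules(model.named_modules())`; the length of the returned list corresponds to a `count=` value of the kind shown above.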

This appears to be what breaks CUDA execution through the direct OpenMM-ML MACE backend.

Direct OpenMM-ML MACE backend:

from openmmml import MLPotential

potential = MLPotential("mace", modelPath="Mg_Water_MACE_stagetwo.model")
system = potential.createSystem(topology, device="cuda")

fails during minimization/energy evaluation with:

RuntimeError: Expected all tensors to be on the same device,
but got mat2 is on cpu, different from other tensors on cuda:0

The failing operation is inside:

torch.functional.tensordot

and involves:

_w3j_1_2_1
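One way to narrow down which tensor stayed on the CPU is to list every registered parameter and buffer that is not on the expected device. The sketch below is an illustration only: `tensors_off_device` is a hypothetical helper, and it assumes the model exposes `named_parameters()` and `named_buffers()` as `torch.nn.Module` does. Whether `_w3j_1_2_1` is stored as a registered buffer in these models is not established here; if it is a plain attribute or a constant baked into a compiled graph, this check will not see it.

```python
from itertools import chain


def tensors_off_device(model, expected="cuda"):
    """List (name, device) for registered parameters and buffers not on `expected`.

    Assumes each tensor has a `.device` whose `.type` is a string such as
    "cpu" or "cuda". Tensors that were never registered as a parameter or
    buffer are NOT visited.
    """
    stray = []
    for name, tensor in chain(model.named_parameters(), model.named_buffers()):
        if tensor.device.type != expected:
            stray.append((name, str(tensor.device)))
    return stray
```

Calling it after moving the model to the GPU and getting a non-empty list would point at tensors that the move did not reach; an empty list would suggest the CPU tensor lives outside the registered parameters and buffers.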

However, the same model works correctly with the native MACECalculator:

MACECalculator(model_paths="Mg_Water_MACE_stagetwo.model", device="cuda")

I also tested the OpenMM-ML ASE bridge:

from mace.calculators import MACECalculator
from openmmml import MLPotential

calc = MACECalculator(
    model_paths="Mg_Water_MACE_stagetwo.model",
    device="cuda",
    default_dtype="float32",
)

potential = MLPotential("ase")
system = potential.createSystem(topology, calculator=calc)

This works and returns an energy:

-15864308.38803956 kJ/mol
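Since OpenMM reports energies in kJ/mol while ASE calculators report eV, a unit conversion is needed before comparing this value with the energy from MACECalculator directly. The snippet below is a small sketch of that conversion; the function name `kj_per_mol_to_ev` is hypothetical, and the factor is computed from the exact SI values of the elementary charge and the Avogadro constant.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulomb, exact in the 2019 SI
AVOGADRO_PER_MOL = 6.02214076e23       # 1/mol, exact in the 2019 SI

# 1 eV per particle expressed in kJ/mol (about 96.485 kJ/mol).
KJ_PER_MOL_PER_EV = ELEMENTARY_CHARGE_C * AVOGADRO_PER_MOL / 1000.0


def kj_per_mol_to_ev(energy_kj_per_mol):
    """Convert an energy from kJ/mol (OpenMM) to eV (ASE)."""
    return energy_kj_per_mol / KJ_PER_MOL_PER_EV


# Energy returned through the OpenMM-ML ASE bridge, converted to eV.
energy_ev = kj_per_mol_to_ev(-15864308.38803956)
```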

So the model itself runs on CUDA through MACECalculator, but the direct OpenMM-ML MACE backend fails, apparently because of the internal compiled/JIT/FX submodules.

My question:

How can I save/export a MACE model with no internal _compiled_main, no torch.jit._script.RecursiveScriptModule, and no torch.fx.GraphModule submodules, for maximum compatibility with the direct OpenMM-ML MACE backend?

Is there a recommended flag or export pathway to create a fully eager PyTorch MACE model for OpenMM-ML?

Thanks.
