MACE model with OpenMM-ML. issue #1476

Hi MACE developers,

I am trying to use a locally trained MACE model with OpenMM-ML.

The top-level model is not a TorchScript model. It loads as:

<class 'mace.modules.models.ScaleShiftMACE'>

However, when I inspect the model internals, even the normal non-compiled `.model` files contain internal compiled/JIT/FX submodules.

For example:

node_embedding.linear._compiled_main
interactions.0.linear_up._compiled_main
interactions.0.conv_tp._compiled_main_left_right
interactions.1.conv_tp._compiled_main_left_right
products.0.linear._compiled_main
readouts.1.linear_2._compiled_main

These are instances of:

<class 'torch.jit._script.RecursiveScriptModule'>

and some symmetric contraction modules are:

<class 'torch.fx.graph_module.GraphModule...'>.

I scanned all available model files:

Mg_Water_MACE.model: top=ScaleShiftMACE, compiled_internals=True, count=33
Mg_Water_MACE_stagetwo.model: top=ScaleShiftMACE, compiled_internals=True, count=33
checkpoints/Mg_Water_MACE_run-123.model: top=ScaleShiftMACE, compiled_internals=True, count=33
checkpoints/Mg_Water_MACE_run-123_stagetwo.model: top=ScaleShiftMACE, compiled_internals=True, count=33

The explicitly compiled model files load as top-level RecursiveScriptModule:

Mg_Water_MACE_compiled.model: top=RecursiveScriptModule, compiled_internals=True, count=101
Mg_Water_MACE_stagetwo_compiled.model: top=RecursiveScriptModule, compiled_internals=True, count=101

So the normal `.model` files are not top-level TorchScript models, but they still contain internal TorchScript/FX-compiled e3nn/MACE submodules.
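A scan like the one above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `qualified_names` and `scan_compiled` are hypothetical, and the function relies only on the `named_modules()` method that `torch.nn.Module` provides, matching classes by their fully qualified names rather than importing torch. The counts it reports may differ from the ones listed above, depending on how nested submodules are counted.

```python
# Fully qualified class names treated as "compiled"
# (taken from the types reported in this issue).
COMPILED_TYPES = {
    "torch.jit._script.RecursiveScriptModule",
    "torch.fx.graph_module.GraphModule",
}


def qualified_names(obj):
    """Return the fully qualified names of obj's class and all of its base classes."""
    return {f"{c.__module__}.{c.__qualname__}" for c in type(obj).__mro__}


def scan_compiled(model):
    """Map submodule name -> matched compiled type for every compiled submodule.

    `model` only needs a `named_modules()` method yielding (name, module)
    pairs, as torch.nn.Module provides. The top-level module (name == "")
    is skipped so that only internals are reported.
    """
    hits = {}
    for name, module in model.named_modules():
        matched = qualified_names(module) & COMPILED_TYPES
        if name and matched:
            hits[name] = sorted(matched)[0]
    return hits
```

With a real file, the model object could be obtained with `torch.load(path, map_location="cpu", weights_only=False)` (the `.model` files are pickled modules, so this should only be done with trusted files), and `len(scan_compiled(model))` would then give the number of compiled submodules.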

This appears to cause problems when running the model on CUDA through the direct OpenMM-ML MACE backend.

Direct OpenMM-ML MACE backend:

    from openmmml import MLPotential

    # topology is the OpenMM Topology of the system
    potential = MLPotential("mace", modelPath="Mg_Water_MACE_stagetwo.model")
    system = potential.createSystem(topology, device="cuda")

fails during minimization/energy evaluation with:

    RuntimeError: Expected all tensors to be on the same device,
    but got mat2 is on cpu, different from other tensors on cuda:0

The failing operation is inside:

    torch.functional.tensordot

and involves:

    _w3j_1_2_1
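One way to narrow this down is to list which parameters and buffers are not on the expected device just before the energy evaluation. The sketch below assumes that `_w3j_1_2_1` is stored as a registered buffer, which is an assumption and not something confirmed here. The helper name `tensors_not_on` is hypothetical. It uses only `named_parameters()`, `named_buffers()` and the `.device` attribute, all of which `torch.nn.Module` and `torch.Tensor` provide.

```python
def tensors_not_on(model, expected="cuda"):
    """List (name, device) for parameters and buffers not on the expected device type.

    `expected` is compared against the start of the device string, so
    "cuda" matches "cuda:0", "cuda:1", and so on.
    """
    off = []
    for getter in (model.named_parameters, model.named_buffers):
        for name, tensor in getter():
            device = str(tensor.device)
            if not device.startswith(expected):
                off.append((name, device))
    return off
```

If the model was moved with `model.to("cuda")` and this still reports entries such as a `_w3j_...` buffer on `cpu`, that would point to tensors inside the compiled submodules not following the rest of the model to the GPU.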

However, the same model works correctly with the native MACECalculator:

    MACECalculator(model_paths="Mg_Water_MACE_stagetwo.model", device="cuda")

I also tested the OpenMM-ML ASE bridge:

    from mace.calculators import MACECalculator
    from openmmml import MLPotential

    calc = MACECalculator(
        model_paths="Mg_Water_MACE_stagetwo.model",
        device="cuda",
        default_dtype="float32",
    )

    potential = MLPotential("ase")
    system = potential.createSystem(topology, calculator=calc)

This works and returns an energy:

    -15864308.38803956 kJ/mol

So the model itself works in CUDA through MACECalculator, but the direct OpenMM-ML MACE backend has trouble with the internal compiled/JIT/FX submodules.

My question:

How can I save/export a MACE model with no internal `_compiled_main`, no `torch.jit._script.RecursiveScriptModule`, and no `torch.fx.GraphModule` submodules, for maximum compatibility with the direct OpenMM-ML MACE backend?

Is there a recommended flag or export pathway to create a fully eager PyTorch MACE model for OpenMM-ML?

Thanks.
