Conversation
Pull request overview
This PR adds initial support for LiquidAI’s LFM2 model family in the Megatron backend by introducing new model configs, a Primus implementation of the LFM2 short-convolution “attention” layer, and a patch to swap Megatron’s decoder layer spec builder when LFM-specific layer types are configured.
Changes:
- Add LFM/LFM2 Megatron model configuration YAMLs (base + LFM2-8B-A1B).
- Introduce `Lfm2ShortConv` (a Primus implementation) and an LFM-aware GPT layer spec builder.
- Add a Megatron patch to route `get_gpt_decoder_layer_specs` to the Primus implementation, and provide an example pretrain config.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `primus/configs/models/megatron/primus_megatron_model.yaml` | Adds LFM2-related config knobs (`lfm_layer_types`, `conv_L_cache`, `conv_bias`). |
| `primus/configs/models/megatron/lfm2_8B_A1B.yaml` | New LFM2-8B-A1B model config (tokenizer + model/MoE/GQA settings). |
| `primus/configs/models/megatron/lfm_base.yaml` | New LFM base config defaults (RoPE, RMSNorm, long context, MoE settings). |
| `primus/backends/megatron/training/tokenizer/tokenizer.py` | Adds `Lfm2Tokenizer` to the allowed tokenizer type set. |
| `primus/backends/megatron/patches/gpt_decoder_layer_specs_patches.py` | Registers a patch to override Megatron's decoder layer spec builder when LFM is configured. |
| `primus/backends/megatron/core/transformer/attention.py` | Adds `Lfm2ShortConv` module (depthwise conv + projections) with fast/slow paths. |
| `primus/backends/megatron/core/models/gpt/gpt_layer_specs.py` | Adds Primus GPT decoder layer spec selection supporting `lfm_layer_types` ("conv" vs "full_attention"). |
| `examples/megatron/configs/MI355X/lfm2_8B_A1B-BF16-pretrain.yaml` | Example training config wiring in `lfm_layer_types` and training overrides. |
```python
if is_causal_conv1d_available():
    from causal_conv1d import causal_conv1d_fn, causal_conv1d_update
else:
    print(f"Unable to import causal_conv1d!!!")
    causal_conv1d_fn, causal_conv1d_update = None, None
```
This module prints to stdout at import time when causal_conv1d isn't available. Import-time prints are noisy in distributed training and can break log formatting; prefer using the project's rank-aware logging (e.g., log_rank_0/warning_rank_0) or warnings.warn, and avoid side effects during import.
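A minimal sketch of the logging-based alternative, assuming the module keeps the availability check at import time (`warnings.warn` is shown here; the project's `log_rank_0` helper would slot in the same way):

```python
import warnings

if is_causal_conv1d_available():
    from causal_conv1d import causal_conv1d_fn, causal_conv1d_update
else:
    # warn once per location instead of printing from every rank at import time
    warnings.warn("causal_conv1d is not installed; Lfm2ShortConv will fall back to the slow path.")
    causal_conv1d_fn, causal_conv1d_update = None, None
```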
```python
from megatron.core.inference.contexts import BaseInferenceContext
from megatron.core.packed_seq_params import PackedSeqParams
from megatron.core.process_groups_config import ProcessGroupCollection
from megatron.core.transformer.attention import AttnMaskType
```
AttnMaskType is imported from megatron.core.transformer.attention, while the rest of the codebase consistently imports it from megatron.core.transformer.enums (e.g., primus/backends/megatron/core/transformer/multi_latent_attention.py:12). If megatron.core.transformer.attention doesn’t export AttnMaskType in some Megatron versions, this will raise ImportError at import time; consider switching to the enums import for consistency and compatibility.
Suggested change:
```diff
-from megatron.core.transformer.attention import AttnMaskType
+from megatron.core.transformer.enums import AttnMaskType
```
```python
self.L_cache = args.conv_L_cache
self.bias = args.conv_bias
```
get_args() is expected to always have conv_L_cache and conv_bias, but Primus's MegatronArgBuilder drops config keys that aren't present in Megatron's argparse defaults (see primus/backends/megatron/argument_builder.py:86-110). If the underlying Megatron version doesn’t define these args, this will raise AttributeError when constructing the layer. Consider using getattr(args, ..., default) with sensible defaults and/or adding an args patch to register these flags with Megatron’s parser.
Suggested change:
```diff
-self.L_cache = args.conv_L_cache
-self.bias = args.conv_bias
+self.L_cache = getattr(args, "conv_L_cache", 1)
+self.bias = getattr(args, "conv_bias", False)
```
```python
# patch its local binding.
try:
    import gpt_builders as gpt_builders_module  # pyright: ignore[reportMissingImports]
except ImportError as exc:
    log_rank_0(
        "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
        "cannot patch local get_gpt_decoder_layer_specs binding."
    )
    raise RuntimeError("Failed to import required module gpt_builders") from exc
```
If gpt_builders can’t be imported, this patch raises and aborts training. Given primus/core/utils/import_utils.py explicitly treats gpt_builders as version-dependent, it would be safer to log and continue after patching megatron.core.models.gpt.gpt_layer_specs (older Megatron versions won’t need the local-binding patch anyway).
Suggested change:
```diff
-# patch its local binding.
+# patch its local binding when that version-dependent module is available.
 try:
     import gpt_builders as gpt_builders_module  # pyright: ignore[reportMissingImports]
-except ImportError as exc:
+except ImportError:
     log_rank_0(
-        "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
-        "cannot patch local get_gpt_decoder_layer_specs binding."
+        "[Patch:megatron.gpt.decoder_layer_specs] gpt_builders is unavailable; "
+        "skipping local get_gpt_decoder_layer_specs binding patch."
     )
-    raise RuntimeError("Failed to import required module gpt_builders") from exc
+    return
```
```python
    HAVE_TE = True
except ImportError:
    HAVE_TE = False
```
TESpecProvider is only imported inside the try block, but get_lfm_layer_with_transformer_engine_spec() unconditionally instantiates it. If transformer_engine isn't installed, TESpecProvider will be undefined and this will raise NameError. Consider guarding with if not HAVE_TE: raise RuntimeError(...) (or import TESpecProvider inside the function and raise a clear error) so failures are deterministic and actionable.
Suggested change:
```python
class TESpecProvider:  # type: ignore[no-redef]
    """Fallback provider that raises a clear error when TE is unavailable."""

    def __init__(self, *args, **kwargs):
        raise RuntimeError(
            "Transformer Engine support is unavailable because the "
            "`transformer_engine` package is not installed."
        )
```
```python
def get_gpt_decoder_layer_specs(
    config: TransformerConfig,
    use_transformer_engine: bool,
    normalization: Optional[str] = None,
    qk_l2_norm: Optional[bool] = False,
    vp_stage: Optional[int] = None,
    pp_rank: Optional[int] = None,
) -> TransformerBlockSubmodules:
    """GPT block spec."""
    assert use_transformer_engine, "use_transformer_engine must be True"
```
The function is annotated to return TransformerBlockSubmodules, but it actually returns a Python list of per-layer specs (layer_specs). If callers expect the annotated type (or upstream Megatron’s signature differs), this can cause runtime breakage and makes the code harder to reason about. Align the return type annotation (and remove the unused TransformerBlockSubmodules import if appropriate) with what’s actually returned/expected by Megatron.
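A minimal sketch of the alignment this asks for, assuming the body really does return a plain list of per-layer specs (only the signature is shown; the ellipsis stands for the unchanged body):

```python
from typing import Optional

from megatron.core.transformer.spec_utils import ModuleSpec


def get_gpt_decoder_layer_specs(
    config: TransformerConfig,
    use_transformer_engine: bool,
    normalization: Optional[str] = None,
    qk_l2_norm: Optional[bool] = False,
    vp_stage: Optional[int] = None,
    pp_rank: Optional[int] = None,
) -> list[ModuleSpec]:  # annotate what the function actually builds and returns
    """GPT block spec."""
    ...
```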
```python
args = get_args()

lfm_layer_types = args.lfm_layer_types

if lfm_layer_types is None:
    lfm_layer_types = ["full_attention"] * args.num_layers
else:
    assert isinstance(
        lfm_layer_types, list
    ), f"lfm_layer_types must be list[str] or None, got {type(lfm_layer_types)}"
    invalid_type_values = [
        layer_type for layer_type in lfm_layer_types if not isinstance(layer_type, str)
    ]
    assert (
        not invalid_type_values
    ), f"lfm_layer_types must contain only str values, got invalid values: {invalid_type_values}"
    valid_layer_types = {"conv", "full_attention"}
    invalid_layer_types = sorted(
        {layer_type for layer_type in lfm_layer_types if layer_type not in valid_layer_types}
    )
    assert not invalid_layer_types, (
        f"Invalid lfm_layer_types values: {invalid_layer_types}. "
        f"Only {sorted(valid_layer_types)} are allowed."
    )

assert len(lfm_layer_types) == args.num_layers, (
    f"Invalid lfm_layer_types length: {len(lfm_layer_types)}, "
    f"expected {args.num_layers} (args.num_layers)."
)
```
lfm_layer_types defaults and length validation are based on args.num_layers, but the layer construction loop uses config.num_layers. If these ever differ, this will either fail the length check or index out of range later. Use config.num_layers consistently for both defaulting and validation (and only fall back to args when necessary).
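A minimal sketch of the single-source-of-truth fix, assuming `config.num_layers` is authoritative for the block being built:

```python
# derive the layer count once and use it for both defaulting and validation
num_layers = config.num_layers

if lfm_layer_types is None:
    lfm_layer_types = ["full_attention"] * num_layers

assert len(lfm_layer_types) == num_layers, (
    f"Invalid lfm_layer_types length: {len(lfm_layer_types)}, "
    f"expected {num_layers} (config.num_layers)."
)
```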
```python
# attention_mask: [batch, 1, seq, seq]
hidden_states = hidden_states.transpose(0, 1)

if is_fast_path_available and "cuda" in hidden_states.device.type and not is_torchdynamo_compiling():
```
hidden_states.device.type is typically the string 'cuda'/'cpu'; using "cuda" in hidden_states.device.type works but is an odd predicate and can hide mistakes. Prefer an explicit check like hidden_states.device.type == "cuda" for clarity.
Suggested change:
```diff
-if is_fast_path_available and "cuda" in hidden_states.device.type and not is_torchdynamo_compiling():
+if (
+    is_fast_path_available
+    and hidden_states.device.type == "cuda"
+    and not is_torchdynamo_compiling()
+):
```
```python
from megatron.core.transformer.spec_utils import ModuleSpec
from megatron.core.transformer.transformer_block import TransformerBlockSubmodules
from megatron.core.transformer.transformer_config import TransformerConfig

from primus.backends.megatron.core.transformer.attention import (
    Lfm2ShortConv,
    Lfm2ShortConvSubmodules,
)


def get_gpt_decoder_layer_specs(
    config: TransformerConfig,
    use_transformer_engine: bool,
    normalization: Optional[str] = None,
    qk_l2_norm: Optional[bool] = False,
    vp_stage: Optional[int] = None,
    pp_rank: Optional[int] = None,
) -> TransformerBlockSubmodules:
    """GPT block spec."""
    assert use_transformer_engine, "use_transformer_engine must be True"
```
get_gpt_decoder_layer_specs is annotated/typed as returning TransformerBlockSubmodules, but the function actually constructs and returns layer_specs (a list of ModuleSpec). This mismatch can confuse callers and static checkers, and may indicate the wrong API is being patched; update the return type/imports to match the actual return value (or adjust implementation to return the expected object).
```python
args = get_args()

lfm_layer_types = args.lfm_layer_types

if lfm_layer_types is None:
    lfm_layer_types = ["full_attention"] * args.num_layers
else:
    assert isinstance(
        lfm_layer_types, list
    ), f"lfm_layer_types must be list[str] or None, got {type(lfm_layer_types)}"
    invalid_type_values = [
        layer_type for layer_type in lfm_layer_types if not isinstance(layer_type, str)
    ]
    assert (
        not invalid_type_values
    ), f"lfm_layer_types must contain only str values, got invalid values: {invalid_type_values}"
    valid_layer_types = {"conv", "full_attention"}
    invalid_layer_types = sorted(
        {layer_type for layer_type in lfm_layer_types if layer_type not in valid_layer_types}
    )
    assert not invalid_layer_types, (
        f"Invalid lfm_layer_types values: {invalid_layer_types}. "
        f"Only {sorted(valid_layer_types)} are allowed."
    )

assert len(lfm_layer_types) == args.num_layers, (
    f"Invalid lfm_layer_types length: {len(lfm_layer_types)}, "
    f"expected {args.num_layers} (args.num_layers)."
)
```
lfm_layer_types validation uses args.num_layers, but layer construction iterates over config.num_layers. If these differ, lfm_layer_types[layer_number] can raise IndexError or silently misconfigure layers. Use a single source of truth (prefer config.num_layers) for both the default fill and the length check.
```python
# Parse config.moe_layer_freq to determine the pattern of expert/dense layers.
# 0 stands for dense layers, 1 stands for expert layers.
# For integer N: Creates a pattern with one expert layer every N layers.
# For string pattern: Evaluates the str directly (e.g. "[1,0,1]" for alternating expert/dense).
if isinstance(config.moe_layer_freq, int):
    moe_layer_pattern = [1 if (i % config.moe_layer_freq == 0) else 0 for i in range(config.num_layers)]
elif isinstance(config.moe_layer_freq, list):
    moe_layer_pattern = config.moe_layer_freq
    assert len(moe_layer_pattern) == config.num_layers, (
        f"Invalid length of moe_layer_pattern: {len(moe_layer_pattern)}, "
        f"expected {config.num_layers}, "
        f"current moe layer pattern: {config.moe_layer_freq}"
    )
else:
    raise ValueError(f"Invalid moe_layer_freq: {type(config.moe_layer_freq)}, {config.moe_layer_freq}")
```
The comment says string moe_layer_freq patterns are evaluated, but the code only accepts int or list and raises for str. Either update the comment to reflect the real behavior, or add explicit handling for str here (if this function can be called before the existing moe_layer_freq normalization patches run).
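If the string form should be supported here, a minimal sketch using `ast.literal_eval` (a safer stand-in for the `eval` the comment alludes to; this assumes string patterns are plain Python list literals like `"[1,0,1]"`):

```python
import ast

moe_layer_freq = config.moe_layer_freq
if isinstance(moe_layer_freq, str):
    # normalize "[1,0,1]" -> [1, 0, 1] before the int/list branches run
    moe_layer_freq = ast.literal_eval(moe_layer_freq)
```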
```python
)

assert not use_kitchen, "use_kitchen is not supported with LFM model."
# Note: patch_gpt_decoder_layer_specs must be applied bafter te_spec_provider_patches
```
Typo in comment: "bafter" -> "after".
Suggested change:
```diff
-# Note: patch_gpt_decoder_layer_specs must be applied bafter te_spec_provider_patches
+# Note: patch_gpt_decoder_layer_specs must be applied after te_spec_provider_patches
```
```python
# gpt_builders imports get_gpt_decoder_layer_specs directly; force import and
# patch its local binding.
try:
    import gpt_builders as gpt_builders_module  # pyright: ignore[reportMissingImports]
except ImportError as exc:
    log_rank_0(
        "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
        "cannot patch local get_gpt_decoder_layer_specs binding."
    )
    raise RuntimeError("Failed to import required module gpt_builders") from exc

gpt_builders_module.get_gpt_decoder_layer_specs = primus_get_gpt_decoder_layer_specs
log_rank_0(
    "[Patch:megatron.gpt.decoder_layer_specs] Patched "
    "gpt_builders.get_gpt_decoder_layer_specs "
    f"-> {primus_get_gpt_decoder_layer_specs.__name__}"
)
```
This patch hard-fails if gpt_builders cannot be imported. gpt_builders only exists in some Megatron versions (Primus already treats it as optional in primus/core/utils/import_utils.py); in versions without it, this should be a no-op rather than aborting training. Consider using the same lazy-import pattern and only patch the local binding when the module is present.
Suggested change:
```diff
-# gpt_builders imports get_gpt_decoder_layer_specs directly; force import and
-# patch its local binding.
-try:
-    import gpt_builders as gpt_builders_module  # pyright: ignore[reportMissingImports]
-except ImportError as exc:
-    log_rank_0(
-        "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
-        "cannot patch local get_gpt_decoder_layer_specs binding."
-    )
-    raise RuntimeError("Failed to import required module gpt_builders") from exc
-
-gpt_builders_module.get_gpt_decoder_layer_specs = primus_get_gpt_decoder_layer_specs
-log_rank_0(
-    "[Patch:megatron.gpt.decoder_layer_specs] Patched "
-    "gpt_builders.get_gpt_decoder_layer_specs "
-    f"-> {primus_get_gpt_decoder_layer_specs.__name__}"
-)
+# gpt_builders imports get_gpt_decoder_layer_specs directly; patch its local
+# binding only when that optional module exists in this Megatron version.
+import importlib
+import importlib.util
+
+if importlib.util.find_spec("gpt_builders") is None:
+    log_rank_0(
+        "[Patch:megatron.gpt.decoder_layer_specs] gpt_builders is not available; "
+        "skipping patch of local get_gpt_decoder_layer_specs binding."
+    )
+else:
+    gpt_builders_module = importlib.import_module("gpt_builders")
+    gpt_builders_module.get_gpt_decoder_layer_specs = primus_get_gpt_decoder_layer_specs
+    log_rank_0(
+        "[Patch:megatron.gpt.decoder_layer_specs] Patched "
+        "gpt_builders.get_gpt_decoder_layer_specs "
+        f"-> {primus_get_gpt_decoder_layer_specs.__name__}"
+    )
```
```python
def __init__(
    self,
    config: TransformerConfig,
    submodules: Optional[Lfm2ShortConvSubmodules],
    layer_number: int,
    attn_mask_type: AttnMaskType,
    attention_type: str | None = None,
    cp_comm_type: str | None = None,
    pg_collection: ProcessGroupCollection | None = None,
):
```
submodules is typed as Optional[...] but is dereferenced unconditionally (submodules.in_proj / submodules.out_proj). Either make submodules non-optional in the signature, or add an explicit check with a clear error so failures are easier to diagnose.
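A minimal sketch of the explicit-check option (the error message wording is illustrative):

```python
if submodules is None:
    raise ValueError(
        "Lfm2ShortConv requires Lfm2ShortConvSubmodules (in_proj/out_proj specs); got None."
    )
```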
```yaml
# LFM2 model configuration
# lfm_layer_types: ["conv", "conv", "full_attention", "conv", "conv", "conv", "full_attention", "conv", "conv", "conv", "full_attention", "conv", "conv", "conv", "full_attention", "conv", "conv", "conv", "full_attention", "conv", "conv", "full_attention", "conv", "conv"]
lfm_layer_types: null # list[str]
conv_L_cache: 3 # int
conv_bias: false # bool
```
These new config fields (lfm_layer_types, conv_L_cache, conv_bias) may be silently dropped when converting Primus configs to Megatron args: MegatronArgBuilder.update() only keeps keys present in Megatron's argparse defaults (primus/backends/megatron/argument_builder.py:92-110). If Megatron doesn't define these flags, megatron.training.get_args() won't have them and the new LFM code (e.g., args.conv_L_cache / args.lfm_layer_types) will fail at runtime. Consider adding an args patch to inject these fields into Megatron args (with defaults) during build_args, or extending the argument builder to allow these Primus-specific keys.
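A minimal sketch of the args-patch option, following Megatron's extra-args-provider convention (the `_add_lfm_args` name and exact flag spellings are hypothetical; the real patch would have to match how Primus wires providers into `build_args`):

```python
def _add_lfm_args(parser):
    """Hypothetical provider: register LFM-specific flags with Megatron's argparse parser."""
    group = parser.add_argument_group(title="lfm")
    group.add_argument("--lfm-layer-types", nargs="*", type=str, default=None)
    group.add_argument("--conv-l-cache", type=int, default=3)
    group.add_argument("--conv-bias", action="store_true", default=False)
    return parser
```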
```python
from megatron.training import get_args

args = get_args()

lfm_layer_types = args.lfm_layer_types

if lfm_layer_types is None:
    lfm_layer_types = ["full_attention"] * args.num_layers
else:
```
args.lfm_layer_types is accessed directly; if this attribute isn't present in the Megatron args namespace (e.g., because it isn't part of Megatron's argparse defaults and was dropped during config conversion), this will raise AttributeError. Use getattr(args, "lfm_layer_types", None) (and similarly for any new LFM args) to make the code resilient across Megatron versions/config paths.
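A minimal sketch of the defensive lookup:

```python
# tolerate Megatron arg namespaces that never registered the LFM flags
lfm_layer_types = getattr(args, "lfm_layer_types", None)
```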
```bash
export HF_TOKEN="${HF_TOKEN:-'your_hf_token'}" # make it your own hf token
export WANDB_API_KEY="${WANDB_API_KEY:-'your_wandb_api_key'}" # make it your own wandb api key
```
The default-value quoting in these parameter expansions will include literal single quotes in the resulting env var values (e.g., HF_TOKEN becomes 'your_hf_token'), which will break authentication if users rely on the default/template. Use unquoted defaults or the same double-quote pattern used elsewhere in the repo (e.g., ${HF_TOKEN:-"your_hf_token"}).
Suggested change:
```diff
-export HF_TOKEN="${HF_TOKEN:-'your_hf_token'}" # make it your own hf token
-export WANDB_API_KEY="${WANDB_API_KEY:-'your_wandb_api_key'}" # make it your own wandb api key
+export HF_TOKEN="${HF_TOKEN:-"your_hf_token"}" # make it your own hf token
+export WANDB_API_KEY="${WANDB_API_KEY:-"your_wandb_api_key"}" # make it your own wandb api key
```
| if [ "$TURBO_USE_GROUPED_MLP" = "True" ]; then | ||
| export PRIMUS_TURBO_GROUPED_GEMM_BACKEND=TRITON | ||
| # export PRIMUS_TURBO_GROUPED_GEMM_BACKEND=CK | ||
| # export PRIMUS_TURBO_GROUPED_GEMM_BACKEND=HIPBLASLT | ||
| fi | ||
|
|
||
| export PRECISION_TYPE=FP8 # BF16 | ||
| export EXP=examples/megatron/configs/MI355X/lfm2_8B_A1B-${PRECISION_TYPE}-pretrain.yaml | ||
| # export EXP=examples/megatron/configs/MI355X/qwen3_30B_A3B-BF16-pretrain.yaml | ||
| export PRIMUS_TEAM=amd | ||
| PRIMUS_USER="tas-mi325x-$(date +%Y%m%d)" | ||
| export PRIMUS_USER | ||
| export PRIMUS_EXP_NAME=lfm2_8B_A1B_${PRECISION_TYPE}_MBS${MBS}_GBS${GBS}_EP${PRIMUS_EP}_legacygg${LEGACY_GG}_turbogg${TURBO_USE_GROUPED_MLP}_${PRIMUS_TURBO_GROUPED_GEMM_BACKEND} |
PRIMUS_TURBO_GROUPED_GEMM_BACKEND is referenced in PRIMUS_EXP_NAME even when TURBO_USE_GROUPED_MLP is False, but it is only exported inside the True branch. This makes the experiment name depend on whether the variable was previously set in the shell (or end up empty). Consider setting an explicit default when grouped MLP is disabled (e.g., "NONE") or constructing PRIMUS_EXP_NAME conditionally.
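One way to pin the default, assuming a `NONE` placeholder is acceptable in experiment names:

```bash
if [ "$TURBO_USE_GROUPED_MLP" = "True" ]; then
    export PRIMUS_TURBO_GROUPED_GEMM_BACKEND=TRITON
else
    # explicit default so PRIMUS_EXP_NAME never inherits stale shell state
    export PRIMUS_TURBO_GROUPED_GEMM_BACKEND=NONE
fi
```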
```bash
    --use_pytorch_profiler $PROFILE \
    --profile_step_end 7 \
    --profile_step_start 6 \
    2>&1 | tee "output/$PRIMUS_TEAM/$PRIMUS_USER/$PRIMUS_EXP_NAME/log_node_${NODE_RANK}.txt"
```
NODE_RANK is used in the log filename but is never set in this script. If NODE_RANK is unset, logs will be written to a path with an empty rank (e.g., log_node_.txt) and may collide across processes. Set a default (e.g., NODE_RANK=${PET_NODE_RANK:-0}) or use a safer fallback in the filename.
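A minimal sketch of the suggested fallback (`PET_NODE_RANK` is the node-rank variable torchrun exports):

```bash
# default to torchrun's node rank, then 0 for single-node runs
NODE_RANK=${NODE_RANK:-${PET_NODE_RANK:-0}}
```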
```python
if is_causal_conv1d_available():
    from causal_conv1d import causal_conv1d_fn, causal_conv1d_update
else:
    print(f"Unable to import causal_conv1d!!!")
    causal_conv1d_fn, causal_conv1d_update = None, None

kernel_modules = (causal_conv1d_fn, causal_conv1d_update)
is_fast_path_available = all(kernel_modules)
```
This module prints to stdout at import time when causal_conv1d isn't available. In distributed runs this can spam logs from every rank and is hard to control. Prefer using the project logger (or at least warnings.warn) and avoid side effects at import time; also consider checking availability when the fast path is actually requested.
Suggested change:
```diff
-if is_causal_conv1d_available():
-    from causal_conv1d import causal_conv1d_fn, causal_conv1d_update
-else:
-    print(f"Unable to import causal_conv1d!!!")
-    causal_conv1d_fn, causal_conv1d_update = None, None
-
-kernel_modules = (causal_conv1d_fn, causal_conv1d_update)
-is_fast_path_available = all(kernel_modules)
+causal_conv1d_fn, causal_conv1d_update = None, None
+
+
+def _get_causal_conv1d_kernels():
+    global causal_conv1d_fn, causal_conv1d_update
+    if causal_conv1d_fn is not None and causal_conv1d_update is not None:
+        return causal_conv1d_fn, causal_conv1d_update
+    if not is_causal_conv1d_available():
+        return None, None
+    from causal_conv1d import causal_conv1d_fn as _causal_conv1d_fn
+    from causal_conv1d import causal_conv1d_update as _causal_conv1d_update
+    causal_conv1d_fn, causal_conv1d_update = _causal_conv1d_fn, _causal_conv1d_update
+    return causal_conv1d_fn, causal_conv1d_update
+
+
+kernel_modules = (causal_conv1d_fn, causal_conv1d_update)
+is_fast_path_available = is_causal_conv1d_available()
```
```python
except ImportError as exc:
    log_rank_0(
        "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
        "cannot patch local get_gpt_decoder_layer_specs binding."
    )
    raise RuntimeError("Failed to import required module gpt_builders") from exc
```
If gpt_builders is not importable (older Megatron versions), this patch raises RuntimeError and aborts training even though those versions may not need the additional local-binding patch. To preserve the repo's stated multi-version compatibility (see primus/core/utils/import_utils.py), handle ImportError by logging and returning instead of raising.
Suggested change:
```diff
-except ImportError as exc:
+except ImportError:
     log_rank_0(
         "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
-        "cannot patch local get_gpt_decoder_layer_specs binding."
+        "skipping local get_gpt_decoder_layer_specs binding patch for compatibility."
     )
-    raise RuntimeError("Failed to import required module gpt_builders") from exc
+    return
```
```python
    'The fp8 argument in "get_gpt_layer_with_transformer_engine_spec" has been deprecated'
    " and will be removed soon. Please update your code accordingly."
)

assert not use_kitchen, "use_kitchen is not supported with LFM model."
# Note: patch_gpt_decoder_layer_specs must be applied bafter te_spec_provider_patches
```
The deprecation warning text references "get_gpt_layer_with_transformer_engine_spec", but this code is in get_lfm_layer_with_transformer_engine_spec. Also, the comment has a typo ("bafter"). Please correct these to avoid misleading users and to keep logs/searchability accurate.
Suggested change:
```diff
-    'The fp8 argument in "get_gpt_layer_with_transformer_engine_spec" has been deprecated'
+    'The fp8 argument in "get_lfm_layer_with_transformer_engine_spec" has been deprecated'
     " and will be removed soon. Please update your code accordingly."
 )

 assert not use_kitchen, "use_kitchen is not supported with LFM model."
-# Note: patch_gpt_decoder_layer_specs must be applied bafter te_spec_provider_patches
+# Note: patch_gpt_decoder_layer_specs must be applied after te_spec_provider_patches
```
```python
@register_patch(
    "megatron.gpt.decoder_layer_specs",
    backend="megatron",
    phase="before_train",
    description=(
        "Monkey patch get_gpt_decoder_layer_specs to use Primus implementation "
        "when lfm_layer_types is configured for LFM model."
    ),
    condition=lambda ctx: getattr(get_args(ctx), "lfm_layer_types", None) is not None,
    priority=42,  # must patch after te_spec_provider_patches
)
```
New LFM-specific behavior is introduced via monkey-patching Megatron's get_gpt_decoder_layer_specs, but there are no accompanying unit tests to validate (1) conv vs full_attention layer selection and (2) patch behavior across Megatron versions (with/without gpt_builders). Given the existing unit test suite under tests/unit_tests, please add targeted tests to prevent regressions.
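A rough sketch of the first test, heavily hedged: the `SimpleNamespace` stubs below would need whatever additional `TransformerConfig` fields the builder actually reads, and the import path assumes the file listed in the summary table:

```python
from types import SimpleNamespace
from unittest import mock

import primus.backends.megatron.core.models.gpt.gpt_layer_specs as specs_mod


def test_conv_vs_full_attention_layer_selection():
    # hypothetical stubs; extend with any fields the real builder touches
    config = SimpleNamespace(num_layers=2, moe_layer_freq=1)
    fake_args = SimpleNamespace(
        num_layers=2,
        lfm_layer_types=["conv", "full_attention"],
        conv_L_cache=3,
        conv_bias=False,
    )
    with mock.patch.object(specs_mod, "get_args", return_value=fake_args):
        layer_specs = specs_mod.get_gpt_decoder_layer_specs(
            config, use_transformer_engine=True
        )
    assert len(layer_specs) == 2
```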
```bash
export HF_TOKEN="${HF_TOKEN:-'your_hf_token'}" # make it your own hf token
export WANDB_API_KEY="${WANDB_API_KEY:-'your_wandb_api_key'}" # make it your own wandb api key
```
Same issue as run.sh: ${HF_TOKEN:-'your_hf_token'} / ${WANDB_API_KEY:-'your_wandb_api_key'} export values containing literal single quotes. Prefer ${HF_TOKEN:-"your_hf_token"} (as used in other repo scripts) or omit the inner quotes.
Suggested change:
```diff
-export HF_TOKEN="${HF_TOKEN:-'your_hf_token'}" # make it your own hf token
-export WANDB_API_KEY="${WANDB_API_KEY:-'your_wandb_api_key'}" # make it your own wandb api key
+export HF_TOKEN="${HF_TOKEN:-your_hf_token}" # make it your own hf token
+export WANDB_API_KEY="${WANDB_API_KEY:-your_wandb_api_key}" # make it your own wandb api key
```
```bash
export PRIMUS_EXP_NAME=lfm2_8B_A1B_${PRECISION_TYPE}_MBS${MBS}_GBS${GBS}_EP${PRIMUS_EP}_legacygg${LEGACY_GG}_turbogg${TURBO_USE_GROUPED_MLP}_${PRIMUS_TURBO_GROUPED_GEMM_BACKEND}
export PRIMUS_EXP_NAME=debug
```
PRIMUS_EXP_NAME is computed and immediately overwritten with debug, so the earlier value is dead code and the output/log path will always be .../debug/.... If this is meant to be optional, gate it behind an env var/flag; otherwise remove the computed assignment to avoid confusion.
Suggested change:
```diff
-export PRIMUS_EXP_NAME=lfm2_8B_A1B_${PRECISION_TYPE}_MBS${MBS}_GBS${GBS}_EP${PRIMUS_EP}_legacygg${LEGACY_GG}_turbogg${TURBO_USE_GROUPED_MLP}_${PRIMUS_TURBO_GROUPED_GEMM_BACKEND}
-export PRIMUS_EXP_NAME=debug
+DEFAULT_PRIMUS_EXP_NAME=lfm2_8B_A1B_${PRECISION_TYPE}_MBS${MBS}_GBS${GBS}_EP${PRIMUS_EP}_legacygg${LEGACY_GG}_turbogg${TURBO_USE_GROUPED_MLP}_${PRIMUS_TURBO_GROUPED_GEMM_BACKEND}
+if [ "${USE_COMPUTED_PRIMUS_EXP_NAME:-0}" = "1" ]; then
+    export PRIMUS_EXP_NAME="$DEFAULT_PRIMUS_EXP_NAME"
+else
+    export PRIMUS_EXP_NAME=debug
+fi
```
```python
    sequence_len_offset: Optional[int] = None,
    *,
    inference_params: Optional[BaseInferenceContext] = None,
) -> tuple[Tensor, Tensor]:
```
The forward() type annotation says it returns tuple[Tensor, Tensor], but the implementation returns (out.transpose(0, 1), None). Please update the return annotation to allow None (e.g., tuple[Tensor, Optional[Tensor]]) to match behavior and avoid type/contract confusion.
Suggested change:
```diff
-) -> tuple[Tensor, Tensor]:
+) -> tuple[Tensor, Optional[Tensor]]:
```
```diff
 # hyper parameters
 train_iters: 10
-micro_batch_size: 8
+micro_batch_size: 16
 global_batch_size: 512
```
This PR is titled "LFM model support", but this change bumps Qwen3-30B micro_batch_size from 8 to 16. If this is unrelated to LFM2 enablement, consider moving it to a separate PR (or note the rationale in the PR description) to keep review/rollback scope clear.