
LFM model support #651 (Draft)

wenxie-amd wants to merge 8 commits into main from dev/wenx/lfm_model

Conversation

@wenxie-amd (Contributor)

No description provided.

Copilot AI review requested due to automatic review settings April 8, 2026 14:32
Copilot AI left a comment


Pull request overview

This PR adds initial support for LiquidAI’s LFM2 model family in the Megatron backend by introducing new model configs, a Primus implementation of the LFM2 short-convolution “attention” layer, and a patch to swap Megatron’s decoder layer spec builder when LFM-specific layer types are configured.

Changes:

  • Add LFM/LFM2 Megatron model configuration YAMLs (base + LFM2-8B-A1B).
  • Introduce Lfm2ShortConv (Primus implementation) and an LFM-aware GPT layer spec builder.
  • Add a Megatron patch to route get_gpt_decoder_layer_specs to the Primus implementation and provide an example pretrain config.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

Summary per file:

  • primus/configs/models/megatron/primus_megatron_model.yaml: Adds LFM2-related config knobs (lfm_layer_types, conv_L_cache, conv_bias).
  • primus/configs/models/megatron/lfm2_8B_A1B.yaml: New LFM2-8B-A1B model config (tokenizer + model/MoE/GQA settings).
  • primus/configs/models/megatron/lfm_base.yaml: New LFM base config defaults (RoPE, RMSNorm, long context, MoE settings).
  • primus/backends/megatron/training/tokenizer/tokenizer.py: Adds Lfm2Tokenizer to the allowed tokenizer type set.
  • primus/backends/megatron/patches/gpt_decoder_layer_specs_patches.py: Registers a patch to override Megatron’s decoder layer spec builder when LFM is configured.
  • primus/backends/megatron/core/transformer/attention.py: Adds Lfm2ShortConv module (depthwise conv + projections) with fast/slow paths.
  • primus/backends/megatron/core/models/gpt/gpt_layer_specs.py: Adds Primus GPT decoder layer spec selection supporting lfm_layer_types ("conv" vs "full_attention").
  • examples/megatron/configs/MI355X/lfm2_8B_A1B-BF16-pretrain.yaml: Example training config wiring in lfm_layer_types and training overrides.

Comment on lines +33 to +38
if is_causal_conv1d_available():
from causal_conv1d import causal_conv1d_fn, causal_conv1d_update
else:
print(f"Unable to import causal_conv1d!!!")
causal_conv1d_fn, causal_conv1d_update = None, None


Copilot AI Apr 8, 2026


This module prints to stdout at import time when causal_conv1d isn't available. Import-time prints are noisy in distributed training and can break log formatting; prefer using the project's rank-aware logging (e.g., log_rank_0/warning_rank_0) or warnings.warn, and avoid side effects during import.

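As the reviewer suggests, the import-time print can be replaced with a single `warnings.warn` so the fallback is visible without writing raw output on every rank. A minimal sketch, assuming the same optional `causal_conv1d` dependency as the PR's snippet (the availability helper is replaced by a plain try/except here):

```python
import warnings

try:
    # Optional fast-path kernels; installs without them fall back to the slow path.
    from causal_conv1d import causal_conv1d_fn, causal_conv1d_update  # type: ignore
except ImportError:
    causal_conv1d_fn, causal_conv1d_update = None, None
    warnings.warn(
        "causal_conv1d is not installed; Lfm2ShortConv will use the slow path.",
        ImportWarning,
        stacklevel=2,
    )

is_fast_path_available = causal_conv1d_fn is not None and causal_conv1d_update is not None
```

Unlike `print`, warnings can be filtered or promoted to errors by downstream users, and rank-aware loggers (such as the project's `log_rank_0`) can silence them on non-zero ranks.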
from megatron.core.inference.contexts import BaseInferenceContext
from megatron.core.packed_seq_params import PackedSeqParams
from megatron.core.process_groups_config import ProcessGroupCollection
from megatron.core.transformer.attention import AttnMaskType

Copilot AI Apr 8, 2026


AttnMaskType is imported from megatron.core.transformer.attention, while the rest of the codebase consistently imports it from megatron.core.transformer.enums (e.g., primus/backends/megatron/core/transformer/multi_latent_attention.py:12). If megatron.core.transformer.attention doesn’t export AttnMaskType in some Megatron versions, this will raise ImportError at import time; consider switching to the enums import for consistency and compatibility.

Suggested change:
- from megatron.core.transformer.attention import AttnMaskType
+ from megatron.core.transformer.enums import AttnMaskType

Comment on lines +114 to +115
self.L_cache = args.conv_L_cache
self.bias = args.conv_bias

Copilot AI Apr 8, 2026


get_args() is expected to always have conv_L_cache and conv_bias, but Primus's MegatronArgBuilder drops config keys that aren't present in Megatron's argparse defaults (see primus/backends/megatron/argument_builder.py:86-110). If the underlying Megatron version doesn’t define these args, this will raise AttributeError when constructing the layer. Consider using getattr(args, ..., default) with sensible defaults and/or adding an args patch to register these flags with Megatron’s parser.

Suggested change:
- self.L_cache = args.conv_L_cache
- self.bias = args.conv_bias
+ self.L_cache = getattr(args, "conv_L_cache", 1)
+ self.bias = getattr(args, "conv_bias", False)

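The comment's other option, registering the flags with Megatron's parser so they survive the argparse-defaults filtering, could look like the sketch below. The flag names and defaults are assumptions taken from the YAML in this PR, and `add_lfm_args` is a hypothetical helper, not an existing Primus API:

```python
import argparse


def add_lfm_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register LFM2-specific flags so they appear in the args namespace."""
    group = parser.add_argument_group("lfm2")
    group.add_argument("--lfm-layer-types", nargs="*", default=None)
    group.add_argument("--conv-L-cache", type=int, default=3)
    group.add_argument("--conv-bias", action="store_true", default=False)
    return parser


# With the flags registered, args.conv_L_cache etc. exist even when unset.
args = add_lfm_args(argparse.ArgumentParser()).parse_args([])
```

argparse derives the attribute names by replacing hyphens with underscores, so `--conv-L-cache` lands on `args.conv_L_cache` as the layer code expects.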
Comment on lines +46 to +54
# patch its local binding.
try:
    import gpt_builders as gpt_builders_module  # pyright: ignore[reportMissingImports]
except ImportError as exc:
    log_rank_0(
        "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
        "cannot patch local get_gpt_decoder_layer_specs binding."
    )
    raise RuntimeError("Failed to import required module gpt_builders") from exc

Copilot AI Apr 8, 2026


If gpt_builders can’t be imported, this patch raises and aborts training. Given primus/core/utils/import_utils.py explicitly treats gpt_builders as version-dependent, it would be safer to log and continue after patching megatron.core.models.gpt.gpt_layer_specs (older Megatron versions won’t need the local-binding patch anyway).

Suggested change:
- # patch its local binding.
- try:
-     import gpt_builders as gpt_builders_module  # pyright: ignore[reportMissingImports]
- except ImportError as exc:
-     log_rank_0(
-         "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
-         "cannot patch local get_gpt_decoder_layer_specs binding."
-     )
-     raise RuntimeError("Failed to import required module gpt_builders") from exc
+ # patch its local binding when that version-dependent module is available.
+ try:
+     import gpt_builders as gpt_builders_module  # pyright: ignore[reportMissingImports]
+ except ImportError:
+     log_rank_0(
+         "[Patch:megatron.gpt.decoder_layer_specs] gpt_builders is unavailable; "
+         "skipping local get_gpt_decoder_layer_specs binding patch."
+     )
+     return

    HAVE_TE = True
except ImportError:
    HAVE_TE = False


Copilot AI Apr 8, 2026


TESpecProvider is only imported inside the try block, but get_lfm_layer_with_transformer_engine_spec() unconditionally instantiates it. If transformer_engine isn't installed, TESpecProvider will be undefined and this will raise NameError. Consider guarding with if not HAVE_TE: raise RuntimeError(...) (or import TESpecProvider inside the function and raise a clear error) so failures are deterministic and actionable.

Suggested change:
+ class TESpecProvider:  # type: ignore[no-redef]
+     """Fallback provider that raises a clear error when TE is unavailable."""
+
+     def __init__(self, *args, **kwargs):
+         raise RuntimeError(
+             "Transformer Engine support is unavailable because the "
+             "`transformer_engine` package is not installed."
+         )

Comment on lines +46 to +55
def get_gpt_decoder_layer_specs(
    config: TransformerConfig,
    use_transformer_engine: bool,
    normalization: Optional[str] = None,
    qk_l2_norm: Optional[bool] = False,
    vp_stage: Optional[int] = None,
    pp_rank: Optional[int] = None,
) -> TransformerBlockSubmodules:
    """GPT block spec."""
    assert use_transformer_engine, "use_transformer_engine must be True"

Copilot AI Apr 8, 2026


The function is annotated to return TransformerBlockSubmodules, but it actually returns a Python list of per-layer specs (layer_specs). If callers expect the annotated type (or upstream Megatron’s signature differs), this can cause runtime breakage and makes the code harder to reason about. Align the return type annotation (and remove the unused TransformerBlockSubmodules import if appropriate) with what’s actually returned/expected by Megatron.

Comment on lines +57 to +85
args = get_args()

lfm_layer_types = args.lfm_layer_types

if lfm_layer_types is None:
    lfm_layer_types = ["full_attention"] * args.num_layers
else:
    assert isinstance(
        lfm_layer_types, list
    ), f"lfm_layer_types must be list[str] or None, got {type(lfm_layer_types)}"
    invalid_type_values = [
        layer_type for layer_type in lfm_layer_types if not isinstance(layer_type, str)
    ]
    assert (
        not invalid_type_values
    ), f"lfm_layer_types must contain only str values, got invalid values: {invalid_type_values}"
    valid_layer_types = {"conv", "full_attention"}
    invalid_layer_types = sorted(
        {layer_type for layer_type in lfm_layer_types if layer_type not in valid_layer_types}
    )
    assert not invalid_layer_types, (
        f"Invalid lfm_layer_types values: {invalid_layer_types}. "
        f"Only {sorted(valid_layer_types)} are allowed."
    )

assert len(lfm_layer_types) == args.num_layers, (
    f"Invalid lfm_layer_types length: {len(lfm_layer_types)}, "
    f"expected {args.num_layers} (args.num_layers)."
)

Copilot AI Apr 8, 2026


lfm_layer_types defaults and length validation are based on args.num_layers, but the layer construction loop uses config.num_layers. If these ever differ, this can either raise on length check or index out of range later. Use config.num_layers consistently for both defaulting and validation (and only fall back to args when necessary).

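The single-source-of-truth idea can be sketched as a small helper that both defaults and validates against one layer count (`config.num_layers`, per the comment). `resolve_layer_types` is a hypothetical name, not code from this PR:

```python
from typing import List, Optional


def resolve_layer_types(layer_types: Optional[List[str]], num_layers: int) -> List[str]:
    """Default to full attention and validate length/values against one count."""
    if layer_types is None:
        return ["full_attention"] * num_layers
    if len(layer_types) != num_layers:
        raise ValueError(
            f"lfm_layer_types has length {len(layer_types)}, expected {num_layers}"
        )
    bad = sorted({t for t in layer_types if t not in {"conv", "full_attention"}})
    if bad:
        raise ValueError(f"invalid lfm_layer_types values: {bad}")
    return layer_types
```

Because the same `num_layers` drives both the default fill and the length check, the later per-layer loop can index `layer_types[layer_number]` without risking an IndexError.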
# attention_mask: [batch, 1, seq, seq]
hidden_states = hidden_states.transpose(0, 1)

if is_fast_path_available and "cuda" in hidden_states.device.type and not is_torchdynamo_compiling():

Copilot AI Apr 8, 2026


hidden_states.device.type is typically the string 'cuda'/'cpu'; using "cuda" in hidden_states.device.type works but is an odd predicate and can hide mistakes. Prefer an explicit check like hidden_states.device.type == "cuda" for clarity.

Suggested change:
- if is_fast_path_available and "cuda" in hidden_states.device.type and not is_torchdynamo_compiling():
+ if (
+     is_fast_path_available
+     and hidden_states.device.type == "cuda"
+     and not is_torchdynamo_compiling()
+ ):

Copilot AI review requested due to automatic review settings April 9, 2026 09:55

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

Comment on lines +16 to +36
from megatron.core.transformer.spec_utils import ModuleSpec
from megatron.core.transformer.transformer_block import TransformerBlockSubmodules
from megatron.core.transformer.transformer_config import TransformerConfig

from primus.backends.megatron.core.transformer.attention import (
    Lfm2ShortConv,
    Lfm2ShortConvSubmodules,
)


def get_gpt_decoder_layer_specs(
    config: TransformerConfig,
    use_transformer_engine: bool,
    normalization: Optional[str] = None,
    qk_l2_norm: Optional[bool] = False,
    vp_stage: Optional[int] = None,
    pp_rank: Optional[int] = None,
) -> TransformerBlockSubmodules:
    """GPT block spec."""
    assert use_transformer_engine, "use_transformer_engine must be True"

Copilot AI Apr 9, 2026


get_gpt_decoder_layer_specs is annotated/typed as returning TransformerBlockSubmodules, but the function actually constructs and returns layer_specs (a list of ModuleSpec). This mismatch can confuse callers and static checkers, and may indicate the wrong API is being patched; update the return type/imports to match the actual return value (or adjust implementation to return the expected object).

Comment on lines +39 to +67
args = get_args()

lfm_layer_types = args.lfm_layer_types

if lfm_layer_types is None:
    lfm_layer_types = ["full_attention"] * args.num_layers
else:
    assert isinstance(
        lfm_layer_types, list
    ), f"lfm_layer_types must be list[str] or None, got {type(lfm_layer_types)}"
    invalid_type_values = [
        layer_type for layer_type in lfm_layer_types if not isinstance(layer_type, str)
    ]
    assert (
        not invalid_type_values
    ), f"lfm_layer_types must contain only str values, got invalid values: {invalid_type_values}"
    valid_layer_types = {"conv", "full_attention"}
    invalid_layer_types = sorted(
        {layer_type for layer_type in lfm_layer_types if layer_type not in valid_layer_types}
    )
    assert not invalid_layer_types, (
        f"Invalid lfm_layer_types values: {invalid_layer_types}. "
        f"Only {sorted(valid_layer_types)} are allowed."
    )

assert len(lfm_layer_types) == args.num_layers, (
    f"Invalid lfm_layer_types length: {len(lfm_layer_types)}, "
    f"expected {args.num_layers} (args.num_layers)."
)

Copilot AI Apr 9, 2026


lfm_layer_types validation uses args.num_layers, but layer construction iterates over config.num_layers. If these differ, lfm_layer_types[layer_number] can raise IndexError or silently misconfigure layers. Use a single source of truth (prefer config.num_layers) for both the default fill and the length check.

Comment on lines +122 to +136
# Parse config.moe_layer_freq to determine the pattern of expert/dense layers.
# 0 stands for dense layers, 1 stands for expert layers.
# For integer N: Creates a pattern with one expert layer every N layers.
# For string pattern: Evaluates the str directly (e.g. "[1,0,1]" for alternating expert/dense).
if isinstance(config.moe_layer_freq, int):
    moe_layer_pattern = [1 if (i % config.moe_layer_freq == 0) else 0 for i in range(config.num_layers)]
elif isinstance(config.moe_layer_freq, list):
    moe_layer_pattern = config.moe_layer_freq
    assert len(moe_layer_pattern) == config.num_layers, (
        f"Invalid length of moe_layer_pattern: {len(moe_layer_pattern)}, "
        f"expected {config.num_layers}, "
        f"current moe layer pattern: {config.moe_layer_freq}"
    )
else:
    raise ValueError(f"Invalid moe_layer_freq: {type(config.moe_layer_freq)}, {config.moe_layer_freq}")

Copilot AI Apr 9, 2026


The comment says string moe_layer_freq patterns are evaluated, but the code only accepts int or list and raises for str. Either update the comment to reflect the real behavior, or add explicit handling for str here (if this function can be called before the existing moe_layer_freq normalization patches run).

)

assert not use_kitchen, "use_kitchen is not supported with LFM model."
# Note: patch_gpt_decoder_layer_specs must be applied bafter te_spec_provider_patches

Copilot AI Apr 9, 2026


Typo in comment: "bafter" -> "after".

Suggested change:
- # Note: patch_gpt_decoder_layer_specs must be applied bafter te_spec_provider_patches
+ # Note: patch_gpt_decoder_layer_specs must be applied after te_spec_provider_patches

Comment on lines +51 to +67
# gpt_builders imports get_gpt_decoder_layer_specs directly; force import and
# patch its local binding.
try:
    import gpt_builders as gpt_builders_module  # pyright: ignore[reportMissingImports]
except ImportError as exc:
    log_rank_0(
        "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
        "cannot patch local get_gpt_decoder_layer_specs binding."
    )
    raise RuntimeError("Failed to import required module gpt_builders") from exc

gpt_builders_module.get_gpt_decoder_layer_specs = primus_get_gpt_decoder_layer_specs
log_rank_0(
    "[Patch:megatron.gpt.decoder_layer_specs] Patched "
    "gpt_builders.get_gpt_decoder_layer_specs "
    f"-> {primus_get_gpt_decoder_layer_specs.__name__}"
)

Copilot AI Apr 9, 2026


This patch hard-fails if gpt_builders cannot be imported. gpt_builders only exists in some Megatron versions (Primus already treats it as optional in primus/core/utils/import_utils.py); in versions without it, this should be a no-op rather than aborting training. Consider using the same lazy-import pattern and only patch the local binding when the module is present.

Suggested change:
- # gpt_builders imports get_gpt_decoder_layer_specs directly; force import and
- # patch its local binding.
- try:
-     import gpt_builders as gpt_builders_module  # pyright: ignore[reportMissingImports]
- except ImportError as exc:
-     log_rank_0(
-         "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
-         "cannot patch local get_gpt_decoder_layer_specs binding."
-     )
-     raise RuntimeError("Failed to import required module gpt_builders") from exc
- gpt_builders_module.get_gpt_decoder_layer_specs = primus_get_gpt_decoder_layer_specs
- log_rank_0(
-     "[Patch:megatron.gpt.decoder_layer_specs] Patched "
-     "gpt_builders.get_gpt_decoder_layer_specs "
-     f"-> {primus_get_gpt_decoder_layer_specs.__name__}"
- )
+ # gpt_builders imports get_gpt_decoder_layer_specs directly; patch its local
+ # binding only when that optional module exists in this Megatron version.
+ import importlib
+ import importlib.util
+
+ if importlib.util.find_spec("gpt_builders") is None:
+     log_rank_0(
+         "[Patch:megatron.gpt.decoder_layer_specs] gpt_builders is not available; "
+         "skipping patch of local get_gpt_decoder_layer_specs binding."
+     )
+ else:
+     gpt_builders_module = importlib.import_module("gpt_builders")
+     gpt_builders_module.get_gpt_decoder_layer_specs = primus_get_gpt_decoder_layer_specs
+     log_rank_0(
+         "[Patch:megatron.gpt.decoder_layer_specs] Patched "
+         "gpt_builders.get_gpt_decoder_layer_specs "
+         f"-> {primus_get_gpt_decoder_layer_specs.__name__}"
+     )

Comment on lines +98 to +107
def __init__(
    self,
    config: TransformerConfig,
    submodules: Optional[Lfm2ShortConvSubmodules],
    layer_number: int,
    attn_mask_type: AttnMaskType,
    attention_type: str | None = None,
    cp_comm_type: str | None = None,
    pg_collection: ProcessGroupCollection | None = None,
):

Copilot AI Apr 9, 2026


submodules is typed as Optional[...] but is dereferenced unconditionally (submodules.in_proj / submodules.out_proj). Either make submodules non-optional in the signature, or add an explicit check with a clear error so failures are easier to diagnose.

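The explicit-check option could look like the sketch below, which fails fast with an actionable message instead of an `AttributeError` deep in `__init__`. The dataclass is a stand-in for the PR's `Lfm2ShortConvSubmodules`, and `require_submodules` is a hypothetical helper:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lfm2ShortConvSubmodules:  # stand-in for the PR's submodules container
    in_proj: Optional[object] = None
    out_proj: Optional[object] = None


def require_submodules(
    submodules: Optional[Lfm2ShortConvSubmodules],
) -> Lfm2ShortConvSubmodules:
    """Raise a clear error up front rather than dereferencing None later."""
    if submodules is None:
        raise ValueError(
            "Lfm2ShortConv requires submodules with in_proj/out_proj specs; got None"
        )
    return submodules
```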
Comment on lines +11 to +15
# LFM2 model configuration
# lfm_layer_types: ["conv", "conv", "full_attention", "conv", "conv", "conv", "full_attention", "conv", "conv", "conv", "full_attention", "conv", "conv", "conv", "full_attention", "conv", "conv", "conv", "full_attention", "conv", "conv", "full_attention", "conv", "conv"]
lfm_layer_types: null # list[str]
conv_L_cache: 3 # int
conv_bias: false # bool

Copilot AI Apr 9, 2026


These new config fields (lfm_layer_types, conv_L_cache, conv_bias) may be silently dropped when converting Primus configs to Megatron args: MegatronArgBuilder.update() only keeps keys present in Megatron's argparse defaults (primus/backends/megatron/argument_builder.py:92-110). If Megatron doesn't define these flags, megatron.training.get_args() won't have them and the new LFM code (e.g., args.conv_L_cache / args.lfm_layer_types) will fail at runtime. Consider adding an args patch to inject these fields into Megatron args (with defaults) during build_args, or extending the argument builder to allow these Primus-specific keys.

Comment on lines +37 to +45
from megatron.training import get_args

args = get_args()

lfm_layer_types = args.lfm_layer_types

if lfm_layer_types is None:
    lfm_layer_types = ["full_attention"] * args.num_layers
else:

Copilot AI Apr 9, 2026


args.lfm_layer_types is accessed directly; if this attribute isn't present in the Megatron args namespace (e.g., because it isn't part of Megatron's argparse defaults and was dropped during config conversion), this will raise AttributeError. Use getattr(args, "lfm_layer_types", None) (and similarly for any new LFM args) to make the code resilient across Megatron versions/config paths.

Copilot AI review requested due to automatic review settings April 10, 2026 02:55

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Comment on lines +3 to +4
export HF_TOKEN="${HF_TOKEN:-'your_hf_token'}" # make it your own hf token
export WANDB_API_KEY="${WANDB_API_KEY:-'your_wandb_api_key'}" # make it your own wandb api key

Copilot AI Apr 10, 2026


The default-value quoting in these parameter expansions will include literal single quotes in the resulting env var values (e.g., HF_TOKEN becomes 'your_hf_token'), which will break authentication if users rely on the default/template. Use unquoted defaults or the same double-quote pattern used elsewhere in the repo (e.g., ${HF_TOKEN:-"your_hf_token"}).

Suggested change:
- export HF_TOKEN="${HF_TOKEN:-'your_hf_token'}" # make it your own hf token
- export WANDB_API_KEY="${WANDB_API_KEY:-'your_wandb_api_key'}" # make it your own wandb api key
+ export HF_TOKEN="${HF_TOKEN:-"your_hf_token"}" # make it your own hf token
+ export WANDB_API_KEY="${WANDB_API_KEY:-"your_wandb_api_key"}" # make it your own wandb api key

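The quoting pitfall is easy to demonstrate in isolation: inside `${VAR:-word}`, single quotes within double quotes are literal characters, so the fallback value keeps them. A quick check (variable name is illustrative):

```shell
# Single quotes inside ${VAR:-'default'} become part of the value.
unset DEMO_TOKEN
bad="${DEMO_TOKEN:-'your_hf_token'}"
good="${DEMO_TOKEN:-your_hf_token}"
echo "bad=$bad good=$good"
```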
Comment on lines +34 to +46
if [ "$TURBO_USE_GROUPED_MLP" = "True" ]; then
    export PRIMUS_TURBO_GROUPED_GEMM_BACKEND=TRITON
    # export PRIMUS_TURBO_GROUPED_GEMM_BACKEND=CK
    # export PRIMUS_TURBO_GROUPED_GEMM_BACKEND=HIPBLASLT
fi

export PRECISION_TYPE=FP8 # BF16
export EXP=examples/megatron/configs/MI355X/lfm2_8B_A1B-${PRECISION_TYPE}-pretrain.yaml
# export EXP=examples/megatron/configs/MI355X/qwen3_30B_A3B-BF16-pretrain.yaml
export PRIMUS_TEAM=amd
PRIMUS_USER="tas-mi325x-$(date +%Y%m%d)"
export PRIMUS_USER
export PRIMUS_EXP_NAME=lfm2_8B_A1B_${PRECISION_TYPE}_MBS${MBS}_GBS${GBS}_EP${PRIMUS_EP}_legacygg${LEGACY_GG}_turbogg${TURBO_USE_GROUPED_MLP}_${PRIMUS_TURBO_GROUPED_GEMM_BACKEND}

Copilot AI Apr 10, 2026


PRIMUS_TURBO_GROUPED_GEMM_BACKEND is referenced in PRIMUS_EXP_NAME even when TURBO_USE_GROUPED_MLP is False, but it is only exported inside the True branch. This makes the experiment name depend on whether the variable was previously set in the shell (or end up empty). Consider setting an explicit default when grouped MLP is disabled (e.g., "NONE") or constructing PRIMUS_EXP_NAME conditionally.

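One way to make the experiment name deterministic, per the comment, is to set the backend variable in both branches. A sketch under the assumption that "NONE" is an acceptable tag for the disabled case:

```shell
# Default the backend tag explicitly so PRIMUS_EXP_NAME never interpolates
# an empty or inherited value when grouped MLP is disabled.
TURBO_USE_GROUPED_MLP="${TURBO_USE_GROUPED_MLP:-False}"
if [ "$TURBO_USE_GROUPED_MLP" = "True" ]; then
    export PRIMUS_TURBO_GROUPED_GEMM_BACKEND="${PRIMUS_TURBO_GROUPED_GEMM_BACKEND:-TRITON}"
else
    export PRIMUS_TURBO_GROUPED_GEMM_BACKEND=NONE
fi
echo "backend=${PRIMUS_TURBO_GROUPED_GEMM_BACKEND}"
```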
--use_pytorch_profiler $PROFILE \
--profile_step_end 7 \
--profile_step_start 6 \
2>&1 | tee "output/$PRIMUS_TEAM/$PRIMUS_USER/$PRIMUS_EXP_NAME/log_node_${NODE_RANK}.txt"

Copilot AI Apr 10, 2026


NODE_RANK is used in the log filename but is never set in this script. If NODE_RANK is unset, logs will be written to a path with an empty rank (e.g., log_node_.txt) and may collide across processes. Set a default (e.g., NODE_RANK=${PET_NODE_RANK:-0}) or use a safer fallback in the filename.

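The suggested fallback chain can be written as nested default expansions, so the log path always carries a rank even when neither variable is exported:

```shell
# Fall back to torchrun's PET_NODE_RANK, then to 0 (both unset here for the demo).
unset NODE_RANK PET_NODE_RANK
NODE_RANK="${NODE_RANK:-${PET_NODE_RANK:-0}}"
LOG_FILE="log_node_${NODE_RANK}.txt"
echo "$LOG_FILE"
```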
Comment on lines +33 to +40
if is_causal_conv1d_available():
    from causal_conv1d import causal_conv1d_fn, causal_conv1d_update
else:
    print(f"Unable to import causal_conv1d!!!")
    causal_conv1d_fn, causal_conv1d_update = None, None

kernel_modules = (causal_conv1d_fn, causal_conv1d_update)
is_fast_path_available = all(kernel_modules)

Copilot AI Apr 10, 2026


This module prints to stdout at import time when causal_conv1d isn't available. In distributed runs this can spam logs from every rank and is hard to control. Prefer using the project logger (or at least warnings.warn) and avoid side effects at import time; also consider checking availability when the fast path is actually requested.

Suggested change:
- if is_causal_conv1d_available():
-     from causal_conv1d import causal_conv1d_fn, causal_conv1d_update
- else:
-     print(f"Unable to import causal_conv1d!!!")
-     causal_conv1d_fn, causal_conv1d_update = None, None
- kernel_modules = (causal_conv1d_fn, causal_conv1d_update)
- is_fast_path_available = all(kernel_modules)
+ causal_conv1d_fn, causal_conv1d_update = None, None
+
+ def _get_causal_conv1d_kernels():
+     global causal_conv1d_fn, causal_conv1d_update
+     if causal_conv1d_fn is not None and causal_conv1d_update is not None:
+         return causal_conv1d_fn, causal_conv1d_update
+     if not is_causal_conv1d_available():
+         return None, None
+     from causal_conv1d import causal_conv1d_fn as _causal_conv1d_fn
+     from causal_conv1d import causal_conv1d_update as _causal_conv1d_update
+     causal_conv1d_fn, causal_conv1d_update = _causal_conv1d_fn, _causal_conv1d_update
+     return causal_conv1d_fn, causal_conv1d_update
+
+ kernel_modules = (causal_conv1d_fn, causal_conv1d_update)
+ is_fast_path_available = is_causal_conv1d_available()

Comment on lines +55 to +60
except ImportError as exc:
    log_rank_0(
        "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
        "cannot patch local get_gpt_decoder_layer_specs binding."
    )
    raise RuntimeError("Failed to import required module gpt_builders") from exc

Copilot AI Apr 10, 2026


If gpt_builders is not importable (older Megatron versions), this patch raises RuntimeError and aborts training even though those versions may not need the additional local-binding patch. To preserve the repo's stated multi-version compatibility (see primus/core/utils/import_utils.py), handle ImportError by logging and returning instead of raising.

Suggested change:
- except ImportError as exc:
-     log_rank_0(
-         "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
-         "cannot patch local get_gpt_decoder_layer_specs binding."
-     )
-     raise RuntimeError("Failed to import required module gpt_builders") from exc
+ except ImportError:
+     log_rank_0(
+         "[Patch:megatron.gpt.decoder_layer_specs] Failed to import gpt_builders; "
+         "skipping local get_gpt_decoder_layer_specs binding patch for compatibility."
+     )
+     return

Comment on lines +194 to +199
    'The fp8 argument in "get_gpt_layer_with_transformer_engine_spec" has been deprecated'
    " and will be removed soon. Please update your code accordingly."
)

assert not use_kitchen, "use_kitchen is not supported with LFM model."
# Note: patch_gpt_decoder_layer_specs must be applied bafter te_spec_provider_patches

Copilot AI Apr 10, 2026


The deprecation warning text references "get_gpt_layer_with_transformer_engine_spec", but this code is in get_lfm_layer_with_transformer_engine_spec. Also, the comment has a typo ("bafter"). Please correct these to avoid misleading users and to keep logs/searchability accurate.

Suggested change
'The fp8 argument in "get_gpt_layer_with_transformer_engine_spec" has been deprecated'
" and will be removed soon. Please update your code accordingly."
)
assert not use_kitchen, "use_kitchen is not supported with LFM model."
# Note: patch_gpt_decoder_layer_specs must be applied bafter te_spec_provider_patches
'The fp8 argument in "get_lfm_layer_with_transformer_engine_spec" has been deprecated'
" and will be removed soon. Please update your code accordingly."
)
assert not use_kitchen, "use_kitchen is not supported with LFM model."
# Note: patch_gpt_decoder_layer_specs must be applied after te_spec_provider_patches

Comment on lines +17 to +27
@register_patch(
    "megatron.gpt.decoder_layer_specs",
    backend="megatron",
    phase="before_train",
    description=(
        "Monkey patch get_gpt_decoder_layer_specs to use Primus implementation "
        "when lfm_layer_types is configured for LFM model."
    ),
    condition=lambda ctx: getattr(get_args(ctx), "lfm_layer_types", None) is not None,
    priority=42,  # must patch after te_spec_provider_patches
)

Copilot AI Apr 10, 2026


New LFM-specific behavior is introduced via monkey-patching Megatron's get_gpt_decoder_layer_specs, but there are no accompanying unit tests to validate (1) conv vs full_attention layer selection and (2) patch behavior across Megatron versions (with/without gpt_builders). Given the existing unit test suite under tests/unit_tests, please add targeted tests to prevent regressions.

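A unit test of the kind the reviewer asks for could exercise the conv vs full_attention dispatch without importing Megatron at all, by testing the selection logic against a stand-in builder. Everything below is a hypothetical sketch, not code from this PR:

```python
def build_layer_kinds(layer_types):
    """Stand-in for the spec builder: map layer type strings to module names."""
    valid = {"conv", "full_attention"}
    unknown = [t for t in layer_types if t not in valid]
    if unknown:
        raise ValueError(f"unknown layer types: {unknown}")
    return ["Lfm2ShortConv" if t == "conv" else "SelfAttention" for t in layer_types]


def test_conv_vs_attention_selection():
    kinds = build_layer_kinds(["conv", "full_attention", "conv"])
    assert kinds == ["Lfm2ShortConv", "SelfAttention", "Lfm2ShortConv"]


test_conv_vs_attention_selection()
```

A second test could monkeypatch `sys.modules` to simulate Megatron versions with and without `gpt_builders` and assert the patch either rebinds or no-ops.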
Copilot AI review requested due to automatic review settings April 13, 2026 07:14

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Comment on lines +3 to +4
export HF_TOKEN="${HF_TOKEN:-'your_hf_token'}" # make it your own hf token
export WANDB_API_KEY="${WANDB_API_KEY:-'your_wandb_api_key'}" # make it your own wandb api key

Copilot AI Apr 13, 2026


Same issue as run.sh: ${HF_TOKEN:-'your_hf_token'} / ${WANDB_API_KEY:-'your_wandb_api_key'} export values containing literal single quotes. Prefer ${HF_TOKEN:-"your_hf_token"} (as used in other repo scripts) or omit the inner quotes.

Suggested change:
- export HF_TOKEN="${HF_TOKEN:-'your_hf_token'}" # make it your own hf token
- export WANDB_API_KEY="${WANDB_API_KEY:-'your_wandb_api_key'}" # make it your own wandb api key
+ export HF_TOKEN="${HF_TOKEN:-your_hf_token}" # make it your own hf token
+ export WANDB_API_KEY="${WANDB_API_KEY:-your_wandb_api_key}" # make it your own wandb api key

Comment on lines +69 to +70
export PRIMUS_EXP_NAME=lfm2_8B_A1B_${PRECISION_TYPE}_MBS${MBS}_GBS${GBS}_EP${PRIMUS_EP}_legacygg${LEGACY_GG}_turbogg${TURBO_USE_GROUPED_MLP}_${PRIMUS_TURBO_GROUPED_GEMM_BACKEND}
export PRIMUS_EXP_NAME=debug

Copilot AI Apr 13, 2026


PRIMUS_EXP_NAME is computed and immediately overwritten with debug, so the earlier value is dead code and the output/log path will always be .../debug/.... If this is meant to be optional, gate it behind an env var/flag; otherwise remove the computed assignment to avoid confusion.

Suggested change:
- export PRIMUS_EXP_NAME=lfm2_8B_A1B_${PRECISION_TYPE}_MBS${MBS}_GBS${GBS}_EP${PRIMUS_EP}_legacygg${LEGACY_GG}_turbogg${TURBO_USE_GROUPED_MLP}_${PRIMUS_TURBO_GROUPED_GEMM_BACKEND}
- export PRIMUS_EXP_NAME=debug
+ DEFAULT_PRIMUS_EXP_NAME=lfm2_8B_A1B_${PRECISION_TYPE}_MBS${MBS}_GBS${GBS}_EP${PRIMUS_EP}_legacygg${LEGACY_GG}_turbogg${TURBO_USE_GROUPED_MLP}_${PRIMUS_TURBO_GROUPED_GEMM_BACKEND}
+ if [ "${USE_COMPUTED_PRIMUS_EXP_NAME:-0}" = "1" ]; then
+     export PRIMUS_EXP_NAME="$DEFAULT_PRIMUS_EXP_NAME"
+ else
+     export PRIMUS_EXP_NAME=debug
+ fi

sequence_len_offset: Optional[int] = None,
*,
inference_params: Optional[BaseInferenceContext] = None,
) -> tuple[Tensor, Tensor]:

Copilot AI Apr 13, 2026


The forward() type annotation says it returns tuple[Tensor, Tensor], but the implementation returns (out.transpose(0, 1), None). Please update the return annotation to allow None (e.g., tuple[Tensor, Optional[Tensor]]) to match behavior and avoid type/contract confusion.

Suggested change:
- ) -> tuple[Tensor, Tensor]:
+ ) -> tuple[Tensor, Optional[Tensor]]:

Comment on lines 23 to 26
# hyper parameters
train_iters: 10
- micro_batch_size: 8
+ micro_batch_size: 16
global_batch_size: 512

Copilot AI Apr 13, 2026


This PR is titled "LFM model support", but this change bumps Qwen3-30B micro_batch_size from 8 to 16. If this is unrelated to LFM2 enablement, consider moving it to a separate PR (or note the rationale in the PR description) to keep review/rollback scope clear.


2 participants