Conversation
Hi, I think we could add more documentation or comments to the new code.
Pull Request Overview
This PR adds image generation capabilities to the BailingMM2 model by integrating diffusion-based text-to-image generation. The changes enable both standalone image generation and multi-modal inference (text, image, video, audio) in a unified test framework.
- Adds image generation functionality with diffusion model integration
- Refactors the test file to support multi-modal inference examples
- Updates model components to handle image generation parameters and workflows
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
Summary per file:
| File | Description |
|---|---|
| test_infer_gen_image.py | Completely rewritten to demonstrate image generation and multi-modal inference capabilities |
| modeling_bailingmm2.py | Adds image generation methods, diffusion model loading, and condition embedding generation |
| processing_bailingmm2.py | Adds image generation input processing for reference images |
| modeling_bailing_moe_v2.py | Updates forward method to handle image/audio masks for generation |
| image_processing_bailingmm2.py | Updates VideoInput import source |
| bailingmm_utils.py | Adds VideoInput type definition and aspect ratio processing utilities |
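The aspect-ratio utilities added to bailingmm_utils.py are not shown in this review, but a generic sketch of how such a `process_ratio`-style bucketing helper typically works may be useful context. The candidate sizes and the exact selection rule here are hypothetical, not taken from the PR:

```python
import math

# Hypothetical candidate (height, width) buckets; the real utility
# in bailingmm_utils.py defines its own set.
CANDIDATE_SIZES = [(512, 512), (512, 768), (768, 512)]

def process_ratio(ori_h: int, ori_w: int):
    # Pick the candidate whose aspect ratio is closest to the input's,
    # then compute a resize size large enough to cover that crop window.
    ratio = ori_h / ori_w
    closest = min(CANDIDATE_SIZES, key=lambda s: abs(s[0] / s[1] - ratio))
    th, tw = closest
    scale = max(th / ori_h, tw / ori_w)
    resize = (math.ceil(ori_h * scale), math.ceil(ori_w * scale))
    return closest, resize
```

The returned `closest` size is used for the center crop and `resize` for the preceding bilinear resize, matching how the processor code below consumes the two values.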
```python
        List["np.ndarray"],
        List["torch.Tensor"],
        List[List["Image.Image"]],
        List[List["np.ndarrray"]],
```

There's a typo: 'np.ndarrray' should be 'np.ndarray' (one extra 'r').

```diff
-        List[List["np.ndarrray"]],
+        List[List["np.ndarray"]],
```
```python
    return output_text


if __name__ == '__main__':
    model_name_or_path = "/nativemm/share/cpfs/weilong.cwl/checkpoints/Ming_Flash_2.0_sft1_merged"
```
Hard-coded file paths make the code non-portable. Consider using environment variables or configuration files to specify these paths.
```diff
-    model_name_or_path = "/nativemm/share/cpfs/weilong.cwl/checkpoints/Ming_Flash_2.0_sft1_merged"
+    model_name_or_path = os.environ.get(
+        "MODEL_NAME_OR_PATH",
+        "/nativemm/share/cpfs/weilong.cwl/checkpoints/Ming_Flash_2.0_sft1_merged"
+    )
```
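A minimal, self-contained sketch of the pattern the suggestion proposes. The variable name `MODEL_NAME_OR_PATH` and the default path come from the suggestion itself; note that `import os` would also need to be added to the test script:

```python
import os

# Resolve a path from the environment, falling back to a default so the
# script still runs unchanged in the original development environment.
def resolve_path(env_var: str, default: str) -> str:
    return os.environ.get(env_var, default)

model_name_or_path = resolve_path(
    "MODEL_NAME_OR_PATH",
    "/nativemm/share/cpfs/weilong.cwl/checkpoints/Ming_Flash_2.0_sft1_merged",
)
```

The same pattern applies to the `vision_path` suggestion further down; an argparse flag would work equally well if the paths change per run.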
```python
        attn_implementation="flash_attention_2",
        load_image_gen=True,
    ).to("cuda")
    vision_path = "/input/sunyunxiao.syx/assets/"
```
Hard-coded file paths make the code non-portable. Consider using environment variables or configuration files to specify these paths.
```diff
-    vision_path = "/input/sunyunxiao.syx/assets/"
+    vision_path = os.environ.get("VISION_PATH", "/input/sunyunxiao.syx/assets/")
```
```python
    print("image_gen_seed: ", image_gen_seed)
    print("image_gen_steps: ", image_gen_steps)
    print("image_gen_height: ", image_gen_height)
    print("image_gen_width: ", image_gen_width)
```
Debug print statements should be removed from production code or replaced with proper logging using the logging module.
```diff
-    print("image_gen_seed: ", image_gen_seed)
-    print("image_gen_steps: ", image_gen_steps)
-    print("image_gen_height: ", image_gen_height)
-    print("image_gen_width: ", image_gen_width)
+    logging.info(f"image_gen_seed: {image_gen_seed}")
+    logging.info(f"image_gen_steps: {image_gen_steps}")
+    logging.info(f"image_gen_height: {image_gen_height}")
+    logging.info(f"image_gen_width: {image_gen_width}")
```
```python
        with safe_open(safetensors_path, framework="pt") as f:
            temp_state_dict = {key: f.get_tensor(key) for key in f.keys()}
        self.query_tokens_dict = nn.ParameterDict()
        #self.img_gen_scales = [4, 8, 16]
```
Commented-out code should be removed to improve code clarity. If this code might be needed later, consider using version control instead of comments.
```diff
-        #self.img_gen_scales = [4, 8, 16]
```
```python
        for scale in self.img_gen_scales:
            num_tokens = scale * scale
            scale_name = f"{scale}x{scale}"
            #weights = temp_state_dict[f"query_tokens_dict.{scale_name}"]
```
Commented-out code should be removed to improve code clarity. If this code might be needed later, consider using version control instead of comments.
```diff
-            #weights = temp_state_dict[f"query_tokens_dict.{scale_name}"]
```
```python
        else:
            raise ValueError("unsupported dit type: {}".format(dit_type))
        self.diffusion_loss.to(device)
        #self.norm_query_embeds = True
```
Commented-out code should be removed to improve code clarity. If this code might be needed later, consider using version control instead of comments.
```diff
-        #self.norm_query_embeds = True
```
```python
        ref_pil = ref_pil.convert("RGB")
        closest_size, resize_size = process_ratio(ori_h=ref_pil.size[1], ori_w=ref_pil.size[0])
        ref_pil = torchvision.transforms.functional.resize(ref_pil, resize_size, interpolation=torchvision.transforms.InterpolationMode.BILINEAR)
        ref_pil = torchvision.transforms.functional.center_crop(ref_pil, closest_size)
        ref_tensor = ((torchvision.transforms.functional.to_tensor(ref_pil) - 0.5) * 2.0).unsqueeze(0)
        image_gen_inputs = {
            "image_gen_pixel_values_reference": ref_tensor,
            "image_gen_height": torch.LongTensor([ref_pil.size[1]]),
            "image_gen_width": torch.LongTensor([ref_pil.size[0]]),
        }
```
This image processing logic is complex and could be extracted into a separate helper method for better readability and reusability.
```diff
-        ref_pil = ref_pil.convert("RGB")
-        closest_size, resize_size = process_ratio(ori_h=ref_pil.size[1], ori_w=ref_pil.size[0])
-        ref_pil = torchvision.transforms.functional.resize(ref_pil, resize_size, interpolation=torchvision.transforms.InterpolationMode.BILINEAR)
-        ref_pil = torchvision.transforms.functional.center_crop(ref_pil, closest_size)
-        ref_tensor = ((torchvision.transforms.functional.to_tensor(ref_pil) - 0.5) * 2.0).unsqueeze(0)
-        image_gen_inputs = {
-            "image_gen_pixel_values_reference": ref_tensor,
-            "image_gen_height": torch.LongTensor([ref_pil.size[1]]),
-            "image_gen_width": torch.LongTensor([ref_pil.size[0]]),
-        }
+        image_gen_inputs = self._process_reference_image(ref_pil)
```
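A sketch of what the extracted helper could look like. To keep the example self-contained, the torchvision calls are replaced by NumPy equivalents operating on HWC uint8 arrays and the `process_ratio` lookup is replaced by explicit target dimensions; the real `_process_reference_image` (a name taken from the suggestion) would keep the torchvision pipeline and the repository's `process_ratio`:

```python
import numpy as np

def center_crop(img: np.ndarray, target_h: int, target_w: int) -> np.ndarray:
    # Crop the central target_h x target_w window, mirroring what
    # torchvision.transforms.functional.center_crop does.
    h, w = img.shape[:2]
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return img[top:top + target_h, left:left + target_w]

def process_reference_image(img: np.ndarray, target_h: int, target_w: int) -> dict:
    # Stand-in for the suggested helper: crop to the target size, then
    # map uint8 pixels to [-1, 1] via the (x - 0.5) * 2.0 normalization
    # used in the original code.
    cropped = center_crop(img, target_h, target_w)
    tensor = (cropped.astype(np.float32) / 255.0 - 0.5) * 2.0
    return {
        "image_gen_pixel_values_reference": tensor[None, ...],  # batch dim
        "image_gen_height": np.array([target_h]),
        "image_gen_width": np.array([target_w]),
    }
```

Extracting the logic this way also makes the normalization and crop behavior unit-testable without loading the full model.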
```python
            inputs_embeds.size,
        )
        batch_size, seq_length = inputs_embeds.shape[:2]
        #raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
```
Commented-out error handling should be removed or replaced with proper logic. If this validation is no longer needed, remove the comment entirely.
```diff
-        #raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
+        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
```