docs/source/en/api/pipelines/z_image.md (34 additions, 1 deletion)
@@ -26,8 +26,41 @@ specific language governing permissions and limitations under the License.
Z-Image-Turbo is a distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations). It offers sub-second inference latency on enterprise-grade H800 GPUs and fits comfortably within 16 GB of VRAM on consumer devices. It excels at photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
+
+## Image-to-image
+
+Use [`ZImageImg2ImgPipeline`] to transform an existing image based on a text prompt.
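A minimal usage sketch for the new pipeline, assuming [`ZImageImg2ImgPipeline`] follows the standard diffusers image-to-image calling convention (`from_pretrained`, an `image` input, and `strength`/`num_inference_steps` arguments); the checkpoint ID and input path below are placeholders:

```python
import torch
from diffusers import ZImageImg2ImgPipeline
from diffusers.utils import load_image

# Placeholder checkpoint ID; substitute the actual Z-Image-Turbo repository.
pipe = ZImageImg2ImgPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Placeholder input image path.
init_image = load_image("input.png")

# Turbo checkpoints are distilled for very few steps (8 NFEs per the description above);
# strength controls how far the output may drift from the input image.
image = pipe(
    prompt="a photorealistic portrait, golden hour lighting",
    image=init_image,
    strength=0.6,
    num_inference_steps=8,
).images[0]
image.save("z_image_img2img.png")
```

A lower `strength` preserves more of the input image, while values closer to 1.0 allow larger changes.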
docs/source/en/quantization/modelopt.md (3 additions, 3 deletions)
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License. -->
# NVIDIA ModelOpt
-[NVIDIA-ModelOpt](https://github.com/NVIDIA/TensorRT-Model-Optimizer) is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
+[NVIDIA-ModelOpt](https://github.com/NVIDIA/Model-Optimizer) is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
Before you begin, make sure you have nvidia_modelopt installed.
@@ -57,7 +57,7 @@ image.save("output.png")
>
> The quantization methods in NVIDIA-ModelOpt are designed to reduce the memory footprint of model weights using various QAT (Quantization-Aware Training) and PTQ (Post-Training Quantization) techniques while maintaining model performance. However, the actual performance gain during inference depends on the deployment framework (e.g., TRT-LLM, TensorRT) and the specific hardware configuration.
>
-> More details can be found [here](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples).
+> More details can be found [here](https://github.com/NVIDIA/Model-Optimizer/tree/main/examples).
## NVIDIAModelOptConfig
@@ -86,7 +86,7 @@ The quantization methods supported are as follows:
|**NVFP4**|`nvfp4 weight only`, `nvfp4 block quantization`|`quant_type`, `quant_type + channel_quantize + block_quantize`|`channel_quantize = -1 is only supported for now`|
-Refer to the [official modelopt documentation](https://nvidia.github.io/TensorRT-Model-Optimizer/) for a better understanding of the available quantization methods and the exhaustive list of configuration options available.
+Refer to the [official modelopt documentation](https://nvidia.github.io/Model-Optimizer/) for a better understanding of the available quantization methods and the exhaustive list of configuration options available.
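Going by the NVFP4 row in the table above, a block-quantization setup combines `quant_type` with `channel_quantize` and `block_quantize`, where `channel_quantize = -1` is the only supported channel setting for now. A hedged sketch of such a configuration, assuming [`NVIDIAModelOptConfig`] accepts those keyword arguments as named in the table; the `"NVFP4"` string, the block size of 16, and the FLUX.1-dev checkpoint are illustrative assumptions, not confirmed by this diff:

```python
import torch
from diffusers import FluxTransformer2DModel, NVIDIAModelOptConfig

# NVFP4 block quantization per the table: quant_type plus channel_quantize and
# block_quantize; channel_quantize = -1 is the only supported value for now.
# The block size of 16 is an assumed illustrative value.
quant_config = NVIDIAModelOptConfig(
    quant_type="NVFP4",
    channel_quantize=-1,
    block_quantize=16,
)

# Illustrative checkpoint; any diffusers transformer that accepts a
# quantization_config could be used instead.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```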