
Windows C++ TensorRT PaddleX deployment and inference #4799

@duskkkrua

Description


Problem description

I built tensorrt_infer.dll with PaddleX for inference, using dynamic input shapes, and inference runs successfully.
But several problems have come up:

  1. No TRT cache is generated, so every run repeats
    Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
    This greatly reduces usability. How can I save the TRT engine, or generate it ahead of time and reuse it? (See the sketch after this list.)
  2. I also noticed TensorRTEngineConfig. How can I use this struct directly to build a new tensorrt_infer.cpp? That might avoid the trouble above.
    I saw that the TensorRTEngineConfig struct contains this member:
    // onnx model path
    std::string model_file_ = "";
    So I enabled WITH_ONNX_TENSORRT when configuring CMake, but did not observe any obvious change.
    The CMake options are shown in the screenshot below:
    (screenshot of the CMake configuration)
  3. For dynamic-shape deployment, following hints from other issues, I used the generated pbtxt file to obtain the dynamic shape ranges. Can this .pbtxt file be used directly? Otherwise, after switching the model or framework, I might have to rebuild tensorrt_infer.dll all over again. (See the sketch below.)
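
For reference, the log below mentions analysis_predictor.cc and tensorrt_subgraph_pass.cc, so the DLL appears to sit on top of Paddle Inference. If tensorrt_infer.cpp can pass options through to paddle_infer::Config, one possible direction (an assumption I have not verified against the PaddleX wrapper) is to set an optimization cache directory plus use_static = true so that serialized TRT engines are written to disk and reused, and to feed the shape-range .pbtxt back in via EnableTunedTensorRtDynamicShape. Below is a minimal sketch against the plain Paddle Inference C++ API; the paths, memory-pool size, and precision are placeholders:

```cpp
// Sketch only: configure Paddle Inference so TRT engines are cached on disk
// and the collected dynamic-shape .pbtxt is reused on later runs.
// All paths and numeric settings here are illustrative placeholders.
#include <memory>
#include "paddle_inference_api.h"

std::shared_ptr<paddle_infer::Predictor> BuildPredictor() {
  paddle_infer::Config config;
  config.SetModel("inference.pdmodel", "inference.pdiparams");  // placeholder model paths
  config.EnableUseGpu(512, 0);  // 512 MB initial GPU memory pool, device 0

  // Directory for serialized TRT engines; with use_static = true the engine
  // built on the first run is written here and loaded on subsequent runs.
  config.SetOptimCacheDir("./trt_cache");
  config.EnableTensorRtEngine(1 << 30,  // workspace size
                              1,        // max batch size
                              3,        // min subgraph size
                              paddle_infer::PrecisionType::kHalf,
                              /*use_static=*/true,
                              /*use_calib_mode=*/false);

  // First (offline) run: record the dynamic shape ranges into a .pbtxt.
  // config.CollectShapeRangeInfo("shape_range_info.pbtxt");
  // Later runs: reuse that .pbtxt instead of hard-coding min/opt/max shapes.
  config.EnableTunedTensorRtDynamicShape("shape_range_info.pbtxt",
                                         /*allow_build_at_runtime=*/true);

  return paddle_infer::CreatePredictor(config);
}
```

If this is the intended mechanism, rebuilding tensorrt_infer.dll after a model change should not be necessary as long as the cache directory and the .pbtxt are regenerated for the new model; please correct me if the PaddleX deployment code exposes these options differently.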

Reproduction

  1. High-performance inference

  2. Which model and dataset are you using?
    The model is pplite with backbone=stdc1.
    The dataset is a custom-generated dataset.

  3. Please provide the error messages and related logs
    The relevant runtime output is as follows:
    REGISTER_CLASS:seg
    init SegModel,model_type=seg
    WARNING: Logging before InitGoogleLogging() is written to STDERR
    I1203 16:26:10.320520 18876 analysis_predictor.cc:1532] TensorRT subgraph engine is enabled
    --- Running analysis [ir_graph_build_pass]
    I1203 16:26:10.323510 18876 executor.cc:187] Old Executor is Running.
    --- Running analysis [ir_analysis_pass]
    --- Running IR pass [trt_remove_amp_strategy_op_pass]
    --- Running IR pass [trt_support_nhwc_pass]
    --- Running IR pass [adaptive_pool2d_convert_global_pass]
    I1203 16:26:10.350419 18876 fuse_pass_base.cc:59] --- detected 1 subgraphs
    --- Running IR pass [trt_map_ops_to_matrix_multiply_pass]
    --- Running IR pass [shuffle_channel_detect_pass]
    --- Running IR pass [quant_conv2d_dequant_fuse_pass]
    --- Running IR pass [delete_quant_dequant_op_pass]
    --- Running IR pass [delete_quant_dequant_filter_op_pass]
    --- Running IR pass [trt_delete_weight_dequant_linear_op_pass]
    --- Running IR pass [delete_quant_dequant_linear_op_pass]
    --- Running IR pass [identity_op_clean_pass]
    I1203 16:26:10.371353 18876 fuse_pass_base.cc:59] --- detected 1 subgraphs
    --- Running IR pass [add_support_int8_pass]
    I1203 16:26:10.399904 18876 fuse_pass_base.cc:59] --- detected 209 subgraphs
    --- Running IR pass [simplify_with_basic_ops_pass]
    --- Running IR pass [trt_prompt_tuning_embedding_eltwise_layernorm_fuse_pass]
    --- Running IR pass [trt_embedding_eltwise_layernorm_fuse_pass]
    --- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
    --- Running IR pass [trt_multihead_matmul_fuse_pass_v2]
    --- Running IR pass [trt_multihead_matmul_fuse_pass_v3]
    --- Running IR pass [multihead_matmul_roformer_fuse_pass]
    --- Running IR pass [constant_folding_pass]
    I1203 16:26:10.455720 18876 fuse_pass_base.cc:59] --- detected 4 subgraphs
    --- Running IR pass [trt_flash_multihead_matmul_fuse_pass]
    --- Running IR pass [trt_cross_multihead_matmul_fuse_pass]
    --- Running IR pass [vit_attention_fuse_pass]
    --- Running IR pass [trt_qk_multihead_matmul_fuse_pass]
    --- Running IR pass [layernorm_shift_partition_fuse_pass]
    --- Running IR pass [merge_layernorm_fuse_pass]
    --- Running IR pass [preln_residual_bias_fuse_pass]
    --- Running IR pass [preln_layernorm_x_fuse_pass]
    --- Running IR pass [reverse_roll_fuse_pass]
    --- Running IR pass [conv_bn_fuse_pass]
    I1203 16:26:10.524490 18876 fuse_pass_base.cc:59] --- detected 39 subgraphs
    --- Running IR pass [conv_elementwise_add_fuse_pass]
    W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
    W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
    W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
    W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
    W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
    W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
    W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
    W1203 16:26:10.535450 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
    I1203 16:26:10.539438 18876 fuse_pass_base.cc:59] --- detected 39 subgraphs
    --- Running IR pass [remove_padding_recover_padding_pass]
    --- Running IR pass [delete_remove_padding_recover_padding_pass]
    --- Running IR pass [dense_fc_to_sparse_pass]
    --- Running IR pass [dense_multihead_matmul_to_sparse_pass]
    --- Running IR pass [tensorrt_subgraph_pass]
    I1203 16:26:10.551398 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 13 nodes
    I1203 16:26:10.592264 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
    W1203 16:26:12.048230 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    W1203 16:26:12.048230 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    W1203 16:26:12.049224 18876 place.cc:161] The paddle::PlaceType::kCPU/kGPU is deprecated since version 2.3, and will be removed in version 2.4! Please use Tensor::is_cpu()/is_gpu() method to determine the type of place.
    I1203 16:26:12.050225 18876 engine.cc:215] Run Paddle-TRT FP16 mode
    I1203 16:26:12.050225 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
    W1203 16:26:40.895160 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
    W1203 16:26:40.895160 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
    W1203 16:26:40.895160 18876 helper.h:127] Check verbose logs for the list of affected weights.
    W1203 16:26:40.895160 18876 helper.h:127] - 1 weights are affected by this issue: Detected subnormal FP16 values.
    I1203 16:26:40.898150 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 13 nodes
    I1203 16:26:40.898150 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
    W1203 16:26:40.899148 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    W1203 16:26:40.899148 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    I1203 16:26:40.899148 18876 engine.cc:215] Run Paddle-TRT FP16 mode
    I1203 16:26:40.899148 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
    W1203 16:26:46.100383 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
    W1203 16:26:46.100383 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
    W1203 16:26:46.101380 18876 helper.h:127] Check verbose logs for the list of affected weights.
    W1203 16:26:46.101380 18876 helper.h:127] - 1 weights are affected by this issue: Detected subnormal FP16 values.
    W1203 16:26:46.101380 18876 helper.h:127] - 1 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
    I1203 16:26:46.103374 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 16 nodes
    I1203 16:26:46.104367 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
    W1203 16:26:46.104367 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    W1203 16:26:46.104367 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    I1203 16:26:46.105363 18876 engine.cc:215] Run Paddle-TRT FP16 mode
    I1203 16:26:46.105363 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
    W1203 16:26:53.203140 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
    W1203 16:26:53.203140 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
    W1203 16:26:53.204133 18876 helper.h:127] Check verbose logs for the list of affected weights.
    W1203 16:26:53.204133 18876 helper.h:127] - 2 weights are affected by this issue: Detected subnormal FP16 values.
    I1203 16:26:53.206130 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 106 nodes
    I1203 16:26:53.210116 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
    W1203 16:26:53.212109 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    W1203 16:26:53.212109 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    W1203 16:26:53.215099 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    W1203 16:26:53.216091 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    W1203 16:26:53.226063 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    W1203 16:26:53.226063 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
    I1203 16:26:53.231046 18876 engine.cc:215] Run Paddle-TRT FP16 mode
    I1203 16:26:53.231046 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
    W1203 16:27:51.221599 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
    W1203 16:27:51.221599 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
    W1203 16:27:51.222596 18876 helper.h:127] Check verbose logs for the list of affected weights.
    W1203 16:27:51.222596 18876 helper.h:127] - 35 weights are affected by this issue: Detected subnormal FP16 values.
    W1203 16:27:51.222596 18876 helper.h:127] - 6 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
    --- Running IR pass [conv_bn_fuse_pass]
    --- Running IR pass [conv_elementwise_add_act_fuse_pass]
    --- Running IR pass [conv_elementwise_add2_act_fuse_pass]
    --- Running IR pass [transpose_flatten_concat_fuse_pass]
    --- Running IR pass [auto_mixed_precision_pass]
    --- Running analysis [save_optimized_model_pass]
    --- Running analysis [ir_params_sync_among_devices_pass]
    I1203 16:27:51.243525 18876 ir_params_sync_among_devices_pass.cc:53] Sync params from CPU to GPU
    --- Running analysis [adjust_cudnn_workspace_size_pass]
    --- Running analysis [inference_op_replace_pass]
    --- Running analysis [memory_optimize_pass]
    I1203 16:27:51.244522 18876 memory_optimize_pass.cc:118] The persistable params in main graph are : 30.6206MB
    I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : bilinear_interp_v2_0.tmp_0 size: 512
    I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : batch_norm_32.tmp_1 size: 4
    I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : shape_0.tmp_0_slice_2 size: 4
    I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : bilinear_interp_v2_1.tmp_0 size: 512
    I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : tmp_0 size: 512
    I1203 16:27:51.246515 18876 memory_optimize_pass.cc:246] Cluster name : shape_0.tmp_0_slice_1 size: 4
    --- Running analysis [ir_graph_to_program_pass]
    I1203 16:27:51.259475 18876 analysis_predictor.cc:1838] ======= optimize end =======
    I1203 16:27:51.260469 18876 naive_executor.cc:200] --- skip [feed], feed -> x
    I1203 16:27:51.260469 18876 naive_executor.cc:200] --- skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
    W1203 16:27:51.267448 18876 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 13.0, Runtime API Version: 11.8
    W1203 16:27:51.268443 18876 gpu_resources.cc:164] device: 0, cuDNN Version: 8.6.
    [TRT] Batch inference finished, valid results: 976/976

=====================================================
TRT batch inference timing statistics

Total valid images: 976
Total inference time: 1782.967 ms
Average inference time per image: 1.827 ms
Randomly selected image index: 935 (valid result)

Environment

  1. Please provide the PaddlePaddle, PaddleX, and Python versions you are using
    paddle2onnx 1.3.1
    paddleocr 2.10.0
    paddlepaddle 2.6.2
    paddlepaddle-gpu 2.6.1
    paddleseg 0.0.0.dev0 (actually 2.10, installed locally from source)
    The model was trained with Python 3.10 in Anaconda3, then deployed with C++ and TensorRT on Windows.

  2. Please provide your operating system information, e.g. Linux/Windows/MacOS
    Windows 10

  3. Which CUDA/cuDNN versions are you using?
    CUDA 11.8
    cuDNN 8.6.0.163
    TensorRT 8.5.1.7

That is all the detailed information I can provide for now. Is there anything else you need?
