Describe the problem
I built tensorrt_infer.dll with PaddleX to run inference, enabled dynamic input shapes, and inference works correctly. However, a few problems followed:
- No TRT engine cache is generated, so every run rebuilds the engine and prints:
  Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
  This greatly reduces usability. How can I save the TRT engine, or extract the one that gets generated? (See the sketch after this list for how I imagine this could be configured.)
- I also noticed TensorRTEngineConfig. How can I use this struct directly to build a new tensorrt_infer.cpp? That might sidestep the problem above.
  I saw that the TensorRTEngineConfig struct contains this field:
  // onnx model path
  std::string model_file_ = "";
  so I enabled WITH_ONNX_TENSORRT when configuring CMake, but observed no obvious change.
  The CMake options I enabled are shown in the attached screenshot:
- For dynamic-shape deployment, following hints from other issues, I generated a pbtxt file and read the dynamic shapes from it. Can this .pbtxt file be used directly at runtime? Otherwise, whenever I switch model frameworks I would have to rebuild tensorrt_infer.dll again. (Also covered in the sketch below.)
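For reference, this is roughly how I imagine the engine caching and the shape-range pbtxt could be wired up through the Paddle Inference C++ Config if I bypassed TensorRTEngineConfig. This is only a sketch based on the public Paddle Inference API (SetOptimCacheDir, EnableTensorRtEngine with use_static, CollectShapeRangeInfo / EnableTunedTensorRtDynamicShape); the paths and shape values are placeholders from my setup, and I have not verified that this is how tensorrt_infer.cpp is meant to do it:

```cpp
// Sketch only: configure Paddle Inference so the built TRT engine is
// serialized to disk and reloaded on later runs, instead of re-running
// "Prepare TRT engine ..." every time. Paths/shapes are placeholders.
#include <map>
#include <string>
#include <vector>
#include "paddle_inference_api.h"

paddle_infer::Config BuildConfig() {
  paddle_infer::Config config;
  config.SetModel("model.pdmodel", "model.pdiparams");
  config.EnableUseGpu(500 /*MB*/, 0 /*device id*/);

  // use_static = true asks Paddle-TRT to serialize the built engine and
  // reuse it on the next run.
  config.EnableTensorRtEngine(1 << 30 /*workspace*/, 1 /*max_batch*/,
                              3 /*min_subgraph_size*/,
                              paddle_infer::PrecisionType::kHalf,
                              true /*use_static*/, false /*use_calib_mode*/);

  // Directory where the serialized engine / optimized model is cached.
  config.SetOptimCacheDir("./trt_cache");

  // Dynamic shape ranges for the input tensor "x" (placeholder values).
  std::map<std::string, std::vector<int>> min_shape{{"x", {1, 3, 256, 256}}};
  std::map<std::string, std::vector<int>> max_shape{{"x", {1, 3, 1024, 1024}}};
  std::map<std::string, std::vector<int>> opt_shape{{"x", {1, 3, 512, 512}}};
  config.SetTRTDynamicShapeInfo(min_shape, max_shape, opt_shape);

  // Alternative to hard-coded ranges: collect the shape-range pbtxt on a
  // first run, then reuse it on later runs (do not enable both at once).
  // config.CollectShapeRangeInfo("./shape_range_info.pbtxt");               // first run
  // config.EnableTunedTensorRtDynamicShape("./shape_range_info.pbtxt", true); // later runs
  return config;
}
```

If this is roughly right, a first run with CollectShapeRangeInfo would produce the pbtxt, and later runs would only need EnableTunedTensorRtDynamicShape plus the cached engine directory. Please correct me if the intended approach is different.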
Reproduction

High-performance inference
- Did you run through the workflow exactly as described in the high-performance inference documentation tutorial?
- What model and dataset are you using?
  I am currently using the pplite model with backbone=stdc1.
  The dataset is a custom, self-generated dataset.
- Please provide the error messages and related logs.
  The relevant runtime output is as follows:
REGISTER_CLASS:seg
init SegModel,model_type=seg
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1203 16:26:10.320520 18876 analysis_predictor.cc:1532] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
I1203 16:26:10.323510 18876 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [trt_remove_amp_strategy_op_pass]
--- Running IR pass [trt_support_nhwc_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
I1203 16:26:10.350419 18876 fuse_pass_base.cc:59] --- detected 1 subgraphs
--- Running IR pass [trt_map_ops_to_matrix_multiply_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [trt_delete_weight_dequant_linear_op_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [identity_op_clean_pass]
I1203 16:26:10.371353 18876 fuse_pass_base.cc:59] --- detected 1 subgraphs
--- Running IR pass [add_support_int8_pass]
I1203 16:26:10.399904 18876 fuse_pass_base.cc:59] --- detected 209 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [trt_prompt_tuning_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [trt_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v2]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v3]
--- Running IR pass [multihead_matmul_roformer_fuse_pass]
--- Running IR pass [constant_folding_pass]
I1203 16:26:10.455720 18876 fuse_pass_base.cc:59] --- detected 4 subgraphs
--- Running IR pass [trt_flash_multihead_matmul_fuse_pass]
--- Running IR pass [trt_cross_multihead_matmul_fuse_pass]
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [trt_qk_multihead_matmul_fuse_pass]
--- Running IR pass [layernorm_shift_partition_fuse_pass]
--- Running IR pass [merge_layernorm_fuse_pass]
--- Running IR pass [preln_residual_bias_fuse_pass]
--- Running IR pass [preln_layernorm_x_fuse_pass]
--- Running IR pass [reverse_roll_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I1203 16:26:10.524490 18876 fuse_pass_base.cc:59] --- detected 39 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W1203 16:26:10.535450 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
I1203 16:26:10.539438 18876 fuse_pass_base.cc:59] --- detected 39 subgraphs
--- Running IR pass [remove_padding_recover_padding_pass]
--- Running IR pass [delete_remove_padding_recover_padding_pass]
--- Running IR pass [dense_fc_to_sparse_pass]
--- Running IR pass [dense_multihead_matmul_to_sparse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I1203 16:26:10.551398 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 13 nodes
I1203 16:26:10.592264 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W1203 16:26:12.048230 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:12.048230 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:12.049224 18876 place.cc:161] The `paddle::PlaceType::kCPU/kGPU` is deprecated since version 2.3, and will be removed in version 2.4! Please use `Tensor::is_cpu()/is_gpu()` method to determine the type of place.
I1203 16:26:12.050225 18876 engine.cc:215] Run Paddle-TRT FP16 mode
I1203 16:26:12.050225 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
W1203 16:26:40.895160 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
W1203 16:26:40.895160 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
W1203 16:26:40.895160 18876 helper.h:127] Check verbose logs for the list of affected weights.
W1203 16:26:40.895160 18876 helper.h:127] - 1 weights are affected by this issue: Detected subnormal FP16 values.
I1203 16:26:40.898150 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 13 nodes
I1203 16:26:40.898150 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W1203 16:26:40.899148 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:40.899148 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
I1203 16:26:40.899148 18876 engine.cc:215] Run Paddle-TRT FP16 mode
I1203 16:26:40.899148 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
W1203 16:26:46.100383 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
W1203 16:26:46.100383 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
W1203 16:26:46.101380 18876 helper.h:127] Check verbose logs for the list of affected weights.
W1203 16:26:46.101380 18876 helper.h:127] - 1 weights are affected by this issue: Detected subnormal FP16 values.
W1203 16:26:46.101380 18876 helper.h:127] - 1 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
I1203 16:26:46.103374 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 16 nodes
I1203 16:26:46.104367 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W1203 16:26:46.104367 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:46.104367 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
I1203 16:26:46.105363 18876 engine.cc:215] Run Paddle-TRT FP16 mode
I1203 16:26:46.105363 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
W1203 16:26:53.203140 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
W1203 16:26:53.203140 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
W1203 16:26:53.204133 18876 helper.h:127] Check verbose logs for the list of affected weights.
W1203 16:26:53.204133 18876 helper.h:127] - 2 weights are affected by this issue: Detected subnormal FP16 values.
I1203 16:26:53.206130 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 106 nodes
I1203 16:26:53.210116 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W1203 16:26:53.212109 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:53.212109 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:53.215099 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:53.216091 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:53.226063 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:53.226063 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
I1203 16:26:53.231046 18876 engine.cc:215] Run Paddle-TRT FP16 mode
I1203 16:26:53.231046 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
W1203 16:27:51.221599 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
W1203 16:27:51.221599 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
W1203 16:27:51.222596 18876 helper.h:127] Check verbose logs for the list of affected weights.
W1203 16:27:51.222596 18876 helper.h:127] - 35 weights are affected by this issue: Detected subnormal FP16 values.
W1203 16:27:51.222596 18876 helper.h:127] - 6 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [auto_mixed_precision_pass]
--- Running analysis [save_optimized_model_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1203 16:27:51.243525 18876 ir_params_sync_among_devices_pass.cc:53] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I1203 16:27:51.244522 18876 memory_optimize_pass.cc:118] The persistable params in main graph are : 30.6206MB
I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : bilinear_interp_v2_0.tmp_0 size: 512
I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : batch_norm_32.tmp_1 size: 4
I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : shape_0.tmp_0_slice_2 size: 4
I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : bilinear_interp_v2_1.tmp_0 size: 512
I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : tmp_0 size: 512
I1203 16:27:51.246515 18876 memory_optimize_pass.cc:246] Cluster name : shape_0.tmp_0_slice_1 size: 4
--- Running analysis [ir_graph_to_program_pass]
I1203 16:27:51.259475 18876 analysis_predictor.cc:1838] ======= optimize end =======
I1203 16:27:51.260469 18876 naive_executor.cc:200] --- skip [feed], feed -> x
I1203 16:27:51.260469 18876 naive_executor.cc:200] --- skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
W1203 16:27:51.267448 18876 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 13.0, Runtime API Version: 11.8
W1203 16:27:51.268443 18876 gpu_resources.cc:164] device: 0, cuDNN Version: 8.6.
[TRT] Batch inference finished, valid results: 976/976
=====================================================
TRT batch inference timing statistics
Total valid images: 976
Total inference time: 1782.967 ms
Average per-image inference time: 1.827 ms
Randomly selected image index: 935 (valid result)
Environment

- Please provide the PaddlePaddle, PaddleX, and Python versions you are using.
  paddle2onnx 1.3.1
  paddleocr 2.10.0
  paddlepaddle 2.6.2
  paddlepaddle-gpu 2.6.1
  paddleseg 0.0.0.dev0 (actually 2.10; using a local installation)
  The model was trained with Python 3.10 inside Anaconda3, then deployed on Windows with C++ and TensorRT.
- Please provide your operating system information, e.g. Linux/Windows/MacOS.
  Windows 10
- What CUDA/cuDNN versions are you using?
  CUDA 11.8
  cuDNN 8.6.0.163
  TensorRT 8.5.1.7

That is all the detailed information I can provide. Is there anything else you need?