Describe the bug
I noticed that the queryTime reported by the benchmark can be completely off.
For example, NDS query32 was reported to run for 8 seconds while its actual time was 1.07 seconds. The 1.07 seconds matches the SHS UI and the rapids-tools report.
The root cause is that we take the current wall-clock time upon return to the Python code. This is not a reliable way of measuring the actual query time:
```python
start_time = int(time.time() * 1000)

fn(*args)

end_time = int(time.time() * 1000)
```
We should instead extract the query execution time from an event listener, or from the event log itself.
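As a rough sketch of the event-log approach: Spark writes the event log as newline-delimited JSON, and SQL query runs appear as paired `SparkListenerSQLExecutionStart` / `SparkListenerSQLExecutionEnd` events carrying an `executionId` and a `time` in epoch milliseconds. The helper below is hypothetical (not part of this repo) and assumes that event shape; it derives per-query durations without relying on Python-side wall-clock timing.

```python
import json

# Fully-qualified event names as they appear in the Spark event log
# (assumption: standard Spark SQL event serialization).
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"
SQL_END = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionEnd"

def sql_execution_times(event_log_lines):
    """Return {executionId: duration_ms} computed from SQL start/end events."""
    starts = {}
    durations = {}
    for line in event_log_lines:
        event = json.loads(line)
        name = event.get("Event")
        if name == SQL_START:
            starts[event["executionId"]] = event["time"]
        elif name == SQL_END and event["executionId"] in starts:
            exec_id = event["executionId"]
            durations[exec_id] = event["time"] - starts[exec_id]
    return durations
```

A listener registered in the Spark session (e.g. a `QueryExecutionListener`) could capture the same information live instead of post-processing the log; either way the duration comes from Spark's own timestamps rather than the Python caller's clock.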
The timing code in question is in spark-rapids-benchmarks/utils/python_benchmark_reporter/PysparkBenchReport.py, lines 101 to 103 (commit 2ad6b99).