Add Hive and Iceberg Load benchmark #55
Force-pushed from a8371c9 to 9edb174.
@wanglinsong @ethanyzhang Sorry for the late response; this PR slipped my mind. I have addressed your comments. Can you please take another look? Thanks.
wanglinsong left a comment
I believe the DDL to create the tables is the same across all scale factors. Can you parameterize or remove the hardcoded schema name: tpch.sf100.?
FROM tpch.sf100.customer;
Thanks. With the current framework, I think supporting that would need a lot of work.
Oh, this is an embedded connector. This is not an issue at all. Please ignore.
Hi @wanglinsong, thanks for your comments. Do you think this PR is ready to be merged? Is there anything else you want me to change?
@wanglinsong Can you please have another look? Thanks.
@wanglinsong gentle ping.
Have we tested this PR for pbench? @PingLiuPing |
Thanks. I haven't tested this since fixing the review comments. Before that, I had tested it in pbench.
Force-pushed from 6f32a1a to b8de6a7.
Thanks for adding the TPC-H loading benchmarks! I reviewed the files and have some findings before we integrate.
Issues to fix
1.
2.
3.
4. Column type mismatches vs TPC-H spec
5. @PingLiuPing @xpengahana Did we generate the right schema?
Suggestions (non-blocking)
6. No
7. Inconsistent SQL casing
8. Consider
Force-pushed from 4b003e9 to 0d7d29c.
A loading (insert) benchmark is missing in pbench. This PR adds the initial files for a loading benchmark. It includes test files for the Hive and Iceberg connectors, for both native and Java.
The data is loaded from the tpch connector on the fly.
Future enhancements are required to make the benchmark run in stages, such as a prepare stage, a main stage, and a cleanup stage.
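As a rough illustration of the description above, the following Python sketch renders load statements that read from the tpch connector and groups them into prepare, main, and cleanup stages. It is not pbench's stage format and not the files in this PR: the `build_stages` helper, the target catalog and schema names, and the choice of plain CREATE TABLE ... AS SELECT are all assumptions made for the example. The scale-factor schema is a parameter, so the same statements can be rendered for sf100 or any other tpch schema.

```python
from string import Template

# Hypothetical templates; the SQL files in the PR may look different.
CTAS = Template(
    "CREATE TABLE $catalog.$schema.$table AS "
    "SELECT * FROM tpch.$scale.$table"
)
DROP = Template("DROP TABLE IF EXISTS $catalog.$schema.$table")

# The eight TPC-H tables exposed by the tpch connector.
TPCH_TABLES = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
]


def build_stages(catalog, schema, scale="sf100", tables=TPCH_TABLES):
    """Return SQL statements grouped by stage (hypothetical helper)."""
    target = {"catalog": catalog, "schema": schema}
    drops = [DROP.substitute(target, table=t) for t in tables]
    return {
        # prepare: make sure the target schema exists and is empty
        "prepare": [f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}"] + drops,
        # main: the timed part, loading each table from the tpch connector
        "main": [CTAS.substitute(target, scale=scale, table=t) for t in tables],
        # cleanup: remove the loaded tables again
        "cleanup": list(drops),
    }


if __name__ == "__main__":
    # "hive" and "load_test" are placeholder names, not taken from the PR.
    for stage, statements in build_stages("hive", "load_test", scale="sf1").items():
        print(f"-- {stage}")
        for sql in statements:
            print(sql + ";")
```

Calling `build_stages("iceberg", "load_test")` would render the same statements against an Iceberg catalog, assuming one is configured under that name.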