-
Notifications
You must be signed in to change notification settings - Fork 5.5k
fix(native): Fix inserting decimal type values precision and scale #26737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideAdjusts how TableWrite nodes build their RowType so that DECIMAL precision and scale from the connector insert handle are preserved when writing Parquet, aligning Prestissimo behavior with the Java writer. Sequence diagram for building TableWriteNode with correct DECIMAL precision/scalesequenceDiagram
actor User
participant PrestoPlanner
participant VeloxQueryPlanConverterBase
participant InsertTableHandle
participant ConnectorInsertTableHandle
participant ParquetWriter
User->>PrestoPlanner: submit INSERT with DECIMAL(10,2)
PrestoPlanner->>VeloxQueryPlanConverterBase: toVeloxQueryPlan(tableWriteNode)
VeloxQueryPlanConverterBase->>InsertTableHandle: connectorInsertTableHandle()
InsertTableHandle-->>VeloxQueryPlanConverterBase: ConnectorInsertTableHandle
VeloxQueryPlanConverterBase->>ConnectorInsertTableHandle: getColumnHandleDataTypes()
ConnectorInsertTableHandle-->>VeloxQueryPlanConverterBase: tableTypes (DECIMAL(10,2))
VeloxQueryPlanConverterBase->>VeloxQueryPlanConverterBase: toRowType(columns, tableTypes, typeParser)
VeloxQueryPlanConverterBase-->>PrestoPlanner: TableWriteNode with RowType(DECIMAL(10,2))
PrestoPlanner->>ParquetWriter: execute TableWriteNode
ParquetWriter-->>User: Parquet file with DECIMAL(10,2) schema
Updated class diagram for TableWriteNode creation and RowType constructionclassDiagram
class VeloxQueryPlanConverterBase {
+toVeloxQueryPlan(node, sourceVeloxPlan, tableWriteInfo, taskId) TableWriteNode
}
class TableWriteNode {
+id
+outputType RowType
+columnNames
+insertTableHandle InsertTableHandle
}
class InsertTableHandle {
+connectorInsertTableHandle() ConnectorInsertTableHandle
}
class ConnectorInsertTableHandle {
+getColumnHandleDataTypes() vector~TypePtr~
}
class RowType {
+names vector~string~
+types vector~TypePtr~
}
class VariableReferenceExpression {
+name string
+typeSignature
}
class TypeParser {
+parse(typeSignature) TypePtr
}
class toRowType_overload_1 {
+toRowType(variables vector~VariableReferenceExpression~, typeParser TypeParser) RowTypePtr
}
class toRowType_overload_2 {
+toRowType(variables vector~VariableReferenceExpression~, types vector~TypePtr~, typeParser TypeParser, excludeNames unordered_set~string~) RowTypePtr
}
VeloxQueryPlanConverterBase --> TableWriteNode : creates
TableWriteNode *-- RowType : outputType
TableWriteNode *-- InsertTableHandle
InsertTableHandle *-- ConnectorInsertTableHandle
ConnectorInsertTableHandle ..> RowType : provides types
VeloxQueryPlanConverterBase ..> toRowType_overload_1 : uses (unchanged)
VeloxQueryPlanConverterBase ..> toRowType_overload_2 : uses for TableWriteNode
toRowType_overload_1 ..> VariableReferenceExpression : reads types via TypeParser
toRowType_overload_1 ..> TypeParser
toRowType_overload_2 ..> VariableReferenceExpression : reads names
toRowType_overload_2 ..> RowType : builds from names and connector types
toRowType_overload_2 ..> TypeParser : available but not used for decimal precision
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and found some issues that need to be addressed.
- In the new
toRowTypeoverload, filteringvariablesviaexcludeNameswithout filteringtypeswill misalign names and types; either filtertypesin sync or derive them inside the function instead of passing the original vector unchanged. - The
typesparameter in the newtoRowTypeoverload is passed by value and never modified; consider taking it as aconst std::vector<TypePtr>&to avoid an unnecessary copy. - The
typeParserparameter in the newtoRowTypeoverload is currently unused; if it is not needed, it should be removed to avoid confusion, or used if there was an intended purpose.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the new `toRowType` overload, filtering `variables` via `excludeNames` without filtering `types` will misalign names and types; either filter `types` in sync or derive them inside the function instead of passing the original vector unchanged.
- The `types` parameter in the new `toRowType` overload is passed by value and never modified; consider taking it as a `const std::vector<TypePtr>&` to avoid an unnecessary copy.
- The `typeParser` parameter in the new `toRowType` overload is currently unused; if it is not needed, it should be removed to avoid confusion, or used if there was an intended purpose.
## Individual Comments
### Comment 1
<location> `presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp:78-82` </location>
<code_context>
return ROW(std::move(names), std::move(types));
}
+RowTypePtr toRowType(
+ const std::vector<protocol::VariableReferenceExpression>& variables,
+ const std::vector<TypePtr> types,
+ const TypeParser& typeParser,
+ const std::unordered_set<std::string>& excludeNames = {}) {
+ std::vector<std::string> names;
+ names.reserve(variables.size());
</code_context>
<issue_to_address>
**issue (bug_risk):** Filtering names via `excludeNames` without filtering `types` risks a name/type size mismatch in the resulting `RowType`.
This overload drops variables whose names are in `excludeNames`, but still passes the original `types` to `ROW(...)`. If any names are excluded, `names.size()` will be less than `types.size()`, which can cause undefined behavior or misaligned columns in the resulting `RowType`. Please either filter `types` in parallel, disallow non-empty `excludeNames` here, or add a runtime assertion that `names.size() == types.size()` before constructing the `RowType`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp
Outdated
Show resolved
Hide resolved
1d026c6 to
a3d2fee
Compare
a3d2fee to
0f7f023
Compare
0f7f023 to
566ff2f
Compare
566ff2f to
061a7bb
Compare
Description
These changes require this velox PR to land first : facebookincubator/velox#12846
Prestissimo decimal type is incorrect when inserting values in Parquet. Decimal precision, scale are written incorrectly to Parquet. Need to change presto velox parquet-tools inspect output on parquet tools created by with decimal to be similar to presto java.
To Reproduce
Before the PR the output for prestissimo would be showing the
col0type asDECIMAL(6,2)instead ofDECIMAL(10,2).Expected behavior
Java writer does not have this issue. It uses the right precision, scale. AND with this PR we are getting the same output for the DECIMAL type and scale.
Motivation and Context
#26113
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.