Isolate encode/decode logic used in Polaris entities#4214
Isolate encode/decode logic used in Polaris entities#4214adutra merged 2 commits intoapache:mainfrom
Conversation
This change replaces calls to `RESTUtil` in Polaris entity code when serializing or deserializing namespaces stored in internal properties. It replaces those calls with calls to a new utility class, `PolarisEntityUtils`. The logic used there for encoding and decoding namespaces is in fact identical to `RESTUtil` as of today, but having our own logic protects Polaris from future behavioral changes in `RESTUtil` that could potentially cause data corruption in Polaris metastores.
35a8066 to
a2abf50
Compare
dimas-b
left a comment
There was a problem hiding this comment.
+1 to controlling encoding/decoding in Polaris code.
| @ParameterizedTest | ||
| @MethodSource("encodeNamespaceCases") | ||
| public void testEncodeNamespace(Namespace ns, String expected) { | ||
| Assertions.assertThat(PolarisEntityUtils.encodeNamespace(ns)).isEqualTo(expected); |
There was a problem hiding this comment.
Could we add an assertion that the result matches RESTUtil.encodeNamespace?
For now it will ensure we're not breaking existing logic. If RESTUtil becomes incompatible later, we will remove those assertions.
There was a problem hiding this comment.
Actually, what I proposed above probably makes more sense for decodeNamespace 😅
| import java.nio.charset.StandardCharsets; | ||
| import org.apache.iceberg.catalog.Namespace; | ||
| import org.apache.iceberg.rest.RESTUtil; | ||
|
|
There was a problem hiding this comment.
Can we add a Java doc, something like
/**
* Polaris-owned encoding/decoding for namespaces stored in entity internal properties.
*
* <p>This logic is isolated from iceberg.rest.RESTUtil to ensure the storage format
* remains stable even if Iceberg's REST utilities change behavior in future versions,
* preventing potential data corruption in Polaris metastores.
*/
| */ | ||
| public static Namespace decodeNamespace(String encodedNs) { | ||
| Preconditions.checkArgument(encodedNs != null, "Invalid namespace: null"); | ||
| String[] levels = encodedNs.split(NAMESPACE_SEPARATOR_ENCODED, -1); |
There was a problem hiding this comment.
The RESTUtil used a guava splittercom.google.common.base.Splitter. I think we can validate if Java String splitter behaviors a bit different. I mighty be paranoid. There could be some corner cases causing decoding behavior drift.
There was a problem hiding this comment.
That's a good point, I will switch to using Joiner and Splitter.
|
We use |
I will tackle this later. It will depend on apache/iceberg#15989. |
This change replaces calls to
RESTUtilin Polaris entity code when serializing or deserializing namespaces stored in internal properties.It replaces those calls with calls to a new utility class,
PolarisEntityUtils. The logic used there for encoding and decoding namespaces is in fact identical toRESTUtilas of today, but having our own logic protects Polaris from future behavioral changes inRESTUtilthat could potentially cause data corruption in Polaris metastores.Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)