Skip to content

Isolate encode/decode logic used in Polaris entities#4214

Merged
adutra merged 2 commits intoapache:mainfrom
adutra:namespace-entity-encode
Apr 17, 2026
Merged

Isolate encode/decode logic used in Polaris entities#4214
adutra merged 2 commits intoapache:mainfrom
adutra:namespace-entity-encode

Conversation

@adutra
Copy link
Copy Markdown
Contributor

@adutra adutra commented Apr 15, 2026

This change replaces calls to RESTUtil in Polaris entity code when serializing or deserializing namespaces stored in internal properties.

It replaces those calls with calls to a new utility class, PolarisEntityUtils. The logic used there for encoding and decoding namespaces is in fact identical to RESTUtil as of today, but having our own logic protects Polaris from future behavioral changes in RESTUtil that could potentially cause data corruption in Polaris metastores.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@github-project-automation github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Apr 15, 2026
@adutra adutra changed the title Isolate encde/decode logic used in Polaris entities Isolate encode/decode logic used in Polaris entities Apr 15, 2026
This change replaces calls to `RESTUtil` in Polaris entity code when serializing or deserializing namespaces stored in internal properties.

It replaces those calls with calls to a new utility class, `PolarisEntityUtils`. The logic used there for encoding and decoding namespaces is in fact identical to `RESTUtil` as of today, but having our own logic protects Polaris from future behavioral changes in `RESTUtil` that could potentially cause data corruption in Polaris metastores.
@adutra adutra force-pushed the namespace-entity-encode branch from 35a8066 to a2abf50 Compare April 15, 2026 17:13
Copy link
Copy Markdown
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to controlling encoding/decoding in Polaris code.

@ParameterizedTest
@MethodSource("encodeNamespaceCases")
public void testEncodeNamespace(Namespace ns, String expected) {
Assertions.assertThat(PolarisEntityUtils.encodeNamespace(ns)).isEqualTo(expected);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add an assertion that the result matches RESTUtil.encodeNamespace?

For now it will ensure we're not breaking existing logic. If RESTUtil becomes incompatible later, we will remove those assertions.

Copy link
Copy Markdown
Contributor

@dimas-b dimas-b Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, what I proposed above probably makes more sense for decodeNamespace 😅

import java.nio.charset.StandardCharsets;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.rest.RESTUtil;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a Java doc, something like

  /**
   * Polaris-owned encoding/decoding for namespaces stored in entity internal properties.
   *
   * <p>This logic is isolated from iceberg.rest.RESTUtil to ensure the storage format
   * remains stable even if Iceberg's REST utilities change behavior in future versions,
   * preventing potential data corruption in Polaris metastores.
   */

*/
public static Namespace decodeNamespace(String encodedNs) {
Preconditions.checkArgument(encodedNs != null, "Invalid namespace: null");
String[] levels = encodedNs.split(NAMESPACE_SEPARATOR_ENCODED, -1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RESTUtil used a guava splittercom.google.common.base.Splitter. I think we can validate if Java String splitter behaviors a bit different. I mighty be paranoid. There could be some corner cases causing decoding behavior drift.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, I will switch to using Joiner and Splitter.

@nandorKollar
Copy link
Copy Markdown
Contributor

We use RESTUtil.encodeNamespace also in other places, like for example in PolarisResourcePaths. May I ask why do we replace that call only in the entites?

jbonofre
jbonofre previously approved these changes Apr 16, 2026
Copy link
Copy Markdown
Member

@jbonofre jbonofre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Maybe only @flyrain's comment about the javadoc link is worth to consider (about Iceberg RestUtils).

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Apr 16, 2026
@adutra
Copy link
Copy Markdown
Contributor Author

adutra commented Apr 16, 2026

We use RESTUtil.encodeNamespace also in other places, like for example in PolarisResourcePaths. May I ask why do we replace that call only in the entites?

I will tackle this later. It will depend on apache/iceberg#15989.

@adutra adutra merged commit 32f7805 into apache:main Apr 17, 2026
52 of 56 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Apr 17, 2026
@adutra adutra deleted the namespace-entity-encode branch April 17, 2026 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants