From 1b063a85cd6649be726f8c63b09b3f8f425d96b9 Mon Sep 17 00:00:00 2001
From: Nayan Joshi
Date: Tue, 16 Jun 2026 16:47:53 +0530
Subject: [PATCH 1/5] update mssql documentation

---
 docs/connectors/mssql/index.mdx | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/docs/connectors/mssql/index.mdx b/docs/connectors/mssql/index.mdx
index d4986bf9..dd030382 100644
--- a/docs/connectors/mssql/index.mdx
+++ b/docs/connectors/mssql/index.mdx
@@ -18,6 +18,23 @@ The OLake Go MSSQL Source connector supports multiple synchronization modes. It
 - **CDC Only**
 - **Full Refresh + Incremental**
 
+:::info **CHUNKING PERFORMANCE FOR TABLES WITHOUT PRIMARY KEYS**
+
+This is optional and not mandatory. If you want faster chunking for tables without primary keys, you can grant view table state permission.
+
+**SQL Server 2016-2019:**
+
+```sql
+GRANT VIEW DATABASE STATE TO ;
+```
+
+**SQL Server 2022 or later:**
+
+```sql
+GRANT VIEW DATABASE PERFORMANCE STATE TO ;
+```
+:::
+
 ## Prerequisites
 
 ### Version Prerequisites
@@ -283,3 +300,13 @@ check \
 
 ---
 
+## Troubleshooting {#troubleshooting}
+
+### 1. High CPU usage for tables without a primary key
+
+Jobs syncing tables without a primary key can consume more CPU because parallel chunking is less efficient without a primary key.
+
+**Solution:** For non-primary-key table jobs, if CPU usage is high, reduce `max_threads` in the source configuration or set it to the default value.
+
+**If your issue is not listed here, post your question on Slack to get it resolved within a few hours.**
+

From c0870577bd7efce8cd4d634c298af0de32520c68 Mon Sep 17 00:00:00 2001
From: Nayan Joshi
Date: Wed, 17 Jun 2026 16:45:44 +0530
Subject: [PATCH 2/5] context correction

---
 docs/connectors/mssql/index.mdx   | 4 ++--
 docs/release/ingestion/v0.7.0.mdx | 7 +++++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/connectors/mssql/index.mdx b/docs/connectors/mssql/index.mdx
index dd030382..1c26bc43 100644
--- a/docs/connectors/mssql/index.mdx
+++ b/docs/connectors/mssql/index.mdx
@@ -20,7 +20,7 @@ The OLake Go MSSQL Source connector supports multiple synchronization modes. It
 
 :::info **CHUNKING PERFORMANCE FOR TABLES WITHOUT PRIMARY KEYS**
 
-This is optional and not mandatory. If you want faster chunking for tables without primary keys, you can grant view table state permission.
+This step is optional. To speed up chunking for tables without primary keys, grant the following permission:
 
 **SQL Server 2016-2019:**
 
@@ -304,7 +304,7 @@ check \
 
 ### 1. High CPU usage for tables without a primary key
 
-Jobs syncing tables without a primary key can consume more CPU because parallel chunking is less efficient without a primary key.
+Jobs syncing tables without a primary key can consume more CPU because rowid computation is done for the rows.
 
 **Solution:** For non-primary-key table jobs, if CPU usage is high, reduce `max_threads` in the source configuration or set it to the default value.
 
diff --git a/docs/release/ingestion/v0.7.0.mdx b/docs/release/ingestion/v0.7.0.mdx
index a77277b1..b5c709a0 100644
--- a/docs/release/ingestion/v0.7.0.mdx
+++ b/docs/release/ingestion/v0.7.0.mdx
@@ -19,6 +19,8 @@ April 21, 2026 – May 30, 2026
 
 5. **MongoDB delete pre-image capture -**
Added support to capture the full document on delete events using `fullDocumentBeforeChange: "whenAvailable"` for MongoDB 6.0+ clusters with pre-images enabled, falling back to `_id`-only `documentKey` when pre-images are unavailable to preserve existing behaviour.
 
+6. **Optimized chunking strategies for MSSQL -**
Added faster, more efficient chunk planning for MSSQL full-load syncs. It uses page-level metadata to split tables without scanning them (SQL Server 2012+; requires `VIEW DATABASE STATE`; not supported on Azure SQL Database or Azure SQL Managed Instance) and falls back to statistical sampling when this strategy is unavailable.
+
 ### Destinations
 
 1. **Skip equality deletes for CDC inserts post-backfill -**
Equality deletes are now skipped for CDC inserts once the backfill→CDC overlap window is complete, reducing unnecessary write overhead. A new `dedup_inserts` flag on the Iceberg `olake_2pc` table property tracks this — Java sets it to `true` on backfill commit, and Go clears it to `false` after the first successful CDC commit. This applies to both the Arrow and legacy gRPC writers.
@@ -45,3 +47,8 @@ April 21, 2026 – May 30, 2026
 
 9. **Graceful shutdown via SIGINT/SIGTERM-aware root context -**
Wired SIGINT/SIGTERM into the Cobra root context using `signal.NotifyContext`, so CDC, backfill, and destination writers now respect `ctx.Done()` and shut down cleanly on pod eviction, `docker stop`, or Ctrl-C instead of being killed mid-read.
 
+10. **Fixed duplicate records on thread retry during backfill sync -**
In 2PC backfill syncs, a thread failure after the Iceberg commit but before Go could persist its local state caused retries to rewrite already-committed chunks. Fixed by always fetching fresh `olake_2pc` state on every `NewWriter` call instead of reusing a stale cached snapshot, and by ensuring `GET_OR_CREATE_TABLE` is called for all threads so committed chunk IDs are always visible to retrying threads.
+
+11. **Fixed edge cases in `ReformatValue` and `ReformatBool` -**
Corrected two bugs in value reformatting logic, added unit test coverage for `reformat.go`, and removed dead code.
+
+12. **Fixed TOAST column values being nulled on update events -**
Unchanged TOAST columns in PostgreSQL update events were incorrectly emitted as `null` when `pgoutput` omitted the column data for unchanged values. For `REPLICA IDENTITY FULL` tables, the fix now preserves the existing value from the old tuple, preventing data loss on updates.

From 7345075a3b5129bd13792302e7388863ef2d0191 Mon Sep 17 00:00:00 2001
From: Nayan Joshi
Date: Thu, 18 Jun 2026 16:50:30 +0530
Subject: [PATCH 3/5] update troubleshooting

---
 docs/connectors/mssql/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/connectors/mssql/index.mdx b/docs/connectors/mssql/index.mdx
index 1c26bc43..c4b69492 100644
--- a/docs/connectors/mssql/index.mdx
+++ b/docs/connectors/mssql/index.mdx
@@ -302,7 +302,7 @@ check \
 
 ## Troubleshooting {#troubleshooting}
 
-### 1. High CPU usage for tables without a primary key
+### 1. High database CPU usage during Full Refresh
 
 Jobs syncing tables without a primary key can consume more CPU because rowid computation is done for the rows.
 

From 3a1839249f3b9bcca31c50b96e8af3c1b9e1e95e Mon Sep 17 00:00:00 2001
From: Nayan Joshi
Date: Fri, 19 Jun 2026 13:25:27 +0530
Subject: [PATCH 4/5] Update the manage instance documentation while using secondary database

---
 docs/connectors/mssql/index.mdx | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/connectors/mssql/index.mdx b/docs/connectors/mssql/index.mdx
index c4b69492..670023a1 100644
--- a/docs/connectors/mssql/index.mdx
+++ b/docs/connectors/mssql/index.mdx
@@ -176,8 +176,15 @@ To connect the MSSQL source to a read-only secondary replica, set the following
 In **OLake UI**, add this under **JDBC URL Parameters** as a key-value pair: `ApplicationIntent` (key) and `ReadOnly` (value).
 
 :::info
-- CDC must be enabled on the primary database.
-- If you enable **Manage Capture Instance** while connecting to a read-only secondary replica, OLake Go cannot create or drop CDC capture instances automatically because that requires write access on the primary database. In this mode, capture instances must be **created and maintained manually** on the primary database when using a secondary replica for sync.
+CDC must be enabled on the primary database.
+:::
+
+If you connect to a read-only secondary replica and enable **Manage Capture Instance**, OLake Go prompts for **primary database credentials** so it can create and manage capture instances on the primary. Without primary configuration, capture instances cannot be managed automatically.
+
+If you prefer not to provide primary database details, you can still sync from the secondary replica by managing capture instances manually on the primary and leaving **Manage Capture Instance** disabled.
+
+:::note SSH Tunnel
+When using an SSH tunnel, OLake Go assumes a single tunnel is used to reach both the primary and secondary databases.
 :::
 
 ### Connection Prerequisites

From 34eae6d8c8e1abb2f44cf8bf9469d069b6eccafc Mon Sep 17 00:00:00 2001
From: Nayan Joshi
Date: Fri, 19 Jun 2026 15:04:28 +0530
Subject: [PATCH 5/5] update the ssh tunnel note

---
 docs/connectors/mssql/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/connectors/mssql/index.mdx b/docs/connectors/mssql/index.mdx
index 670023a1..36d077af 100644
--- a/docs/connectors/mssql/index.mdx
+++ b/docs/connectors/mssql/index.mdx
@@ -184,7 +184,7 @@ If you connect to a read-only secondary replica and enable **Manage Capture Inst
 If you prefer not to provide primary database details, you can still sync from the secondary replica by managing capture instances manually on the primary and leaving **Manage Capture Instance** disabled.
 
 :::note SSH Tunnel
-When using an SSH tunnel, OLake Go assumes a single tunnel is used to reach both the primary and secondary databases.
+When SSH tunneling is enabled, both the primary and secondary database connections must be reachable through the same bastion host.
 :::
 
 ### Connection Prerequisites
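
You can check whether the permission from the chunking-performance note is in effect by running a query like the one below in the source database. This is a minimal sketch and not part of the patch. `olake_user` is a hypothetical database user name; replace it with the user that received the grant. `EXECUTE AS USER` requires `IMPERSONATE` permission on that user. If you are already connected as that user, drop the `EXECUTE AS USER` and `REVERT` lines.

```sql
-- Run this in the database that OLake syncs from.
-- 'olake_user' is a hypothetical name; replace it with the user that received the GRANT.
EXECUTE AS USER = 'olake_user';

-- fn_my_permissions lists the permissions effectively held by the current
-- security context on the current database, including inherited and implied ones.
SELECT permission_name
FROM fn_my_permissions(NULL, 'DATABASE')
WHERE permission_name IN ('VIEW DATABASE STATE', 'VIEW DATABASE PERFORMANCE STATE');

-- Return to the caller's own security context.
REVERT;
```

An empty result suggests the grant has not taken effect for that user. On SQL Server 2016-2019, only `VIEW DATABASE STATE` can appear, because `VIEW DATABASE PERFORMANCE STATE` was introduced in SQL Server 2022.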