diff --git a/docs/connectors/mssql/index.mdx b/docs/connectors/mssql/index.mdx
index d4986bf9..36d077af 100644
--- a/docs/connectors/mssql/index.mdx
+++ b/docs/connectors/mssql/index.mdx
@@ -18,6 +18,23 @@ The OLake Go MSSQL Source connector supports multiple synchronization modes. It
 - **CDC Only**
 - **Full Refresh + Incremental**
 
+:::info **CHUNKING PERFORMANCE FOR TABLES WITHOUT PRIMARY KEYS**
+
+This step is optional. To speed up chunking for tables without primary keys, grant the following permission to the database user that OLake Go connects with:
+
+**SQL Server 2016-2019:**
+
+```sql
+GRANT VIEW DATABASE STATE TO ;
+```
+
+**SQL Server 2022 or later:**
+
+```sql
+GRANT VIEW DATABASE PERFORMANCE STATE TO ;
+```
+:::
+
 ## Prerequisites
 
 ### Version Prerequisites
@@ -159,8 +176,15 @@ To connect the MSSQL source to a read-only secondary replica, set the following
 In **OLake UI**, add this under **JDBC URL Parameters** as a key-value pair: `ApplicationIntent` (key) and `ReadOnly` (value).
 
 :::info
-- CDC must be enabled on the primary database.
-- If you enable **Manage Capture Instance** while connecting to a read-only secondary replica, OLake Go cannot create or drop CDC capture instances automatically because that requires write access on the primary database. In this mode, capture instances must be **created and maintained manually** on the primary database when using a secondary replica for sync.
+CDC must be enabled on the primary database.
+:::
+
+If you connect to a read-only secondary replica and enable **Manage Capture Instance**, OLake Go prompts for **primary database credentials** so that it can create and manage capture instances on the primary. Without the primary database configuration, OLake Go cannot manage capture instances automatically.
+
+If you prefer not to provide primary database details, you can still sync from the secondary replica. To do so, leave **Manage Capture Instance** disabled and create and maintain the capture instances manually on the primary.
+
+:::note SSH Tunnel
+When SSH tunneling is enabled, both the primary and secondary database connections must be reachable through the same bastion host.
 :::
 
 ### Connection Prerequisites
@@ -283,3 +307,13 @@ check \
 
 ---
 
+## Troubleshooting {#troubleshooting}
+
+### 1. High database CPU usage during Full Refresh
+
+Jobs that sync tables without a primary key can consume more database CPU, because a row ID must be computed for every row.
+
+**Solution:** If CPU usage is high for these jobs, reduce `max_threads` in the source configuration or reset it to its default value.
+
+**If your issue is not listed here, post your question on Slack to get it resolved within a few hours.**
+
diff --git a/docs/release/ingestion/v0.7.0.mdx b/docs/release/ingestion/v0.7.0.mdx
index 8f802df8..d80720a8 100644
--- a/docs/release/ingestion/v0.7.0.mdx
+++ b/docs/release/ingestion/v0.7.0.mdx
@@ -19,6 +19,8 @@ April 21, 2026 – June 13, 2026
 5. **MongoDB delete pre-image capture -**
Added support to capture the full document on delete events using `fullDocumentBeforeChange: "whenAvailable"` for MongoDB 6.0+ clusters with pre-images enabled, falling back to `_id`-only `documentKey` when pre-images are unavailable to preserve existing behaviour.
+6. **Optimized chunking strategies for MSSQL -**
Added faster, more efficient chunk planning for MSSQL full-load syncs. Tables are split using page-level metadata, without scanning them (SQL Server 2012+, requires `VIEW DATABASE STATE`; not supported on Azure SQL Database or Azure SQL Managed Instance). When this strategy is unavailable, chunking falls back to statistical sampling.
+
 ### Destinations
 
 1. **Skip equality deletes for CDC inserts post-backfill -**
Equality deletes are now skipped for CDC inserts once the backfill→CDC overlap window is complete, reducing unnecessary write overhead. A new `dedup_inserts` flag on the Iceberg `olake_2pc` table property tracks this — Java sets it to `true` on backfill commit, and Go clears it to `false` after the first successful CDC commit. This applies to both the Arrow and legacy gRPC writers.
@@ -47,4 +49,4 @@ April 21, 2026 – June 13, 2026
 10. **Fixed edge cases in `ReformatValue` and `ReformatBool` -**
Corrected two bugs in value reformatting logic, added unit test coverage for `reformat.go`.
-11. **Fixed TOAST column values being nulled on update events -**
Unchanged TOAST columns in PostgreSQL update events were incorrectly emitted as `null` when `pgoutput` omitted the column data for unchanged values. For `REPLICA IDENTITY FULL` tables, the fix now preserves the existing value from the old tuple, preventing data loss on updates.
\ No newline at end of file
+11. **Fixed TOAST column values being nulled on update events -**
Unchanged TOAST columns in PostgreSQL update events were incorrectly emitted as `null` when `pgoutput` omitted the column data for unchanged values. For `REPLICA IDENTITY FULL` tables, the fix now preserves the existing value from the old tuple, preventing data loss on updates.