Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 36 additions & 3 deletions storage/tables/data-types/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -158,11 +158,44 @@ The behavior of incremental loading differs between **typed** and **non-typed ta
For more information, refer to our documentation on [incremental loading](/storage/tables/#difference-between-tables-with-native-datatypes-and-string-tables).

## Handling NULLs
Data can contain `NULL` values or empty strings, which are converted differently based on the processing backend, as follows:
Data can contain `NULL` values or empty strings, which are converted differently based on the processing backend.

- Snowflake: `,,` => `NULL` or `""` => `NULL`
- BigQuery: `,,` => `NULL` and `""` => `""`
### CSV Format: Unquoted vs Quoted Empty Values
In CSV files, there are two ways to represent an empty value:

- **Unquoted empty** (`,,`) — the field is completely absent between delimiters
- **Quoted empty** (`""`) — the field explicitly contains an empty string

This distinction matters because backends treat them differently.

### Backend Behavior

| CSV value | Snowflake (STRING) | BigQuery (STRING) | BigQuery (INT64/NUMERIC) |
|---|---|---|---|
| `,,` (unquoted empty) | `NULL` | `NULL` | `NULL` |
| `""` (quoted empty) | `NULL` | `’’` (empty string) | `NULL` |

**Key difference:** Snowflake converts both `,,` and `""` to `NULL` for string columns. BigQuery preserves `""` as an empty string for STRING columns but coerces it to `NULL` for numeric columns (since an empty string is not a valid number).

This means the **same CSV file** can produce different NULL semantics depending on the target column type in BigQuery. Numeric columns will have `NULL` where string columns will have an empty string, even though the source CSV value is the same.

**Note:** This is a behavior of BigQuery's CSV parser, not Keboola Storage. BigQuery does not provide any option (similar to Snowflake's `EMPTY_FIELD_AS_NULL`) to change how quoted empty strings are handled during CSV loading. There is no available workaround at the BigQuery level.

### The `treatValuesAsNull` Parameter
When importing data, you can use the `treatValuesAsNull` parameter to specify which values should be treated as `NULL`. For example, `treatValuesAsNull: [""]` tells the system to treat empty strings as `NULL`.

However, in BigQuery, this parameter maps to BigQuery’s native [`nullMarker`](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#csv-options) CSV load option, which **only applies to unquoted CSV fields**. Quoted values bypass the null marker check entirely — this is a limitation of BigQuery’s CSV parser, not Keboola Storage.

In practice:

| CSV value | `treatValuesAsNull: [""]` | BigQuery STRING result |
|---|---|---|
| `,,` (unquoted empty) | matches null marker → **NULL** | `NULL` |
| `""` (quoted empty) | skipped (quoted) | `’’` (empty string) |

If your data source (e.g., a database extractor) writes `NULL` values as `""` instead of `,,`, string columns will contain empty strings instead of `NULL` in BigQuery, while numeric columns will still show `NULL` due to type coercion. To ensure correct NULL handling, the data source must represent NULL values as unquoted empty fields (`,,`) in the CSV.

### NULL and Incremental Loading
Columns without native types are always `VARCHAR NOT NULL`. This means you don’t need to worry about specific `NULL` behavior. However, this changes with typed columns.

In most databases, `NULL` does not equal `NULL` (`NULL == NULL` is not `TRUE`, but `NULL`). This behavior can disrupt the incremental loading process, where columns are compared to detect changes.
Expand Down
Loading