Skip to content

Commit d2945a1

Browse files
feat: Added Type Documentation (#21279)
Added types documentation for the following destinations: - SQlite - Snowflake - MongoDB - DuckDB - PostgressSQL --------- Co-authored-by: Erez Rokah <erezrokah@users.noreply.github.com>
1 parent 1c2b183 commit d2945a1

4 files changed

Lines changed: 229 additions & 0 deletions

File tree

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
The DuckDB destination supports most [Apache Arrow](https://arrow.apache.org/docs/index.html)
2+
types. The following table shows the supported types and how they are mapped
3+
to [DuckDB data types](https://duckdb.org/docs/sql/data_types/overview).
4+
5+
:::callout{type="info"}
6+
Unsupported types are always mapped to `varchar`.
7+
:::
8+
9+
| Arrow Column Type | Supported? | DuckDB Type |
10+
|-------------------|------------|-------------|
11+
| Binary | ✅ Yes | `blob` |
12+
| Boolean | ✅ Yes | `boolean` |
13+
| Date32 | ✅ Yes | `timestamp` |
14+
| Date64 | ✅ Yes | `timestamp` |
15+
| Decimal | ✅ Yes | `varchar` |
16+
| Dense Union | ✅ Yes | `varchar` |
17+
| Dictionary | ✅ Yes | `varchar` |
18+
| Duration | ✅ Yes | `varchar` |
19+
| Fixed Size List | ✅ Yes | `varchar` |
20+
| Float16 | ✅ Yes | `varchar` |
21+
| Float32 | ✅ Yes | `float` |
22+
| Float64 | ✅ Yes | `double` |
23+
| Inet | ✅ Yes | `varchar` |
24+
| Int8 | ✅ Yes | `tinyint` |
25+
| Int16 | ✅ Yes | `smallint` |
26+
| Int32 | ✅ Yes | `integer` |
27+
| Int64 | ✅ Yes | `bigint` |
28+
| Interval[DayTime] | ✅ Yes | `varchar` |
29+
| Interval[MonthDayNano] | ✅ Yes | `varchar` |
30+
| Interval[Month] | ✅ Yes | `varchar` |
31+
| JSON | ✅ Yes | `json` |
32+
| Large Binary | ✅ Yes | `blob` |
33+
| Large List | ✅ Yes | Array of element type |
34+
| Large String | ✅ Yes | `varchar` |
35+
| List | ✅ Yes | Array of element type |
36+
| MAC | ✅ Yes | `varchar` |
37+
| Map | ✅ Yes | `varchar`|
38+
| String | ✅ Yes | `varchar` |
39+
| Struct | ✅ Yes | `varchar`|
40+
| Timestamp | ✅ Yes | `timestamp` |
41+
| UUID | ✅ Yes | `uuid` |
42+
| Uint8 | ✅ Yes | `uinteger` |
43+
| Uint16 | ✅ Yes | `uinteger` |
44+
| Uint32 | ✅ Yes | `uinteger` |
45+
| Uint64 | ✅ Yes | `ubigint` |
46+
| Union | ✅ Yes | `varchar` |
47+
48+
## Notes
49+
50+
- DuckDB supports native unsigned integer types (`uinteger`, `ubigint`)
51+
- List types are converted to DuckDB arrays with the appropriate element type (e.g., `integer[]`)
52+
- Date32 and Date64 types are both mapped to `timestamp` for compatibility
53+
- UUID and JSON types have native DuckDB support
54+
55+
† Complex types like Struct and Map are converted to `varchar` for storage, losing their structured nature
Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
The PostgreSQL destination supports most [Apache Arrow](https://arrow.apache.org/docs/index.html)
2+
types. The following table shows the supported types and how they are mapped
3+
to [PostgreSQL data types](https://www.postgresql.org/docs/current/datatype.html).
4+
5+
:::callout{type="info"}
6+
Unsupported types are converted to text using their string representation.
7+
:::
8+
9+
| Arrow Column Type | Supported? | PostgreSQL Type |
10+
|------------------------|------------|-----------------|
11+
| Binary | ✅ Yes | `bytea` |
12+
| Boolean | ✅ Yes | `boolean` |
13+
| Date32 | ✅ Yes | `date` |
14+
| Date64 | ✅ Yes | `date` |
15+
| Decimal | ✅ Yes | `text` |
16+
| Dense Union | ✅ Yes | `text` |
17+
| Dictionary | ✅ Yes | `text` |
18+
| Duration | ✅ Yes | `text` |
19+
| Fixed Size List | ✅ Yes | `text` |
20+
| Float16 | ✅ Yes | `text` |
21+
| Float32 | ✅ Yes | `real` |
22+
| Float64 | ✅ Yes | `double precision` |
23+
| Inet | ✅ Yes | `inet` |
24+
| Int8 | ✅ Yes | `smallint` |
25+
| Int16 | ✅ Yes | `smallint` |
26+
| Int32 | ✅ Yes | `integer` |
27+
| Int64 | ✅ Yes | `bigint` |
28+
| Interval[DayTime] | ✅ Yes | `text` |
29+
| Interval[MonthDayNano] | ✅ Yes | `text` |
30+
| Interval[Month] | ✅ Yes | `text` |
31+
| JSON | ✅ Yes | `jsonb` |
32+
| Large Binary | ✅ Yes | `bytea` |
33+
| Large List | ✅ Yes | Array of element type |
34+
| Large String | ✅ Yes | `text` |
35+
| List | ✅ Yes | Array of element type |
36+
| MAC | ✅ Yes | `text` |
37+
| Map | ✅ Yes | `text` |
38+
| String | ✅ Yes | `text` |
39+
| Struct | ✅ Yes | `text` |
40+
| Time32 | ✅ Yes | `time without time zone` |
41+
| Time64 | ✅ Yes | `time without time zone` |
42+
| Timestamp | ✅ Yes | `timestamp without time zone` |
43+
| UUID | ✅ Yes | `uuid` |
44+
| Uint8 | ✅ Yes | `smallint` |
45+
| Uint16 | ✅ Yes | `integer` |
46+
| Uint32 | ✅ Yes | `bigint` |
47+
| Uint64 | ✅ Yes | `numeric(20,0)` |
48+
| Union | ✅ Yes | `text` |
49+
50+
## Notes
51+
52+
- Null characters (`\x00`) are automatically stripped from string values for PostgreSQL compatibility
53+
- JSON data is stored as `jsonb` with null characters stripped from string values
54+
- List types are converted to PostgreSQL arrays with recursive transformation
55+
- Time values are stored with microsecond precision
56+
- Timestamps are stored as `timestamp without time zone` in UTC
57+
58+
:::callout{type="info"}
59+
For CrateDB compatibility, Uint64 values are stored as strings instead of numeric
60+
:::
Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
The Snowflake destination supports most [Apache Arrow](https://arrow.apache.org/docs/index.html)
2+
types. The following table shows the supported types and how they are mapped
3+
to [Snowflake data types](https://docs.snowflake.com/en/sql-reference/data-types).
4+
5+
:::callout{type="info"}
6+
Unsupported types are always mapped to `text`.
7+
:::
8+
9+
| Arrow Column Type | Supported? | Snowflake Type |
10+
|------------------------|------------|----------------|
11+
| Binary | ✅ Yes | `binary` |
12+
| Boolean | ✅ Yes | `boolean` |
13+
| Date32 | ✅ Yes | `text` |
14+
| Date64 | ✅ Yes | `text` |
15+
| Decimal | ✅ Yes | `text` |
16+
| Dense Union | ✅ Yes | `text` |
17+
| Dictionary | ✅ Yes | `text` |
18+
| Duration | ✅ Yes | `text` |
19+
| Fixed Size List | ✅ Yes | `array` |
20+
| Float16 | ✅ Yes | `text` |
21+
| Float32 | ✅ Yes | `float` |
22+
| Float64 | ✅ Yes | `float` |
23+
| Inet | ✅ Yes | `text` |
24+
| Int8 | ✅ Yes | `number` |
25+
| Int16 | ✅ Yes | `number` |
26+
| Int32 | ✅ Yes | `number` |
27+
| Int64 | ✅ Yes | `number` |
28+
| Interval[DayTime] | ✅ Yes | `text` |
29+
| Interval[MonthDayNano] | ✅ Yes | `text` |
30+
| Interval[Month] | ✅ Yes | `text` |
31+
| JSON | ✅ Yes | `variant` |
32+
| Large Binary | ✅ Yes | `binary` |
33+
| Large List | ✅ Yes | `array` |
34+
| Large String | ✅ Yes | `text` |
35+
| List | ✅ Yes | `array` |
36+
| MAC | ✅ Yes | `text` |
37+
| Map | ✅ Yes | `text` |
38+
| String | ✅ Yes | `text` |
39+
| Struct | ✅ Yes | `variant` |
40+
| Time32 | ✅ Yes | `text` |
41+
| Time64 | ✅ Yes | `text` |
42+
| Timestamp | ✅ Yes | `timestamp_tz` |
43+
| UUID | ✅ Yes | `text` |
44+
| Uint8 | ✅ Yes | `number` |
45+
| Uint16 | ✅ Yes | `number` |
46+
| Uint32 | ✅ Yes | `number` |
47+
| Uint64 | ✅ Yes | `number` |
48+
| Union | ✅ Yes | `text` |
49+
50+
## Notes
51+
52+
- All integer types (signed and unsigned) are mapped to Snowflake's `number` type, which can handle arbitrary precision
53+
- Both Float32 and Float64 are stored as Snowflake's `float` type
54+
- List types are stored as Snowflake `array` type, including both regular and fixed-size lists
55+
- JSON and Struct types use Snowflake's `variant` type for semi-structured data storage
56+
- Timestamps are stored as `timestamp_tz` (timestamp with timezone)
57+
- Complex types not natively supported by Snowflake fall back to `text` representation
Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
The SQLite destination supports most [Apache Arrow](https://arrow.apache.org/docs/index.html)
2+
types. The following table shows the supported types and how they are mapped
3+
to [SQLite data types](https://www.sqlite.org/datatype3.html).
4+
5+
:::callout{type="info"}
6+
Unsupported types are always mapped to `text`.
7+
:::
8+
9+
| Arrow Column Type | Supported? | SQLite Type |
10+
|------------------------|------------|-------------|
11+
| Binary | ✅ Yes | `blob` |
12+
| Boolean | ✅ Yes | `boolean` |
13+
| Date32 | ✅ Yes | `text` |
14+
| Date64 | ✅ Yes | `text` |
15+
| Decimal | ✅ Yes | `text` |
16+
| Dense Union | ✅ Yes | `text` |
17+
| Dictionary | ✅ Yes | `text` |
18+
| Duration | ✅ Yes | `text` |
19+
| Fixed Size List | ✅ Yes | `text` |
20+
| Float16 | ✅ Yes | `real` |
21+
| Float32 | ✅ Yes | `real` |
22+
| Float64 | ✅ Yes | `real` |
23+
| Inet | ✅ Yes | `text` |
24+
| Int8 | ✅ Yes | `integer` |
25+
| Int16 | ✅ Yes | `integer` |
26+
| Int32 | ✅ Yes | `integer` |
27+
| Int64 | ✅ Yes | `integer` |
28+
| Interval[DayTime] | ✅ Yes | `text` |
29+
| Interval[MonthDayNano] | ✅ Yes | `text` |
30+
| Interval[Month] | ✅ Yes | `text` |
31+
| JSON | ✅ Yes | `text` |
32+
| Large Binary | ✅ Yes | `blob` |
33+
| Large List | ✅ Yes | `text` |
34+
| Large String | ✅ Yes | `text` |
35+
| List | ✅ Yes | `text` |
36+
| MAC | ✅ Yes | `text` |
37+
| Map | ✅ Yes | `text` |
38+
| String | ✅ Yes | `text` |
39+
| Struct | ✅ Yes | `text` |
40+
| Time32 | ✅ Yes | `text` |
41+
| Time64 | ✅ Yes | `text` |
42+
| Timestamp | ✅ Yes | `timestamp` |
43+
| UUID | ✅ Yes | `text` |
44+
| Uint8 | ✅ Yes | `integer` |
45+
| Uint16 | ✅ Yes | `integer` |
46+
| Uint32 | ✅ Yes | `integer` |
47+
| Uint64 | ✅ Yes | `integer` |
48+
| Union | ✅ Yes | `text` |
49+
50+
## Notes
51+
52+
- SQLite has a simplified type system with only 5 storage classes: NULL, INTEGER, REAL, TEXT, and BLOB
53+
- All integer types (signed and unsigned, 8-bit to 64-bit) are stored as SQLite `integer`
54+
- All floating-point types (Float16, Float32, Float64) are stored as SQLite `real`
55+
- Complex data types like JSON, List, and Struct are serialized and stored as `text`
56+
- Binary data uses SQLite's `blob` storage class
57+
- SQLite's dynamic typing system allows flexible data storage regardless of declared column type

0 commit comments

Comments
 (0)