parquet.md

Parquet Format

Feldera can ingest and output data in the Parquet format:

- via ingress and egress REST endpoints, by specifying `?format=parquet` in the URL
- as a payload received from or sent to a connector

This page documents the Parquet format and how it interacts with the different SQL types.
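As a rough illustration of the REST option, the sketch below builds an ingress URL that carries `format=parquet` as a query parameter. The host, the `/v0/pipelines/<pipeline>/ingress/<table>` path layout, and the names `my_pipeline` and `PARTS` are assumptions made for this example; consult the Feldera REST API reference for the exact endpoint.

```python
from urllib.parse import quote, urlencode


def parquet_ingress_url(base_url: str, pipeline: str, table: str) -> str:
    """Build an ingress URL that requests Parquet parsing of the request body.

    The path layout used here is an assumption for illustration only.
    """
    # Percent-encode the path segments so unusual names cannot break the URL.
    path = f"/v0/pipelines/{quote(pipeline, safe='')}/ingress/{quote(table, safe='')}"
    # The format is selected with the `format=parquet` query parameter.
    query = urlencode({"format": "parquet"})
    return f"{base_url.rstrip('/')}{path}?{query}"


# Hypothetical host, pipeline, and table names.
url = parquet_ingress_url("http://localhost:8080", "my_pipeline", "PARTS")
print(url)
```

The bytes of the Parquet file would then be sent as the body of an HTTP request to this URL.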

Types

The input must be a valid Parquet file with a schema. The schema (column names and types) must match the table definition in the Feldera pipeline program. We use Arrow to specify the data types in Parquet. The following table shows the mapping between Feldera SQL types and Arrow types.

| Feldera SQL Type | Apache Arrow Type |
|------------------|-------------------|
| `BOOLEAN` | `Boolean` |
| `TINYINT`, `SMALLINT`, `INTEGER`, `BIGINT` | `Int8`, `Int16`, `Int32`, `Int64` |
| `FLOAT`, `DOUBLE`, `DECIMAL` | `Float32`, `Float64`, `Decimal` |
| `VARCHAR`, `CHAR`, `STRING` | `LargeUtf8` |
| `BINARY`, `VARBINARY` | `DataType::Binary` |
| `TIME` | `DataType::UInt64` (time in nanoseconds) |
| `TIMESTAMP` | `DataType::Timestamp(TimeUnit::Millisecond, None)` (milliseconds since unix epoch) |
| `DATE` | `DataType::Int32` (days since unix epoch) |
| `ARRAY` | `DataType::LargeList` |
| `STRUCT` | `DataType::Struct` |
| `MAP` | `DataType::Dictionary` |
| `VARIANT` | `LargeUtf8` (JSON-encoded string, see VARIANT documentation) |
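The integer encodings listed for `TIME`, `TIMESTAMP`, and `DATE` can be sketched with the Python standard library. The helper names below are hypothetical. Two points are assumptions rather than anything stated in the table: `TIME` is read as nanoseconds since midnight, and naive `datetime` values are treated as UTC.

```python
from datetime import date, datetime, time, timezone

UNIX_EPOCH_DATE = date(1970, 1, 1)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_days(d: date) -> int:
    """DATE: days since the Unix epoch (fits the Int32 column type)."""
    return (d - UNIX_EPOCH_DATE).days


def timestamp_to_millis(ts: datetime) -> int:
    """TIMESTAMP: milliseconds since the Unix epoch."""
    if ts.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    delta = ts - UNIX_EPOCH
    # Integer arithmetic avoids floating-point rounding on large values.
    return (delta.days * 86_400 + delta.seconds) * 1_000 + delta.microseconds // 1_000


def time_to_nanos(t: time) -> int:
    """TIME: nanoseconds, assumed here to count from midnight (UInt64 column type)."""
    seconds = t.hour * 3_600 + t.minute * 60 + t.second
    return seconds * 1_000_000_000 + t.microsecond * 1_000


print(date_to_days(date(2024, 1, 1)))
print(timestamp_to_millis(datetime(2024, 1, 1, 0, 0, 1, 500_000)))
print(time_to_nanos(time(1, 0, 0)))
```

Python's `time` type only carries microsecond precision, so the nanosecond value produced here is always a multiple of 1000.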

Example

In this example, we configure a table to load data from a Parquet file.

create table PARTS (
  part bigint not null,
  vendor bigint not null,
  price bigint not null
) with ('connectors' = '[{
  "transport": {
    "name": "url_input",
    "config": { "path": "https://feldera-basics-tutorial.s3.amazonaws.com/parts.parquet" }
  },
  "format": {
    "name": "parquet",
    "config": {}
  }
}]');

For reference, the following Python script was used to generate the `parts.parquet` file:

import pyarrow as pa
import pyarrow.parquet as pq

data = {
    'PART': [1, 2, 3],
    'VENDOR': [2, 1, 3],
    'PRICE': [10000, 15000, 9000]
}
table = pa.Table.from_pydict(data)
pq.write_table(table, 'parts.parquet')
