reference.md

Reference

API surface, CLI commands, and operator tables. Look here for "what exists and what it accepts." For task-oriented usage see how-to.md; for design rationale see explanation.md.

Core types

`PType` (`io.github.dfa1.vortex.core.PType`)

Physical primitive type — wire-level numeric kind for a column.

| Constant | Bytes | Notes |
| --- | --- | --- |
| `U8`, `U16`, `U32`, `U64` | 1 / 2 / 4 / 8 | Unsigned integers |
| `I8`, `I16`, `I32`, `I64` | 1 / 2 / 4 / 8 | Signed integers |
| `F16` | 2 | IEEE 754 half; decode not yet supported |
| `F32`, `F64` | 4 / 8 | IEEE 754 single / double |

Methods: byteSize(), isFloating(), isSigned().
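As a rough illustration of how the size table and the three methods relate, the sketch below mirrors them in a local stand-in enum. `PTypeSketch` and `rawBufferBytes` are hypothetical names that do not come from the library, and reporting floating types as signed is an assumption; check the real `PType` before relying on either.

```java
// Local stand-in that mirrors the PType table; it is NOT the library's source.
enum PTypeSketch {
    U8(1, false, false), U16(2, false, false), U32(4, false, false), U64(8, false, false),
    I8(1, true, false), I16(2, true, false), I32(4, true, false), I64(8, true, false),
    // Assumption: floating types are reported as signed.
    F16(2, true, true), F32(4, true, true), F64(8, true, true);

    private final int byteSize;
    private final boolean signed;
    private final boolean floating;

    PTypeSketch(int byteSize, boolean signed, boolean floating) {
        this.byteSize = byteSize;
        this.signed = signed;
        this.floating = floating;
    }

    int byteSize() { return byteSize; }
    boolean isSigned() { return signed; }
    boolean isFloating() { return floating; }

    /** Bytes needed for a dense, uncompressed buffer of rowCount values (hypothetical helper). */
    static long rawBufferBytes(PTypeSketch type, long rowCount) {
        return Math.multiplyExact((long) type.byteSize(), rowCount);
    }
}
```

For example, `PTypeSketch.rawBufferBytes(PTypeSketch.I32, 65_536)` gives the uncompressed size of one 65,536-value `I32` buffer.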

`DType` (`io.github.dfa1.vortex.core.DType`)

Sealed logical type. All variants take a trailing boolean nullable.

| Record | Constructor |
| --- | --- |
| `DType.Null` | `new DType.Null(nullable)` |
| `DType.Bool` | `new DType.Bool(nullable)` |
| `DType.Primitive` | `new DType.Primitive(PType, nullable)` |
| `DType.Decimal` | `new DType.Decimal(precision, scale, nullable)` |
| `DType.Utf8` | `new DType.Utf8(nullable)` |
| `DType.Binary` | `new DType.Binary(nullable)` |
| `DType.Struct` | `new DType.Struct(fieldNames, fieldTypes, nullable)` |
| `DType.List` | `new DType.List(elementType, nullable)` |
| `DType.FixedSizeList` | `new DType.FixedSizeList(elementType, fixedSize, nullable)` |
| `DType.Extension` | `new DType.Extension(id, storageDType, metadata, nullable)` |

Helpers: nullable(), withNullable(boolean), DType.Struct.field(name).

Reader API

`VortexReader` (`io.github.dfa1.vortex.io.VortexReader`)

Memory-mapped handle to a Vortex file. Implements AutoCloseable. Closing releases the mmap region; all Array buffers obtained during scans become invalid.

| Method | Returns | Notes |
| --- | --- | --- |
| `static open(Path)` | `VortexReader` | Uses `EncodingRegistry.loadAll()` |
| `static open(Path, EncodingRegistry)` | `VortexReader` | Custom registry (e.g. `allowUnknown()`) |
| `dtype()` | `DType` | Schema (typically `DType.Struct`) |
| `layout()` | `Layout` | Layout tree (Struct → Zoned → Chunked → Flat) |
| `footer()` | `Footer` | Segment specs, encoding specs |
| `version()` | `int` | File format version |
| `fileSize()` | `long` | File size in bytes |
| `registry()` | `EncodingRegistry` | Registry in use |
| `scan(ScanOptions)` | `ScanIterator` | Open a scan |
| `columnStats()` | `Map<String, ArrayStats>` | Aggregated min/max per column |
| `slice(offset, length)` | `MemorySegment` | Zero-copy slice of mmap region |
| `close()` | `void` | Releases mmap |

Writer API

`VortexWriter` (`io.github.dfa1.vortex.writer.VortexWriter`)

Writes a Vortex file. Implements Closeable. The file is complete and readable as soon as close() returns.

| Method | Notes |
| --- | --- |
| `static create(WritableByteChannel, DType.Struct, WriteOptions)` | Default codec set |
| `static create(WritableByteChannel, DType.Struct, WriteOptions, List<Encoding>)` | Custom codec set |
| `writeChunk(Map<String, Object>)` | One batch of rows; each value is a `long[]`, `double[]`, `String[]`, `boolean[]`, etc., matching the column `DType` |
| `close()` | Finalizes file (footer, postscript, trailer) |

`WriteOptions` (`io.github.dfa1.vortex.writer.WriteOptions`)

Record: (int chunkSize, boolean enableZoneMaps, double compressionRatioThreshold, int allowedCascading).

| Factory | Defaults |
| --- | --- |
| `WriteOptions.defaults()` | `chunkSize=65_536`, `enableZoneMaps=true`, `compressionRatioThreshold=0.90`, `allowedCascading=0` |
| `WriteOptions.cascading(depth)` | Same defaults, `allowedCascading=depth` |
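To make the relationship between the two factories concrete, here is a minimal stand-in record with the same four components and the default values listed above. `WriteOptionsSketch` is a hypothetical name, and the real class may validate its arguments or differ in other ways.

```java
// Stand-in with the documented components and defaults; not the library's WriteOptions.
record WriteOptionsSketch(int chunkSize,
                          boolean enableZoneMaps,
                          double compressionRatioThreshold,
                          int allowedCascading) {

    /** Documented defaults: 65_536 / true / 0.90 / 0. */
    static WriteOptionsSketch defaults() {
        return new WriteOptionsSketch(65_536, true, 0.90, 0);
    }

    /** Same as defaults(), with only allowedCascading changed. */
    static WriteOptionsSketch cascading(int depth) {
        WriteOptionsSketch d = defaults();
        return new WriteOptionsSketch(
                d.chunkSize(), d.enableZoneMaps(), d.compressionRatioThreshold(), depth);
    }
}
```

Because the type is a record, any other combination can be built directly through the canonical constructor.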

Scan API

`ScanOptions` (`io.github.dfa1.vortex.scan.ScanOptions`)

Record: (List<String> columns, RowFilter rowFilter, long limit). Empty columns = read all. NO_LIMIT = Long.MAX_VALUE.

| Factory / builder | Effect |
| --- | --- |
| `ScanOptions.all()` | All columns, no filter, no limit |
| `ScanOptions.columns(String... names)` | Project columns |
| `ScanOptions.limit(long n)` | Limit rows |
| `.withColumns(String... names)` | Project columns (builder) |
| `.withFilter(RowFilter)` | Add zone-map filter |
| `.withLimit(long n)` | Cap rows |
| `.hasProjection()` / `.hasFilter()` / `.hasLimit()` | Predicates |
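The "empty columns means read all" and `NO_LIMIT` conventions can be sketched with a small stand-in record. `ScanOptionsSketch` is a hypothetical name. It omits the `rowFilter` component to stay self-contained, and the predicate bodies are inferred from the conventions stated above, not taken from the library.

```java
import java.util.List;

// Stand-in for the projection and limit parts of ScanOptions; the filter is left out.
record ScanOptionsSketch(List<String> columns, long limit) {

    static final long NO_LIMIT = Long.MAX_VALUE;

    /** All columns, no limit. */
    static ScanOptionsSketch all() {
        return new ScanOptionsSketch(List.of(), NO_LIMIT);
    }

    /** Records are immutable, so each "with" call returns a new instance. */
    ScanOptionsSketch withColumns(String... names) {
        return new ScanOptionsSketch(List.of(names), limit);
    }

    ScanOptionsSketch withLimit(long n) {
        return new ScanOptionsSketch(columns, n);
    }

    /** Assumed rule: an empty column list means "read everything". */
    boolean hasProjection() { return !columns.isEmpty(); }

    /** Assumed rule: the NO_LIMIT sentinel means "unbounded". */
    boolean hasLimit() { return limit != NO_LIMIT; }
}
```

A chained call such as `ScanOptionsSketch.all().withColumns("id", "price").withLimit(100)` yields an instance where both predicates are true.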

`RowFilter` (`io.github.dfa1.vortex.scan.RowFilter`)

Sealed predicate used for zone-map pruning (per-chunk min/max). Chunks that cannot match are skipped entirely.

| Record | Static factory | Builder |
| --- | --- | --- |
| `RowFilter.Gt(column, value)` | `RowFilter.gt(col, val)` | none |
| `RowFilter.Gte(column, value)` | `RowFilter.gte(col, val)` | none |
| `RowFilter.Lt(column, value)` | `RowFilter.lt(col, val)` | none |
| `RowFilter.Lte(column, value)` | `RowFilter.lte(col, val)` | none |
| `RowFilter.Eq(column, value)` | `RowFilter.eq(col, val)` | none |
| `RowFilter.Neq(column, value)` | `RowFilter.neq(col, val)` | none |
| `RowFilter.And(filters)` | `RowFilter.and(f1, f2, …)` | `f1.and(f2)` |
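The pruning idea can be illustrated with the standard min/max test for each comparison. This is a generic sketch over `long` values, not the library's implementation. `ZoneMapSketch`, `Zone`, `Op`, and `Comparison` are hypothetical names. Whether the library also filters individual rows inside surviving chunks is not stated here.

```java
import java.util.List;

// Generic zone-map pruning over long values; illustrative only.
final class ZoneMapSketch {

    enum Op { GT, GTE, LT, LTE, EQ, NEQ }

    /** Per-chunk statistics: smallest and largest value in the chunk. */
    record Zone(long min, long max) {}

    /** One "column op literal" comparison, with the column lookup already done. */
    record Comparison(Op op, long literal) {}

    /** Returns false only when no value in [min, max] can satisfy the comparison. */
    static boolean mayMatch(Zone z, Op op, long literal) {
        return switch (op) {
            case GT -> z.max() > literal;
            case GTE -> z.max() >= literal;
            case LT -> z.min() < literal;
            case LTE -> z.min() <= literal;
            case EQ -> z.min() <= literal && literal <= z.max();
            // Only a chunk whose values all equal the literal can be skipped.
            case NEQ -> !(z.min() == literal && z.max() == literal);
        };
    }

    /** AND semantics: the chunk survives only if every conjunct may match. */
    static boolean mayMatchAll(Zone z, List<Comparison> conjuncts) {
        return conjuncts.stream().allMatch(c -> mayMatch(z, c.op(), c.literal()));
    }
}
```

A `true` result only means the chunk cannot be ruled out. The chunk may still contain rows that fail the predicate.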

`ScanIterator` (`io.github.dfa1.vortex.scan.ScanIterator`)

Implements AutoCloseable. Drives one scan.

| Method | Notes |
| --- | --- |
| `hasNext()` | Closes the previous chunk's arena, so read all column data from the previous `ScanResult` before calling it |
| `next()` | Returns `ScanResult` |
| `close()` | Releases iterator state |

`ScanResult` (`io.github.dfa1.vortex.scan.ScanResult`)

Record: (long rowCount, Map<String, Array> columns).

| Method | Notes |
| --- | --- |
| `rowCount()` | Rows in this chunk |
| `columns()` | All columns in this chunk |
| `<T extends Array> column(String name)` | Typed column lookup; throws `VortexException` if unknown |

Encoding registry

`EncodingRegistry` (`io.github.dfa1.vortex.encoding`)

| Method | Notes |
| --- | --- |
| `static loadAll()` | Loads every `Encoding` via `ServiceLoader` |
| `static empty()` | Empty registry |
| `register(Encoding)` | Add a custom encoding; throws if already registered |
| `hasEncoding(EncodingId)` | Lookup |
| `allowUnknown()` | Switch to passthrough mode: unknown nodes (and their children) decode as `UnknownArray` |
| `isAllowUnknown()` | Predicate |

To register a custom encoding via ServiceLoader, add the fully qualified class name to META-INF/services/io.github.dfa1.vortex.encoding.Encoding.

Parquet / CSV import

`ParquetImporter` (`io.github.dfa1.vortex.parquet.ParquetImporter`)

| Method | Notes |
| --- | --- |
| `importParquet(Path in, Path out)` | Defaults |
| `importParquet(Path in, Path out, ImportOptions)` | Tuned |

`ImportOptions` (`io.github.dfa1.vortex.parquet.ImportOptions`)

Record: (int chunkSize, List<String> columns, ProgressListener progressListener, WriteOptions writeOptions).

| Factory / builder | Notes |
| --- | --- |
| `ImportOptions.defaults()` | `chunkSize=65_536`, no projection, `WriteOptions.cascading(3)` |
| `.withColumns(List<String>)` | Project columns during import |
| `.withProgressListener(listener)` | Progress callbacks |
| `.withWriteOptions(WriteOptions)` | Override write options |
| `.withChunkSize(int)` | Override chunk size |

CSV import is CLI-only — types are inferred from the data.

CLI

The cli module ships a fat jar with subcommands for inspecting and querying Vortex files.

./mvnw package -pl cli -am -DskipTests
java -jar cli/target/vortex.jar <subcommand> [args]

| Subcommand | Syntax | Description |
| --- | --- | --- |
| `inspect` | `inspect <file.vortex>` | Layout tree, encodings, row counts, buffer sizes |
| `schema` | `schema <file.vortex>` | Column names and types |
| `count` | `count <file.vortex>` | Total row count |
| `stats` | `stats <file.vortex>` | Per-column min/max |
| `export` | `export <file.vortex>` | All columns to CSV on stdout |
| `select` | `select <file.vortex> <col> [col2 ...]` | Project columns to CSV |
| `filter` | `filter <file.vortex> "<expr>"` | Filter rows to CSV |
| `import` | `import <file.csv\|file.parquet> [out.vortex]` | Convert CSV or Parquet to Vortex |

`filter` expression syntax

<column> <op> <value>

| Operator | Meaning |
| --- | --- |
| `>`, `>=` | Greater than, greater-or-equal |
| `<`, `<=` | Less than, less-or-equal |
| `=`, `==` | Equal |
| `!=` | Not equal |

Values are parsed as integer, double, boolean, or string (in that order).
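The stated parse order (integer, then double, then boolean, then string) could look roughly like the sketch below. `FilterValueSketch` and `parseValue` are hypothetical names. Using `Long.parseLong` and `Double.parseDouble`, trimming whitespace, and accepting `true`/`false` case-insensitively are all assumptions; the CLI's actual parser may differ.

```java
// Illustrative first-match-wins parser; not the CLI's actual code.
final class FilterValueSketch {

    /** Tries integer, then double, then boolean, and falls back to the string itself. */
    static Object parseValue(String raw) {
        String s = raw.trim();
        try {
            return Long.parseLong(s);          // 1. integer
        } catch (NumberFormatException ignored) {
            // not an integer; fall through
        }
        try {
            return Double.parseDouble(s);      // 2. double
        } catch (NumberFormatException ignored) {
            // not a double; fall through
        }
        if (s.equalsIgnoreCase("true")) {      // 3. boolean
            return Boolean.TRUE;
        }
        if (s.equalsIgnoreCase("false")) {
            return Boolean.FALSE;
        }
        return s;                              // 4. string
    }

    public static void main(String[] args) {
        for (String v : new String[] {"42", "4.2", "true", "Zurich"}) {
            Object parsed = parseValue(v);
            System.out.println(v + " -> " + parsed.getClass().getSimpleName());
        }
    }
}
```

One consequence of ordering by first match is that a literal such as `NaN` or `1e3` parses as a double in this sketch, because `Double.parseDouble` accepts both, so it never reaches the string case.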

File format trailer

8 bytes at EOF:

version (u16 LE) | postscriptLen (u16 LE) | magic ("VTXF")

The postscript is a FlatBuffer blob immediately before the trailer. It points (offset + length) to: the Footer (FlatBuffer), the DType (Protobuf), and the Layout (FlatBuffer) — each stored elsewhere in the file.
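As a sketch of how a reader might decode the 8-byte trailer and locate the postscript, the following uses only `java.nio.ByteBuffer`. `TrailerSketch` and its members are hypothetical names. The field order and widths follow the layout above; error handling and the in-memory `byte[]` input are simplifications.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Decodes: version (u16 LE) | postscriptLen (u16 LE) | magic ("VTXF").
final class TrailerSketch {

    static final int TRAILER_BYTES = 8;

    record Trailer(int version, int postscriptLen) {}

    /** Parses the last 8 bytes of a file image held in memory. */
    static Trailer parse(byte[] file) {
        if (file.length < TRAILER_BYTES) {
            throw new IllegalArgumentException("file shorter than trailer");
        }
        ByteBuffer buf = ByteBuffer
                .wrap(file, file.length - TRAILER_BYTES, TRAILER_BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        // Java has no unsigned short, so widen each u16 to int.
        int version = Short.toUnsignedInt(buf.getShort());
        int postscriptLen = Short.toUnsignedInt(buf.getShort());
        byte[] magic = new byte[4];
        buf.get(magic);
        if (!"VTXF".equals(new String(magic, StandardCharsets.US_ASCII))) {
            throw new IllegalArgumentException("bad magic");
        }
        return new Trailer(version, postscriptLen);
    }

    /** The postscript sits immediately before the trailer. */
    static long postscriptOffset(long fileSize, Trailer t) {
        return fileSize - TRAILER_BYTES - t.postscriptLen();
    }
}
```

Given the offset and length, a reader would then hand that byte range to a FlatBuffers decoder to find the Footer, DType, and Layout segments.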

See explanation.md#memory-model for the mmap lifecycle.
