Tutorial: Your first Vortex file

This tutorial walks you through writing and reading a Vortex file from scratch. You will end up with a working Maven project that stores time-series data in Vortex format and reads it back column by column.

Prerequisites: Java 25+, Maven 3.9+.


1. Create a Maven project

mvn archetype:generate \
  -DgroupId=com.example \
  -DartifactId=vortex-demo \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DarchetypeVersion=1.5 \
  -DinteractiveMode=false
cd vortex-demo

Add the dependency to pom.xml (inside <dependencies>):

<dependency>
  <groupId>io.github.dfa1</groupId>
  <artifactId>vortex-java</artifactId>
  <version>0.1.0-SNAPSHOT</version>
</dependency>

Set the compiler to Java 25:

<properties>
  <maven.compiler.release>25</maven.compiler.release>
</properties>

2. Define a schema

A Vortex file is a typed struct — every column has a declared type before any data is written.

import io.github.dfa1.vortex.core.DType;
import io.github.dfa1.vortex.core.PType;
import java.util.List;

DType.Struct schema = new DType.Struct(
    List.of("timestamp", "symbol", "price", "volume"),
    List.of(
        new DType.Primitive(PType.I64, false),   // unix epoch millis, non-nullable
        new DType.Utf8(false),                    // ticker symbol
        new DType.Primitive(PType.F64, false),   // trade price
        new DType.Primitive(PType.I64, false)    // shares traded
    ),
    false  // the struct itself is non-nullable
);

In the constructors above, the trailing boolean is the nullability flag; pass true to make that column (or the struct itself) nullable. See reference.md#core-types for the full DType / PType list.


3. Write data

import io.github.dfa1.vortex.writer.VortexWriter;
import io.github.dfa1.vortex.writer.WriteOptions;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.util.Map;
import static java.nio.file.StandardOpenOption.*;

Path outPath = Path.of("trades.vortex");

try (FileChannel ch = FileChannel.open(outPath, CREATE, WRITE, TRUNCATE_EXISTING);
     VortexWriter writer = VortexWriter.create(ch, schema, WriteOptions.defaults())) {

    writer.writeChunk(Map.of(
        "timestamp", new long[]   {1_700_000_000_000L, 1_700_000_001_000L, 1_700_000_002_000L},
        "symbol",    new String[]  {"AAPL", "AAPL", "MSFT"},
        "price",     new double[]  {189.95, 190.10, 374.20},
        "volume",    new long[]    {100L,   250L,   175L}
    ));
}

writeChunk takes one batch of rows. Call it multiple times to write multiple chunks — each chunk is compressed independently and can be skipped during a scan if zone-map statistics rule it out.
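To make the chunk-skipping idea concrete, here is a minimal sketch of zone-map pruning in plain Java. It is not Vortex's implementation: the `Zone` record and `chunksToScan` helper are hypothetical names, and the min/max summary is computed on the fly here, whereas a real file format would store it at write time. It only illustrates how a per-chunk min/max can rule out a chunk for a predicate such as `volume >= 500` without reading its values.

```java
import java.util.ArrayList;
import java.util.List;

public class ZoneMapSketch {

    /** Min/max summary of one chunk; a hypothetical stand-in for a zone-map entry. */
    public record Zone(long min, long max) {
        public static Zone of(long[] values) {
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (long v : values) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            return new Zone(min, max);
        }

        /** True when no value in [min, max] can satisfy "value >= threshold". */
        public boolean rulesOutAtLeast(long threshold) {
            return max < threshold;
        }
    }

    /** Returns the indices of the chunks that still have to be read for "value >= threshold". */
    public static List<Integer> chunksToScan(List<long[]> chunks, long threshold) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            if (!Zone.of(chunks.get(i)).rulesOutAtLeast(threshold)) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<long[]> volumeChunks = List.of(
            new long[] {100L, 250L, 175L},   // max 250  -> skipped
            new long[] {900L, 1_200L},       // max 1200 -> scanned
            new long[] {40L, 60L}            // max 60   -> skipped
        );
        System.out.println(chunksToScan(volumeChunks, 500L)); // prints [1]
    }
}
```

The practical consequence for writers is that chunks whose values fall in narrow, non-overlapping ranges (for example, data appended in timestamp order) give this kind of pruning the most to work with.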

The file is complete and readable as soon as VortexWriter is closed.


4. Read it back

import io.github.dfa1.vortex.io.VortexReader;
import io.github.dfa1.vortex.scan.ScanOptions;
import io.github.dfa1.vortex.core.array.DoubleArray;
import io.github.dfa1.vortex.core.array.LongArray;

try (VortexReader vf = VortexReader.open(outPath);
     var iter = vf.scan(ScanOptions.all())) {

    while (iter.hasNext()) {
        var chunk = iter.next();   // advances to the next batch

        LongArray  ts     = chunk.column("timestamp");
        DoubleArray price = chunk.column("price");

        for (long i = 0; i < chunk.rowCount(); i++) {
            System.out.printf("%d  %.2f%n", ts.getLong(i), price.getDouble(i));
        }
        // ⚠ do not store references past this point —
        //   iter.hasNext() frees the chunk's memory
    }
}

Important: every chunk lives in an off-heap Arena. Calling iter.hasNext() closes that arena and releases the memory. Read all values before advancing the iterator. See explanation.md#memory-model for why the lifetime works this way.
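The lifetime rule can be reproduced with the JDK's own `java.lang.foreign.Arena`, independent of Vortex. The sketch below uses only standard JDK calls (Java 22+); the class and method names are made up for illustration, and it does not show how Vortex manages its arenas internally. It demonstrates the two behaviors that matter: a heap copy taken while the arena is open stays valid, and a segment kept past the arena's close throws on access.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ArenaLifetimeSketch {

    /** Copies off-heap values to a heap array while the arena is still open. */
    public static long[] copyBeforeClose() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocateFrom(
                ValueLayout.JAVA_LONG, 1_700_000_000_000L, 1_700_000_001_000L);
            return segment.toArray(ValueLayout.JAVA_LONG); // heap copy outlives the arena
        }
    }

    /** Keeps a segment past its arena's lifetime and reports whether reading it fails. */
    public static boolean readAfterCloseFails() {
        MemorySegment leaked;
        try (Arena arena = Arena.ofConfined()) {
            leaked = arena.allocateFrom(ValueLayout.JAVA_LONG, 42L);
        }
        try {
            leaked.get(ValueLayout.JAVA_LONG, 0);
            return false;
        } catch (IllegalStateException e) {
            return true; // the JDK rejects access once the arena is closed
        }
    }

    public static void main(String[] args) {
        System.out.println(copyBeforeClose().length); // prints 2
        System.out.println(readAfterCloseFails());    // prints true
    }
}
```

If you need values from a chunk after the iterator advances, copy them into heap arrays or collections inside the loop body, as `copyBeforeClose` does.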

Expected output:

1700000000000  189.95
1700000001000  190.10
1700000002000  374.20
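The first column is raw unix epoch milliseconds, as declared in the schema. If you would rather print a readable timestamp, a small helper built on the standard `java.time.Instant` is enough; the class and method names below are hypothetical and not part of Vortex.

```java
import java.time.Instant;

public class EpochMillisSketch {

    /** Renders a unix-epoch-millisecond value as an ISO-8601 UTC timestamp. */
    public static String toIso(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        System.out.println(toIso(1_700_000_000_000L)); // prints 2023-11-14T22:13:20Z
    }
}
```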

5. Project columns and limit rows

Reading every column is wasteful when you only need two. Use withColumns to project, and withLimit to stop after n rows:

ScanOptions opts = ScanOptions.all()
    .withColumns("symbol", "price")
    .withLimit(2);

try (VortexReader vf = VortexReader.open(outPath);
     var iter = vf.scan(opts)) {

    while (iter.hasNext()) {
        var chunk = iter.next();
        // chunk only contains "symbol" and "price"
    }
}

What's next

You now have a working write-then-read flow. From here:

  • how-to.md — task recipes: filter rows, project columns, convert Parquet, use the CLI
  • reference.md — API surface, CLI subcommands, operator tables
  • compatibility.md — which encodings are supported
  • explanation.md — memory model, testing strategy, benchmarks