Interface PartitionReader<T>

All Superinterfaces:
AutoCloseable, Closeable
All Known Subinterfaces:
ContinuousPartitionReader<T>, SupportsRealTimeRead<T>

@Evolving public interface PartitionReader<T> extends Closeable
A partition reader returned by PartitionReaderFactory.createReader(InputPartition) or PartitionReaderFactory.createColumnarReader(InputPartition). It's responsible for outputting data for a RDD partition.

Note that, Currently the type `T` can only be InternalRow for normal data sources, or ColumnarBatch for columnar data sources(whose PartitionReaderFactory.supportColumnarReads(InputPartition) returns true).

Since:
3.0.0
  • Method Details

    • next

      boolean next() throws IOException
      Proceed to next record, returns false if there is no more records.
      Throws:
      IOException - if failure happens during disk/network IO like reading files.
    • get

      T get()
      Return the current record. This method should return same value until `next` is called.
    • currentMetricsValues

      default CustomTaskMetric[] currentMetricsValues()
      Returns an array of custom task metrics. By default it returns empty array. Note that it is not recommended to put heavy logic in this method as it may affect reading performance.
    • initMetricsValues

      @Deprecated(since="4.2.0") default void initMetricsValues(CustomTaskMetric[] metrics)
      Deprecated.
      Use CustomTaskMetric.mergeWith(CustomTaskMetric) instead. When a task reads multiple partitions concurrently or sequentially, DataSourceRDD now merges metrics from all readers via mergeWith at reporting time, removing the need to seed each new reader with the prior reader's values.
      Sets the initial value of metrics before fetching any data from the reader. This is called when multiple PartitionReaders are grouped into one partition in case of KeyGroupedPartitioning and the reader is initialized with the metrics returned by the previous reader that belongs to the same partition. By default, this method does nothing.