# Python API Reference


## `array_record.python.array_record_module.ArrayRecordWriter`

### `ArrayRecordWriter(path: str, options: str)`

* `path` (str): File path where the ArrayRecord to be written.
* `options` (str, optional): Comma-separated options string. Default ""

#### Options string format

The options string can contain the following comma-separated options:

* `group_size:N` - Number of records per chunk (default: 1)
* `uncompressed` - Disable compression
* `brotli[:N]` - Use Brotli compression with level N (0-11, default: 6)
* `zstd[:N]` - Use Zstd compression with level N (-131072 to 22, default: 3)
* `snappy` - Use Snappy compression
* `window_log:N` - LZ77 window size (10-31) for zstd and brotli.
* `pad_to_block_boundary:true/false` - Pad chunks to 64KB boundaries (default
  false)

User should only select one of the compression options `zstd`, `brotli`,
`snappy`, `uncompressed`, otherwise an error would be raised.

### `ok() -> bool`

Returns true when the writer object is having a healthy state.

### `close()`

Closes the file. May raise an error if it failed to do so.

### `is_open() -> bool`

Returns true when the file is opened.

### `write(record: bytes)`

Writes a record to the file. May raise an error if it failed to do so.


## `array_record.python.array_record_module.ArrayRecordReader`

### `ArrayRecordReader(path: str, options: str)`

* `path` (str): File path to read from.
* `options` (str, optional): Comma-separated options string. Default ""

#### Options string format

The options string can contain the following comma-separated options:

* `readahead_buffer_size:N` - Number of bytes for read-ahead buffer size per
  thread (default 0)
* `max_parallelism: N` - Number of read-ahead threads.
* `index_storage_options:in_memory/offloaded` - Specifies to store the record
  index in memory or on disk (default: `in_memory`)

### `ok() -> bool`

Returns true when the reader object is having a healthy state.

### `close()`

Closes the file. May raise an error if it failed to do so.

### `is_open() -> bool`

Returns true when the file was opened.

### `num_records() -> int`

Returns the number of records in the file.

### `record_index() -> int`

Returns the current record index. This field is only relevant in the sequential
reading mode.

### `writer_options_string() -> str`

Returns the writer options string that was used when creating the ArrayRecord
file.

### `seek(index: int)`

Update the cursor to the specified index. Throws an error if the index was out
of bound.

### `read() -> bytes`

Reads a record and advance the cursor index by one. Throws an error if the
cursor reaches the end of the file.

### `read(indices: Sequence) -> Sequence[bytes]`

Reads the set of records specified by the input indices with an internal thread
pool. Throws an error if any of the index was out of bound.

### `read(start: int, end: int) -> Sequence[bytes]`

Reads the set of records by range with an internal thread pool. Throws an error
if the index was out of bound.

### `read_all() -> Sequence[bytes]`

Reads all records with an internal thread pool. Throws an error if the index was
out of bound.

## `array_record.python.array_record_data_source.ArrayRecordDataSource`

### `ArrayRecordDataSource(paths: Sequence[str], reader_options: str)`

* `paths` (Sequence[str]): File paths to read from.
* `options` (str, optional): Comma-separated options string. Default "". See
  `ArrayRecordReader` constructor options for details.

### `__len__() -> int`

Returns the number of records of all the array record files specified in the
constructor.

```python
from array_record.python import array_record_data_source
ds = array_record_data_source.ArrayRecordDataSource(glob.glob("output.array_record*"))
len(ds)
```

### `__iter__() -> Iterator[bytes]`

Iterator interface for data access.

```python
from array_record.python import array_record_data_source
ds = array_record_data_source.ArrayRecordDataSource(glob.glob("output.array_record*"))
it = iter(ds)
record = next(it)
```

### `__getitem__(index: int) -> bytes`

Reads a record at the specified index.

```python
from array_record.python import array_record_data_source
ds = array_record_data_source.ArrayRecordDataSource(glob.glob("output.array_record*"))
ds[idx]
```

### `__getitems__(indices: Sequence[int]) -> Sequence[bytes]`

Reads a set of records of the specified indices.

```python
from array_record.python import array_record_data_source
ds = array_record_data_source.ArrayRecordDataSource(glob.glob("output.array_record*"))
ds.__getitems__(indices)
```