When using a custom DataSetIterator (such as a rolling-window iterator for time series), DL4J's ParallelWrapper and training loop apparently never call next(int) for batching; instead they repeatedly call next(). This is inconsistent with the behavior of built-in iterators like ListDataSetIterator, which do support batching via next(int). As a result, a custom iterator needs a workaround (always assembling a full batch inside next()) to get parallel batch processing, which is error-prone and not clearly documented. This makes it difficult to efficiently parallelize training on large datasets with custom iterators.
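To make the workaround concrete, here is a minimal, framework-free sketch of the pattern described above: because only next() is ever called, the iterator has to build the whole mini-batch of rolling windows inside next() itself. The class and method names are illustrative only, not DL4J API (a real implementation would implement org.nd4j's DataSetIterator and return a DataSet instead of a List).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: a rolling-window iterator that batches inside next(),
// since next(int) is never invoked by the training loop.
public class RollingWindowBatchIterator implements Iterator<List<double[]>> {
    private final double[] series;
    private final int windowSize;
    private final int batchSize;
    private int cursor = 0;

    public RollingWindowBatchIterator(double[] series, int windowSize, int batchSize) {
        this.series = series;
        this.windowSize = windowSize;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // A window must fit entirely inside the series.
        return cursor + windowSize <= series.length;
    }

    // The workaround: next() returns a whole batch of windows, not one window.
    @Override
    public List<double[]> next() {
        if (!hasNext()) throw new NoSuchElementException();
        List<double[]> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && hasNext()) {
            double[] window = new double[windowSize];
            System.arraycopy(series, cursor, window, 0, windowSize);
            batch.add(window);
            cursor++; // slide the window by one step
        }
        return batch;
    }
}
```

Only one window's worth of data is materialized at a time, so memory stays bounded even for long series; the cost is that the batching policy is baked into the iterator instead of being controlled by the caller via next(int).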
This should work in a much more straightforward fashion: for large datasets the memory pressure is huge unless you stream data through a custom iterator, which is exactly the use case that, for instance, TensorFlow covers in Python with tf.data.Dataset.window.
Cheers and thank you!