When using a custom DataSetIterator (such as a rolling-window iterator for time series), DL4J's ParallelWrapper and training loop apparently never call next(int) for batching; instead they repeatedly call next(). This is inconsistent with the behavior of built-in iterators like ListDataSetIterator, which do support batching via next(int). As a result, a custom iterator needs a workaround (always assembling a full batch inside next()) to get parallel batch processing, which is error-prone and not clearly documented. This makes it difficult to efficiently parallelize training on large datasets with custom iterators.
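To make the workaround concrete, here is a minimal, framework-free sketch of the pattern described above: because only next() is ever called, the iterator has to build the whole mini-batch of rolling windows inside next() itself. The class and method names are illustrative only, not DL4J API (a real implementation would implement org.nd4j's DataSetIterator and return a DataSet instead of a List).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: a rolling-window iterator that batches inside next(),
// since next(int) is never invoked by the training loop.
public class RollingWindowBatchIterator implements Iterator<List<double[]>> {
    private final double[] series;
    private final int windowSize;
    private final int batchSize;
    private int cursor = 0;

    public RollingWindowBatchIterator(double[] series, int windowSize, int batchSize) {
        this.series = series;
        this.windowSize = windowSize;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // A window must fit entirely inside the series.
        return cursor + windowSize <= series.length;
    }

    // The workaround: next() returns a whole batch of windows, not one window.
    @Override
    public List<double[]> next() {
        if (!hasNext()) throw new NoSuchElementException();
        List<double[]> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && hasNext()) {
            double[] window = new double[windowSize];
            System.arraycopy(series, cursor, window, 0, windowSize);
            batch.add(window);
            cursor++; // slide the window by one step
        }
        return batch;
    }
}
```

Only one window's worth of data is materialized at a time, so memory stays bounded even for long series; the cost is that the batching policy is baked into the iterator instead of being controlled by the caller via next(int).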
This should work in a much more straightforward fashion: for large datasets the memory pressure is huge unless you stream data through a custom iterator, which is exactly the use case that, for instance, TensorFlow covers in Python with tf.data.Dataset.window.
Cheers and thank you!