When training a BiLSTM-based time series forecasting model using DL4J’s ParallelWrapper and a custom iterator (the same happens with standard iterators), we encounter the following runtime exception:
org.nd4j.linalg.exception.ND4JIllegalStateException: Workspace [ADSI_ITER-...]: Can't borrow from borrowed workspace
or
java.lang.IllegalStateException: Feed forward to layer (training): array (INPUT) workspace validation failed (layer 1 - layer name "layer1" - class: org.deeplearning4j.nn.layers.recurrent.BidirectionalLayer) - array is defined in incorrect workspace
DL4J Version: (1.0.0-M2.1)
ND4J Version: (1.0.0-M2.1)
Java Version: (java 21)
OS: (Linux)
Backend: CPU only; no GPU acceleration.
final MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.updater(new Adam(LEARNING_RATE))
.list()
// CONCAT mode concatenates forward and backward outputs, so nOut(64) produces 128 features for the next layer's nIn
.layer(new Bidirectional(Bidirectional.Mode.CONCAT, new LSTM.Builder()
.nIn(NUM_FEATURES)
.nOut(64)
.activation(Activation.TANH)
.l2(1e-4)
.dropOut(0.2)
.build()))
.layer(new Bidirectional(Bidirectional.Mode.CONCAT, new LSTM.Builder()
.nIn(128)
.nOut(64)
.activation(Activation.TANH)
.l2(1e-4)
.dropOut(0.2)
.build()))
.layer(new Bidirectional(Bidirectional.Mode.CONCAT, new LSTM.Builder()
.nIn(128)
.nOut(64)
.activation(Activation.TANH)
.l2(1e-4)
.dropOut(0.2)
.build()))
.layer(new GlobalPoolingLayer.Builder().poolingType(PoolingType.AVG).build())
.layer(new DenseLayer.Builder()
.nIn(128)
.nOut(64)
.activation(Activation.RELU)
.dropOut(0.2)
.l2(1e-4)
.build())
.layer(new OutputLayer.Builder(LossFunctions.LossFunction.MEAN_ABSOLUTE_ERROR)
.activation(Activation.IDENTITY)
.nIn(64)
.nOut(FORECAST_HORIZON)
.l2(1e-4)
.build())
.build();
final ParallelWrapper networkWrapper = new ParallelWrapper.Builder<MultiLayerNetwork>(this.network)
.workers(2)
.prefetchBuffer(12)
.build();
networkWrapper.fit(trainIter);
- The custom RollingWindowDataSetIterator allocates fresh INDArrays for each batch and keeps no references outside the batch scope.
- Training fails with workspace exceptions as soon as ParallelWrapper is used, even with a minimal worker count (tested with 2, 4, 8, and 12 workers).
- The issue appears only when using BiLSTM layers with ParallelWrapper.
- The problem is reproducible with both custom and standard iterators.
- No references to INDArrays or DataSets are kept outside the batch scope.
- A plain stack of unidirectional LSTM layers trains without any issues.
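For context on the iterator mentioned above: its windowing logic is conceptually equivalent to the sketch below (the class and method names here are illustrative, not the actual implementation). Each window is copied into a freshly allocated array, which is why no buffer can outlive its batch:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of rolling-window batch construction, as used by the
// custom iterator. Names are hypothetical; the real iterator wraps the
// windows in DataSet objects with ND4J INDArrays.
public class RollingWindowSketch {

    /**
     * Slices `series` into overlapping windows of length `windowSize`,
     * stepping by one element. Each window is a fresh allocation, so no
     * backing buffer is shared between batches.
     */
    public static List<double[]> rollingWindows(double[] series, int windowSize) {
        List<double[]> windows = new ArrayList<>();
        for (int start = 0; start + windowSize <= series.length; start++) {
            double[] window = new double[windowSize]; // fresh allocation per window
            System.arraycopy(series, start, window, 0, windowSize);
            windows.add(window);
        }
        return windows;
    }

    public static void main(String[] args) {
        double[] series = {1, 2, 3, 4, 5};
        List<double[]> w = rollingWindows(series, 3);
        System.out.println(w.size());    // 3 windows: [1,2,3], [2,3,4], [3,4,5]
        System.out.println(w.get(0)[0]); // 1.0
        System.out.println(w.get(2)[2]); // 5.0
    }
}
```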
I hope this is enough information to diagnose the issue.
Thank you.