Workspace Exception with ParallelWrapper and Custom Iterator #10221

@MintakaB

Description

When training a BiLSTM-based time series forecasting model using DL4J's ParallelWrapper and a custom iterator (or even standard iterators), we encounter one of the following runtime exceptions:

org.nd4j.linalg.exception.ND4JIllegalStateException: Workspace [ADSI_ITER-...]: Can't borrow from borrowed workspace
or
java.lang.IllegalStateException: Feed forward to layer (training): array (INPUT) workspace validation failed (layer 1 - layer name "layer1" - class: org.deeplearning4j.nn.layers.recurrent.BidirectionalLayer) - array is defined in incorrect workspace

DL4J Version: 1.0.0-M2.1
ND4J Version: 1.0.0-M2.1
Java Version: 21
OS: Linux
Backend: CPU only, no GPU acceleration.

final MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .updater(new Adam(LEARNING_RATE))
    .list()
    .layer(new Bidirectional(Bidirectional.Mode.CONCAT, new LSTM.Builder()
            .nIn(NUM_FEATURES)
            .nOut(64)
            .activation(Activation.TANH)
            .l2(1e-4)
            .dropOut(0.2)
            .build()))
    // nIn = 128 because Bidirectional.Mode.CONCAT doubles the previous layer's nOut (2 * 64)
    .layer(new Bidirectional(Bidirectional.Mode.CONCAT, new LSTM.Builder()
            .nIn(128)
            .nOut(64)
            .activation(Activation.TANH)
            .l2(1e-4)
            .dropOut(0.2)
            .build()))
    .layer(new Bidirectional(Bidirectional.Mode.CONCAT, new LSTM.Builder()
            .nIn(128)
            .nOut(64)
            .activation(Activation.TANH)
            .l2(1e-4)
            .dropOut(0.2)
            .build()))
    .layer(new GlobalPoolingLayer.Builder().poolingType(PoolingType.AVG).build())
    .layer(new DenseLayer.Builder()
            .nIn(128)
            .nOut(64)
            .activation(Activation.RELU)
            .dropOut(0.2)
            .l2(1e-4)
            .build())
    .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MEAN_ABSOLUTE_ERROR)
            .activation(Activation.IDENTITY)
            .nIn(64)
            .nOut(FORECAST_HORIZON)
            .l2(1e-4)
            .build())
    .build();
final ParallelWrapper networkWrapper = new ParallelWrapper.Builder<MultiLayerNetwork>(this.network)
        .workers(2)
        .prefetchBuffer(12)
        .build();

networkWrapper.fit(trainIter);
Observations:
  1. The custom RollingWindowDataSetIterator creates new INDArrays per batch and keeps no references outside the batch scope.
  2. Training fails with workspace exceptions as soon as ParallelWrapper is used, even with a minimal worker count (tested with 2, 4, 8, and 12 workers).
  3. The issue appears only when combining BiLSTM layers with ParallelWrapper.
  4. The problem is reproducible with both custom and standard iterators.
  5. No references to INDArrays or DataSets are kept outside the batch scope.
  6. A regular stack of (unidirectional) LSTM layers trains without any issues.
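For reference, one mitigation we considered (a sketch only, not a fix for the underlying bug) is disabling ND4J workspaces on the network configuration via the `trainingWorkspaceMode` / `inferenceWorkspaceMode` builder options. This trades the memory efficiency that workspaces provide for plain garbage-collected allocation, and may sidestep the workspace validation error; whether it does so with ParallelWrapper is an assumption we have not verified:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.WorkspaceMode;

// Sketch: same layer stack as above, but with workspaces disabled entirely.
final MultiLayerConfiguration confNoWorkspaces = new NeuralNetConfiguration.Builder()
        .trainingWorkspaceMode(WorkspaceMode.NONE)   // no workspaces during fit()
        .inferenceWorkspaceMode(WorkspaceMode.NONE)  // no workspaces during output()/evaluate()
        .list()
        // ... identical layer configuration as in the snippet above ...
        .build();
```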

I hope this is enough to figure out the issue.
Thank you.
