Skip to content

registerTenant deadlocks on SQL Server when called post-boot #4215

@sherter

Description

@sherter

Symptom

MultiSchemaMultiTenantProcessEngineConfiguration.registerTenant(tenantId, ds) called after buildProcessEngine() hangs indefinitely on SQL Server when both databaseSchemaUpdate=true and useLockForDatabaseSchemaUpdate=true are set. The latter is typically enabled in clustered deployments to coordinate concurrent schema updates across nodes. Reproduced on Flowable 8.0.0; SQL Server 2019/2022.

Not reproducible on MariaDB/MySQL (DDL implicitly commits, so the cycle below never forms).

Stack trace (abridged)

"register-tenant-0" WAITING — parked on socket read from SQL Server
  at org.flowable.common.engine.impl.cmd.LockCmd.execute(LockCmd.java:71)
  at org.flowable.common.engine.impl.lock.LockManagerImpl.executeCommand(LockManagerImpl.java:154)
  at org.flowable.common.engine.impl.lock.LockManagerImpl.waitForLock(LockManagerImpl.java:73)
  at org.flowable.engine.impl.db.ProcessDbSchemaManager.schemaUpdate(ProcessDbSchemaManager.java:166)
  at org.flowable.common.engine.impl.db.SchemaOperationsEngineBuild.execute(SchemaOperationsEngineBuild.java:68)
  at org.flowable.engine.impl.cfg.multitenant.ExecuteSchemaOperationCommand.execute(...)
  at ...createTenantSchema(MultiSchemaMultiTenantProcessEngineConfiguration.java:149)
  at ...registerTenant(MultiSchemaMultiTenantProcessEngineConfiguration.java:88)

Root cause

SchemaOperationsEngineBuild opens transaction T_outer on connection C_outer and runs DDL on ACT_GE_PROPERTY, taking Sch-M (held until commit). It then calls LockManagerImpl.waitForLock from inside T_outer. LockManagerImpl deliberately opens a separate transaction T_inner on a separate connection C_inner so the lock can commit independently, and runs SELECT * FROM ACT_GE_PROPERTY WHERE NAME_ = ? — which needs Sch-S, blocked by T_outer's Sch-M. T_outer cannot commit until the outer call returns, which cannot happen until the inner SELECT runs. No RDBMS deadlock detector fires because the blocking session is idle.

Confirmed via sys.dm_exec_requests: blocker sleeping, holds LCK_M_SCH_S wait on OBJECT_NAME(...) = 'ACT_GE_PROPERTY', both sessions from the same application host.

Minimal repro (sketch)

MultiSchemaMultiTenantProcessEngineConfiguration config =
        new MultiSchemaMultiTenantProcessEngineConfiguration(tenantInfoHolder);
config.setDatabaseType(DATABASE_TYPE_MSSQL);
config.setDatabaseSchemaUpdate(DB_SCHEMA_UPDATE_TRUE);
config.setUseLockForDatabaseSchemaUpdate(true);  // required to trigger the bug

ProcessEngine engine = config.buildProcessEngine();

tenantInfoHolder.addTenant("tenant");
config.registerTenant("tenant", hikariFor("my_db"));   // hangs forever

Full reproducible setup

Three files plus a single docker compose up. Tested on JDK 25 against SQL Server 2022.

docker-compose.yml

services:
  mssql:
    image: mcr.microsoft.com/mssql/server:2022-latest
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: "yourStrong(!)Password"
    ports: ["1433:1433"]

Start it and create the tenant database:

docker compose up -d
docker exec -i $(docker compose ps -q mssql) \
  /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa \
  -P 'yourStrong(!)Password' -No -Q "
    CREATE DATABASE my_db;
  "

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>repro</groupId>
  <artifactId>flowable-registertenant-deadlock</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <maven.compiler.release>17</maven.compiler.release>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.flowable</groupId>
      <artifactId>flowable-engine</artifactId>
      <version>8.0.0</version>
    </dependency>
    <dependency>
      <groupId>com.microsoft.sqlserver</groupId>
      <artifactId>mssql-jdbc</artifactId>
      <version>12.8.1.jre11</version>
    </dependency>
    <dependency>
      <groupId>com.zaxxer</groupId>
      <artifactId>HikariCP</artifactId>
      <version>6.0.0</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-simple</artifactId>
      <version>2.0.16</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.5.0</version>
        <configuration>
          <mainClass>repro.FlowableRegisterTenantDeadlockRepro</mainClass>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

src/main/java/repro/InMemoryTenantInfoHolder.java

package repro;

import org.flowable.common.engine.impl.cfg.multitenant.TenantInfoHolder;

import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

public class InMemoryTenantInfoHolder implements TenantInfoHolder {
    private final Set<String> tenants = new LinkedHashSet<>();
    private final ThreadLocal<String> current = new ThreadLocal<>();

    public void addTenant(String tenantId) { tenants.add(tenantId); }

    @Override public Collection<String> getAllTenants()        { return tenants; }
    @Override public String getCurrentTenantId()               { return current.get(); }
    @Override public void setCurrentTenantId(String tenantId)  { current.set(tenantId); }
    @Override public void clearCurrentTenantId()               { current.remove(); }
}

src/main/java/repro/FlowableRegisterTenantDeadlockRepro.java

package repro;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.flowable.engine.ProcessEngine;
import org.flowable.engine.impl.cfg.multitenant.MultiSchemaMultiTenantProcessEngineConfiguration;

import javax.sql.DataSource;
import java.util.concurrent.TimeUnit;

public class FlowableRegisterTenantDeadlockRepro {

    public static void main(String[] args) throws Exception {
        InMemoryTenantInfoHolder tenantInfoHolder = new InMemoryTenantInfoHolder();
        MultiSchemaMultiTenantProcessEngineConfiguration config =
                new MultiSchemaMultiTenantProcessEngineConfiguration(tenantInfoHolder);
        config.setDatabaseType(
                MultiSchemaMultiTenantProcessEngineConfiguration.DATABASE_TYPE_MSSQL);
        config.setDatabaseSchemaUpdate(
                MultiSchemaMultiTenantProcessEngineConfiguration.DB_SCHEMA_UPDATE_TRUE);
        config.setDisableIdmEngine(true);
        config.setDisableEventRegistry(true);
        // comment out this line to make the problem disappear
        config.setUseLockForDatabaseSchemaUpdate(true);

        System.out.println("Booting engine ...");
        ProcessEngine engine = config.buildProcessEngine();
        System.out.println("Engine booted.");

        // Post-boot registration: invokes createTenantSchema(...)
        //   -> SchemaOperationsEngineBuild
        //   -> ProcessDbSchemaManager.schemaUpdate
        //   -> LockManagerImpl.waitForLock  ← deadlocks on MSSQL
        tenantInfoHolder.addTenant("tenant");

        Thread registerThread = new Thread(() -> {
            System.out.println("Registering tenant post-boot ...");
            config.registerTenant("tenant", buildHikari("my_db"));
            System.out.println("tenant registered (this line NEVER prints on MSSQL).");
        }, "register-tenant-0");
        registerThread.start();

        registerThread.join(TimeUnit.SECONDS.toMillis(10));
        if (registerThread.isAlive()) {
            System.err.println("\nDEADLOCK CONFIRMED: tenant registration did not complete in 10s.\n");
            for (StackTraceElement e : registerThread.getStackTrace()) {
                System.err.println("  at " + e);
            }
            System.exit(1);
        }
        engine.close();
    }

    private static DataSource buildHikari(String database) {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl(
                "jdbc:sqlserver://localhost:1433;databaseName=" + database
                        + ";encrypt=false;trustServerCertificate=true");
        cfg.setUsername("sa");
        cfg.setPassword("yourStrong(!)Password");
        cfg.setMaximumPoolSize(10);
        cfg.setPoolName("DB Connection Pool " + database);
        return new HikariDataSource(cfg);
    }
}

Running it

mvn -q compile exec:java

Expected output:

Booting engine ...
Engine booted.
Registering tenant post-boot ...
DEADLOCK CONFIRMED: tenant registration did not complete in 10s.

  at sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:307)
  ...
  at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(...)
  ...
  at org.flowable.common.engine.impl.cmd.LockCmd.execute(LockCmd.java:71)
  ...
  at org.flowable.common.engine.impl.lock.LockManagerImpl.waitForLock(LockManagerImpl.java:73)
  at org.flowable.engine.impl.db.ProcessDbSchemaManager.schemaUpdate(ProcessDbSchemaManager.java:166)
  ...
  at MultiSchemaMultiTenantProcessEngineConfiguration.registerTenant(...)

Commenting out config.setUseLockForDatabaseSchemaUpdate(true); makes the problem disappear. Swapping the JDBC URL and DATABASE_TYPE_MSSQL for a MariaDB/MySQL instance also completes successfully even with the flag enabled.

Environment

Flowable 8.0.0 · JDK 25 · HikariCP 6.0.0 · mssql-jdbc 12.8.1 · SQL Server 2022

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions