Skip to content

Core: InMemoryLockManager can shut down the shared scheduler while another manager is still active #15861

@manuzhang

Description

@manuzhang

Apache Iceberg version

main (development)

Query engine

Flink

Please describe the bug 🐞

As reported in https://github.com/manuzhang/iceberg/actions/runs/23838360618/job/69487479825, LockManagers.InMemoryLockManager uses a JVM-wide shared scheduler in BaseLockManager, but each lock manager instance calls close() independently. In tests or other multi-catalog JVM usage, one in-memory lock manager can shut down the shared ScheduledThreadPoolExecutor while another live manager still needs it for heartbeats.

This causes lock acquisition to fail with RejectedExecutionException when the remaining manager tries to schedule a heartbeat.

TestFlinkTableSink > testIcebergSinkDifferentDAG() > catalogName=testhadoop_basenamespace, baseNamespace=l0.l1, format=PARQUET, isStreaming=false, useV2Sink=true FAILED
    java.lang.RuntimeException: Failed to collect table result
        at org.apache.iceberg.flink.TestBase.sql(TestBase.java:105)
        at org.apache.iceberg.flink.TestFlinkTableSink.testIcebergSinkDifferentDAG(TestFlinkTableSink.java:304)

        Caused by:
        org.apache.flink.table.api.TableException: Failed to wait job finish
            at app//org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:85)
            at app//org.apache.flink.table.api.internal.InsertResultProvider.access$200(InsertResultProvider.java:37)
            at app//org.apache.flink.table.api.internal.InsertResultProvider$Iterator.hasNext(InsertResultProvider.java:112)
            at app//org.apache.iceberg.relocated.com.google.common.collect.Iterators.addAll(Iterators.java:369)
            at app//org.apache.iceberg.relocated.com.google.common.collect.Lists.newArrayList(Lists.java:151)
            at app//org.apache.iceberg.flink.TestBase.sql(TestBase.java:103)
            ... 1 more

            Caused by:
            java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
                at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
                at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
                at org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83)
                ... 6 more

                Caused by:
                org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
                ...
                        Caused by:
                        java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@747eadd9[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@4aecd5eb[Wrapped task = org.apache.iceberg.util.LockManagers$InMemoryLockManager$$Lambda$1289/0x00007ff730ab1630@425d9b97]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@3ad5a791[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
                            at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2065)
                            at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:833)
                            at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340)
                            at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.scheduleAtFixedRate(ScheduledThreadPoolExecutor.java:632)
                            at java.base/java.util.concurrent.Executors$DelegatedScheduledExecutorService.scheduleAtFixedRate(Executors.java:819)
                            at org.apache.iceberg.util.LockManagers$InMemoryLockManager.acquireOnce(LockManagers.java:220)
                            at org.apache.iceberg.util.LockManagers$InMemoryLockManager.lambda$acquire$1(LockManagers.java:249)

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions