[Bug] Deadlocks when instrumenting tracing #5807

@lisasgoh

Describe the bug

When Firecracker is instrumented for tracing with the default clippy-tracing command, there are at least two sources of deadlock.

  1. Firecracker process hangs while starting up
    a. main_exec() in main.rs → LOGGER.update(config)
    b. Logger::update() in logging.rs → acquires LOGGER mutex → calls open_file_nonblock() when log-path is configured.
    c. open_file_nonblock() is instrumented → __Instrument::new() → log::trace!()
    d. Logger::log() tries to acquire the LOGGER mutex, which Logger::update() still holds on the same thread → deadlock

  2. When resuming from snapshot:
    a. Main thread calls resume_vm() → send_event() → sends Resume on channel, sets immediate_exit = 1 → sends RT signal to fc_vcpu
    b. fc_vcpu is in paused(), wakes up from recv(), receives Resume, checks immediate_exit = 1 and calls warn!() → Logger::log() → holds LOGGER mutex.
    c. RT signal arrives and interrupts fc_vcpu while it still holds the LOGGER mutex. The signal handler handle_signal is instrumented, so it tries to acquire the LOGGER mutex on the same thread → deadlock.
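
Both scenarios reduce to the same pattern: a non-reentrant mutex is requested a second time on a thread that already holds it. Below is a minimal sketch of scenario 1 using only the standard library. The `LOGGER` static, `log()`, `update()` and `open_file_nonblock_instrumented()` are hypothetical stand-ins, not Firecracker's actual code, and the nested call uses `try_lock()` so the sketch reports the conflict instead of hanging.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the global logger state (not Firecracker's type).
static LOGGER: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// Stand-in for Logger::log(): it needs the LOGGER mutex to record a message.
/// try_lock() is used so the sketch returns `false` where real code, calling
/// lock(), would block forever.
fn log(msg: &str) -> bool {
    match LOGGER.try_lock() {
        Ok(mut sink) => {
            sink.push(msg.to_string());
            true
        }
        Err(_) => false,
    }
}

/// Stand-in for the instrumented open_file_nonblock(): function entry emits
/// a trace message through the logger.
fn open_file_nonblock_instrumented() -> bool {
    log("enter open_file_nonblock")
}

/// Stand-in for Logger::update(): holds the mutex while calling the helper.
fn update() -> bool {
    let _guard = LOGGER.lock().unwrap();
    open_file_nonblock_instrumented()
}

fn main() {
    assert!(log("before update")); // lock is free, message is recorded
    assert!(!update()); // nested log() cannot take the lock
    println!("nested log call could not acquire LOGGER");
}
```

Scenario 2 has the same shape, except that the second acquisition comes from a signal handler that interrupts the thread in the middle of `log()`.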

To Reproduce

As above.

Expected behaviour

No deadlocks. As a workaround, I had to exclude utils/ and vcpu.rs from the tracing instrumentation.
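
One possible way to avoid the hang without excluding files is a per-thread re-entrancy guard that drops nested log calls. This is a rough sketch only, not a proposed patch: every name in it (`IN_LOGGER`, `with_logger`, `ResetFlag`, `logged`) is hypothetical, and it does not address async-signal-safety for the signal handler case.

```rust
use std::cell::Cell;
use std::sync::Mutex;

// Hypothetical stand-in for the global logger state.
static LOGGER: Mutex<Vec<String>> = Mutex::new(Vec::new());

thread_local! {
    // True while the current thread is inside the logger.
    static IN_LOGGER: Cell<bool> = Cell::new(false);
}

// Clears the per-thread flag when the outermost logger call finishes.
struct ResetFlag;

impl Drop for ResetFlag {
    fn drop(&mut self) {
        IN_LOGGER.with(|flag| flag.set(false));
    }
}

/// Runs `f` with the logger locked, or returns None if this thread is
/// already inside the logger (a nested call that would otherwise deadlock).
fn with_logger<R>(f: impl FnOnce(&mut Vec<String>) -> R) -> Option<R> {
    if IN_LOGGER.with(|flag| flag.replace(true)) {
        return None;
    }
    let _reset = ResetFlag;
    let mut sink = LOGGER.lock().unwrap();
    Some(f(&mut sink))
}

/// Stand-in for Logger::log(); returns false when the message was dropped.
fn log(msg: &str) -> bool {
    with_logger(|sink| sink.push(msg.to_string())).is_some()
}

/// Stand-in for Logger::update() calling an instrumented helper.
fn update() -> bool {
    with_logger(|_sink| log("enter open_file_nonblock")).unwrap_or(false)
}

/// Number of messages recorded so far.
fn logged() -> usize {
    LOGGER.lock().unwrap().len()
}

fn main() {
    let before = logged();
    assert!(log("first")); // recorded
    assert!(!update()); // nested message dropped, no hang
    assert!(log("second")); // logger usable again afterwards
    assert_eq!(logged(), before + 2);
}
```

The trade-off is that trace messages emitted from inside the logger are silently lost, which may or may not be acceptable for tracing output.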

Environment

  • Firecracker version: 1.15.0
  • Host and guest kernel versions:
  • Rootfs used:
  • Architecture:
  • Any other relevant software versions:

Checks

  • Have you searched the Firecracker Issues database for similar problems?
  • Have you read the existing relevant Firecracker documentation?
  • Are you certain the bug being reported is a Firecracker issue?

Metadata

Labels

  • Good first issue: Indicates a good issue for first-time contributors
  • Status: Parked: Indicates that an issue or pull request will be revisited later
