Fix block device resize disk #7948
Force-pushed ecbc12e to 51a33af
let nsectors = new_size / SECTOR_SIZE;
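For context, the quoted line converts the requested byte size into virtio-blk's 512-byte sector count. A minimal sketch of that conversion (the constant and function name here are illustrative, not taken from the actual diff; the 512-byte sector size follows the virtio spec):

```rust
// virtio-blk reports capacity in 512-byte sectors (per the virtio spec).
// Names here are illustrative, not the actual cloud-hypervisor code.
const SECTOR_SIZE: u64 = 512;

/// Convert a requested disk size in bytes to a sector count,
/// rejecting sizes that are not sector-aligned.
fn capacity_sectors(new_size: u64) -> Result<u64, String> {
    if new_size % SECTOR_SIZE != 0 {
        return Err(format!(
            "size {new_size} is not a multiple of {SECTOR_SIZE}"
        ));
    }
    Ok(new_size / SECTOR_SIZE)
}
```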
self.common.pause().map_err(Error::PauseVcpus)?;
Don't we need this for proper synchronization during disk-resize?
Also, PauseVcpus is wrong here (a mistake I made). We should correct this and make it PauseVirtioThreads or similar.
Pause and resume actually caused a deadlock of the cloud-hypervisor process and froze the whole VM indefinitely.
This should not happen. Does it only deadlock when you use block devices? For me, using a raw file with the original code works perfectly fine.
It does not deadlock when using a block device on the host with "fast" I/O (for example losetup).
It does deadlock when using a host block device with "slow", network-bound I/O (for example Ceph, via an rbd-mapped block device). Both scenarios happen 100% of the time when testing manually.
Removing the pause/resume came out of experimentation; it turned out to fix this issue and works well.
If the VM was paused, and we can ensure that it stays paused during the resize call, I would suggest that we only then skip pausing/resuming the virtio-queues. @phip1611 @vincent-thomas
@phip1611 you are mixing up vCPU pausing and virtqueue pausing ;)
@tpressure @phip1611 Have a look at my recently updated diff; it has been changed to fix the root cause.
> only then skip pausing/resuming the virtio-queues.
I agree - looks like @vincent-thomas did the right thing now. LGTM at first glance.
I'll ask for another review since it sounds like this conversation has cleared up the maintainers' worries.
Force-pushed bbc9d98 to 603b73b
Please fix CI.
Force-pushed 603b73b to 75256f5
Sorry about that, should be okay now.
Please run … The commit message check can be ignored (optional CI step). You could properly fix it by using the scheme: …
);
self.paused.store(true, Ordering::SeqCst);
// If already paused, return early to avoid deadlock waiting on barrier
This situation occurs when the VMM thread holds a device mutex while
calling an operation that triggers pause(), and a vCPU thread
simultaneously needs that same mutex for MMIO access. With slow I/O
backends (like RBD/Ceph), the timing window for this race is larger,
making the deadlock more likely to occur.
Ah, wait. Doesn't this open another race window? Another thread may unpause this too early? Or are we protected by mutexes?
Maybe this is another place where we need a three-way handshake with an atomic:
RUNNING -> PAUSING -> PAUSED
It might not be strictly needed in this case since we don't really do anything after.
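A minimal sketch of the three-way handshake suggested here (the state names and types are illustrative, not the actual cloud-hypervisor implementation): only the caller that wins the RUNNING -> PAUSING transition parks the workers, so `pause()` becomes both idempotent and safe against the race of a second caller arriving mid-pause.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Illustrative states; not the actual cloud-hypervisor code.
const RUNNING: u8 = 0;
const PAUSING: u8 = 1;
const PAUSED: u8 = 2;

struct Device {
    state: AtomicU8,
}

impl Device {
    fn new() -> Self {
        Self { state: AtomicU8::new(RUNNING) }
    }

    /// Idempotent pause: only the caller that wins the
    /// RUNNING -> PAUSING transition signals the worker threads;
    /// everyone else returns early instead of waiting on a barrier
    /// that the already-parked workers will never reach again.
    fn pause(&self) -> Result<(), &'static str> {
        match self.state.compare_exchange(
            RUNNING, PAUSING, Ordering::SeqCst, Ordering::SeqCst,
        ) {
            Ok(_) => {
                // ... signal worker threads and wait on the barrier here ...
                self.state.store(PAUSED, Ordering::SeqCst);
                Ok(())
            }
            // Already PAUSING or PAUSED: nothing to do.
            Err(_) => Ok(()),
        }
    }

    fn resume(&self) {
        self.state.store(RUNNING, Ordering::SeqCst);
    }
}
```

Because the transition is a single compare-exchange, a concurrent unpause cannot slip in between the "check" and the "set" the way it could with a plain boolean load followed by a store.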
Force-pushed 1bd5f9d to 5bace4c
rbradford left a comment:
Your race commit fix deserves its own PR as I think it is independent of the resize fix?
Yeah, sure, but if the resize fix lands and the race commit doesn't, then CH will deadlock on any resize-disk API call where: …
The ordering of the commits in the PR is probably wrong then - but I asked you to split the race fix out as it's an obviously good fix.
Previously, calling pause() when already paused would wait on a barrier for worker threads that were already parked, causing a deadlock.

This situation occurs when the VMM thread holds a device mutex while calling an operation that triggers pause(), and a vCPU thread simultaneously needs that same mutex for MMIO access. With slow I/O backends (like RBD/Ceph), the timing window for this race is larger, making the deadlock more likely to occur, see [0].

Make pause() idempotent by checking the paused state atomically and returning early if already paused, avoiding the barrier wait.

[0] #7948 (comment)

Signed-off-by: Vincent Thomas <vincent@v-thomas.com>
Force-pushed 5bace4c to c2ec811
I flipped the order of your commits.
Block devices (LVM volumes, loop devices, RBD, etc.) cannot be resized via ftruncate - they are resized externally. When vm.resize-disk is called for a block device backend, verify the device size matches the requested size instead of attempting ftruncate.

This enables the resize-disk API to work with block device backends by validating the externally-resized device matches the expected size.

Signed-off-by: Vincent Thomas <vincent@v-thomas.com>
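The behaviour described in this commit message could be sketched as follows (the function name and error handling are hypothetical, not the actual cloud-hypervisor code): regular files are grown with ftruncate (`set_len`), while block devices, which are resized externally, are only validated against the requested size.

```rust
use std::fs::File;
use std::io::{Seek, SeekFrom};
use std::os::unix::fs::FileTypeExt;

/// Hypothetical sketch of the resize-disk backend logic.
fn resize_disk_backend(file: &mut File, new_size: u64) -> Result<(), String> {
    let meta = file.metadata().map_err(|e| e.to_string())?;
    if meta.file_type().is_block_device() {
        // Block devices (LVM, loop, RBD, ...) are resized externally;
        // verify the device already has the requested size. Seeking to
        // the end is a portable way to query a block device's size.
        let dev_size = file
            .seek(SeekFrom::End(0))
            .map_err(|e| e.to_string())?;
        if dev_size != new_size {
            return Err(format!(
                "block device is {dev_size} bytes, expected {new_size}"
            ));
        }
        Ok(())
    } else {
        // Regular files can simply be truncated/extended.
        file.set_len(new_size).map_err(|e| e.to_string())
    }
}
```

For a block device the check is read-only: the actual growth happens on the host (e.g. `lvextend`, `losetup --set-capacity`, or `rbd resize`), and the API call only confirms the guest-visible size matches what was requested.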
Force-pushed c2ec811 to e91273d
Sorry about this, I thought I had run it a couple of times... Should be fixed now.
As it currently stands, CH doesn't support resizing disks of VMs whose storage is backed by a host block device. This PR adds that support and prevents (but doesn't fix) a deadlock of the CH process (which in turn freezes the VM's vCPUs) from happening when doing so.
Previous discussion and further details: #7923
Fixes #7923