Fix block device resize disk #7948
vincent-thomas wants to merge 2 commits into cloud-hypervisor:main
Block devices (LVM volumes, loop devices, etc.) cannot be resized via ftruncate; they are resized externally. When vm.resize-disk is called for a block device backend, skip the ftruncate call and verify the device is accessible via the BLKGETSIZE64 ioctl. Re-query the actual size after resize to ensure the guest receives the correct capacity for externally resized block devices.
Signed-off-by: Vincent Thomas <vincent@v-thomas.com>
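The behaviour described in this commit message can be sketched roughly as follows. This is not the code from the PR: `resize_backend` is a hypothetical helper, and instead of the BLKGETSIZE64 ioctl (which would need the `libc` crate) the sketch re-queries the size by seeking to the end of the file, which on Linux should also report the size of a block device.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom};
use std::os::unix::fs::FileTypeExt;

/// Hypothetical helper (not from the PR): grow or shrink a regular file,
/// but leave block devices alone, then report the size actually observed.
fn resize_backend(file: &mut File, new_size: u64) -> io::Result<u64> {
    let is_block_device = file.metadata()?.file_type().is_block_device();

    if !is_block_device {
        // Regular files can be resized in place; `set_len` maps to ftruncate.
        file.set_len(new_size)?;
    }

    // Re-query the real size. Assumption: seeking to the end is used here as
    // a stand-in for the BLKGETSIZE64 ioctl mentioned in the commit message.
    // Note that this moves the file cursor to the end of the file.
    file.seek(SeekFrom::End(0))
}
```

For a regular file opened read-write, the returned value equals `new_size`; for a block device it is whatever size the host currently reports, which the caller can then compare against the requested size.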
ecbc12e to 51a33af
    let nsectors = new_size / SECTOR_SIZE;
    self.common.pause().map_err(Error::PauseVcpus)?;
don't we need this for proper synchronization during disk-resize?
Also, PauseVcpus is wrong here (a mistake I made). We should correct this and make it PauseVirtioThreads or similar.
pause and resume actually caused a deadlock of the cloud-hypervisor process and froze the whole VM indefinitely.
this should not happen. Does it only deadlock when you use block devices? For me, using a raw file with the original code works perfectly fine.
It does not deadlock when using a host block device with "fast" I/O (for example losetup).
It does deadlock when using a host block device with "slow", network-bound I/O (for example ceph; rbd mapped block device). Both outcomes are reproducible 100% of the time in manual testing.
The removal of the pause/resume came from experimentation, which turned out to fix this issue and work well.
If the VM was paused, and we can ensure that it stays paused during the resize call, I would suggest that we only then skip pausing/resuming the virtio-queues. @phip1611 @vincent-thomas
@phip1611 you mix up vcpu pausing and virtqueue pausing ;)
@tpressure @phip1611 Have a look at my recently updated diff; it has been changed to fix the root cause.
> only then skip pausing/resuming the virtio-queues.
I agree. It looks like @vincent-thomas did the right thing now. LGTM at first glance.
51a33af to 26e3ab4
Previously, calling pause when already paused would wait for threads that were already parked, which caused a deadlock; see cloud-hypervisor#7948 (comment). The pause endpoint now checks whether it is already paused. If so, it returns success instead of continuing, which prevents the deadlock.
This commit also adds correct host block device querying to '/vm.resize-disk'. Because the content and mutability of a block device are essentially unknown, CH cannot resize it itself. When a VM's disk is backed by a host block device, the resize-disk endpoint only succeeds (and sends a config interrupt) if the host block device size matches the "new_size" given to the endpoint.
Signed-off-by: Vincent Thomas <vincent@v-thomas.com>
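The "return success if already paused" guard from this commit message might look roughly like the sketch below. Everything here is hypothetical and only illustrates the idea: `PausableDevice`, its fields, and the `handshakes` counter are made-up names, and the real parking of worker threads is replaced by a counter so the effect of the guard can be observed.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical stand-in for a device whose worker threads can be paused.
struct PausableDevice {
    paused: AtomicBool,
    /// Counts how often the blocking "wait for workers to park" step ran.
    handshakes: AtomicUsize,
}

impl PausableDevice {
    fn new() -> Self {
        Self {
            paused: AtomicBool::new(false),
            handshakes: AtomicUsize::new(0),
        }
    }

    /// Idempotent pause: a second call while paused returns success at once
    /// instead of waiting again for threads that are already parked.
    fn pause(&self) -> Result<(), String> {
        // `swap` returns the previous value, so `true` means "already paused".
        if self.paused.swap(true, Ordering::SeqCst) {
            return Ok(());
        }
        // Placeholder for the real, potentially blocking handshake.
        self.handshakes.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }

    /// Resume is guarded the same way, so a stray resume is harmless.
    fn resume(&self) -> Result<(), String> {
        if !self.paused.swap(false, Ordering::SeqCst) {
            return Ok(());
        }
        // Placeholder: unpark the worker threads here.
        Ok(())
    }
}
```

Calling `pause()` twice in a row runs the handshake only once, which is the property that avoids waiting on already-parked threads.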
26e3ab4 to bbc9d98
bbc9d98 to 603b73b
Currently, CH does not support resizing disks of VMs whose storage is backed by a host block device. This PR adds that support and prevents (but doesn't fix) a deadlock of the CH process (which in turn freezes the VM's vCPUs) that could occur when doing so.
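As a rough illustration of the block-device rule from the commit message (the resize only succeeds when the host device already has the requested size), consider the sketch below. The function name and error type are hypothetical, and `SECTOR_SIZE = 512` is an assumption based on virtio-blk reporting capacity in 512-byte sectors; the PR's actual constant is not shown on this page.

```rust
/// Assumption: virtio-blk capacity is expressed in 512-byte sectors.
const SECTOR_SIZE: u64 = 512;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ResizeError {
    SizeMismatch { requested: u64, actual: u64 },
}

/// Hypothetical check: for a host block device, CH cannot change the size
/// itself, so the request is accepted only if the device was already resized
/// externally to exactly `new_size`. Returns the new capacity in sectors.
fn check_block_device_resize(actual_size: u64, new_size: u64) -> Result<u64, ResizeError> {
    if actual_size != new_size {
        return Err(ResizeError::SizeMismatch {
            requested: new_size,
            actual: actual_size,
        });
    }
    // Same computation as `let nsectors = new_size / SECTOR_SIZE;` in the diff.
    Ok(new_size / SECTOR_SIZE)
}
```

With these assumptions, a device that the host reports as 1 GiB accepts a 1 GiB request and yields 2097152 sectors, while a 2 GiB request against the same device is rejected with `SizeMismatch`.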
Previous discussion and further details: #7923
Fixes #7923