
Vhost-guest (was virtio-vhost-user) device#7695

Draft
DemiMarie wants to merge 16 commits into cloud-hypervisor:main from DemiMarie:virtio-device-backend

Conversation

@DemiMarie
Contributor

@DemiMarie DemiMarie commented Feb 12, 2026

Implement a vhost-guest device, formerly known as virtio-vhost-user. A vhost-guest device is a vhost-user server that forwards all messages to the guest.

Vhost-guest requires additional features not used by other virtio devices. Therefore, this PR is based on top of #7841. It is also very complex, as shown by its rather large size. To mitigate the complexity, most of the code is in a separate crate that does not depend on Cloud Hypervisor internals. This will allow it to be used by other VMMs (like crosvm), and (more importantly) will allow it to be unit tested without having to create a VM. I expect the number of unit tests needed to be significant.

A full-featured vhost-guest device also needs a fix to #7839. Securely hot-unplugging a vhost-guest device requires a fix to #7955. While the current code is not yet testable, I hope to have a testable version fairly soon after fixing #7955. It will require changes to Cloud Hypervisor core code to implement notifications.

Right now, the only vhost-guest driver I am aware of is an old version of crosvm. Crosvm uses VFIO to drive the device, which requires allow_unsafe_interrupts. I plan to upstream a Linux kernel driver that will provide a vhost-msg compatible API. There is corresponding work in QEMU to implement support for using such a driver, which will allow any QEMU-supported device to be used.

This implementation is optimized for performance at the expense of simplicity. Frontend-provided vring call and error eventfds are registered with the ioeventfd mechanism, and frontend-provided vring kick eventfds are registered as irqfds. The goal is to avoid Cloud Hypervisor being involved in any hot paths. For instance, this might allow a future version of DPDK and/or VPP to provide high-performance network services to other VMs.

This implementation is designed to be secure against both malicious guests and malicious vhost-user clients. I assumed that vhost-user clients are likely in separate sandboxes from Cloud Hypervisor, and therefore cannot compromise it directly without a separate vulnerability. I consider it to be a security vulnerability in this code if a malicious guest or malicious vhost-user client can compromise it, even if the two must coordinate with each other to perform the attack.

This implementation uses a substantial amount of unsafe code. Some of it is taken from the vhost-user crate, which is not designed for non-AF_UNIX transports. I have encapsulated this code to the extent possible, but very careful review would be greatly appreciated.

The protocol library (under virtio-vhost-user/) is ready to review. I expect the main future change to be the addition of lots of unit tests. The device implementation (in virtio-devices/src/virtio_vhost_user.rs) is incomplete and not hooked up to the rest of Cloud Hypervisor. While review would be appreciated, it is less important at this time.

Comment thread on virtio-devices/src/transport/pci_device.rs (outdated)
@DemiMarie
Contributor Author

DemiMarie commented Feb 22, 2026

I’m now running into a serious bottleneck. The version of the virtio vhost-user device backend spec I am implementing requires support for additional interrupts beyond what virtio itself needs. Specifically, a vhost-user device backend (VDB) device itself will only have one queue. However, implementing an inner device with N queues needs N MSI-X interrupts, N externally-provided irqfds, and 2N + 1 externally-provided ioeventfds.

This is unique to the VDB device, and is a result of VDB being optimized for performance: it aims to avoid the userspace VMMs being involved in interrupt handling at runtime. I don’t see a way to avoid this without a substantial performance loss. For the desktop workloads @alyssais and I care about, this might be acceptable, but I doubt cloud workloads would be okay with it.

How should these be wired up to the rest of Cloud Hypervisor? Should I resort to slow userspace emulation? I could resort to handling interrupts in-band via control messages on the admin queue, which would work but not be performant. That’s fine for a proof of concept, but fixing it is a backwards-incompatible change (from the backend VM’s perspective) and so I want to fix it before writing and submitting the kernel driver.

Once that is done, the rest of the protocol should hopefully be straightforward to implement. Most messages require no involvement from Cloud Hypervisor beyond checking that their message types are known and the framing is correct. Protocol feature negotiation does need Cloud Hypervisor involvement, but only to ensure that no messages with unknown types are generated by the frontend or backend.
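The resource accounting described above can be sketched as a small helper. This is illustrative only (the names are not Cloud Hypervisor API): it just encodes the N / N / 2N + 1 relationship between inner-device queues and the interrupts and eventfds a VDB device must reserve.

```rust
/// Resources a VDB device must reserve for an inner device with N
/// queues, per the accounting above: N MSI-X vectors, N
/// externally-provided irqfds, and 2N + 1 externally-provided
/// ioeventfds. Names are illustrative, not Cloud Hypervisor API.
#[derive(Debug, PartialEq, Eq)]
struct VdbResources {
    msix_vectors: usize,
    irqfds: usize,
    ioeventfds: usize,
}

fn vdb_resources(inner_queues: usize) -> VdbResources {
    VdbResources {
        msix_vectors: inner_queues,       // one vector per inner queue
        irqfds: inner_queues,             // one irqfd per inner queue
        ioeventfds: 2 * inner_queues + 1, // two per queue, plus one extra
    }
}

fn main() {
    // A 4-queue inner device: 4 vectors, 4 irqfds, 9 ioeventfds.
    let r = vdb_resources(4);
    println!("{r:?}");
}
```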

@DemiMarie DemiMarie force-pushed the virtio-device-backend branch from 187e228 to 291fa79 Compare February 24, 2026 21:36
@DemiMarie DemiMarie force-pushed the virtio-device-backend branch 8 times, most recently from cf2d063 to 63d6c83 Compare March 10, 2026 09:58
@DemiMarie DemiMarie force-pushed the virtio-device-backend branch from 63d6c83 to bc2de9d Compare March 24, 2026 13:56
@phip1611
Member

phip1611 commented Apr 1, 2026

What's the status here? The PR description is empty.

@DemiMarie DemiMarie force-pushed the virtio-device-backend branch 2 times, most recently from 995d721 to 7195efe Compare April 1, 2026 16:07
@DemiMarie
Contributor Author

> What's the status here? The PR description is empty.

This device is still incomplete. It has not been tested at all, which is not great. It's not even in a state where it can be tested. For instance, reading configuration space hits a todo!(), as does device reset. The device is also not hooked up to the device manager, so it can't be used.

What is largely implemented is processing incoming and outgoing vhost-user messages. The existing vhost-user crate makes too many assumptions that don't work for this code, so it needs to reimplement parsing itself. The result is that this device is quite complex. It also uses core code that is currently not used and which I implemented specifically for this device. Some of it (#7841) is not even merged yet.

@DemiMarie DemiMarie force-pushed the virtio-device-backend branch from 7195efe to e3a5270 Compare April 1, 2026 16:12
@rbradford
Member

I think if there is no intention to land some code in this PR, it would be better to just reference a branch from an issue. Having too many open PRs creates noise and distractions.

@DemiMarie
Contributor Author

> I think if there is no intention to land some code in this PR, it would be better to just reference a branch from an issue. Having too many open PRs creates noise and distractions.

I do intend to land this code. It’s not ready yet, though, which is why it is marked as draft.

At this point, I expect that most work will be fixing bugs, adding the device to the rest of the system, and testing it. In particular, I’d like feedback on the code structure, as I don’t intend to change that unless required.

@DemiMarie DemiMarie force-pushed the virtio-device-backend branch 8 times, most recently from 22e3be5 to 7e4c22a Compare April 3, 2026 03:57
@DemiMarie DemiMarie force-pushed the virtio-device-backend branch 6 times, most recently from 668df93 to a864563 Compare April 3, 2026 06:12
@alyssais
Member

alyssais commented Apr 7, 2026

> In particular, I’d like feedback on the code structure, as I don’t intend to change that unless required.

It might help if the PR had a proper title and description!

@DemiMarie DemiMarie changed the title WIP virtio device backend Vhost-guest (was virtio-vhost-user) device Apr 7, 2026
virtio-vhost-user requires PCI doorbells, which are used to
notify the frontend of events.  Add support for them.  Each doorbell
requires an MSI-X interrupt, so account for this.

The specification for this capability is
<https://stefanha.github.io/virtio/vhost-user-slave.html>.

No functional change intended as the code is currently not used.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Virtio-vhost-user devices need one interrupt per queue, plus one
interrupt per config change, plus one interrupt per queue of the
*device they are implementing*.  Allow devices to override the number of
interrupts they need instead of assuming it is always one more than the
number of queues.  No functional change for existing devices, which use
a default implementation that returns the number of queues.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
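The override described in this commit message can be sketched as a trait with a default method. This is a hedged illustration, not the actual Cloud Hypervisor trait: existing devices keep the default (one interrupt per queue), while a virtio-vhost-user device overrides it to add one interrupt per queue of the inner device it implements.

```rust
/// Illustrative sketch (not the real Cloud Hypervisor trait): devices
/// report how many queue interrupts they need; the default matches
/// existing devices.
trait VirtioDeviceInterrupts {
    fn num_queues(&self) -> usize;

    /// Default: one interrupt per queue. The config-change interrupt
    /// is assumed to be accounted for by the transport.
    fn num_interrupts(&self) -> usize {
        self.num_queues()
    }
}

struct PlainDevice {
    queues: usize,
}
impl VirtioDeviceInterrupts for PlainDevice {
    fn num_queues(&self) -> usize {
        self.queues
    }
}

struct VhostGuestDevice {
    queues: usize,
    inner_queues: usize, // queues of the device being implemented
}
impl VirtioDeviceInterrupts for VhostGuestDevice {
    fn num_queues(&self) -> usize {
        self.queues
    }
    fn num_interrupts(&self) -> usize {
        // Own queues plus one per queue of the inner device.
        self.queues + self.inner_queues
    }
}

fn main() {
    assert_eq!(PlainDevice { queues: 2 }.num_interrupts(), 2);
    assert_eq!(
        VhostGuestDevice { queues: 1, inner_queues: 4 }.num_interrupts(),
        5
    );
    println!("ok");
}
```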
It will always be 0.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
No functional change intended.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
@DemiMarie DemiMarie force-pushed the virtio-device-backend branch from 787c6c2 to 95ee5be Compare April 12, 2026 10:21
The code will soon need to iterate over ioeventfds provided by the inner
virtio device.  Accessing that device requires a lock, which works fine
with internal iteration but breaks with external iteration.  While an
Arc could be used to avoid the need for the callback, implementations of
this method will need to grab a device-internal lock and invoke the
callback with the lock held.  Returning a MutexGuard would constrain the
device implementation unnecessarily.

This also changes VirtioTransport::ioeventfds() to take *two* addresses,
not one.  The callback also takes two addresses.  This is because
relocating a BAR must happen atomically with respect to registering and
unregistering ioeventfds by the device.  If the device manager
used one call to unregister ioeventfds and a second call to reregister
them, a device implementation could register ioeventfds between the two
calls.  Such ioeventfds would not be unregistered, potentially causing
errors during reregistration.  Using a single call for both
unregistration and reregistration allows the device implementation to
hold a lock across both operations.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
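The internal-iteration shape this commit message describes might look like the following sketch (names are illustrative, not the actual `VirtioTransport` API): the transport invokes the callback once per ioeventfd with its old and new address while holding a device-internal lock, so unregistration and reregistration are atomic with respect to the device registering new ioeventfds.

```rust
use std::sync::Mutex;

/// Stand-in for a real eventfd plus its offset within the BAR.
struct Ioeventfd {
    offset: u64,
}

struct DeviceState {
    extra_ioeventfds: Vec<Ioeventfd>,
}

struct Transport {
    state: Mutex<DeviceState>,
}

impl Transport {
    /// Invoke `f` once per ioeventfd with (old_addr, new_addr).
    /// Internal iteration lets the lock be held across every call, so
    /// no ioeventfd can be registered between the unregistration at
    /// the old address and the reregistration at the new one.
    fn ioeventfds(&self, old_base: u64, new_base: u64, f: &mut dyn FnMut(u64, u64)) {
        let state = self.state.lock().unwrap();
        for e in &state.extra_ioeventfds {
            f(old_base + e.offset, new_base + e.offset);
        }
        // The lock drops only after all callbacks have run.
    }
}

fn main() {
    let t = Transport {
        state: Mutex::new(DeviceState {
            extra_ioeventfds: vec![Ioeventfd { offset: 0 }, Ioeventfd { offset: 4 }],
        }),
    };
    let mut moves = Vec::new();
    t.ioeventfds(0x1000, 0x2000, &mut |old, new| moves.push((old, new)));
    assert_eq!(moves, [(0x1000, 0x2000), (0x1004, 0x2004)]);
    println!("{moves:?}");
}
```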
The device manager calls VirtioTransport::ioeventfds whenever it
needs to unregister or register ioeventfds.  This function invokes a
callback with each virtqueue ioeventfd, its old base address, and its
new base address.  Also invoke the callback with ioeventfds provided by
the device itself.

No functional change as no current device provides any.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Devices that register their own ioeventfds need to know the address they
should register them at.  It is not possible for this to be a callback
on PciDevice because PciDevice contains the VirtioDevice.

Calls to the callback after activation are ignored.  Such calls are
always preceded by a call to VirtioPciDevice::ioeventfds(), which
already provides the new address.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Map memory from externally-provided FDs into the guest address space.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Virtio-vhost-user uses four virtqueues.  These are grouped into two
pairs of two queues each.  In each pair, one queue is used for requests
and the other is used for replies.  Each pair is also associated with an
AF_UNIX stream socket used for communication with the frontend.

Implement a queue pair abstraction that handles message framing and
communication with the VM.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
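The message framing the queue pairs must handle follows the vhost-user wire format: a 12-byte little-endian header (request type, flags, payload size) followed by the payload. A minimal sketch of defensive header parsing, assuming only the standard header layout (this is not the PR's actual parser):

```rust
/// vhost-user message header: request type, flags, payload size,
/// all little-endian u32 (12 bytes total).
#[derive(Debug, PartialEq, Eq)]
struct Header {
    request: u32,
    flags: u32,
    size: u32,
}

const HDR_LEN: usize = 12;

/// Parse a header from `buf`, rejecting short or inconsistent frames
/// so that a malicious peer cannot cause an out-of-bounds read.
fn parse_header(buf: &[u8]) -> Option<Header> {
    if buf.len() < HDR_LEN {
        return None;
    }
    let u32le = |b: &[u8]| u32::from_le_bytes(b.try_into().unwrap());
    let hdr = Header {
        request: u32le(&buf[0..4]),
        flags: u32le(&buf[4..8]),
        size: u32le(&buf[8..12]),
    };
    // The declared payload must fit in the bytes actually received.
    if buf.len() - HDR_LEN < hdr.size as usize {
        return None;
    }
    Some(hdr)
}

fn main() {
    let mut frame: Vec<u8> = Vec::new();
    frame.extend_from_slice(&1u32.to_le_bytes()); // request
    frame.extend_from_slice(&0u32.to_le_bytes()); // flags
    frame.extend_from_slice(&4u32.to_le_bytes()); // payload size
    frame.extend_from_slice(&[0; 4]); // payload
    assert_eq!(parse_header(&frame).unwrap().size, 4);
    assert!(parse_header(&frame[..HDR_LEN - 1]).is_none());
    println!("ok");
}
```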
These are significantly simpler than frontend->backend requests.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Implement frontend-to-backend requests and the corresponding
backend-to-frontend replies.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
This is a partial implementation of a virtio vhost-user device backend.
There is no plan to implement migration for this device.  In most
use-cases, the VM using this device is tied to physical hardware, and
migrating it makes no sense.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
This removes dependencies on the rest of Cloud Hypervisor in preparation
for making it a standalone crate.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
This allows it to be used outside of Cloud Hypervisor.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
This code is now used.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
The virtio-vhost-user crate already validates eventfds, so the device
model in Cloud Hypervisor doesn't need to unsafely assume that using a
non-eventfd as an eventfd is safe.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
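One way such validation can work on Linux (a hedged sketch, not necessarily what the virtio-vhost-user crate does): the /proc/self/fd symlink for an eventfd points at "anon_inode:[eventfd]", so a received fd that resolves to anything else can be rejected before it is ever treated as an eventfd.

```rust
use std::fs;
use std::os::fd::AsRawFd;

/// Returns true if `fd` looks like a Linux eventfd, judged by its
/// /proc/self/fd symlink target. Illustrative only.
fn is_eventfd(fd: i32) -> bool {
    fs::read_link(format!("/proc/self/fd/{fd}"))
        .map(|t| t.to_string_lossy().starts_with("anon_inode:[eventfd"))
        .unwrap_or(false)
}

fn main() {
    // A regular file is not an eventfd and must be rejected.
    let f = fs::File::open("/dev/null").unwrap();
    assert!(!is_eventfd(f.as_raw_fd()));
    println!("ok");
}
```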