Skip to content

pci: Use poll() to check if an MSI-X interrupt is pending#7963

Draft
DemiMarie wants to merge 2 commits intocloud-hypervisor:mainfrom
DemiMarie:fix-pba
Draft

pci: Use poll() to check if an MSI-X interrupt is pending#7963
DemiMarie wants to merge 2 commits intocloud-hypervisor:mainfrom
DemiMarie:fix-pba

Conversation

@DemiMarie
Copy link
Copy Markdown
Contributor

The eventfds are the authoritative source of whether an interrupt is pending. Instead of updating a duplicate copy of this state in the MSI-X code, use poll() to obtain the information from the kernel on demand. This also fixes reading from the Pending Bit Array of vhost-user devices, and avoids taking a lock when triggering interrupts for all devices. Furthermore, it avoids panics when a bad read is made from the Pending Bit Array.

Since it is likely that guests don't read the Pending Bit Array, this is the slowest of slow paths: it will likely never be reached in the entire lifecycle of a VM. Therefore, the need for an extra system call does not matter.

The Pending Bit Array still needs to be stored explicitly for migration. The triggered state of an irqfd is not preserved across migration, so it is necessary to include the Pending Bit Array in the migration state. However, until migration occurs, the current in-memory state only has bits removed, never added. The missing bits are obtained via poll() when the state needs to be saved.

Fixes: #7838

@DemiMarie DemiMarie requested a review from a team as a code owner April 4, 2026 16:41
@DemiMarie DemiMarie marked this pull request as draft April 4, 2026 16:41
@DemiMarie DemiMarie marked this pull request as ready for review April 4, 2026 20:59
@DemiMarie DemiMarie force-pushed the fix-pba branch 6 times, most recently from 0296b07 to 8a63b41 Compare April 6, 2026 22:00
The eventfds are the authoritative source of whether an interrupt is
pending.  Instead of updating a duplicate copy of this state in the
MSI-X code, use poll() to obtain the information from the kernel on
demand.  This also fixes reading from the Pending Bit Array (PBA) of
vhost-user devices, and avoids taking a lock when triggering interrupts
for all devices.

Since it is likely that guests don't read the PBA, this is the slowest
of slow paths: it will likely never be reached in the entire lifecycle
of a VM.  Therefore, the need for an extra system call does not matter.

The PBA still needs to be stored explicitly for migration.  The
triggered state of an irqfd is not preserved across migration, so it is
necessary to include the Pending Bit Array in the migration state.
However, until migration occurs, the current in-memory state only has
bits _removed_, never added.  The missing bits are obtained via poll()
when the state needs to be saved.

Finally, fix a panic when reading from the PBA at an invalid offset.

Fixes: cloud-hypervisor#7838
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Copy link
Copy Markdown
Member

@rbradford rbradford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sboeuf When you get a chance can you look at this?

Copy link
Copy Markdown
Member

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@rbradford rbradford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you test this? This is a lot of complex code and i'm not sure what problem it's actually trying to solve

Comment thread pci/src/msix.rs Outdated
};
let mut buf = Vec::new();
let mut mask = 0u64;
for (index, bit) in bit_iterator() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here you use index as the index and bit as the value.

Comment thread pci/src/msix.rs Outdated
_ => {
error!("invalid data length");
let mut iter = buf.iter();
for (_, bit) in bit_iterator() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But here you throw away the index and then call the value bit.

Comment thread pci/src/msix.rs Outdated
for (_, bit) in bit_iterator() {
if mask & bit != 0 && libc::POLLIN as libc::c_short & iter.next().unwrap().revents != 0
{
pba_entry |= 1u64 << bit;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to shift by the value, surely the index?

Comment thread pci/src/msix.rs Outdated
let shift = ((offset & 4) * 8) as usize;
// Create an iterator that iterates over the bits in the data being
// requested, and also yields the bit in the PBA entry being looked at.
let bit_iterator = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too much smarts

Comment thread pci/src/msix.rs Outdated
let mut buf = Vec::new();
let mut mask = 0u64;
for (index, bit) in bit_iterator() {
let table_offset = offset as usize + index;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a byte offset

Comment thread pci/src/msix.rs Outdated
let mut mask = 0u64;
for (index, bit) in bit_iterator() {
let table_offset = offset as usize + index;
let Some(entry) = self.table_entries.get(table_offset) else {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it gets used as an index?

Comment thread pci/src/msix.rs Outdated
let mut entries = self.pba_entries.clone();
for (offset, entry) in entries.iter_mut().enumerate() {
let mut data = [0u8; 8];
self.read_pba(offset.try_into().unwrap(), &mut data[..]);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expects a byte offset, but instead you give it a plain index.

Comment thread pci/src/msix.rs Outdated
pba_entry &= !bit;
continue;
};
if !self.masked && entry.masked() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This masking is wrong:

  if !self.masked && !entry.masked() {                                         
      // Fully unmasked: irqfd active, interrupt delivered, PBA clear.                                                                                         
      pba_entry &= !bit;                                                                                                                                       
      continue;                                                                                                                                                
  }                                                                                                                                                            

Also the comment is misleading

@rbradford
Copy link
Copy Markdown
Member

I think this could be unit tested.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
@rbradford rbradford marked this pull request as draft April 16, 2026 17:24
@rbradford
Copy link
Copy Markdown
Member

Drafting as has "WIP" commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle Pending Bit Array by checking if irqfd is ready

3 participants