pci: Use poll() to check if an MSI-X interrupt is pending#7963
pci: Use poll() to check if an MSI-X interrupt is pending#7963DemiMarie wants to merge 2 commits intocloud-hypervisor:mainfrom
Conversation
0296b07 to
8a63b41
Compare
The eventfds are the authoritative source of whether an interrupt is pending. Instead of updating a duplicate copy of this state in the MSI-X code, use poll() to obtain the information from the kernel on demand. This also fixes reading from the Pending Bit Array (PBA) of vhost-user devices, and avoids taking a lock when triggering interrupts for all devices. Since it is likely that guests don't read the PBA, this is the slowest of slow paths: it will likely never be reached in the entire lifecycle of a VM. Therefore, the need for an extra system call does not matter. The PBA still needs to be stored explicitly for migration. The triggered state of an irqfd is not preserved across migration, so it is necessary to include the Pending Bit Array in the migration state. However, until migration occurs, the current in-memory state only has bits _removed_, never added. The missing bits are obtained via poll() when the state needs to be saved. Finally, fix a panic when reading from the PBA at an invalid offset. Fixes: cloud-hypervisor#7838 Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
rbradford
left a comment
There was a problem hiding this comment.
How did you test this? This is a lot of complex code and i'm not sure what problem it's actually trying to solve
| }; | ||
| let mut buf = Vec::new(); | ||
| let mut mask = 0u64; | ||
| for (index, bit) in bit_iterator() { |
There was a problem hiding this comment.
So here you use index as the index and bit as the value.
| _ => { | ||
| error!("invalid data length"); | ||
| let mut iter = buf.iter(); | ||
| for (_, bit) in bit_iterator() { |
There was a problem hiding this comment.
But here you throw away the index and then call the value bit.
| for (_, bit) in bit_iterator() { | ||
| if mask & bit != 0 && libc::POLLIN as libc::c_short & iter.next().unwrap().revents != 0 | ||
| { | ||
| pba_entry |= 1u64 << bit; |
There was a problem hiding this comment.
Do you really want to shift by the value, surely the index?
| let shift = ((offset & 4) * 8) as usize; | ||
| // Create an iterator that iterates over the bits in the data being | ||
| // requested, and also yields the bit in the PBA entry being looked at. | ||
| let bit_iterator = { |
| let mut buf = Vec::new(); | ||
| let mut mask = 0u64; | ||
| for (index, bit) in bit_iterator() { | ||
| let table_offset = offset as usize + index; |
| let mut mask = 0u64; | ||
| for (index, bit) in bit_iterator() { | ||
| let table_offset = offset as usize + index; | ||
| let Some(entry) = self.table_entries.get(table_offset) else { |
There was a problem hiding this comment.
But it gets used as an index?
| let mut entries = self.pba_entries.clone(); | ||
| for (offset, entry) in entries.iter_mut().enumerate() { | ||
| let mut data = [0u8; 8]; | ||
| self.read_pba(offset.try_into().unwrap(), &mut data[..]); |
There was a problem hiding this comment.
This expects a byte offset, but instead you give it a plain index.
| pba_entry &= !bit; | ||
| continue; | ||
| }; | ||
| if !self.masked && entry.masked() { |
There was a problem hiding this comment.
This masking is wrong:
if !self.masked && !entry.masked() {
// Fully unmasked: irqfd active, interrupt delivered, PBA clear.
pba_entry &= !bit;
continue;
}
Also the comment is misleading
|
I think this could be unit tested. |
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
|
Drafting as has "WIP" commits. |
The eventfds are the authoritative source of whether an interrupt is pending. Instead of updating a duplicate copy of this state in the MSI-X code, use poll() to obtain the information from the kernel on demand. This also fixes reading from the Pending Bit Array of vhost-user devices, and avoids taking a lock when triggering interrupts for all devices. Furthermore, it avoids panics when a bad read is made from the Pending Bit Array.
Since it is likely that guests don't read the Pending Bit Array, this is the slowest of slow paths: it will likely never be reached in the entire lifecycle of a VM. Therefore, the need for an extra system call does not matter.
The Pending Bit Array still needs to be stored explicitly for migration. The triggered state of an irqfd is not preserved across migration, so it is necessary to include the Pending Bit Array in the migration state. However, until migration occurs, the current in-memory state only has bits removed, never added. The missing bits are obtained via poll() when the state needs to be saved.
Fixes: #7838