pci: expand sub-page VFIO BAR mmap to page size#7939
rbradford merged 1 commit into cloud-hypervisor:main
Before #7904, for a 16K BAR device on a 64K page host:
The out-of-range MSI-X hole accidentally inflated the sparse area to exactly page size. #7904 added an [...] This patch explicitly expands sub-page sparse area mmaps to host page size in [...]
likebreath left a comment:
Thank you for the contribution with detailed comments and performance regression test.
Some comments below. Please also don't forget to update the commit message and comments added.
Addressed all of them. Updated comments, commit message and PR description. Thanks for your review.
        );
        return Err(VfioPciError::MmapArea);
    }
    info!(
nit: Include the BAR ID (region.index) here and in the error above for easier debugging?
Added region.index information to the messages.
On aarch64 with 64K host pages, VFIO passthrough of devices with sub-page BARs (e.g. 16K NVMe BAR0) crashes with EINVAL from KVM_SET_USER_MEMORY_REGION, which requires memory_size to be a multiple of the host page size.

Expand the mmap to page size instead of rejecting it, matching QEMU's approach. The kernel's vfio_pci_probe_mmaps() already verifies that sub-page BARs are page-aligned and reserves the remainder of the page, so expansion is safe at offset 0. Reject sub-page sparse areas at non-zero offsets where this guarantee does not apply.

The expanded mmap region will not overlap with the relocated MSI-X trap region because fixup_msix_region() ensures MSI-X relocation at >= page_size offset.

Signed-off-by: Saravanan D <saravanand@crusoe.ai>
23a980c
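The offset-0 rule described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from the patch: `expand_sparse_area`, `ExpandError`, and `overlaps` are hypothetical names, and offsets are assumed to be relative to the start of the BAR.

```rust
/// Hypothetical error type for this sketch (not from cloud-hypervisor).
#[derive(Debug, PartialEq)]
enum ExpandError {
    /// A sub-page sparse area that does not start at offset 0.
    SubPageAtNonZeroOffset { offset: u64, size: u64 },
}

/// Hypothetical helper: returns the (offset, size) to mmap for a sparse area.
/// Areas of at least one page are returned unchanged; this sketch only
/// handles the sub-page case discussed in the commit message.
fn expand_sparse_area(
    offset: u64,
    size: u64,
    page_size: u64,
) -> Result<(u64, u64), ExpandError> {
    if size >= page_size {
        return Ok((offset, size));
    }
    if offset != 0 {
        // The kernel's page-reservation guarantee only covers offset 0.
        return Err(ExpandError::SubPageAtNonZeroOffset { offset, size });
    }
    // Sub-page area at offset 0: expand to one full host page.
    Ok((0, page_size))
}

/// True if the half-open ranges [a, a + a_len) and [b, b + b_len) overlap.
fn overlaps(a: u64, a_len: u64, b: u64, b_len: u64) -> bool {
    a < b + b_len && b < a + a_len
}
```

With a 64K page size, a 16K area at offset 0 becomes `(0, 0x10000)`. An MSI-X trap region placed at or beyond `page_size` then cannot overlap the expanded range `[0, page_size)`.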
On aarch64 with 64K host pages, VFIO passthrough of devices with sub-page BARs (e.g. 16K NVMe BAR0) crashes with EINVAL from KVM_SET_USER_MEMORY_REGION, which requires memory_size to be a multiple of the host page size.
Expand the mmap to page size instead of rejecting it, matching QEMU's approach. The kernel's vfio_pci_probe_mmaps() already verifies that sub-page BARs are page-aligned and reserves the remainder of the page, so this is safe.
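To make the arithmetic concrete, here is a minimal sketch of the alignment check and the round-up, assuming a power-of-two page size. The helper names `is_page_multiple` and `align_up` are illustrative and not taken from the patch.

```rust
/// Returns true if `size` is a whole number of host pages.
/// Assumes `page_size` is a power of two.
fn is_page_multiple(size: u64, page_size: u64) -> bool {
    size & (page_size - 1) == 0
}

/// Rounds `size` up to the next multiple of `page_size`.
/// Assumes `page_size` is a power of two.
fn align_up(size: u64, page_size: u64) -> u64 {
    (size + page_size - 1) & !(page_size - 1)
}
```

For a 16K BAR on a 64K-page host, `is_page_multiple(16 << 10, 64 << 10)` is false, which corresponds to the size KVM rejects, while `align_up(16 << 10, 64 << 10)` yields 64K. On a 4K-page host the same 16K BAR is already page-aligned and is left unchanged.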
The expanded mmap region will not overlap with the relocated MSI-X trap region because fixup_msix_region() ensures MSI-X relocation at >= page_size offset.
Validation
Host: aarch64, 64K pages (kernel 6.11), VFIO passthrough.
Device Under Test
Intel NVMe DC SSD [3DNAND, Sentinel Rock Controller] — 16K BAR0 with MSI-X in the same BAR.
Launch
Functional
Performance (fio 4K random read, io_uring, iodepth=64, numjobs=4, 60s, 3 runs)
Median IOPS: 747,589 (no regression from the pre-#7904 baseline)