pci: expand sub-page VFIO BAR mmap to page size #7939
Draft
saravan2 wants to merge 1 commit into cloud-hypervisor:main
Conversation
On aarch64 with 64K host pages, VFIO passthrough of devices with sub-page BARs (e.g. 16K NVMe BAR0) crashes with EINVAL from KVM_SET_USER_MEMORY_REGION, which requires memory_size to be a multiple of the host page size.

Expand the mmap to page size instead of rejecting it, matching QEMU's approach. The kernel's vfio_pci_probe_mmaps() already verifies that sub-page BARs are page-aligned and reserves the remainder of the page, so this is safe.

To prevent the expanded KVM memory slot from overlapping the relocated MSI-X trap region, fixup_msix_region() now ensures the lower half of the virtual BAR is at least one host page.

Signed-off-by: Saravanan D <saravanand@crusoe.ai>
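The core of the fix can be illustrated in isolation. A minimal sketch (the helper name `effective_mmap_len` is hypothetical, not the actual patch code): a sub-page BAR mmap is expanded to one host page so the resulting KVM memory slot size satisfies the page-multiple requirement.

```rust
// Hypothetical helper illustrating the fix: a sub-page BAR mmap
// (e.g. a 16K NVMe BAR0 on a 64K-page aarch64 host) is expanded
// to one host page so KVM_SET_USER_MEMORY_REGION accepts it.
fn effective_mmap_len(bar_size: u64, page_size: u64) -> u64 {
    if bar_size >= page_size {
        bar_size
    } else {
        // Safe because vfio_pci_probe_mmaps() guarantees sub-page
        // BARs are page-aligned and the rest of the page is reserved.
        page_size
    }
}

fn main() {
    let page = 0x10000; // 64K host page
    assert_eq!(effective_mmap_len(0x4000, page), 0x10000); // 16K BAR0 expanded
    assert_eq!(effective_mmap_len(0x20000, page), 0x20000); // large BAR unchanged
    println!("ok");
}
```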
phip1611 reviewed on Mar 31, 2026
```rust
// lower half is at least one host page. This lets
// map_mmio_regions() expand a sub-page BAR mmap to page size
// without the KVM memory slot overlapping the relocated MSI-X.
let size = std::cmp::max(std::cmp::max(region_size, get_page_size()) * 2, msix_sz * 2);
```
Member
nit: this compressed style feels a little like C coding style. I'd very much prefer a more verbose and readable form:

```rust
let size = {
    let lower_half_size = cmp::max(region_size, get_page_size());
    let bar_size = lower_half_size * 2;
    let msix_size = msix_sz * 2;
    cmp::max(bar_size, msix_size)
};
```
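The suggested expression can be sanity-checked in isolation. A minimal sketch with hypothetical stand-ins for `region_size`, `get_page_size()` and `msix_sz` (the function `virtual_bar_size` is illustrative, not code from the patch):

```rust
use std::cmp;

// Hypothetical stand-in for get_page_size() on a 64K-page host.
fn get_page_size() -> u64 {
    0x10000
}

// Mirrors the suggested computation: the lower half of the virtual
// BAR must hold at least one host page, and the whole BAR must also
// cover the doubled MSI-X region size.
fn virtual_bar_size(region_size: u64, msix_sz: u64) -> u64 {
    let lower_half_size = cmp::max(region_size, get_page_size());
    let bar_size = lower_half_size * 2;
    let msix_size = msix_sz * 2;
    cmp::max(bar_size, msix_size)
}

fn main() {
    // A 16K BAR on a 64K-page host: lower half grows to 64K, BAR to 128K.
    assert_eq!(virtual_bar_size(0x4000, 0x1000), 0x20000);
    // A large MSI-X region can still dominate the result.
    assert_eq!(virtual_bar_size(0x4000, 0x40000), 0x80000);
    println!("ok");
}
```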
```rust
for area in sparse_areas.iter() {
    let page_size = get_page_size();
```
Member
I'm worried about the readability and quality of gigantic loop bodies. How about something like this? (I didn't detail it in the review, please do not copy & paste blindly.)
```rust
fn effective_sparse_mmap_len(
    region_start: u64,
    area: &SparseArea,
    page_size: u64,
) -> Option<u64> {
    if area.size >= page_size {
        return Some(area.size);
    }
    let gpa = region_start + area.offset;
    if !is_page_size_aligned(area.offset) || !is_page_size_aligned(gpa) {
        warn!(
            "Skipping mmap of sub-page sparse area \
             (offset 0x{:x}, size 0x{:x}): not page-aligned",
            area.offset, area.size,
        );
        return None;
    }
    info!(
        "Expanding sub-page sparse area mmap from 0x{:x} to \
         page size 0x{:x}",
        area.size, page_size,
    );
    Some(page_size)
}

for area in sparse_areas.iter() {
    let page_size = get_page_size();
    let mmap_len = match effective_sparse_mmap_len(region.start.0, area, page_size) {
        Some(len) => len,
        None => continue,
    };
    let mapping = match MmapRegion::mmap(mmap_len, prot, fd, mmap_offset, area.offset) {
        Ok(mapping) => mapping,
        Err(_) => {
            error!(
                "Could not mmap sparse area (offset = 0x{:x}, size = 0x{:x}): {}",
                mmap_offset,
                mmap_len,
                std::io::Error::last_os_error()
            );
            return Err(VfioPciError::MmapArea);
        }
    };
    let user_memory_region = UserMemoryRegion {
        slot: self.memory_slot_allocator.next_memory_slot(),
        start: region.start.0 + area.offset,
        mapping: Arc::new(mapping),
    };
    unsafe {
        self.vm.create_user_memory_region(
            user_memory_region.slot,
            user_memory_region.start,
            user_memory_region.mapping.len(),
            user_memory_region.mapping.addr(),
            false,
            false,
        )
    }
    .map_err(VfioPciError::CreateUserMemoryRegion)?;
    if !self.iommu_attached {
        #[allow(unused_unsafe)]
        unsafe {
            self.vfio_ops.vfio_dma_map(
                user_memory_region.start,
                user_memory_region.mapping.len(),
                user_memory_region.mapping.addr(),
            )
        }
        .map_err(|e| VfioPciError::DmaMap(e, self.device_path.clone(), self.bdf))?;
    }
    region.user_memory_regions.push(user_memory_region);
}
```
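The alignment guard inside the suggested helper reduces to a mask check. A minimal standalone sketch (the names `is_page_aligned` and `can_expand` are hypothetical, assuming the page size is a power of two):

```rust
// Hypothetical standalone version of the alignment check used when
// deciding whether a sub-page sparse area can be safely expanded.
fn is_page_aligned(value: u64, page_size: u64) -> bool {
    // page_size is assumed to be a power of two (4K, 16K, 64K, ...).
    value & (page_size - 1) == 0
}

fn can_expand(region_start: u64, area_offset: u64, page_size: u64) -> bool {
    // Both the area offset and the resulting guest physical address
    // must sit on a host page boundary for the expansion to be safe.
    let gpa = region_start + area_offset;
    is_page_aligned(area_offset, page_size) && is_page_aligned(gpa, page_size)
}

fn main() {
    let page = 0x10000; // 64K
    assert!(can_expand(0x1000000, 0x0, page));
    assert!(!can_expand(0x1000000, 0x2000, page)); // offset not page-aligned
    println!("ok");
}
```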