Skip to content

pci: expand sub-page VFIO BAR mmap to page size#7939

Draft
saravan2 wants to merge 1 commit intocloud-hypervisor:mainfrom
saravan2:64KiB
Draft

pci: expand sub-page VFIO BAR mmap to page size#7939
saravan2 wants to merge 1 commit intocloud-hypervisor:mainfrom
saravan2:64KiB

Conversation

@saravan2
Copy link
Copy Markdown
Member

On aarch64 with 64K host pages, VFIO passthrough of devices with sub-page BARs (e.g. 16K NVMe BAR0) crashes with EINVAL from KVM_SET_USER_MEMORY_REGION, which requires memory_size to be a multiple of the host page size.

Expand the mmap to page size instead of rejecting it, matching QEMU's approach. The kernel's vfio_pci_probe_mmaps() already verifies that sub-page BARs are page-aligned and reserves the remainder of the page, so this is safe.

To prevent the expanded KVM memory slot from overlapping the relocated MSI-X trap region, fixup_msix_region() now ensures the lower half of the virtual BAR is at least one host page.

On aarch64 with 64K host pages, VFIO passthrough of devices with
sub-page BARs (e.g. 16K NVMe BAR0) crashes with EINVAL from
KVM_SET_USER_MEMORY_REGION, which requires memory_size to be a
multiple of the host page size.

Expand the mmap to page size instead of rejecting it, matching
QEMU's approach. The kernel's vfio_pci_probe_mmaps() already
verifies that sub-page BARs are page-aligned and reserves the
remainder of the page, so this is safe.

To prevent the expanded KVM memory slot from overlapping the
relocated MSI-X trap region, fixup_msix_region() now ensures the
lower half of the virtual BAR is at least one host page.

Signed-off-by: Saravanan D <saravanand@crusoe.ai>
Copy link
Copy Markdown
Member

@phip1611 phip1611 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a PCI expert but I think the changes are good - thanks for the contribution! I'm worried about code readability and think we can do a little better here

// lower half is at least one host page. This lets
// map_mmio_regions() expand a sub-page BAR mmap to page size
// without the KVM memory slot overlapping the relocated MSI-X.
let size = std::cmp::max(std::cmp::max(region_size, get_page_size()) * 2, msix_sz * 2);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this compressed style feels a little like C coding style.

I'd very much prefer a more verbose and readable form

let size = {
    let lower_half_size = cmp::max(region_size, get_page_size());
    let bar_size = lower_half_size * 2;
    let msix_size = msix_sz * 2;

    cmp::max(bar_size, msix_size)
};

)?;

for area in sparse_areas.iter() {
let page_size = get_page_size();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about code readability and quality of gigantic loop bodies.

How about something like this

(I didn't detail it in review, please do not copy & paste blindly)

fn effective_sparse_mmap_len(
    region_start: u64,
    area: &SparseArea,
    page_size: u64,
) -> Option<u64> {
    if area.size >= page_size {
        return Some(area.size);
    }

    let gpa = region_start + area.offset;
    if !is_page_size_aligned(area.offset) || !is_page_size_aligned(gpa) {
        warn!(
            "Skipping mmap of sub-page sparse area \
             (offset 0x{:x}, size 0x{:x}): not page-aligned",
            area.offset, area.size,
        );
        return None;
    }

    info!(
        "Expanding sub-page sparse area mmap from 0x{:x} to \
         page size 0x{:x}",
        area.size, page_size,
    );
    Some(page_size)
}

for area in sparse_areas.iter() {
    let page_size = get_page_size();
    let mmap_len = match effective_sparse_mmap_len(region.start.0, area, page_size) {
        Some(len) => len,
        None => continue,
    };

    let mapping = match MmapRegion::mmap(mmap_len, prot, fd, mmap_offset, area.offset) {
        Ok(mapping) => mapping,
        Err(_) => {
            error!(
                "Could not mmap sparse area (offset = 0x{:x}, size = 0x{:x}): {}",
                mmap_offset,
                mmap_len,
                std::io::Error::last_os_error()
            );
            return Err(VfioPciError::MmapArea);
        }
    };

    let user_memory_region = UserMemoryRegion {
        slot: self.memory_slot_allocator.next_memory_slot(),
        start: region.start.0 + area.offset,
        mapping: Arc::new(mapping),
    };

    unsafe {
        self.vm.create_user_memory_region(
            user_memory_region.slot,
            user_memory_region.start,
            user_memory_region.mapping.len(),
            user_memory_region.mapping.addr(),
            false,
            false,
        )
    }
    .map_err(VfioPciError::CreateUserMemoryRegion)?;

    if !self.iommu_attached {
        #[allow(unused_unsafe)]
        unsafe {
            self.vfio_ops.vfio_dma_map(
                user_memory_region.start,
                user_memory_region.mapping.len(),
                user_memory_region.mapping.addr(),
            )
        }
        .map_err(|e| VfioPciError::DmaMap(e, self.device_path.clone(), self.bdf))?;
    }

    region.user_memory_regions.push(user_memory_region);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants