Commit 9d33edb

Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt core and driver subsystem:

The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device.

IMS (Interrupt Message Store) is a new specification which allows device manufacturers to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X, which have a specified message store that is uniform across all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains.

IMS not only makes it possible to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device.

There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background.

When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific, and while there were attempts to create common infrastructure, the commonalities were rudimentary and just provided shared data structures and interfaces so that drivers could be written in an architecture agnostic way.

The initial PCI/MSI[-X] support obviously plugged into this model, which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific.
This model is still supported today to keep museum architectures and notorious stragglers alive.

In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation.

At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller.

This resulted in a fundamental redesign of interrupt management and provided today's prevailing concept of hierarchical interrupt domains. This made it possible to disentangle the interactions between the x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way.

The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this:

                                         |--- device 1
     [Vector]---[Remapping]---[PCI/MSI]--|...
                                         |--- device N

where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase, where it is obviously required to establish the proper parent relationship in the components of the hierarchy.

While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X].
Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strictly a per PCI device entity.

Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains, which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also made it possible to keep the existing PCI/MSI[-X] infrastructure mostly unchanged, which in turn made it simple to keep the existing architecture specific management alive.

A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack an IP block specific domain on top of the generic MSI domain, this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation.

In the course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse.

Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in a works-by-chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them.
But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems.

Last but not least the "global" PCI/MSI-X domain approach prevents utilizing PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is no longer providing a uniform storage and configuration model.

The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this:

                              |--- [PCI/MSI] device 1
     [Vector]---[Remapping]---|...
                              |--- [PCI/MSI] device N

which in turn makes it possible to provide support for multiple domains per device:

                              |--- [PCI/MSI] device 1
                              |--- [PCI/IMS] device 1
     [Vector]---[Remapping]---|...
                              |--- [PCI/MSI] device N
                              |--- [PCI/IMS] device N

This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver.

There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well.
Drivers:

 - Updates for the LoongArch interrupt chip drivers

 - Support for MTK CIRQv2

 - The usual small fixes and updates all over the place"

* tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits)
  irqchip/ti-sci-inta: Fix kernel doc
  irqchip/gic-v2m: Mark a few functions __init
  irqchip/gic-v2m: Include arm-gic-common.h
  irqchip/irq-mvebu-icu: Fix works by chance pointer assignment
  iommu/amd: Enable PCI/IMS
  iommu/vt-d: Enable PCI/IMS
  x86/apic/msi: Enable PCI/IMS
  PCI/MSI: Provide pci_ims_alloc/free_irq()
  PCI/MSI: Provide IMS (Interrupt Message Store) support
  genirq/msi: Provide constants for PCI/IMS support
  x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN
  PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X
  PCI/MSI: Provide prepare_desc() MSI domain op
  PCI/MSI: Split MSI-X descriptor setup
  genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN
  genirq/msi: Provide msi_domain_alloc_irq_at()
  genirq/msi: Provide msi_domain_ops::prepare_desc()
  genirq/msi: Provide msi_desc::msi_data
  genirq/msi: Provide struct msi_map
  x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
  ...
2 parents f10bc40 + 6132a49 commit 9d33edb
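
The per device hierarchy described in the merge message can be pictured with a small userspace model. This is only a sketch: `struct domain`, `domain_depth()` and `domain_root()` are hypothetical names invented for illustration and are not the kernel's `struct irq_domain` API. It shows two domains of one device (PCI/MSI and PCI/IMS) sharing the same parent chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a domain hierarchy (not kernel code). */
struct domain {
	const char *name;
	const struct domain *parent;	/* NULL for the root domain */
};

/* Count the hops from a domain up to the root of its hierarchy. */
static int domain_depth(const struct domain *d)
{
	int depth = 0;

	assert(d != NULL);
	while (d->parent) {
		d = d->parent;
		depth++;
	}
	return depth;
}

/* Follow the parent pointers until the root domain is reached. */
static const struct domain *domain_root(const struct domain *d)
{
	assert(d != NULL);
	while (d->parent)
		d = d->parent;
	return d;
}

/* [Vector]---[Remapping]---|--- [PCI/MSI] device 1
 *                          |--- [PCI/IMS] device 1
 */
static const struct domain vector   = { "Vector", NULL };
static const struct domain remap    = { "Remapping", &vector };
static const struct domain dev1_msi = { "PCI/MSI device 1", &remap };
static const struct domain dev1_ims = { "PCI/IMS device 1", &remap };
```

In this model both per device domains sit two levels below the vector domain, and dropping the optional remapping level would simply mean pointing their `parent` at `vector`.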

87 files changed

Lines changed: 3388 additions & 1536 deletions

Documentation/PCI/msi-howto.rst

Lines changed: 10 additions & 0 deletions
@@ -285,3 +285,13 @@ to bridges between the PCI root and the device, MSIs are disabled.
 It is also worth checking the device driver to see whether it supports MSIs.
 For example, it may contain calls to pci_alloc_irq_vectors() with the
 PCI_IRQ_MSI or PCI_IRQ_MSIX flags.
+
+
+List of device drivers MSI(-X) APIs
+===================================
+
+The PCI/MSI subystem has a dedicated C file for its exported device driver
+APIs — `drivers/pci/msi/api.c`. The following functions are exported:
+
+.. kernel-doc:: drivers/pci/msi/api.c
+   :export:
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/loongarch,cpu-interrupt-controller.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: LoongArch CPU Interrupt Controller
+
+maintainers:
+  - Liu Peibao <liupeibao@loongson.cn>
+
+properties:
+  compatible:
+    const: loongarch,cpu-interrupt-controller
+
+  '#interrupt-cells':
+    const: 1
+
+  interrupt-controller: true
+
+additionalProperties: false
+
+required:
+  - compatible
+  - '#interrupt-cells'
+  - interrupt-controller
+
+examples:
+  - |
+    interrupt-controller {
+      compatible = "loongarch,cpu-interrupt-controller";
+      #interrupt-cells = <1>;
+      interrupt-controller;
+    };

Documentation/devicetree/bindings/interrupt-controller/mediatek,cirq.txt

Lines changed: 0 additions & 33 deletions
This file was deleted.
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/mediatek,mtk-cirq.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MediaTek System Interrupt Controller
+
+maintainers:
+  - Youlin Pei <youlin.pei@mediatek.com>
+
+description:
+  In MediaTek SoCs, the CIRQ is a low power interrupt controller designed to
+  work outside of MCUSYS which comprises with Cortex-Ax cores, CCI and GIC.
+  The external interrupts (outside MCUSYS) will feed through CIRQ and connect
+  to GIC in MCUSYS. When CIRQ is enabled, it will record the edge-sensitive
+  interrupts and generate a pulse signal to parent interrupt controller when
+  flush command is executed. With CIRQ, MCUSYS can be completely turned off
+  to improve the system power consumption without losing interrupts.
+
+
+properties:
+  compatible:
+    items:
+      - enum:
+          - mediatek,mt2701-cirq
+          - mediatek,mt8135-cirq
+          - mediatek,mt8173-cirq
+          - mediatek,mt8192-cirq
+      - const: mediatek,mtk-cirq
+
+  reg:
+    maxItems: 1
+
+  '#interrupt-cells':
+    const: 3
+
+  interrupt-controller: true
+
+  mediatek,ext-irq-range:
+    $ref: /schemas/types.yaml#/definitions/uint32-array
+    items:
+      - description: First CIRQ interrupt
+      - description: Last CIRQ interrupt
+    description:
+      Identifies the range of external interrupts in different SoCs
+
+required:
+  - compatible
+  - reg
+  - '#interrupt-cells'
+  - interrupt-controller
+  - mediatek,ext-irq-range
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/irq.h>
+
+    cirq: interrupt-controller@10204000 {
+        compatible = "mediatek,mt2701-cirq", "mediatek,mtk-cirq";
+        reg = <0x10204000 0x400>;
+        #interrupt-cells = <3>;
+        interrupt-controller;
+        interrupt-parent = <&sysirq>;
+        mediatek,ext-irq-range = <32 200>;
+    };
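
To make the `mediatek,ext-irq-range` property more concrete, here is a small userspace sketch of how a consumer might interpret the two cells. It assumes the range is inclusive on both ends, which is what the "First CIRQ interrupt" and "Last CIRQ interrupt" item descriptions suggest. `struct cirq_range`, `cirq_count()` and `cirq_index()` are hypothetical helpers made up for illustration, not functions from the driver.

```c
#include <assert.h>

/* Hypothetical holder for the two cells of mediatek,ext-irq-range. */
struct cirq_range {
	unsigned int first;	/* first external interrupt handled by CIRQ */
	unsigned int last;	/* last external interrupt handled by CIRQ */
};

/* Number of interrupts covered, assuming an inclusive range. */
static unsigned int cirq_count(struct cirq_range r)
{
	assert(r.last >= r.first);
	return r.last - r.first + 1;
}

/*
 * Translate an external interrupt number into a zero-based CIRQ index.
 * Returns -1 when the interrupt lies outside the range.
 */
static int cirq_index(struct cirq_range r, unsigned int hwirq)
{
	if (hwirq < r.first || hwirq > r.last)
		return -1;
	return (int)(hwirq - r.first);
}
```

With the example binding's `<32 200>`, this model covers 169 interrupts and maps interrupt 32 to index 0.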

arch/loongarch/include/asm/irq.h

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ int liointc_acpi_init(struct irq_domain *parent,
 int eiointc_acpi_init(struct irq_domain *parent,
 					struct acpi_madt_eio_pic *acpi_eiointc);
 
-struct irq_domain *htvec_acpi_init(struct irq_domain *parent,
+int htvec_acpi_init(struct irq_domain *parent,
 					struct acpi_madt_ht_pic *acpi_htvec);
 int pch_lpc_acpi_init(struct irq_domain *parent,
 					struct acpi_madt_lpc_pic *acpi_pchlpc);

arch/powerpc/platforms/pseries/msi.c

Lines changed: 2 additions & 5 deletions
@@ -447,21 +447,18 @@ static void pseries_msi_ops_msi_free(struct irq_domain *domain,
  * RTAS can not disable one MSI at a time. It's all or nothing. Do it
  * at the end after all IRQs have been freed.
  */
-static void pseries_msi_domain_free_irqs(struct irq_domain *domain,
-					 struct device *dev)
+static void pseries_msi_post_free(struct irq_domain *domain, struct device *dev)
 {
 	if (WARN_ON_ONCE(!dev_is_pci(dev)))
 		return;
 
-	__msi_domain_free_irqs(domain, dev);
-
 	rtas_disable_msi(to_pci_dev(dev));
 }
 
 static struct msi_domain_ops pseries_pci_msi_domain_ops = {
 	.msi_prepare = pseries_msi_ops_prepare,
 	.msi_free = pseries_msi_ops_msi_free,
-	.domain_free_irqs = pseries_msi_domain_free_irqs,
+	.msi_post_free = pseries_msi_post_free,
 };
 
 static void pseries_msi_shutdown(struct irq_data *d)
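
The hunk above replaces a full override of the free path with a hook that only does the platform specific teardown. The pattern can be sketched in userspace as follows, assuming (as the retained comment and the hook name suggest) that the core frees the interrupts first and then invokes the optional callback. All types and functions here are hypothetical stand-ins, not the kernel's `struct msi_domain_ops` or its core code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with MSI state. */
struct device {
	int allocated_irqs;
	int msi_enabled;
};

/* Hypothetical ops table with an optional post-free hook. */
struct domain_ops {
	void (*msi_post_free)(struct device *dev);
};

/* Stand-in for the core: free every interrupt, then run the hook if set. */
static void core_free_irqs(const struct domain_ops *ops, struct device *dev)
{
	dev->allocated_irqs = 0;
	if (ops->msi_post_free)
		ops->msi_post_free(dev);
}

/* Platform hook: runs only after all interrupts are gone. */
static void platform_post_free(struct device *dev)
{
	assert(dev->allocated_irqs == 0);
	dev->msi_enabled = 0;	/* all-or-nothing disable, done last */
}

static const struct domain_ops platform_ops = {
	.msi_post_free = platform_post_free,
};
```

The point of the design is that the platform code no longer has to call the core free routine itself. It supplies only the step that must happen afterwards.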

arch/um/drivers/Kconfig

Lines changed: 0 additions & 1 deletion
@@ -381,7 +381,6 @@ config UML_PCI_OVER_VIRTIO
 	select UML_IOMEM_EMULATION
 	select UML_DMA_EMULATION
 	select PCI_MSI
-	select PCI_MSI_IRQ_DOMAIN
 	select PCI_LOCKLESS_CONFIG
 
 config UML_PCI_OVER_VIRTIO_DEVICE_ID

arch/um/include/asm/pci.h

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 /* Generic PCI */
 #include <asm-generic/pci.h>
 
-#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
+#ifdef CONFIG_PCI_MSI
 /*
  * This is a bit of an annoying hack, and it assumes we only have
  * the virt-pci (if anything). Which is true, but still.

arch/x86/Kconfig

Lines changed: 0 additions & 1 deletion
@@ -1110,7 +1110,6 @@ config X86_LOCAL_APIC
 	def_bool y
 	depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI
 	select IRQ_DOMAIN_HIERARCHY
-	select PCI_MSI_IRQ_DOMAIN if PCI_MSI
 
 config X86_IO_APIC
 	def_bool y
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_HYPERV_TIMER_H
+#define _ASM_X86_HYPERV_TIMER_H
+
+#include <asm/msr.h>
+
+#define hv_get_raw_timer() rdtsc_ordered()
+
+#endif
