Commit 34315c5

Merge pull request #4337 from vitong/vitong/arm64ecabi
Add Arm64EC ABI document
2 parents 10a6ab7 + 6cd17f4 commit 34315c5

4 files changed

Lines changed: 228 additions & 30 deletions

Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
1+
---
2+
description: "Learn more about: Overview of ARM64EC ABI conventions"
3+
title: "Overview of ARM64EC ABI conventions"
4+
ms.date: "06/03/2022"
5+
---
6+
# Overview of ARM64EC ABI conventions
7+
8+
ARM64EC is an application binary interface (ABI) that enables ARM64 binaries to run natively and interoperably with x64 code. Specifically, the ARM64EC ABI follows x64 software conventions including calling convention, stack usage, and data alignment, making ARM64EC and x64 code interoperable. The operating system emulates the x64 portion of the binary. (The EC in ARM64EC stands for *emulation compatible*.)
9+
10+
For more information on the x64 and ARM64 ABIs, see [Overview of x64 ABI conventions](x64-software-conventions.md) and [Overview of ARM64 ABI conventions](arm64-windows-abi-conventions.md).
11+
12+
ARM64EC doesn't solve memory model differences between x64 and ARM based architectures. For more information, see [Common Visual C++ ARM migration issues](common-visual-cpp-arm-migration-issues.md).
13+
14+
## Definitions
15+
16+
- **ARM64** - The code stream for ARM64 processes that contains traditional ARM64 code.
17+
- **ARM64EC** - The code stream that utilizes a subset of the ARM64 register set to provide interoperability with x64 code.
18+
19+
## Register mapping
20+
21+
x64 processes may have threads running ARM64EC code. So that it's always possible to retrieve an x64 register context, ARM64EC uses a subset of the ARM64 core registers that map 1:1 to emulated x64 registers. Importantly, ARM64EC never uses registers outside of this subset, except to read the Thread Environment Block (TEB) address from `x18`.
22+
23+
Native ARM64 processes shouldn't regress in performance when some or many functions are recompiled as ARM64EC. To maintain performance, the ABI follows these principles:
24+
25+
- The ARM64EC register subset includes all registers that are part of the ARM64 function calling convention.
26+
27+
- The ARM64EC calling convention directly maps to the ARM64 calling convention.
28+
29+
Special helper routines like `__chkstk_arm64ec` use custom calling conventions and registers. These registers are also included in the ARM64EC subset of registers.
30+
31+
## Register mapping for integer registers
32+
33+
| ARM64EC register | x64 register | ARM64EC calling convention | ARM64 calling convention | x64 calling convention |
34+
|--|--|--|--|--|
35+
| `x0` | `rcx` | volatile | volatile | volatile |
36+
| `x1` | `rdx` | volatile | volatile | volatile |
37+
| `x2` | `r8` | volatile | volatile | volatile |
38+
| `x3` | `r9` | volatile | volatile | volatile |
39+
| `x4` | `r10` | volatile | volatile | volatile |
40+
| `x5` | `r11` | volatile | volatile | volatile |
41+
| `x6` | `mm1` (low 64 bits of x87 `R1` register) | volatile | volatile | volatile |
42+
| `x7` | `mm2` (low 64 bits of x87 `R2` register) | volatile | volatile | volatile |
43+
| `x8` | `rax` | volatile | volatile | volatile |
44+
| `x9` | `mm3` (low 64 bits of x87 `R3` register) | volatile | volatile | volatile |
45+
| `x10` | `mm4` (low 64 bits of x87 `R4` register) | volatile | volatile | volatile |
46+
| `x11` | `mm5` (low 64 bits of x87 `R5` register) | volatile | volatile | volatile |
47+
| `x12` | `mm6` (low 64 bits of x87 `R6` register) | volatile | volatile | volatile |
48+
| `x13` | N/A | disallowed | volatile | N/A |
49+
| `x14` | N/A | disallowed | volatile | N/A |
50+
| `x15` | `mm7` (low 64 bits of x87 `R7` register) | volatile | volatile | volatile |
51+
| `x16` | High 16 bits of each of the x87 `R0`-`R3` registers | volatile(`xip0`) | volatile(`xip0`) | volatile |
52+
| `x17` | High 16 bits of each of the x87 `R4`-`R7` registers | volatile(`xip1`) | volatile(`xip1`) | volatile |
53+
| `x18` | N/A | fixed(TEB) | fixed(TEB) | volatile |
54+
| `x19` | `r12` | non-volatile | non-volatile | non-volatile |
55+
| `x20` | `r13` | non-volatile | non-volatile | non-volatile |
56+
| `x21` | `r14` | non-volatile | non-volatile | non-volatile |
57+
| `x22` | `r15` | non-volatile | non-volatile | non-volatile |
58+
| `x23` | N/A | disallowed | non-volatile | N/A |
59+
| `x24` | N/A | disallowed | non-volatile | N/A |
60+
| `x25` | `rsi` | non-volatile | non-volatile | non-volatile |
61+
| `x26` | `rdi` | non-volatile | non-volatile | non-volatile |
62+
| `x27` | `rbx` | non-volatile | non-volatile | non-volatile |
63+
| `x28` | N/A | disallowed | disallowed | N/A |
64+
| `fp` | `rbp` | non-volatile | non-volatile | non-volatile |
65+
| `lr` | `mm0` (low 64 bits of x87 `R0` register) | volatile | volatile | N/A |
66+
| `sp` | `rsp` | non-volatile | non-volatile | non-volatile |
67+
| `pc` | `rip` | instruction pointer | instruction pointer | instruction pointer |
68+
| `PSTATE` subset: `N`/`Z`/`C`/`V`/`SS` <sup>1, 2</sup> | `RFLAGS` subset: `SF`/`ZF`/`CF`/`OF`/`TF` | volatile | volatile | volatile |
69+
| N/A | `RFLAGS` subset: `PF`/`AF` | N/A | N/A | volatile |
70+
| N/A | `RFLAGS` subset: `DF` | N/A | N/A | non-volatile |
71+
72+
<sup>1</sup> Avoid directly reading, writing, or computing mappings between `PSTATE` and `RFLAGS`. These bits may be used in the future and are subject to change.
73+
74+
<sup>2</sup> The ARM64EC carry flag `C` is the inverse of the x64 carry flag `CF` for subtraction operations. There's no special handling, because the flag is volatile and is therefore trashed when transitioning between (ARM64EC and x64) functions.
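
The carry-flag difference in footnote 2 can be illustrated with a small model. This is only a sketch: the function name `sub_flags` and the dictionary keys are hypothetical, and the code models the architectural definition of the two flags for a 64-bit subtraction, not any emulator behavior.

```python
MASK64 = (1 << 64) - 1  # 64-bit register width


def sub_flags(a: int, b: int) -> dict:
    """Compute a - b on 64-bit values and report both carry conventions."""
    a &= MASK64
    b &= MASK64
    borrow = a < b  # an unsigned borrow occurred
    return {
        "result": (a - b) & MASK64,
        "x64_CF": int(borrow),       # x64: CF = 1 when a borrow occurs
        "arm64_C": int(not borrow),  # ARM64: C = 1 when no borrow occurs
    }


print(sub_flags(5, 3))
print(sub_flags(3, 5))
```

For any pair of inputs, the two flags in this model are always opposite, which is why code must not carry flag state across an ARM64EC/x64 transition.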
75+
76+
## <a name="register-mapping-for-vector-registers"/> Register mapping for vector registers
77+
78+
| ARM64EC register | x64 register | ARM64EC calling convention | ARM64 calling convention | x64 calling convention |
79+
|--|--|--|--|--|
80+
| `v0`-`v5` | `xmm0`-`xmm5` | volatile | volatile | volatile |
81+
| `v6`-`v7` | `xmm6`-`xmm7` | volatile | volatile | non-volatile |
82+
| `v8`-`v15` | `xmm8`-`xmm15` | volatile <sup>1</sup> | volatile <sup>1</sup> | non-volatile |
83+
| `v16`-`v31` | `xmm16`-`xmm31` | disallowed | volatile | disallowed (x64 emulator doesn't support AVX-512) |
84+
| `FPCR` <sup>2</sup> | `MXCSR[15:6]` | non-volatile | non-volatile | non-volatile |
85+
| `FPSR` <sup>2</sup> | `MXCSR[5:0]` | volatile | volatile | volatile |
86+
87+
<sup>1</sup> These ARM64 registers are special in that the lower 64 bits are non-volatile but the upper 64 bits are volatile. From the point of view of an x64 caller, they're effectively volatile because the callee would trash data.
88+
89+
<sup>2</sup> Avoid directly reading, writing, or computing mappings of `FPCR` and `FPSR`. These bits may be used in the future and are subject to change.
90+
91+
## Struct packing
92+
93+
ARM64EC uses the struct packing rules that are used for x64 code. Consider a field that has an alignment specifier. Empirically, x64 rounds the size of the struct to the next multiple of the alignment, whereas ARM64 rounds the size of the struct to the next multiple of 8.
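
As a rough illustration of the rounding difference described above, the following sketch models only the final size-rounding step. The function names and the example sizes are hypothetical, and real compiler layout involves more rules than this model covers.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def rounded_size_x64(raw_size: int, field_alignment: int) -> int:
    # Rounding as described for x64 (and therefore ARM64EC):
    # next multiple of the field's alignment specifier.
    return align_up(raw_size, field_alignment)


def rounded_size_arm64(raw_size: int) -> int:
    # Rounding as described for ARM64: next multiple of 8.
    return align_up(raw_size, 8)


# Hypothetical struct: 20 bytes of fields, one field declared with 16-byte alignment.
print(rounded_size_x64(20, 16))  # 32
print(rounded_size_arm64(20))    # 24
```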
94+
95+
## Emulation helper ABI routines
96+
97+
ARM64EC code and [thunks](#thunks) use emulation helper routines to transition between x64 and ARM64EC functions.
98+
99+
The following table describes each special ABI routine and the registers the ABI uses. The routines don't modify the listed preserved registers under the ABI column. No assumptions should be made about unlisted registers. On-disk, the ABI routine pointers are null. At load time, the loader updates the pointers to point to the x64 emulator routines.
100+
101+
| Name | Description | ABI |
102+
|--|--|--|
103+
| `__os_arm64x_dispatch_call_no_redirect` | Called by an exit thunk to call an x64 target (either an x64 function or an x64 fast-forward sequence). The routine pushes the ARM64EC return address (in the `LR` register) followed by the address of the instruction that follows a `blr x16` instruction that invokes the x64 emulator. It then runs the `blr x16` instruction | return value in `x8` (`rax`) |
104+
| `__os_arm64x_dispatch_ret` | Called by an entry thunk to return to its x64 caller. It pops the x64 return address from the stack and invokes the x64 emulator to jump to it | N/A |
105+
| `__os_arm64x_check_call` | Called by ARM64EC code with a pointer to an exit thunk and the indirect ARM64EC target address to execute. The ARM64EC target is considered patchable, and execution always returns to the caller with either the same data it was called with, or with modified data | Arguments:<br/>`x9`: The target address<br/>`x10`: The exit thunk address<br/>`x11`: The fast forward sequence address<br/><br/>Out:<br/>`x9`: If the target function was detoured, it contains the address of the fast forward sequence<br/>`x10`: The exit thunk address<br/>`x11`: If the function was detoured, it contains the exit thunk address. Otherwise, the target address jumped to<br/><br/>Preserved registers: `x0`-`x8`, `x15` (`chkstk`), and `q0`-`q7` |
106+
| `__os_arm64x_check_icall` | Called by ARM64EC code, with a pointer to an exit thunk, to handle a jump to a target address that is either x64 or ARM64EC. If the target is x64 and the x64 code hasn't been patched, the routine sets the target address register. It points to the ARM64EC version of the function if one exists. Otherwise, it sets the register to point to the exit thunk that transitions to the x64 target. Then, it returns to the calling ARM64EC code, which then jumps to the address in the register. This routine is a non-optimized version of `__os_arm64x_check_call`, where the target address isn't known at compile time<br/><br/>Used at a call-site of an indirect call | Arguments:<br/>`x9`: The target address<br/>`x10`: The exit thunk address<br/>`x11`: The fast forward sequence address<br/><br/>Out:<br/>`x9`: If the target function was detoured, it contains the address of the fast forward sequence<br/>`x10`: The exit thunk address<br/>`x11`: If the function was detoured, it contains the exit thunk address. Otherwise, the target address jumped to<br/><br/>Preserved registers: `x0`-`x8`, `x15` (`chkstk`), and `q0`-`q7` |
107+
| `__os_arm64x_check_icall_cfg` | Same as `__os_arm64x_check_icall` but also checks that the specified address is a valid Control Flow Guard (CFG) indirect call target | Arguments:<br/>`x10`: The address of the exit thunk<br/>`x11`: The address of the target function<br/><br/>Out:<br/>`x9`: If the target is x64, the address to the function. Otherwise, undefined<br/>`x10`: The address of the exit thunk<br/>`x11`: If the target is x64, it contains the address of the exit thunk. Otherwise, the address of the function<br/><br/>Preserved registers: `x0`-`x8`, `x15` (`chkstk`), and `q0`-`q7` |
108+
| `__os_arm64x_get_x64_information` | Gets the requested part of the live x64 register context | `_Function_class_(ARM64X_GET_X64_INFORMATION) NTSTATUS LdrpGetX64Information(_In_ ULONG Type, _Out_ PVOID Output, _In_ PVOID ExtraInfo)` |
109+
| `__os_arm64x_set_x64_information` | Sets the requested part of the live x64 register context | `_Function_class_(ARM64X_SET_X64_INFORMATION) NTSTATUS LdrpSetX64Information(_In_ ULONG Type, _In_ PVOID Input, _In_ PVOID ExtraInfo)` |
110+
| `__os_arm64x_x64_jump` | Used in signature-less adjustor and other thunks that directly forward (`jmp`) a call to another function that can have any signature, deferring the potential application of the right thunk to the real target | Arguments:<br/>`x9`: target to jump to<br/><br/>All parameter registers preserved (forwarded) |
111+
112+
## Thunks
113+
114+
Thunks are the low-level mechanisms to support ARM64EC and x64 functions calling each other. There are two types: *entry thunks* for entering ARM64EC functions and *exit thunks* for calling x64 functions.
115+
116+
### Entry thunk and intrinsic entry thunks: x64 to ARM64EC function call
117+
118+
To support x64 callers when a C/C++ function is compiled as ARM64EC, the toolchain generates a single entry thunk consisting of ARM64EC machine code. Intrinsics have an entry thunk of their own. All other functions share an entry thunk with all functions that have a matching calling convention, parameters, and return type. The content of the thunk depends on the calling convention of the C/C++ function.
119+
120+
In addition to handling parameters and the return address, the thunk bridges the differences in volatility between ARM64EC and x64 vector registers caused by [ARM64EC vector register mapping](#register-mapping-for-vector-registers):
121+
122+
| ARM64EC register | x64 register | ARM64EC calling convention | ARM64 calling convention | x64 calling convention |
123+
|--|--|--|--|--|
124+
| `v6`-`v15` | `xmm6`-`xmm15` | volatile, but saved/restored in the entry thunk (x64 to ARM64EC) | volatile or partially volatile upper 64 bits | non-volatile |
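
The set of registers that the entry thunk has to save can be derived from the vector register mapping. The sketch below is a model built from the table data only; the names `build_vector_table` and `registers_saved_by_entry_thunk` are hypothetical.

```python
def build_vector_table() -> dict:
    """Map each vector register to (ARM64EC volatility, x64 volatility)."""
    table = {}
    for i in range(0, 6):
        table[f"v{i}"] = ("volatile", "volatile")
    for i in range(6, 16):
        table[f"v{i}"] = ("volatile", "non-volatile")
    for i in range(16, 32):
        table[f"v{i}"] = ("disallowed", "disallowed")
    return table


def registers_saved_by_entry_thunk(table: dict) -> list:
    # An x64 caller expects these to survive the call, but an ARM64EC
    # callee may overwrite them, so the entry thunk preserves them.
    return [
        reg for reg, (arm64ec, x64) in table.items()
        if arm64ec == "volatile" and x64 == "non-volatile"
    ]


saved = registers_saved_by_entry_thunk(build_vector_table())
print(saved)  # v6 through v15
```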
125+
126+
The entry thunk performs the following actions:
127+
128+
| Parameter number | Stack usage | Stack unwind codes |
129+
|--|--|--|
130+
| 0-4 | Stores ARM64EC `v6` and `v7` into the caller-allocated home space<br/><br/>Since the callee is ARM64EC, which doesn't have the notion of a home space, the stored values aren't clobbered.<br/><br/>Allocates an extra 128 bytes on the stack and stores ARM64EC `v8` through `v15`. | `UWOP_SAVE_XMM128` for `xmm6` and `xmm7`<br/><br/>`UWOP_ALLOC_SMALL` + `UWOP_SAVE_XMM128` for `xmm8-xmm15` |
131+
| 5-8 | `x4` = 5th parameter from the stack<br/>`x5` = 6th parameter from the stack<br/>`x6` = 7th parameter from the stack<br/>`x7` = 8th parameter from the stack<br/><br/>If the parameter is SIMD, the `v4`-`v7` registers are used instead | Same as above |
132+
| 9+ | Allocates `AlignUp(NumParams - 8 , 2) * 8` bytes on the stack. \*<br/><br/>Copies the 9th and remaining parameters to this area | `UWOP_ALLOC_SMALL` |
133+
134+
\* Aligning the value to an even number guarantees that the stack remains aligned to 16 bytes
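
The `AlignUp(NumParams - 8, 2) * 8` expression from the table can be checked with a short calculation. This is a sketch of the arithmetic only; `entry_thunk_param_bytes` is a hypothetical name, and the function treats functions with eight or fewer parameters as needing no extra parameter area.

```python
def align_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def entry_thunk_param_bytes(num_params: int) -> int:
    """Bytes allocated for the 9th and later parameters."""
    extra_params = max(num_params - 8, 0)
    # Each parameter slot is 8 bytes; rounding the count up to an
    # even number keeps the allocation a multiple of 16 bytes.
    return align_up(extra_params, 2) * 8


for n in (8, 9, 10, 11):
    print(n, entry_thunk_param_bytes(n))
```

With 9 or 10 parameters the area is 16 bytes, and with 11 parameters it grows to 32 bytes, so the allocation is always a multiple of 16.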
135+
136+
If the function accepts a 32-bit integer parameter, the thunk is permitted to only push 32 bits instead of the full 64 bits of the parent register.
137+
138+
Next, the thunk uses an ARM64 `bl` instruction to call the ARM64EC function. After the function returns, the thunk:
139+
140+
1. Undoes any stack allocations.
141+
1. Calls the `__os_arm64x_dispatch_ret` emulator helper to pop the x64 return address and resume x64 emulation.
142+
143+
### Exit thunk: ARM64EC to x64 function call
144+
145+
For every call that an ARM64EC C/C++ function makes to potential x64 code, the MSVC toolchain generates an exit thunk. The content of the thunk depends on the parameters of the x64 callee and whether the callee is using the standard calling convention or `__vectorcall`. The compiler obtains this information from a function declaration for the callee.
146+
147+
First, the thunk pushes the return address that's in the ARM64EC `lr` register and a dummy 8-byte value to guarantee that the stack is aligned to 16 bytes. Second, the thunk handles the parameters:
148+
149+
| Parameter number | Stack usage | Stack unwind codes |
150+
|--|--|--|
151+
| 0-4 | Allocates 32 bytes of home space on the stack | `UWOP_ALLOC_SMALL` |
152+
| 5-8 | Allocates `AlignUp(NumParams - 4, 2) * 8` more bytes higher up on the stack. \* <br/><br/> Copies the 5th and any subsequent parameters from ARM64EC's `x4`-`x7` to this extra space | `UWOP_ALLOC_SMALL` |
153+
| 9+ | Copies the 9th and remaining parameters to the extra space | `UWOP_ALLOC_SMALL` |
154+
155+
\* Aligning the value to an even number guarantees that the stack remains aligned to 16 bytes.
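
Putting the exit thunk's stack usage together, the following sketch adds up the pieces described above: the pushed `lr` plus the dummy value, the 32-byte home space, and the `AlignUp(NumParams - 4, 2) * 8` area. The function name `exit_thunk_stack_bytes` and the returned keys are hypothetical, and the model assumes the same formula applies to functions with nine or more parameters.

```python
def align_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def exit_thunk_stack_bytes(num_params: int) -> dict:
    """Model the stack space an exit thunk uses before calling x64 code."""
    return_area = 8 + 8  # saved lr plus a dummy 8-byte value
    home_space = 32      # x64 home space for the four register parameters
    stack_params = align_up(max(num_params - 4, 0), 2) * 8
    return {
        "return_area": return_area,
        "home_space": home_space,
        "stack_params": stack_params,
        "total": return_area + home_space + stack_params,
    }


for n in (2, 5, 6, 9):
    print(n, exit_thunk_stack_bytes(n))
```

In this model the total is always a multiple of 16, which matches the alignment requirement stated in the footnote.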
156+
157+
Third, the thunk calls the `__os_arm64x_dispatch_call_no_redirect` emulator helper to invoke the x64 emulator to run the x64 function. The call must be a `blr x16` instruction (conveniently, `x16` is a volatile register). A `blr x16` instruction is required because the x64 emulator parses this instruction as a hint.
158+
159+
The x64 function usually attempts to return to the emulator helper by using an x64 `ret` instruction. At this point, the x64 emulator detects that it's in ARM64EC code. It then reads the preceding 4-byte hint that happens to be the ARM64 `blr x16` instruction. Since this hint indicates that the return address is in this helper, the emulator jumps directly to this address.
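
The hint check can be pictured as reading the 4 bytes that precede the return address and comparing them with the encoding of `blr x16`. The sketch below is a model, not the emulator's implementation: the function names are hypothetical, and the only external fact used is the ARM64 encoding of `blr Xn` (`0xD63F0000 | (n << 5)`), stored little-endian.

```python
import struct

BLR_BASE = 0xD63F0000  # ARM64 encoding of `blr x0`


def encode_blr(rn: int) -> int:
    """Return the 32-bit encoding of `blr x<rn>`."""
    return BLR_BASE | (rn << 5)


BLR_X16 = encode_blr(16)  # 0xD63F0200


def returns_after_blr_x16(code: bytes, return_offset: int) -> bool:
    """Check whether the instruction just before return_offset is `blr x16`."""
    if return_offset < 4 or return_offset > len(code):
        return False
    (word,) = struct.unpack_from("<I", code, return_offset - 4)
    return word == BLR_X16


# Hypothetical code buffer: `blr x16` followed by `nop`.
NOP = 0xD503201F
code = struct.pack("<II", BLR_X16, NOP)
print(returns_after_blr_x16(code, 4))  # True
print(returns_after_blr_x16(code, 8))  # False
```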
160+
161+
The x64 function is permitted to return to the emulator helper using any branch instruction, including x64 `jmp` and `call`. The emulator handles these scenarios as well.
162+
163+
When the helper then returns to the thunk, the thunk:
164+
165+
1. Undoes any stack allocation.
166+
1. Pops the ARM64EC `lr` register.
167+
1. Executes an ARM64 `ret lr` instruction.
168+
169+
## x64 Unwinding of an ARM64EC function
170+
171+
### Recovering the return address
172+
173+
At the beginning of an x64 function, the stack pointer points to the return address that was pushed onto the stack by the caller's `call` instruction. In contrast, at the beginning of an ARM64EC function, the return address is in the `lr` register, set by the caller's `bl` instruction. Therefore, when unwinding the deepest frame of a stack, if the frame corresponds to an ARM64EC function, the x64 unwinder must recreate the value of ARM64EC `lr` by reading the value stashed in the x87 registers when the exception occurred. In addition, the new x64 unwind code `UWOP_SAVE_RET` handles the prolog saving ARM64EC `lr` to the stack.
174+
175+
## ARM64EC function name decoration
176+
177+
An ARM64EC function name has a secondary decoration applied after any language-specific decoration. For functions with C linkage (whether compiled as C or by using `extern "C"`), a `#` is prepended to the name. For C++ decorated functions, a `$$h` tag is inserted into the name.
178+
179+
```
180+
foo => #foo
181+
?foo@@YAHXZ => ?foo@@$$hYAHXZ
182+
```
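
The two examples above can be reproduced with a small helper. This is a sketch under a loud assumption: the position of the `$$h` tag is inferred from the single C++ example shown (immediately after the first `@@`), and the real rule for more complex decorated names may differ. The name `decorate_arm64ec` is hypothetical.

```python
def decorate_arm64ec(name: str) -> str:
    """Apply the secondary ARM64EC decoration to a function name."""
    if name.startswith("?"):
        # C++ decorated name. Assumption: the tag goes right after the
        # first "@@", as in the single example in this document.
        head, sep, tail = name.partition("@@")
        if not sep:
            raise ValueError(f"unsupported decorated name: {name!r}")
        return head + sep + "$$h" + tail
    # C linkage: prepend '#'.
    return "#" + name


print(decorate_arm64ec("foo"))          # #foo
print(decorate_arm64ec("?foo@@YAHXZ"))  # ?foo@@$$hYAHXZ
```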
183+
184+
## `__vectorcall`
185+
186+
The ARM64EC toolchain currently doesn't support `__vectorcall`. The compiler emits an error when it detects `__vectorcall` usage with ARM64EC.
187+
188+
## See also
189+
190+
[Understanding ARM64EC ABI and assembly code](/windows/arm/arm64ec-abi)\
191+
[Common Visual C++ ARM migration issues](common-visual-cpp-arm-migration-issues.md)\
192+
[Decorated names](./reference/decorated-names.md)
