gh-139808: Add branch protections for aarch64 in asm_trampoline.S#130864
Conversation
|
The current equivalent eh_frame for the aarch64 assembly code, generated by C code doesn't seem to properly match, although at the same time it works fine. Here is a comparison between x86_64 and aarch64. x86_64 eh_frame: Equivalent to: cpython/Python/perf_jit_trampoline.c Lines 474 to 478 in d0ecbdd Whereas the eh_frame for aarch64: Equivalent to: cpython/Python/perf_jit_trampoline.c Lines 480 to 489 in d0ecbdd |
a6fe267 to
fe6fb29
Compare
|
Ah wasn't taking into account the code factor alignment for aarch64. Fixed that. |
fe6fb29 to
8772759
Compare
|
eh_frame for the new assembly which should be implemented in C: 0000000 0000000000000010 0000000 CIE 00000014 0000000000000020 00000018 FDE cie=00000000 pc=0000000000000000..000000000000001c |
8772759 to
8519ff1
Compare
|
You might add a NEWS entry using blurb or blurb-it, see: https://devguide.python.org/ |
8519ff1 to
1728096
Compare
|
Noting here that while the protections are enabled with this PR, Perf unwinding which includes the Python functions does not work without Frame Pointers. For that the And they should match this .eh_frame: Haven't yet figured the correct way to do that though as utilizing the way that the ARM abi describes it by using 0x2D for DW_CFA_AARCH64_negate_ra_state, doesn't work. |
|
@diegorusso might be able to help, or at least know someone |
|
@stratakis and I have been discussed this in the past. I'm not sure what the status is. Do you need further help? |
|
I haven't managed to make any progress yet. I took a look again recently but it's not possible to test it till #139544 is resolved. |
|
This PR is stale because it has been open for 30 days with no activity. |
|
|
||
| #if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1 | ||
| #define BTI_J hint 36 /* bti j: for jumps, IE br instructions */ | ||
| #define BTI_C hint 34 /* bti c: for calls, IE bl instructions */ |
There was a problem hiding this comment.
This LGTM, still need to test, but just a few comments:
- Do we need to support binutils older than 2.31? If we don't we can just use the mnemonics.
- Do we need it added conditionally? We can just add the instructions, they're binary compatible with older machines so we can just do the PAC path and we get both. It will just NOP on unsupported machines. So if this can take the cost of a NOP, we can simplify this quite a bit.
Don't let my comments derail this, they can always be done in follow ups IMHO.
There was a problem hiding this comment.
The only OS that still has glibc < 2.31 is for RHEL8. While the buildbots do not compile python >= 3.12 on RHEL8, still someone trying to compile Python 3.13 and higher there won't have it compiled. Granted RHEL8 doesn't enable mbranch-protection by default, I'd rather have at least till RHEL8 goes out of support the old way for delaring pac and bti.
Regarding the if else conditions, while they can be simplified, the assembly code maps nicely somewhat 1on1 with the C code generating dwarf unwinding info for the non-frame pointer case, simplifying them there would make the visual comparisons a tad more cumbersome.
The BTI flag must be applied in assembler sources for this class of attacks to be mitigated on newer aarch64 processors. See also: https://sourceware.org/annobin/annobin.html/Test-branch-protection.html and https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/enabling-pac-and-bti-on-aarch64
1728096 to
c8eca8e
Compare
Documentation build overview
67 files changed ·
|
|
#139544 is resolved in the latest kernel releases, so I had time to extensively test this. The PR should be ready and I've rebased to account for the latest upstream changes with the jit_unwind.c. Tested on aarch64 VMs running on a Neoverse-N1, the one hosting the aarch64 buildbots. A Fedora Rawhide machine without PAC/BTI and then also on a QEMU TCG machine with PAC/BTI active (-cpu max). Both machines had the latest kernel release 7.1.0-rc2 and GCC 16.1.1. I've also tested for good measure on a Fedora 43 machine with kernel 7.0.4 and GCC 15.2.1, the results were the same. On the non-PAC/BTI machine: On the emulated PAC/BTI machine which was the slowlest testing I've experienced: Also the DW_CFA_AARCH64_negate_ra_state is correctly emitted in prologue and epilogue of the perf trampoline FDE. And BTI, PAC, GCS are present on both libpython and python binary |
|
I built the Python main branch with this change on Fedora 43 AArch64 using: I had to add many options to please annocheck. annocheckannocheck: pass on the Python executable, fail on libpython I don't know how to fix the "gaps" issue, I already passed readelfreadelf: test_perf_profiler: But I got some random test_python_calls_appear_in_the_stack_if_perf_activated() failures because "Warning: Processed 381 events and lost 1 chunks! Check IO/CPU overload!" was logged. Linux perfperf: Python frames are recognized as expected. my_script.py: def foo(n):
result = 0
for _ in range(n):
result += 1
return result
def bar(n):
foo(n)
def baz(n):
bar(n)
if __name__ == "__main__":
baz(1000000)Commands: |
|
Without this change, |
vstinner
left a comment
There was a problem hiding this comment.
LGTM. With this change, readelf reports BTI and PAC properties on the Python executable and libpython according to my manual tests. Without this change, BTI and PAC properties are missing.
|
I'm not comfortable to backport this change to 3.13 and 3.14 stable branches, since I'm not an AArch64 export. But I'm fine with backporting the change to the 3.15 branch. |
This issue affects versions from 3.13 and up (although the dwarf code lives in perf_jit_trampoline.c there). The issue is quite isolated and resolves a security issue so I think backporting it makes sense. |
|
Thanks @stratakis for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.15. |
|
GH-149730 is a backport of this pull request to the 3.15 branch. |
|
This builds on top of #128606