Page MenuHomePhabricator

[AArch64] Change location of frame-record within callee-save area.
ClosedPublic

Authored by sdesmalen on Fri, Aug 2, 6:25 AM.

Details

Summary

This patch changes the location of the frame-record (FP, LR) to the
bottom of the callee-saved area. According to the AAPCS the location of
the frame-record within the stackframe is unspecified (section 5.2.3 The
Frame Pointer), so the compiler should be free to choose a different
location.

The reason for changing the location of the frame-record is to prepare
the frame for allocating an SVE area below the callee-saves. This way the
compiler can use the VL-scaled addressing modes to directly access SVE
objects from the frame-pointer.

          :                :   
      | stack |        | stack |
      |  args |        |  args |
      +-------+        +-------+
      |  x30  |        |  x19  |
      |  x29  |        |  x20  |
FP -> |- - - -|        |  x21  |
      |  x19  |   ==>  |  x22  |
      |  x20  |        |- - - -|
      |  x21  |        |  x30  |
      |  x22  |        |  x29  |
      +-------+        +-------+ <- FP
      |///////|        |///////|         // realignment gap 
      |- - - -|        |- - - -|
      |spills/|        |spills/|
      | locals|        | locals|
SP -> +-------+        +-------+ <- SP

Things to point out:

  • The CFI instructions emit the FP/LR registers first (in the order as specified in the CallingConvention.td file) in order to support the Compact Unwinding format.
  • The algorith to find a paired register should be prevented from accidentally pairing some callee-saved register with LR that is not FP, since they should always be paired together when the frame has a frame-record.

Diff Detail

Repository
rL LLVM

Event Timeline

sdesmalen created this revision.Fri, Aug 2, 6:25 AM

I can't really comment on the correctness of this but other than the one comment I'd like to see added, LGTM.

lib/Target/AArch64/AArch64FrameLowering.cpp
1706 ↗(On Diff #213021)

Add a comment about what this does.

I can't think of any issue this would cause, in general; following frame records should still work the same way, and DWARF unwinding should still work.

test/CodeGen/AArch64/wineh-try-catch-realign.ll
13 ↗(On Diff #213021)

I don't think this is legal. On Windows, paired stores in the prologue must be in the form "stp xN, xN+1, [...]"; otherwise, there is no way to represent the store in the unwind format. (See https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling?view=vs-2019#unwind-codes )

sdesmalen marked an inline comment as done.Sun, Aug 4, 3:14 AM
sdesmalen added inline comments.
test/CodeGen/AArch64/wineh-try-catch-realign.ll
13 ↗(On Diff #213021)

The merging of two separate STR instructions into one STP seems to done by the LoadStoreOptimizer, not PEI. The unwind information generated seems correct.

Running with stop-after=prologepilog:

frame-setup STRXui killed $x28, $sp, 2 :: (store 8 into %stack.2)
frame-setup SEH_SaveReg 28, 16
frame-setup STRXui killed $x19, $sp, 3 :: (store 8 into %stack.1)
frame-setup SEH_SaveReg 19, 24

Running readobj --unwind on the binary produced by this test gives:

Prologue [
  0xd003              ; str x19, [sp, #24]
  0xd242              ; str x28, [sp, #16]
  0x83                ; stp x29, x30, [sp, #-32]!
  0xe4                ; end
]
kristof.beyls added inline comments.Mon, Aug 5, 1:02 AM
lib/Target/AArch64/AArch64FrameLowering.cpp
88 ↗(On Diff #213021)

I wonder if it would be a good idea to record the rationale for (some of the) order in which specific blocks are put in the frame record?
I think it would be useful to extend this comment, stating that the frame record is placed specifically to be able to make use of VL-scaled addressing modes to directly access SVE objects from the frame-pointer.
I feel the frame layout code is pretty complex, and therefore it helps to spell out the higher-level design rationales at the beginning of this file in a comment.

efriedma added inline comments.Mon, Aug 5, 3:19 PM
test/CodeGen/AArch64/wineh-try-catch-realign.ll
13 ↗(On Diff #213021)

Windows unwinding determines whether an instruction is part of the prologue by computing the distance in bytes between the current pc and the entry point. So if the real prologue is shorter than the encoded prologue, the exception unwinder could be tricked into skipping important steps in unwinding.

Even if your patch isn't causing this issue, it's more likely to come up with your patch, I think.

sdesmalen updated this revision to Diff 213654.Tue, Aug 6, 10:20 AM
  • Added comments describing rationale of the callee-save layout.
  • Added comments to invalidateRegisterPairing
  • Rebased patch onto D65817 which addresses the invalid pairing of callee save/restore instructions in the LoadStoreOptimizer.
sdesmalen marked 6 inline comments as done.Tue, Aug 6, 10:23 AM
sdesmalen added inline comments.
lib/Target/AArch64/AArch64FrameLowering.cpp
88 ↗(On Diff #213021)

Good point. I have added some words to describe the rationale.

test/CodeGen/AArch64/wineh-try-catch-realign.ll
13 ↗(On Diff #213021)

Thanks for pointing this out! I've created D65817 to address that issue.

Unfortunately I don't think this is viable for Darwin platforms, at least not yet. Our compact unwind encoding just has a bitmask for which registers are saved rather than saying where relative to fp.

Longer term there's room for a new encoding to handle this, but that's not something we can do quickly. It looks like there are at least 5 projects that would need to be updated.

lib/Target/AArch64/AArch64FrameLowering.cpp
355–356 ↗(On Diff #213654)

If changing just the order of the CFI directives convinces the AsmBackend to produce compact unwind for this new scheme then it doesn't have enough checks (understandable, the frame layout is normally very fixed).

sdesmalen updated this revision to Diff 213892.Wed, Aug 7, 8:31 AM
sdesmalen marked 2 inline comments as done.
  • Updated the patch to retain the current location of the frame-record in the callee-save area for Darwin.
  • Removed the code that tried to emit the CFI information for FP and LR before the other callee-saves, as this was only needed on Darwin for the compact unwind encoding.

Unfortunately I don't think this is viable for Darwin platforms, at least not yet. Our compact unwind encoding just has a bitmask for which registers are saved rather than saying where relative to fp.

Longer term there's room for a new encoding to handle this, but that's not something we can do quickly. It looks like there are at least 5 projects that would need to be updated.

Thanks for pointing out. I have updated the patch to retain the current callee-save area layout for Darwin (regardless of whether compact unwind encoding is required).
The changes aren't that significant, but it would be nice to use the same layout at some point.

thegameg added inline comments.Wed, Aug 7, 9:51 AM
lib/Target/AArch64/AArch64FrameLowering.cpp
47 ↗(On Diff #213892)

Maybe it's worth adding a note pointing out that on Darwin this is different?

sdesmalen updated this revision to Diff 213978.Wed, Aug 7, 1:15 PM
sdesmalen marked 2 inline comments as done.

Added an extra comment describing the swapped order of the frame-record location on Darwin.

lib/Target/AArch64/AArch64FrameLowering.cpp
47 ↗(On Diff #213892)

Good spot, thanks!

kristof.beyls added inline comments.Thu, Aug 8, 12:15 AM
lib/Target/AArch64/AArch64FrameLowering.cpp
91–95 ↗(On Diff #213978)

Mostly looks good to me.
I wonder if it'd be useful to explicitly state that the size of "SVE stack objects" aren't known at compile time, and hence the offset that needs to be calculated needs to be multiplied by an unknown-at-compile time factor?
This is of course all very obvious when working already for some time on SVE; but some readers of this comment may be unfamiliar with SVE?

sdesmalen updated this revision to Diff 214081.Thu, Aug 8, 1:28 AM

Clarified comment.

sdesmalen marked 3 inline comments as done.Thu, Aug 8, 1:30 AM
sdesmalen added inline comments.
lib/Target/AArch64/AArch64FrameLowering.cpp
91–95 ↗(On Diff #213978)

Agreed. I've added some more words to describe that.

sdesmalen marked an inline comment as done.Mon, Aug 12, 3:57 PM

Thanks for all the feedback so far. I think I've addressed all comments and suggestions, are we happy to move forward with this patch?

Thanks for all the feedback so far. I think I've addressed all comments and suggestions, are we happy to move forward with this patch?

The comments and changes are great. I'm happy to move forward but I don't think I should be the one to accept, as I only have experience with Linux/AArch64 and not other systems like Windows and Darwin.

efriedma added inline comments.Tue, Aug 13, 4:45 PM
lib/Target/AArch64/AArch64CallingConvention.td
342 ↗(On Diff #214081)

Is the order of the registers in this list important? If it is, should we explain it in a comment somewhere?

test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll
583 ↗(On Diff #214081)

I don't understand what's happening here; are we reducing register pressure somhow?

sdesmalen updated this revision to Diff 215138.Wed, Aug 14, 9:06 AM
sdesmalen marked 2 inline comments as done.

Added comment to clarify that the order of the callee-saves as specified in AArch64CallingConvention.td actually affects the layout of the callee-saves in the generated stackframe.

sdesmalen marked an inline comment as done.Wed, Aug 14, 9:07 AM
sdesmalen added inline comments.
lib/Target/AArch64/AArch64CallingConvention.td
342 ↗(On Diff #214081)

This is indeed important since various parts in AArch64FrameLowering depend on this order (see also my other comment). I've added a comment to describe this, thanks!

test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll
583 ↗(On Diff #214081)

Before it did an extra spill of x28 to use as scratch register, but now it uses x30 for that purpose.

This is due to a quirk in determineCalleeSaves, which iterates through the list of callee-saves and then spills the last unused register as a scratch register. With the previous order in which the callee-saves were specified in the .td file, this last register was x28. With the new ordering, the last unused register is x30. This register already conveniently stored as part of the frame-record since we're using a frame-pointer, so there is no need to spill an additional register.

This revision is now accepted and ready to land.Wed, Aug 14, 4:24 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptThu, Aug 15, 3:35 AM
leonardchan added a subscriber: leonardchan.EditedThu, Aug 15, 5:37 PM

I have a suspicion that this might be causing the assertion error we see in https://bugs.llvm.org/show_bug.cgi?id=43015. Checking this now. Just wanted to raise awareness in the meantime.

sdesmalen marked an inline comment as done.Fri, Aug 16, 8:49 AM

I have a suspicion that this might be causing the assertion error we see in https://bugs.llvm.org/show_bug.cgi?id=43015. Checking this now. Just wanted to raise awareness in the meantime.

Sorry for that, there was an obvious condition missing in that assert. I have relanded the patch in rGf28e1128d9ef .

Hi again. After some bisecting, we found that the relanding of this (rGf28e1128d9e) was causing a segfault on our aarch64 asan bots.

Stack trace when attempting to launch Fuchsia on QEMU:

[318.389] 01525.01742> <== exception: process boot/test/sys/thread-state-test[1099746] thread initial-thread[1099761]
[318.389] 01525.01742> <== sw breakpoint, PC at 0xf48c53187fc0
[318.389] 01525.01742>  x0  0xee726573756d65ee x1                   0 x2                   0 x3                0x34
[318.389] 01525.01742>  x4      0xacbe942db2e0 x5      0xa5edc685d51c x6  0x2820202020202020 x7  0x777320676e697375
[318.389] 01525.01742>  x8  0xee726573756d65ee x9  0x988573be33452e9e x10                0x7 x11     0xa5edc685a078
[318.389] 01525.01742>  x12     0x14bdb8d0b40f x13     0x14bdb8d0b40d x14 0xf8f8f8f8f8f8f8f8 x15                  0
[318.390] 01525.01742>  x16               0x1b x17     0xe196795d09a8 x18     0xe287a07e5000 x19         0xfffffff1
[318.390] 01525.01742>  x20     0xf48c5317f8e0 x21     0xacbe942db8f0 x22          0x5743993 x23     0xacbe942db800
[318.390] 01525.01742>  x24     0xacbe942db810 x25     0x193d9a9c80d0 x26     0xacbe942db7e0 x27     0x1597d285b6fc
[318.390] 01525.01742>  x28     0x1597d285b702 x29     0xacbe942db7a0 lr      0xf48c53187fac sp      0xacbe942db7a0
[318.390] 01525.01742>  pc      0xf48c53187fc0 psr         0x60000000
[318.390] 01525.01742> arch: aarch64
[318.391] 01525.01742> dso: id=3ab1b3612220919e base=0xf48c5317c000 name=app:boot/test/sys/thread-state-test
[318.391] 01525.01742> dso: id=6dd3318a7516cc20 base=0xe79ffff8b000 name=libc++.so.2
[318.391] 01525.01742> dso: id=4e7ba3b6018e3afe base=0xe196795ca000 name=<vDSO>
[318.391] 01525.01742> dso: id=1038afe56f0ed1b5 base=0xd0a88b63e000 name=libc++abi.so.1
[318.391] 01525.01742> dso: id=781087240e810874 base=0xc9ecd4e35000 name=libunittest.so
[318.391] 01525.01742> dso: id=8f25bd2b8480d32c base=0xb078a64d3000 name=libfdio.so
[318.391] 01525.01742> dso: id=7f401cc813f2cf9c base=0xa5edc6730000 name=libc.so
[318.391] 01525.01742> dso: id=62ede582d82f7da3 base=0x9535580de000 name=libclang_rt.asan.so
[318.391] 01525.01742> dso: id=738a2fbe70bb8f31 base=0x9460793bf000 name=libunwind.so.1
[318.391] 01525.01742> dso: id=6428e355aa5298dd base=0x55e1360d9000 name=liblaunchpad.so
[318.392] 01525.01742>    #1.1  0x0000f48c53187fc0 in backtrace_request() ../../out/default.zircon/../../zircon/system/ulib/backtrace-request/include/lib/backtrace-request/backtrace-request.h:30 <<VMO#1099739=thread-state-test>>+0xbfc0 sp 0xacbe942db7a0
[318.392] 01525.01742>    #1    0x0000f48c53187fc0 in tu_fatal ../../out/default.zircon/../../zircon/system/ulib/test-utils/test-utils.cc:82 <<VMO#1099739=thread-state-test>>+0xbfc0 sp 0xacbe942db7a0
[318.394] 01525.01742>    #2    0x0000f48c53188488 in tu_channel_read ../../out/default.zircon/../../zircon/system/ulib/test-utils/test-utils.cc:165 <<VMO#1099739=thread-state-test>>+0xc488 sp 0xacbe942db7c0
[318.395] 01525.01742>    #3    0x0000f48c53187004 in recv_msg(unsigned int, Message*) ../../out/default.zircon/../../zircon/system/utest/thread-state/thread-state.cc:94 <<VMO#1099739=thread-state-test>>+0xb004 sp 0xacbe942db7e0
[318.396] 01525.01742>    #4    0x0000f48c53186ab0 in get_child_thread(unsigned int, unsigned int*) ../../out/default.zircon/../../zircon/system/utest/thread-state/thread-state.cc:410 <<VMO#1099739=thread-state-test>>+0xaab0 sp 0xacbe942db8c0
[318.397] 01525.01742>    #5.4  0x0000f48c53182758 in (anonymous namespace)::FidlDecoder::SetError(char const*) ../../out/default.zircon/../../zircon/system/ulib/fidl/decoding.cc:200 <<VMO#1099739=thread-state-test>>+0x6758 sp 0xacbe942db9c0
[318.397] 01525.01742>    #5.3  0x0000f48c53182758 in (anonymous namespace)::FidlDecoder::OnError(char const*) ../../out/default.zircon/../../zircon/system/ulib/fidl/decoding.cc:179 <<VMO#1099739=thread-state-test>>+0x6758 sp 0xacbe942db9c0
[318.397] 01525.01742>    #5.2  0x0000f48c53182758 in fidl::internal::Walker<(anonymous namespace)::FidlDecoder>::Walk((anonymous namespace)::FidlDecoder&) ../../out/default.zircon/../../zircon/system/ulib/fidl/include/lib/fidl/walker.h:706 <<VMO#1099739=thread-state-test>>+0x6758 sp 0xacbe942db9c0
[318.397] 01525.01742>    #5.1  0x0000f48c53182758 in void fidl::Walk<(anonymous namespace)::FidlDecoder>((anonymous namespace)::FidlDecoder&, fidl_type const*, (anonymous namespace)::FidlDecoder::StartingPoint) ../../out/default.zircon/../../zircon/system/ulib/fidl/include/lib/fidl/walker.h:893 <<VMO#1099739=thread-state-test>>+0x6758 sp 0xacbe942db9c0
[318.397] 01525.01742>    #5    0x0000f48c53182758 in channel_test() ../../out/default.zircon/../../zircon/system/ulib/fidl/decoding.cc:268 <<VMO#1099739=thread-state-test>>+0x6758 sp 0xacbe942db9c0
[318.399] 01525.01742>    #6.3  0x0000c9ecd4e3b40c in unittest_run_test(char const*, bool (*)(), test_info**, bool*) ../../out/default.zircon/../../zircon/system/ulib/unittest/unittest.cc:268 <<VMO#1099829=asan/libunittest.s>+0x640c sp 0xacbe942dbae0
[318.399] 01525.01742>    #6.2  0x0000c9ecd4e3b40c in unittest_run_named_test::$_1::operator()() const ../../out/default.zircon/../../zircon/system/ulib/unittest/unittest.cc:296 <<VMO#1099829=asan/libunittest.s>+0x640c sp 0xacbe942dbae0
[318.399] 01525.01742>    #6.1  0x0000c9ecd4e3b40c in void run_with_watchdog<unittest_run_named_test::$_1>(test_type, char const*, unittest_run_named_test::$_1) ../../out/default.zircon/../../zircon/system/ulib/unittest/unittest.cc:285 <<VMO#1099829=asan/libunittest.s>+0x640c sp 0xacbe942dbae0
[318.399] 01525.01742>    #6    0x0000c9ecd4e3b40c in unittest_run_named_test ../../out/default.zircon/../../zircon/system/ulib/unittest/unittest.cc:295 <<VMO#1099829=asan/libunittest.s>+0x640c sp 0xacbe942dbae0
[318.399] 01525.01742>    #7.4  0x0000f48c53181260 in (anonymous namespace)::FidlHandleCloser::SetError(char const*) ../../out/default.zircon/../../zircon/system/ulib/fidl/handle_closing.cc:93 <<VMO#1099739=thread-state-test>>+0x5260 sp 0xacbe942dbbe0
[318.399] 01525.01742>    #7.3  0x0000f48c53181260 in (anonymous namespace)::FidlHandleCloser::OnError(char const*) ../../out/default.zircon/../../zircon/system/ulib/fidl/handle_closing.cc:84 <<VMO#1099739=thread-state-test>>+0x5260 sp 0xacbe942dbbe0
[318.399] 01525.01742>    #7.2  0x0000f48c53181260 in fidl::internal::Walker<(anonymous namespace)::FidlHandleCloser>::Walk((anonymous namespace)::FidlHandleCloser&) ../../out/default.zircon/../../zircon/system/ulib/fidl/include/lib/fidl/walker.h:648 <<VMO#1099739=thread-state-test>>+0x5260 sp 0xacbe942dbbe0
[318.399] 01525.01742>    #7.1  0x0000f48c53181260 in void fidl::Walk<(anonymous namespace)::FidlHandleCloser>((anonymous namespace)::FidlHandleCloser&, fidl_type const*, (anonymous namespace)::FidlHandleCloser::StartingPoint) ../../out/default.zircon/../../zircon/system/ulib/fidl/include/lib/fidl/walker.h:893 <<VMO#1099739=thread-state-test>>+0x5260 sp 0xacbe942dbbe0
[318.399] 01525.01742>    #7    0x0000f48c53181260 in thread_state_tests(bool, char const*) ../../out/default.zircon/../../zircon/system/ulib/fidl/handle_closing.cc:120 <<VMO#1099739=thread-state-test>>+0x5260 sp 0xacbe942dbbe0
[318.401] 01525.01742>    #8.1  0x0000c9ecd4e3a44c in unittest_run_all_tests_etc(char const*, test_type, char const*, char const*, bool) ../../out/default.zircon/../../zircon/system/ulib/unittest/all-tests.cc:46 <<VMO#1099829=asan/libunittest.s>+0x544c sp 0xacbe942dbcc0
[318.401] 01525.01742>    #8    0x0000c9ecd4e3a44c in unittest_run_all_tests ../../out/default.zircon/../../zircon/system/ulib/unittest/all-tests.cc:207 <<VMO#1099829=asan/libunittest.s>+0x544c sp 0xacbe942dbcc0
[318.401] 01525.01742>    #9.4  0x0000f48c53184bbc in (anonymous namespace)::FidlDecoder::SetError(char const*) ../../out/default.zircon/../../zircon/system/ulib/fidl/decoding.cc:197 <<VMO#1099739=thread-state-test>>+0x8bbc sp 0xacbe942dbdc0
[318.401] 01525.01742>    #9.3  0x0000f48c53184bbc in (anonymous namespace)::FidlDecoder::VisitPointer((anonymous namespace)::Position, void**, unsigned int, (anonymous namespace)::Position*) ../../out/default.zircon/../../zircon/system/ulib/fidl/decoding.cc:76 <<VMO#1099739=thread-state-test>>+0x8bbc sp 0xacbe942dbdc0
[318.401] 01525.01742>    #9.2  0x0000f48c53184bbc in fidl::internal::Walker<(anonymous namespace)::FidlDecoder>::Walk((anonymous namespace)::FidlDecoder&) ../../out/default.zircon/../../zircon/system/ulib/fidl/include/lib/fidl/walker.h:680 <<VMO#1099739=thread-state-test>>+0x8bbc sp 0xacbe942dbdc0
[318.401] 01525.01742>    #9.1  0x0000f48c53184bbc in void fidl::Walk<(anonymous namespace)::FidlDecoder>((anonymous namespace)::FidlDecoder&, fidl_type const*, (anonymous namespace)::FidlDecoder::StartingPoint) ../../out/default.zircon/../../zircon/system/ulib/fidl/include/lib/fidl/walker.h:893 <<VMO#1099739=thread-state-test>>+0x8bbc sp 0xacbe942dbdc0
[318.401] 01525.01742>    #9    0x0000f48c53184bbc in main ../../out/default.zircon/../../zircon/system/ulib/fidl/decoding.cc:268 <<VMO#1099739=thread-state-test>>+0x8bbc sp 0xacbe942dbdc0
[318.403] 01525.01742>    #10   0x0000a5edc67bd134 in start_main ../../out/default.zircon/../../zircon/third_party/ulib/musl/src/env/__libc_start_main.c:93 <<VMO#1099752=asan/ld.so.1>>+0x8d134 sp 0xacbe942dbe00
[318.403] 01525.01742>    #11   0x0000a5edc67bd864 in __libc_start_main ../../out/default.zircon/../../zircon/third_party/ulib/musl/src/env/__libc_start_main.c:179 <<VMO#1099752=asan/ld.so.1>>+0x8d864 sp 0xacbe942dc000
[318.403] 01525.01742>    #12   0x0000f48c53187f5c in _start ../../out/default.zircon/../../zircon/system/ulib/c/Scrt1.cc:7 <<VMO#1099739=thread-state-test>>+0xbf5c sp 0xa637d5b2fff0

Could you look into this again? Thanks.

Could you temporarily revert this in the meantime? This is blocking our clang roll that we need to fix another Clang issue we're hitting.

Could you look into this again? Thanks.

Happy to look into this, but the stacktrace you pasted doesn't give enough information for me to reproduce. Can you give some more details?

Could you temporarily revert this in the meantime? This is blocking our clang roll that we need to fix another Clang issue we're hitting.

From the information you provided it is not clear to me if the issue is actually caused by this patch or whether this is an issue in Fuchsia itself. Do you know if Fuchsia makes any implicit assumptions on the layout of the callee-save area beyond what is specified in the AAPCS? If so, we may be able to temporarily disable this for Fuchsia until that issue is resolved, but I'm hesitant to just make this change without understanding what the issue is first.