This is an archive of the discontinued LLVM Phabricator instance.

[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
ClosedPublic

Authored by pratlucas on May 6 2022, 7:03 AM.

Details

Summary

Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(-mframe-chain=[none|aapcs|aapcs+leaf]) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Diff Detail

Event Timeline

pratlucas created this revision.May 6 2022, 7:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2022, 7:03 AM
pratlucas requested review of this revision.May 6 2022, 7:03 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 6 2022, 7:03 AM
pratlucas updated this revision to Diff 427632.May 6 2022, 7:19 AM

Addressing linting messages.

We probably want testcases for accessing various parts of the stack frame with FP stored in r11; for example, can we correctly access arguments passed on the stack if there's stack realignment?

llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
263

It seems sort of fragile to assume the entry block doesn't contain any tPUSH instructions; we can use them in places other than the prologue. Can we use the FrameSetup flag?

pratlucas updated this revision to Diff 429237.May 13 2022, 7:34 AM

Adding frame access test coverage, adding check for FrameSetup flag and adjusting calculation of FP value.

pratlucas marked an inline comment as done.May 13 2022, 7:42 AM
pratlucas added inline comments.
llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
263

This approach is fragile indeed, I guess the fact that high reg spills are less usual has helpep it go unnoticed for so long.
I've added the checks for the FrameSetup flag to make sure it doesn't run over into unrelated instructions.

efriedma added inline comments.May 13 2022, 11:02 AM
llvm/test/CodeGen/Thumb/frame-access.ll
44

ldr r0, [r11, #8] is not a valid Thumb1 instruction.

pratlucas updated this revision to Diff 430324.May 18 2022, 4:28 AM
pratlucas marked an inline comment as done.

Fix incorrect use of r11 on load/store instructions.

pratlucas marked an inline comment as done.May 18 2022, 4:29 AM
efriedma added inline comments.May 18 2022, 1:56 PM
llvm/lib/Target/ARM/ARMFrameLowering.cpp
2226

How do you end up with tSTRi with a frameindex operand? SelectionDAG won't generate that, I think.

2492

Should requiresAAPCSFrameRecord() be orthogonal to DisableFramePointerElim()? I mean, we have a complete set of flags controlling frame pointer elimination; the only part that's "new" here is that the frame pointer is stored in r11, instead of r7.

2699

Rearrange this to check ExtraCSSpill first, so we don't run canSpillOnFrameIndexAccess() if it isn't necessary?

llvm/test/CodeGen/Thumb/frame-access.ll
77

Can we use sp-relative accesses here? If we're not doing dynamic allocations, it should be a fixed offset.

103

I don't think "add r11, sp, #0" corresponds to a legal Thumb1 instruction? (Does -verify-machine-instrs catch this?)

pratlucas marked 4 inline comments as done.May 20 2022, 8:58 AM
pratlucas added inline comments.
llvm/lib/Target/ARM/ARMFrameLowering.cpp
2226

Indeed I didn't hit any case where tSTRi was emitted during my tests, but I included it here to be on the safe side given the contents of the comment on line 2050. I'm happy to remove it if it's indeed not necessary.

2492

For cases where a function's codegen requires the use of a frame pointer, we want an AAPCS compliant frame record to be generated even when the frame pointer elimination is enabled. For that reason we need the two options to be orthogonal on some level.
I've updated this check to be more strict and only consider requiresAAPCSFrameRecord() when the function has a frame pointer.

llvm/test/CodeGen/Thumb/frame-access.ll
77

The current criteria for the Arm and Thumb backends is that fixed frame indices are always accessed through FP whenever it is available (See ARMFrameLowering.cpp).
I guess that could be changed, but I feel it would fall outside the scope of this patch.

pratlucas updated this revision to Diff 430991.May 20 2022, 8:58 AM
pratlucas marked 2 inline comments as done.

Addressing review comments.

efriedma added inline comments.May 20 2022, 11:43 AM
llvm/lib/Target/ARM/ARMFrameLowering.cpp
2245

Should this also check hasFP?

2492

For cases where a function's codegen requires the use of a frame pointer, we want an AAPCS compliant frame record to be generated even when the frame pointer elimination is enabled.

That's... hmm. Feels a little weird, but I guess it makes sense.

The hasFP(MF) here is redundant; this is already inside an if (HasFP) { block.

llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
1200

Are we actually guaranteed to have any unused return registers at this point? The calling convention uses r0-r3 to return values.

llvm/test/CodeGen/Thumb/frame-access.ll
77

This patch does make it much more likely that fp-relative accesses are going to be out of range, but sure, I guess we can treat it as a separate issue.

pratlucas updated this revision to Diff 431634.May 24 2022, 3:37 AM
pratlucas marked 2 inline comments as done.

Avoiding unecessary uses of FP and addressing review comments.

pratlucas marked 2 inline comments as done.May 24 2022, 3:57 AM
pratlucas added inline comments.
llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
1200

Although unlikely in thumb1, we could run out of return registers here indeed (the calling convention only uses r2 and r3 to return 128-bit containerized vectors).
I've updated the code to make use of a scratch register if that ever happens. This required D126285, a minor NFC change so it could work correctly.

llvm/test/CodeGen/Thumb/frame-access.ll
77

That's a good point. I've pondered over this for a bit and realized that this is not the intended behaviour - we want FP and the frame record to be compliant with AAPCS in the scenarios where they're supposed to be generated and not to make its use more likely.
I've updated the code to match this. The expected behaviour when using the new option should be:

  • r11 is set as the frame pointer - It should be reserved and not used as a general purpose register
  • For a function, an AAPCS compliant frame record should be created and r11 should point to its location on the stack if:
    • Frame pointer elimination is disabled for the function (leaf vs non-leaf)
    • Codegen requires the use of FP
efriedma added inline comments.May 27 2022, 12:14 PM
llvm/lib/Target/ARM/ARMFrameLowering.cpp
1045

I guess this is related to this patch because it involves the interaction of PACBTI with -mframe_chain ? (Without this patch, PACBTI doesn't exist on any targets where the frame pointer is in r11.) I'm fine leaving it to a followup, though.

llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
1120

*intermediate.

llvm/lib/Target/ARM/ThumbRegisterInfo.cpp
556–557

Do we need to do something different here if FrameReg is r11?

pratlucas updated this revision to Diff 434846.Jun 7 2022, 9:26 AM
pratlucas marked 2 inline comments as done.

Addressing comments.

pratlucas marked 2 inline comments as done.Jun 7 2022, 9:27 AM
pratlucas added inline comments.
llvm/lib/Target/ARM/ARMFrameLowering.cpp
1045

Sounds good. I'm going to address that on a followup patch.

efriedma added inline comments.Jun 7 2022, 12:40 PM
llvm/test/CodeGen/Thumb/frame-access.ll
335

This sequence requires, in general, scavenging two registers. I'm not sure we can do that in general? I think we normally only have one emergency spill slot.

Maybe we can save a register using add instead of mov; something like ldr r0, .LCPI0_0; add r0, r11; ldr r0, [r0].

pratlucas updated this revision to Diff 435157.Jun 8 2022, 7:38 AM

Avoid scavenging extra register when accessing const pool.

pratlucas marked an inline comment as done.Jun 8 2022, 7:41 AM
pratlucas added inline comments.
llvm/test/CodeGen/Thumb/frame-access.ll
335

That's true. Loads aren't affected as they reuse the destination register, but stores would indeed require the scaveging of two registers.
I've updaded the code to use the add approach instead.

pratlucas marked an inline comment as done.Jun 8 2022, 7:41 AM
efriedma accepted this revision.Jun 9 2022, 2:36 PM

LGTM

The Thumb1 changes here are complicated, so I'm not sure we found all the issues, but you fixed all the issues I spotted.

This revision is now accepted and ready to land.Jun 9 2022, 2:36 PM
This revision was landed with ongoing or failed builds.Jun 13 2022, 2:21 AM
This revision was automatically updated to reflect the committed changes.
pratlucas reopened this revision.Jun 13 2022, 3:05 AM
This revision is now accepted and ready to land.Jun 13 2022, 3:05 AM
pratlucas updated this revision to Diff 436396.Jun 13 2022, 8:07 AM

Fixing assertion failure caused by incorrect use of tSTRr.

efriedma accepted this revision.Jun 13 2022, 12:25 PM
This revision was landed with ongoing or failed builds.Jun 14 2022, 5:37 AM
This revision was automatically updated to reflect the committed changes.

@pratlucas I'm seeing errors on my EXPENSIVE_CHECKS builds - please can you take a look?

Command Output (stdout):
--
$ ":" "RUN: at line 1"
$ "e:\llvm\ninja\bin\llc.exe" "-mtriple" "thumb-arm-none-eabi" "-filetype" "asm" "-o" "-" "E:\llvm\llvm-project\llvm\test\CodeGen\Thumb\frame-chain.ll" "-frame-pointer=all"
# command stderr:

# After Prologue/Epilogue Insertion & Frame Finalization
# Machine code for function required_fp: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Frame Objects:
  fi#0: size=4, align=4, at location [SP-20]
  fi#1: size=4, align=4, at location [SP-24]
  fi#2: size=4, align=8, at location [SP-32]
  fi#3: size=8, align=8, at location [SP-40]
  fi#4: variable sized, align=1, at location [SP-40]
  fi#5: size=4, align=4, at location [SP-4]
  fi#6: size=4, align=4, at location [SP-8]
  fi#7: size=4, align=4, at location [SP-12]
  fi#8: size=4, align=4, at location [SP-16]
Function Live Ins: $r0, $r1

bb.0 (%ir-block.2):
  liveins: $r0, $r1, $r4, $lr
  frame-setup tPUSH 14, $noreg, killed $r4, killed $r6, killed $r7, killed $lr, implicit-def $sp, implicit $sp
  frame-setup CFI_INSTRUCTION def_cfa_offset 16
  frame-setup CFI_INSTRUCTION offset $lr, -4
  frame-setup CFI_INSTRUCTION offset $r7, -8
  frame-setup CFI_INSTRUCTION offset $r6, -12
  frame-setup CFI_INSTRUCTION offset $r4, -16
  $r7 = frame-setup tADDrSPi $sp, 2, 14, $noreg
  frame-setup CFI_INSTRUCTION def_cfa $r7, 8
  $sp = frame-setup tSUBspi $sp(tied-def 0), 6, 14, $noreg
  $r6 = tMOVr $sp, 14, $noreg
  $r2 = tMOVr killed $r6, 14, $noreg
  tSTRi killed renamable $r1, renamable $r2, 4, 14, $noreg :: (store (s32) into %ir.4)
  tSTRi renamable $r0, renamable $r2, 5, 14, $noreg :: (store (s32) into %ir.3)
  renamable $r1 = COPY $sp
  tSTRi killed renamable $r1, renamable $r2, 2, 14, $noreg :: (store (s32) into %ir.5, align 8)
  renamable $r1, dead $cpsr = tLSLri renamable $r0, 2, 14, $noreg
  renamable $r1, dead $cpsr = nuw tADDi3 killed renamable $r1, 7, 14, $noreg
  renamable $r3, dead $cpsr = tMOVi8 7, 14, $noreg
  renamable $r1, dead $cpsr = tBIC killed renamable $r1(tied-def 0), killed renamable $r3, 14, $noreg
  renamable $r3 = COPY $sp
  renamable $r1, dead $cpsr = tSUBrr killed renamable $r3, killed renamable $r1, 14, $noreg
  $sp = COPY killed renamable $r1
  renamable $r1, dead $cpsr = tMOVi8 0, 14, $noreg
  tSTRi killed renamable $r1, $r6, 1, 14, $noreg :: (store (s32) into %ir.6 + 4, basealign 8)
  tSTRi killed renamable $r0, killed renamable $r2, 0, 14, $noreg :: (store (s32) into %ir.6, align 8)
  $r4, $cpsr = frame-destroy tSUBi3 killed $r7, 7, 14, $noreg
  $r4, $cpsr = frame-destroy tSUBi8 $r4(tied-def 0), 1, 14, $noreg
  $sp = frame-destroy tMOVr $r4, 14, $noreg
  frame-destroy tPOP 14, $noreg, def $r4, def $r6, def $r7, implicit-def $sp, implicit $sp
  frame-destroy tPOP 14, $noreg, def $r0, implicit-def $sp, implicit $sp
  $lr = frame-destroy tMOVr killed $r0, 14, $noreg
  tBX_RET 14, $noreg

# End machine code for function required_fp.

*** Bad machine code: Non-flag-setting Thumb1 mov is v6-only ***
- function:    required_fp
- basic block: %bb.0  (0x166ad7b18e8)
- instruction: $r2 = tMOVr killed $r6, 14, $noreg
LLVM ERROR: Found 1 machine code errors.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace

The newly added frame-chain.ll test uncovered an issue with an invalid use of the tMOVr instruction, which is only available from V6 onwards. The failure was captured by a buildbot with expensive checks enabled.
The issue is unrelated to the codegen changes from this patch, and still happens when compiling the same IR code if the changes are reverted.
Given the above, I've updated the test to use thumbv6m while the issue is not resolved: rGcbcce82ef6b5.

Cheers! Are you looking at a fix or please can you raise a ticket?

@RKSimon I was writting my comment above just before I saw your message, lucky they went in in the correct order :)
I'm tracking down the source of the invalid tMOVr and I'll raise a ticket with the details.

pratlucas reopened this revision.Jun 23 2022, 5:31 AM
This revision is now accepted and ready to land.Jun 23 2022, 5:31 AM
pratlucas updated this revision to Diff 439356.Jun 23 2022, 5:35 AM

Fixing use-after-poison issue detected by ASAN buildbot.

When popping LR on thumb it's value is popped into PC and used as a return.
The original return instruction gets erased, invalidating the MachineInstr
iterator. This change makes sure we don't try to access the iterator in these
conditions.

efriedma accepted this revision.Jun 23 2022, 10:55 AM

LGTM with one minor comment (feel free to ignore if you disagree)

llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
958

Maybe consider making this an MachineBasicBlock::iterator&, so MI isn't a dangling pointer? Not that the result is really any different from what you're doing here, but maybe less confusing.

pratlucas updated this revision to Diff 440189.Jun 27 2022, 6:07 AM

Updating method to use MAchineBasickBlock::iterator&.

pratlucas marked an inline comment as done.Jun 27 2022, 6:08 AM
This revision was landed with ongoing or failed builds.Jun 27 2022, 6:08 AM
This revision was automatically updated to reflect the committed changes.