This is an archive of the discontinued LLVM Phabricator instance.

[Thumb1][MachineOutliner] Add Machine Outliner support for Thumb1
Needs ReviewPublic

Authored by pestctrl on Apr 17 2021, 7:12 AM.

Details

Summary

MachineOutliner support was previously added for Thumb2 and ARM modes.
Thumb1 support was left out, because there is no simple way to push
the LR onto the stack and then restore it back into the LR. Such an
idiom is required when:

  • A repeated sequence contains a call, which requires the LR to be saved and restored at the frame site.
  • MachineOutlinerDefault, where the LR needs to be saved and restored at the call site.

Besides the two situations described above, Thumb1 machine outlining
should be able to reuse all of the code introduced for machine
outlining in Thumb2, since all instructions generated fall in the
Thumb1 subset.

This patch enables Thumb1 support for the outliner, and introduces an
idiom for saving and restoring the LR. The LR can only be saved and
restored if there is an unused register in a repeated sequence. If
this is the case, then the machine outliner can generate this sequence
of instructions to save the LR:

PUSH R4     ;; Free up an auxilary register
PUSH LR     ;; LR must go on the top of the stack

......      ;; Original sequence of instructions

POP R4      ;; Pop LR off the stack into R4
            ;; (Cannot be directly popped into LR)
MOV LR, R4  ;; Move LR into the correct register
POP R4      ;; Pop the original value of R4 into R4

Diff Detail

Event Timeline

pestctrl created this revision.Apr 17 2021, 7:12 AM
pestctrl requested review of this revision.Apr 17 2021, 7:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 17 2021, 7:12 AM
pestctrl edited the summary of this revision. (Show Details)Apr 17 2021, 7:14 AM
pestctrl edited the summary of this revision. (Show Details)Apr 17 2021, 7:22 AM
pestctrl updated this revision to Diff 338308.Apr 17 2021, 7:30 AM

Incorrect comment

yroux added a comment.Apr 23 2021, 8:15 AM

Hi and Thanks for working on this,

I've a couple of embedded comments, and can you add some tests to check SP-relative loads and stores handling ?

llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
6375

since there is not much logic in saveLROnsStack and restoreLRFromStack function what about removing them and creating the MIs here ?

llvm/lib/Target/ARM/Thumb1InstrInfo.cpp
360

why not adding FreeReg to MBB's LiveIns instead of spilling LR twice ?

llvm/test/CodeGen/ARM/machine-outliner-default.mir
167

I think that $r4 is not needed

pestctrl updated this revision to Diff 340380.Apr 25 2021, 1:23 PM

Fixed issues with clang-format

pestctrl updated this revision to Diff 340383.Apr 25 2021, 1:43 PM

Build MI's for save/restore LR in same function.

pestctrl marked an inline comment as done.Apr 25 2021, 1:44 PM
pestctrl added inline comments.
llvm/lib/Target/ARM/Thumb1InstrInfo.cpp
360

Huh, I had no idea you could do this. This might simplify my code a lot.

pestctrl updated this revision to Diff 356341.Jul 3 2021, 6:43 AM

Add unused register to live-ins, instead of pushing the LR twice

pestctrl marked an inline comment as done.Jul 3 2021, 6:44 AM

@yroux Hey, sorry it's been a while. I've updated this commit to fix the issues you brought up regarding saving the LR twice, could you look at this again?

Thanks!

tTAILJMPdND isn't generally available in Thumb mode, only in Thumb2 or armv8m.base.

PUSH R4     ;; Free up an auxilary register
PUSH LR     ;; LR must go on the top of the stack

......      ;; Original sequence of instructions

POP R4      ;; Pop LR off the stack into R4
            ;; (Cannot be directly popped into LR)
MOV LR, R4  ;; Move LR into the correct register
POP R4      ;; Pop the original value of R4 into R4

Can we instead do something like:

push lr
.... ; Original sequence
mov ip, r4
pop r4
mov lr, r4
mov r4, ip
llvm/lib/Target/ARM/Thumb1InstrInfo.cpp
257

BeforeSequence is unused here?

Oh, that's pretty neat! I would've never thought of that.

Wouldn't I still need to push 2 registers to the stack, to maintain stack alignment? Or could I just push one register?

ARM ABI rules say the stack has to be aligned to 8 bytes at ABI boundaries, 4 bytes otherwise. So it might be an issue in some cases?