This is an archive of the discontinued LLVM Phabricator instance.

[ARM][MachineOutliner] Add Machine Outliner support for ARM
ClosedPublic

Authored by yroux on Mar 12 2020, 7:53 AM.

Details

Summary

Enables Machine Outlining support on ARM for ARM and Thumb2 modes. Only the
simplest outlining modes (tailcalls and thunks) are handled here.

This is a follow-up of ARM Machine Outliner support RFC D57054

Diff Detail

Event Timeline

yroux created this revision.Mar 12 2020, 7:53 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 12 2020, 7:53 AM

Hi,

I like reading your code! But I'm concerned around the lack of tests here, specifically:

  • CPSR liveness.
  • LR liveness.
  • All things PIC related.
  • Linkage legality.
  • A negative test for Thumb-1.
  • Inline asm generally concerns me too.
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5710

When you say 'available', do you mean 'dead'? I'm struggling to follow the logic here...

5720

Do you need to clear LRU first?

5762

I'm surprised to see IT here?

5818

Isn't this already handled in the loop above? I prefer this notation though.

5838

Looks like this MI building could be refactored.

yroux marked 5 inline comments as done.Mar 25 2020, 7:05 AM

Thanks for the review Sam,

I'll update the patch with more tests (for CPSR/LR it was part of D76068 but you are right it should be done here) and take your comments into account.

Hi,

I like reading your code! But I'm concerned around the lack of tests here, specifically:

  • CPSR liveness.
  • LR liveness.
  • All things PIC related.
  • Linkage legality.
  • A negative test for Thumb-1.
  • Inline asm generally concerns me too.
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5710

Yes that's it, 'available' in llvm::LiveRegUnit means no part of the register is live so it's 'dead'.

The logic is to identify the unsafe cases where R12 or CPSR are live across the MBB without being defined in that MBB

5720

hmm, I don't think it is necessary to clear it, given that if the register was in the set the XXavailableInBlock is false so it doesn't matter if it is among liveouts or not the candidate will not have the flag UnsafeRegsDead and thus will not be outlined, if it wasn't and is part of the liveouts, we will reture false and the generic part of the MachineOutliner will not map the MBB and we will not outliune it.

5762

Yes, sorry I need to update the comment, this is a conservative way of handling IT blocks, to avoid having it splitted between an outlined region and the call sites

5818

Yes you are right, I'm cleaning this part

5838

yes, I'm refactoring it

yroux updated this revision to Diff 254834.Apr 3 2020, 9:07 AM

Here is an update of the patch

Hi Yvan,

Thanks for adding the tests, I've added a few concerns and questions.

llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5615

Does this code work with the DSP instructions that read/write the Q and GE flags? I have a nasty feeling that we don't model their register usage.

5774

There's the getPredicate helper is ARMBaseInstrInfo which looks like it would be useful here.

5810

Probably worth doing these simple checks earlier. Should we also be concerned about the PC too?

llvm/lib/Target/ARM/ARMTargetMachine.cpp
556

We'll need the LowOverheadLoops pass to run for correctness, so we should instead only add the MachineOutliner if the subtarget doesn't support LOB.

yroux added a comment.Apr 8 2020, 8:19 AM

Hi Sam,

Thanks for your comments.

llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5615

Good question, I'll test that

5774

Ah yes, and there is even isPredicated which might simplify the logic here.

5810

Well, the phasing is important here, if we move the checking of LR or SP usage earlier it will make the outlining of calls illegal since they implicitly modifies these registers or we can go back to my previous version which was checking the explicit usage but it introduces redundancy.

hmm.. maybe we can be conservative with PC, it is not verified in AArch64 code @paquette do you think it can be an issue?

llvm/lib/Target/ARM/ARMTargetMachine.cpp
556

What do you mean by "for correctness" ?

I think that it makes more sense that until MachineOutliner and LowOverheadLoops can work together, we have loloops enabled on targets which have LOB support unless it is explicitly disabled by -disable-arm-loloops flag or if the user wants machine outlining with the flag -moutline. If we do that in the opposite way it means that passing the flag -moutline will have no impact on such targets unless the -disable-arm-loloops flag is used

samparker added inline comments.Apr 8 2020, 11:38 PM
llvm/lib/Target/ARM/ARMTargetMachine.cpp
556

Ok. Well, the HardwareLoops pass inserts intrinsics, which are lowered to pseudos and then finalised by ARMLowOverheadLoops so the compiler could crash with this logic. TTI controls whether we generate a low-overhead loop and it also controls whether the vectorizer tries to optimise for those loops too... So, I think TTI will need to understand when we're trying to use the outliner too.

yroux added inline comments.Apr 9 2020, 4:45 AM
llvm/lib/Target/ARM/ARMTargetMachine.cpp
556

Ah Ok thanks, I'll try to find how to add that info into TTI

yroux updated this revision to Diff 260916.Apr 29 2020, 7:57 AM
yroux edited the summary of this revision. (Show Details)

Here is a new update of the patch.

I remove the logic to disable LowOverheadLoops pass since Eli has added the live-ins infos inside outlined functions in D78605.

yroux updated this revision to Diff 260921.Apr 29 2020, 8:02 AM
yroux added inline comments.Apr 29 2020, 8:30 AM
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5615

From what I saw in my testing of DSP instructions, these flags are handled by CPSR usage, so this code seems good to me, do you have something else in mind ?

samparker added inline comments.Apr 30 2020, 1:27 AM
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5615

Ah, good, then could you please add a couple of tests for those instructions? I think my last concern is back with low-overhead loops and tail-predication. I think we should explicitly avoid outlining any MVE instructions and, if this pass can, then also don't move pseudo instructions too.

yroux added inline comments.Apr 30 2020, 2:34 AM
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5615

OK, do you know a pretty way to check if an instruction is a MVE one or a pseudo ? If not I'll add a new helper isMVEOpcode...

samparker added inline comments.Apr 30 2020, 2:49 AM
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5615

Sure, for pseudos there's the MachineInstr isPseudo(). For MVE, it's encoded in the target flags:

const MCInstrDesc &MCID = MI.getDesc();
uint64_t Flags = MCID.TSFlags;
if ((Flags & ARMII::DomainMask) == ARMII::DomainMVE)
yroux updated this revision to Diff 261237.Apr 30 2020, 8:17 AM

This update avoids outlining MVE instructions.

Thanks for adding the MVE changes, but I also still don't see any DSP tests, i.e QADD, SADD16.

llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5753

Should IMPLICIT_DEF instructions be considered invisible too?

5817

Cheers. AArch64 is nice and doesn't allow arbitrary writes to the PC, but we won't have that luxury here.

efriedma added inline comments.May 1 2020, 4:56 PM
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5607

If you control all the instructions that execute, you don't need to worry about what the procedure call standard says. You do need to worry about linker veneers if the outlined function is in a different section, though.

So you need to worry about R12/CPSR on entry to the outlined function, but not on exit from the outlined function.

5806

Maybe also look for TAILJMP opcodes?

5813

Why do you need to forbid outlining code that touches LR or SP? None of the new instructions you're generating read or clobber them. (It might start mattering if you add support for additional forms of outlining, or Thumb1 support, but this patch has neither.)

5817

You might also want to support POP_RET.

efriedma added inline comments.May 1 2020, 5:47 PM
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5806

Oh, nevermind, TAILJMP is a terminator so it gets handled earlier.

5817

Oh, nevermind, POP_RET is a terminator so it gets handled earlier.

yroux added a comment.EditedMay 5 2020, 1:20 PM

Thanks for adding the MVE changes, but I also still don't see any DSP tests, i.e QADD, SADD16.

hmm... I was wrong w/r to DSP instructions Q and GE flags handling, I don't get what I did in my tests, but there is no info about CPSR in the MIR attached to these instructions. So, you had a good feeling and I can disable them, but as Eli said in his comments we don't really need to worry about APCS but only to linker veneers and I don't think a linker would clear the sticky Q bit or touch the GE ones.

Do you think we can let it as it is ?

llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5607

Yes I agree and it is what is done here, I put the same comment as what was done on AArch64 to explain where it comes from, but as it is mentioned below we remove the candidate when these registers are live into or across, which means that we can outline cases where def and use are inside the outlined region or the def inside and the use after the exit as you said.

I've modify the machine-outliner-unsafe-registers.mir test in the update version I'll submit to exhibit this.

5753

Yes, I'll add it

5813

I agree that there is no issue with SP until we modify it, I'll move that to the right patch, for LR it is needed here, when doing thunk outlining the call to the outlined function is a BL and if an instruction uses LR in the outlined region we will have a problem.

yroux updated this revision to Diff 262205.May 5 2020, 1:21 PM
efriedma added inline comments.May 5 2020, 1:54 PM
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5813

Oh, right, we're effectively hoisting the definition of LR in thunk outlining. Makes sense.

For tail-call outlining, we don't care about LR, but I guess it's trickier to separate out those two cases.

samparker accepted this revision.May 5 2020, 11:43 PM

Eli said in his comments we don't really need to worry about APCS but only to linker veneers and I don't think a linker would clear the sticky Q bit or touch the GE ones.

Ah, yes, good point. Then I have nothing else, but definitely worth waiting to see if Eli has further comments.

This revision is now accepted and ready to land.May 5 2020, 11:43 PM
yroux added a comment.May 6 2020, 12:33 AM

Ah, yes, good point. Then I have nothing else, but definitely worth waiting to see if Eli has further comments.

Yes I'll wait for Eli's feedback and update the other patches. Thanks for the review Sam

efriedma accepted this revision.May 6 2020, 9:22 AM

LGTM with one minor comment

llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
5607

Pleas clarify the comment to explain why the procedure call standard is relevant.

yroux updated this revision to Diff 263830.May 13 2020, 1:13 PM

I found a last issue when doing a full bootstrap with -moutline used to build clang, there was a case of thunk outlining where the original was call a BLX LR which was broken by the outlining call BL OUTLINE_FUNCTION_X I fixed this issue by checking the use of link register before handling calls and I've added a test case.

Rebased and full bootstrap in ARM and Thumb2 mode are now OK, I'll commit it tomorrow if that's fine for you.

This revision was automatically updated to reflect the committed changes.