This is an archive of the discontinued LLVM Phabricator instance.

[MachineOutliner] AArch64: Avoid saving + restoring LR if possible
ClosedPublic

Authored by paquette on Aug 14 2017, 4:41 PM.

Details

Summary

This commit allows the outliner to avoid saving + restoring the link register on AArch64 when it is dead within an entire class of candidates. It also changes the interface for the outliner slightly to facilitate this.

By doing this, it's possible to find more outlining candidates in AArch64-- in the case that a candidate doesn't require a save/restore, that candidate can be far shorter. This allows us to find length 2 and 3 candidates and outline them.

This is still conservative. For example, if a length-3 candidate appears 20 times, and one of those times requires a save + restore, the outliner will assume all of them require a save + restore. Thus, none will be outlined. This will be fixed in later patches; the purpose of this is just to get the main idea going.

This, on average, improves the code size of the --benchmarking only SingleSource and MultiSource tests by about 2.94% in comparison to the current outliner. Several tests have no improvements; this is likely due to the conservative nature of the current technique. 36 of the tests improve by more than 10%. 4 tests regress, likely due to competing with linker deduping. This can likely be fixed by disallowing outlining from linkonceodr functions.

These were compiled with LNT using -Oz -mno-red-zone -mllvm -enable-machine-outliner and --benchmarking-only.

Full code size results: https://hastebin.com/raw/soloropiwo

Diff Detail

Event Timeline

paquette created this revision.Aug 14 2017, 4:41 PM
davide edited edge metadata.Aug 16 2017, 12:45 AM

Some comments inline.

lib/CodeGen/MachineOutliner.cpp
902

I wonder why this (and other variables) are size_t and not just unsigned?

lib/Target/AArch64/AArch64InstrInfo.cpp
4561–4569

This one seems a decent candidate for any_of, but the use of stepBackward() could make it more difficult.

4572–4584

Is your cost model for the call overhead documented somewhere? If not, can you add a comment here?

4643–4648

Have you considered rewriting this code in terms of llvm::any_of (or all_of) for clarity?

4649–4656

Can you please add a comment here? It's not obvious when it's safe to outline without understanding the whole picture.

MatzeB added inline comments.Aug 16 2017, 12:48 PM
include/llvm/Target/TargetInstrInfo.h
1565–1567

Is the vector supposed to be changed by this function? I'd use an ArrayRef<...> instead of std::vector<...> if it isn't.

1567–1568

I wonder why this parameter is called CandidateClass, from reading the description I wonder why there is a Class suffix.

lib/Target/AArch64/AArch64InstrInfo.cpp
4543

Not sure if I miss something, but I would expect that it is possible to check LR liveness exactly at the position where the call will be inserted rather than requiring the whole block to have LR unused?

MatzeB added inline comments.Aug 16 2017, 1:00 PM
lib/CodeGen/MachineOutliner.cpp
904–913

Seeing that both getOutliningCallOverhead and getOutliningFrameOverhead are now called in sequence and both perform some computations in common, this smells like maybe they should become one function now? (It certainly feels odd that currently you test LR-liveness twice for each candidate because of this split).

paquette added inline comments.Aug 23 2017, 10:37 AM
lib/CodeGen/MachineOutliner.cpp
902

Mostly for consistency with the definition of length + size. I was thinking that because an "overhead" in this sense is the length of a sequence of instructions, it'd be more consistent to use size_t.

OTOH, having a std::pair<size_t, unsigned> is kind of weird, and "overhead" need not be considered a "length". I'll swap size_t out for unsigned.

lib/Target/AArch64/AArch64InstrInfo.cpp
4543

Oh, I think you're right. I thought that you'd have to ensure that at least the sequence is safe, but the outliner doesn't ever touch LR. That is much simpler. :)

4572–4584

I'll add a comment for sure.

There should also be more comments about how the targets interact with the outliner in the main pass, so I'll do that too.

4643–4648

... I don't know a ton of C++ magic, so you have just changed my life.

paquette updated this revision to Diff 116021.Sep 20 2017, 11:17 AM
paquette marked 8 inline comments as done.

Updated the diff to address feedback. There are lots of changes, but here are the highlights:

  • I added a struct for MachineOutlinerInfo which acts as the interface between the target and the pass.
  • Lots of comments
  • Updated the remarks test to reflect the cost model change
  • Merged together the frame overhead/call overhead stuff into getOutliningCandidateInfo, which returns a MachineOutlinerInfo struct.
MatzeB added inline comments.Sep 20 2017, 7:01 PM
lib/Target/AArch64/AArch64InstrInfo.cpp
4603–4616

This still checks for LR being available for the whole range between the end of the basic block and CallInsertionPt. I would expect it to be enough to stepBackward() till CallInsertionPt and only then check if LR is unused.

4637–4644

llvms version of all_of even supports ranges I think, so you could do: all_of(RepeatedSequenceLocs, DoesntNeedLRSave) here.

And for the record: I personally prefer imperative for loops (at least in cases where you have to specifically create a lambda for the predicate); but that is subjective so please do not change it back or the next reviewer will just ask for all_of again and we circle around ;-)

test/CodeGen/AArch64/machine-outliner.mir
85–112

This test can be simplified, look at http://llvm.org/docs/MIRLangRef.html#simplifying-mir-files for some tactics.

paquette added inline comments.Sep 22 2017, 9:50 AM
lib/Target/AArch64/AArch64InstrInfo.cpp
4603–4616

I actually tried that, and the outliner ended up erroneously pulling out a bunch of things without saving LR.

I tried something like

LRU.stepBackward(*CallInsertionPt);
if (!LRU.available(AArch64::LR))
    return false;

Is this incorrect?

paquette updated this revision to Diff 116425.Sep 22 2017, 4:06 PM
  • Simplified the MIR test a bit
  • Fixed the loop for the LiveRegisterUnits stuff so that we only check if LR is available at the prospective call-site
MatzeB accepted this revision.Sep 26 2017, 1:30 PM

Thanks, LGTM

This revision is now accepted and ready to land.Sep 26 2017, 1:30 PM
paquette closed this revision.Sep 27 2017, 4:06 PM
paquette marked 5 inline comments as done.

Committed:
r314341 - [MachineOutliner] AArch64: Avoid saving + restoring LR if possible