This is an archive of the discontinued LLVM Phabricator instance.

Add loop distribution to the LTO pipeline
ClosedPublic

Authored by sanwou01 on Oct 21 2020, 10:12 AM.

Details

Summary

The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Diff Detail

Event Timeline

sanwou01 created this revision.Oct 21 2020, 10:12 AM
sanwou01 requested review of this revision.Oct 21 2020, 10:12 AM

Is it possible to make a test, like llvm/test/Other/opt-O3-pipeline.ll but using -std-link-opts to print the structure of the passes? I don't think there is anything like that already for LTO, but would be useful.

FWIW, I really dislike these pipeline tests because some of them are actually very tricky to update and I doubt they provide any useful information. But agreed with Dave that for consistency such a test would probably be best (in case someone finds it useful).

sanwou01 updated this revision to Diff 299792.Oct 21 2020, 1:04 PM

Added LTO pipeline test

@SjoerdMeijer yeaahhh these pipeline tests are a bit of a pain, but nothing some big-brain sed scripting can't solve.

This is the diff after would-be-pre-committing the new pipeline test.

SjoerdMeijer accepted this revision.Oct 22 2020, 12:53 AM

This looks consistent with the other pipeline, it's opt-in, and gives a nice uplift, so LGTM.

This revision is now accepted and ready to land.Oct 22 2020, 12:53 AM

It's a shame that this adds so many other passes for something that isn't usually used, but I guess a lot of them should be free.

Is the new pass manager effected by this? It seems to go through different codepaths for LTO.

Otherwise yeah, LGTM.

nikic added a comment.Oct 22 2020, 1:20 AM

It's a shame that this adds so many other passes for something that isn't usually used, but I guess a lot of them should be free.

Right, I noticed that as well... If this turns out to have a compile-time impact, I would expect at least some parts of this can be preserved. For example, a trivial way to "preserve" SCEV without worrying about the details would be to "forget all" if loop distribution actually happens, which it does not by default. I believe BasicAA and AAResults can be preserved pretty much always, though it also doesn't have much effect.

This revision was landed with ongoing or failed builds.Nov 3 2020, 10:54 AM
This revision was automatically updated to reflect the committed changes.
sanwou01 reopened this revision.Nov 3 2020, 1:03 PM

Reverted due to new test failing on a bunch of buildbots. I'll try again tomorrow, looks like the other pipeline tests manage to work around it.

This revision is now accepted and ready to land.Nov 3 2020, 1:03 PM
This revision was automatically updated to reflect the committed changes.

I am trying to enable LoopDistribute, to observe the 43% uplift on hmmer2006, however "flto -mllvm -enable-loop-distribute" does not seem to improve performance. Could you share what other flags you used?

Hi, our flags are "-Ofast -flto -mcpu=native -fomit-frame-pointer", where "native" is a Neoverse-N1 system. Let me know if that helps.

pzheng added a subscriber: pzheng.Dec 18 2020, 4:47 PM

Hi, our flags are "-Ofast -flto -mcpu=native -fomit-frame-pointer", where "native" is a Neoverse-N1 system. Let me know if that helps.

Hi @sanwou01, did you observe a similar uplift on cortex-a57, or the improvement is only on neoverse-n1?

@pzheng I don't have perf data for Cortex-A57, but I would certainly expect to see an uplift there too.