This is an archive of the discontinued LLVM Phabricator instance.

[Pipelines] Move LTO SCCP and BDCE out of addVectorPasses()
AcceptedPublic

Authored by nikic on Apr 28 2023, 7:22 AM.

Details

Summary

I'm trying to reconcile the differences in addVectorPasses() between LTO and ThinLTO/default. These two passes are only present in the LTO version, while for ThinLTO/default these passes run earlier (part of simplification, not optimization). The exact position these run in ThinLTO/default doesn't exist in the LTO pipeline, so I've put them in approximately the same place.

After this change the ThinLTO/default version of addVectorPasses() only runs more passes or in a different order.

Diff Detail

Event Timeline

nikic created this revision.Apr 28 2023, 7:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2023, 7:22 AM
nikic requested review of this revision.Apr 28 2023, 7:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2023, 7:22 AM
This revision is now accepted and ready to land.Apr 28 2023, 10:37 AM

Hello. This sounds OK, but apparently it can be pretty beneficial to run BDCE after the second round of unrolling. Unfortunately this may be wrapped up with some downstream compiler changes with a downstream test, it would make it hard to produce a reproducer. It is apparently after the second round of unrolling in this particular test. Otherwise I don't see any issues past that one case.

nikic added a comment.May 1 2023, 3:57 AM

Hello. This sounds OK, but apparently it can be pretty beneficial to run BDCE after the second round of unrolling. Unfortunately this may be wrapped up with some downstream compiler changes with a downstream test, it would make it hard to produce a reproducer. It is apparently after the second round of unrolling in this particular test. Otherwise I don't see any issues past that one case.

Even without a full reproducer, would it be possible to share the IR diff for the BDCE pass for that particular test? Just to get an idea for what makes it beneficial to run it after runtime unrolling. (Note that I've currently scheduled it before full unrolling as well, but based on your description it sounds like moving it after full unrolling wouldn't help either.)

Even without a full reproducer, would it be possible to share the IR diff for the BDCE pass for that particular test? Just to get an idea for what makes it beneficial to run it after runtime unrolling. (Note that I've currently scheduled it before full unrolling as well, but based on your description it sounds like moving it after full unrolling wouldn't help either.)

Yeah, it appears to need to run after the unrolling in vectorPasses (or after vectorPasses). After full unrolling didn't seem to help.

I wouldn't want to block things just for this case. There is chance it is just a bad benchmark that doesn't store any useful results. It looks like it removed interconnected phi nodes that have no other uses. I will try to send you what I can get, but like I said I only saw it come up in this one case.