This is an archive of the discontinued LLVM Phabricator instance.

[GVN] Introduce loop load PRE
ClosedPublic

Authored by mkazantsev on Apr 5 2021, 10:57 PM.

Details

Summary

This patch allows PRE of the following type of loads:

preheader:
  br label %loop

loop:
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  br label %merge

merge:
  ...
  br i1 ..., label %loop, label %exit

Into

preheader:
  %x0 = load %p
  br label %loop

loop:
  %x.pre = phi(%x0, %x2)
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  %x1 = load %p
  br label %merge

merge:
  %x2 = phi(%x.pre, %x1)
  ...
  br i1 ..., label %loop, label %exit

So instead of loading from %p on every iteration, we load only when the clobber actually happens.
The typical pattern this is trying to address is a hot loop with all code inlined and provably
free of side effects, plus some side-effecting calls on a cold path.
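
A source-level illustration of that pattern (a minimal, hypothetical sketch; not one of the patch's tests):

  #include <cstdio>

  // Stand-in for an opaque, side-effecting call on the cold path; as far as
  // the optimizer is concerned, such a call may write to *p.
  void log_rare_event() { std::puts("rare event"); }

  // Hot loop: the body is side-effect free except for the rarely taken call,
  // so *p must be reloaded on every iteration unless loop load PRE kicks in.
  int sum_with_rare_logging(int *p, int n, int rare_threshold) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
      int x = *p;                // the partially redundant load
      sum += x;
      if (x > rare_threshold)    // cold path
        log_rare_event();        // potential clobber of *p
    }
    return sum;
  }

  int main() {
    int v = 1;
    return sum_with_rare_logging(&v, 4, 100) == 4 ? 0 : 1;
  }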

The worst-case overhead is one extra load overall (the one in the preheader), which we incur only when
the clobber block is taken on every iteration: for a loop of N iterations with the clobber path taken on
k of them, the original code performs N loads while the transformed code performs 1 + k loads. This only
matters if the loop has very few iterations. If the clobber block is skipped on at least one iteration,
the transform is neutral or profitable.

Several prospects for improvement open up:

  • We can sometimes be smarter in loop-exiting blocks by splitting critical edges;
  • If we have block frequency info, we can handle multiple clobbers (see the sketch below). The only obstacle right now is that we don't know whether their combined frequency is colder than the header's.
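
A hedged sketch of that profitability check (a hypothetical helper assuming LLVM's BlockFrequencyInfo; not part of this patch):

  #include <cstdint>

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/Analysis/BlockFrequencyInfo.h"
  #include "llvm/IR/BasicBlock.h"

  // Accept multiple clobbering blocks only if, taken together, they are
  // still colder than the loop header.
  static bool clobbersColderThanHeader(llvm::ArrayRef<llvm::BasicBlock *> Clobbers,
                                       llvm::BasicBlock *Header,
                                       llvm::BlockFrequencyInfo &BFI) {
    uint64_t ClobberFreq = 0;
    for (llvm::BasicBlock *BB : Clobbers)
      ClobberFreq += BFI.getBlockFreq(BB).getFrequency();
    return ClobberFreq < BFI.getBlockFreq(Header).getFrequency();
  }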

Diff Detail

Event Timeline

mkazantsev created this revision.Apr 5 2021, 10:57 PM
mkazantsev requested review of this revision.Apr 5 2021, 10:57 PM
Herald added a project: Restricted Project.Apr 5 2021, 10:57 PM
mkazantsev edited the summary of this revision.Apr 5 2021, 10:57 PM
lkail added a subscriber: lkail.Apr 5 2021, 11:50 PM
reames added a comment.EditedApr 8 2021, 12:42 PM

Just for context, I'd explored a very similar transform before in https://reviews.llvm.org/D7061. One really key difference between the prior attempt and this one is that previously I hadn't explicitly handled loops and instead tried to match this from the original IR. I don't think loop info was available at the time. Though, looking at the current code, it looks like LoopInfo is optional for the pass even now.

One other key detail that has changed is we now support speculation, and don't necessarily have to prove anticipation (which the earlier change struggled with.)

In general, I think the new approach is much more likely to be successful than the original since we're solving a subset of the problems.

Code-structure-wise, I want to suggest we don't try to shove this new transformation into the existing performLoadPRE codepath. Several of that code's concerns (e.g. address translation) don't apply to the loop case, and you have at least one bug (around whether we need to check speculation safety) from trying to reuse the code.

I'd suggest instead that you split out the last third or so of that function into a helper which blindly performs the insertion and replacement, and then implement a second performLoopLoadPRE entry point which checks the appropriate legality for the new transform.
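
A rough sketch of that structure (hypothetical, declaration-only signatures; the real GVN.cpp members would take more parameters):

  namespace sketch {
  struct LoadInst;   // stand-ins for the corresponding LLVM IR types
  struct BasicBlock;

  // Extracted last third of performLoadPRE: assumes legality has already been
  // proven and only materializes the new loads and the phi.
  void eliminatePartiallyRedundantLoad(LoadInst *Load,
                                       BasicBlock *LoopPreheader,
                                       BasicBlock *ClobberBlock);

  // New entry point for the loop case: checks the loop-specific legality
  // (single cold in-loop clobber, load guaranteed to execute, loop-invariant
  // address, pointer that cannot be freed) and then calls the helper above.
  bool performLoopLoadPRE(LoadInst *Load);
  } // namespace sketch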

I also seriously question whether this is worth doing in old-GVN at all. The only infrastructure you actually need for this is memory aliasing and speculation safety. I'd seriously suggest writing a standalone pass which uses MemorySSA and ValueTracking, and maybe reuses the extracted helper function mentioned above.

mkazantsev updated this revision to Diff 336335.Apr 9 2021, 1:07 AM
mkazantsev retitled this revision from [GVN] Introduce loop load PRE to [GVN] Introduce loop load PRE (WIP).

I'm pretty sure the availability problem you are referencing does not exist. See the last two tests with guards.

As for the refactoring, I'm going to do it. Marking the patch as WIP.

mkazantsev retitled this revision from [GVN] Introduce loop load PRE (WIP) to [GVN] Introduce loop load PRE.Apr 9 2021, 1:30 AM

Looking more into the code, I don't think that loop PRE needs any other legality checks than what we have now.

mkazantsev updated this revision to Diff 336440.Apr 9 2021, 6:51 AM

Split the code out into a separate method. I haven't yet figured out how to turn it into a separate pass with MemorySSA, but I think having it in GVN does no harm.

reames requested changes to this revision.Apr 13 2021, 12:11 PM

Comments inline include one serious correctness issue.

This is much cleaner than the original patch. I was initially hesitant to take this at all - as opposed to using MemorySSA or NewGVN - but with the new structure this looks a lot less invasive.

llvm/lib/Transforms/Scalar/GVN.cpp
1514

Extend this comment to emphasize that this means we have proven the load must execute if the loop is entered, and is thus safe to hoist to the end of the preheader without introducing a new fault. Similarly, the in-loop clobber must be dominated by the original load and is thus fault safe.

Er, hm, there's an issue here I just realized. This isn't sound. The counter example here is when the clobber is a call to free and the loop actually runs one iteration.

You need to prove that LI is safe to execute in both locations. You have multiple options in terms of reasoning; I'll let you decide which you want to explore: speculation safety, must-execute, or unfreeable allocations. The last (using allocas as the example for tests) might be the easiest.
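
To make the free counterexample concrete, here is a minimal C++ model (hypothetical code, not one of the patch's tests):

  #include <cstdlib>

  int main() {
    int *p = static_cast<int *>(std::malloc(sizeof(int)));
    if (!p)
      return 1;
    *p = 42;
    int sum = 0;
    for (int i = 0; i < 1; ++i) { // the loop runs exactly one iteration
      int x = *p;                 // original load: runs before the branch, fine
      sum += x;
      if (i == 0) {               // the cold "clobber" path
        std::free(p);             // the clobbering call frees %p itself
        // Loop load PRE would insert a reload of *p right here to feed the phi
        // in the merge block -- that reload would be a use-after-free, even
        // though the original program never loads after the free.
      }
    }
    return sum == 42 ? 0 : 1;
  }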

1528

Tweak this comment a bit to emphasize that this ensures the new load executes at most as often as the original, and likely less often.

1533

I don't understand this restriction. Why is a switch not allowed?

1544

I don't think this loop does what you want, except maybe by accident. You allowed blocks outside the loop; as a result, you can end up with a bunch of available addresses and a bunch of loads before the preheader. This will likely be DCEd later, since the preheader load will be the one actually used by SSA gen.

I strongly suspect you want exactly two available load locations: preheader, and your one in-loop clobber block.

This revision now requires changes to proceed.Apr 13 2021, 12:11 PM
mkazantsev added inline comments.Apr 13 2021, 8:17 PM
llvm/lib/Transforms/Scalar/GVN.cpp
1514

Free on the last iteration (the loop may have multiple iterations, though) is a nasty case indeed...

1533

This was originally protection against invokes. Switch is allowed; will fix.

1544

Yes, this check was lost during the refactoring. I'm pretty sure that eliminatePartiallyRedundantLoad will deal with it correctly, but it's at least not obvious. Thanks for catching.

Addressed comments, fixed bug with

I'm planning to add support for pointers based on D99135, maybe as a follow-up or on top of this.

mkazantsev planned changes to this revision.Apr 15 2021, 11:50 PM
reames requested changes to this revision.Apr 16 2021, 1:41 PM

Looks close to ready for an LGTM if you're willing to split the patch as suggested.

llvm/lib/Transforms/Scalar/GVN.cpp
1544

continue the comment with something like:
"because we need a place to insert a copy of the load".

p.s. I'm fine with this in an initial patch, but you really should be using an alias check here, as the trailing invoke might not alias the memory being PREed. Would make a good follow-up patch.

1563

You can generalize the first check as !LoadPtr->canBeFreed()

1564

This last check is incorrect. Counterexample:

  for (i = 0; i < 1; i++) {
    v = o.f
    if (c) {
      // this is the loop clobber block
      clobber();
    }
    atomic store g = o;
    while (wait for other thread to free) {}
  }

Can I ask you to pull this into a separate patch? (E.g. handle only the first two cases in this patch, and come back to the third in a follow-on.)

1567

Style: hasFnAttribute(AttributeKind::NoFree) handles both of these cases. (Or will once D100226 lands).

mkazantsev marked 3 inline comments as done.

Reused canBeFreed.

Removed code related to nofree analysis for further follow-up.

Improved comments.

llvm/lib/Transforms/Scalar/GVN.cpp
1564

Good idea. Removed from this patch; it needs to be handled more carefully.

1567

Removed.

reames accepted this revision.Apr 20 2021, 9:18 AM

LGTM w/minor comments.

llvm/lib/Transforms/Scalar/GVN.cpp
1515

Typo: In order

1559

In a follow-up, please generalize by stripping bitcasts and inbounds GEPs before calling canBeFreed. Please don't do this in the change being LGTMed now.
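
A hedged sketch of that follow-up (it assumes LLVM's Value::stripInBoundsOffsets() and Value::canBeFreed(); this is not the code being accepted here):

  #include "llvm/IR/Value.h"

  // Look through bitcasts and inbounds GEPs (what stripInBoundsOffsets()
  // strips) before asking whether the underlying object can be freed
  // during this function.
  static bool loadedObjectCannotBeFreed(const llvm::Value *LoadPtr) {
    return !LoadPtr->stripInBoundsOffsets()->canBeFreed();
  }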

llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
8–9

Please add a positive test (analogous to this one), but using an alloca.

This revision is now accepted and ready to land.Apr 20 2021, 9:18 AM
nikic added inline comments.Apr 20 2021, 10:23 AM
llvm/lib/Transforms/Scalar/GVN.cpp
1565

Where do we check that LoadPtr is loop invariant (and thus available in preheader)?

llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
45–46

This test looks broken. I think for it to do something useful you'll want to pass %p to may_free_memory (or another function). Otherwise the load is just undef and there is no clobber in the loop either.

reames requested changes to this revision.Apr 20 2021, 10:51 AM

LGTM withdrawn due to issue noticed by Nikita. Please address and refresh.

llvm/lib/Transforms/Scalar/GVN.cpp
1565

Er good question, and good catch. I remember this being here, but maybe it got lost in rebase.

Max, please add back the check, and a test which would have caught this.

This revision now requires changes to proceed.Apr 20 2021, 10:51 AM
mkazantsev added inline comments.Apr 20 2021, 10:00 PM
llvm/lib/Transforms/Scalar/GVN.cpp
1565

Wow, I was sure it was there. Thanks for catching!

Added loop invariant check. Test added as underlying patch.

mkazantsev added inline comments.Apr 21 2021, 12:14 AM
llvm/lib/Transforms/Scalar/GVN.cpp
1559

Shouldn't it be a part of canBeFreed?

reames accepted this revision.Apr 21 2021, 8:05 AM

LGTM

This revision is now accepted and ready to land.Apr 21 2021, 8:05 AM
This revision was automatically updated to reflect the committed changes.