This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Hoist loads that are dominated by invariant.start intrinsic, and are invariant in the loop.
ClosedPublic

Authored by anna on Jan 31 2017, 8:28 AM.

Details

Summary

We can hoist out loads that are dominated by invariant.start to the preheader.
We conservatively assume the load is variant, if we see a corresponding
invariant.end intrinsic.

Diff Detail

Repository
rL LLVM

Event Timeline

anna created this revision.Jan 31 2017, 8:28 AM
sanjoy requested changes to this revision.Jan 31 2017, 11:34 AM

I'm not sure this is correct -- what if you had:

int* ptr = ...;
for (...) { // runs only one iteration
  *ptr = 20;
  invariant_start(ptr); // invariant, but only after we've written 20 to *ptr
  int val = *ptr;
}

I think this patch will LICM the load to outside the loop incorrectly. I suspect we can at most hoist up to right after the invariant_start call.

lib/Transforms/Scalar/LICM.cpp
504 ↗(On Diff #86440)

This is very hard to read, not the least because you're implicitly using LoadOp. :)

Can you instead phrase this more straightforwardly, as:

auto *Int8PtrTy = ...
while (isa<bitcast>(...)) {
  if (LoadOp->getType() == Int8PtrTy)
    break;
   ...
}
529 ↗(On Diff #86440)

Don't you have to check the size?

This revision now requires changes to proceed.Jan 31 2017, 11:34 AM
anna added a comment.Jan 31 2017, 1:52 PM

I'm not sure this is correct -- what if you had:

int* ptr = ...;
for (...) { // runs only one iteration
  *ptr = 20;
  invariant_start(ptr); // invariant, but only after we've written 20 to *ptr
  int val = *ptr;
}

I think this patch will LICM the load to outside the loop incorrectly. I suspect we can at most hoist up to right after the invariant_start call.

Thanks for catching that. I dont think we have a mechanism currently to hoist upto a specific instruction. LICM uses canSinkOrHoistInst to hoist the load to just before the terminator of preheader of the current loop CurLoop. Behaviour of sinking to end of loop won't be affected by this patch.

So, I think we can fix this problem by add one more condition for hoisting such loads: the dominating invariant.start's BB should be outside CurLoop. I have a patch for this and it catches the bug above (and from reading LICM code, nested loops with invariant.start would work as well).

lib/Transforms/Scalar/LICM.cpp
504 ↗(On Diff #86440)

Yes, that implicit LoadOp makes it pretty hard. I used a lambda because of the same logic at two places. I think the if check as you've written above is fine even with it will be duplicated.

529 ↗(On Diff #86440)

yes. The implementation of invariant.start has that possibility of specifying a diffferent size. I'll add that.

anna updated this revision to Diff 86637.Feb 1 2017, 7:45 AM
anna edited edge metadata.
anna marked 2 inline comments as done.

Addressed review comments: Consider invariant start outside the current loop.
Check for size of load operand as well.

sanjoy requested changes to this revision.Feb 1 2017, 9:26 AM

Comments inline.

lib/Transforms/Scalar/LICM.cpp
505 ↗(On Diff #86637)

I'd move the LoadOp->getType() != PtrInt8Ty check right after the while loop, since they're related.

Another way to write this would be:

while (LoadOp->getType() != PtrInt8Ty) {
  auto *BC = dyn_cast<BitCastInst>(LoadOp);
  if (!BC)
    return false;
  LoadOp = BC->getOperand(0);
}

// No need to check the type again
508 ↗(On Diff #86637)

You're not saving any time by this, since getNumUses() is O(# uses).

I think you need a if (UsesSeen++ > MaxNumUses) return false; type thing in the loop below.

525 ↗(On Diff #86637)

Don't call getNumUses here, since it is O(#uses). Instead, use hasNUsesOrMore(1).

530 ↗(On Diff #86637)

Can be LocSizeInBits <= .

533 ↗(On Diff #86637)

s/invariantstart/invariant_start/

535 ↗(On Diff #86637)

I think just checking DT->dominates(II, LoopPreHeader) is sufficient here.

This revision now requires changes to proceed.Feb 1 2017, 9:26 AM
anna updated this revision to Diff 86679.Feb 1 2017, 11:13 AM
anna edited edge metadata.
anna marked 6 inline comments as done.

Patch updated based on review comments.

lib/Transforms/Scalar/LICM.cpp
508 ↗(On Diff #86637)

thanks. done.

535 ↗(On Diff #86637)

that's right. I missed that the LoopPreheader is guaranteed to exist in loops that have LICM done on them (all these loops are in LCSSA form).

anna updated this revision to Diff 86696.Feb 1 2017, 12:19 PM

There is no guarantee that the preheader exists when invoking canSinkOrHoistInst function.
Use the properly dominates property on the loopHeader instead.

sanjoy accepted this revision.Feb 1 2017, 12:26 PM

lgtm with nits

lib/Transforms/Scalar/LICM.cpp
483 ↗(On Diff #86679)
  • s/the load/\p LI/ for clarification
  • Mention that we also check the size
  • "If we find an invariant end" - slightly misleading, since we never actually look for one. I'd rather phrase the comment as "Try to prove that \p LI is loading from a location invariant within the loop \p CurLoop by looking for an invariant_start intrinsic. Return true ..."
489 ↗(On Diff #86679)

I'd s/LoadOp/Addr/, since very soon we transform it to a value that isn't the load operand.

493 ↗(On Diff #86679)

Sorry did not notice this earlier, but this should be a cl::opt.

502 ↗(On Diff #86679)

I'd like a bound on this loop too.

516 ↗(On Diff #86679)

s/load maybe/load maybe is/

527 ↗(On Diff #86679)

I'd extract out cast<ConstantInt>(II->getArgOperand(0))->getSExtValue() * 8 as InvariantSizeInBits or something like that.

test/Transforms/LICM/hoisting.ll
194 ↗(On Diff #86679)

Another related example -- there is no explicit invariant_end but the return value of invariant_start is escaped (e.g. passed to an opaque function) so there may possibly be an invariant_end somewhere we can't see.

This revision is now accepted and ready to land.Feb 1 2017, 12:26 PM
This revision was automatically updated to reflect the committed changes.
anna marked 6 inline comments as done.Feb 2 2017, 5:34 AM