We can hoist out loads that are dominated by invariant.start to the preheader.
We conservatively assume the load is variant, if we see a corresponding
invariant.end intrinsic.
Details
Diff Detail
- Build Status
Buildable 3490 Build 3490: arc lint + arc unit
Event Timeline
I'm not sure this is correct -- what if you had:
int* ptr = ...; for (...) { // runs only one iteration *ptr = 20; invariant_start(ptr); // invariant, but only after we've written 20 to *ptr int val = *ptr; }
I think this patch will LICM the load to outside the loop incorrectly. I suspect we can at most hoist up to right after the invariant_start call.
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
504 | This is very hard to read, not the least because you're implicitly using LoadOp. :) Can you instead phrase this more straightforwardly, as: auto *Int8PtrTy = ... while (isa<bitcast>(...)) { if (LoadOp->getType() == Int8PtrTy) break; ... } | |
529 | Don't you have to check the size? |
Thanks for catching that. I dont think we have a mechanism currently to hoist upto a specific instruction. LICM uses canSinkOrHoistInst to hoist the load to just before the terminator of preheader of the current loop CurLoop. Behaviour of sinking to end of loop won't be affected by this patch.
So, I think we can fix this problem by add one more condition for hoisting such loads: the dominating invariant.start's BB should be outside CurLoop. I have a patch for this and it catches the bug above (and from reading LICM code, nested loops with invariant.start would work as well).
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
504 | Yes, that implicit LoadOp makes it pretty hard. I used a lambda because of the same logic at two places. I think the if check as you've written above is fine even with it will be duplicated. | |
529 | yes. The implementation of invariant.start has that possibility of specifying a diffferent size. I'll add that. |
Addressed review comments: Consider invariant start outside the current loop.
Check for size of load operand as well.
Comments inline.
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
505 | I'd move the LoadOp->getType() != PtrInt8Ty check right after the while loop, since they're related. Another way to write this would be: while (LoadOp->getType() != PtrInt8Ty) { auto *BC = dyn_cast<BitCastInst>(LoadOp); if (!BC) return false; LoadOp = BC->getOperand(0); } // No need to check the type again | |
508 | You're not saving any time by this, since getNumUses() is O(# uses). I think you need a if (UsesSeen++ > MaxNumUses) return false; type thing in the loop below. | |
525 | Don't call getNumUses here, since it is O(#uses). Instead, use hasNUsesOrMore(1). | |
530 | Can be LocSizeInBits <= . | |
533 | s/invariantstart/invariant_start/ | |
535 | I think just checking DT->dominates(II, LoopPreHeader) is sufficient here. |
There is no guarantee that the preheader exists when invoking canSinkOrHoistInst function.
Use the properly dominates property on the loopHeader instead.
lgtm with nits
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
483 |
| |
489 | I'd s/LoadOp/Addr/, since very soon we transform it to a value that isn't the load operand. | |
493 | Sorry did not notice this earlier, but this should be a cl::opt. | |
502 | I'd like a bound on this loop too. | |
516 | s/load maybe/load maybe is/ | |
527 | I'd extract out cast<ConstantInt>(II->getArgOperand(0))->getSExtValue() * 8 as InvariantSizeInBits or something like that. | |
test/Transforms/LICM/hoisting.ll | ||
194 | Another related example -- there is no explicit invariant_end but the return value of invariant_start is escaped (e.g. passed to an opaque function) so there may possibly be an invariant_end somewhere we can't see. |