This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] exclude ICmpZero Use in LSR if icmp can be replaced inside hardware loop.
ClosedPublic

Authored by shchenz on Jun 18 2019, 1:29 AM.

Details

Summary

Currently pass LSR runs before PPCCTRLoops, we meet some problem with this order because of the icmp inside loop which compares iteration indexed and loop count. In pass LSR, we always treat that icmp as a valid ICmpZero type LSRUse. But this is not true because in later pass HardwareLoops pass, we may replace this icmp with ctrloop instruction mtctr/bdnz. So we may get suboptimal code based on this order.

In Patch https://reviews.llvm.org/D62847, I made a patch which reorders LSR Pass and HardwareLoops pass. By reordering, LSR knows precisely whether the icmp is replaced or kept in HardwareLoops pass. But reviewers is uneasy about that patch and suggests we should add a target hook in LSR.

This patch add a target hook canSaveCmp() in TTI, it returns true only when we are sure that current handling loop in LSR is a hardware loop. If the loop is a hardware loop, we should not generate formulae for this icmp.

Diff Detail

Event Timeline

shchenz created this revision.Jun 18 2019, 1:29 AM
shchenz edited the summary of this revision. (Show Details)Jun 18 2019, 1:34 AM
shchenz marked an inline comment as done.Jun 18 2019, 1:39 AM
shchenz added inline comments.
llvm/test/CodeGen/PowerPC/negctr.ll
38

This is a known deg. With this patch,
Before Codegen Prepare Pass:

for.body:                                         ; preds = %for.body.preheader, %for.body
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 1, %for.body.preheader ]
  %indvars.iv.next = add i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 0
  br i1 %exitcond, label %for.end.loopexit, label %for.body

After Codegen Prepare Pass:

for.body:                                         ; preds = %for.body.preheader, %for.body
  %indvars.iv = phi i64 [ %math, %for.body ], [ 1, %for.body.preheader ]
  %0 = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %indvars.iv, i64 1)
  %math = extractvalue { i64, i1 } %0, 0
  %ov = extractvalue { i64, i1 } %0, 1
  br i1 %ov, label %for.end, label %for.body

HardwareLoops Pass currently can not recognize uadd Intrinsic + extractvalue as a hardware loop(can not get loop exit count). Maybe we need to modify the logic in Codegen Prepare Pass: if the loop is a hardware loop, we should not optimize the cmp to uadd/usub intrinsic?

shchenz updated this revision to Diff 205493.Jun 18 2019, 6:44 PM
shchenz edited the summary of this revision. (Show Details)

rebase

shchenz retitled this revision from [PowerPC] exclude ICmpZero Use in LSR in that icmp can be replaced inside hardware loop. to [PowerPC] exclude ICmpZero Use in LSR if icmp can be replaced inside hardware loop..Jun 18 2019, 11:55 PM

I recently sent a new LSR patch proposal, D63692, that to some extent runs in the opposite direction than this one. I mean, I am adding an additional initial formula for ICmpZero as opposed to excluding it. I think that both patches are however fully compatible and probably worth merging together to get the best of all platforms. I added @shchenz, the author of this, as a D63692 reviewer so he can post any considerations.

hfinkel added inline comments.Jun 25 2019, 7:17 PM
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
886

This is duplicating applicibility logic from the HardwareLoops pass? Can we move this into a utility function, maybe something like HardwareLoopInfo::canAnalyze(Loop *L)?

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
3271

saveCmp -> SaveCmp

llvm/test/CodeGen/PowerPC/negctr.ll
38

Yes, or teach SCEV about the uadd/usub pattern? I lean toward making SCEV smarter.

hfinkel added inline comments.Jun 25 2019, 7:20 PM
llvm/test/CodeGen/PowerPC/negctr.ll
38

(and, just to clarify, that should be a separate patch, so the FIXME here is fine. although please add a note about the uadd/usub and GCP in the comment).

shchenz updated this revision to Diff 206633.Jun 26 2019, 5:21 AM

Thanks for your comments @hfinkel .
Updated accordingly.

For the FIXME case, yes, I plan to fix it in a seperated patch later. Get your idea about fixing it by making SCEV recognize loops with uadd/usub intrinsic.

shchenz updated this revision to Diff 206776.Jun 26 2019, 6:25 PM

call target hook function outside the loop, it should be called once per loop.

hfinkel accepted this revision.Jul 2 2019, 4:57 PM

LGTM

This revision is now accepted and ready to land.Jul 2 2019, 4:57 PM
This revision was automatically updated to reflect the committed changes.
shchenz marked an inline comment as not done.