This is an archive of the discontinued LLVM Phabricator instance.

[HardwareLoops] Change order of SCEV expression construction for InitLoopCount.
ClosedPublic

Authored by JanekvO on Nov 18 2020, 10:28 AM.

Details

Summary

Putting the +1 before the zero-extend will allow scalar evolution to fold the expression in some cases such as the one shown in PowerPC's shrink-wrap.ll test.

Diff Detail

Event Timeline

JanekvO created this revision.Nov 18 2020, 10:28 AM
JanekvO requested review of this revision.Nov 18 2020, 10:28 AM
samparker accepted this revision.Nov 19 2020, 12:55 AM

LGTM, thanks!

This revision is now accepted and ready to land.Nov 19 2020, 12:55 AM

Out of curiosity, did you try performing the +1 in HardwareLoopInfo::isHardwareLoopCandidate instead? Looking back through the code, I think it would be more clear to rename 'ExitCount' to 'TripCount' and perform this addition outside of the pass.

Out of curiosity, did you try performing the +1 in HardwareLoopInfo::isHardwareLoopCandidate instead? Looking back through the code, I think it would be more clear to rename 'ExitCount' to 'TripCount' and perform this addition outside of the pass.

I hadn't considered the +1 in isHardwareLoopCandidate until you mentioned it. I can add these changes in this diff, but I wonder whether the zero-extend should also be moved to isHardwareLoopCandidate at this point. That way only the uses of ExitCount (or TripCount, after renaming) will remain in the pass (as far as I can see).

I wonder whether the zero-extend should also be moved to isHardwareLoopCandidate at this point.

Nice, sounds good to me.

samparker added inline comments.Nov 19 2020, 7:01 AM
llvm/lib/CodeGen/HardwareLoops.cpp
385–386

Could you also double check that this PointerTy check is necessary, it seems a little odd to me but I don't know much about the PPC implementation.

thopre added a subscriber: thopre.Nov 20 2020, 6:32 AM
JanekvO updated this revision to Diff 307093.Nov 23 2020, 8:50 AM

Rename ExitCount to TripCount and move SCAV expression construction outside of HardwareLoops pass into isHardwareLoopCandidate.

JanekvO updated this revision to Diff 307094.Nov 23 2020, 8:51 AM

Diff against HEAD

JanekvO updated this revision to Diff 307095.Nov 23 2020, 8:52 AM

Sorry, one more try to diff against the right branch

JanekvO added inline comments.Nov 23 2020, 9:11 AM
llvm/lib/CodeGen/HardwareLoops.cpp
385–386

I did a quick check and removed the PointerTy check and it seems all tests still run fine. However, I don't know enough about either PPC nor scalar evolution to confidently remove the check. Intuitively, it sounds unlikely for getExitCount to return a pointer type but I'm not sure.

samparker accepted this revision.Nov 24 2020, 1:01 AM

Intuitively, it sounds unlikely for getExitCount to return a pointer type but I'm not sure.

Indeed, thanks for checking. LGTM, thanks.

This revision was landed with ongoing or failed builds.Nov 24 2020, 10:01 AM
This revision was automatically updated to reflect the committed changes.
reames added a subscriber: reames.Sep 2 2021, 3:27 PM

Glanced at this due to being CCed on pr51714.

This patch does not look correct on first glance. If backedge taken count is UINT_MAX for the narrower type, doing the increment in the narrow type produces zext(0) e.g. zero, whereas doing the zext(BTC)+1 does not. Unless there's something in the code which prevents BTC=UINT_MAX/TC=0, I suspect that's the cause of the miscompile being observed in that bug and this patch should be reverted.

Glanced at this due to being CCed on pr51714.

This patch does not look correct on first glance. If backedge taken count is UINT_MAX for the narrower type, doing the increment in the narrow type produces zext(0) e.g. zero, whereas doing the zext(BTC)+1 does not. Unless there's something in the code which prevents BTC=UINT_MAX/TC=0, I suspect that's the cause of the miscompile being observed in that bug and this patch should be reverted.

Thanks @reames
reverted in 34badc409cc452575c538c4b6449546adc38f121

Unless there's something in the code which prevents BTC=UINT_MAX/TC=0

And I thought there was!

Thanks both.