Putting the +1 before the zero-extend will allow scalar evolution to fold the expression in some cases such as the one shown in PowerPC's shrink-wrap.ll test.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Out of curiosity, did you try performing the +1 in HardwareLoopInfo::isHardwareLoopCandidate instead? Looking back through the code, I think it would be more clear to rename 'ExitCount' to 'TripCount' and perform this addition outside of the pass.
I hadn't considered the +1 in isHardwareLoopCandidate until you mentioned it. I can add these changes in this diff, but I wonder whether the zero-extend should also be moved to isHardwareLoopCandidate at this point. That way only the uses of ExitCount (or TripCount, after renaming) will remain in the pass (as far as I can see).
I wonder whether the zero-extend should also be moved to isHardwareLoopCandidate at this point.
Nice, sounds good to me.
llvm/lib/CodeGen/HardwareLoops.cpp | ||
---|---|---|
388 ↗ | (On Diff #306152) | Could you also double check that this PointerTy check is necessary, it seems a little odd to me but I don't know much about the PPC implementation. |
Rename ExitCount to TripCount and move SCAV expression construction outside of HardwareLoops pass into isHardwareLoopCandidate.
llvm/lib/CodeGen/HardwareLoops.cpp | ||
---|---|---|
388 ↗ | (On Diff #306152) | I did a quick check and removed the PointerTy check and it seems all tests still run fine. However, I don't know enough about either PPC nor scalar evolution to confidently remove the check. Intuitively, it sounds unlikely for getExitCount to return a pointer type but I'm not sure. |
Intuitively, it sounds unlikely for getExitCount to return a pointer type but I'm not sure.
Indeed, thanks for checking. LGTM, thanks.
Glanced at this due to being CCed on pr51714.
This patch does not look correct on first glance. If backedge taken count is UINT_MAX for the narrower type, doing the increment in the narrow type produces zext(0) e.g. zero, whereas doing the zext(BTC)+1 does not. Unless there's something in the code which prevents BTC=UINT_MAX/TC=0, I suspect that's the cause of the miscompile being observed in that bug and this patch should be reverted.
Unless there's something in the code which prevents BTC=UINT_MAX/TC=0
And I thought there was!
Thanks both.