Page MenuHomePhabricator

[Codegen] TargetLowering::SimplifySetCC(): omit urem when possible
ClosedPublic

Authored by lebedev.ri on Jun 16 2019, 3:24 PM.

Details

Summary

This addresses the regression that is being exposed by D50222 in test/CodeGen/X86/jump_sign.ll
The missing fold, at least partially, looks trivial:
https://rise4fun.com/Alive/Zsln
i.e. if we are comparing with zero, and comparing the urem-by-non-power-of-two,
and the urem is of something that may at most have a single bit set (or no bits set at all),
the urem is not needed.

Diff Detail

Repository
rL LLVM

Event Timeline

Thanks, you fixed it!

Can you add some new explicit tests for this fold?

+Standalone test coverage

xbolva00 accepted this revision.Jun 17 2019, 3:22 AM

Looks fine.

Thanks for alive proof as well!

This revision is now accepted and ready to land.Jun 17 2019, 3:22 AM
spatel accepted this revision.Jun 17 2019, 11:06 AM

LGTM. I have no idea if it's worth the trade-off, but we could do this sooner in IR (instcombine) instead of or in addition to SDAG?

Thank you for the reviews.

LGTM. I have no idea if it's worth the trade-off, but we could do this sooner in IR (instcombine) instead of or in addition to SDAG?

Indeed, this fold is missed in middle-end too, but the regression at hand is in back-end test,
so i'm not sure if we should just hand-wave and only fix it in middle-end.
Also, where should the middle-end fix be? Again InstCombine?

I'm also not sure if this is the best approach, the fold doesn't care *where* those bits are,
it only cares about the *count* of ones, and is thus easily defeated by shift operations,
as it is seen from tests.

Thank you for the reviews.

LGTM. I have no idea if it's worth the trade-off, but we could do this sooner in IR (instcombine) instead of or in addition to SDAG?

Indeed, this fold is missed in middle-end too, but the regression at hand is in back-end test,
so i'm not sure if we should just hand-wave and only fix it in middle-end.
Also, where should the middle-end fix be? Again InstCombine?

Yep, yet another clause under InstCombiner::visitICmpInst().

I'm also not sure if this is the best approach, the fold doesn't care *where* those bits are,
it only cares about the *count* of ones, and is thus easily defeated by shift operations,
as it is seen from tests.

True, although without some real-world evidence that this matters, I'd say "good enough". :)

This revision was automatically updated to reflect the committed changes.