This is an archive of the discontinued LLVM Phabricator instance.

[InstSimplify] use knownbits to fold more udiv/urem
ClosedPublic

Authored by spatel on Jan 4 2022, 12:02 PM.

Details

Summary

We could use knownbits on both operands for even more folds (and there are already tests in place for that), but this is enough to recover the example from:
https://github.com/llvm/llvm-project/issues/51934
(the tests are derived from the code in that example)

I am assuming no noticeable compile-time impact from this because udiv/urem are rare opcodes, but I could check that if there is concern.

Diff Detail

Event Timeline

spatel created this revision.Jan 4 2022, 12:02 PM
spatel requested review of this revision.Jan 4 2022, 12:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2022, 12:02 PM
foad added a comment.Jan 5 2022, 3:12 AM

It seems like you're using knownbits information to derive range information. It would be good to do this more universally, and in both directions.

llvm/lib/Analysis/InstructionSimplify.cpp
1088–1089

Can't you just test if (KnownX.getMaxValue().ult(*C))?

spatel marked an inline comment as done.Jan 5 2022, 6:03 AM

It seems like you're using knownbits information to derive range information. It would be good to do this more universally, and in both directions.

Yes - we do have a combination analysis called "computeConstantRangeIncludingKnownBits", so that should be more powerful.
But that is currently a static helper function in ValueTracking, so it would have to be made visible, and it would be good to have more tests to show where the extra logic gives us a better result. Ok to make that a TODO item?

llvm/lib/Analysis/InstructionSimplify.cpp
1088–1089

Yes, that's shorter and faster. Thanks!

spatel updated this revision to Diff 397551.Jan 5 2022, 6:05 AM
spatel marked an inline comment as done.

Patch updated:
Use KnownBits::getMaxValue() - no functional diff intended from previous revision, but more efficient.

foad added a comment.Jan 5 2022, 6:18 AM

It seems like you're using knownbits information to derive range information. It would be good to do this more universally, and in both directions.

Yes - we do have a combination analysis called "computeConstantRangeIncludingKnownBits", so that should be more powerful.
But that is currently a static helper function in ValueTracking, so it would have to be made visible, and it would be good to have more tests to show where the extra logic gives us a better result. Ok to make that a TODO item?

Sure. I don't object to the current patch. Just wanted to understand if there's a way of doing this more universally in future.

spatel updated this revision to Diff 397574.Jan 5 2022, 7:27 AM

Patch updated (NFCI with previous revision again):

  1. Added TODO comment about range analysis.
  2. Removed check for zero divisor (it is always handled earlier, so could be an assert).
spatel added a reviewer: foad.Jan 5 2022, 7:28 AM
spatel updated this revision to Diff 397887.Jan 6 2022, 7:48 AM

Patch updated:
I removed the MaxRecurse usage from the call to computeKnownBits. Similar calls around this file don't do that (ie, we're ok using the full recursive power of knownbits independent of whether we are recursing within InstSimplify calls themselves).

This allows the call to fit neatly within 80-cols, so the patch is just a single 'if' check now. :)

This revision is now accepted and ready to land.Jan 11 2022, 2:31 PM
This revision was automatically updated to reflect the committed changes.