This is an archive of the discontinued LLVM Phabricator instance.

[Support] improve known bits analysis for leading zeros of multiply
ClosedPublic

Authored by spatel on Dec 17 2021, 1:34 PM.

Details

Summary

Instead of summing leading zeros on the input operands, multiply the max possible values of those inputs and count the leading zeros of the result. This can give us an extra zero bit (typically in cases where one of the operands is a known constant).

This allows folding away the remaining 'add' ops in the motivating bug (modeled in the PhaseOrdering IR test):
https://github.com/llvm/llvm-project/issues/48399

Diff Detail

Event Timeline

spatel created this revision.Dec 17 2021, 1:34 PM
spatel requested review of this revision.Dec 17 2021, 1:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 17 2021, 1:34 PM
spatel edited the summary of this revision. (Show Details)Dec 17 2021, 1:42 PM
RKSimon added inline comments.Dec 18 2021, 2:44 PM
llvm/lib/Support/KnownBits.cpp
428–435

I'm not certain, but this assert (and the max() feeding it) doesn't look quite right - please can you double check it?

spatel added inline comments.
llvm/lib/Support/KnownBits.cpp
428–435

I think it's correct (otherwise, the exhaustive unit tests for mul should find an error).
As an example, try a 4-bit mul -- 0??? * 00?? :

  1. Max wide product: 0000_0111 * 0000_0011 = 0001_0101 (7 * 3 = 21)
  2. Leading zeros of wide max product = 3
  3. Leading zeros of actual result = max(3, 4) - 4 = 0 (there are no LZ in case of overflow)

But this does suggest a more efficient implementation (and possibly easier to read) - use "umul_ov" instead of widening.
Let me update and see if that looks better.

spatel updated this revision to Diff 395326.Dec 19 2021, 6:21 AM

Patch updated: use multiply with overflow instead of wide multiply for efficiency (and possibly easier to read). This obsoletes any opportunities for an assert AFAICT.

lebedev.ri accepted this revision.Dec 19 2021, 6:24 AM

LG
We *really* need precision tests for KnownBits / ConstantRange.

This revision is now accepted and ready to land.Dec 19 2021, 6:24 AM

LG
We *really* need precision tests for KnownBits / ConstantRange.

Thanks!

Side note: I was wondering how we handle the known sign bit for mul; it is not done here, but it is in the caller code in ValueTracking's computeKnownBitsMul(). I'm not sure if there's a reason for that.

RKSimon accepted this revision.Dec 19 2021, 7:09 AM

Thanks for checking! LGTM - cheers

This revision was landed with ongoing or failed builds.Dec 20 2021, 6:12 AM
This revision was automatically updated to reflect the committed changes.