This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Correct the known bits calculation for MUL_I24.
ClosedPublic

Authored by craig.topper on Dec 31 2021, 5:59 PM.

Details

Summary

I'm not entirely sure, but based on how ComputeNumSignBits handles
ISD::MUL, I believe this code was miscounting the number of sign
bits.

As an example of an incorrect result let's say that countMinSignBits
returned 1 for the left hand side and 24 for the right hand side.
LHSValBits would be 23 and RHSValBits would be 0 and the sum would
be 23. This would cause the code to set 9 high bits as zero/one. Now
suppose the real values for the left side is 0x800000 and the right
hand side is 0xffffff. The product is 0x00800000 which has 8 sign bits
not 9.

The number of valid bits for the left and right operands is now
the number of non-sign bits + 1. If the sum of the valid bits of
the left and right sides exceeds 32, then the result may overflow and we
can't say anything about the sign of the result. If the sum is 32
or less then it won't overflow and we know the result has at least
1 sign bit.

For the previous example, the code will now calculate the left
side valid bits as 24 and the right side as 1. The sum will be 25
and the sign bits will be 32 - 25 + 1 which is 8, the correct value.

I don't know enough about AMDGPU to write a test for this. I just
noticed while looking at how KnownBits::countMinSignBits was used.

Diff Detail

Event Timeline

craig.topper created this revision.Dec 31 2021, 5:59 PM
craig.topper requested review of this revision.Dec 31 2021, 5:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 31 2021, 5:59 PM
Herald added a subscriber: wdng. · View Herald Transcript
lebedev.ri added inline comments.Jan 1 2022, 1:24 AM
llvm/include/llvm/Support/KnownBits.h
252–253 ↗(On Diff #396843)

This is certainly a right change.
Is there no exhaustive test for this function?
If not, could you add one? Else, adjust it to check that the returned value is u>=1 ?

Add KnownBits::countMinSignBits() unit test.

foad added a comment.Jan 2 2022, 1:20 AM

Both changes look good to me but it's not clear how they are related, and it seems odd to include the generic change in a patch which claims to be AMDGPU-specific. Would it makes sense to commit the generic change separately first?

As an example of an incorrect result let's say that countMinSignBits
returned 1 for the left hand side and 24 for the right hand side.
LHSValBits would be 23 and RHSValBits would be 0 and the sum would
be 23. This would cause the code to set 9 high bits as zero/one. Now
suppose the real values for the left side is 0x8000000 and the right
hand side is 0xffffff. The product is 0x00800000 which has 8 sign bits
not 9.

"... the real value for the left side is 0x800000 ..."? Your value 0x8000000 has too many zeros.

I can try to come up with an AMDGPU test case but not until Tuesday (GMT) at the earliest.

Incidentally there is some very similar logic for 24-bit multiplies in AMDGPUCodeGenPrepare::replaceMulWithMul24.

craig.topper edited the summary of this revision. (Show Details)Jan 2 2022, 1:49 PM

Both changes look good to me but it's not clear how they are related, and it seems odd to include the generic change in a patch which claims to be AMDGPU-specific. Would it makes sense to commit the generic change separately first?

I'll split them. I think the generic change affect AMDGPU behavior becuase the return value of 0 that is being changed only happens when no bits are known. In that case the isNegative/isNonNegative/isStrictlyPositive would all return false so no calls to setHighBits would occur.

As an example of an incorrect result let's say that countMinSignBits
returned 1 for the left hand side and 24 for the right hand side.
LHSValBits would be 23 and RHSValBits would be 0 and the sum would
be 23. This would cause the code to set 9 high bits as zero/one. Now
suppose the real values for the left side is 0x8000000 and the right
hand side is 0xffffff. The product is 0x00800000 which has 8 sign bits
not 9.

"... the real value for the left side is 0x800000 ..."? Your value 0x8000000 has too many zeros.

Fixed. I guess my brain really wanted to write a 32-bit number.

craig.topper retitled this revision from [AMDGPU][KnownBits] Correct the known bits calculation for MUL_I24. to [AMDGPU] Correct the known bits calculation for MUL_I24..Jan 2 2022, 2:07 PM
craig.topper edited the summary of this revision. (Show Details)
RKSimon added inline comments.Jan 2 2022, 2:29 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4629–4632

Do you think it'd be confusing if we added KnownBits::getMinSignedBits() to match APInt::getMinSignedBits() ?

4646–4647

Use KnownBits::countMaxActiveBits() ?

craig.topper added inline comments.Jan 2 2022, 3:23 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4629–4632

Shouldn’t it be getMaxSignedBits? It’s an upper bound like getMaxActiveBits.

Likewise ComputeMinSignedBits is also misnamed.

Address Simon's comments.

This looks obviously correct(C) to me, but i'll leave this to amdgpu folks.

foad added a comment.Jan 4 2022, 2:09 AM

This patch seems to change codegen in test/CodeGen/AMDGPU/lshl64-to-32.ll but I haven't managed to understand whether the change is good, bad or indifferent.

This patch seems to change codegen in test/CodeGen/AMDGPU/lshl64-to-32.ll but I haven't managed to understand whether the change is good, bad or indifferent.

I'm not seeing any failure for that test in my local runs. The test appears to be generated by update_llc_test_checks.py and rerunning the script produced no changes. Am I missing something?

foad added a comment.Jan 5 2022, 1:54 AM

This patch seems to change codegen in test/CodeGen/AMDGPU/lshl64-to-32.ll but I haven't managed to understand whether the change is good, bad or indifferent.

I'm not seeing any failure for that test in my local runs. The test appears to be generated by update_llc_test_checks.py and rerunning the script produced no changes. Am I missing something?

Sorry, false alarm. I had to manually rebase the patch when I tested it, and I got it wrong.

@foad were you able to come up with a test for this?

foad added a comment.Jan 12 2022, 3:13 AM

@foad were you able to come up with a test for this?

Sorry it had dropped off my radar. Here's a test where your patch makes a difference (with llc -march=amdgcn -mcpu=gfx900):

define i32 @f(i32 %x, i32 %y) {
  %xx = or i32 %x, -128 ; 0xffffff80
  %yy = or i32 %y, -128 ; 0xffffff80
  %r = mul i32 %xx, %yy
  %rr = lshr i32 %r, 14
  ret i32 %rr
}

Without your patch the result is simplified to 0 which is wrong. %xx and %yy are in the range -128..-1, so if they are both -128 then the product is -16384, shifted right by 14 is 1.

Rebase with test. Test will be added in a separate review.

foad added a comment.Jan 14 2022, 1:00 AM

LGTM, thanks!

This revision was not accepted when it landed; it landed in state Needs Review.Jan 14 2022, 8:55 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.