Download Raw Diff

Details

Reviewers

RKSimon
foad
arsenm
spatel
lebedev.ri

Commits

rG454256ef4f89: [AMDGPU] Correct the known bits calculation for MUL_I24.

Summary

I'm not entirely sure, but based on how ComputeNumSignBits handles
ISD::MUL, I believe this code was miscounting the number of sign
bits.

As an example of an incorrect result let's say that countMinSignBits
returned 1 for the left hand side and 24 for the right hand side.
LHSValBits would be 23 and RHSValBits would be 0 and the sum would
be 23. This would cause the code to set 9 high bits as zero/one. Now
suppose the real values for the left side is 0x800000 and the right
hand side is 0xffffff. The product is 0x00800000 which has 8 sign bits
not 9.

The number of valid bits for the left and right operands is now
the number of non-sign bits + 1. If the sum of the valid bits of
the left and right sides exceeds 32, then the result may overflow and we
can't say anything about the sign of the result. If the sum is 32
or less then it won't overflow and we know the result has at least
1 sign bit.

For the previous example, the code will now calculate the left
side valid bits as 24 and the right side as 1. The sum will be 25
and the sign bits will be 32 - 25 + 1 which is 8, the correct value.

I don't know enough about AMDGPU to write a test for this. I just
noticed while looking at how KnownBits::countMinSignBits was used.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision.Dec 31 2021, 5:59 PM

Herald added subscribers: dexonsmith, kerbowa, hiraditya and 7 others. · View Herald TranscriptDec 31 2021, 5:59 PM

craig.topper requested review of this revision.Dec 31 2021, 5:59 PM

Herald added a project: Restricted Project. · View Herald TranscriptDec 31 2021, 5:59 PM

Herald added a subscriber: wdng. · View Herald Transcript

Harbormaster completed remote builds in B141159: Diff 396843.Dec 31 2021, 6:55 PM

lebedev.ri added inline comments.Jan 1 2022, 1:24 AM

llvm/include/llvm/Support/KnownBits.h
252–253 ↗	(On Diff #396843)	This is certainly a right change. Is there no exhaustive test for this function? If not, could you add one? Else, adjust it to check that the returned value is u>=1 ?

Add KnownBits::countMinSignBits() unit test.

Harbormaster completed remote builds in B141212: Diff 396912.Jan 1 2022, 11:05 PM

Both changes look good to me but it's not clear how they are related, and it seems odd to include the generic change in a patch which claims to be AMDGPU-specific. Would it makes sense to commit the generic change separately first?

As an example of an incorrect result let's say that countMinSignBits
returned 1 for the left hand side and 24 for the right hand side.
LHSValBits would be 23 and RHSValBits would be 0 and the sum would
be 23. This would cause the code to set 9 high bits as zero/one. Now
suppose the real values for the left side is 0x8000000 and the right
hand side is 0xffffff. The product is 0x00800000 which has 8 sign bits
not 9.

"... the real value for the left side is 0x800000 ..."? Your value 0x8000000 has too many zeros.

I can try to come up with an AMDGPU test case but not until Tuesday (GMT) at the earliest.

Incidentally there is some very similar logic for 24-bit multiplies in AMDGPUCodeGenPrepare::replaceMulWithMul24.

craig.topper edited the summary of this revision. (Show Details)Jan 2 2022, 1:49 PM

In D116469#3216166, @foad wrote:

Both changes look good to me but it's not clear how they are related, and it seems odd to include the generic change in a patch which claims to be AMDGPU-specific. Would it makes sense to commit the generic change separately first?

I'll split them. I think the generic change affect AMDGPU behavior becuase the return value of 0 that is being changed only happens when no bits are known. In that case the isNegative/isNonNegative/isStrictlyPositive would all return false so no calls to setHighBits would occur.

As an example of an incorrect result let's say that countMinSignBits
returned 1 for the left hand side and 24 for the right hand side.
LHSValBits would be 23 and RHSValBits would be 0 and the sum would
be 23. This would cause the code to set 9 high bits as zero/one. Now
suppose the real values for the left side is 0x8000000 and the right
hand side is 0xffffff. The product is 0x00800000 which has 8 sign bits
not 9.

"... the real value for the left side is 0x800000 ..."? Your value 0x8000000 has too many zeros.

Fixed. I guess my brain really wanted to write a 32-bit number.

Remove KnownBits change

craig.topper retitled this revision from [AMDGPU][KnownBits] Correct the known bits calculation for MUL_I24. to [AMDGPU] Correct the known bits calculation for MUL_I24..Jan 2 2022, 2:07 PM

craig.topper edited the summary of this revision. (Show Details)

RKSimon added inline comments.Jan 2 2022, 2:29 PM

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4629–4632	Do you think it'd be confusing if we added KnownBits::getMinSignedBits() to match APInt::getMinSignedBits() ?
4646–4647	Use KnownBits::countMaxActiveBits() ?

Harbormaster completed remote builds in B141242: Diff 396947.Jan 2 2022, 2:47 PM

craig.topper added inline comments.Jan 2 2022, 3:23 PM

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4629–4632	Shouldn’t it be getMaxSignedBits? It’s an upper bound like getMaxActiveBits. Likewise ComputeMinSignedBits is also misnamed.

Address Simon's comments.

Harbormaster completed remote builds in B141279: Diff 396989.Jan 2 2022, 9:19 PM

This looks obviously correct(C) to me, but i'll leave this to amdgpu folks.

craig.topper mentioned this in rGdb81324c8da5: [Support] Add KnownBits::countMaxSignedBits(). Make KnownBits::countMinSignBits….Jan 2 2022, 11:31 PM

This patch seems to change codegen in test/CodeGen/AMDGPU/lshl64-to-32.ll but I haven't managed to understand whether the change is good, bad or indifferent.

Rebase

In D116469#3219140, @foad wrote:

This patch seems to change codegen in test/CodeGen/AMDGPU/lshl64-to-32.ll but I haven't managed to understand whether the change is good, bad or indifferent.

I'm not seeing any failure for that test in my local runs. The test appears to be generated by update_llc_test_checks.py and rerunning the script produced no changes. Am I missing something?

Harbormaster completed remote builds in B141565: Diff 397399.Jan 4 2022, 2:58 PM

In D116469#3220809, @craig.topper wrote:

In D116469#3219140, @foad wrote:

This patch seems to change codegen in test/CodeGen/AMDGPU/lshl64-to-32.ll but I haven't managed to understand whether the change is good, bad or indifferent.

I'm not seeing any failure for that test in my local runs. The test appears to be generated by update_llc_test_checks.py and rerunning the script produced no changes. Am I missing something?

Sorry, false alarm. I had to manually rebase the patch when I tested it, and I got it wrong.

dexonsmith removed a subscriber: dexonsmith.Jan 11 2022, 9:40 AM

@foad were you able to come up with a test for this?

In D116469#3235782, @craig.topper wrote:

@foad were you able to come up with a test for this?

Sorry it had dropped off my radar. Here's a test where your patch makes a difference (with llc -march=amdgcn -mcpu=gfx900):

define i32 @f(i32 %x, i32 %y) {
  %xx = or i32 %x, -128 ; 0xffffff80
  %yy = or i32 %y, -128 ; 0xffffff80
  %r = mul i32 %xx, %yy
  %rr = lshr i32 %r, 14
  ret i32 %rr
}

Without your patch the result is simplified to 0 which is wrong. %xx and %yy are in the range -128..-1, so if they are both -128 then the product is -16384, shifted right by 14 is 1.

Rebase with test. Test will be added in a separate review.

Harbormaster completed remote builds in B143318: Diff 399897.Jan 13 2022, 10:29 PM

craig.topper added a parent revision: D117280: [AMDGPU] Pre-commit test for D116469. NFC.Jan 13 2022, 10:30 PM

LGTM, thanks!

This revision was not accepted when it landed; it landed in state Needs Review.Jan 14 2022, 8:55 AM

This revision was landed with ongoing or failed builds.

Closed by commit rG454256ef4f89: [AMDGPU] Correct the known bits calculation for MUL_I24. (authored by craig.topper). · Explain Why

This revision was automatically updated to reflect the committed changes.

craig.topper mentioned this in rGfc7af123ddba: [AMDGPU] Pre-commit test for D116469. NFC.

craig.topper added a commit: rG454256ef4f89: [AMDGPU] Correct the known bits calculation for MUL_I24..