This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable 48-bit mul in AMDGPUCodeGenPrepare.
ClosedPublic

Authored by abinavpp on Oct 24 2021, 8:19 PM.

Details

Summary

We were bailing out of creating 24-bit muls for results wider than 32
bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this
change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul
correctly.

Diff Detail

Event Timeline

abinavpp created this revision.Oct 24 2021, 8:19 PM
abinavpp requested review of this revision.Oct 24 2021, 8:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2021, 8:19 PM
abinavpp updated this revision to Diff 381828.Oct 24 2021, 8:41 PM

Use the original destination size in getMul24().

abinavpp added inline comments.Oct 24 2021, 9:22 PM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
535

I think we were incorrect in doing the LHSBits + RHSBits > ... check in D111523. We did not consider the case when an operand has > 24 known bits, but the sum of known bits being in limits. mul24 instruction works only on the low-order 24 bits of its operands.

abinavpp added inline comments.Oct 24 2021, 9:28 PM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
535

Ignore this comment. I missed the operand width check in the if above.

abinavpp added inline comments.Oct 24 2021, 9:45 PM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
535

Just realized that this if and the similar one for signed below is not required.

abinavpp updated this revision to Diff 381833.Oct 24 2021, 9:47 PM

Removed the unnecessary ifs.

foad added a comment.Oct 25 2021, 8:09 AM

Seems reasonable, just some nits inline.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
487

This would be neater with IsSigned ? ... : ...

494

This would be neater with IsSigned ? ... : ...

502

Can you use CreateIntrinsic to create all the calls?

531

Do all subtargets that have mul_u24 also have mulhi_u24, and the same for i24?

562

I prefer not to have a negated condition for an "if" if there's an "else" as well, otherwise the condition for the "else" is a double negative which is harder to understand.

572

Same.

abinavpp updated this revision to Diff 382272.Oct 26 2021, 5:00 AM

Addressed review comments.

abinavpp marked 5 inline comments as done.Oct 26 2021, 5:07 AM
abinavpp added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
531

I guess so. We're doing the same thing in AMDGPUISelLowering.cpp. For amdgcn, I can see a corresponding 24-bit mulhi instruction for GFX6 and above. For r600, I can see that in HD6900 and Evergreen. I haven't looked at all the subtargets. The subtarget initialization should take care of this.

foad accepted this revision.Oct 26 2021, 5:10 AM

LGTM.

This revision is now accepted and ready to land.Oct 26 2021, 5:10 AM
This revision was automatically updated to reflect the committed changes.

Needs some end to end codegen tests that show this actually produces the ISA we want

Needs some end to end codegen tests that show this actually produces the ISA we want

Added some in D112554.