We were bailing out of creating 24-bit muls for results wider than 32
bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this
change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul
correctly.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
535 | I think we were incorrect in doing the LHSBits + RHSBits > ... check in D111523. We did not consider the case when an operand has > 24 known bits, but the sum of known bits being in limits. mul24 instruction works only on the low-order 24 bits of its operands. |
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
535 | Ignore this comment. I missed the operand width check in the if above. |
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
535 | Just realized that this if and the similar one for signed below is not required. |
Seems reasonable, just some nits inline.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
487 | This would be neater with IsSigned ? ... : ... | |
494 | This would be neater with IsSigned ? ... : ... | |
502 | Can you use CreateIntrinsic to create all the calls? | |
531 | Do all subtargets that have mul_u24 also have mulhi_u24, and the same for i24? | |
562 | I prefer not to have a negated condition for an "if" if there's an "else" as well, otherwise the condition for the "else" is a double negative which is harder to understand. | |
572 | Same. |
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
531 | I guess so. We're doing the same thing in AMDGPUISelLowering.cpp. For amdgcn, I can see a corresponding 24-bit mulhi instruction for GFX6 and above. For r600, I can see that in HD6900 and Evergreen. I haven't looked at all the subtargets. The subtarget initialization should take care of this. |
This would be neater with IsSigned ? ... : ...