This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add more llc tests for 48-bit mul generation.
ClosedPublic

Authored by abinavpp on Oct 26 2021, 9:11 AM.

Diff Detail

Event Timeline

abinavpp created this revision.Oct 26 2021, 9:11 AM
abinavpp requested review of this revision.Oct 26 2021, 9:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2021, 9:11 AM
arsenm added inline comments.Oct 26 2021, 9:26 AM
llvm/test/CodeGen/AMDGPU/mul_int24.ll
184–186

We should have been able to eliminate these shifts since the high bits are known unused. Maybe we need a SimplifyDemandedBits combine on the mul24/mulhi24 sources?

foad added a subscriber: foad.Oct 26 2021, 9:38 AM
foad added inline comments.
llvm/test/CodeGen/AMDGPU/mul_int24.ll
223

"lhs" should be "lshr.rhs" here.

286

Same.

arsenm accepted this revision.Oct 27 2021, 3:27 PM

Test additions LGTM but they do show there's more to be done here

This revision is now accepted and ready to land.Oct 27 2021, 3:27 PM
abinavpp added inline comments.Oct 27 2021, 7:35 PM
llvm/test/CodeGen/AMDGPU/mul_int24.ll
223

I'm seeing this problem in amdgpu-codegenprepare-mul24.ll as well. Fix: D112685

abinavpp updated this revision to Diff 382891.Oct 27 2021, 7:36 PM

Addressed Jay's comment.

abinavpp marked 2 inline comments as done.Oct 27 2021, 7:37 PM
This revision was landed with ongoing or failed builds.Oct 27 2021, 7:40 PM
This revision was automatically updated to reflect the committed changes.
foad added inline comments.Oct 28 2021, 2:10 AM
llvm/test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll
581–582

This seems to show that we've done a simplify-demanded-bits for the mul (because it is using the original v0 and v2) but not for the mul_hi (which is using the ANDed v1 and v3).

Maybe AMDGPUTargetLowering::performIntrinsicWOChainCombine needs to call simplifyMul24 for the new mulhi intrinsics?

abinavpp added inline comments.Oct 28 2021, 4:04 AM
llvm/test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll
581–582

I missed adding this before. Fix: D112702