This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH
ClosedPublic

Authored by pdhaliwal on Aug 10 2020, 7:09 AM.

Diff Detail

Event Timeline

pdhaliwal created this revision.Aug 10 2020, 7:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2020, 7:09 AM
pdhaliwal requested review of this revision.Aug 10 2020, 7:09 AM
pdhaliwal updated this revision to Diff 284356.Aug 10 2020, 7:15 AM

removed unneeded changes

arsenm added inline comments.Aug 10 2020, 12:41 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1764

no auto.

Can't this use anyext?

1768

Something seems off to me about introducing a full multiply, and in whatever type the user requested. I think this only works if WideTy == 2 * OriginalType. Can you produce a mulh in the wider type? This seems more like a lowering

1769

Use Register, I would worry about introducing a copy of MachineOperand here

1771

LLT not auto

1775

ShiftAmt?

1775

Why isn't the shift amount WideTy.getSizeInBits() - Size? I don't understand - IsSigned

1819

Extra newline

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir
68

Can you add an 8 and 24-bit test?

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir
109

Can you add an 8 and 24-bit test?

pdhaliwal marked 5 inline comments as done.

Review comments

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1764

I am a bit doubtful if G_ANYEXT would work here. From docs, it doesn't take care of higher bits.

1768

Yes, it would only work when WideTy == 2 * OriginalType. And now if I think again it is more of a lowering operation than widening as user is not always free to choose the wider type.

1775

To accomodate the sign bit in case of signed operation.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir
68

24-bit case won't work as it requires 48-bit MUL op which is not working yet.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir
109

Added 8-bit case. But, 24-bit case won't work as it requires 48-bit MUL op which is not working yet.

pdhaliwal retitled this revision from [GlobalISel] widenScalar G_SMULH/G_UMULH to [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH.Aug 11 2020, 5:18 AM
foad requested changes to this revision.Aug 11 2020, 5:36 AM
foad added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6034

I agree that anyext would not work here.

6037

Would this lowering also work for vector types, if you used LLT::scalarOrVector here?

6044

As Matt said you definitely should not subtract IsSigned here.

This revision now requires changes to proceed.Aug 11 2020, 5:36 AM
pdhaliwal updated this revision to Diff 284762.Aug 11 2020, 8:45 AM
pdhaliwal marked 2 inline comments as done.

Added support for vector types.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6044

I got confused in signed binary multiplication. For this operation, it is not required to subtract IsSigned.

foad accepted this revision.Aug 11 2020, 9:07 AM

Looks OK to me modulo one inline comment, as long as Matt has no further objections.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6039–6041

Actually it would be neater to use LLT::changeElementSize.

This revision is now accepted and ready to land.Aug 11 2020, 9:07 AM
arsenm added inline comments.Aug 13 2020, 6:16 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
542 ↗(On Diff #284762)

The expansion can fully use packed instructions with VOP3P instructions. This should try to clamp the number of elements for 16-bit cases if available before scalarizing

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir
41

Should add <2 x s16>, <3 x 16> and <4 x s16> cases

pdhaliwal updated this revision to Diff 288886.Aug 30 2020, 9:10 PM
pdhaliwal marked 2 inline comments as done.

Updated review comments.

@arsenm , let me know if it is good to land.

arsenm added inline comments.Sep 2 2020, 5:07 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
545 ↗(On Diff #288886)

This isn't the right logic, the intent is to go down to 2 elements for cases that can promote to <2 x i16>. s8 is't special here

arsenm requested changes to this revision.Sep 3 2020, 4:23 PM
This revision now requires changes to proceed.Sep 3 2020, 4:23 PM
pdhaliwal updated this revision to Diff 292461.Sep 17 2020, 4:31 AM

Updated tests and clamping number of elements to 2

arsenm added inline comments.Sep 17 2020, 6:40 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗(On Diff #292461)

This should be unnecessary

pdhaliwal added inline comments.Sep 17 2020, 9:25 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗(On Diff #292461)

If I drop this, the <2 x s32> case starts generating worse code. This is due to lowering coming into the picture which promotes the 32-bit mulh to 64-bit mul and then legalizing 64-bit mul. I can use VOP3P instruction only for S8. For others, I need to specify the scalarization.

arsenm added inline comments.Sep 18 2020, 9:40 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗(On Diff #292461)

This should be an unconditional scalarize. The scalarization shouldn't cause a 64-bit multiply to be used

pdhaliwal added inline comments.Sep 20 2020, 9:14 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗(On Diff #292461)

Hmm, unconditional scalarize would remove the possibility of using vector path for <2 x s8>. This is bit different from other operations like MUL, ADD where <2 x s16> would have been legal and unconditional scalarization would have worked. The whole point of having the scalarization conditional is because <2 x s8> can easily use <2 x s16> MUL from lowering path. And as <2 x s16> is legal for AMDGPU, the lowering will correctly use vector operations. Unconditional scalarization would simply make logic of using vector ops void.

arsenm added inline comments.Sep 22 2020, 6:19 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗(On Diff #292461)

You already handled this case with the first fewerElementsIf, the second one just handles everything else. It doesn't need to specify not -s8

pdhaliwal updated this revision to Diff 293446.Sep 22 2020, 6:54 AM

Added lowerFor({V2S8})

pdhaliwal updated this revision to Diff 293449.Sep 22 2020, 7:01 AM

Removed unused code

arsenm added inline comments.Sep 22 2020, 7:13 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗(On Diff #293449)

Put the actions on separate lines

600 ↗(On Diff #293449)

Separate lines

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir
567

Shouldn't use implicit uses of s8 values. I'm trying to fix implicit uses with illegal register types because we can't ultimately legalize these

Harbormaster completed remote builds in B72514: Diff 293446.
pdhaliwal updated this revision to Diff 293638.Sep 22 2020, 9:33 PM

Formatting and removed implicit uses

pdhaliwal marked 4 inline comments as done.Sep 22 2020, 9:34 PM
arsenm accepted this revision.Sep 23 2020, 6:10 AM
This revision is now accepted and ready to land.Sep 23 2020, 6:10 AM
This revision was landed with ongoing or failed builds.Sep 23 2020, 7:26 PM
This revision was automatically updated to reflect the committed changes.