
[GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH
ClosedPublic

Authored by pdhaliwal on Aug 10 2020, 7:09 AM.

Event Timeline

pdhaliwal created this revision.Aug 10 2020, 7:09 AM
Herald added a project: Restricted Project.Aug 10 2020, 7:09 AM
pdhaliwal requested review of this revision.Aug 10 2020, 7:09 AM
pdhaliwal updated this revision to Diff 284356.Aug 10 2020, 7:15 AM

removed unneeded changes

arsenm added inline comments.Aug 10 2020, 12:41 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1774

no auto.

Can't this use anyext?

1778

Something seems off to me about introducing a full multiply in whatever type the user requested. I think this only works if WideTy == 2 * OriginalType. Can you produce a mulh in the wider type instead? This seems more like a lowering.

1779

Use Register; I would worry about introducing a copy of MachineOperand here.

1781

LLT not auto

1785

ShiftAmt?

1785

Why isn't the shift amount WideTy.getSizeInBits() - Size? I don't understand the "- IsSigned".

1806

Extra newline

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir
83

Can you add an 8 and 24-bit test?

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir
152

Can you add an 8 and 24-bit test?

pdhaliwal marked 5 inline comments as done.

Review comments

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1774

I doubt G_ANYEXT would work here. According to the docs, it leaves the high bits unspecified.

1778

Yes, it would only work when WideTy == 2 * OriginalType. Thinking about it again, this is more of a lowering operation than a widening, since the user is not always free to choose the wider type.

1785

To accommodate the sign bit in the case of a signed operation.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir
83

The 24-bit case won't work because it requires a 48-bit MUL, which is not working yet.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir
152

Added an 8-bit case. The 24-bit case won't work because it requires a 48-bit MUL, which is not working yet.
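To illustrate why a 24-bit mulh needs a 48-bit product, here is a minimal standalone C++ sketch. It models the arithmetic only; it is not LegalizerHelper code, and the helper names (`lowBitsMask`, `umulhWithProductWidth`) are hypothetical. It assumes operand widths of at most 32 bits so that the full product fits in a `uint64_t`.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `bits` bits set (hypothetical helper).
uint64_t lowBitsMask(unsigned bits) {
  return bits >= 64 ? ~0ULL : (1ULL << bits) - 1;
}

// Unsigned high half of a `bits`-wide multiply, computed from a product that
// is truncated to `productBits` bits. Assumes bits <= 32.
uint64_t umulhWithProductWidth(uint64_t a, uint64_t b, unsigned bits,
                               unsigned productBits) {
  uint64_t product = (a & lowBitsMask(bits)) * (b & lowBitsMask(bits));
  product &= lowBitsMask(productBits); // model a multiply of this width
  return (product >> bits) & lowBitsMask(bits);
}

void selfCheck24() {
  // A 48-bit product keeps every bit of the result.
  assert(umulhWithProductWidth(0xFFFFFF, 0xFFFFFF, 24, 48) == 0xFFFFFE);
  // A 32-bit product drops the top 16 bits, so the high half is wrong.
  assert(umulhWithProductWidth(0xFFFFFF, 0xFFFFFF, 24, 32) == 254);
}
```

The second check shows the failure mode: with only a 32-bit multiply available, the upper part of the 48-bit product is lost before the shift.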

pdhaliwal retitled this revision from [GlobalISel] widenScalar G_SMULH/G_UMULH to [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH.Aug 11 2020, 5:18 AM
foad requested changes to this revision.Aug 11 2020, 5:36 AM
foad added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6154

I agree that anyext would not work here.
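A minimal standalone sketch of the anyext problem, assuming an 8-bit original type and a 16-bit wide type. This is plain C++ that models the bit behaviour, not GlobalISel code; `anyExtend` and the other helper names are hypothetical, and the "garbage" argument stands in for the unspecified high bits.

```cpp
#include <cassert>
#include <cstdint>

// Model of an any-extend from 8 to 16 bits: the low 8 bits hold the value,
// the high 8 bits are unspecified (here filled with caller-chosen garbage).
uint16_t anyExtend(uint8_t v, uint8_t garbage) {
  return static_cast<uint16_t>((static_cast<uint16_t>(garbage) << 8) | v);
}

// Bits 8..15 of the 16-bit product: the value a mulh lowering would return.
uint8_t highHalfOfProduct(uint16_t a, uint16_t b) {
  uint32_t product = static_cast<uint32_t>(a) * b;
  return static_cast<uint8_t>((product & 0xFFFF) >> 8);
}

// Bits 0..7 of the product: the value an ordinary widened mul would return.
uint8_t lowHalfOfProduct(uint16_t a, uint16_t b) {
  uint32_t product = static_cast<uint32_t>(a) * b;
  return static_cast<uint8_t>(product & 0xFF);
}

void selfCheckAnyExt() {
  // Zero-extended operands: 200 * 200 = 40000, high byte 156, low byte 64.
  assert(highHalfOfProduct(200, 200) == 156);
  // Garbage in the high bits of one operand changes the high byte...
  assert(highHalfOfProduct(anyExtend(200, 0x01), 200) == 100);
  // ...but leaves the low byte untouched.
  assert(lowHalfOfProduct(anyExtend(200, 0x01), 200) == 64);
}
```

The low half of a product depends only on the low bits of the operands, which is why anyext is fine for widening a plain multiply; the high half does not have that property, so the operands must be sign- or zero-extended.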

6157

Would this lowering also work for vector types, if you used LLT::scalarOrVector here?

6164

As Matt said you definitely should not subtract IsSigned here.

This revision now requires changes to proceed.Aug 11 2020, 5:36 AM
pdhaliwal updated this revision to Diff 284762.Aug 11 2020, 8:45 AM
pdhaliwal marked 2 inline comments as done.

Added support for vector types.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6164

I got confused about signed binary multiplication. Subtracting IsSigned is not required for this operation.
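The arithmetic behind this can be sketched in standalone C++, assuming an 8-bit source type lowered through a 16-bit multiply. The function names are hypothetical and this models the values only, not the MIR that the lowering emits. It also assumes that right-shifting a negative signed value is an arithmetic shift, which is guaranteed from C++20 and is the behaviour of mainstream compilers before that.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned: zero-extend, multiply in 16 bits, shift right by the original
// width (8), truncate.
uint8_t umulh8(uint8_t a, uint8_t b) {
  uint16_t wide = static_cast<uint16_t>(static_cast<uint16_t>(a) *
                                        static_cast<uint16_t>(b));
  return static_cast<uint8_t>(wide >> 8);
}

// Signed: sign-extend, multiply in 16 bits, shift right by the same amount
// (8, not 8 - 1), truncate. The product of two int8_t values always fits in
// int16_t, so the sign needs no extra room.
int8_t smulh8(int8_t a, int8_t b) {
  int16_t wide = static_cast<int16_t>(static_cast<int16_t>(a) *
                                      static_cast<int16_t>(b));
  return static_cast<int8_t>(wide >> 8);
}

// What shifting by Size - IsSigned would compute for the signed case.
int8_t smulh8ShiftedBy7(int8_t a, int8_t b) {
  int16_t wide = static_cast<int16_t>(static_cast<int16_t>(a) *
                                      static_cast<int16_t>(b));
  return static_cast<int8_t>(wide >> 7);
}

void selfCheckMulh8() {
  assert(umulh8(255, 255) == 254);   // 65025 >> 8
  assert(smulh8(-128, -128) == 64);  // 16384 >> 8
  assert(smulh8(-1, 1) == -1);       // -1 >> 8
  assert(smulh8(64, 2) == 0);        // 128 >> 8
  assert(smulh8ShiftedBy7(64, 2) == 1); // off by one bit position
}
```

The last two checks show the difference: 64 * 2 = 128 has a high byte of 0, but shifting by 7 returns 1 because a bit from the low half leaks into the result.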

foad accepted this revision.Aug 11 2020, 9:07 AM

Looks OK to me modulo one inline comment, as long as Matt has no further objections.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6159–6161

Actually it would be neater to use LLT::changeElementSize.

This revision is now accepted and ready to land.Aug 11 2020, 9:07 AM
arsenm added inline comments.Aug 13 2020, 6:16 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
595

The expansion can fully use packed VOP3P instructions. For 16-bit cases, this should try to clamp the number of elements, if packed instructions are available, before scalarizing.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir
56

Should add <2 x s16>, <3 x s16> and <4 x s16> cases.

pdhaliwal updated this revision to Diff 288886.Aug 30 2020, 9:10 PM
pdhaliwal marked 2 inline comments as done.

Updated review comments.

@arsenm, let me know if it is good to land.

arsenm added inline comments.Sep 2 2020, 5:07 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597

This isn't the right logic. The intent is to go down to 2 elements for cases that can promote to <2 x i16>. s8 isn't special here.

arsenm requested changes to this revision.Sep 3 2020, 4:23 PM
This revision now requires changes to proceed.Sep 3 2020, 4:23 PM
pdhaliwal updated this revision to Diff 292461.Sep 17 2020, 4:31 AM

Updated tests and clamping number of elements to 2

arsenm added inline comments.Sep 17 2020, 6:40 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
598

This should be unnecessary

pdhaliwal added inline comments.Sep 17 2020, 9:25 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
598

If I drop this, the <2 x s32> case starts generating worse code. This is because the lowering kicks in, promoting the 32-bit mulh to a 64-bit mul, which then has to be legalized. I can use VOP3P instructions only for s8. For the other types, I need to specify the scalarization.

arsenm added inline comments.Sep 18 2020, 9:40 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
598

This should be an unconditional scalarize. The scalarization shouldn't cause a 64-bit multiply to be used.

pdhaliwal added inline comments.Sep 20 2020, 9:14 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
598

Hmm, an unconditional scalarize would remove the possibility of using the vector path for <2 x s8>. This is a bit different from other operations like MUL and ADD, where <2 x s16> would have been legal and unconditional scalarization would have worked. The whole point of making the scalarization conditional is that <2 x s8> can easily use the <2 x s16> MUL from the lowering path. Since <2 x s16> is legal for AMDGPU, the lowering will correctly use vector operations. Unconditional scalarization would simply defeat the logic that uses vector ops.
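The <2 x s8> path described here can be sketched lane by lane in standalone C++. This is only a model of the values, assuming the unsigned case and a lowering of the form extend, multiply, shift, truncate; the type aliases and function names are hypothetical and do not correspond to LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V2S8 = std::array<uint8_t, 2>;   // stands in for <2 x s8>
using V2S16 = std::array<uint16_t, 2>; // stands in for <2 x s16>

// Zero-extend each lane from 8 to 16 bits.
V2S16 zextLanes(V2S8 v) { return {v[0], v[1]}; }

// Lane-wise 16-bit multiply: the single packed operation the lowering
// relies on being legal.
V2S16 mulLanes(V2S16 a, V2S16 b) {
  return {static_cast<uint16_t>(static_cast<uint32_t>(a[0]) * b[0]),
          static_cast<uint16_t>(static_cast<uint32_t>(a[1]) * b[1])};
}

// Lane-wise logical shift right.
V2S16 lshrLanes(V2S16 v, unsigned amount) {
  return {static_cast<uint16_t>(v[0] >> amount),
          static_cast<uint16_t>(v[1] >> amount)};
}

// Truncate each lane back to 8 bits.
V2S8 truncLanes(V2S16 v) {
  return {static_cast<uint8_t>(v[0]), static_cast<uint8_t>(v[1])};
}

// umulh on <2 x s8> expressed entirely in <2 x s16> operations.
V2S8 umulhV2S8(V2S8 a, V2S8 b) {
  return truncLanes(lshrLanes(mulLanes(zextLanes(a), zextLanes(b)), 8));
}

void selfCheckV2S8() {
  V2S8 r = umulhV2S8(V2S8{255, 16}, V2S8{255, 16});
  assert(r[0] == 254); // 65025 >> 8
  assert(r[1] == 1);   // 256 >> 8
}
```

Every intermediate step stays in a two-lane 16-bit vector, which is the shape the comment says is legal on AMDGPU; scalarizing first would split the work into two independent s8 computations instead.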

arsenm added inline comments.Tue, Sep 22, 6:19 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
598

You already handled this case with the first fewerElementsIf; the second one just handles everything else. It doesn't need to specify "not s8".

pdhaliwal updated this revision to Diff 293446.Tue, Sep 22, 6:54 AM

Added lowerFor({V2S8})

pdhaliwal updated this revision to Diff 293449.Tue, Sep 22, 7:01 AM

Removed unused code

arsenm added inline comments.Tue, Sep 22, 7:13 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597

Put the actions on separate lines

600

Separate lines

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir
610

This shouldn't use implicit uses of s8 values. I'm trying to fix implicit uses with illegal register types because we ultimately can't legalize these.

Harbormaster completed remote builds in B72514: Diff 293446.
pdhaliwal updated this revision to Diff 293638.Tue, Sep 22, 9:33 PM

Formatting and removed implicit uses

pdhaliwal marked 4 inline comments as done.Tue, Sep 22, 9:34 PM
arsenm accepted this revision.Wed, Sep 23, 6:10 AM
This revision is now accepted and ready to land.Wed, Sep 23, 6:10 AM
This revision was landed with ongoing or failed builds.Wed, Sep 23, 7:26 PM
This revision was automatically updated to reflect the committed changes.