This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO
ClosedPublic

Authored by pdhaliwal on Jan 1 2021, 2:58 AM.

Diff Detail

Event Timeline

pdhaliwal created this revision.Jan 1 2021, 2:58 AM
pdhaliwal requested review of this revision.Jan 1 2021, 2:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 1 2021, 2:58 AM
arsenm added inline comments.Jan 4 2021, 12:55 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
602

Would you get better results if you:

  1. scalarized vectors first
  2. Promoted small scalar types first?
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir
4

Do you need the abort=0s?

Removed global-isel-abort=0

pdhaliwal marked an inline comment as done.Jan 10 2021, 11:45 PM
pdhaliwal added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
602

In both cases, the MULO is somehow considered legal by compiler. For e.g. even for s32, expansion does not occur, compiler is directly using S/UMULO instructions. I am investigating this.

Moved ops close to ADDO

pdhaliwal added inline comments.Jan 14 2021, 10:26 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
602

It was due to missing corresponding definition of widenScalar for this operation. I tried implementing one for UMULO as it was easier. Didn't see any better result.

foad added a subscriber: foad.Jan 15 2021, 1:58 AM
foad added inline comments.Jan 15 2021, 2:15 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir
61

Why do we get ANYEXT followed by AND with 1? In the scalar s32 case we just get ZEXT which is nicer.

116–119

Is there a good reason why this is using UADDO plus ZEXT plus a second ADD, instead of using UADDE?

219

This expansion does an s32 multiply and an s16 multiply. It would be better to just do one s32 multiply -- you can extract all the information you need from the result of that.

Hi, apologies for late reply as I got sidetracked to some other work.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir
61

This is coming from widening of BUILD_VECTOR resulting from legalization of ICMP instruction. So, I guess ZEXT is presumed dead once BUILD_VECTOR gets legalized.

116–119

This is because of legalization of UMULH for s64. I am thinking of having different patch for this as it impacts narrowing of UMULH.

219

I was able to get unsigned operation use single s32 multiply. Signed is getting bit tricky.

pdhaliwal updated this revision to Diff 320428.Feb 1 2021, 3:11 AM
  • Scalarize the vectors first
  • Using widened operation for smaller types
arsenm added inline comments.Feb 11 2021, 3:35 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1842–1843

Why is this using SExtInReg in the signed case, but ZExt in the other? SExtInReg doesn't widen the type

foad added inline comments.Feb 12 2021, 8:46 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1839

Maybe assert that WideTy is at least twice as wide as SrcTy, otherwise the trick we use for calculating overflow below does not work.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulo.mir
3

Can you use -check-prefixes=GCN,GFX8 and GCN,GFX9 so that update_mir_test_checks will common up the identical ones?

113

This looks wrong as Matt noted above. Doesn't G_SEXT_INREG require identical source and result types? Would this fail MIR verification?

pdhaliwal updated this revision to Diff 323683.Feb 15 2021, 1:06 AM
pdhaliwal marked 2 inline comments as done.

Addressed review comments.

Can you use -check-prefixes=GCN,GFX8 and GCN,GFX9 so that update_mir_test_checks will common up the identical ones?

It does not work. Script warns as WARNING: Ignoring common prefixes: {'GCN'}: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir

arsenm added inline comments.Feb 15 2021, 2:25 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1849

Can you copy some of the comments from the DAG version? Stuff like
// Unsigned overflow occurred if the high part is non-zero.
and

// Signed overflow occurred if the high part does not sign extend the low.

3684–3687

Should just directly extract the reg here, there's no reason to refer to the MachineOperand. This is also using value copies of MachineOperand, which are generally not a good idea

3709–3710

You can hide the createGenericVirtual register calls with build Instr like buildInstr(Opc, {ResultTy, OverFlowTy}...

3718–3719

You should preserve the boolean type of the incoming, not hardcode to s1. We also have LLT.changeElementType for this

Addressed review comments.

pdhaliwal marked 4 inline comments as done.Feb 15 2021, 11:11 PM
foad added inline comments.Feb 17 2021, 12:30 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1839

Did you test with assertions enabled? I think this needs to be >=.

pdhaliwal updated this revision to Diff 324247.Feb 17 2021, 2:36 AM

Fixed the assert.

pdhaliwal added inline comments.Feb 17 2021, 2:37 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1839

Thanks for pointing out. I was not testing with assertions enabled so I missed it.

arsenm added inline comments.Mar 3 2021, 6:50 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1845

Hmm. This is actually going beyond what I expect widenScalar to do. In general widenScalar should try to produce the original opcode. In this case, the DAG does the same overflow opcode in the wider type, and then ors the flag at the end.

1849

The DAG version still has a few more comments ( To determine if the result overflowed in a larger type...)

1855–1858

This is a common enough pattern. The DAG provides a getZExtInReg to help produce masks like this, maybe move this to MIRBuilder?

pdhaliwal updated this revision to Diff 329237.Mar 9 2021, 12:30 AM
pdhaliwal marked 3 inline comments as done.

Rebase & Review comments

arsenm requested changes to this revision.Mar 17 2021, 3:42 PM
arsenm added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1867–1868

I don't think the unsigned case is right. The DAG version inserts a shift here, not a mask

This revision now requires changes to proceed.Mar 17 2021, 3:42 PM
pdhaliwal marked an inline comment as done.

Use SHRL for unsigned case.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1867–1868

I have changed it use shift instead of masking. Just curious, why was previous logic wrong? I thought zero'ing the upper bits of multiplication result and then comparing it with latter should provide the correct result.

foad added inline comments.Mar 19 2021, 3:11 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1841

If the wide type is at least twice as wide as the original type, then the widened multiply provably will not overflow, so you don't need the final "or".

If we want to support widening to a type that is less than twice as wide as the original type, then remove this assert and only do the final "or" if WideTy.getScalarSizeInBits() >= 2 * SrcBitWidth. But I'm not sure how you would write a test for that code path.

pdhaliwal updated this revision to Diff 331845.Mar 19 2021, 6:01 AM

Removed assert. Now WideTy overflow check is only done when WideTy is not sufficient enough.
Added test case for s24 to verify the logic.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1841

That "or" exists to match the existing DAG logic. But you are right that the assert makes "or" redundant. I have removed the assert and added tests for s24 to verify the logic.

foad accepted this revision.Mar 19 2021, 6:17 AM

I think this looks good, just some nits inline. If there are any further improvements they can be done as follow ups.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1867–1868

I'm pretty sure the previous logic was fine too, it's just a different way of checking the upper part is zero.

1880

Needs a comment that the multiply can't possible overflow if the wide type is >= 2 * original width.

It's a shame that you have to duplicate this check from line 1845. Maybe @arsenm knows a cleaner way to write this.

3682

Personally I would be tempted to generalize LegalizerHelper::fewerElementsVectorMultiEltType into a generic function that can handle any operation that works on vector elements independently. But that does not have to be part of this patch.

arsenm added inline comments.Mar 19 2021, 6:28 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1867–1868

Oh yes, I just can't read. Arguably avoiding the shift is better since shifts can be more expensive

pdhaliwal updated this revision to Diff 332214.EditedMar 22 2021, 1:26 AM
pdhaliwal marked an inline comment as done.

This version of the patch has bit more changes. Please let me know if it still looks good.

  • Revert the unsigned case to use Masks.
  • Simplified the logic for widenScalar
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1867–1868

I have reverted to using masks.

pdhaliwal updated this revision to Diff 332217.Mar 22 2021, 1:45 AM

Fix comments.

foad added a comment.Mar 22 2021, 2:13 AM

Still looks good to me.

This revision was not accepted when it landed; it landed in state Needs Review.Mar 22 2021, 10:46 PM
This revision was automatically updated to reflect the committed changes.