Details
- Reviewers
arsenm foad - Commits
- rGd0e5422eb8bf: [GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
602 | In both cases, the MULO is somehow considered legal by compiler. For e.g. even for s32, expansion does not occur, compiler is directly using S/UMULO instructions. I am investigating this. |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
602 | It was due to missing corresponding definition of widenScalar for this operation. I tried implementing one for UMULO as it was easier. Didn't see any better result. |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir | ||
---|---|---|
61 | Why do we get ANYEXT followed by AND with 1? In the scalar s32 case we just get ZEXT which is nicer. | |
116–119 | Is there a good reason why this is using UADDO plus ZEXT plus a second ADD, instead of using UADDE? | |
219 | This expansion does an s32 multiply and an s16 multiply. It would be better to just do one s32 multiply -- you can extract all the information you need from the result of that. |
Hi, apologies for late reply as I got sidetracked to some other work.
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir | ||
---|---|---|
61 | This is coming from widening of BUILD_VECTOR resulting from legalization of ICMP instruction. So, I guess ZEXT is presumed dead once BUILD_VECTOR gets legalized. | |
116–119 | This is because of legalization of UMULH for s64. I am thinking of having different patch for this as it impacts narrowing of UMULH. | |
219 | I was able to get unsigned operation use single s32 multiply. Signed is getting bit tricky. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1829–1830 | Why is this using SExtInReg in the signed case, but ZExt in the other? SExtInReg doesn't widen the type |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1826 | Maybe assert that WideTy is at least twice as wide as SrcTy, otherwise the trick we use for calculating overflow below does not work. | |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulo.mir | ||
3 | Can you use -check-prefixes=GCN,GFX8 and GCN,GFX9 so that update_mir_test_checks will common up the identical ones? | |
113 | This looks wrong as Matt noted above. Doesn't G_SEXT_INREG require identical source and result types? Would this fail MIR verification? |
Can you use -check-prefixes=GCN,GFX8 and GCN,GFX9 so that update_mir_test_checks will common up the identical ones?
It does not work. Script warns as WARNING: Ignoring common prefixes: {'GCN'}: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1836 | Can you copy some of the comments from the DAG version? Stuff like // Signed overflow occurred if the high part does not sign extend the low. | |
3666–3669 | Should just directly extract the reg here, there's no reason to refer to the MachineOperand. This is also using value copies of MachineOperand, which are generally not a good idea | |
3691–3692 | You can hide the createGenericVirtual register calls with build Instr like buildInstr(Opc, {ResultTy, OverFlowTy}... | |
3700–3701 | You should preserve the boolean type of the incoming, not hardcode to s1. We also have LLT.changeElementType for this |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1826 | Did you test with assertions enabled? I think this needs to be >=. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1826 | Thanks for pointing out. I was not testing with assertions enabled so I missed it. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1832 | Hmm. This is actually going beyond what I expect widenScalar to do. In general widenScalar should try to produce the original opcode. In this case, the DAG does the same overflow opcode in the wider type, and then ors the flag at the end. | |
1836 | The DAG version still has a few more comments ( To determine if the result overflowed in a larger type...) | |
1842–1845 | This is a common enough pattern. The DAG provides a getZExtInReg to help produce masks like this, maybe move this to MIRBuilder? |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1854–1855 | I don't think the unsigned case is right. The DAG version inserts a shift here, not a mask |
Use SHRL for unsigned case.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1854–1855 | I have changed it use shift instead of masking. Just curious, why was previous logic wrong? I thought zero'ing the upper bits of multiplication result and then comparing it with latter should provide the correct result. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1828 | If the wide type is at least twice as wide as the original type, then the widened multiply provably will not overflow, so you don't need the final "or". If we want to support widening to a type that is less than twice as wide as the original type, then remove this assert and only do the final "or" if WideTy.getScalarSizeInBits() >= 2 * SrcBitWidth. But I'm not sure how you would write a test for that code path. |
Removed assert. Now WideTy overflow check is only done when WideTy is not sufficient enough.
Added test case for s24 to verify the logic.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1828 | That "or" exists to match the existing DAG logic. But you are right that the assert makes "or" redundant. I have removed the assert and added tests for s24 to verify the logic. |
I think this looks good, just some nits inline. If there are any further improvements they can be done as follow ups.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1854–1855 | I'm pretty sure the previous logic was fine too, it's just a different way of checking the upper part is zero. | |
1867 | Needs a comment that the multiply can't possible overflow if the wide type is >= 2 * original width. It's a shame that you have to duplicate this check from line 1845. Maybe @arsenm knows a cleaner way to write this. | |
3664 | Personally I would be tempted to generalize LegalizerHelper::fewerElementsVectorMultiEltType into a generic function that can handle any operation that works on vector elements independently. But that does not have to be part of this patch. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1854–1855 | Oh yes, I just can't read. Arguably avoiding the shift is better since shifts can be more expensive |
This version of the patch has bit more changes. Please let me know if it still looks good.
- Revert the unsigned case to use Masks.
- Simplified the logic for widenScalar
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1854–1855 | I have reverted to using masks. |
Maybe assert that WideTy is at least twice as wide as SrcTy, otherwise the trick we use for calculating overflow below does not work.