Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1774 | no auto. Can't this use anyext? | |
1778 | Something seems off to me about introducing a full multiply, and in whatever type the user requested. I think this only works if WideTy == 2 * OriginalType. Can you produce a mulh in the wider type? This seems more like a lowering | |
1779 | Use Register, I would worry about introducing a copy of MachineOperand here | |
1781 | LLT not auto | |
1785 | ShiftAmt? | |
1785 | Why isn't the shift amount WideTy.getSizeInBits() - Size? I don't understand - IsSigned | |
1806 | Extra newline | |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir | ||
83 | Can you add an 8 and 24-bit test? | |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir | ||
152 | Can you add an 8 and 24-bit test? |
Review comments
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
1774 | I am a bit doubtful if G_ANYEXT would work here. From docs, it doesn't take care of higher bits. | |
1778 | Yes, it would only work when WideTy == 2 * OriginalType. And now if I think again it is more of a lowering operation than widening as user is not always free to choose the wider type. | |
1785 | To accomodate the sign bit in case of signed operation. | |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir | ||
83 | 24-bit case won't work as it requires 48-bit MUL op which is not working yet. | |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir | ||
152 | Added 8-bit case. But, 24-bit case won't work as it requires 48-bit MUL op which is not working yet. |
Added support for vector types.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
6164 | I got confused in signed binary multiplication. For this operation, it is not required to subtract IsSigned. |
Looks OK to me modulo one inline comment, as long as Matt has no further objections.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
6159–6161 | Actually it would be neater to use LLT::changeElementSize. |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
595 | The expansion can fully use packed instructions with VOP3P instructions. This should try to clamp the number of elements for 16-bit cases if available before scalarizing | |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir | ||
56 | Should add <2 x s16>, <3 x 16> and <4 x s16> cases |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
597 | This isn't the right logic, the intent is to go down to 2 elements for cases that can promote to <2 x i16>. s8 is't special here |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
598 | This should be unnecessary |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
598 | If I drop this, the <2 x s32> case starts generating worse code. This is due to lowering coming into the picture which promotes the 32-bit mulh to 64-bit mul and then legalizing 64-bit mul. I can use VOP3P instruction only for S8. For others, I need to specify the scalarization. |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
598 | This should be an unconditional scalarize. The scalarization shouldn't cause a 64-bit multiply to be used |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
598 | Hmm, unconditional scalarize would remove the possibility of using vector path for <2 x s8>. This is bit different from other operations like MUL, ADD where <2 x s16> would have been legal and unconditional scalarization would have worked. The whole point of having the scalarization conditional is because <2 x s8> can easily use <2 x s16> MUL from lowering path. And as <2 x s16> is legal for AMDGPU, the lowering will correctly use vector operations. Unconditional scalarization would simply make logic of using vector ops void. |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
598 | You already handled this case with the first fewerElementsIf, the second one just handles everything else. It doesn't need to specify not -s8 |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
597 | Put the actions on separate lines | |
600 | Separate lines | |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir | ||
610 | Shouldn't use implicit uses of s8 values. I'm trying to fix implicit uses with illegal register types because we can't ultimately legalize these |
no auto.
Can't this use anyext?