Legalizing G_MUL for non-standard types (like i33) generated an error. Putting minScalar and maxScalar instead of clampScalar.
Also using new rule, instead of widening to the next power of 2, widen to the next multiple of the passed argument (32 in this case), so instead of
widening i65 to i128, we widen it to i96.
Details
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
1,180 ms | x64 debian > libomp.api::omp_get_wtime.c |
Event Timeline
Seems reasonable to me. I guess the alternative is to remove this restriction from LegalizerHelper::narrowScalarMul:
if (DstSize % NarrowSize != 0 || SrcSize % NarrowSize != 0) return UnableToLegalize;
but that would be more work and the generated code would be the same.
llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll | ||
---|---|---|
625–646 | Now s96 is widened to 128 and then truncated down to 96 which is why those add3 instructions are gone. They will only be selected for most significant register/bits. Here these registers will end up dead after trunc. A rule to widen to next multiple of 32 might be better then next power of 2 (might not make sense for scalars smaller then 16, because we want s16 in some cases). This way scalars in range (65,96) will be widened into 96, not 128. Same for anything above 128, we don't need to go from 4x32 to 8x32. So a rule like widenToNextMultipleOf32 followed by clampScalar(0, S16, S32) that is already there should do the trick. What do you think @foad? |
Include G_ADD and G_SUB along with G_MUL. Instead of widening the scalar to the next power of 2, widen it to the next
multiple of 32 (if the type is i65, widen it to i96, instead of i128).
Refactoring and formatting.
Adding some more globalisel reviewers in case they have comments on the new "size is multiple of" legality predicates.
llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h | ||
---|---|---|
306 | I think the generic predicates should probably be named like sizeNotMultipleOf, and take the value 32 as an extra Size (or Unit?) argument. | |
llvm/lib/CodeGen/GlobalISel/LegalityPredicates.cpp | ||
159 | Nit: don't need the parentheses but please do add a != 0. | |
llvm/lib/CodeGen/GlobalISel/LegalizeMutations.cpp | ||
71 | Use alignTo(Ty.getScalarSizeInBits(), 32). |
llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h | ||
---|---|---|
364 | The "depending on which is greater" part doesn't make much sense to me. Either remove Min (here and in widenScalarToNextMultipleOf) or use the same wording as widenScalarToNextMultipleOf: "Widen ... to the next multiple of Size that is at least MinSize." |
llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h | ||
---|---|---|
854 | Should we have a separate rule widenScalarToNextMultipleOf**If** so we can take out this scalarWiderThan check? We need this check for targets that can work with s16. |
llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h | ||
---|---|---|
847 | I still find this comment confusing. It says "only if Size is greater than MinSize". Do you mean "only if the type is wider than MinSize"? But I still don't think you really need a MinSize parameter. Instead I think you could change the legalization rules to something like: getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL}) .legalFor({S32, S16, V2S16}) .minScalar(0, S16) .clampMaxNumElements(0, S16, 2) .widenScalarToNextMultipleOf(0, 32) .maxScalar(0, S32) .scalarize(0); Would that work? |
Instead of clampScalar, use minScalar and maxScalar before and after widening to the next multiple of 32.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
553 | For this case (have 16 insts but no 16 bit packed insts) I think the order should be something like: .legalFor({S32, S16}) .minScalar(0, S16) .scalarize(0) .widenScalarToNextMultipleOf(0, 32) .maxScalar(0, S32) Otherwise v2s16 would be widened to v2s32 and then scalarized to s32. Instead we want to scalarize it to s16. |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
553 | llvm/test/CodeGen/AMDGPU/GlobalISel/mul.v2i16.ll has those tests and current version of the patch does not seem to affect it. Either way scalarize() will be called first. widenScalarToNextMultipleOf() checks if type is scalar. |
LGTM, thanks! Please update the commit message which still mentions clampScalar.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
553 |
Oh yes, I missed that. This is OK then. |
Update comment.