Page MenuHomePhabricator

[AMDGPU][GlobalISel] Legalize G_MUL for non-standard types
ClosedPublic

Authored by matejam on Sep 3 2021, 5:26 AM.

Details

Summary

Legalizing G_MUL for non-standard types (like i33) generated an error. Putting minScalar and maxScalar instead of clampScalar.
Also using new rule, instead of widening to the next power of 2, widen to the next multiple of the passed argument (32 in this case), so instead of
widening i65 to i128, we widen it to i96.

Diff Detail

Event Timeline

matejam created this revision.Sep 3 2021, 5:26 AM
matejam requested review of this revision.Sep 3 2021, 5:26 AM
foad added a comment.Sep 3 2021, 7:23 AM

Seems reasonable to me. I guess the alternative is to remove this restriction from LegalizerHelper::narrowScalarMul:

if (DstSize % NarrowSize != 0 || SrcSize % NarrowSize != 0)
  return UnableToLegalize;

but that would be more work and the generated code would be the same.

mbrkusanin added inline comments.Sep 3 2021, 7:41 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
625–646

Now s96 is widened to 128 and then truncated down to 96 which is why those add3 instructions are gone. They will only be selected for most significant register/bits. Here these registers will end up dead after trunc.

A rule to widen to next multiple of 32 might be better then next power of 2 (might not make sense for scalars smaller then 16, because we want s16 in some cases). This way scalars in range (65,96) will be widened into 96, not 128. Same for anything above 128, we don't need to go from 4x32 to 8x32.

So a rule like widenToNextMultipleOf32 followed by clampScalar(0, S16, S32) that is already there should do the trick.

What do you think @foad?

matejam updated this revision to Diff 370614.Sep 3 2021, 10:04 AM

Include G_ADD and G_SUB along with G_MUL. Instead of widening the scalar to the next power of 2, widen it to the next
multiple of 32 (if the type is i65, widen it to i96, instead of i128).
Refactoring and formatting.

Adding some more globalisel reviewers in case they have comments on the new "size is multiple of" legality predicates.

llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
306

I think the generic predicates should probably be named like sizeNotMultipleOf, and take the value 32 as an extra Size (or Unit?) argument.

llvm/lib/CodeGen/GlobalISel/LegalityPredicates.cpp
159

Nit: don't need the parentheses but please do add a != 0.

llvm/lib/CodeGen/GlobalISel/LegalizeMutations.cpp
71

Use alignTo(Ty.getScalarSizeInBits(), 32).

matejam updated this revision to Diff 370904.Sep 6 2021, 6:18 AM

Refactoring and formatting.

foad added inline comments.Sep 6 2021, 7:12 AM
llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
305

Update comment.

364

Update comments to remove 32.

365

I would suggest to add Size as a parameter here also, instead of hardcoding 32. Also remove Min unless/until there is a need for it?

848

Add Size as a parameter and remove MinSize?

matejam updated this revision to Diff 370925.Sep 6 2021, 7:37 AM

Refactoring and formatting. Using parameters instead of hardcoding.

foad added inline comments.Sep 6 2021, 7:47 AM
llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
364

The "depending on which is greater" part doesn't make much sense to me.

Either remove Min (here and in widenScalarToNextMultipleOf) or use the same wording as widenScalarToNextMultipleOf: "Widen ... to the next multiple of Size that is at least MinSize."

matejam updated this revision to Diff 370932.Sep 6 2021, 8:29 AM

Changes in comments, refactoring and formatting.

mbrkusanin added inline comments.Sep 6 2021, 8:34 AM
llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
855

Should we have a separate rule widenScalarToNextMultipleOf**If** so we can take out this scalarWiderThan check? We need this check for targets that can work with s16.

foad added inline comments.Sep 6 2021, 8:53 AM
llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
848

I still find this comment confusing. It says "only if Size is greater than MinSize". Do you mean "only if the type is wider than MinSize"?

But I still don't think you really need a MinSize parameter. Instead I think you could change the legalization rules to something like:

getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL})
  .legalFor({S32, S16, V2S16})
  .minScalar(0, S16)
  .clampMaxNumElements(0, S16, 2)
  .widenScalarToNextMultipleOf(0, 32)
  .maxScalar(0, S32)
  .scalarize(0);

Would that work?

matejam updated this revision to Diff 371048.Sep 7 2021, 5:54 AM

Instead of clampScalar, use minScalar and maxScalar before and after widening to the next multiple of 32.

foad added inline comments.Sep 7 2021, 6:04 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
552

For this case (have 16 insts but no 16 bit packed insts) I think the order should be something like:

.legalFor({S32, S16})
.minScalar(0, S16)
.scalarize(0)
.widenScalarToNextMultipleOf(0, 32)
.maxScalar(0, S32)

Otherwise v2s16 would be widened to v2s32 and then scalarized to s32. Instead we want to scalarize it to s16.

matejam updated this revision to Diff 371053.Sep 7 2021, 6:08 AM

Formatting.

mbrkusanin added inline comments.Sep 7 2021, 6:32 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
552

llvm/test/CodeGen/AMDGPU/GlobalISel/mul.v2i16.ll has those tests and current version of the patch does not seem to affect it.
It is still scalarized to s16.

Either way scalarize() will be called first. widenScalarToNextMultipleOf() checks if type is scalar.

foad accepted this revision.Sep 7 2021, 6:40 AM

LGTM, thanks! Please update the commit message which still mentions clampScalar.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
552

Either way scalarize() will be called first. widenScalarToNextMultipleOf() checks if type is scalar.

Oh yes, I missed that. This is OK then.

This revision is now accepted and ready to land.Sep 7 2021, 6:40 AM
matejam edited the summary of this revision. (Show Details)Sep 7 2021, 7:23 AM
This revision was landed with ongoing or failed builds.Sep 7 2021, 7:38 AM
This revision was automatically updated to reflect the committed changes.