This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul
AbandonedPublic

Authored by foad on Oct 1 2021, 5:33 AM.

Details

Summary

Also remove some redundancy because the source and result
types of any multiply are always the same.

Diff Detail

Event Timeline

foad created this revision.Oct 1 2021, 5:33 AM
foad requested review of this revision.Oct 1 2021, 5:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 1 2021, 5:33 AM
aemerson accepted this revision.Oct 4 2021, 10:30 AM
This revision is now accepted and ready to land.Oct 4 2021, 10:30 AM
foad reopened this revision.Oct 4 2021, 12:27 PM
This revision is now accepted and ready to land.Oct 4 2021, 12:27 PM
foad added a comment.Oct 5 2021, 1:43 AM

The problem here is that having narrowed:

%2:_(<2 x s64>) = G_UMULH %0:_, %1:_

into:

%3:_(<2 x s32>), %4:_(<2 x s32>) = G_UNMERGE_VALUES %0:_(<2 x s64>)
%5:_(<2 x s32>), %6:_(<2 x s32>) = G_UNMERGE_VALUES %1:_(<2 x s64>)
%7:_(<2 x s32>) = G_MUL %3:_, %5:_
%8:_(<2 x s32>) = G_MUL %4:_, %5:_
%9:_(<2 x s32>) = G_MUL %3:_, %6:_
%10:_(<2 x s32>) = G_UMULH %3:_, %5:_
%11:_(<2 x s32>), %12:_(<2 x s1>) = G_UADDO %8:_, %9:_
%13:_(<2 x s32>) = G_ZEXT %12:_(<2 x s1>)
%14:_(<2 x s32>), %15:_(<2 x s1>) = G_UADDO %11:_, %10:_
%16:_(<2 x s32>) = G_ZEXT %15:_(<2 x s1>)
%17:_(<2 x s32>) = G_ADD %13:_, %16:_
%18:_(<2 x s32>) = G_MUL %4:_, %6:_
%19:_(<2 x s32>) = G_UMULH %4:_, %5:_
%20:_(<2 x s32>) = G_UMULH %3:_, %6:_
%21:_(<2 x s32>), %22:_(<2 x s1>) = G_UADDO %18:_, %19:_
%23:_(<2 x s32>) = G_ZEXT %22:_(<2 x s1>)
%24:_(<2 x s32>), %25:_(<2 x s1>) = G_UADDO %21:_, %20:_
%26:_(<2 x s32>) = G_ZEXT %25:_(<2 x s1>)
%27:_(<2 x s32>) = G_ADD %23:_, %26:_
%28:_(<2 x s32>), %29:_(<2 x s1>) = G_UADDO %24:_, %17:_
%30:_(<2 x s32>) = G_ZEXT %29:_(<2 x s1>)
%31:_(<2 x s32>) = G_ADD %27:_, %30:_
%32:_(<2 x s32>) = G_UMULH %4:_, %6:_
%33:_(<2 x s32>) = G_ADD %32:_, %31:_

we need a way to merge %28:_(<2 x s32>) (a vector of the low parts of each element of the result) and %33:_(<2 x s32>) (same for the high parts) into a <2 x s64> result. I don't know how to do that. G_MERGE_VALUES only works on scalars, though I don't see why it couldn't be extended to work element-wise on vectors that all have the same length. And I can't see any existing helper functions in LegalizerHelper.cpp that do what I want either.

Afaik %3:_(<2 x s32>), %4:_(<2 x s32>) = G_UNMERGE_VALUES %0:_(<2 x s64>) is not yet defined to work like that and should probably be forbidden to make such instruction in builder. Only allow scalar split and vector split to elements or sub-vectors.
Desired unmerge works if done step by step:

%5:_(s64), %6:_(s64) = G_UNMERGE_VALUES %0:_(<2 x s64>)
%7:_(s32), %8:_(s32) = G_UNMERGE_VALUES %5:_(s64)
%9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %6:_(s64)
%3:_(<2 x s32>) = G_BUILD_VECTOR %5:_(s64), %7:_(s32)
%4:_(<2 x s32>) = G_BUILD_VECTOR %6:_(s64), %8:_(s32)

Similar for merge: merge s64 elements first then build <2 x s64> vector.
But <2 x s32> instructions will be scalarized, bit shift packing into <2 x s16> is most probably slower.
Why not go with scalarize at the start?

foad added a comment.Oct 5 2021, 2:33 AM

Why not go with scalarize at the start?

Good idea. I will abandon this and change AMDGPU's legalization rules instead.

foad abandoned this revision.Oct 5 2021, 2:33 AM
foad added a comment.Oct 5 2021, 4:21 AM

Afaik %3:_(<2 x s32>), %4:_(<2 x s32>) = G_UNMERGE_VALUES %0:_(<2 x s64>) is not yet defined to work like that and should probably be forbidden to make such instruction in builder. Only allow scalar split and vector split to elements or sub-vectors.

Ugh, it is apprently legal but it means something different. See comments in D111132.

Don't we need this even if AMDGPU scalarizes first?

foad added a comment.Oct 20 2021, 2:40 AM

Don't we need this even if AMDGPU scalarizes first?

Why? For other targets? Maybe, but I'm not motivated to work on it until there's a concrete need for it.

Don't we need this even if AMDGPU scalarizes first?

Why? For other targets? Maybe, but I'm not motivated to work on it until there's a concrete need for it.

Yes. Plus I'm not entirely comfortable with legalization only working if you have the magic ordering of legalizer rules. I've lost a lot of time fighting legalizer rule ordering issues

foad added a comment.Oct 20 2021, 8:43 AM

My preferred way of implementing this would be to extend G_MERGE_VALUES so that it works elementwise on vectors:

%2:_(<3 x s64>) = G_MERGE_VALUES %0:_(<3 x s32>), %1:_(<3 x s32>) // %0 is three low halves, %1 is three high halves

But that requires G_UNMERGE_VALUES to be able to do the converse:

%1:_(<3 x s32>), %2:_(<3 x s32>) = G_UNMERGE_VALUES %0:_(<3 x s64>)  // %1 is three low halves, %2 is three high halves

But this was already allowed and apparently meant something different (a G_BITCAST plus the converse of G_CONCAT_VECTORS?) which is how I got sidetracked into D111132.