This is an archive of the discontinued LLVM Phabricator instance.

Scalarization for masked gather/scatter intrinsics
ClosedPublic

Authored by delena on Oct 14 2015, 5:27 AM.

Details

Summary

Masked gather/scatter intrinsics are supported on AVX-512 targets and only for 32/64 bit integers and FP types.
In all other cases these intrinsics should be split in a chain of basic blocks and a sequence of scalar load/store operations.

Example:
<16 x i32 > @llvm.masked.gather.v16i32( <16 x i32*> %Ptrs, i32 4, <16 x i1> %Mask, <16 x i32> %Src)
is translated to:

%Mask0 = extractelement <16 x i1> %Mask, i32 0
% ToLoad0 = icmp eq i1 % Mask0, true
br i1 % ToLoad0, label %cond.load, label %else

cond.load:
% Ptr0 = extractelement <16 x i32*> %Ptrs, i32 0
% Load0 = load i32, i32* % Ptr0, align 4
% Res0 = insertelement <16 x i32> undef, i32 % Load0, i32 0
br label %else

else:
%res.phi.else = phi <16 x i32>[% Res0, %cond.load], [undef, % 0]
% Mask1 = extractelement <16 x i1> %Mask, i32 1
% ToLoad1 = icmp eq i1 % Mask1, true
...

Diff Detail

Repository
rL LLVM

Event Timeline

delena updated this revision to Diff 37337.Oct 14 2015, 5:27 AM
delena retitled this revision from to Scalarization for masked gather/scatter intrinsics.
delena updated this object.
delena added reviewers: qcolombet, hfinkel, mzolotukhin.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: llvm-commits.
mkuper edited edge metadata.Oct 15 2015, 5:29 AM

Hi Elena,
Some comments inline.

../include/llvm/Analysis/TargetTransformInfo.h
577

How is this different from isLegalMaskedStore() with Consecutive == 0?

../lib/CodeGen/CodeGenPrepare.cpp
1123

Perhaps these changes belong in a separate patch from the addition of ScalarizeMaskedGather()?

1129

Why is this a dyn_cast, if the result is assumed to be non-null in line 1131? If this is just for the assert, I'd make this a cast, and make the assert
assert(isa<VectorType>(CI->getType())).

1151

This is a bit odd.
If AlignVal >= VecType->getScalarSizeInBits(), then this is a nop.
So, let's say originally AlignVal < VecType->getScalarSizeInBits(). This means that you're raising the alignment for the store of the first element. Are you sure that's what you want?

1236

Same comments apply as for the MaskedLoad.

1327

This looks very similar to ScalarizeMaskedLoad, except for the all-ones case.
Can they perhaps be combined?

1356

Let's say the mask is a constant, but not an all-ones constant.
We could just iterate over the bits of the mask, and place only the loads where the bit is actually 1, instead of creating all of the branchy code.

Normally it wouldn't matter, since SimplifyCFG (I think) would clean it up - but I don't think there's a SimplifyCFG run between here and ISel. Does something else clean up the mess for the non-all-ones constant case?

../lib/Target/X86/X86TargetTransformInfo.cpp
1196–1197

In case you're going to touch this code anyway - isn't it enough to check hasAVX2() (That is, doesn't hasAVX512() imply hasAVX2() ?) )

1212

Shouldn't there be an upper limit to the DataWidth too? (Or to the vector element count, for that matter?)

delena marked an inline comment as done.Oct 18 2015, 4:51 AM

I'll separate patches:

  1. Remove "Consecutive" argument from isLegalMaskedLoad() / store
  2. Alignments and constant masks for masked load / store scalarization procedure
  3. Gather/Scatter scalarization procedure
../include/llvm/Analysis/TargetTransformInfo.h
577

You are right. I forgot about "Consecutive". I decided to remove "Consecutive" from load/store.

../lib/CodeGen/CodeGenPrepare.cpp
1123

I added alignment for gather/scatter. And here as well. I can put it as a separate patch.

1129

The form is not so important. I use VecType later and this form is more convenient for me.

1151

Thanks!
I changed to "min". (That what I initially meant for). If masked operation has alignment 16 (it is a vector op), I can't put 16 to scalar. I reduce it scalar type.

1327

It is similar but not the same. I also extract a pointer from the vector of pointers. I don't want to mix scatter and store, gather and load.

1356

Yes, this optimization is possible. For load/store as well. I'll add it.

../lib/Target/X86/X86TargetTransformInfo.cpp
1196–1197

This function should be changed anyway.

  1. We have masked load/store on AVX, not only AVX2.
  2. We have masked load/store for i16 and i8 vectors on SKX.

But I need to add CodeGen support for this.
I'll change to hasAVX2 meanwhile.

1212

If the vector will be too wide, type legalizer will split it.
I don't know what will be if vector width is not power of 2, I can reject this case.

delena updated this revision to Diff 37724.Oct 19 2015, 1:38 AM
delena edited edge metadata.
delena marked an inline comment as done.

I moved the load-store related changes to another patch. In this patch I have:

  1. isLegalGather() isLegalScatter() interface
  2. code that scalarizes masked.gather and masked.scatter intrinsics when the interface (1) returns false.
mkuper accepted this revision.Oct 22 2015, 4:16 AM
mkuper edited edge metadata.

LGTM with some nits.

../lib/CodeGen/CodeGenPrepare.cpp
1129

That's ok, I'm just saying I would prefer not to have an unchecked dyn_cast<>.

1361

Don't you need AlignVal = std::min(AlignVal, VecType->getScalarSizeInBits()), like for the load/store case?

1367

This gets eventually cleaned up is Src0 is undef, right?

1410

LoadInst* Load -> LoadInst *Load

1493

Same as above re AlignVal

../lib/Target/X86/X86TargetTransformInfo.cpp
1212

So, say, a <32 x i32> gather will get split into two <16 x i32> gathers by the legalizer?
In any case:
a) We should probably reject the not-power-of-2 case, unless we know the legalizer can handle it. If we don't reject it, then I think there should be a test for it.
b) I guess the name is a bit confusing - it's not really "isLegal" (because some things it accepts may not actually be legal on the target), it's more like "isLegalizeable". I don't think we should change the name, but it may be worth to note this in the declaration in TargetTransformInfo.h.

This revision is now accepted and ready to land.Oct 22 2015, 4:16 AM
delena marked an inline comment as done.Oct 25 2015, 3:17 AM

Hi Michael, not committing the code. I want to hear your opinion about rejection of non-power-of-2 elements.

../lib/CodeGen/CodeGenPrepare.cpp
1361

No. Gather / scatter alignment means alignment for one element, not for a vector. If the specified alignment is bigger than element size, we can't handle it properly on X86. The only one option that I see here is to scalarize masked-gather-scatter to a chain of scalar loads with required alignment.

1367

Yes, the codegen with clean it anyway.

../lib/Target/X86/X86TargetTransformInfo.cpp
1212

a) I can add rejection of the not-power-of-2 cases. And a test.
But I want to know your opinion about vector and scalar types in this function:
(This comment I want to put inside)

// This function is called now in two cases: from the Loop Vectorizer
// and from the Scalarizer.
// When the Loop Vectorizer asks about legality of the feature,
// the vectorization factor is not calculated yet. The Loop Vectorizer
// sends a scalar type and the decision is based on the width of the
// scalar element.
// Later on, the cost model will estimate usage this intrinsic based on
// the vector type.
// The Scalarizer asks again about legality. It sends a vector type.
// In this case we can reject non-power-of-2 vectors.
if (isa<VectorType>(DataTy) && !isPowerOf2_32(DataTy->getVectorNumElements()))
  return false;
Type *ScalarTy = DataTy->getScalarType();
 ...

b) "IsLegal" comes from TypeLegalizer and it is equal to "IsSupported". I wrote in the comments in TargetTransformInfo.h "Return true if the target supports.."

mkuper added inline comments.Oct 25 2015, 7:16 AM
../lib/CodeGen/CodeGenPrepare.cpp
1361

Ah, right, of course.

../lib/Target/X86/X86TargetTransformInfo.cpp
1212

SGTM.

This revision was automatically updated to reflect the committed changes.