This is an archive of the discontinued LLVM Phabricator instance.

Masked Vector Load/Store Intrinsics
Needs ReviewPublic

Authored by delena on Nov 10 2014, 5:19 AM.

Details

Summary

Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

Diff Detail

Event Timeline

delena updated this revision to Diff 15972.Nov 10 2014, 5:19 AM
delena retitled this revision from to Masked Vector Load/Store Intrinsics.
delena updated this object.
delena edited the test plan for this revision. (Show Details)
delena set the repository for this revision to rL LLVM.
delena added a subscriber: Unknown Object (MLST).
delena updated this revision to Diff 15975.Nov 10 2014, 6:53 AM

Updated diff file.

Brief description:

  1. Store

call void @llvm.masked.store.v16i32(i8* %addr, <16 x i32>%val, i32 4, <16 x i1>%mask)

store vector value %val under %addr, write only lanes where an appropriate %mask bit is on.
Number of elements in %mask and in %val should be the same.

  1. Load

%res = call <16 x float> @llvm.masked.load.v16f32(i8* %addr, <16 x float>%src0, i32 4, <16 x i1>%mask)

load vector value from %addr, read only lanes where an appropriate %mask bit is on.
Number of elements in %mask and in %res and in %src0 should be the same.

Fill masked-off lanes with elements from %src0 (like “select”).

In the both cases don’t access memory under masked-off lanes.
Supported types depend on target.
Load/store mean consecutive access. I plan add intrinsics for the strided and random access in the future.

  • Elena

From: mankeyrabbit@gmail.com [mailto:mankeyrabbit@gmail.com] On Behalf Of James Molloy
Sent: Monday, November 10, 2014 15:28
To: reviews+D6191+public+d24d75579f47f6d4@reviews.llvm.org
Cc: Demikhovsky, Elena; Adam Nemet; Hal Finkel; rob.khasanov@gmail.com; Owen Anderson; LLVM Commits; nurmukhametov.alex@gmail.com
Subject: Re: [PATCH] Masked Vector Load/Store Intrinsics

Hi Elena,

There doesn't seem to be a change to LangRef here. Could you please briefly describe the semantics of masked load and store intrinsics?

We have strided load/store instructions in NEON that could be modelled with them, depending on their semantics.

Cheers,

James

hfinkel edited edge metadata.Nov 11 2014, 9:20 AM

Thanks for working on this. A bunch of these things can be split up into different patches (even if we don't do that for the review, please make sure you split them when committing).

You're missing separate cost-model tests for the cost-model changes. These can (and should) be tested independently from the vectorizer tests.

include/llvm/CodeGen/SelectionDAGNodes.h
1933

Remove empty comment line (same for classes below).

1935

Remove line. (same for classes below)

1946

Remove commented-out code.

1973

Don't repeat this here -- it is in the base class.

1974

All subclasses of MaskedLoadStoreSDNode will have a getMask -- move that into the base class too. Add a comment about the data type of the mask.

include/llvm/IR/IRBuilder.h
434

The pattern of this class interface is to have a function per intrinsic, so that the intrinsic id and operand structure can be hidden within the implementation. Please split this into CreateMaskedLoad and CreateMaskedStore, each with appropriate parameters.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4789

Vector splitting is a legalization action, why is this being handled here in DAGCombine (as opposed to in LegalizeVectorTypes) -- shouldn't this be a part of DAGTypeLegalizer::SplitVecRes_MSTORE?

4850

Same question as above.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
34

Same comment re: the alignment.

37

This alignment is not right -- if the full load as alignment Alignment, then the upper half can have only half that.

lib/IR/IRBuilder.cpp
176

Don't add blank line.

lib/Target/X86/X86TargetTransformInfo.cpp
623

This is a separate change, than can be committed and tested separately.

lib/Transforms/Vectorize/LoopVectorize.cpp
5341

Don't remove this line.

test/CodeGen/X86/masked_memop.ll
67

Are there tests in this file that demonstrate complete scalarization? If not, please add some. If so, please note them.

anemet added inline comments.Nov 11 2014, 10:58 PM
include/llvm/Analysis/TargetTransformInfo.h
273–278

Should Consecutive be bool?

lib/Target/X86/X86InstrAVX512.td
2105–2139

There's got to be a better way to write this and the store later.

My preference would be to only add a few (one?) for now in order to test the functionality. Then we figure out a way to have this be part of a new AVX512_maskable class (e.g. AVX512_maskable_trapping)

lib/Transforms/Vectorize/LoopVectorize.cpp
886

Please comment what this is for.

5355

Don't add a new line.

5357–5366

I read this far and I still don't understand the setMaskedOp business. Can you please explain.

5368

Stale comment.

Hi Hal, Adam,

Thank you for the review. Please see my comments.

include/llvm/Analysis/TargetTransformInfo.h
273–278

I call to isConsecutivePtr() which returns "int" - (-1) for reverse.

bool canPredicateStore(Type *DataType, Value *Ptr) {
  return TTI->isLegalPredicatedStore(DataType, isConsecutivePtr(Ptr));
}
include/llvm/CodeGen/SelectionDAGNodes.h
1974

All done in SelectionDAGNodes.h. Thanks.

include/llvm/IR/IRBuilder.h
434

Sounds reasonable, done.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4789

It has to be done here, due to the same reason as VSELECT, this is the comment from VSELECT:

// If the VSELECT result requires splitting and the mask is provided by a
// SETCC, then split both nodes and its operands before legalization. This
// prevents the type legalizer from unrolling SETCC into scalar comparisons
// and enables future optimizations (e.g. min/max pattern matching on X86).

I'll put the same comment here as well.

I put the split-code in the DAGTypeLegalizer for the completeness, but it is not executed, otherwise the generated assembly code is too bad.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
37

Usually alignment is not bigger than element size in this case.
We started, for example, from
%res = load i32 *%ptr, align 4
and created a call to intrinsic
%res.v = call <16 x i32> @llvm.masked.load(%ptr, %mask, undef, 4) - alignment is still 4
not we decided to split into two, alignment should remain the same.
So, vectorizer can't create masked load for
%res = load i32 *%ptr, align 64. Or I'm wrong?

Another question, what happens if user writes IR and specifies another alignment?
%res.v = call <16 x i32> @llvm.masked.load(%ptr, %mask, undef, 64)
Theoretically, if alignment is bigger than original size - splitting is wrong.
I can spit alignment if alignment is equal to the original size.

Look at SplitVecRes_LOAD() - nobody splits alignment there.

lib/IR/IRBuilder.cpp
176

Removed.

lib/Target/X86/X86InstrAVX512.td
2105–2139

I implemented load and store together. I have to generated code and check correctness.That's why I put the whole code in.
I know that I still have a lot of work here.
Let we optimize the .td file later, in the one of the next patches.

lib/Target/X86/X86TargetTransformInfo.cpp
623

Ok, no problem. I'll also add a test.

lib/Transforms/Vectorize/LoopVectorize.cpp
886

I collect here memory operations that have to be masked on vectorization. If the block is predicated, it does not mean that all loads and stores requires masks. Sometimes the pointer is safe, sometimes there is a configuration flag. All these options are checked before.
The last check is against target readiness to work with masked operations. In this case I put the instruction in the MaskedOp.
I'll add a comment in the code.

5341

Ok.

5355

removed.

5357–5366

I explained before.
Again, during vectorization I have to know for which instructions I have to make a call to masked intrinsics.

test/CodeGen/X86/masked_memop.ll
67

I did not write the complete scalarization yet. It will be in a next patch.

hfinkel added inline comments.Nov 12 2014, 7:53 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
37

You're right, you only need to divide the alignment by 2 if it is larger than the size of the half vector. Otherwise, leave it alone.

The validity of the splitting is based on type legality, and should have nothing to do with the alignment.

Look at SplitVecRes_LOAD() - nobody splits alignment there.

Sounds like a bug.

pete added a subscriber: pete.Nov 12 2014, 11:21 AM
delena updated this revision to Diff 16159.Nov 13 2014, 7:24 AM
delena edited edge metadata.

I applied all recommendations and uploaded the new patch.

anemet added inline comments.Nov 13 2014, 2:05 PM
lib/Target/X86/X86InstrAVX512.td
2105–2139

OK.

lib/Transforms/Vectorize/LoopVectorize.cpp
886

I see, yes a comment is necessary here.

rob.khasanov edited edge metadata.Nov 14 2014, 3:46 AM

Hi Elena,

include/llvm/CodeGen/SelectionDAGNodes.h
1929–1997
  1. Comment doesn't correspond to class name (MaskedLoadSDNode vs. MaskedLoadStoreSDNode).
  2. Please remove one blank line before comment
1977–1979

Correct ISD::MLOAD -> ISD::MSTORE in comment

1995–1997

Remove one blank line

include/llvm/Target/TargetSelectionDAG.td
191–198

I suggest to introduce a new Selection DAG Type Constraint that checks both operands have the same number of elements in vector.

lib/Analysis/TargetTransformInfo.cpp
104–113

Will you change return values to PrevTTI->isLegalPredicatedLoad(...) in separate patch?

lib/CodeGen/SelectionDAG/LegalizeTypes.h
573

I don't find definition of this function.

605

Also, I don't find definition of this function.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1506–1508

Here should be: "store is independent of the other one"

Thank you for reviewing. I fixed all comments / blank lines.

include/llvm/CodeGen/SelectionDAGNodes.h
1929–1997

done

1977–1979

done

1995–1997

done

lib/Analysis/TargetTransformInfo.cpp
104–113

No. The default is always false.

lib/CodeGen/SelectionDAG/LegalizeTypes.h
573

removed

605

removed

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1506–1508

fixed

delena updated this revision to Diff 16275.Nov 16 2014, 1:16 AM
delena edited edge metadata.

Uploading a new diff.

Hi, please let me know whether I can commit the stuff.

Thank you.

  • Elena