Page MenuHomePhabricator

Using Masked Load / Store intrinsics in Loop Vectorizer
Needs ReviewPublic

Authored by delena on Dec 4 2014, 6:46 AM.



The loop vectorizer optimizes loops containing conditional memory accesses by generating masked load and store intrinsics.
This decision is target dependent.

I already submitted the codegen changes for the intrinsics.

Diff Detail

Event Timeline

delena updated this revision to Diff 16924.Dec 4 2014, 6:46 AM
delena retitled this revision from to Using Masked Load / Store intrinsics in Loop Vectorizer.
delena updated this object.
delena edited the test plan for this revision. (Show Details)
delena added reviewers: nadav, aschwaighofer.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: Unknown Object (MLST).
aschwaighofer edited edge metadata.Dec 4 2014, 8:35 AM

Very nice! I only see minor fixes.



This is a predicate. Could you rename it to isMaskedOperation?


Should this be a SmallPtrSet?


Should this be in a separate patch? This seems like an unrelated change.


Could the mask1.ll, mask2.ll, etc be in one file? We try to not create to many .ll files when test cases are related.

Could you use

CHECK-LABEL: @function_name
CHECK: ... // rest of the test for function_name test

CHECK-LABEL: @function_name2
CHECK: ... // rest of the test for function_name2 test

Could you modify the test so that it tests more of the expected IR?

nadav edited edge metadata.Dec 4 2014, 8:39 AM

Hi Elena,

Thank you for working on this.

+ bool canPredicateStore(Type *DataType, Value *Ptr) {
+ return TTI->isLegalPredicatedStore(DataType, isConsecutivePtr(Ptr));
+ }
+ bool canPredicateLoad(Type *DataType, Value *Ptr) {
+ return TTI->isLegalPredicatedLoad(DataType, isConsecutivePtr(Ptr));
+ }
+ bool setMaskedOp(const Instruction* I) {
+ return (MaskedOp.find(I) != MaskedOp.end());
+ }

Can you please document these functions? The name setMaskedOp is confusing and Doxygen style comments could be useful here.


Hi Elena,

Please see a question from me in inline comments. And thanks for doing this!



Could this condition be simplified? E.g. we can store result of canPredicateStore(..) in a bool variable and use it here.
Also, I think it might be incorrect - shouldn't we insert SI to MaskedOp even if NumPredStores hasn't exceeded NumberOfStoresToPredicate?

delena added a comment.Dec 7 2014, 5:17 AM

I'm adding comments, changing function names, joining 4 tests in one.


do you mean isLegalMaskedLoad()? isLegalMaskedStore()?


What size should I give in this case? I don't know how many masked operations will be in this block.
Or it is not so important?


If NumPredStores hasn't exceeded NumberOfStoresToPredicate I'll not generate masked store. I want to keep the current behavior.

Something like this?

bool isSafePtr = (SafePtrs.count(SI->getPointerOperand()) !=0);
bool isSinglePredecessor = SI->getParent()->getSinglePredecessor();

if (++NumPredStores > NumberOfStoresToPredicate || !isSafePtr ||
    !isSinglePredecessor) {
  // Build a masked store if it is legal for the target, otherwise scalarize
  // the block.
  bool isLegalMaskedOp = canPredicateStore(SI->getValueOperand()->getType(),
  if (isLegalMaskedOp) {
  return false;

My tests don't work without this patch because 512 / 8 = 64


no problem

delena updated this revision to Diff 17024.Dec 7 2014, 5:23 AM
delena added a reviewer: mzolotukhin.
delena removed a subscriber: mzolotukhin.

Thank you all for reviewing this. I addressed all comments and applying the updated patch.

nadav added a comment.Dec 8 2014, 8:50 AM

+ / Returns true if vector representation of the instruction \p I
/ requires mask.
+ bool toBuildMaskedVectorInst(const Instruction* I) {
+ return (MaskedOp.count(I) != 0);
+ }
+ void SI();

Maybe a better name would be "isMaskRequired"?

mzolotukhin edited edge metadata.Dec 8 2014, 7:48 PM

Hi Elena,

Please see my comments inline.


Actually, I wanted to get rid of ++NumPredStores, followed by --NumPredStores.
I'd rewrite it like this:

if (!isSafePtr || !isSinglePredecessor) {
    if (isLegalMaskedOp) {
        // This store will be masked, no predication is needed for it.
    } else {
        // This store needs to be predicated.
        if (++NumPredStores > NumberOfStoresToPredicate)
            return false;

Also, I wonder if we actually need a 'continue' statement in if(isLegalMaskedOp) - do you intentionally skip subsequent checks (e.g. it->mayThrow) for this case?

To Michael:
Some tests fail in this case, I tried. Because NumberOfStoresToPredicate allows you to predicated some stores without masking.
I wanted to keep this knob as is.
I also do not check mayThrow() because it is for calls only.

To Nadav:

Maybe a better name would be "isMaskRequired"?

No problem

delena updated this revision to Diff 17150.Dec 11 2014, 12:44 AM
delena edited edge metadata.

Michael is right about potentially trapping constant expression. I addressed this issue and also added a test for it.

nadav added a comment.Dec 11 2014, 9:06 AM

Some changes in the patch should go in separately. For example:

  • return false;

+ return false;


This is a whitespace fix. Just commit it as is.

  • assert(MaxVectorSize <= 32 && "Did not expect to pack so many elements"

+ assert(MaxVectorSize <= 64 && "Did not expect to pack so many elements"

" into one vector!");

Also the assert change can go in.

+ void SI();

What is SI() ??

I would also split the TTI changes into a separate patch. It can probably go in already.

Some more minor notes from me.

104–105 ↗(On Diff #17150)

Please fix indentation here.

109–110 ↗(On Diff #17150)

And here.


This also might be committed separately.

delena updated this revision to Diff 17263.Dec 14 2014, 5:40 AM

This is the diff after all minor changes went in.

Hi Elena,

Thank you for checking-in the first (small) part of the changes.

The code looks good to me except some minor nits (see inline), but I'd wait for a LGTM from Nadav or Arnold.


Please fix the indentation here.


'}' and 'else' should be on the same line.


And here too.

8 ↗(On Diff #17263)

Do we really need the target triple, or could it be omitted?

12–19 ↗(On Diff #17263)

Do we need a test for i64 case as well?

26 ↗(On Diff #17263)

I believe this is unnecessary. We check for AVX2-LABEL here and later in other functions, so all AVX2-instructions should be between the two labels. Thus, the next label would designate the end of the current function, and there is no need to explicitly look for 'ret void'.

Also, I wonder if it's possible that basic blocks are printed in a different order - in this case we might get a fail even though the loop is correctly vectorized.