This is an archive of the discontinued LLVM Phabricator instance.

Gather and Scatter intrinsics in the Loop Vectorizer
ClosedPublic

Authored by delena on Dec 21 2015, 7:08 AM.

Details

Summary

Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access.

The feature is enabled on AVX-512 target.
the following loops are vectorized on AVX-512:

struct In {

float a;
float b;

};

void foo1 (float * restrict in, float * restrict out, int * restrict trigger, int * restrict index) {

for (int i=0; i<SIZE; ++i) {
  if (trigger[i] > 0) {
    out[i] = in[index[i]] + (float) 0.5;
  }
}

}
void foo2 (In * restrict in, float * restrict out, int * restrict trigger) {

for (int i=0; i<SIZE; ++i) {
  if (trigger[i] > 0) {
    out[i] = in[i].b + (float) 0.5;
  }
}

}

struct Out {

float a;
float b;

};
void foo3 (In * restrict in, Out * restrict out, int * restrict trigger) {

for (int i=0; i<SIZE; ++i) {
  if (trigger[i] > 0) {
    out[i].b = in[i].b + (float) 0.5;
  }
}

}

Diff Detail

Event Timeline

delena updated this revision to Diff 43374.Dec 21 2015, 7:08 AM
delena retitled this revision from to Gather and Scatter intrinsics in the Loop Vectorizer.
delena updated this object.
delena added reviewers: Ayal, gilr, nadav, aschwaighofer.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: llvm-commits.
delena updated this revision to Diff 43829.Dec 31 2015, 5:08 AM
delena removed rL LLVM as the repository for this revision.

Updated diff after commit of gather-scatter cost

spatel added a subscriber: spatel.Jan 25 2016, 9:20 AM

I don't know the vectorizer or scatter/gather well enough for a complete review, so I've just pointed out some small fixes.

Also, I think it's fine to have scatter and gather in one patch for review, but they should be split into separate patches for commit to make it easier to investigate bugs/regressions.

../lib/IR/IRBuilder.cpp
261–263

'dyn_cast' can be a plain 'cast' since we're asserting.

265

This can be 'auto' to match the earlier cast.

285–286

Use plain 'cast' since we're asserting.

293

Use 'auto' for consistency.

304

I don't understand the comment. Please explain more or delete.

../lib/Transforms/Vectorize/LoopVectorize.cpp
2397

typo: supported

2468

Use complete sentence for comment.

2469

Extra spaces.

2470–2493

The function is getting too long / indented. I would prefer to see it broken up with helper functions.

2511–2512

Formatting - 'else' should be on the same line as the brace.

5331

typo: transferred

../test/Transforms/LoopVectorize/X86/gather_scatter.ll
20–23

Please check the full text for any masked ops that are created in this test and others. We do not want to miss any bugs/regressions resulting from changes in the data types or number of instructions produced.

../test/Transforms/LoopVectorize/X86/masked_load_store.ll
283

Please add positive checks for the expected / correct mask ops that will now be produced.

Ayal added inline comments.Jan 26 2016, 4:44 AM
../include/llvm/IR/IRBuilder.h
436

Set default Mask to 0 similar to scatter below?

../lib/IR/IRBuilder.cpp
250

Load >> Gather

253

a[n] vector

265

As Sanjay suggested, you could do

auto *PtrsTy = cast<...;
auto *PtrTy = cast<...;

and rely on these casts asserting for you. Also good to place these two lookalike variables together.

278

Val >> Data

279

ditto

281

a[n] vector

298

Inco[m]patible.
Two separate asserts?

../lib/Transforms/Vectorize/LoopVectorize.cpp
2401–2402

Can fold into

bool CreateGatherScatter = (!ConsecutiveStride && ((LI && ...) || (SI && ...)));
2470

comment that in vectorizing Gep, across UF parts, we want to keep each loop-invariant base or index of Gep scalar.

2483

looks better to first set GEPBasePtr and then set the GEP Ops.

5331

transfer[r]ed to >> translated into

5599

can assert ConsecutiveStride here to be sure

delena marked 8 inline comments as done.Jan 26 2016, 5:20 AM
delena added inline comments.
../lib/Transforms/Vectorize/LoopVectorize.cpp
2470–2493

May be after review. Otherwise the diff will be inconvenient for reviewers.

../test/Transforms/LoopVectorize/X86/gather_scatter.ll
20–23

I can add types. But the vector factor at this point is less than I want.
I received VF=8 instead of 16.
The problem is in the cost model. I tried to fix the cost model, but the patch was rejected.
So, I'm adding v8f32 and hope that it will be fixed later.

delena updated this revision to Diff 45976.Jan 26 2016, 5:23 AM
delena set the repository for this revision to rL LLVM.

Code changes according to Sanjay's comments.

delena marked 14 inline comments as done.Jan 27 2016, 11:56 PM
delena updated this revision to Diff 46244.Jan 28 2016, 1:18 AM

Minor fixes, according to Ayal comments.

nadav added inline comments.Jan 28 2016, 9:11 AM
../lib/Transforms/Vectorize/LoopVectorize.cpp
2506–2514

Please initialize to null.

2557

Please don't clean the code as part of this change. It makes it more difficult to review the code.

5335

Please use Doxygen style comment and document each of the arguments with \p.

../unittests/Support/AlignOfTest.cpp
93 ↗(On Diff #46244)

Why is this change needed?

delena updated this revision to Diff 46485.Jan 31 2016, 12:05 AM

Updated diff following Nadav's comments.

Ayal edited edge metadata.Feb 4 2016, 7:43 AM

Check your documentation to follow Nadav's comment: "Please use Doxygen style comment and document each of the arguments with \p."

../lib/IR/IRBuilder.cpp
262–263

This assert is redundant as it is taken care of by the two 'cast's. Unless you want to issue this particular error message?

282

"where the Val elements" >> "where the \p Data element"

291–292

These two asserts are mostly dead code as they are taken care of by the three 'cast's.

../lib/Transforms/Vectorize/LoopVectorize.cpp
2554

Formatting - 'else' should be on the same line as the brace.

delena marked 3 inline comments as done.Feb 11 2016, 6:28 AM
delena updated this revision to Diff 47637.Feb 11 2016, 6:30 AM
delena edited edge metadata.
delena removed rL LLVM as the repository for this revision.

Changes according to Ayal's comments.

delena updated this revision to Diff 47948.Feb 14 2016, 11:38 PM

Fixed one more style issue.

I expect this code to fail trying to vectorize non-default address space accesses. See D17270 for details. We should either support arbitrary address space for gather/scatter intrinsics or at least check that we are vectorizing a default address space accesses.

delena updated this revision to Diff 48204.Feb 17 2016, 9:36 AM

Style changes according to Ayal comments.

pjcoup added a subscriber: pjcoup.Feb 17 2016, 9:39 AM
Ayal accepted this revision.Feb 17 2016, 10:37 AM
Ayal edited edge metadata.

This looks good to me, thanks for accommodating the changes.

This revision is now accepted and ready to land.Feb 17 2016, 10:37 AM
This revision was automatically updated to reflect the committed changes.