This is an archive of the discontinued LLVM Phabricator instance.

Gather and Scatter intrinsics in the Loop Vectorizer
ClosedPublic

Authored by delena on Dec 21 2015, 7:08 AM.

Details

Summary

Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access.

The feature is enabled on AVX-512 target.
the following loops are vectorized on AVX-512:

struct In {

float a;
float b;

};

void foo1 (float * restrict in, float * restrict out, int * restrict trigger, int * restrict index) {

for (int i=0; i<SIZE; ++i) {
  if (trigger[i] > 0) {
    out[i] = in[index[i]] + (float) 0.5;
  }
}

}
void foo2 (In * restrict in, float * restrict out, int * restrict trigger) {

for (int i=0; i<SIZE; ++i) {
  if (trigger[i] > 0) {
    out[i] = in[i].b + (float) 0.5;
  }
}

}

struct Out {

float a;
float b;

};
void foo3 (In * restrict in, Out * restrict out, int * restrict trigger) {

for (int i=0; i<SIZE; ++i) {
  if (trigger[i] > 0) {
    out[i].b = in[i].b + (float) 0.5;
  }
}

}

Diff Detail

Repository
rL LLVM

Event Timeline

delena updated this revision to Diff 43374.Dec 21 2015, 7:08 AM
delena retitled this revision from to Gather and Scatter intrinsics in the Loop Vectorizer.
delena updated this object.
delena added reviewers: Ayal, gilr, nadav, aschwaighofer.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: llvm-commits.
delena updated this revision to Diff 43829.Dec 31 2015, 5:08 AM
delena removed rL LLVM as the repository for this revision.

Updated diff after commit of gather-scatter cost

spatel added a subscriber: spatel.Jan 25 2016, 9:20 AM

I don't know the vectorizer or scatter/gather well enough for a complete review, so I've just pointed out some small fixes.

Also, I think it's fine to have scatter and gather in one patch for review, but they should be split into separate patches for commit to make it easier to investigate bugs/regressions.

../lib/IR/IRBuilder.cpp
261–263 ↗(On Diff #43829)

'dyn_cast' can be a plain 'cast' since we're asserting.

265 ↗(On Diff #43829)

This can be 'auto' to match the earlier cast.

285–286 ↗(On Diff #43829)

Use plain 'cast' since we're asserting.

293 ↗(On Diff #43829)

Use 'auto' for consistency.

304 ↗(On Diff #43829)

I don't understand the comment. Please explain more or delete.

../lib/Transforms/Vectorize/LoopVectorize.cpp
2397 ↗(On Diff #43829)

typo: supported

2468 ↗(On Diff #43829)

Use complete sentence for comment.

2469 ↗(On Diff #43829)

Extra spaces.

2470–2493 ↗(On Diff #43829)

The function is getting too long / indented. I would prefer to see it broken up with helper functions.

2511–2512 ↗(On Diff #43829)

Formatting - 'else' should be on the same line as the brace.

5331 ↗(On Diff #43829)

typo: transferred

../test/Transforms/LoopVectorize/X86/gather_scatter.ll
20–23 ↗(On Diff #43829)

Please check the full text for any masked ops that are created in this test and others. We do not want to miss any bugs/regressions resulting from changes in the data types or number of instructions produced.

../test/Transforms/LoopVectorize/X86/masked_load_store.ll
283 ↗(On Diff #43829)

Please add positive checks for the expected / correct mask ops that will now be produced.

Ayal added inline comments.Jan 26 2016, 4:44 AM
../include/llvm/IR/IRBuilder.h
436 ↗(On Diff #43829)

Set default Mask to 0 similar to scatter below?

../lib/IR/IRBuilder.cpp
250 ↗(On Diff #43829)

Load >> Gather

253 ↗(On Diff #43829)

a[n] vector

265 ↗(On Diff #43829)

As Sanjay suggested, you could do

auto *PtrsTy = cast<...;
auto *PtrTy = cast<...;

and rely on these casts asserting for you. Also good to place these two lookalike variables together.

278 ↗(On Diff #43829)

Val >> Data

279 ↗(On Diff #43829)

ditto

281 ↗(On Diff #43829)

a[n] vector

298 ↗(On Diff #43829)

Inco[m]patible.
Two separate asserts?

../lib/Transforms/Vectorize/LoopVectorize.cpp
2401–2402 ↗(On Diff #43829)

Can fold into

bool CreateGatherScatter = (!ConsecutiveStride && ((LI && ...) || (SI && ...)));
2470 ↗(On Diff #43829)

comment that in vectorizing Gep, across UF parts, we want to keep each loop-invariant base or index of Gep scalar.

2483 ↗(On Diff #43829)

looks better to first set GEPBasePtr and then set the GEP Ops.

5331 ↗(On Diff #43829)

transfer[r]ed to >> translated into

5599 ↗(On Diff #43829)

can assert ConsecutiveStride here to be sure

delena marked 8 inline comments as done.Jan 26 2016, 5:20 AM
delena added inline comments.
../lib/Transforms/Vectorize/LoopVectorize.cpp
2470–2493 ↗(On Diff #43829)

May be after review. Otherwise the diff will be inconvenient for reviewers.

../test/Transforms/LoopVectorize/X86/gather_scatter.ll
20–23 ↗(On Diff #43829)

I can add types. But the vector factor at this point is less than I want.
I received VF=8 instead of 16.
The problem is in the cost model. I tried to fix the cost model, but the patch was rejected.
So, I'm adding v8f32 and hope that it will be fixed later.

delena updated this revision to Diff 45976.Jan 26 2016, 5:23 AM
delena set the repository for this revision to rL LLVM.

Code changes according to Sanjay's comments.

delena marked 14 inline comments as done.Jan 27 2016, 11:56 PM
delena updated this revision to Diff 46244.Jan 28 2016, 1:18 AM

Minor fixes, according to Ayal comments.

nadav added inline comments.Jan 28 2016, 9:11 AM
../lib/Transforms/Vectorize/LoopVectorize.cpp
2506 ↗(On Diff #46244)

Please initialize to null.

2556 ↗(On Diff #46244)

Please don't clean the code as part of this change. It makes it more difficult to review the code.

5334 ↗(On Diff #46244)

Please use Doxygen style comment and document each of the arguments with \p.

../unittests/Support/AlignOfTest.cpp
93 ↗(On Diff #46244)

Why is this change needed?

delena updated this revision to Diff 46485.Jan 31 2016, 12:05 AM

Updated diff following Nadav's comments.

Ayal edited edge metadata.Feb 4 2016, 7:43 AM

Check your documentation to follow Nadav's comment: "Please use Doxygen style comment and document each of the arguments with \p."

../lib/IR/IRBuilder.cpp
262–263 ↗(On Diff #46485)

This assert is redundant as it is taken care of by the two 'cast's. Unless you want to issue this particular error message?

282 ↗(On Diff #46485)

"where the Val elements" >> "where the \p Data element"

291–292 ↗(On Diff #46485)

These two asserts are mostly dead code as they are taken care of by the three 'cast's.

../lib/Transforms/Vectorize/LoopVectorize.cpp
2554 ↗(On Diff #46485)

Formatting - 'else' should be on the same line as the brace.

delena marked 3 inline comments as done.Feb 11 2016, 6:28 AM
delena updated this revision to Diff 47637.Feb 11 2016, 6:30 AM
delena edited edge metadata.
delena removed rL LLVM as the repository for this revision.

Changes according to Ayal's comments.

delena updated this revision to Diff 47948.Feb 14 2016, 11:38 PM

Fixed one more style issue.

I expect this code to fail trying to vectorize non-default address space accesses. See D17270 for details. We should either support arbitrary address space for gather/scatter intrinsics or at least check that we are vectorizing a default address space accesses.

delena updated this revision to Diff 48204.Feb 17 2016, 9:36 AM

Style changes according to Ayal comments.

pjcoup added a subscriber: pjcoup.Feb 17 2016, 9:39 AM
Ayal accepted this revision.Feb 17 2016, 10:37 AM
Ayal edited edge metadata.

This looks good to me, thanks for accommodating the changes.

This revision is now accepted and ready to land.Feb 17 2016, 10:37 AM
This revision was automatically updated to reflect the committed changes.