This is an archive of the discontinued LLVM Phabricator instance.

Extend SLP Vectorizer to deal with aggregates
ClosedPublic

Authored by ArchDRobison on Oct 29 2015, 12:59 PM.

Details

Summary

This patch extends SLP Vectorizer to vectorize code involving structs/arrays that resemble vectors. The motivation, as explained here is to vectorize tuples and structs in Julia that act like vectors.

Per Arnold Schwaighofer's suggestion, the improvement works backwards from a store of insert value instructions. The associated patch D14260 adds a peephole optimization to inst-combine to clean up.

I limited the SLPVectorization change to aggregates that resemble vectors of length 2, 4, 8, or 16 since other lengths seem unlikely to pay off, but I could be wrong.

In some spots, I changed logic for "extractelement" to be "polymorphic" to "extractelement" or "extractvalue". In other places (e.g. findBuildVector"), that seemed to be more trouble than it was worth and so I coded custom logic (e.g. findBuildAggregate).

Diff Detail

Event Timeline

ArchDRobison retitled this revision from to Extende SLP Vectorizer to deal with aggregates.
ArchDRobison updated this object.
ArchDRobison added a subscriber: llvm-commits.
ArchDRobison retitled this revision from Extende SLP Vectorizer to deal with aggregates to Extend SLP Vectorizer to deal with aggregates.Oct 30 2015, 11:07 AM
aschwaighofer added inline comments.
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
875 ↗(On Diff #38757)

I think we must also check for DataLayout::getTypeSizeInBits(VectorType) == DataLayout::getTypeSizeInBits(StructType/ArrayType) to show that it is truly isomorphic.

The allocated size for elements in an aggregate might be different because of padding.

For example: {<i2, i2>} != <2 x i2>

lib/Transforms/Vectorize/SLPVectorizer.cpp
284

Similar to my previous comment we must check that the type sizes are equivalent.

Added DataLayout size checks per Arnold's advice.

mzolotukhin edited edge metadata.Nov 2 2015, 12:05 PM

Hi,

It looks good in general. Please separate changes for instcombine and SLP and find some remarks from me inline (mostly nitpicks).

Thanks,
Michael

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
842 ↗(On Diff #38938)

s/underlyingvalue/underlying value/

843 ↗(On Diff #38938)

I'd prefer to have an example in the comment as well.

850 ↗(On Diff #38938)

Unnecessary curly braces?

lib/Transforms/Vectorize/SLPVectorizer.cpp
292

I think it should depend on available vector register size, i.e. on MaxVectorRegSizeOption and MinVecRegSize.

4287

Nitpick: missing dot in the end.

test/Transforms/SLPVectorizer/X86/insertvalue.ll
2

Could you please also add a test checking that [2 x i2] isn't vectorized to <2 x i2> here? Similar to other tests in test/Transforms/SLPVectorizer/X86/bad_types.ll

ArchDRobison updated this revision to Diff 39255.EditedNov 4 2015, 2:30 PM
ArchDRobison updated this object.
ArchDRobison edited edge metadata.

This revision addresses Michael Zolotukhin's comments.

  • "instcombine" portion split out to D14260.
  • Adds negative test for NOT vectorizing to <4 x i2> . To check that the i2 is the stopper, and not some other oversight, I also added a positive similar test for <4 x i4>.
  • The test of whether an aggregate type can be mapped to a vector is changed to consider the min and max hardware vector sizes.

The last change required some restructuring to make MaxVecRegSize accessible where it was needed in canMapToVector. The restructuring is:

  • Member MaxVecRegSize moved from class SLPVectorizer to BoUpSLP.
  • CanReuseExtract and canMapToVectorchanged from global functions to members of BoUpSLP.
hfinkel added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
283

Add spaces around ==.

291

You assert this here, but where is it checked?

292

Same here: spaces around ==.

1534

Adjust spaces: if (Opcode ==

1539

for -> if

4565

Don't need {} here.

Ayal added a subscriber: Ayal.Dec 21 2015, 9:50 AM
Ayal added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
287–290

maybe better to wrap return value inside ( )

290

cast should be doing the assert above it.

291

Should check similar to the way GEPs are checked to be 2 indexed.
Also spaces around ==

1506

better call local isValidElementType to catch FP80, FP128 fwiw.

1509

spaces around >

1522

may be simpler to obtain Opcode from getSameOpcode(VL), instead of passing it.

2301

else ... return Gather? (but w/o explicit 'else')

ArchDRobison marked 11 inline comments as done.

Updated per most of Hal and Ayal's suggestions. I'll add replies to line comments shortly.

ArchDRobison marked 3 inline comments as done.Jan 28 2016, 10:10 AM

Added commentary to line comments by others.

lib/Transforms/Vectorize/SLPVectorizer.cpp
287–290

My preference is to avoid superfluous parentheses around return values. The style of the source file varies on this point, but in general seems to avoid parentheses around return values of similar complexity.

290

The assert is checking Opcode, not E. The assert is really intended to check that the parameter Opcode is correct on entry. To make this clearer, I moved it up to the entry and make it check that Opcode is ExtractElement or ExtractValue.

291

I'll remove the assert and make the check part of the return value, i.e.

return EI->getNumIndices() == 1 && *EI->idx_begin() == i;
1506

Thanks! I hadn't noticed that subtlety.

1522

I was concerned about not adding more overhead than necessary, though maybe it's not a big deal in practice.

2301

There are two ways to fix this. The "canReuseExtract" check could be an assertion, to verify that earlier checking (as it stands in the patch) cancels the scheduling if canReuseExtract returns false. But it seems better to use Gather as you suggest, in case the check is relaxed in the future.

test/Transforms/SLPVectorizer/X86/insertvalue.ll
3

Fixed by addition of @julia_load_array_of_i16 .

mzolotukhin accepted this revision.Jan 28 2016, 11:45 AM
mzolotukhin edited edge metadata.

Hi,

The changes look good to me modulo some small remarks, but you might want to wait for other reviewers' "ok" too.
Also, when committing, please check-in clean-ups and refactoring separately from the main part.

Thanks,
Michael

lib/Transforms/Vectorize/SLPVectorizer.cpp
282

Nitpick: i should be capitalized (or should we use another name at all?).

357–366

I'd suggest committing this refactoring as a separate patch to separate NFC changes from others.

423

Nitpick: s/GetMaxVecRegSize/getMaxVecRegSize/

1178

This clean-up also should be committed separately.

1509

Should we use getTypeSizeInBits, or getTypeStoreSizeInBits? I remember a bug when we used a wrong one.

1513–1517

This probably can be rewritten as a range loop, or even with std::all_of?

test/Transforms/SLPVectorizer/X86/insertvalue.ll
4–7

Please add CHECK-LABEL statements before each function. That'll help to avoid possible interference between CHECK statements from different tests.

This revision is now accepted and ready to land.Jan 28 2016, 11:45 AM
ArchDRobison edited edge metadata.
ArchDRobison marked an inline comment as done.

Cosmetic changes per Michael Zolothukin's comments.

ArchDRobison marked 4 inline comments as done.Jan 29 2016, 12:56 PM
ArchDRobison added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
282

I changed it to Idx, since that seemed to be used in other spots for variables representing LLVM indices.

1509

The fundamental issue is type-punning for load/store, so the "store size" seems more appropriate. My recent comment for http://reviews.llvm.org/D14260 with the contrived i24 example says more.

Is this still waiting for other reviews? I'd like to get this functionality upstream; I just ran into this situation again today.

On x86_64/Linux (at least compiling with libstdc++), compiling the following (-O3 -ffast-math):

#include <complex>
using namespace std;

complex<double> foo(complex<double> a, complex<double> b) {
  return a*b;
}

yields this:

define { double, double } @_Z3fooSt7complexIdES0_(double, double, double, double) #0 {
  %5 = fmul fast double %3, %0
  %6 = fmul fast double %2, %1
  %7 = fadd fast double %5, %6
  %8 = fmul fast double %2, %0
  %9 = fmul fast double %3, %1
  %10 = fsub fast double %8, %9
  %11 = insertvalue { double, double } undef, double %10, 0
  %12 = insertvalue { double, double } %11, double %7, 1
  ret { double, double } %12
}

and we completely miss this kind of code because the natural chain starts at the insertvalues.

ArchDRobison marked an inline comment as done.Apr 5 2016, 10:19 AM

It's waiting on a review of http://reviews.llvm.org/D14260 , which is a prerequisite, and was part of the original proposed patch.

Update is against today's LLVM sources. Only functional change is that patch MinVecRegSize to same place as MaxVecRegSize. Note that the prerequisite associated patch D14260 is now committed, so it would be a good time to give this patch a final look over.

hfinkel accepted this revision.Apr 26 2016, 2:10 PM
hfinkel added a reviewer: hfinkel.

LGTM too.

lib/Transforms/Vectorize/SLPVectorizer.cpp
357–366

Yes, please split this when committing. The refactoring should be separate.

1512

Comment should be a sentence that ends with a period.

1537

Comment should end with a period.

Bump. I assume this didn't get merged because @ArchDRobison does not have commit privileges? Shall I go ahead and lang this and D14260?

majnemer edited edge metadata.Jul 14 2016, 3:05 PM

Bump. I assume this didn't get merged because @ArchDRobison does not have commit privileges? Shall I go ahead and lang this and D14260?

SGTM.

loladiro closed this revision.Jul 14 2016, 3:36 PM

Never mind, both of these were applied, but without the magic string that auto-closes the review (and I looked at the wrong code base initially). This was applied in rL267899.