This is an archive of the discontinued LLVM Phabricator instance.

Choose the best consecutive candidate for a store instruction in SLP vectorizer
ClosedPublic

Authored by wmi on Jun 15 2015, 8:59 AM.

Details

Summary

This is the following patch for http://reviews.llvm.org/D10352. It changes the SLPVectorizer::vectorizeStores to choose the immediate succeeding or preceding candidate for a store instruction when it has multiple consecutive candidates. This way, it has better chance to find the slp vectorization opportunity.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi updated this revision to Diff 27678.Jun 15 2015, 8:59 AM
wmi retitled this revision from to Choose the best consecutive candidate for a store instruction in SLP vectorizer.
wmi updated this object.
wmi edited the test plan for this revision. (Show Details)
wmi added reviewers: nadav, aschwaighofer.
wmi set the repository for this revision to rL LLVM.
wmi added subscribers: Unknown Object (MLST), davidxl.
nadav edited edge metadata.Jun 15 2015, 9:10 AM

Wei,

Thanks for working on this. Did you run the llvm test suite? Are there any performance wins or compile time regressions?

I have a few comments below:

  • for (unsigned j = 0; j < e; ++j) {
  • if (i == j)
  • continue;
  • const DataLayout &DL = Stores[i]->getModule()->getDataLayout();

+ const DataLayout &DL = Stores[i]->getModule()->getDataLayout();

Please initialize ‘j' when you declare it. Also, why unsigned?

+ unsigned j;
+ If a store has multiple consectutive store candidates, choose
+
the immediate succeeding or preceding one.
+ for (j = i + 1; j < e; ++j) {
+ if (R.isConsecutiveAccess(Stores[i], Stores[j], DL)) {
+ Tails.insert(Stores[j]);
+ Heads.insert(Stores[i]);
+ ConsecutiveChain[Stores[i]] = Stores[j];
+ break;
+ }
+ }

Please document this line, or simplify it.

+ if (j < e)
+ continue;

At this point you are defining a new J variable, with a different type. This is confusing.

+ for (int j = i - 1; j >= 0; --j) {

Can you think of a way to write this code without code duplication?

Thanks,
Nadav

wmi updated this revision to Diff 27767.Jun 16 2015, 10:14 AM
wmi edited edge metadata.

I changed the patch to remove duplicate code.

Then I run the testsuite with BENCHMARKING_ONLY defined and with turbo mode and address random turned off. It is very helpful and make the perf analysis a lot easier. Thanks!
I didn't see much perf change with the patch.

Thanks,
Wei.

Minor nits.

lib/Transforms/Vectorize/SLPVectorizer.cpp
3250

consecutive

test/Transforms/SLPVectorizer/X86/pr23510.ll
7

; CHECK-LABEL: @_Z3fooPml(

wmi added inline comments.Jun 23 2015, 9:37 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3250

Fixed.

test/Transforms/SLPVectorizer/X86/pr23510.ll
7

Fixed.

wmi updated this revision to Diff 30945.Jul 29 2015, 2:20 PM

Ping.

nadav added a subscriber: nadav.Jul 29 2015, 2:25 PM

The code LGTM. Did you measure the compile time impact of this change?

wmi added a comment.Jul 29 2015, 2:35 PM

The code LGTM. Did you measure the compile time impact of this change?

Yes, I did. There was no compile time change significant enough (all less than 1%).

This revision was automatically updated to reflect the committed changes.