This is an archive of the discontinued LLVM Phabricator instance.

Choose the best consecutive candidate for a store instruction in SLP vectorizer
ClosedPublic

Authored by wmi on Jun 15 2015, 8:59 AM.

Details

Reviewers

nadav
aschwaighofer

Commits

rGd6f7252e2ec0: [SLP vectorizer]: Choose the best consecutive candidate to pair with a store…
rL243666: [SLP vectorizer]: Choose the best consecutive candidate to pair with a store…

Summary

This is a follow-up patch to http://reviews.llvm.org/D10352. It changes SLPVectorizer::vectorizeStores so that, when a store instruction has multiple consecutive candidates, it pairs the store with the immediately succeeding or preceding candidate. This gives the vectorizer a better chance of finding an SLP vectorization opportunity.
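To make the summary concrete, here is a minimal sketch of the candidate choice. It is a toy model, not LLVM code: `Store`, `isConsecutive`, `pickNearest`, `pickLast`, and `NoCandidate` are hypothetical names, and a store is reduced to the element slot it writes. `pickLast` mirrors the pre-patch loop, which scans every store and has no `break`, so the last match overwrites earlier ones. `pickNearest` mirrors the patched order: scan forward from i+1 first, then backward from i-1.

```cpp
#include <vector>

// Hypothetical stand-in for a store: only the slot it writes matters here.
struct Store { long Slot; };

// Hypothetical stand-in for isConsecutiveAccess(A, B): B writes the slot
// immediately after the one A writes.
bool isConsecutive(const Store &A, const Store &B) {
  return B.Slot == A.Slot + 1;
}

constexpr int NoCandidate = -1;

// Patched behaviour: search i+1..e-1, then i-1..0, and stop at the first
// consecutive store found.
int pickNearest(const std::vector<Store> &Stores, int I) {
  const int E = static_cast<int>(Stores.size());
  for (int J = I + 1; J < E; ++J)
    if (isConsecutive(Stores[I], Stores[J]))
      return J;
  for (int J = I - 1; J >= 0; --J)
    if (isConsecutive(Stores[I], Stores[J]))
      return J;
  return NoCandidate;
}

// Pre-patch behaviour: search 0..e-1 without an early exit, so the last
// consecutive store wins.
int pickLast(const std::vector<Store> &Stores, int I) {
  const int E = static_cast<int>(Stores.size());
  int Result = NoCandidate;
  for (int J = 0; J < E; ++J)
    if (J != I && isConsecutive(Stores[I], Stores[J]))
      Result = J;
  return Result;
}
```

With the store sequence a[0], a[1], total, a[0], a[1], total modelled as slots {0, 1, 100, 0, 1, 100}, `pickNearest` pairs store 0 with store 1, while `pickLast` pairs it with store 4, the second write to a[1].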

Diff Detail

Repository: rL LLVM

Event Timeline

wmi updated this revision to Diff 27678. Jun 15 2015, 8:59 AM

wmi retitled this revision to Choose the best consecutive candidate for a store instruction in SLP vectorizer.

wmi updated this object.

wmi edited the test plan for this revision.

wmi added reviewers: nadav, aschwaighofer.

wmi set the repository for this revision to rL LLVM.

wmi added subscribers: Unknown Object (MLST), davidxl.

Wei,

Thanks for working on this. Did you run the llvm test suite? Are there any performance wins or compile time regressions?

I have a few comments below:

for (unsigned j = 0; j < e; ++j) {
if (i == j)
continue;
const DataLayout &DL = Stores[i]->getModule()->getDataLayout();

+ const DataLayout &DL = Stores[i]->getModule()->getDataLayout();

Please initialize 'j' when you declare it. Also, why unsigned?

+ unsigned j;
+ // If a store has multiple consectutive store candidates, choose
+ // the immediate succeeding or preceding one.
+ for (j = i + 1; j < e; ++j) {
+ if (R.isConsecutiveAccess(Stores[i], Stores[j], DL)) {
+ Tails.insert(Stores[j]);
+ Heads.insert(Stores[i]);
+ ConsecutiveChain[Stores[i]] = Stores[j];
+ break;
+ }
+ }

Please document this line, or simplify it.

+ if (j < e)
+ continue;

At this point you are defining a new J variable, with a different type. This is confusing.

+ for (int j = i - 1; j >= 0; --j) {

Can you think of a way to write this code without code duplication?

Thanks,
Nadav

I changed the patch to remove duplicate code.

Then I ran the test suite with BENCHMARKING_ONLY defined and with turbo mode and address randomization turned off. That setup is very helpful and makes the perf analysis a lot easier. Thanks!
I didn't see much perf change with the patch.

Thanks,
Wei.
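The deduplicated search can be sketched as a single visiting order built once per store. This is an illustration only: `searchOrder` is a hypothetical helper, and `std::vector` stands in for whatever container the patch uses. It also touches on the earlier "why unsigned?" question: with an unsigned index, the downward loop tests `j > 0` and uses `j - 1`, because `j >= 0` is always true for an unsigned value.

```cpp
#include <vector>

// Hypothetical helper: the order in which candidates for store i are visited
// when there are e stores in total. Succeeding stores come first (i+1..e-1),
// then preceding stores in reverse (i-1..0).
std::vector<unsigned> searchOrder(unsigned i, unsigned e) {
  std::vector<unsigned> Order;
  for (unsigned j = i + 1; j < e; ++j)
    Order.push_back(j);
  // Count down safely with an unsigned index: `j >= 0` would never be false,
  // so the loop stops at j == 0 and pushes j - 1 instead.
  for (unsigned j = i; j > 0; --j)
    Order.push_back(j - 1);
  return Order;
}
```

For example, `searchOrder(2, 5)` yields 3, 4, 1, 0. A single loop over this order then replaces the two near-identical forward and backward loops.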

Ping.

Minor nits.

lib/Transforms/Vectorize/SLPVectorizer.cpp
3250	consecutive
test/Transforms/SLPVectorizer/X86/pr23510.ll
6	; CHECK-LABEL: @_Z3fooPml(

wmi added inline comments. Jun 23 2015, 9:37 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
3250	Fixed.
test/Transforms/SLPVectorizer/X86/pr23510.ll
6	Fixed.

Ping.

The code LGTM. Did you measure the compile time impact of this change?

Yes, I did. There was no significant compile-time change (all differences were less than 1%).

Closed by commit rL243666: [SLP vectorizer]: Choose the best consecutive candidate to pair with a store… (authored by wmi). Jul 30 2015, 10:41 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Transforms/Vectorize/SLPVectorizer.cpp	25 lines
test/Transforms/SLPVectorizer/X86/pr23510.ll	37 lines

Diff 27767

lib/Transforms/Vectorize/SLPVectorizer.cpp

bool SLPVectorizer::vectorizeStores(ArrayRef<StoreInst *> Stores,

   // We may run into multiple chains that merge into a single chain. We mark the
   // stores that we vectorized so that we don't visit the same store twice.
   BoUpSLP::ValueSet VectorizedStores;
   bool Changed = false;

   // Do a quadratic search on all of the given stores and find
   // all of the pairs of stores that follow each other.
+  SmallVector<unsigned, 16> IndexQueue;
   for (unsigned i = 0, e = Stores.size(); i < e; ++i) {
-    for (unsigned j = 0; j < e; ++j) {
-      if (i == j)
-        continue;
     const DataLayout &DL = Stores[i]->getModule()->getDataLayout();
-      if (R.isConsecutiveAccess(Stores[i], Stores[j], DL)) {
-        Tails.insert(Stores[j]);
+    IndexQueue.clear();
+    // If a store has multiple consectutive store candidates, search Stores
     (inline) mcrosier: consecutive
     (inline) wmi: Fixed.
+    // array according to the sequence: from i+1 to e, then from i-1 to 0.
+    // This is because usually pairing with immediate succeeding or preceding
+    // candidate create the best chance to find slp vectorization opportunity.
+    unsigned j = 0;
+    for (j = i + 1; j < e; ++j)
+      IndexQueue.push_back(j);
+    for (j = i; j > 0; --j)
+      IndexQueue.push_back(j - 1);
+
+    for (auto &k : IndexQueue) {
+      if (R.isConsecutiveAccess(Stores[i], Stores[k], DL)) {
+        Tails.insert(Stores[k]);
         Heads.insert(Stores[i]);
-        ConsecutiveChain[Stores[i]] = Stores[j];
+        ConsecutiveChain[Stores[i]] = Stores[k];
+        break;
       }
     }
   }

   // For stores that start but don't end a link in the chain:
   for (SetVector<StoreInst *>::iterator it = Heads.begin(), e = Heads.end();
        it != e; ++it) {
     if (Tails.count(*it))
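The diff stops just as the loop over Heads begins, so the rest of that loop is not visible here. As a rough illustration of why the choice of pair matters, the sketch below walks chains in a toy model. Everything in it is an assumption: `StoreId` and `collectChains` are hypothetical names, stores are plain integers in program order, and the walk simply follows links from each head that is not also a tail.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical model: a store is identified by its index in program order.
using StoreId = int;

// Given the chosen links (store -> its consecutive partner), collect one chain
// per head that is not itself the tail of another link.
std::vector<std::vector<StoreId>>
collectChains(const std::map<StoreId, StoreId> &ConsecutiveChain) {
  std::set<StoreId> Heads, Tails;
  for (const auto &Link : ConsecutiveChain) {
    Heads.insert(Link.first);
    Tails.insert(Link.second);
  }

  std::vector<std::vector<StoreId>> Chains;
  for (StoreId Head : Heads) {
    if (Tails.count(Head))
      continue; // Some other store links to this one, so it starts no chain.
    std::vector<StoreId> Chain{Head};
    StoreId Cur = Head;
    // Bound the walk by the number of links so that malformed (cyclic) input
    // cannot loop forever.
    for (auto Next = ConsecutiveChain.find(Cur);
         Next != ConsecutiveChain.end() &&
         Chain.size() <= ConsecutiveChain.size();
         Next = ConsecutiveChain.find(Cur)) {
      Cur = Next->second;
      Chain.push_back(Cur);
    }
    Chains.push_back(Chain);
  }
  return Chains;
}
```

In this model, the links {0 -> 1, 3 -> 4} give the chains [0, 1] and [3, 4], each made of neighbouring stores. The links {0 -> 4, 3 -> 4} give [0, 4] and [3, 4], where the first chain spans the intervening stores.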

test/Transforms/SLPVectorizer/X86/pr23510.ll

; PR23510
; RUN: opt < %s -basicaa -slp-vectorizer -S | FileCheck %s

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

(inline) mcrosier: ; CHECK-LABEL: @_Z3fooPml(
(inline) wmi: Fixed.
; CHECK: lshr <2 x i64>
; CHECK: lshr <2 x i64>

@total = global i64 0, align 8

define void @_Z3fooPml(i64* nocapture %a, i64 %i) {
entry:
  %tmp = load i64, i64* %a, align 8
  %shr = lshr i64 %tmp, 4
  store i64 %shr, i64* %a, align 8
  %arrayidx1 = getelementptr inbounds i64, i64* %a, i64 1
  %tmp1 = load i64, i64* %arrayidx1, align 8
  %shr2 = lshr i64 %tmp1, 4
  store i64 %shr2, i64* %arrayidx1, align 8
  %arrayidx3 = getelementptr inbounds i64, i64* %a, i64 %i
  %tmp2 = load i64, i64* %arrayidx3, align 8
  %tmp3 = load i64, i64* @total, align 8
  %add = add i64 %tmp3, %tmp2
  store i64 %add, i64* @total, align 8
  %tmp4 = load i64, i64* %a, align 8
  %shr5 = lshr i64 %tmp4, 4
  store i64 %shr5, i64* %a, align 8
  %tmp5 = load i64, i64* %arrayidx1, align 8
  %shr7 = lshr i64 %tmp5, 4
  store i64 %shr7, i64* %arrayidx1, align 8
  %tmp6 = load i64, i64* %arrayidx3, align 8
  %tmp7 = load i64, i64* @total, align 8
  %add9 = add i64 %tmp7, %tmp6
  store i64 %add9, i64* @total, align 8
  ret void
}
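For readers who prefer source over IR, the test function corresponds roughly to the C++ below. This is a reconstruction from the IR, not the original PR23510 reproducer. The signature follows from the mangled name `_Z3fooPml` (a function `foo` taking `unsigned long *` and `long`). The type of `total` is an assumption, since the IR only says it is a 64-bit integer.

```cpp
// Assumed type: the IR only shows a 64-bit global named total.
unsigned long total = 0;

// Each statement matches one load/op/store group in the IR: two logical
// right shifts on a[0] and a[1], then an accumulation of a[i] into total,
// and the same three steps repeated once more.
void foo(unsigned long *a, long i) {
  a[0] >>= 4;
  a[1] >>= 4;
  total += a[i];
  a[0] >>= 4;
  a[1] >>= 4;
  total += a[i];
}
```

The two `; CHECK: lshr <2 x i64>` lines expect each pair of shifts on a[0] and a[1] to become one vector shift. That requires pairing each store to a[0] with the store to a[1] that directly follows it.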