This is the following patch for http://reviews.llvm.org/D10352. It changes the SLPVectorizer::vectorizeStores to choose the immediate succeeding or preceding candidate for a store instruction when it has multiple consecutive candidates. This way, it has better chance to find the slp vectorization opportunity.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Wei,
Thanks for working on this. Did you run the llvm test suite? Are there any performance wins or compile time regressions?
I have a few comments below:
- for (unsigned j = 0; j < e; ++j) {
- if (i == j)
- continue;
- const DataLayout &DL = Stores[i]->getModule()->getDataLayout();
+ const DataLayout &DL = Stores[i]->getModule()->getDataLayout();
Please initialize ‘j' when you declare it. Also, why unsigned?
+ unsigned j;
+ If a store has multiple consectutive store candidates, choose
+ the immediate succeeding or preceding one.
+ for (j = i + 1; j < e; ++j) {
+ if (R.isConsecutiveAccess(Stores[i], Stores[j], DL)) {
+ Tails.insert(Stores[j]);
+ Heads.insert(Stores[i]);
+ ConsecutiveChain[Stores[i]] = Stores[j];
+ break;
+ }
+ }
Please document this line, or simplify it.
+ if (j < e)
+ continue;
At this point you are defining a new J variable, with a different type. This is confusing.
+ for (int j = i - 1; j >= 0; --j) {
Can you think of a way to write this code without code duplication?
Thanks,
Nadav
I changed the patch to remove duplicate code.
Then I run the testsuite with BENCHMARKING_ONLY defined and with turbo mode and address random turned off. It is very helpful and make the perf analysis a lot easier. Thanks!
I didn't see much perf change with the patch.
Thanks,
Wei.
The code LGTM. Did you measure the compile time impact of this change?
Yes, I did. There was no compile time change significant enough (all less than 1%).
consecutive