This is an archive of the discontinued LLVM Phabricator instance.

PR20234 - [SLP Vectorizer] Canonicalize tree operands of commutitive binary operands.
ClosedPublic

Authored by mcrosier on Jul 25 2014, 4:10 PM.

Details

Summary

As a result of recent work on the reassociation pass, I noticed the ordering of operands effected the behavior of the SLP vectorizer. The ordering can change how the expression tree is built and in turn change the cost of the tree. In the provided test case, derived from spec2k/mesa, the ordering of the load instructions (in the expression tree) are reversed, so the loads are gathered, rather than vectorized.

This patch attempts to resolve this issue by canonicalizing the operands of commutitive instructions based on source order. The result is an expression tree that more closely mirrors the instruction source order. Currently the canonicalization happens to both the instruction as well as the expression tree. However, only the latter is necessary to address this issue; I have no objection to leaving the instruction operands as is.

I'm sure a more robust solution exists, but I decided to begin with the simplest solution because I think it gets a fairly good bang for the buck, is safe, and is maintainable.

Correctness runs look good. Performance runs (AArch64/A53) also look good; no regressions and a few minor improvements.

Please take a look!

Chad

Diff Detail

Event Timeline

mcrosier updated this revision to Diff 11900.Jul 25 2014, 4:10 PM
mcrosier retitled this revision from to PR20234 - [SLP Vectorizer] Canonicalize operands of commutitive binary operands..
mcrosier updated this object.
mcrosier edited the test plan for this revision. (Show Details)
mcrosier added a subscriber: Unknown Object (MLST).
nadav edited edge metadata.Jul 25 2014, 4:15 PM

Chad,

In this patch you are changing the instruction V regardless if vectorization succeeds or not. Would it be possible to only change the VL vector without modifying the binary operator V?

Thanks,
Nadav

In D4680#5, @nadav wrote:

Chad,

In this patch you are changing the instruction V regardless if vectorization succeeds or not. Would it be possible to only change the VL vector without modifying the binary operator V?

Thanks,
Nadav

Nadav,
That would be perfectly fine. In fact, it's the right thing to do as we don't want to undo the canonicalization performed by the reassociation pass.

Chad

lib/Transforms/Vectorize/SLPVectorizer.cpp
2424

We don't need to actually modify the instruction for this to work.

mcrosier updated this revision to Diff 12045.Jul 30 2014, 1:15 PM
mcrosier removed a reviewer: grosbach.
mcrosier removed a subscriber: aemerson.

Update based on Nadav's feedback. Specifically, only commute the tree operands.

mcrosier retitled this revision from PR20234 - [SLP Vectorizer] Canonicalize operands of commutitive binary operands. to PR20234 - [SLP Vectorizer] Canonicalize tree operands of commutitive binary operands..Jul 30 2014, 1:17 PM
mcrosier added a reviewer: eeckstein.
mcrosier accepted this revision.Jul 30 2014, 2:05 PM
mcrosier added a reviewer: mcrosier.

Nadav gave the LGTM. Will commit shortly.

This revision is now accepted and ready to land.Jul 30 2014, 2:05 PM
mcrosier closed this revision.Jul 30 2014, 2:20 PM

Committed as r214338.

eeckstein edited edge metadata.Jul 31 2014, 5:02 AM

Hi Chad,

I'm sorry that I didn't see your patch before.
I'm currently also working on the SLPVectorizer and my patch is in the final review phase. It's about improving the scheduling.
This is a different issue than what you handle in your patch, but still there is one "conflict":

My new scheduling algorithm is more general and made some heuristics obsolete which - like your approach - relied on the instruction numbering. So I could remove the instruction numbering at all.

Now I looked in detail what problem you solve in your test case. Here the original problem is that the load instructions are in the wrong order, so that they are not recognised as consecutive loads.

I attached a new patch which is a more general solution and does not rely on instruction numbering.
Please take a look at it. If it proves to work well, I'd like to replace your change with this algorithm. This would make my life easier for my other patch :-)

For reference I also attached the my scheduling-patch (it's based on an earlier revision).

Thanks and sorry for replying so late,
Erik

In D4680#21, @eeckstein wrote:

Hi Chad,

I'm sorry that I didn't see your patch before.
I'm currently also working on the SLPVectorizer and my patch is in the final review phase. It's about improving the scheduling.
This is a different issue than what you handle in your patch, but still there is one "conflict":

My new scheduling algorithm is more general and made some heuristics obsolete which - like your approach - relied on the instruction numbering. So I could remove the instruction numbering at all.

Now I looked in detail what problem you solve in your test case. Here the original problem is that the load instructions are in the wrong order, so that they are not recognised as consecutive loads.

I attached a new patch which is a more general solution and does not rely on instruction numbering.
Please take a look at it. If it proves to work well, I'd like to replace your change with this algorithm. This would make my life easier for my other patch :-)

For reference I also attached the my scheduling-patch (it's based on an earlier revision).

Thanks and sorry for replying so late,
Erik

Hi Erik,
I added you to the CC list because I noticed you were working on the SLP vectorizer as well. I suspect your patch is the more robust solution I was referring to in the original email. ;) Does the test case in this patch still work with your patch? If so, I have no problem with you reverting this patch and applying yours. Please just add my test case to your patch. If not, I would like to get the test passing with your patch.

Chad

In D4680#22, @mcrosier wrote:

Hi Erik,
I added you to the CC list because I noticed you were working on the SLP vectorizer as well. I suspect your patch is the more robust solution I was referring to in the original email. ;) Does the test case in this patch still work with your patch? If so, I have no problem with you reverting this patch and applying yours. Please just add my test case to your patch. If not, I would like to get the test passing with your patch.

Chad

Note: If the test doesn't pass, please don't consider this a blocker to your patch. We can revert my patch and apply yours as it's obviously a more general solution. All I ask is that we try to get the test case passing in the near future. This specific code sequence is derived from spec2k/mesa and improves performance by ~3% (AArch64/A53).

yes, the test passes.
Please let me know if it also gives the same improvement for the benchmark.

Thanks,
Erik

In D4680#24, @eeckstein wrote:

yes, the test passes.
Please let me know if it also gives the same improvement for the benchmark.

Thanks,
Erik

If it passes, then we're good. Please go ahead and revert my patch and apply yours (assuming you've got the LGTM). :)