Page MenuHomePhabricator

[SLP] Look-ahead operand reordering heuristic.
Needs ReviewPublic

Authored by vporpo on Apr 18 2019, 4:36 PM.



This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Diff Detail

Event Timeline

vporpo created this revision.Apr 18 2019, 4:36 PM
RKSimon added inline comments.Apr 27 2019, 8:35 AM

"slp-max-look-ahead-depth" might be better?


These kinds of hard coded costs scare me, especially without any context/description.

vporpo updated this revision to Diff 197190.Apr 29 2019, 2:31 PM
vporpo marked 2 inline comments as done.
vporpo added inline comments.

I agree, they look quite scary but the values themselves are not very important.
What we we really care about is figuring out whether the values could be potentially vectorized or not. The relative differences between the scores is not very important either. It just shows that matching loads is usually preferable to matching instructions with the same opcode etc.

We could still use the same score of 1 for all of them and we would still get decent reordering. Another alternative would be to check TTI for each potential candidate vector, but it looks like an overkill.



Seems to me, the code is not formatted


auto *


auto *


auto *


auto *


auto *


auto *


auto *


auto *

vporpo updated this revision to Diff 197195.Apr 29 2019, 3:13 PM
vporpo marked 9 inline comments as done.

Better to commit the test itself at first, without this patch

RKSimon added inline comments.May 1 2019, 9:41 AM

(style) remove outer brackets


What happens in the case where we have alt opcodes? Should we have a preference for all the same opcode vs with alt-opcode? Sometimes the alt-opcodes will fold away (shl + mul etc.) - other times it won't (shl + lshr).

vporpo updated this revision to Diff 197686.May 1 2019, 6:37 PM
vporpo marked 2 inline comments as done.

Addressed comments and updated lit test.


Hmm good point. Well, currently 'getScoreAtLevelRec()' will simply walk past the alt instructions and will assign them ScoreSameOpcode. This is does not look very accurate because alt opcodes usually require shuffles and should have a lower score.
I introduced a new ScoreAltOpcodes = 1 so that alt opcodes are not given the same score as identical opcodes (please see the new functions in lit test).

As for the alt opcodes that fold away, maybe that should be fixed in the getSameOpcode() and in struct InstructionState ? If the get folded then maybe isAltShuffle() should return false?

vporpo updated this revision to Diff 198119.May 3 2019, 6:41 PM

Updated getBestOperand() to use getLookAheadScore() for Load and Constant, not just Opcode.

please can you rebase this?

dtemirbulatov added inline comments.May 22 2019, 12:26 PM

one extra space here.

dtemirbulatov added a comment.EditedMay 23 2019, 11:57 AM

oh, I notice some regression with the change for AArch64/matmul.ll and AArch64/transpose.ll. Maybe there is a way to isolate it with heuristics?

Yes, I will take a look. Maybe it is worth using TTI for the scores after all.

rcorcs added a subscriber: rcorcs.Tue, May 28, 4:57 AM
vporpo updated this revision to Diff 202491.Fri, May 31, 2:54 PM

I investigated the two AArch64 failing tests. These tests feature the exact problem that we are trying to solve with this look-ahead heuristic. A commutative instruction had operands of the same opcode that the current heuristic has no way of reordering in an informed way. The current reordering was just lucky to pick the proper one, while the look-ahead heuristic was reordering the operands according to the score. However, the problem was that the score calculation was not considering external uses and was therefore favoring a sub-optimal operand ordering.

I updated the patch to factor in the cost of external uses, and both failures are now gone. I also updated the lit-test with a test that shows the problem with the external-uses.

vporpo updated this revision to Diff 203242.Wed, Jun 5, 1:41 PM


hmm, I have another failure with this change on my setup, now it is PR39774.ll. Probably, it might be sort algorithm differents of similar, since it just swap of two inserts. I don't have this failure if PR39774.ll is intact.

vporpo updated this revision to Diff 204408.Wed, Jun 12, 7:44 PM

Removed changes in PR39774.ll.

Hmm I am now getting the same failure as you @dtemirbulatov . I am not sure what was wrong before, but it seems that the change in PR39774.ll is no longer needed.

This revision is now accepted and ready to land.Thu, Jun 13, 5:35 AM

@vporpo I've committed the lookahead.ll changes at rL363925 with current (trunk) codegen - please can you rebase?

RKSimon requested changes to this revision.Thu, Jun 20, 6:15 AM
This revision now requires changes to proceed.Thu, Jun 20, 6:15 AM
RKSimon accepted this revision.Fri, Jun 21, 3:59 AM

LGTM - thanks!

This revision is now accepted and ready to land.Fri, Jun 21, 3:59 AM

Thank you for the reviews. Please commit the patch.

This revision was automatically updated to reflect the committed changes.
vporpo reopened this revision.Tue, Jun 25, 12:05 AM
vporpo added a subscriber: rnk.

This was reverted in r364111 since it was causing a failure in Chromium reported by @rnk.

This revision is now accepted and ready to land.Tue, Jun 25, 12:05 AM
RKSimon requested changes to this revision.Tue, Jun 25, 12:18 AM
RKSimon added a reviewer: rnk.

@rnk do you have a repro yet please?

This revision now requires changes to proceed.Tue, Jun 25, 12:18 AM

Yes, @RKSimon . It was posted in llvm-commits. I did reproduce it and I will update the patch with the fix + lit test.

vporpo updated this revision to Diff 206383.Tue, Jun 25, 1:08 AM
vporpo edited the summary of this revision. (Show Details)

Fixed crash in chromium reported by @rnk.

The crash was caused by two call instructions with different number of arguments (see lookahead_crash() function in lit test).

vporpo marked an inline comment as done.Tue, Jun 25, 1:15 AM
vporpo added inline comments.

This was the cause of the crash. There was no std::min here, so ToIdx could be OpIdx1 + 1 even if I2 had fewer operands than that.