This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).
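To make the tie-breaking idea concrete, here is a minimal sketch of a look-ahead score. It is not the code from the patch: `Node`, `lookAheadScore`, and `shouldSwap` are hypothetical names, opcodes are plain strings, and the scoring (one point per matching opcode pair) is an assumption chosen for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR node (not llvm::Value): an opcode plus its operands.
struct Node {
  std::string Opcode;
  std::vector<const Node *> Ops;
};

// Illustrative score: one point for each matching opcode pair, recursing
// into the operands for up to Depth additional levels.
int lookAheadScore(const Node *A, const Node *B, int Depth) {
  if (!A || !B || A->Opcode != B->Opcode)
    return 0;
  int Score = 1;
  if (Depth == 0 || A->Ops.size() != B->Ops.size())
    return Score;
  for (std::size_t I = 0; I < A->Ops.size(); ++I)
    Score += lookAheadScore(A->Ops[I], B->Ops[I], Depth - 1);
  return Score;
}

// Returns true if swapping lane 1's two operands gives a strictly better
// match against lane 0's operands than keeping the original order.
bool shouldSwap(const Node *L0Op0, const Node *L0Op1, const Node *L1Op0,
                const Node *L1Op1, int Depth) {
  int Keep = lookAheadScore(L0Op0, L1Op0, Depth) +
             lookAheadScore(L0Op1, L1Op1, Depth);
  int Swap = lookAheadScore(L0Op0, L1Op1, Depth) +
             lookAheadScore(L0Op1, L1Op0, Depth);
  return Swap > Keep;
}
```

With a depth of 0 only the immediate operands are compared, so two `mul` operands tie; with a depth of 1 the operands of each `mul` are also compared, which is what breaks the tie.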
RKSimon ABataev dtemirbulatov Ayal hfinkel rnk
rL364478: [SLP] Look-ahead operand reordering heuristic.
rL364084: [SLP] Look-ahead operand reordering heuristic.
rGcf47ff5ffb1a: [SLP] Recommit: Look-ahead operand reordering heuristic.
rL364964: [SLP] Recommit: Look-ahead operand reordering heuristic.
rG574cb0eb3a7a: [SLP] Look-ahead operand reordering heuristic.
rG5698921be2d5: [SLP] Look-ahead operand reordering heuristic.
|756 ↗||(On Diff #197190)|
Seems to me, the code is not formatted
|760 ↗||(On Diff #197366)|
(style) remove outer brackets
|773 ↗||(On Diff #197366)|
What happens in the case where we have alt opcodes? Should we have a preference for all the same opcode vs with alt-opcode? Sometimes the alt-opcodes will fold away (shl + mul etc.) - other times it won't (shl + lshr).
Addressed comments and updated lit test.
|773 ↗||(On Diff #197366)|
Hmm, good point. Well, currently 'getScoreAtLevelRec()' will simply walk past the alt instructions and assign them ScoreSameOpcode. This does not look very accurate, because alt opcodes usually require shuffles and should have a lower score.
As for the alt opcodes that fold away, maybe that should be fixed in getSameOpcode() and in struct InstructionState? If they get folded, then maybe isAltShuffle() should return false?
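One way to picture the scoring being discussed is the sketch below. Everything in it is an assumption made for illustration: the numeric values, the `ScoreAltOpcode` and `ScoreFail` constants, and the helpers `isAltCandidate`, `foldsAway`, and `pairScore` are hypothetical and do not come from the patch. Only the shl + mul (folds away) and shl + lshr (does not) pairs are taken from the comment above.

```cpp
#include <string>

// Hypothetical score values; only their relative order matters here.
constexpr int ScoreFail = 0;
constexpr int ScoreAltOpcode = 1;  // assumed: an alt pair usually needs a shuffle
constexpr int ScoreSameOpcode = 2;

// Hypothetical: binary operators that could form an alternating pair.
bool isAltCandidate(const std::string &Op) {
  return Op == "add" || Op == "sub" || Op == "mul" || Op == "shl" ||
         Op == "lshr";
}

// Hypothetical: pairs assumed to fold into a single opcode, e.g. a shl by a
// constant can be rewritten as a mul, so no shuffle would be required.
bool foldsAway(const std::string &A, const std::string &B) {
  return (A == "shl" && B == "mul") || (A == "mul" && B == "shl");
}

// Identical or foldable opcodes get the full score, a genuine alt pair gets
// a lower score, and everything else fails to match.
int pairScore(const std::string &A, const std::string &B) {
  if (A == B || foldsAway(A, B))
    return ScoreSameOpcode;
  if (isAltCandidate(A) && isAltCandidate(B))
    return ScoreAltOpcode;
  return ScoreFail;
}
```

Whether the folding decision belongs in the score or in getSameOpcode() / InstructionState is exactly the open question in this thread; the sketch only shows the intended ordering of the scores.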
I investigated the two AArch64 failing tests. These tests feature the exact problem that we are trying to solve with this look-ahead heuristic. A commutative instruction had operands of the same opcode that the current heuristic has no way of reordering in an informed way. The current reordering was just lucky to pick the proper one, while the look-ahead heuristic was reordering the operands according to the score. However, the problem was that the score calculation was not considering external uses and was therefore favoring a sub-optimal operand ordering.
I updated the patch to factor in the cost of external uses, and both failures are now gone. I also updated the lit-test with a test that shows the problem with the external-uses.
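As a rough illustration of factoring external uses into the score, consider the sketch below. The `Candidate` struct, `adjustedScore`, `pickBest`, and the per-use cost of 1 are all hypothetical; the patch's actual cost model may differ. The assumption is that a value used outside the vectorized tree would need an extra extract, so each such use lowers the score of that operand ordering.

```cpp
// Hypothetical summary of one candidate operand ordering.
struct Candidate {
  int MatchScore;   // look-ahead score from opcode matching
  int ExternalUses; // uses by instructions outside the candidate tree
};

// Subtract an assumed fixed cost for every external use.
int adjustedScore(const Candidate &C, int ExternalUseCost = 1) {
  return C.MatchScore - C.ExternalUses * ExternalUseCost;
}

// Returns 0 if A is at least as good as B after the adjustment, else 1.
int pickBest(const Candidate &A, const Candidate &B) {
  return adjustedScore(A) >= adjustedScore(B) ? 0 : 1;
}
```

In this toy model, an ordering with a raw score of 5 but two external uses loses to an ordering with a raw score of 4 and none.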
Hmm, I have another failure with this change on my setup, now in PR39774.ll. It is probably caused by a difference in the sort algorithm or something similar, since the only difference is a swap of two inserts. I don't get this failure if PR39774.ll is left intact.
This change has caused a massive increase in build times when using LTO. On my workstation, when building the Clang toolchain itself with LTO, the build time has increased from 50 minutes to 5+ hours. On our continuous builders, we have seen timeouts everywhere because they aren't able to finish within the allocated time (5 hours). Would it be possible to revert this change?
I think there are two possible causes for the compilation time increase:
- Line 901: We can restrict the number of operands to a max of 2.
- Line 820: We can restrict the visited users to a max of 2.
I can either create a quick patch, or I can revert it. Either is fine.
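To show why capping the operand count helps, here is a small sketch of a recursive score that tries every operand pairing but looks at no more than `MaxOps` operands per node. It is a toy model, not the patch's code: `Node`, `scoreBounded`, and the `Visited` counter are hypothetical, and the all-pairs search is an assumption about where the time goes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical toy node: an integer opcode plus operands.
struct Node {
  int Opcode;
  std::vector<const Node *> Ops;
};

// Scores A against B, trying all operand pairings among the first MaxOps
// operands of each node. Visited counts how many pairs were compared.
int scoreBounded(const Node *A, const Node *B, int Depth, std::size_t MaxOps,
                 int &Visited) {
  ++Visited;
  if (A->Opcode != B->Opcode)
    return 0;
  int Score = 1;
  if (Depth == 0)
    return Score;
  std::size_t NA = std::min(A->Ops.size(), MaxOps);
  std::size_t NB = std::min(B->Ops.size(), MaxOps);
  for (std::size_t I = 0; I < NA; ++I) {
    int Best = 0; // best match for operand I of A among B's operands
    for (std::size_t J = 0; J < NB; ++J)
      Best = std::max(
          Best, scoreBounded(A->Ops[I], B->Ops[J], Depth - 1, MaxOps, Visited));
    Score += Best;
  }
  return Score;
}
```

For two nodes with four operands each, one level of look-ahead compares 1 + 4 × 4 = 17 pairs without a cap and 1 + 2 × 2 = 5 pairs with `MaxOps = 2`. The gap widens with every additional level, which is consistent with the slowdown reported above.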