- User Since
- Apr 12 2016, 8:44 PM (165 w, 5 d)
Thu, Jun 13
Wed, Jun 12
Hmm I am now getting the same failure as you @dtemirbulatov . I am not sure what was wrong before, but it seems that the change in PR39774.ll is no longer needed.
Removed changes in PR39774.ll.
Tue, Jun 11
Wed, Jun 5
I am not sure about the value safety assumptions here. Is there any unsafe-math flag that would allow us to override the integer overflow flags?
Tue, Jun 4
Thank you for the reviews. Please commit it if @ABataev is also happy with it.
Fri, May 31
I investigated the two AArch64 failing tests. These tests feature the exact problem that we are trying to solve with this look-ahead heuristic. A commutative instruction had operands of the same opcode that the current heuristic has no way of reordering in an informed way. The current reordering was just lucky to pick the proper one, while the look-ahead heuristic was reordering the operands according to the score. However, the problem was that the score calculation was not considering external uses and was therefore favoring a sub-optimal operand ordering.
Thu, May 30
Addressed the review comments.
Wed, May 29
Tue, May 28
Addressed @dtemirbulatov 's comment.
Thanks for committing the test @RKSimon . I will follow up on this soon.
Fri, May 24
Minor comment update.
Thu, May 23
Yes, I will take a look. Maybe it is worth using TTI for the scores after all.
Tue, May 21
Mon, May 20
I think the first throttling patch should implement a very simple and fast algorithm for finding the cut:
- Add new fields to TreeEntry for Cost, ExtractCost and PredecessorsCost.
- During getTreeCost() set the TE.Cost and TE.ExtractCost (as you did in an earlier version of the patch if I am not mistaken)
- Do a single top-down traversal of the tree in reverse postorder and set the TE.PredecessorsCost equal to the cost of all the predecessor's costs until TE. While doing so, you can compare the cost of cutting just below TE by comparing the gather cost of TE versus the Cost + PredecessorsCost. This is very fast as you only need to visit each TreeEntry node once, so the complexity is linear to the size of the tree.
Sat, May 18
Thanks for the review. Please commit the change.
May 17 2019
Minor change: Fixed double printing of TreeEntry index in the dumpVectorizableTree() debug function.
May 10 2019
Thanks for the review. Please commit the change, I don't have commit access.
May 9 2019
Fixed the formatting issue.
May 8 2019
May 7 2019
Please try not to introduce a new Node structure or a new graph. You can use TreeEntry instead and you can add any necessary fields to it.
May 3 2019
Updated getBestOperand() to use getLookAheadScore() for Load and Constant, not just Opcode.
May 1 2019
Addressed comments and updated lit test.
Apr 30 2019
Apr 29 2019
@ABataev requested this to be checked in before the look-ahead patch. Please commit this if it looks good, I don't have commit access.
Apr 24 2019
Thank you for the reviews, and @uabelho for reporting the issue.
Please commit the patch as I don't have commit access.
Apr 23 2019
Accept undef in OpLastLane if Op contains an Instruction.
Cleaned up the lit test variable names.
Apr 18 2019
Apr 16 2019
Great, thanks! Please commit the patch , I don't have commit access.
Thank you for the reviews. @RKSimon any more comments ?
Apr 15 2019
Apr 12 2019
Apr 11 2019
Addressed the latest comments.
Addressed @ABataev 's comments.
Apr 6 2019
Addressed the comments and also moved the reordering-related functions inside the VLOperands class.
Apr 3 2019
I am not sure how to proceed with the operandorder.ll test. @ABataev do you know we would give higher priority to broadcasts?
Updated getBestOperand() to better handle undefs. This fixes the regression in crash_smallpt.ll.
Apr 2 2019
Rebased + some minor changes.
Apr 1 2019
Addressed the comments.
Mar 31 2019
Fixed doxygen tags.
Mar 29 2019
I also rebased to support the commutative predicates of D59992.
Mar 28 2019
Mar 18 2019
Mar 12 2019
Thanks @RKSimon . Moving the operator<<() within the EdgeInfo struct fixes the build issue with clang 7.0.1.
Mar 11 2019
Please commit this, I don't have commit access.
Mar 6 2019
Feb 6 2019
Correct me if I am wrong, but I think your algorithm can only remove the rightmost branch of the SLP graph (and maybe several of them if they are close by), as StopAt is a single index in VectorizableTree and we remove all nodes > StopAt.
I think this could be improved if we could find all possible entries where we could stop at, not just one of them. Then, in the final traversal, we could remove all branches starting from each of these nodes all the way to the leaves.
Also, I think the Delta in reduceTreeCost() should not be needed as we should run this regardless of whether the original tree is profitable or not.
Sep 25 2018
Hi Florian, yes I am happy with the changes, thanks!
Just please try to add an assertion that checks that the graph is a tree (see inline comment) to make sure that the graph is the one we expect.