- User Since
- Apr 12 2016, 8:44 PM (175 w, 5 d)
Thu, Aug 22
Mon, Aug 19
@RKSimon Yes, I will update this patch in a day or two.
Fri, Aug 16
Thu, Aug 15
Addressed latest comments
Sun, Aug 4
Just a couple more comments.
Wed, Jul 31
Tue, Jul 30
Mon, Jul 29
Addressed comments (and rebased).
Jul 15 2019
Jul 12 2019
No, there is no single BB. There can be more than one operand BBs that will have to be walked. For example, a TreeEntry in BB0 could have all its left operands in BB1 and all its right operands in BB2. In such case we would need to walk through both of them and perhaps keep the maximum cost? But anyway, given the lack of accuracy in the spill cost estimation I am not sure this is worth the trouble.
I guess it would be better if it visited all the BBs. You could traverse the VectorizableTree once and collect the BBs to be visited before you walk through them. Btw the TreeEntries now have operands, which can help you collect the BBs in the right order.
But I am not sure this is worth the trouble, because it looks to me that the spill cost estimation function is not very accurate in the first place. It only considers the spill cost over calls, and while doing so it is not comparing it against the spill cost of the scalar code, which if I am not mistaken could be equal to that of the vector code in many cases. Someone who knows more about the assumptions made here should probably comment on it.
Jul 3 2019
Thanks for sharing the example.
I am probably missing out something, but isn't this profitable only if we have more than one of these scalar loads extracting from the same vector load? Perhaps we could use these scalar loads as seeds and do short top-down SLP?
Also isn't it better to run this after vectorizeStoreChains() ?
Jul 2 2019
Jul 1 2019
Abandoning this patch. The fix can now be found in D60897.
Fixed the compilation-time issue with the introduction of the user budget limit from D63948. Also added lit test for it.
OK let me try to create a test that causes a change in the IR.
Not sure what kind of test would be suitable for this change. It simply restricts the look-ahead heuristic to not visit more than 2 operands and 2 uses, which is not something visible in the output IR.
Jun 28 2019
Updated option variable names.
Thank you @phosek for testing it, I am glad it fixed the issue.
Let me add reviewers. I will push the change as soon as it gets approved by someone.
Oops sorry about that @phosek . I wonder how none of the lit tests failed. Please try this one.
Fixed the failure.
Yes, I agree @lebedev.ri they should be options with some documentation describing what they do.
Let's first get a confirmation that this is the actual cause of the compilation-time increase.
Please try this patch @phosek . Please let us know if it fixes the compilation-time issue. Thanks!
Thanks, that would be really helpful @phosek . Let me create the quick patch.
I think There are two possible causes for the compilation time increase:
- Line 901 : We can restrict the number of operands to a max of 2
- Line 820: We can restrict the visited users to a ma x of 2.
I can either create a quick patch, or I can revert it. Either is fine.
Jun 27 2019
Jun 26 2019
Jun 25 2019
Addressed comments and rebased.
Fixed crash in chromium reported by @rnk.
Yes, @RKSimon . It was posted in llvm-commits. I did reproduce it and I will update the patch with the fix + lit test.
This was reverted in r364111 since it was causing a failure in Chromium reported by @rnk.
Jun 24 2019
Jun 21 2019
Please commit the patch, I don't have commit access.
Thank you for the reviews. Please commit the patch.
Jun 20 2019
Jun 13 2019
Jun 12 2019
Hmm I am now getting the same failure as you @dtemirbulatov . I am not sure what was wrong before, but it seems that the change in PR39774.ll is no longer needed.
Removed changes in PR39774.ll.
Jun 11 2019
Jun 5 2019
I am not sure about the value safety assumptions here. Is there any unsafe-math flag that would allow us to override the integer overflow flags?
Jun 4 2019
Thank you for the reviews. Please commit it if @ABataev is also happy with it.
May 31 2019
I investigated the two AArch64 failing tests. These tests feature the exact problem that we are trying to solve with this look-ahead heuristic. A commutative instruction had operands of the same opcode that the current heuristic has no way of reordering in an informed way. The current reordering was just lucky to pick the proper one, while the look-ahead heuristic was reordering the operands according to the score. However, the problem was that the score calculation was not considering external uses and was therefore favoring a sub-optimal operand ordering.
May 30 2019
Addressed the review comments.
May 29 2019
May 28 2019
Addressed @dtemirbulatov 's comment.
Thanks for committing the test @RKSimon . I will follow up on this soon.
May 24 2019
Minor comment update.
May 23 2019
Yes, I will take a look. Maybe it is worth using TTI for the scores after all.
May 21 2019
May 20 2019
I think the first throttling patch should implement a very simple and fast algorithm for finding the cut:
- Add new fields to TreeEntry for Cost, ExtractCost and PredecessorsCost.
- During getTreeCost() set the TE.Cost and TE.ExtractCost (as you did in an earlier version of the patch if I am not mistaken)
- Do a single top-down traversal of the tree in reverse postorder and set the TE.PredecessorsCost equal to the cost of all the predecessor's costs until TE. While doing so, you can compare the cost of cutting just below TE by comparing the gather cost of TE versus the Cost + PredecessorsCost. This is very fast as you only need to visit each TreeEntry node once, so the complexity is linear to the size of the tree.
May 18 2019
Thanks for the review. Please commit the change.
May 17 2019
Minor change: Fixed double printing of TreeEntry index in the dumpVectorizableTree() debug function.
May 10 2019
Thanks for the review. Please commit the change, I don't have commit access.
May 9 2019
Fixed the formatting issue.
May 8 2019
May 7 2019
Please try not to introduce a new Node structure or a new graph. You can use TreeEntry instead and you can add any necessary fields to it.
May 3 2019
Updated getBestOperand() to use getLookAheadScore() for Load and Constant, not just Opcode.
May 1 2019
Addressed comments and updated lit test.
Apr 30 2019
Apr 29 2019
@ABataev requested this to be checked in before the look-ahead patch. Please commit this if it looks good, I don't have commit access.