@mssimpso Is this something you're still interested in pushing forward?
I was just looking at how we could use callee metadata to improve code, and was going to implement a patch like this to update the call graph so that we had more accurate bottom-up code generation; then I started poking around and found this patch.
Jul 29 2019
Oct 1 2018
@mssimpso Are you OK with reviewing the changes to CallPromotionUtils.cpp in this patch? I wasn't sure who the correct code owner was.
May 15 2018
Thanks very much for the review Easwaran! As this patch still depends on D39339 to get the function iteration order right in the inliner, would you mind taking a look at that patch if you have a chance, or recommending someone for the review? Thanks again.
May 1 2018
Addressed first round of comments from Alexey and Ayal. Thanks again for the feedback! I'll respond to Ayal's most recent comments in a separate update.
Apr 30 2018
This is reminiscent of LV's interleave group optimization, in the sense that a couple of correlated inefficient vector "gathers" are replaced by a couple of efficiently formed vectors followed by transposing shuffles. The correlated gathers may come from the two operands of a binary operation, as in this patch, or more generally from arbitrary leaves of the SLP tree.
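To make the shape of the transformation concrete, here is a minimal C++ model (illustrative only, not code from the patch): two correlated strided "gathers" over interleaved data produce the same vectors as two contiguous loads followed by transposing shuffles.

```cpp
#include <array>

// Hypothetical 2x4 sketch: scalars stored interleaved as (a0,b0),(a1,b1),...
using Vec4 = std::array<int, 4>;

// Inefficient form: two correlated strided "gathers", one per operand of the
// binary operation, each built element by element.
inline Vec4 gatherEven(const int *Mem) {
  return {Mem[0], Mem[2], Mem[4], Mem[6]};
}
inline Vec4 gatherOdd(const int *Mem) {
  return {Mem[1], Mem[3], Mem[5], Mem[7]};
}

// Efficient form: two contiguous vector loads followed by transposing
// shuffles that pick out the even/odd lanes.
inline Vec4 shuffleEven(const int *Mem) {
  Vec4 Lo = {Mem[0], Mem[1], Mem[2], Mem[3]};
  Vec4 Hi = {Mem[4], Mem[5], Mem[6], Mem[7]};
  return {Lo[0], Lo[2], Hi[0], Hi[2]};
}
inline Vec4 shuffleOdd(const int *Mem) {
  Vec4 Lo = {Mem[0], Mem[1], Mem[2], Mem[3]};
  Vec4 Hi = {Mem[4], Mem[5], Mem[6], Mem[7]};
  return {Lo[1], Lo[3], Hi[1], Hi[3]};
}
```

Both forms yield identical vectors; the efficient form replaces 2N scalar inserts with two wide loads and two shuffles.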
Apr 27 2018
Will it work correctly if some of the operations are used several times in the bundles? It would be good to have tests for this kind of situation.
Apr 26 2018
Apr 25 2018
Apr 24 2018
LGTM. Probably wait a day before committing in case Renato/others have a comment/suggestion.
Thanks for catching this bug! Yep, it was introduced with the refactoring done in D40658. I just have one minor comment.
Addressed Javed's comments. Thanks!
Apr 23 2018
Apr 17 2018
Addressed Easwaran's comments. Thanks again!
Apr 6 2018
Addressed Easwaran's comments.
Updated to work with the latest revision of D39869.
Apr 2 2018
Mar 23 2018
Sounds good! I'll update the test. Thanks!
Mar 22 2018
Mar 21 2018
Addressed Alexey's comments. Thanks!
Here is the SLP tree of the added test case. The cost of the gather (%a, %b, %c, %d) is added twice, once per use, before this patch.
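For context, a hypothetical scalar analogue of such a tree (names are illustrative, not the actual test case): one vector is built from the four scalar leaves and then feeds two binary operations, so the build cost should be charged once, not once per user.

```cpp
#include <array>

// Hypothetical analogue of the tree: one vector built ("gathered") from the
// scalar leaves a, b, c, d feeds two binary operations. The build happens
// once, so a cost model should charge it once, not once per user.
using Vec4 = std::array<int, 4>;

inline Vec4 addVec(Vec4 X, Vec4 Y) {
  Vec4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = X[I] + Y[I];
  return R;
}

inline Vec4 twoUsersOfOneGather(int a, int b, int c, int d) {
  Vec4 G = {a, b, c, d};  // single gather of the leaves
  Vec4 U1 = addVec(G, G); // first user of the gather
  return addVec(U1, G);   // second user of the same gather
}
```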
Mar 20 2018
Mar 19 2018
Mar 16 2018
Mar 15 2018
Addressed Renato's comments.
I'm happy to see this! LGTM with just one small comment, no need to re-review. Thanks!
Why can't this be an extension of getIntrinsicInstrCost itself? It already has that logic (and much more)...
Also, my comments are related to further extensions of this function (if necessary), as well as for extending getIntrinsicInstrCost.
I fear we'll end up with a big list of lambdas before the actual switches, all to wrap common variables and safety asserts.
Mar 14 2018
Do you intend to consider AArch64TTIImpl::getMinMaxRdxCost() too?
Mar 9 2018
Mar 7 2018
The test case has scalar types; it seems more interesting to see the cost rise proportionally with the vector factor.
For the test case, why not just run "-cost-model -analyze" like we do for the other tests instead of running the vectorizer? Am I missing something?
Makes sense to me. I tested this patch on Falkor and didn't notice any significant performance differences on the benchmarks I ran. Can we add a test case, though? Maybe in test/Analysis/CostModel?
Mar 5 2018
Sorry for not responding sooner - I was away from work for a few weeks. Yes, this looks like the same issue that was fixed over in rL324195. Thanks for adding the new tests, and sorry for the duplicated effort! Let me know if you run into any problems with the new code.
Jan 24 2018
Thanks for taking care of this! I like the general idea of the fix, but I have a concern regarding the TODO in line 408: "// TODO: We should not rely on InstCombine to rewrite the reduction in the smaller type. We should just generate a correctly typed expression to begin with."
The cost model is relying on an optimization that will hopefully be applied by InstCombine. I wonder if it would be too complicated to implement the actual type shrinking in LV code gen as part of the fix.
That would better align cost modeling with LV code generation, which is one of the major concerns of the current infrastructure.
In the future VPlan infrastructure, we will definitely need to address this optimization as a VPlan-to-VPlan transformation. Thanks for bringing up this issue.
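As an illustration of the shrinking in question (a sketch, not the patch's code): when the only use of a widened reduction truncates the result back to the narrow type, the reduction can be carried in the smaller type from the start, which is what the TODO suggests LV should emit directly rather than relying on InstCombine.

```cpp
#include <cstdint>
#include <vector>

// Sketch: a reduction whose only use truncates the result to 8 bits can be
// performed entirely in the smaller type; modular wrap-around makes the
// two forms identical.
inline uint8_t sumWideThenTrunc(const std::vector<uint8_t> &V) {
  uint32_t Acc = 0; // widened reduction, shrunk afterwards
  for (uint8_t X : V)
    Acc += X;
  return static_cast<uint8_t>(Acc); // truncating use
}

inline uint8_t sumNarrow(const std::vector<uint8_t> &V) {
  uint8_t Acc = 0; // correctly typed expression to begin with
  for (uint8_t X : V)
    Acc += X;
  return Acc;
}
```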
Jan 19 2018
Jan 18 2018
Jan 15 2018
Addressed David's comments.
Jan 12 2018
I might consider holding my nose, if this restores something that was there.
I think the other two passes we have for doing sinking aren't currently enabled (GVNSink and Sink.cpp), although unless somebody puts effort in them this will always be a chicken-egg problem.
I'm not sure I'll have the time to review this patch carefully, I don't want to put you on the hook, but if you can consider an alternative, that would be great.
If you look at the original review you'll notice that I was fundamentally opposed to the change, see e.g. https://reviews.llvm.org/D38566#926530
(FWIW, it doesn't matter where we move sinking/hoisting; there will always be some case that we can't get properly. We, of course, should prioritize the common case, at least IMHO.)
Thanks for the quick feedback Danny/Davide. I definitely appreciate the point that SimplifyCFG may not be the best place for this kind of transformation. Davide, I assume the dedicated pass you're referring to is GVNSink? I don't think that pass is enabled yet (I haven't been closely following the progress, so I'm not sure what's holding it up at this point), but it's possible GVNSink would indeed catch the cases this patch does. I haven't tested that yet.
Thanks for cleaning this up.
Jan 11 2018
TBH I was hoping for a more complete approach to testing this, but I thought I'd share this relatively straightforward check.
Jan 8 2018
I think the 'tryToPromote' method should be moved to the indirect call promotion pass so that the callback to inline cost/benefit is exposed there (a more general cost/benefit model needs to be developed for indirect call promotion). Initially, I think we can limit the use of the getInlineCost callback to cases where profile data is not available; this will achieve what this patch is doing without affecting existing promotion heuristics.
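For reference, the transformation being gated here is the usual promotion of an indirect call to a guarded direct call; a source-level sketch with hypothetical names (not the pass's actual code):

```cpp
// Sketch of indirect call promotion: an indirect call through FP becomes a
// compare against a likely target plus a direct call, with the original
// indirect call kept on the fallback path. The direct call is what later
// makes inlining possible, hence the interest in an inline cost callback.
inline int likelyTarget(int X) { return X + 1; }
inline int otherTarget(int X) { return X * 2; }

inline int callIndirect(int (*FP)(int), int X) {
  return FP(X); // before promotion
}

inline int callPromoted(int (*FP)(int), int X) {
  if (FP == &likelyTarget)
    return likelyTarget(X); // direct call: candidate for inlining
  return FP(X);             // fallback: original indirect call
}
```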
Jan 4 2018
Dec 27 2017
Dec 26 2017
Thanks for spotting this!
The reordering looks good to me; given the current behavior of the MachineCombiner, it makes sense to move the more profitable patterns up.
However, the order in which we add the patterns to the list of candidates determines the transformation that takes place, since only the first pattern that matches will be used.
This behavior is not ideal, though, especially as one order of patterns might be good for one microarchitecture, whereas a different order is better for another.
I'll try to look into this. Maybe it's worth changing the MachineCombiner to try all available patterns.
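To make the first-match behavior concrete, here is a minimal sketch (not the MachineCombiner's actual API): patterns are tried in list order and only the first one that matches is applied, so the order in which the target adds candidates decides which transformation fires.

```cpp
#include <functional>
#include <vector>

// Sketch: candidate patterns are tried in list order; only the first match
// is applied, so reordering the list changes the chosen transformation.
struct Pattern {
  std::function<bool(int)> Matches; // predicate on the candidate instruction
  int Id;                           // which rewrite this pattern performs
};

inline int firstMatch(const std::vector<Pattern> &Patterns, int MI) {
  for (const Pattern &P : Patterns)
    if (P.Matches(MI))
      return P.Id; // later patterns are never considered
  return -1;       // no pattern matched
}
```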
Dec 20 2017
Dec 19 2017
Thanks David! I'll make those changes before committing.
Dec 14 2017
Addressed David's comments. Thanks again!
Thanks again for the comments, David. I'll update the patch shortly.
Dec 13 2017
Dec 12 2017
Moved demoteCall, previously in D40751, to this patch.
Addressed David's comments. Thanks for the feedback!
Dec 11 2017
Dec 6 2017