Rebased and updated test.
Apr 12 2019
Jun 7 2018
This change looks good to me. Thanks for the patch.
Apr 26 2018
Mar 15 2018
Just a gentle reminder to have a look at the latest VecClone algorithm. Thanks, Matt.
Jan 26 2018
The latest update includes an extensive rework of the VecClone algorithm, as Hal suggested. I also added support for VecClone in the new pass manager. Since this is my first time doing that, I would welcome specific comments on whether it was done correctly. Please note that I'm still fixing existing tests and will be adding new ones, but I wanted to make sure the overall algorithm is headed in the right direction. Thanks, all.
Extensive update to the VecClone algorithm based on Hal's feedback. The VecClone pass is now supported through the new pass manager. Other minor code changes were also made.
Jan 3 2018
OK, after some experimentation I believe I understand where you're heading with this. Once I have done some more refactoring, I'll post a new version of VecClone for review to make sure we're on the same page.
Dec 21 2017
Thanks for the comments, Hal. Just to clarify your point #2: I think you're saying that we should start from a common parameter representation, i.e., parameters should be loaded/stored through memory. Please correct me if I'm wrong. This would certainly be a great way to reduce the complexity of the algorithm. The remaining items in your list should already be covered, though some tweaking may be needed.
Nov 28 2017
Moved the calcCharacteristicType() function to VectorUtils so that VecClone and LV can share it.
Moved removal of the vector-variant function attributes from VecClone to LV, because LV needs to see them.
Nov 27 2017
Nov 6 2017
Oct 31 2017
Oct 23 2017
Feb 7 2017
Thanks for the feedback, Hal. I made the changes you suggested.
New changes are:
- Update function attributes to "vector-variants"="<variant list>" format.
- Move target-specific code in the VectorVariant class to TTI.
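For illustration, here is a minimal C++ sketch of how a consumer might split a "vector-variants" attribute value into individual variant names. It assumes the `<variant list>` is comma-separated with no embedded whitespace; that encoding and the helper name `splitVariantList` are assumptions made for this sketch, not details taken from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split a "vector-variants" attribute value into the
// individual variant names. ASSUMPTION: entries are comma-separated and
// contain no whitespace; empty entries (e.g. from ",,") are skipped.
std::vector<std::string> splitVariantList(const std::string &AttrValue) {
  assert(AttrValue.find_first_of(" \t") == std::string::npos &&
         "whitespace is not expected in the variant list");
  std::vector<std::string> Variants;
  std::string::size_type Start = 0;
  while (Start <= AttrValue.size()) {
    std::string::size_type Comma = AttrValue.find(',', Start);
    if (Comma == std::string::npos)
      Comma = AttrValue.size();
    if (Comma > Start)
      Variants.push_back(AttrValue.substr(Start, Comma - Start));
    Start = Comma + 1; // continue after the separator (or past the end)
  }
  return Variants;
}
```

For example, `splitVariantList("variantA,variantB")` yields two entries, and empty entries produced by stray commas are dropped.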
Oct 4 2016
Sep 1 2016
Aug 31 2016
Thanks for the comments, Mehdi. I had some other things come up, but I'm making some corrections now.
Jul 29 2016
Jul 27 2016
Jul 25 2016
I was just recently given commit privileges, so I can do it. Thanks Hal.
Jul 22 2016
Thanks Michael. The tests have been updated.
Jul 21 2016
I think this is just saying that some of the weird types are not supported on all targets. For now, is it ok to proceed with checking this code in?
Jul 19 2016
In the process of writing test cases, I noticed that a loop containing a call to llvm.log.f32 was not being vectorized because of cost modeling. When I forced vectorization of the loop and passed -fveclib=SVML, the loop was vectorized with a widened intrinsic instead of the SVML call. Is this correct? I would have expected to get the SVML call. In light of this, wouldn't it be better to represent the math calls with vector intrinsics and let CodeGenPrepare or the backends decide how to lower them?
Jul 15 2016
Thanks for reviewing. One concern I have going forward is the number of entries that will appear in the switch statement inside addVectorizableFunctionsFromVecLib(). I assume the right thing to do is to replace it with something generated by TableGen? I also want to point out that some of these entries will result in SVML calls that are not legal. For example, __svml_sinf32 does not actually exist in the library, but it can be legalized if one explicitly sets a vector length of 32. Although such cases are probably uncommon, I wanted to bring them to your attention, since the legalization part of this work will be reviewed and committed separately. If needed, I can remove those entries until the legalization is in place.
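As an illustration of a table-driven alternative to a large switch, here is a minimal C++ sketch: entries are sorted once and looked up by binary search on (scalar name, VF). The `VecEntry` and `VecFuncTable` types, and the `NeedsLegalization` flag (one hypothetical way to mark entries such as __svml_sinf32 that depend on later legalization), are illustrative assumptions, not TableGen output or LLVM's actual VecDesc.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical table entry, loosely modeled on a VecDesc-style record.
struct VecEntry {
  std::string ScalarName; // scalar function, e.g. "sinf"
  std::string VectorName; // vector library routine
  unsigned VF;            // vectorization factor
  bool NeedsLegalization; // no native routine at this VF; must be split later
};

// Sorted table that replaces a hand-written switch with a binary search.
class VecFuncTable {
public:
  explicit VecFuncTable(std::vector<VecEntry> Entries)
      : Table(std::move(Entries)) {
    std::sort(Table.begin(), Table.end(),
              [](const VecEntry &A, const VecEntry &B) {
                return std::tie(A.ScalarName, A.VF) <
                       std::tie(B.ScalarName, B.VF);
              });
    // Each (scalar name, VF) pair should map to exactly one routine.
    assert(std::adjacent_find(Table.begin(), Table.end(),
                              [](const VecEntry &A, const VecEntry &B) {
                                return A.ScalarName == B.ScalarName &&
                                       A.VF == B.VF;
                              }) == Table.end() &&
           "duplicate (name, VF) entry");
  }

  // Returns the entry for (Name, VF), or nullptr if none exists.
  const VecEntry *lookup(const std::string &Name, unsigned VF) const {
    auto It = std::lower_bound(
        Table.begin(), Table.end(), std::tie(Name, VF),
        [](const VecEntry &E, const auto &Key) {
          return std::tie(E.ScalarName, E.VF) < Key;
        });
    if (It == Table.end() || It->ScalarName != Name || It->VF != VF)
      return nullptr;
    return &*It;
  }

private:
  std::vector<VecEntry> Table;
};
```

A generator such as a TableGen backend could emit just the entry list, so the lookup code would stay unchanged as more variants are added.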
Jul 14 2016
Jun 17 2016
Apr 28 2016
Thanks for the feedback, Hal. We need the clang support because there are actually several variants of each SVML function, each with a different level of precision. To pick the right variant, we need to know not only the variant name and vector length but also the precision requirements the user specifies with the -imf flags on the command line. If we implement the translation using addVectorizableFunctionsFromVecLib(), it seems the VecDesc struct would need additional information about precision requirements. One concern I have with this approach is that if the math calls are translated to library calls immediately, subsequent optimizations will probably be less able to optimize them further. Thus, one of the design goals of this project was to keep the vector intrinsics around as late as possible before translation. What are your thoughts on this?
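To make the VecDesc question concrete, here is a minimal C++ sketch of precision-aware variant selection. The three-level `Precision` enum, the `VecDescWithPrecision` struct, the `selectVariant` helper, and the policy of choosing the least accurate variant that still meets the user's requirement (on the assumption that lower accuracy means faster code) are all hypothetical; the actual -imf flags may express precision more finely.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical accuracy classes; the real -imf flags may be finer-grained.
enum class Precision { Low = 0, Medium = 1, High = 2 };

// Hypothetical VecDesc-like record extended with an accuracy guarantee.
struct VecDescWithPrecision {
  std::string ScalarName;
  std::string VectorName;
  unsigned VF;
  Precision Accuracy; // accuracy guaranteed by this library variant
};

// Pick the least accurate (assumed fastest) variant that still satisfies
// the precision the user asked for. Returns nullptr if none qualifies.
const VecDescWithPrecision *
selectVariant(const std::vector<VecDescWithPrecision> &Table,
              const std::string &Name, unsigned VF, Precision Required) {
  assert(VF > 0 && "vectorization factor must be positive");
  const VecDescWithPrecision *Best = nullptr;
  for (const auto &E : Table) {
    if (E.ScalarName != Name || E.VF != VF || E.Accuracy < Required)
      continue;
    if (!Best || E.Accuracy < Best->Accuracy)
      Best = &E;
  }
  return Best;
}
```

With entries for High, Medium, and Low accuracy at VF 4, a Medium requirement selects the Medium variant; when nothing qualifies, the nullptr result lets the caller keep the intrinsic instead.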
Apr 26 2016
Apr 6 2016
I currently have an RFC for translating vector math intrinsics to SVML calls. The proposal lets the user specify the desired precision requirements via several flags (currently supported by the Intel compiler). The plan is to attach this information as function attributes at the call sites of the math intrinsics; these attributes then drive the selection of the appropriate SVML function variant. Would this be helpful in this particular case? I've included a description of the flags below.