Hi All,
I'm currently looking into adding support for recognizing and vectorizing non-SIMD kinds of parallelism, e.g. add/sub patterns.
This kind of parallelism may be important in complex/numerical computations where these patterns are common.
These patterns can later be converted to instructions such as ADDSUBPS on X86.
I would like to get some input on the patch and on the design used to support this feature.
This patch adds support for recognizing one asymmetric pairing (i.e. the add/sub instruction pair). This is the rough design I followed:
- Recognize add/sub patterns in getSameOpcode as shuffle vector instructions and handle shuffle vector in buildTree_rec.
- Calculate appropriate cost of vectorization when shuffle vector is used.
- Calculate appropriate mask and create shuffle vector instruction to vectorize these patterns.
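To make the mask step concrete, here is a rough standalone sketch of the idea, not code from the patch. It assumes a 4-lane bundle and models the shuffle the way shufflevector indexes its operands: mask values 0..N-1 pick a lane from the first vector and N..2N-1 pick a lane from the second. All names (LaneOp, buildMask, shuffle, vectorizedAltOp) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration only; none of these names come from the patch.
enum class LaneOp { Add, Sub };

constexpr std::size_t N = 4;
using Vec = std::array<float, N>;
using Mask = std::array<int, N>;

// Full-width vector add and sub: both are computed for every lane.
Vec vecAdd(const Vec& a, const Vec& b) {
  Vec r{};
  for (std::size_t i = 0; i < N; ++i) r[i] = a[i] + b[i];
  return r;
}

Vec vecSub(const Vec& a, const Vec& b) {
  Vec r{};
  for (std::size_t i = 0; i < N; ++i) r[i] = a[i] - b[i];
  return r;
}

// Lane i takes element i of the add result (index i) when the scalar op was
// an add, or element i of the sub result (index N + i) when it was a sub.
Mask buildMask(const std::array<LaneOp, N>& ops) {
  Mask m{};
  for (std::size_t i = 0; i < N; ++i)
    m[i] = (ops[i] == LaneOp::Add) ? static_cast<int>(i)
                                   : static_cast<int>(N + i);
  return m;
}

// Two-operand shuffle: indices below N read v0, the rest read v1.
Vec shuffle(const Vec& v0, const Vec& v1, const Mask& mask) {
  Vec r{};
  for (std::size_t i = 0; i < N; ++i) {
    assert(mask[i] >= 0 && mask[i] < static_cast<int>(2 * N));
    const std::size_t idx = static_cast<std::size_t>(mask[i]);
    r[i] = (idx < N) ? v0[idx] : v1[idx - N];
  }
  return r;
}

// Vectorized form: one add, one sub, one shuffle.
Vec vectorizedAltOp(const Vec& a, const Vec& b,
                    const std::array<LaneOp, N>& ops) {
  return shuffle(vecAdd(a, b), vecSub(a, b), buildMask(ops));
}

// Scalar reference: what the original per-lane instructions compute.
Vec scalarAltOp(const Vec& a, const Vec& b,
                const std::array<LaneOp, N>& ops) {
  Vec r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = (ops[i] == LaneOp::Add) ? a[i] + b[i] : a[i] - b[i];
  return r;
}
```

With the lane order sub, add, sub, add (the ADDSUBPS layout), buildMask yields the alternating selection that a backend could in principle match to a single instruction. Whether that pays off depends on the target's shuffle cost.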
The advantage of using shuffle vector is that the same shuffle vector code in this patch can be used to pair any alternating sequence, such as addsub, subadd, etc. We just need to handle them in getSameOpcode and classify them as shuffle vectors.
I tested the patch on a local test case with a large number of add/sub patterns, and it seems to give a nice ~10% improvement.
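As an illustration of the classification step, the sketch below shows one way a bundle of scalar opcodes could be sorted into "all the same", "alternating add/sub" (in either order), or "not vectorizable this way". It is a simplified stand-in, not the actual getSameOpcode logic; Opcode, BundleKind and classifyBundle are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not the real getSameOpcode.
enum class Opcode { Add, Sub, Mul };
enum class BundleKind { Same, Alternate, Mixed };

BundleKind classifyBundle(const std::vector<Opcode>& ops) {
  if (ops.empty()) return BundleKind::Mixed;

  // Ordinary SIMD case: every lane uses the same opcode.
  bool allSame = true;
  for (Opcode op : ops)
    if (op != ops[0]) allSame = false;
  if (allSame) return BundleKind::Same;

  // Alternating case: lanes 0 and 1 must form an add/sub or sub/add pair...
  const bool addSub = ops[0] == Opcode::Add && ops[1] == Opcode::Sub;
  const bool subAdd = ops[0] == Opcode::Sub && ops[1] == Opcode::Add;
  if (!addSub && !subAdd) return BundleKind::Mixed;

  // ...and every later lane must repeat that pair by parity.
  for (std::size_t i = 2; i < ops.size(); ++i)
    if (ops[i] != ops[i % 2]) return BundleKind::Mixed;

  return BundleKind::Alternate;
}
```

In this sketch, a bundle classified as Alternate is the one that would be treated as a shuffle vector. Supporting another pairing would only mean widening the pair check.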
Awaiting inputs.
Thanks and Regards
Karthik Bhat
Please leave a cost of 1 here.