as well as [zs]ext(C + x + ...) -> (D + [zs]ext(C-D + x + ...))<nuw><nsw>
if the top-level addition in (D + (C-D + x * n)) can be proven not to
wrap, where the choice of D also maximizes the number of trailing
zeros of (C-D + x * n), ensuring homogeneous behaviour of the
transformation and better canonicalization of such AddRecs
(indeed, there are 2^(2w) different expressions of the form
B1 + ext(B2 + Y) for the same Y, but only 2^(2w - k) different
expressions of the resulting form B3 + ext((B4 * 2^k) + Y), where w is
the bit width of the integral type).
The AddExpr version of the transformation enables better canonicalization
of expressions like
1 + zext(5 + 20 * %x + 24 * %y) and zext(6 + 20 * %x + 24 * %y)
which both get transformed to
2 + zext(4 + 20 * %x + 24 * %y)
This pattern is common in address arithmetic, and the transformation
makes it easier for passes like LoadStoreVectorizer to prove that 2 or
more memory accesses are consecutive and optimize (vectorize) them.
This change is similar to a number of earlier Scalar Evolution changes, namely:
commit 63c52aea76b530d155ec6913d5c3bbe1ecd82ad8
Author: Sanjoy Das <sanjoy@playingwithpointers.com>
Date:   Thu Oct 22 19:57:38 2015 +0000

    [SCEV] Commute zero extends through <nuw> additions

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251052 91177308-0d34-0410-b5e6-96231b3b80d8
commit 3edd5bf90828613bacfdc2ce047d3776363123e5
Author: Justin Lebar <jlebar@google.com>
Date:   Thu Jun 14 17:13:48 2018 +0000

    [SCEV] Simplify zext/trunc idiom that appears when handling bitmasks.

    Summary:
    Specifically, we transform

      zext(2^K * (trunc X to iN)) to iM
        ->
      2^K * (zext(trunc X to i{N-K}) to iM)<nuw>

    This is helpful because pulling the 2^K out of the zext allows further
    optimizations.

    Reviewers: sanjoy

    Subscribers: hiraditya, llvm-commits, timshen

    Differential Revision: https://reviews.llvm.org/D48158
and the most relevant
commit 45788be6e2603ecfc149f43df1a6d5e04c5734d8
Author: Michael Zolotukhin <mzolotukhin@apple.com>
Date:   Sat May 24 08:09:57 2014 +0000

    Implement sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
    sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformation
    in Scalar Evolution.

    That helps SLP-vectorizer to recognize consecutive loads/stores.

    <rdar://problem/14860614>

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209568 91177308-0d34-0410-b5e6-96231b3b80d8
This patch generalizes the latter one by relaxing its requirements in the following ways:
- C2 doesn't have to be a power of 2; it is enough if it is divisible by 2 a sufficient number of times;
- C1 doesn't have to be less than C2: instead of extracting the entire C1, we can split it into 2 terms (00...0XXX + YY...Y000), keep the second one, which may cause wrapping within the extension operator, and move the first one, which doesn't affect wrapping, out of the extension operator, enabling further simplifications;
- C1 and C2 don't have to be positive: splitting C1 as shown above produces a sum that is guaranteed not to wrap, signed or unsigned;
- there can be more than 2 terms: in the AddExpr case the 2nd and subsequent terms, and in the AddRecExpr case the Step component, don't have to be of the C2*X form or constant, respectively; they just need to have enough trailing zeros, which can be guaranteed by means other than arithmetic, e.g. by pointer alignment;
- the extension operator doesn't have to be a sext; the same transformation works, and is profitable, for zexts as well.
Apparently, optimizations like SLPVectorizer currently fail to
vectorize even rather trivial cases like the following:
double bar(double *a, unsigned n) {
  double x = 0.0;
  double y = 0.0;
  for (unsigned i = 0; i < n; i += 2) {
    x += a[i];
    y += a[i + 1];
  }
  return x * y;
}
If compiled with
  clang -std=c11 -Wpedantic -Wall -O3 main.c -S -o - -emit-llvm
(!{!"clang version 7.0.0 (trunk 337339) (llvm/trunk 337344)"})
it produces scalar code with the loop not unrolled when n and i are
unsigned (as shown above), but a vectorized and unrolled loop when they
are signed (follow https://godbolt.org/g/nq9xF8 to play with it).
With the changes made in this commit the unsigned version will be
vectorized as well (though not unrolled, for unclear reasons).