I believe, the time has come.
First, let's look at an example:
#include <algorithm> void foo(int* data, int width, short*LUT, bool cond) { for(int i = 0; i != width; ++i) { int& val = data[i]; val = LUT[std::min(std::max(val, cond ? 0 : 128), 255)]; } }
It's obvious to vectorize, and indeed that does happen: https://godbolt.org/z/erqdaed6b
But what if we change the element type of LUT?
#include <algorithm> void foo(int* data, int width, int*LUT, bool cond) { for(int i = 0; i != width; ++i) { int& val = data[i]; val = LUT[std::min(std::max(val, cond ? 0 : 128), 255)]; } }
https://godbolt.org/z/8xrEP88fc <- oops.
The original example vectorized because of TBAA.
Indeed, we don't do a great job of modelling the select's,
so no wonder we failed.
LoopVectorizer has this interesting logic:
// Walk back through the IR for a pointer, looking for a select like the // following: // // %offset = select i1 %cmp, i64 %a, i64 %b // %addr = getelementptr double, double* %base, i64 %offset // %ld = load double, double* %addr, align 8 // // We won't be able to form a single SCEVAddRecExpr from this since the // address for each loop iteration depends on %cmp. We could potentially // produce multiple valid SCEVAddRecExprs, though, and check all of them for // memory safety/aliasing if needed. // // If we encounter some IR we don't yet handle, or something obviously fine // like a constant, then we just add the SCEV for that term to the list passed // in by the caller. If we have a node that may potentially yield a valid // SCEVAddRecExpr then we decompose it into parts and build the SCEV terms // ourselves before adding to the list. static void findForkedSCEVs( ScalarEvolution *SE, const Loop *L, Value *Ptr, SmallVectorImpl<PointerIntPair<const SCEV *, 1, bool>> &ScevList, unsigned Depth)
... that tries to deal with exactly this case, but it does so
by effectively inventing it's own representation,
which forces it to re-implement handling for everything that SCEV already handles,
and it is incomplete.
(We have something similar in SCEV already, getRangeViaFactoring())
I would like to teach SCEV about select's, and then replace that code
with a straight-forward, complete, SCEV-based implementation.
The (only?) regression i see in the diffs is the fact that i didn't add recursive predicate reasoning for selects.
This has a reasonably small compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=0cdf030cf8dcfb8bfe551c16b14a29e124497069&to=832e0fd5f8a1ecedc64e414627ad8641d70d2682&stat=instructions:u
... while having some size-total changes.
We don't have such generalized version for div/rem because they have a fixed number of operands, and I also don't think we should have it here, when the number of operands is fixed.