I believe, the time has come.
First, let's look at an example:
#include <algorithm>
void foo(int* data, int width, short*LUT, bool cond) {
for(int i = 0; i != width; ++i) {
int& val = data[i];
val = LUT[std::min(std::max(val, cond ? 0 : 128), 255)];
}
}It's obvious to vectorize, and indeed that does happen: https://godbolt.org/z/erqdaed6b
But what if we change the element type of LUT?
#include <algorithm>
void foo(int* data, int width, int*LUT, bool cond) {
for(int i = 0; i != width; ++i) {
int& val = data[i];
val = LUT[std::min(std::max(val, cond ? 0 : 128), 255)];
}
}https://godbolt.org/z/8xrEP88fc <- oops.
The original example vectorized because of TBAA.
Indeed, we don't do a great job of modelling the select's,
so no wonder we failed.
LoopVectorizer has this interesting logic:
// Walk back through the IR for a pointer, looking for a select like the
// following:
//
// %offset = select i1 %cmp, i64 %a, i64 %b
// %addr = getelementptr double, double* %base, i64 %offset
// %ld = load double, double* %addr, align 8
//
// We won't be able to form a single SCEVAddRecExpr from this since the
// address for each loop iteration depends on %cmp. We could potentially
// produce multiple valid SCEVAddRecExprs, though, and check all of them for
// memory safety/aliasing if needed.
//
// If we encounter some IR we don't yet handle, or something obviously fine
// like a constant, then we just add the SCEV for that term to the list passed
// in by the caller. If we have a node that may potentially yield a valid
// SCEVAddRecExpr then we decompose it into parts and build the SCEV terms
// ourselves before adding to the list.
static void findForkedSCEVs(
ScalarEvolution *SE, const Loop *L, Value *Ptr,
SmallVectorImpl<PointerIntPair<const SCEV *, 1, bool>> &ScevList,
unsigned Depth)... that tries to deal with exactly this case, but it does so
by effectively inventing it's own representation,
which forces it to re-implement handling for everything that SCEV already handles,
and it is incomplete.
(We have something similar in SCEV already, getRangeViaFactoring())
I would like to teach SCEV about select's, and then replace that code
with a straight-forward, complete, SCEV-based implementation.
The (only?) regression i see in the diffs is the fact that i didn't add recursive predicate reasoning for selects.
This has a reasonably small compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=0cdf030cf8dcfb8bfe551c16b14a29e124497069&to=832e0fd5f8a1ecedc64e414627ad8641d70d2682&stat=instructions:u
... while having some size-total changes.
We don't have such generalized version for div/rem because they have a fixed number of operands, and I also don't think we should have it here, when the number of operands is fixed.