For logical or/and reductions we currently emit calls to the regular @llvm.vector.reduce.or/and.vXi1 intrinsics.
These intrinsics are not efficient for logical or/and reductions. This is
especially noticeable when the optimizer is able to emit short-circuit
versions of the scalar or/and instructions: the vector code then ends up
slower than the scalar version.
Instead, an or reduction of i1 values can be represented as:
    %val = bitcast <ReduxWidth x i1> %vec to iReduxWidth
    %res = icmp ne iReduxWidth %val, 0
and an and reduction of i1 values can be represented as:
    %val = bitcast <ReduxWidth x i1> %vec to iReduxWidth
    %res = icmp eq iReduxWidth %val, -1   ; all bits set (11...1)
This significantly improves the performance of the vector code and makes
it outperform the short-circuit scalar code.
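To see why the bitcast form computes the same result as the reduction, the C++ sketch below models a <N x i1> vector as a std::vector<bool> (assuming N <= 64) and packs its lanes into one integer, mirroring "bitcast <N x i1> to iN". The helper names packLanes, allOnes, reduceOr and reduceAnd are hypothetical and are not LLVM APIs. Placing lane i in bit i is an assumption of the sketch, but the bit order does not affect either reduction, because only "any bit set" and "all bits set" are tested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of "bitcast <N x i1> to iN" for N <= 64:
// lane i of the boolean vector becomes bit i of the returned integer.
uint64_t packLanes(const std::vector<bool> &lanes) {
  assert(!lanes.empty() && lanes.size() <= 64);
  uint64_t bits = 0;
  for (std::size_t i = 0; i < lanes.size(); ++i)
    if (lanes[i])
      bits |= uint64_t{1} << i;
  return bits;
}

// All-ones value of width n, i.e. the iN constant -1 ("11111" for n = 5).
uint64_t allOnes(std::size_t n) {
  return n == 64 ? ~uint64_t{0} : (uint64_t{1} << n) - 1;
}

// or-reduction:  %res = icmp ne iN %val, 0  (true if any lane is set)
bool reduceOr(const std::vector<bool> &lanes) {
  return packLanes(lanes) != 0;
}

// and-reduction: %res = icmp eq iN %val, -1  (true only if every lane is set)
bool reduceAnd(const std::vector<bool> &lanes) {
  return packLanes(lanes) == allOnes(lanes.size());
}
```

For example, reduceOr({false, true, false}) returns true while reduceAnd on the same lanes returns false. A single integer compare replaces the whole reduction tree, which is the point of the transformation.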
Can you add the condition '&& isa<FixedVectorType>(Src)' here? (Same request for LoopVectorize.cpp and SLPVectorizer.cpp.)
We're starting to make the LoopVectorizer vectorize with scalable VFs, so we're currently fixing up cases like this one, where assumptions are made that are only valid for fixed-width vectors. For scalable vectors it might be possible to do the <vscale x N x i1> reduction as a compare on <vscale x 1 x iN>, but at least for SVE I know that we never want that.
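The reason for the fixed-width guard can be sketched the same way. In the hypothetical model below, VecTy stands in for LLVM's vector types (it is not the real FixedVectorType/ScalableVectorType API), and chooseLowering picks the bitcast-and-icmp form only when the lane count is known at compile time. For a <vscale x N x i1> type the packed integer would need vscale x N bits, which no fixed-width integer type can describe, so the sketch simply falls back to the reduction intrinsic.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for an LLVM vector type: MinLanes lanes, multiplied
// by the runtime vscale when Scalable is true.
struct VecTy {
  unsigned MinLanes;
  bool Scalable;
};

// Width of the integer that "bitcast <N x i1> to iN" would produce, or
// std::nullopt when that width is unknown at compile time.
std::optional<unsigned> bitcastIntWidth(const VecTy &Ty) {
  assert(Ty.MinLanes > 0 && "vectors have at least one lane");
  if (Ty.Scalable)
    return std::nullopt; // <vscale x N x i1>: lane count known only at run time
  return Ty.MinLanes;
}

enum class Lowering { BitcastAndICmp, ReduceIntrinsic };

// Models the requested "... && isa<FixedVectorType>(Src)" guard: only
// fixed-width vectors take the bitcast + icmp path.
Lowering chooseLowering(const VecTy &Src) {
  return bitcastIntWidth(Src) ? Lowering::BitcastAndICmp
                              : Lowering::ReduceIntrinsic;
}
```

Keeping the intrinsic as the scalable fallback leaves the target free to pick its own lowering (for SVE, per the comment above, a compare on <vscale x 1 x iN> is not wanted).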