Previously we used UADDO to generate a two-result value holding the
unsigned addition and the overflow mask. We then combined the
overflow mask with the trip count comparison to get the final result.
However, we don't need to do this: we can simply use a UADDSAT
saturating add node to add the vector index splat and the stepvector
together, then compare the result to a splat of the trip count.
This results in better overall code quality for both Thumb2 and AArch64.
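
As a rough illustration of why the two expansions agree, here is a scalar Python model of the per-lane arithmetic. It is only a sketch, not the SelectionDAG code from this patch: the function names, the `bits` parameter and the list-of-booleans representation of the mask are all assumptions made for the example.

```python
def uaddo(a, b, bits=32):
    """Model of an unsigned add with overflow: (wrapped sum, overflow flag)."""
    limit = (1 << bits) - 1
    total = a + b
    return total & limit, total > limit


def uaddsat(a, b, bits=32):
    """Model of an unsigned saturating add: clamps at the maximum value."""
    limit = (1 << bits) - 1
    return min(a + b, limit)


def lane_mask_uaddo(index, trip_count, lanes, bits=32):
    """Old scheme (hypothetical model): compare the wrapped sum against the
    trip count, then clear any lane whose add overflowed."""
    mask = []
    for step in range(lanes):  # step plays the role of the stepvector lane
        total, overflowed = uaddo(index, step, bits)
        mask.append((not overflowed) and total < trip_count)
    return mask


def lane_mask_uaddsat(index, trip_count, lanes, bits=32):
    """New scheme (hypothetical model): saturating add, then one compare.
    A saturated lane holds the maximum value, which is never below the
    trip count, so it is inactive without a separate overflow mask."""
    return [uaddsat(index, step, bits) < trip_count for step in range(lanes)]
```

The key observation is that an overflowing lane saturates to the largest representable value, and no trip count of the same width can be strictly greater than that, so the single comparison already yields false for that lane.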
Event Timeline
Seems OK from what I can tell (https://alive2.llvm.org/ce/z/C381E6). It is assuming that a usubsat is present, but the old code was assuming uadd.with.overflow. And we don't expect this to come up in a lot of situations, only unrolled vector loops and those tend to start at 0.