Previously we were using UADDO to generate a two-result value with
the unsigned addition and the overflow mask. We then combined the
overflow mask with the trip count comparison to get a result.
However, we don't need to do this - we can simply use a UADDSAT
saturating add node to add the vector index splat and the stepvector
together. Then we can just compare this to a splat of the trip count.
This results in overall better code quality for both Thumb2 and AArch64.
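
As a rough illustration of why the two lowerings agree, here is a scalar Python model of a single vector of lanes. It is only a sketch: the 8-bit width, the helper names, and the way the old overflow mask is combined with the compare (modelled here as "not overflowed AND sum < trip count") are assumptions made for the example, not taken from the patch itself.

```python
BITS = 8                      # assumed small width so checks can be exhaustive
MASK = (1 << BITS) - 1        # largest unsigned value at this width


def uaddo(a, b):
    """Unsigned add returning (wrapped sum, overflow flag)."""
    total = a + b
    return total & MASK, total > MASK


def uaddsat(a, b):
    """Unsigned saturating add: clamps to the maximum value on overflow."""
    return min(a + b, MASK)


def lane_mask_uaddo(index, trip_count, lanes):
    """Old approach (as modelled here): add with overflow, then mask out
    any lane whose addition overflowed before comparing to the trip count."""
    mask = []
    for step in range(lanes):          # step plays the role of the stepvector
        total, overflowed = uaddo(index, step)
        mask.append((not overflowed) and total < trip_count)
    return mask


def lane_mask_uaddsat(index, trip_count, lanes):
    """New approach: saturating add, then a single unsigned compare."""
    return [uaddsat(index, step) < trip_count for step in range(lanes)]
```

An overflowing lane saturates to the maximum unsigned value, which can never be strictly less than any trip count of the same width, so that lane comes out inactive without a separate overflow mask.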
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Seems OK from what I can tell (https://alive2.llvm.org/ce/z/C381E6). It assumes that a usubsat is present, but the old code assumed uadd.with.overflow. We also don't expect this to come up in many situations: only in unrolled vector loops, and those tend to start at 0.