This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Improve SelectionDAGBuilder lowering code for get.active.lane.mask intrinsic
Closed, Public

Authored by david-arm on Dec 8 2021, 8:26 AM.

Details

Summary

Previously we were using UADDO to generate a two-result value with
the unsigned addition and the overflow mask. We then combined the
overflow mask with the trip count comparison to get a result.
However, we don't need to do this - we can simply use a UADDSAT
saturating add node to add the vector index splat and the stepvector
together. Then we can just compare this to a splat of the trip count.
This results in overall better code quality for both Thumb2 and AArch64.
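
As a rough illustration of why the two lowerings agree, here is a scalar model of the per-lane computation. It is only a sketch, not the SelectionDAGBuilder code: the 8-bit element width, the lane count, and all function names are assumptions chosen for the example. The key point is that a saturated sum equals the maximum unsigned value, which can never be less than the trip count, so the separate overflow mask becomes unnecessary.

```python
# Scalar model of get.active.lane.mask lowering (illustrative only).
# BITS and every function name below are hypothetical, not LLVM APIs.
BITS = 8
MAX = (1 << BITS) - 1


def uaddo(a, b):
    """Unsigned add with overflow flag: returns (wrapped sum, overflowed)."""
    total = a + b
    return total & MAX, total > MAX


def uaddsat(a, b):
    """Unsigned saturating add: clamps at the maximum representable value."""
    return min(a + b, MAX)


def lane_mask_uaddo(base, trip_count, lanes):
    """Old scheme: compare the wrapped sum, then mask out overflowed lanes."""
    mask = []
    for lane in range(lanes):
        wrapped, overflowed = uaddo(base, lane)
        mask.append((not overflowed) and wrapped < trip_count)
    return mask


def lane_mask_uaddsat(base, trip_count, lanes):
    """New scheme: one saturating add followed by a single comparison."""
    return [uaddsat(base, lane) < trip_count for lane in range(lanes)]


def schemes_agree(lanes=4):
    """Exhaustively compare both schemes over all 8-bit inputs."""
    for base in range(MAX + 1):
        for trip_count in range(MAX + 1):
            old = lane_mask_uaddo(base, trip_count, lanes)
            new = lane_mask_uaddsat(base, trip_count, lanes)
            if old != new:
                return False
    return True
```

With a base index of 254 and a trip count of 255, lanes 2 and 3 would wrap around to 0 and 1 under a plain add; both schemes above mark them inactive.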

Event Timeline

david-arm created this revision. Dec 8 2021, 8:26 AM
david-arm requested review of this revision. Dec 8 2021, 8:26 AM
dmgreen accepted this revision. Dec 10 2021, 1:14 AM

Seems OK from what I can tell (https://alive2.llvm.org/ce/z/C381E6). It assumes that uaddsat is available, but the old code assumed uadd.with.overflow. We also don't expect this to come up in many situations: only in unrolled vector loops, and those tend to start at 0.

This revision is now accepted and ready to land. Dec 10 2021, 1:14 AM
This revision was landed with ongoing or failed builds. Dec 10 2021, 5:39 AM
This revision was automatically updated to reflect the committed changes.