This is an archive of the discontinued LLVM Phabricator instance.

Allow code generation of ARM usat/ssat instructions
Needs ReviewPublic

Authored by weimingz on Mar 16 2015, 6:15 PM.
This revision needs review, but all reviewers have resigned.

Details

Reviewers
apazos
Summary

Currently, LLVM is unable to emit usat/ssat for codes like: x = c > 255 ? 255 : (c < 0 ? 0 : c)

it generates:

%cmp = icmp sgt i32 %c, 255
%cmp1 = icmp slt i32 %c, 0
%cond = select i1 %cmp1, i32 0, i32 %c
%cond5 = select i1 %cmp, i32 255, i32 %cond
ret i32 %cond5

ASM:
cmp r0, #0
mov r1, r0
movwlt r1, #0
cmp r0, #255
movwgt r1, #255
mov r0, r1

We expect only one instruction:

usat	r0, #8, r0

This pass transforms comparisons and selections into ARM usat/ssat saturating intrinsic. I implemented as a IR level transformation instead of backend peephole because it's easier to matching and maybe shared by other targets if similar instructions are available.
Our testing shows up to 4% speedup for some benchmarks and no regressions.

Please help to review!
Thanks

Diff Detail

Event Timeline

weimingz updated this revision to Diff 22067.Mar 16 2015, 6:15 PM
weimingz retitled this revision from to Allow code generation of ARM usat/ssat instructions.
weimingz updated this object.
weimingz edited the test plan for this revision. (Show Details)
weimingz added reviewers: apazos, mcrosier.
weimingz added a subscriber: Unknown Object (MLST).
ab added a subscriber: ab.Mar 16 2015, 7:55 PM
ab added a comment.Mar 16 2015, 8:05 PM

I should note I'm sitting on patches to add generic saturation support; originally with dedicated intrinsics (there's a few months old RFC if you want to have a look), but now with SelectionDAG-level matching, with CodeGenPrepare's help. With a few other tweaks, this enables:

  • other target support (I gather you generate the ARM intrinsics, what that's unavailable elsewhere?),
  • vectorization (on say AArch64 or X86, there are only vector saturation instructions),
  • as well as DAG combines (stuff like X86 PACKSS, or add-with-saturation instructions, etc..).

Anyway: if you're interested, I can rebase and continue, and put the patch set up for review. In the meantime I'll try to have a closer look to this one.

Thanks for working on this!
-Ahmed

Hi Ahmed,

It's a good idea to have a generic sat intrinsic. It will capture the exact semantics of such operations and allows other transformations even for some targets that have no hardware SAT instructions.
For example, for AArch64, it will allow vectorizer to find opportunities for UQXTN/SQXTN.

In order to find as many patterns of clamp as possible, doing in DAG is too late in some cases because clamp is often used with other operations . For example: (clamp(x, 255) >> 2 ) << 5, EarlyCSE will hoist the shift, which prevents SimplifyCFG to convert the outer "if" to "SELECT". Besides, other passes will convert it to (x > 255 ? 2016, ...), which makes pattern matching very difficult.

Thanks,
Weiming

Hi Ahmed,

It seems a more generic approach to saturation would be a lot less complicated than this very long and repetitive list of conditionals. I may be wrong, but this pattern is normally target independent and can be done in generic IR, so it shouldn't be at all an ARM pass. Once you have usat and ssat (generic) intrinsics, then you can start pattern matching to select the best one, vectorize, etc.

Do you have your review somewhere public? A Phab review perhaps?

Weiming, maybe merging with the in progress patch would give you some ideas.

Makes sense?

cheers,
--renato

Outside of everything else in the review: please run clang-format over the code you want to add. Too many coding style issues to call them all out in the first pass :)

-eric

weimingz updated this revision to Diff 22107.Mar 17 2015, 11:19 AM

run clang-format

Hi Ahmed,

It seems a more generic approach to saturation would be a lot less complicated than this very long and repetitive list of conditionals. I may be wrong, but this pattern is normally target independent and can be done in generic IR, so it shouldn't be at all an ARM pass. Once you have usat and ssat (generic) intrinsics, then you can start pattern matching to select the best one, vectorize, etc.

Do you have your review somewhere public? A Phab review perhaps?

Weiming, maybe merging with the in progress patch would give you some ideas.

Makes sense?

cheers,
--renato

Is this the patch for generic sat? http://reviews.llvm.org/D6976
We can merge this first

jmolloy added a subscriber: jmolloy.Apr 1 2015, 6:52 AM

Hi Weiming,

I've been looking at clamp() too, specifically when we don't have the right bounds for a saturation, so we need to emit a sequence of min + max instructions.

I think your patch is a good basis for this, but please do bear in mind the other use case when designing your helper functions so it is possible to add this behaviour in the future!

Cheers,

James

Hi James,

Right. After Ahmed's generic saturation intrinsic patch is merged, let's revisit it.

Thanks,
Weiming

Hi Weiming,

I've been looking at clamp() too, specifically when we don't have the right bounds for a saturation, so we need to emit a sequence of min + max instructions.

I think your patch is a good basis for this, but please do bear in mind the other use case when designing your helper functions so it is possible to add this behaviour in the future!

Cheers,

James

apazos resigned from this revision.Feb 8 2019, 10:39 AM