Currently, LLVM is unable to emit usat/ssat for code like:
x = c > 255 ? 255 : (c < 0 ? 0 : c)
Instead, it generates the following IR:
%cmp = icmp sgt i32 %c, 255
%cmp1 = icmp slt i32 %c, 0
%cond = select i1 %cmp1, i32 0, i32 %c
%cond5 = select i1 %cmp, i32 255, i32 %cond
ret i32 %cond5
ASM:
cmp r0, #0
mov r1, r0
movwlt r1, #0
cmp r0, #255
movwgt r1, #255
mov r0, r1
We expect only one instruction:
usat r0, #8, r0
This pass transforms comparisons and selects into the ARM usat/ssat saturating intrinsics. I implemented it as an IR-level transformation instead of a backend peephole because the pattern is easier to match at that level, and the pass may be shared by other targets if they have similar instructions.
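
For reviewers who want to check the equivalence by hand, here is a minimal C sketch of the semantics involved. It is not code from the patch: clamp_u8, usat_model and check_equivalence are hypothetical names used only for illustration, and usat_model is a plain-C model of what "usat Rd, #bits, Rm" computes (saturate a signed 32-bit value to [0, 2^bits - 1]), assuming bits is in the range 0..31.

```c
#include <assert.h>
#include <stdint.h>

/* The source-level pattern from the example above: clamp c to [0, 255]. */
static int32_t clamp_u8(int32_t c) {
    return c > 255 ? 255 : (c < 0 ? 0 : c);
}

/* Hypothetical reference model of "usat Rd, #bits, Rm":
 * saturate a signed value to the unsigned range [0, 2^bits - 1].
 * Assumes 0 <= bits <= 31. */
static int32_t usat_model(int32_t x, unsigned bits) {
    int32_t max = (int32_t)((UINT32_C(1) << bits) - 1);
    if (x < 0)
        return 0;
    if (x > max)
        return max;
    return x;
}

/* Check that the select-based clamp and the usat model agree
 * on a range of inputs that covers both saturation boundaries. */
static void check_equivalence(void) {
    for (int32_t c = -1000; c <= 1000; ++c)
        assert(clamp_u8(c) == usat_model(c, 8));
}
```

The upper bound 255 equals 2^8 - 1, which is why the example maps to a saturation width of #8. A bound that is not of the form 2^n - 1 would presumably not be a candidate for usat.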
Our testing shows up to 4% speedup for some benchmarks and no regressions.
Please help to review!
Thanks