This is an archive of the discontinued LLVM Phabricator instance.

[Codegen] More add_sat and sub_sat promotion
AbandonedPublic

Authored by dmgreen on Oct 9 2019, 10:57 AM.

Details

Summary

As a continuation of D68643, the default promotion for saturation arithmetic can be further refined when MIN/MAX are known to be legal. All the test changes here are in uncommon types like a i4.

This uses isOperationLegal, as opposed to legal or custom because the X86 MINs/MAXs are custom lowered. Some of the MVE tests look larger because they are materialising constants (I imagine they might be pulled out of a loop).

Diff Detail

Event Timeline

dmgreen created this revision.Oct 9 2019, 10:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 9 2019, 10:57 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
dmgreen marked an inline comment as done.Oct 9 2019, 10:59 AM
dmgreen added inline comments.
llvm/test/CodeGen/X86/sadd_sat_vec.ll
496

If we always went though the min/max pair, this would look like:

+; SSE2-NEXT:    paddb %xmm1, %xmm0
+; SSE2-NEXT:    movdqa {{.*#+}} xmm1 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]
+; SSE2-NEXT:    movdqa %xmm1, %xmm2
+; SSE2-NEXT:    pcmpgtb %xmm0, %xmm2
+; SSE2-NEXT:    pand %xmm2, %xmm0
+; SSE2-NEXT:    pandn %xmm1, %xmm2
+; SSE2-NEXT:    por %xmm0, %xmm2
+; SSE2-NEXT:    movdqa {{.*#+}} xmm1 = [248,248,248,248,248,248,248,248,248,248,248,248,248,
+; SSE2-NEXT:    movdqa %xmm2, %xmm0
+; SSE2-NEXT:    pcmpgtb %xmm1, %xmm0
+; SSE2-NEXT:    pand %xmm0, %xmm2
+; SSE2-NEXT:    pandn %xmm1, %xmm0
+; SSE2-NEXT:    por %xmm2, %xmm0
+; SSE2-NEXT:    retq

Which is a little larger.

nikic accepted this revision.Oct 12 2019, 3:14 AM

LGTM

The signed cases are a bit of a mixed bag in isolation, but will probably do better inside a loop or with adjacent instructions.

This revision is now accepted and ready to land.Oct 12 2019, 3:14 AM
nikic added a comment.Oct 22 2019, 1:37 PM

This needs a rebase after D68926. Possibly some parts of it will no longer make sense (though the USUBSAT case at least should still be a win...)

dmgreen planned changes to this revision.Oct 23 2019, 2:57 AM

Yeah, sorry. I've also been adding selection from arm's qadd/qsub, but I think that will be OK to standard types. Let me try and update a few things and then rebase this to see how it still does. Like you said, unsigned might still look OK.

dmgreen updated this revision to Diff 231442.Nov 28 2019, 8:20 AM

Sorry for the delay. Now that I have added some extra Neon codegen (and some extra tests), this is how this looks.

The unsigned cases still look like a clear win. The signed cases looks worse (although you could argue in cases that this would simplify if part of a loop).

WDYT? Remove the signed part and keep the unsigned?

This revision is now accepted and ready to land.Nov 28 2019, 8:20 AM
xbolva00 added inline comments.
llvm/test/CodeGen/X86/ssub_sat_vec.ll
600

Regressions:(

nikic added a comment.Dec 5 2019, 8:53 AM

Sorry for the delay. Now that I have added some extra Neon codegen (and some extra tests), this is how this looks.

The unsigned cases still look like a clear win. The signed cases looks worse (although you could argue in cases that this would simplify if part of a loop).

WDYT? Remove the signed part and keep the unsigned?

Yeah, I think we'll want to keep just the unsigned part.

RKSimon requested changes to this revision.Feb 20 2020, 10:48 AM

rebase?

This revision now requires changes to proceed.Feb 20 2020, 10:48 AM

@dmgreen Whats the state of this? TargetLowering::expandAddSubSat can use MIN/MAX now so I'm not sure if we need this any more?

dmgreen abandoned this revision.Mar 17 2021, 2:49 AM

Hmm Yeah. It was a long time ago now, I think this wasn't a universal win and other changes have been made in the same area.