This is an archive of the discontinued LLVM Phabricator instance.

[ISel] Expand saddsat and ssubsat via asr and xor
ClosedPublic

Authored by dmgreen on Jul 12 2021, 3:52 PM.

Details

Summary

This changes the lowering of saddsat and ssubsat so that instead of using

r,o = saddo x, y
c = setcc r < 0
s = c ? INTMAX : INTMIN
ret o ? s : r

into using asr and xor to materialize the INTMAX/INTMIN constants:

r,o = saddo x, y
s = ashr r, BW-1
x = xor s, INTMIN
ret o ? x : r

https://alive2.llvm.org/ce/z/TYufgD

This seems to reduce the instruction count in most testcases across most architectures.

Diff Detail

Event Timeline

dmgreen created this revision.Jul 12 2021, 3:52 PM
dmgreen requested review of this revision.Jul 12 2021, 3:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2021, 3:52 PM
Herald added subscribers: MaskRay, wdng. · View Herald Transcript
dmgreen added inline comments.Jul 12 2021, 3:56 PM
llvm/test/CodeGen/X86/ssub_sat.ll
33 ↗(On Diff #358075)

Now that I look again, some of the individual X86 tests have increased, although the total size of the files has gone down.

What do we usually do about such things? I could just custom lower this in ARM/AArch64, but it appears to be better in general.

efriedma added inline comments.Jul 12 2021, 6:25 PM
llvm/test/CodeGen/AArch64/sadd_sat.ll
16

We could save two instructions here without really changing the algorithm if we use the carry flag from the adds, i.e. (UADDO LHS, RHS), instead of checking whether the result of the adds is negative. That's the same number of instructions as your new sequence, but with a shorter critical path, so probably better?

llvm/test/CodeGen/X86/ssub_sat.ll
33 ↗(On Diff #358075)

There are basically two options: use a target hook, or try to pick some reasonable default and write custom lowering for targets where the default is bad. There's no great answer here.

Note that even the existing x86 code isn't really ideal: if we replace the addl with leal, we can save an instruction by merging the cmpl with the subl.

dmgreen added inline comments.Jul 13 2021, 12:33 AM
llvm/test/CodeGen/AArch64/sadd_sat.ll
16

Nice idea, although I'm not sure how much it would matter in practice. The thumb1 versions were my main target, they won't have been vectorized like most other architectures.

llvm/test/CodeGen/X86/ssub_sat.ll
33 ↗(On Diff #358075)

The files were all smaller, I hadn't noticed this was quite a common case though.

foad added a subscriber: foad.Jul 13 2021, 4:12 AM

X86 already custom lowers a couple of add/subsat types so adding a couple of other cases would be fine. Annoyingly we don't have a vXi64 ashr until AVX512....

dmgreen updated this revision to Diff 366230.Aug 13 2021, 5:59 AM

After trying custom lowering a couple of times, I've decided I don't know the X86 architecture well enough to make good decisions about when it should be custom lowering, and what it should lower to.

Although it's a little ugly, this adds a target hook to pick between the two options and chooses the old method for X86 ssub.sat, where it appeared to be increasing instruction count with the ashr xor version. Does that sound OK to folks?

RKSimon added inline comments.Aug 16 2021, 11:34 AM
llvm/lib/Target/X86/X86ISelLowering.h
1036

I think we're better off avoiding this if we can - if you revert this change I'll add a X86 custom lowering and email it to you.

dmgreen added inline comments.Aug 18 2021, 2:52 AM
llvm/lib/Target/X86/X86ISelLowering.h
1036

It is quite ugly. I can take another looks at trying to custom lower it - there are just quite a few combinations.

dmgreen updated this revision to Diff 367237.Aug 18 2021, 10:18 AM

Remove the target hook and add some custom lowering for X86 scalar (Thanks to @RKSimon!)

I also altered the PromoteIntRes_ADDSUBSHLSAT from using isOperationLegalOrCustom to isOperationLegal to make sure this custom lowering didn't mess up illegal types - under the assumption that if you are custom lowering ssubsat it is unlikely to be more efficient than a min/max combo.

Some of the vector tests are still a bit larger, but not reliably and on average it seems better. Let me know if any should be specifically focused on.

The X86 vector changes look OK to me - we've reduced usage of variable blends (blendvps/blendvpd) which tends to be a particular bottleneck.

dmgreen updated this revision to Diff 367258.Aug 18 2021, 10:59 AM

Custom lower v2i64 too, under at least SSE4.1, which seems to be an improvement and reduces a lot of the remaining vector size increases. It does mean changing another isOperationLegalOrCustom -> isOperationLegal to prevent an infinite loop between custom lowering saddsat and saddo.

The X86 vector changes look OK to me - we've reduced usage of variable blends (blendvps/blendvpd) which tends to be a particular bottleneck.

OK thanks. Let me know whether to remove the v2i64 case or not.

The X86 vector changes look OK to me - we've reduced usage of variable blends (blendvps/blendvpd) which tends to be a particular bottleneck.

OK thanks. Let me know whether to remove the v2i64 case or not.

Your v2i64 change is fine.

RKSimon added inline comments.Aug 19 2021, 4:35 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
1144

Please can you move the SSE41 v2i64 lines here.

dmgreen updated this revision to Diff 367477.Aug 19 2021, 6:15 AM

Move X86 v2i64 setOperationAction's

RKSimon accepted this revision.Aug 19 2021, 6:21 AM

LGTM - cheers

This revision is now accepted and ready to land.Aug 19 2021, 6:21 AM
This revision was landed with ongoing or failed builds.Aug 19 2021, 8:08 AM
This revision was automatically updated to reflect the committed changes.