This is an archive of the discontinued LLVM Phabricator instance.

[x86] improve the default expansion of uaddsat/usubsat
ClosedPublic

Authored by spatel on Mar 5 2019, 4:12 PM.

Details

Summary

This is yet another step towards solving PR14613 (almost there!):
https://bugs.llvm.org/show_bug.cgi?id=14613

uaddsat X, Y --> (X >u (X + Y)) ? -1 : X + Y
usubsat X, Y --> (X >u Y) ? X - Y : 0

We can't count on a sane vector ISA, so override the default (umin/umax) expansion of unsigned add/sub saturate in cases where we do not have umin/umax.

There may be some small AVX1 opportunities still lurking, but I saw regressions if we allow those transforms wholesale, so stopping here to make sure things look right/better.

Diff Detail

Event Timeline

spatel created this revision.Mar 5 2019, 4:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2019, 4:12 PM
RKSimon added inline comments.Mar 6 2019, 9:05 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
36147

This only differs from the default expansion by the optimal CondCode to use in the select - ideally we'd have a way for TLI to indicate 'preferred' comparison codes - x86/sse is probably not alone in having limited comparisons (SGT + EQ) and the others having to be custom handled.

spatel planned changes to this revision.Mar 6 2019, 9:36 AM
spatel marked an inline comment as done.
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
36147

Yes, if we can thread the predicate needle, we can get the optimal x86 code by changing the generic expansion, rather than adding x86-specific combines.

Looking at this a bit closer: the key to making this generically better is realizing that this select shouldn't be a select if we have a vector 0/-1 mask created by the compare. In that case, we should only have a bitwise logic op (and/or), never a pblendv or pandn.

Unfortunately, it seems we're missing some generic and/or x86-specific min/max transforms to back that up, so I need to chase those down.

We may also be suffering from the fact that D58974 is not a generic combine. Let me know if I should deal with that one.

@nikic - I just want to confirm: are the addsat/subsat tests in trunk only for x86 right now? If so, I want to duplicate them for AArch64 at least, so we know that we're making things better for more than just x86.

nikic added inline comments.Mar 6 2019, 9:47 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
36147

I just want to confirm: are the addsat/subsat tests in trunk only for x86 right now? If so, I want to duplicate them for AArch64 at least, so we know that we're making things better for more than just x86.

Yes, right now we're testing codegen only for X86. I was planning to look into better AArch64 codegen soon. It's probably not a good target to test generic expansions, because iirc it has instructions covering the full set of legal vector types.

spatel updated this revision to Diff 191595.Mar 20 2019, 3:43 PM

Patch updated:
We improved the generic expansion slightly with D59066. That leaves customization for x86 which is required because umin/umax are custom lowered even if we don't actually have the instructions pmaxud/pmaxuq. That's not a generic lowering problem; that's an x86 problem.

In the earlier draft, I had made this a combine, but that seems pretty clearly wrong. We're just custom lowering a few specific vector types. Test changes look pretty close what we had before.

RKSimon accepted this revision.Mar 21 2019, 8:45 AM

LGTM - thanks!

llvm/lib/Target/X86/X86ISelLowering.cpp
23895

Move these down inside the "if (VT.is128BitVector())" loop ?

This revision is now accepted and ready to land.Mar 21 2019, 8:45 AM
nikic added inline comments.Mar 21 2019, 9:58 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
23895

Or also replace the usages in the i1 branch :)

23914

Instead of hardcoding specific types and subtargets, maybe check operation legality?

if (Op.getOpcode() == ISD::UADDSAT && !TLI.isOperationLegal(ISD::UMIN, VT)) {
    // ...
}
if (Op.getOpcode() == ISD::USUBSAT && !TLI.isOperationLegal(ISD::UMAX, VT)) {
    // ...
}
lebedev.ri accepted this revision.Mar 21 2019, 10:14 AM

This LG, but i'm not sure i understand how this is related to D59066?
Here, we are clearly end up with no select in ASM.
But in D59066 we expand to this pattern.
So there is something else that is able to do the transform that we do manually in D59066?
Should D59066 be doing something else to simply trigger the existing transform?

This LG, but i'm not sure i understand how this is related to D59066?
Here, we are clearly end up with no select in ASM.
But in D59066 we expand to this pattern.
So there is something else that is able to do the transform that we do manually in D59066?
Should D59066 be doing something else to simply trigger the existing transform?

This case should be caught by D59174 after the select has been expanded into bitwise logic. D59066 forces bitwise logic even if the select would not usually be expanded.

spatel marked 2 inline comments as done.Mar 21 2019, 11:15 AM

This LG, but i'm not sure i understand how this is related to D59066?
Here, we are clearly end up with no select in ASM.
But in D59066 we expand to this pattern.
So there is something else that is able to do the transform that we do manually in D59066?
Should D59066 be doing something else to simply trigger the existing transform?

Sorry - this sequence of patches got confusing.
I started here just trying to improve x86 codegen, but then we thought other targets might benefit from something similar.
So D59066 was initially a superset of this change along with the select improvement.
But then it became clear that the generic expansion is mostly as good as it could be - if you have a decent ISA, not SSE. :)
@nikic also added the select combine to make things generally better.
So, we removed the hacks for x86 from the other patch and hopefully made it clear in this patch that we are working around x86-specific potholes.

llvm/lib/Target/X86/X86ISelLowering.cpp
23895

Yes - that was the intent. I'll do that as a preliminary NFC.

23914

Yes, that does look less fragile.

This revision was automatically updated to reflect the committed changes.