Page MenuHomePhabricator

[DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111)
ClosedPublic

Authored by RKSimon on Nov 20 2020, 9:14 AM.

Details

Summary

Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds.

The SSE42 test diffs are relatively benign - its avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away.

Diff Detail

Event Timeline

RKSimon created this revision.Nov 20 2020, 9:14 AM
RKSimon requested review of this revision.Nov 20 2020, 9:14 AM

Why do we have v4i32 UADDSAT marked as Custom with SSE2, but then fail out of the LowerADDSAT_SUBSAT is UMIN isn't Legal, I think it needs SSE4.1?

Why do we have v4i32 UADDSAT marked as Custom with SSE2, but then fail out of the LowerADDSAT_SUBSAT is UMIN isn't Legal, I think it needs SSE4.1?

Because we tag UMIN/UMAX as custom on SSE2 targets so the legalizer will try to use them - I'll see if we can remove that as well.

Why do we have v4i32 UADDSAT marked as Custom with SSE2, but then fail out of the LowerADDSAT_SUBSAT is UMIN isn't Legal, I think it needs SSE4.1?

Because we tag UMIN/UMAX as custom on SSE2 targets so the legalizer will try to use them - I'll see if we can remove that as well.

I misread the code in LowerADDSAT_SUBSAT when I asked that question. We use the default expansion if UMIN/UMAX is legal and custom otherwise.

RKSimon updated this revision to Diff 308067.Nov 27 2020, 8:40 AM
RKSimon edited the summary of this revision. (Show Details)

Only expand UADDSAT using UMIN if its legal (+ the same for USUBSAT with UMAX). Move the 4i32/2i64 UADDSAT custom lowering to SSE41+

RKSimon updated this revision to Diff 308302.Nov 30 2020, 3:07 AM
RKSimon edited the summary of this revision. (Show Details)

Restrict ISD::UADDSAT custom lowering to SSE42+ targets - without PCMPGT+BLENDV codegen isn't great.

X86 changes LGTM

RKSimon added a subscriber: dmgreen.

@nemanjai @efriedma @dmgreen Are you happy with the patch?

dmgreen accepted this revision.Nov 30 2020, 2:57 PM

Interestingly, this seems to be detecting a different set of patterns than those that would be converted by instcombine, with unsigned_sat_constant_v16i8_using_cmp_sum and unsigned_sat_constant_v16i8_using_cmp_notval getting converted here but not in instcombine and unsigned_sat_constant_v16i8_using_min being the other way around. The ones I looked into checked out as being valid to transform (and the code was already being used is always a good sign). It might be worth looking if any of these tricks are missing from instcombine. This patch looks like an improvement on it's own though.

LGTM, Thanks.

This revision is now accepted and ready to land.Nov 30 2020, 2:57 PM
pengfei added inline comments.Nov 30 2020, 7:31 PM
llvm/test/CodeGen/X86/sat-add.ll
726–727

Nit: There seems an extra mov.

RKSimon added inline comments.Dec 1 2020, 3:34 AM
llvm/test/CodeGen/X86/sat-add.ll
726–727

Yeah - its particularly annoying as POR is commutable!

This revision was landed with ongoing or failed builds.Dec 1 2020, 3:59 AM
This revision was automatically updated to reflect the committed changes.