This is an archive of the discontinued LLVM Phabricator instance.

[X86] Use min/max for vector ult/ugt compares if avoids a sign flip.
ClosedPublic

Authored by craig.topper on Feb 5 2018, 4:12 PM.

Details

Summary

Currently we only use min/max to help with ule/uge compares because it removes the invert of the result that would otherwise be needed. But we can also use it for ult/ugt compares if it avoids the sign-bit flip needed to use pcmpgt, at the cost of requiring an invert after the compare.

I also refactored the code so that the max/min code is self-contained and does its own return instead of setting a flag to manipulate the rest of the function's behavior.

Most of the test cases look OK with this. I did notice that we add instructions when the operand being sign-flipped is a constant vector, since previously we were able to constant-fold the flip into it.

I also noticed that sometimes the SSE min/max clobbers a register that is needed after the compare. This results in an extra move being inserted before the min/max to preserve the register. We could try to detect this, switch from min to max, and change the compare to use the operand that gets reused.

Diff Detail

Event Timeline

craig.topper created this revision.Feb 5 2018, 4:12 PM
craig.topper edited the summary of this revision. (Show Details)Feb 5 2018, 4:14 PM

I've been hitting some of the same issues while finishing the TRUNC(CLAMP()) truncation -> PACK support. Would it be a good idea to enable custom lowering of all (U,S)(MAX,MIN) on SSE2+ targets? It cleans out a lot of CMPGT/CMPGE mismatches and makes it easier to match a lot of this stuff using the SMIN etc. opcodes directly.

Maybe it's too early in the morning, but I'm not sure I followed. A lot of these cases aren't min/max, so how does custom handling help?

> Maybe it's too early in the morning, but I'm not sure I followed. A lot of these cases aren't min/max, so how does custom handling help?

No, it might be me - I'm starting to see phantom minmax patterns everywhere....

But that test_ult_byte regression is annoying.

lib/Target/X86/X86ISelLowering.cpp
18004

Use isOperationLegal instead? I think that'd add VLX v2i64/v4i64 support?

test/CodeGen/X86/vec_setcc-2.ll
150

Interesting that this didn't commute and fold the load? You'd need a MOVDQArr instead I guess.

craig.topper added inline comments.Feb 7 2018, 12:40 PM
lib/Target/X86/X86ISelLowering.cpp
18004

hasAVX512() && VET == MVT::i64 should cover VLX as well since AVX512 is a subset of VLX.

But I agree isOperationLegal would be cleaner.

Use isOperationLegal

RKSimon accepted this revision.Feb 11 2018, 8:10 AM

LGTM, with a couple of minor thoughts.

Having looked through, I think the PSUBUS regressions can be blamed on PR31293 and shouldn't block this patch.

test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
16301

Is this necessary? Is there any way that ComputeKnownBits/NumSignBits + SimplifyDemandedBits could be used to remove it?

This revision is now accepted and ready to land.Feb 11 2018, 8:10 AM
This revision was automatically updated to reflect the committed changes.
craig.topper added inline comments.Feb 11 2018, 9:17 AM
test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
16301

I believe it's getting tripped up because the vpternlog is a (v16i8 (bitcast (xor (bitcast v2i64)))) after legalize. I don't think computeNumSignBits can cross both of those bitcasts very well.

Probably doesn't help that LegalizeVectorOps visits nodes bottom up, so the compare is legalized first, it creates the xor, and the xor gets legalized with the bitcasts. Then the truncate gets legalized, but it's too late.

Meanwhile LegalizeDAG visits nodes top down so would probably be able to handle this case since the truncate would legalize first.