Page MenuHomePhabricator

[SelectionDAG] Remove an early-out from computeKnownBits for smin/smax
Needs ReviewPublic

Authored by foad on Sep 4 2020, 8:24 AM.

Details

Summary

Even if we know nothing about LHS, it can still be useful to know that
smax(LHS, RHS) >= RHS and smin(LHS, RHS) <= RHS.

Diff Detail

Event Timeline

foad created this revision.Sep 4 2020, 8:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 4 2020, 8:24 AM
foad requested review of this revision.Sep 4 2020, 8:24 AM
foad added a comment.Sep 4 2020, 8:26 AM

N.B without D87034, this change wouldn't affect any codegen tests. So this is one case where the improved known bits analysis actually makes a difference.

RKSimon added a subscriber: craig.topper.
RKSimon added inline comments.
llvm/test/CodeGen/X86/avx512-trunc.ll
1020

Is this testing what it means to? I can't remember offhand what the test is for - @craig.topper any ideas?

llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
5199

regression?

6420

regression?

llvm/test/CodeGen/X86/vector-trunc-usat.ll
4266

we appear to have 3 constant loads now instead of 2

craig.topper added inline comments.Sep 7 2020, 6:22 PM
llvm/test/CodeGen/X86/avx512-trunc.ll
1020

Not sure either. InstCombine simplifies it to a store of all ones.

foad added inline comments.Sep 8 2020, 2:48 AM
llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
5199

Wel, yes... It has spotted that the result of the pminsw is always negative, so rather than XOR with 0x8000 to flip (i.e. clear) the sign bit, it can AND with 0x7fff to clear the sign bit. But unfortunately that means materialising another constant.

I don't know where this XOR -> AND "optimization" happens, or whether it can be finessed.

The other regressions you pointed out below are basically the same issue.

foad added inline comments.Sep 10 2020, 5:45 AM
llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
5199

I don't know where this XOR -> AND "optimization" happens, or whether it can be finessed.

We have done this basically forever, in the XOR case in TargetLowering::SimplifyDemandedBits:

// If one side is a constant, and all of the known set bits on the other
// side are also set in the constant, turn this into an AND, as we know
// the bits will be cleared.
//    e.g. (X | C1) ^ C2 --> (X | C1) & ~C2 iff (C1&C2) == C2

It seems to me that this is just bad luck, that we transform X^0x8000 into X&0x7FFF, but that happens to regress code quality because now we can't share with another use of the constant 0x8000. Is there any systematic way of fixing this, e.g. by doing the reverse transformation once we know what constants are already available in registers? Or can I commit this patch even with a known regression like this? After all, I'm sure if I looked hard enough I could find another test that got better by luck instead of worse.

foad added inline comments.Sep 10 2020, 9:25 AM
llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
5199

If you buy the argument that SimplifyDemandedBits replacing XOR -> AND is not an optimization and is unhelpful in this case, D87464 + D87465 is my attempt at fixing that. If I rebase this patch on those two then all the bad diffs go away, leaving only the good diffs in avx512-trunc.ll.

RKSimon added inline comments.
llvm/test/CodeGen/X86/avx512-trunc.ll
1020

The test was added as part of D45315 - @ArturGainullin hasn't been an active for some time afaict.

I think we might be able to get away with adjusting the -1 splat to be a non-uniform mix of -ve constant values.