This is an archive of the discontinued LLVM Phabricator instance.

Differential D87145

[SelectionDAG] Remove an early-out from computeKnownBits for smin/smax
ClosedPublic

Authored by foad on Sep 4 2020, 8:24 AM.

Details

Reviewers

nikic
RKSimon
craig.topper
pengfei
yubing

Commits

rG868da2ea939b: [SelectionDAG] Remove an early-out from computeKnownBits for smin/smax

Summary

Even if we know nothing about LHS, it can still be useful to know that
smax(LHS, RHS) >= RHS and smin(LHS, RHS) <= RHS.
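
The claim in the summary can be checked by brute force on a narrow type. The following is a minimal sketch in standalone C++. It does not use LLVM's KnownBits class; `BruteKnownBits` and `knownBitsOfMinMax` are hypothetical names introduced only for illustration. It enumerates every 8-bit LHS and records which bits of the result never change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not LLVM's KnownBits): bits that are identical
// in every possible result.
struct BruteKnownBits {
  uint8_t Zero; // bits that are 0 in every result
  uint8_t One;  // bits that are 1 in every result
};

// Treat LHS as completely unknown: try all 256 values and keep only the bits
// of smax/smin(LHS, RHS) that are the same in every case.
inline BruteKnownBits knownBitsOfMinMax(int8_t RHS, bool IsMax) {
  uint8_t Zero = 0xFF, One = 0xFF;
  for (int V = -128; V <= 127; ++V) {
    int8_t LHS = static_cast<int8_t>(V);
    int8_t Res = IsMax ? std::max(LHS, RHS) : std::min(LHS, RHS);
    uint8_t Bits = static_cast<uint8_t>(Res);
    Zero &= static_cast<uint8_t>(~Bits);
    One &= Bits;
  }
  return {Zero, One};
}

// Small usage example for the two cases named in the summary.
inline void checkSummaryClaim() {
  // smax(LHS, 64) >= 64, so the result is non-negative.
  assert(knownBitsOfMinMax(64, /*IsMax=*/true).Zero == 0x80);
  // smin(LHS, -1) <= -1, so the result is negative.
  assert(knownBitsOfMinMax(-1, /*IsMax=*/false).One == 0x80);
}
```

For RHS = 64, every smax result lies in [64, 127], so the sign bit is known zero and bit 6 is known one even though nothing is known about LHS. For RHS = -1, smax yields no known bits at all, while smin pins the sign bit to one.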

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision.Sep 4 2020, 8:24 AM

Herald added a project: Restricted Project.Sep 4 2020, 8:24 AM

Herald added subscribers: llvm-commits, ecnelises, hiraditya.

foad requested review of this revision.Sep 4 2020, 8:24 AM

foad added a parent revision: D87034: [KnownBits] Implement accurate unsigned and signed max and min.Sep 4 2020, 8:25 AM

Harbormaster completed remote builds in B70662: Diff 289963.Sep 4 2020, 8:25 AM

N.B. Without D87034, this change wouldn't affect any codegen tests, so this is one case where the improved known bits analysis actually makes a difference.

RKSimon added a reviewer: craig.topper.Sep 7 2020, 11:18 AM

RKSimon added a subscriber: craig.topper.

RKSimon added inline comments.

llvm/test/CodeGen/X86/avx512-trunc.ll
1020 ↗	(On Diff #289963)	Is this testing what it means to? I can't remember offhand what the test is for - @craig.topper any ideas?
llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
5199 ↗	(On Diff #289963)	regression?
6420 ↗	(On Diff #289963)	regression?
llvm/test/CodeGen/X86/vector-trunc-usat.ll
4266 ↗	(On Diff #289963)	we appear to have 3 constant loads now instead of 2

craig.topper added inline comments.Sep 7 2020, 6:22 PM

llvm/test/CodeGen/X86/avx512-trunc.ll
1020 ↗	(On Diff #289963)	Not sure either. InstCombine simplifies it to a store of all ones.

foad added inline comments.Sep 8 2020, 2:48 AM

llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
5199 ↗	(On Diff #289963)	Well, yes... It has spotted that the result of the pminsw is always negative, so rather than XOR with 0x8000 to flip (i.e. clear) the sign bit, it can AND with 0x7fff to clear the sign bit. But unfortunately that means materialising another constant. I don't know where this XOR -> AND "optimization" happens, or whether it can be finessed. The other regressions you pointed out below are basically the same issue.

foad added inline comments.Sep 10 2020, 5:45 AM

llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
5199 ↗	(On Diff #289963)	> I don't know where this XOR -> AND "optimization" happens, or whether it can be finessed.

We have done this basically forever, in the XOR case in TargetLowering::SimplifyDemandedBits:

    // If one side is a constant, and all of the known set bits on the other
    // side are also set in the constant, turn this into an AND, as we know
    // the bits will be cleared.
    //    e.g. (X | C1) ^ C2 --> (X | C1) & ~C2 iff (C1&C2) == C2

It seems to me that this is just bad luck: we transform X^0x8000 into X&0x7FFF, but that happens to regress code quality because now we can't share with another use of the constant 0x8000.

Is there any systematic way of fixing this, e.g. by doing the reverse transformation once we know what constants are already available in registers? Or can I commit this patch even with a known regression like this? After all, I'm sure if I looked hard enough I could find another test that got better by luck instead of worse.
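
The equivalence behind the XOR -> AND transform discussed in this comment can be checked exhaustively for 16-bit lanes. This is a minimal sketch in standalone C++, not LLVM code: `xorMatchesAndNot` is a hypothetical helper, and its `KnownOne` parameter plays the role of C1 in the quoted source comment.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if X ^ C and X & ~C agree for every 16-bit X whose bits in
// KnownOne are set (i.e. every value of the form X | KnownOne).
inline bool xorMatchesAndNot(uint16_t KnownOne, uint16_t C) {
  for (uint32_t V = 0; V <= 0xFFFF; ++V) {
    uint16_t X = static_cast<uint16_t>(V | KnownOne);
    uint16_t ViaXor = static_cast<uint16_t>(X ^ C);
    uint16_t ViaAnd = static_cast<uint16_t>(X & ~C);
    if (ViaXor != ViaAnd)
      return false;
  }
  return true;
}

// Usage example: the sign-bit case from the review discussion.
inline void checkSignBitCase() {
  // Sign bit known set: X ^ 0x8000 is the same as X & 0x7FFF.
  assert(xorMatchesAndNot(0x8000, 0x8000));
  // Nothing known about X: the two forms differ (for example at X = 0).
  assert(!xorMatchesAndNot(0x0000, 0x8000));
}
```

With the sign bit known set, both forms produce the same value, so the rewrite is legal. Whether it is profitable is a separate question: the AND form needs the constant 0x7FFF, which is the extra constant load seen in the regressed tests.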

foad added inline comments.Sep 10 2020, 9:25 AM

llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
5199 ↗	(On Diff #289963)	If you buy the argument that SimplifyDemandedBits replacing XOR -> AND is not an optimization and is unhelpful in this case, D87464 + D87465 is my attempt at fixing that. If I rebase this patch on those two then all the bad diffs go away, leaving only the good diffs in `avx512-trunc.ll`.

RKSimon added a subscriber: ArturGainullin.Sep 10 2020, 12:45 PM

RKSimon added inline comments.

llvm/test/CodeGen/X86/avx512-trunc.ll
1020 ↗	(On Diff #289963)	The test was added as part of D45315 - @ArturGainullin hasn't been active for some time afaict. I think we might be able to get away with adjusting the -1 splat to be a non-uniform mix of -ve constant values.

reverse ping?

Herald added a subscriber: pengfei.Dec 15 2020, 9:42 AM

yubing added a subscriber: yubing.Dec 16 2020, 4:54 AM

RKSimon added reviewers: pengfei, yubing.Jan 2 2021, 7:39 AM

RKSimon added inline comments.

llvm/test/CodeGen/X86/avx512-trunc.ll
1020 ↗	(On Diff #289963)	Any recommendations on what to do with these tests?

foad mentioned this in D94693: Improve KnownBits analyses for SMIN/SMAX DAG nodes..Jan 14 2021, 8:08 AM

aymanmus added a subscriber: aymanmus.Jan 14 2021, 8:32 AM

Rebase. D87236 seems to have fixed the code quality regressions.

Harbormaster completed remote builds in B85180: Diff 316673.Jan 14 2021, 8:44 AM

RKSimon mentioned this in rGb99782cf7850: [X86][AVX] Adjust unsigned saturation downconvert negative test.Jan 14 2021, 9:57 AM

@foad Please can you rebase? I think I've replaced the dodgy test with something useful now

Rebase.

LGTM - cheers!

This revision is now accepted and ready to land.Jan 14 2021, 10:09 AM

This revision was landed with ongoing or failed builds.Jan 14 2021, 10:15 AM

Closed by commit rG868da2ea939b: [SelectionDAG] Remove an early-out from computeKnownBits for smin/smax (authored by foad).

This revision was automatically updated to reflect the committed changes.

foad added a commit: rG868da2ea939b: [SelectionDAG] Remove an early-out from computeKnownBits for smin/smax.

Harbormaster completed remote builds in B85196: Diff 316697.Jan 14 2021, 10:16 AM

foad mentioned this in D87465: [TargetLowering] Change SimplifyDemandedBits for XOR.Mar 15 2021, 4:36 AM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp	1 line
llvm/test/CodeGen/X86/known-bits-vector.ll	12 lines

Diff 316699

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

[3,410 lines not shown; enclosing context: if (CstLow && CstHigh) {]
         if (ValueLow.isNonNegative() && ValueHigh.isNonNegative()) {
           Known.Zero.setHighBits(MinSignBits);
           break;
         }
       }
     }

     Known = computeKnownBits(Op.getOperand(0), DemandedElts, Depth + 1);
-    if (Known.isUnknown()) break; // Early-out
     Known2 = computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1);
     if (IsMax)
       Known = KnownBits::smax(Known, Known2);
     else
       Known = KnownBits::smin(Known, Known2);
     break;
   }
   case ISD::FrameIndex:
[6,798 lines not shown]

llvm/test/CodeGen/X86/known-bits-vector.ll

[429 lines not shown]
 }

 define <4 x float> @knownbits_smax_smin_shuffle_uitofp(<4 x i32> %a0) {
 ; X32-LABEL: knownbits_smax_smin_shuffle_uitofp:
 ; X32: # %bb.0:
 ; X32-NEXT: vpminsd {{\.LCPI.*}}, %xmm0, %xmm0
 ; X32-NEXT: vpmaxsd {{\.LCPI.*}}, %xmm0, %xmm0
 ; X32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,3,3]
-; X32-NEXT: vpblendw {{.*#+}} xmm1 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X32-NEXT: vpsrld $16, %xmm0, %xmm0
-; X32-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X32-NEXT: vsubps {{\.LCPI.*}}, %xmm0, %xmm0
-; X32-NEXT: vaddps %xmm0, %xmm1, %xmm0
+; X32-NEXT: vcvtdq2ps %xmm0, %xmm0
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: knownbits_smax_smin_shuffle_uitofp:
 ; X64: # %bb.0:
 ; X64-NEXT: vpminsd {{.*}}(%rip), %xmm0, %xmm0
 ; X64-NEXT: vpmaxsd {{.*}}(%rip), %xmm0, %xmm0
 ; X64-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,3,3]
-; X64-NEXT: vpblendw {{.*#+}} xmm1 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X64-NEXT: vpsrld $16, %xmm0, %xmm0
-; X64-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X64-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
-; X64-NEXT: vaddps %xmm0, %xmm1, %xmm0
+; X64-NEXT: vcvtdq2ps %xmm0, %xmm0
 ; X64-NEXT: retq
   %1 = call <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32> %a0, <4 x i32> <i32 0, i32 -65535, i32 -65535, i32 0>)
   %2 = call <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32> %1, <4 x i32> <i32 65535, i32 -1, i32 -1, i32 131071>)
   %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 3, i32 3>
   %4 = uitofp <4 x i32> %3 to <4 x float>
   ret <4 x float> %4
 }
 declare <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32>, <4 x i32>) nounwind readnone
[226 lines not shown]
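
One reading of the test change in known-bits-vector.ll: the shuffle keeps only lanes 0 and 3, and in those lanes the pmaxsd constants are 65535 and 131071, so the result is bounded below by a non-negative value and its sign bit is known zero. For such values an unsigned-to-float conversion gives the same answer as a signed one, which is presumably why a single vcvtdq2ps replaces the longer sequence. A minimal standalone C++ sketch of that equivalence follows; the helper names are hypothetical and the code only illustrates the arithmetic, not the DAG combine itself.

```cpp
#include <cassert>
#include <cstdint>

// Convert a 32-bit pattern to float, reading it as an unsigned integer.
inline float convertUnsigned(uint32_t Bits) {
  return static_cast<float>(Bits);
}

// Convert the same 32-bit pattern to float, reading it as a signed integer.
// Assumes two's complement wraparound for values above INT32_MAX
// (guaranteed from C++20, and the behaviour of mainstream compilers before).
inline float convertSigned(uint32_t Bits) {
  return static_cast<float>(static_cast<int32_t>(Bits));
}

// True when both interpretations convert to the same float.
inline bool conversionsAgree(uint32_t Bits) {
  return convertUnsigned(Bits) == convertSigned(Bits);
}

// Usage example with the lane 0 and lane 3 lower bounds from the test.
inline void checkClampedLanes() {
  assert(conversionsAgree(65535u));
  assert(conversionsAgree(131071u));
  // With the sign bit set, the two conversions disagree.
  assert(!conversionsAgree(0x80000000u));
}
```

The two conversions only diverge once the sign bit can be set, so proving the sign bit zero is what allows the cheaper signed conversion to be used.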
