llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4944	N0->hasOneUse() would probably be better, to check the Node has one use not the SDValue. (Although here it likely won't make much difference.)
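The node-versus-value distinction in this comment can be sketched with a toy model. This is a minimal illustration, not LLVM code: `ToyNode`, `valueHasOneUse`, and `makeTwoResultNode` are hypothetical names, and the assumption is that a value-level query counts uses of one result while a node-level query counts uses of every result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a DAG node with several results (hypothetical type,
// not LLVM's SDNode).
struct ToyNode {
  // One entry per use; each entry records which result number is used.
  std::vector<unsigned> useResNos;

  // Node-level query: exactly one use across all results.
  bool hasOneUse() const { return useResNos.size() == 1; }

  // Value-level query: exactly one use of the result numbered resNo.
  bool valueHasOneUse(unsigned resNo) const {
    std::size_t count = 0;
    for (unsigned r : useResNos)
      if (r == resNo)
        ++count;
    return count == 1;
  }
};

// A node whose result 0 and result 1 are each used exactly once.
inline ToyNode makeTwoResultNode() {
  ToyNode n;
  n.useResNos = {0u, 1u};
  return n;
}
```

For a node with a single result, such as an add, both queries give the same answer, which fits the remark that the choice likely makes little difference here.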
llvm/test/CodeGen/AArch64/arm64-srl-and.ll
2	This needs a triple, and likely doesn't need a -mcpu.
11	Has this deleted some of the check lines?

lebedev.ri added inline comments. Aug 7 2021, 6:01 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4944	Why does it matter for the correctness whether the node has other uses or not? The peephole should only affect the current root use we started with, and not affect any other uses whatsoever.

craig.topper added a subscriber: craig.topper. Aug 7 2021, 7:35 AM

craig.topper added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4944	There is a CombineTo call on N0. That will affect all users of N0.

lebedev.ri added inline comments. Aug 7 2021, 7:41 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

4944	Aha, that indeed explains it, and should have been stated in the patch's description, thanks.
But why is it there?
Sounds like this transform should instead be:

if (TLI.isLegalAddImmediate(ADDC.getSExtValue())) {
  SDLoc DL0(N0);
  SDValue NewAdd =
    DAG.getNode(ISD::ADD, DL0, VT,
                N0.getOperand(0), DAG.getConstant(ADDC, DL, VT));
  return DAG.getNode(ISD::AND, DL0, VT, NewAdd, N1);
}

In D107692#2932650, @lebedev.ri wrote:

Please upload all patches with full context. (-U99999)

I do not understand how one-use check here is acting as a correctness check

Sorry for the late reply, but I think you already know: node t46 has multiple uses, so rewriting add t45, 65535 as add t45, -1 (that is, a sub by 1) is not safe in this case.

t64: i32 = srl t46, Constant:i64<16>

  t14: i32 = and t46, t64
t46: i32 = add t45, Constant:i32<65535>
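The problem with the multi-use node can be reproduced with plain 32-bit arithmetic. The sketch below models the three DAG nodes above; the function names are made up for illustration, and `replacedForAllUsers` assumes that the widened add replaces t46 for every user, including the srl.

```cpp
#include <cassert>
#include <cstdint>

// Original DAG: t46 = add t45, 65535; t64 = srl t46, 16; t14 = and t46, t64.
inline uint32_t originalDag(uint32_t t45) {
  uint32_t t46 = t45 + 65535u;
  uint32_t t64 = t46 >> 16;
  return t46 & t64;
}

// Model of replacing t46 for all of its users: the srl now also sees
// add t45, -1, although it depends on the high bits of the sum.
inline uint32_t replacedForAllUsers(uint32_t t45) {
  uint32_t t46 = t45 + 0xFFFFFFFFu; // add t45, -1
  uint32_t t64 = t46 >> 16;
  return t46 & t64;
}

// Returns the first 16-bit input on which the two versions disagree,
// or -1 if they agree on all of them.
inline int firstMismatch() {
  for (uint32_t x = 0; x <= 0xFFFFu; ++x)
    if (originalDag(x) != replacedForAllUsers(x))
      return static_cast<int>(x);
  return -1;
}
```

The two versions already disagree for an input of 0, so the replacement is not a valid rewrite once the add feeds the shift as well.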

llvm/test/CodeGen/AArch64/arm64-srl-and.ll
11	Yes, the other check lines are not central to this issue.

Allen updated this revision to Diff 365067. Aug 8 2021, 7:54 PM

Harbormaster completed remote builds in B118593: Diff 365067. Aug 8 2021, 8:34 PM

The transform that we want to make can be shown with this example (and so this test should be added somewhere as a preliminary commit):

define i32 @src(i32 %x, i32 %y) {
  %add = add i32 %x, 65535 ; --> turn this into -1 because that can be encoded as an immediate operand
  %srl = lshr i32 %y, 16
  %r = and i32 %srl, %add
  ret i32 %r
}

https://alive2.llvm.org/ce/z/Ehadpq
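As a rough sanity check next to the Alive2 proof, the example can be modelled in plain C++. This is only a spot check on pseudo-random inputs, not a proof; `widenAddConstant`, `src`, `tgt`, and `agreeOnSamples` are illustrative names, and the helper assumes a 32-bit type with a shift amount between 1 and 31.

```cpp
#include <cassert>
#include <cstdint>

// Set the top `shift` bits of a 32-bit add constant (assumes 0 < shift < 32).
inline uint32_t widenAddConstant(uint32_t c1, unsigned shift) {
  uint32_t highMask = ~(0xFFFFFFFFu >> shift);
  return c1 | highMask;
}

// The example as written: and (lshr y, 16), (add x, 65535).
inline uint32_t src(uint32_t x, uint32_t y) {
  return (y >> 16) & (x + 65535u);
}

// Same computation with the widened constant, i.e. add x, -1.
inline uint32_t tgt(uint32_t x, uint32_t y) {
  return (y >> 16) & (x + widenAddConstant(65535u, 16));
}

// Compare both versions on inputs drawn from a simple LCG.
inline bool agreeOnSamples() {
  uint32_t s = 0x12345678u;
  for (int i = 0; i < 100000; ++i) {
    s = s * 1664525u + 1013904223u;
    uint32_t x = s;
    s = s * 1664525u + 1013904223u;
    uint32_t y = s;
    if (src(x, y) != tgt(x, y))
      return false;
  }
  return true;
}
```

The two versions agree because the shifted operand has its top 16 bits clear, so the and only observes the low 16 bits of the sum, and those depend only on the low 16 bits of the constant.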

For AArch64 and PowerPC and possibly all targets (x86 would benefit too), we want this to use a -1 (or sub 1), not an add 65535:

sub w8, w0, #1
and w0, w8, w1, lsr #16

vs.

mov w8, #65535
add w8, w0, w8
and w0, w8, w1, lsr #16

But if you try this test with main today, it doesn't work:
https://godbolt.org/z/fe7h1zYv6

...because this transform has another bug: it doesn't match the commuted variant.

Over in PR51321, I mentioned that I could delete the whole transform and not see any test changes. That's because SimplifyDemandedBits does the same transform. But it has a similar problem - it only catches the case where it processes the shift before it sees the add. So it misses the same commuted pattern. I'm not sure how we'd fix that in DemandedBits; it seems like a generic ordering problem for commutative ops.

So this transform has at least 3 problems (assuming we want to keep it instead of enhancing DemandedBits):

1. It miscompiles the multi-use case because it is coded weirdly and missed a constraint.
2. It misses the commuted case.
3. It misses what is likely profitable for all targets by overspecifying the legality constraints.
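The commuted-case problem can be illustrated with a toy matcher that tries both operand orders of the and. This is a sketch with hypothetical types (`Expr`, `Op`) and function names; it is not how DAGCombiner is structured, it only shows the idea of checking the swapped order before giving up.

```cpp
#include <cassert>

enum class Op { Add, Srl, And, Other };

// Toy expression node (hypothetical, not an LLVM type).
struct Expr {
  Op op;
  const Expr *lhs;
  const Expr *rhs;
};

// Match the operands in one fixed order: first an add, then a srl.
inline bool matchOrdered(const Expr *a, const Expr *b) {
  return a && b && a->op == Op::Add && b->op == Op::Srl;
}

// Match (and (add ...), (srl ...)) with the operands in either order.
// On success, `add` and `srl` point at the matched operands.
inline bool matchAndOfAddSrl(const Expr &andNode, const Expr *&add,
                             const Expr *&srl) {
  if (andNode.op != Op::And)
    return false;
  if (matchOrdered(andNode.lhs, andNode.rhs)) {
    add = andNode.lhs;
    srl = andNode.rhs;
    return true;
  }
  if (matchOrdered(andNode.rhs, andNode.lhs)) {
    add = andNode.rhs;
    srl = andNode.lhs;
    return true;
  }
  return false;
}
```

A matcher that only performs the first `matchOrdered` call would miss the form in the example above, where the shift is the first operand of the and.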

This patch is only expected to fix the runtime bug; further cases to optimize can be considered in a separate issue :)

When w8 is used in place of w1, the following two code fragments are not equivalent.
So if SimplifyDemandedBits does the same transform, can a separate patch land first to fix this and avoid the runtime bug?

sub w8, w0, #1
and w0, w8, w8, lsr #16

vs.

mov w8, #65535
add w8, w0, w8
and w0, w8, w8, lsr #16

llvm/test/CodeGen/AArch64/arm64-srl-and.ll
2	thanks, done

lebedev.ri mentioned this in D108103: [DAGCombiner] Add one use restriction for the pattern (and (add x, c1), (lshr y, c2)). Aug 16 2021, 12:30 AM

Please can you regenerate the diff with context

llvm/test/CodeGen/AArch64/arm64-srl-and.ll
7	do you need all the dso_local/noundef/local_unnamed_addr attributes?

Allen updated this revision to Diff 366601. Aug 16 2021, 5:20 AM

Allen marked an inline comment as done.

Please,

either add a comment explaining why the one-use check is needed
or rewrite the transform the right way, so that the lack of one-use check shows up as extra instruction bloat and not a miscompilation

This revision now requires changes to proceed. Aug 16 2021, 5:39 AM

In D107692#2946669, @lebedev.ri wrote:

Please,

either add a comment explaining why the one-use check is needed

or rewrite the transform the right way, so that the lack of one-use check shows up as extra instruction bloat and not a miscompilation

Since there's a visible miscompile, I'm ok with the quick/small fix since that will be an easy backport for the 13.0 release. But I agree that there should be a FIXME comment on this transform; it is clearly not ideal.

Harbormaster completed remote builds in B119693: Diff 366601. Aug 16 2021, 6:14 AM

In D107692#2946669, @lebedev.ri wrote:

Please,

either add a comment explaining why the one-use check is needed

or rewrite the transform the right way, so that the lack of one-use check shows up as extra instruction bloat and not a miscompilation

OK, I'll add a comment. Compared with rewriting the transform, the current change produces more efficient code (fewer instructions).

The following code is what is generated when the transform is rewritten as suggested in your comment:

	adrp	x8, :got:g
	ldr	x8, [x8, :got_lo12:g]
	mov	w9, #50
	mov	w10, #65535
	ldrh	w8, [x8]
	eor	w9, w8, w9
	add	w9, w9, w10
	sub	w8, w8, #1
	and	w0, w8, w9, lsr #16
	ret
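The trade-off in that listing can be modelled with plain arithmetic. The sketch below assumes the rewritten transform gives only the and a new add with the widened constant, while the srl keeps the original add; the function names are illustrative and not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Original computation: and (add x, 65535), (srl (add x, 65535), 16).
inline uint32_t originalAndOfAddSrl(uint32_t x) {
  uint32_t add = x + 65535u;
  return add & (add >> 16);
}

// Per-use rewrite: the and gets a new add with constant -1, and the srl
// keeps the original add, so two adds are live at the same time.
inline uint32_t perUseRewrite(uint32_t x) {
  uint32_t oldAdd = x + 65535u;      // still needed by the srl
  uint32_t newAdd = x + 0xFFFFFFFFu; // add x, -1, seen only by the and
  return newAdd & (oldAdd >> 16);
}

// Check that both versions agree on every 16-bit input.
inline bool agreeOnAll16BitInputs() {
  for (uint32_t x = 0; x <= 0xFFFFu; ++x)
    if (originalAndOfAddSrl(x) != perUseRewrite(x))
      return false;
  return true;
}
```

In this model the result stays correct for the multi-use case, and the cost is the second add, which is consistent with the listing containing both an add of #65535 and a sub of #1.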

llvm/test/CodeGen/AArch64/arm64-srl-and.ll
7	No, they are not essential; I'll delete them later.

Allen updated this revision to Diff 366649. Aug 16 2021, 8:42 AM

Allen marked 5 inline comments as done.

RKSimon added inline comments. Aug 16 2021, 9:06 AM

llvm/test/CodeGen/AArch64/arm64-srl-and.ll
4	Please rephrase this and actually explain the "dagcombine" you're referring to in the comment

Harbormaster completed remote builds in B119726: Diff 366649. Aug 16 2021, 9:24 AM

lebedev.ri added inline comments. Aug 16 2021, 9:44 AM

llvm/test/CodeGen/AArch64/arm64-srl-and.ll
1	Please precommit this test before landing the fix

guopeilin mentioned this in rG9790a2a72f60: [tests] precommit tests for D107692. Aug 16 2021, 10:06 PM

Allen updated this revision to Diff 367226. Aug 18 2021, 8:50 AM

Allen marked an inline comment as done.

Allen edited the summary of this revision.

lebedev.ri added inline comments. Aug 18 2021, 9:30 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5152–5156	How can this FIXME ever be addressed, given that there already is a one-use check?

Harbormaster completed remote builds in B120139: Diff 367226. Aug 18 2021, 9:58 AM

Reverse ping @Allen
I'm not sure if we have already missed the deadline for 13.0, but we really should fix this miscompile. Someone can commandeer the patch if you can't work on it (just need to fix the comments?).

spatel commandeered this revision. Sep 6 2021, 11:42 AM

spatel edited reviewers, added: Allen; removed: spatel.

Herald added a subscriber: mcrosier. Sep 6 2021, 11:42 AM

Same minimal fix (hopefully can still make the release branch), but updated the code and test comments

LG, thank you.

This revision is now accepted and ready to land. Sep 6 2021, 11:46 AM

Harbormaster completed remote builds in B122786: Diff 370946. Sep 6 2021, 12:28 PM

Closed by commit rGe1e4bf174b09: [DAGCombine] Prevent the transform of combine for multi-use operand (authored by spatel). Sep 6 2021, 12:34 PM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rGe1e4bf174b09: [DAGCombine] Prevent the transform of combine for multi-use operand.

Diff 370959

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

...
SDValue DAGCombiner::hoistLogicOpWithSameOpcodeHands(SDNode *N) {
  // If both shuffles use the same mask, and both shuffle within a single
  // vector, then it is worthwhile to move the swizzle after the operation.
  // The type-legalizer generates this pattern when loading illegal
  // vector types from memory. In many cases this allows additional shuffle
  // optimizations.
  // There are other cases where moving the shuffle after the xor/and/or
  // is profitable even if shuffles don't perform a swizzle.
  // If both shuffles use the same mask, and both shuffles have the same first
  // or second operand, then it might still be profitable to move the shuffle
  // after the xor/and/or operation.
  if (HandOpcode == ISD::VECTOR_SHUFFLE && Level < AfterLegalizeDAG) {
    auto *SVN0 = cast<ShuffleVectorSDNode>(N0);
    auto *SVN1 = cast<ShuffleVectorSDNode>(N1);
    assert(X.getValueType() == Y.getValueType() &&
           "Inputs to shuffles are not the same type");

    // Check that both shuffles use the same mask. The masks are known to be of
...
  SDValue DAGCombiner::visitANDLike(SDValue N0, SDValue N1, SDNode *N) {

    // fold (and x, undef) -> 0
    if (N0.isUndef() || N1.isUndef())
      return DAG.getConstant(0, DL, VT);

    if (SDValue V = foldLogicOfSetCCs(true, N0, N1, DL))
      return V;

+   // TODO: Rewrite this to return a new 'AND' instead of using CombineTo.
    if (N0.getOpcode() == ISD::ADD && N1.getOpcode() == ISD::SRL &&
-       VT.getSizeInBits() <= 64) {
+       VT.getSizeInBits() <= 64 && N0->hasOneUse()) {
      if (ConstantSDNode *ADDI = dyn_cast<ConstantSDNode>(N0.getOperand(1))) {
        if (ConstantSDNode *SRLI = dyn_cast<ConstantSDNode>(N1.getOperand(1))) {
          // Look for (and (add x, c1), (lshr y, c2)). If C1 wasn't a legal
          // immediate for an add, but it is legal if its top c2 bits are set,
          // transform the ADD so the immediate doesn't need to be materialized
          // in a register.
          APInt ADDC = ADDI->getAPIntValue();
          APInt SRLC = SRLI->getAPIntValue();
          if (ADDC.getMinSignedBits() <= 64 &&
              SRLC.ult(VT.getSizeInBits()) &&
...

llvm/test/CodeGen/AArch64/arm64-srl-and.ll

  ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
  ; RUN: llc -mtriple=aarch64-linux-gnu -O3 < %s | FileCheck %s

- ; Disable the dagcombine if operand has multi use
+ ; This used to miscompile:
+ ; The 16-bit -1 should not become 32-bit -1 (sub w8, w8, #1).

  @g = global i16 0, align 4
  define i32 @srl_and() {
  ; CHECK-LABEL: srl_and:
  ; CHECK: // %bb.0: // %entry
  ; CHECK-NEXT: adrp x8, :got:g
  ; CHECK-NEXT: ldr x8, [x8, :got_lo12:g]
  ; CHECK-NEXT: mov w9, #50
  ; CHECK-NEXT: ldrh w8, [x8]
  ; CHECK-NEXT: eor w8, w8, w9
- ; CHECK-NEXT: sub w8, w8, #1
+ ; CHECK-NEXT: mov w9, #65535
+ ; CHECK-NEXT: add w8, w8, w9
  ; CHECK-NEXT: and w0, w8, w8, lsr #16
  ; CHECK-NEXT: ret
  entry:
    %0 = load i16, i16* @g, align 4
    %1 = xor i16 %0, 50
    %tobool = icmp ne i16 %1, 0
    %lor.ext = zext i1 %tobool to i32
    %sub = add i16 %1, -1

    %srl = zext i16 %sub to i32
    %and = and i32 %srl, %lor.ext

    ret i32 %and
  }

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Prevent the transform of combine for multi-use operand
ClosedPublic
