This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Prevent the transform of combine for multi-use operand
ClosedPublic

Authored by spatel on Aug 7 2021, 3:08 AM.

Diff Detail

Unit TestsFailed

Event Timeline

Allen created this revision.Aug 7 2021, 3:08 AM
Allen requested review of this revision.Aug 7 2021, 3:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 7 2021, 3:08 AM
  1. Please upload all patches with full context. (-U99999)
  2. I do not understand how one-use check here is acting as a correctness check
dmgreen added a subscriber: dmgreen.Aug 7 2021, 5:56 AM
dmgreen added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4944

N0->hasOneUse() would probably be better, to check the Node has one use not the SDValue. (Although here it likely won't make much difference.)

llvm/test/CodeGen/AArch64/arm64-srl-and.ll
3

This needs a triple, and likely doesn't need a -mcpu.

12

Has this deleted some of the check lines?

lebedev.ri added inline comments.Aug 7 2021, 6:01 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4944

*Why* does it matter for the correctness whether the node has other uses or not?
The peephole should only affect the current root use we started with,
and not affect any other uses whatsoever.

craig.topper added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4944

There is a CombineTo call on N0. That will affect all users of N0.

lebedev.ri added inline comments.Aug 7 2021, 7:41 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4944

Aha, that indeed explains it, and should have been stated in the patch's description, thanks.
But why is it there?
Sounds like this transform should instead be

if (TLI.isLegalAddImmediate(ADDC.getSExtValue())) {
  SDLoc DL0(N0);
  SDValue NewAdd =
    DAG.getNode(ISD::ADD, DL0, VT,
                N0.getOperand(0), DAG.getConstant(ADDC, DL, VT));
  return DAG.getNode(ISD::AND, DL0, VT, NewAdd, N1);
}
Allen added a comment.Aug 8 2021, 7:18 PM
  1. Please upload all patches with full context. (-U99999)
  2. I do not understand how one-use check here is acting as a correctness check

sorry for late reply, but I think you already know. as the node t46 has multi use, so add t45, 65535 to sub t45, -1 is not save in this case.

t64: i32 = srl t46, Constant:i64<16>

  t14: i32 = and t46, t64
t46: i32 = add t45, Constant:i32<65535>
llvm/test/CodeGen/AArch64/arm64-srl-and.ll
12

yes, as others are not the kernel code about this issue.

Allen updated this revision to Diff 365067.Aug 8 2021, 7:54 PM
spatel added a subscriber: spatel.

The transform that we want to make can be shown with this example (and so this test should be added somewhere as a preliminary commit):

define i32 @src(i32 %x, i32 %y) {
  %add = add i32 %x, 65535 ; --> turn this into -1 because that can be encoded as an immediate operand
  %srl = lshr i32 %y, 16
  %r = and i32 %srl, %add
  ret i32 %r
}

https://alive2.llvm.org/ce/z/Ehadpq

For AArch and PowerPC and possibly all targets (x86 would benefit too), we want this to use a -1 (or sub 1), not an add 65535:

sub w8, w0, #1
and w0, w8, w1, lsr #16

vs.

mov w8, #65535
add w8, w0, w8
and w0, w8, w1, lsr #16

But if you try this test with main today, it doesn't work:
https://godbolt.org/z/fe7h1zYv6

...because this transform has another bug: it doesn't match the commuted variant.

Over in PR51321, I mentioned that I could delete the whole transform and not see any test changes. That's because SimplifyDemandedBits does the same transform. But it has a similar problem - it only catches the case where it processes the shift before it sees the add. So it misses the same commuted pattern. I'm not sure how we'd fix that in DemandedBits; it seems like a generic ordering problem for commutative ops.

So this transform has at least 3 problems (assuming we want to keep it instead of enhancing DemandedBits):

  1. It miscompiles the multi-use case because it is coded weirdly and missed a constraint.
  2. It misses the commuted case.
  3. It misses what is likely profitable for all targets by overspecifying the legality constraints.
Allen marked 2 inline comments as done.EditedAug 9 2021, 6:25 PM

this patch is expected to fix the runtime bug, and more cases to be optimized can be consider in another issue :)

When we use w8 replace the w1 , then the following two fragment codes is not equal.
so if the SimplifyDemandedBits does the same transform, a patch can be land separate to fix this firstly to avoid the runtime bug ?

sub w8, w0, #1
and w0, w8, w8, lsr #16

vs.

mov w8, #65535
add w8, w0, w8
and w0, w8, w8, lsr #16
llvm/test/CodeGen/AArch64/arm64-srl-and.ll
3

thanks, done

Please can you regenerate the diff with context

llvm/test/CodeGen/AArch64/arm64-srl-and.ll
8

do you need all the dso_local/noundef/local_unnamed_addr attributes?

Allen updated this revision to Diff 366601.Aug 16 2021, 5:20 AM
Allen marked an inline comment as done.
lebedev.ri requested changes to this revision.Aug 16 2021, 5:39 AM

Please,

  1. either add a comment explaining why the one-use check needed
  2. or rewrite the transform the right way, so that the lack of one-use check shows up as extra instruction bloat and not a miscompilation
This revision now requires changes to proceed.Aug 16 2021, 5:39 AM

Please,

  1. either add a comment explaining why the one-use check needed
  2. or rewrite the transform the right way, so that the lack of one-use check shows up as extra instruction bloat and not a miscompilation

Since there's a visible miscompile, I'm ok with the quick/small fix since that will be an easy backport for the 13.0 release. But I agree that there should be a FIXME comment on this transform; it is clearly not ideal.

Allen added a comment.Aug 16 2021, 6:26 AM

Please,

  1. either add a comment explaining why the one-use check needed
  2. or rewrite the transform the right way, so that the lack of one-use check shows up as extra instruction bloat and not a miscompilation

ok, I'll add a comment as compare between current change and rewritting the transform , current change has more effective code (less inst) .

the following code is with rewritting the transform in your comment.

adrp	x8, :got:g
	ldr	x8, [x8, :got_lo12:g]
	mov	w9, #50
	mov	w10, #65535
	ldrh	w8, [x8]
	eor	w9, w8, w9
	add	w9, w9, w10
	sub	w8, w8, #1
	and	w0, w8, w9, lsr #16
	ret
llvm/test/CodeGen/AArch64/arm64-srl-and.ll
8

yes, they are not essential, and i'll delete them later

Allen updated this revision to Diff 366649.Aug 16 2021, 8:42 AM
Allen marked 5 inline comments as done.
RKSimon added inline comments.Aug 16 2021, 9:06 AM
llvm/test/CodeGen/AArch64/arm64-srl-and.ll
5

Please rephrase this and actually explain the "dagcombine" you're referring to in the comment

lebedev.ri added inline comments.Aug 16 2021, 9:44 AM
llvm/test/CodeGen/AArch64/arm64-srl-and.ll
2

Please precommit this test before landing the fix

Allen updated this revision to Diff 367226.Aug 18 2021, 8:50 AM
Allen marked an inline comment as done.
Allen edited the summary of this revision. (Show Details)
lebedev.ri added inline comments.Aug 18 2021, 9:30 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5152–5156

How can this FIXME ever be addressed, given that there already is an one-use check?

spatel added a comment.Sep 4 2021, 8:03 AM

Reverse ping @Allen
I'm not sure if we have already missed the deadline for 13.0, but we really should fix this miscompile. Someone can commandeer the patch if you can't work on it (just need to fix the comments?).

spatel commandeered this revision.Sep 6 2021, 11:42 AM
spatel edited reviewers, added: Allen; removed: spatel.
spatel updated this revision to Diff 370946.Sep 6 2021, 11:43 AM
spatel marked an inline comment as done.

Same minimal fix (hopefully can still make the release branch), but updated the code and test comments

lebedev.ri accepted this revision.Sep 6 2021, 11:46 AM

LG, thank you.

This revision is now accepted and ready to land.Sep 6 2021, 11:46 AM