This is based on example in https://bugs.llvm.org/show_bug.cgi?id=51321
Details
Diff Detail
Event Timeline
- Please upload all patches with full context. (-U99999)
- I do not understand how one-use check here is acting as a correctness check
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4944 | N0->hasOneUse() would probably be better, to check the Node has one use not the SDValue. (Although here it likely won't make much difference.) | |
llvm/test/CodeGen/AArch64/arm64-srl-and.ll | ||
2 | This needs a triple, and likely doesn't need a -mcpu. | |
11 | Has this deleted some of the check lines? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4944 | *Why* does it matter for the correctness whether the node has other uses or not? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4944 | There is a CombineTo call on N0. That will affect all users of N0. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4944 | Aha, that indeed explains it, and should have been stated in the patch's description, thanks. if (TLI.isLegalAddImmediate(ADDC.getSExtValue())) { SDLoc DL0(N0); SDValue NewAdd = DAG.getNode(ISD::ADD, DL0, VT, N0.getOperand(0), DAG.getConstant(ADDC, DL, VT)); return DAG.getNode(ISD::AND, DL0, VT, NewAdd, N1); } |
sorry for late reply, but I think you already know. as the node t46 has multi use, so add t45, 65535 to sub t45, -1 is not save in this case.
t64: i32 = srl t46, Constant:i64<16>
t14: i32 = and t46, t64 t46: i32 = add t45, Constant:i32<65535>
llvm/test/CodeGen/AArch64/arm64-srl-and.ll | ||
---|---|---|
11 | yes, as others are not the kernel code about this issue. |
The transform that we want to make can be shown with this example (and so this test should be added somewhere as a preliminary commit):
define i32 @src(i32 %x, i32 %y) { %add = add i32 %x, 65535 ; --> turn this into -1 because that can be encoded as an immediate operand %srl = lshr i32 %y, 16 %r = and i32 %srl, %add ret i32 %r }
https://alive2.llvm.org/ce/z/Ehadpq
For AArch and PowerPC and possibly all targets (x86 would benefit too), we want this to use a -1 (or sub 1), not an add 65535:
sub w8, w0, #1 and w0, w8, w1, lsr #16
vs.
mov w8, #65535 add w8, w0, w8 and w0, w8, w1, lsr #16
But if you try this test with main today, it doesn't work:
https://godbolt.org/z/fe7h1zYv6
...because this transform has another bug: it doesn't match the commuted variant.
Over in PR51321, I mentioned that I could delete the whole transform and not see any test changes. That's because SimplifyDemandedBits does the same transform. But it has a similar problem - it only catches the case where it processes the shift before it sees the add. So it misses the same commuted pattern. I'm not sure how we'd fix that in DemandedBits; it seems like a generic ordering problem for commutative ops.
So this transform has at least 3 problems (assuming we want to keep it instead of enhancing DemandedBits):
- It miscompiles the multi-use case because it is coded weirdly and missed a constraint.
- It misses the commuted case.
- It misses what is likely profitable for all targets by overspecifying the legality constraints.
this patch is expected to fix the runtime bug, and more cases to be optimized can be consider in another issue :)
When we use w8 replace the w1 , then the following two fragment codes is not equal.
so if the SimplifyDemandedBits does the same transform, a patch can be land separate to fix this firstly to avoid the runtime bug ?
sub w8, w0, #1 and w0, w8, w8, lsr #16
vs.
mov w8, #65535 add w8, w0, w8 and w0, w8, w8, lsr #16
llvm/test/CodeGen/AArch64/arm64-srl-and.ll | ||
---|---|---|
2 | thanks, done |
Please can you regenerate the diff with context
llvm/test/CodeGen/AArch64/arm64-srl-and.ll | ||
---|---|---|
7 | do you need all the dso_local/noundef/local_unnamed_addr attributes? |
Please,
- either add a comment explaining why the one-use check needed
- or rewrite the transform the right way, so that the lack of one-use check shows up as extra instruction bloat and not a miscompilation
Since there's a visible miscompile, I'm ok with the quick/small fix since that will be an easy backport for the 13.0 release. But I agree that there should be a FIXME comment on this transform; it is clearly not ideal.
ok, I'll add a comment as compare between current change and rewritting the transform , current change has more effective code (less inst) .
the following code is with rewritting the transform in your comment.
adrp x8, :got:g ldr x8, [x8, :got_lo12:g] mov w9, #50 mov w10, #65535 ldrh w8, [x8] eor w9, w8, w9 add w9, w9, w10 sub w8, w8, #1 and w0, w8, w9, lsr #16 ret
llvm/test/CodeGen/AArch64/arm64-srl-and.ll | ||
---|---|---|
7 | yes, they are not essential, and i'll delete them later |
llvm/test/CodeGen/AArch64/arm64-srl-and.ll | ||
---|---|---|
4 | Please rephrase this and actually explain the "dagcombine" you're referring to in the comment |
llvm/test/CodeGen/AArch64/arm64-srl-and.ll | ||
---|---|---|
1 | Please precommit this test before landing the fix |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
5152–5156 | How can this FIXME ever be addressed, given that there already is an one-use check? |
Reverse ping @Allen
I'm not sure if we have already missed the deadline for 13.0, but we really should fix this miscompile. Someone can commandeer the patch if you can't work on it (just need to fix the comments?).
Same minimal fix (hopefully can still make the release branch), but updated the code and test comments
N0->hasOneUse() would probably be better, to check the Node has one use not the SDValue. (Although here it likely won't make much difference.)