Fixes 1st issue of https://github.com/llvm/llvm-project/issues/58061
Fixes the crash of https://github.com/llvm/llvm-project/issues/58675
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
119–121 | Is there some reason we don't want to combine this to cmp+ccmp+b.ne? |
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
119–121 |
SelectionDAG has 19 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t6: i64,ch = CopyFromReg t0, Register:i64 %2 t26: i64 = xor t2, t6 t4: i64,ch = CopyFromReg t0, Register:i64 %1 t8: i64,ch = CopyFromReg t0, Register:i64 %3 t27: i64 = xor t4, t8 t28: i64 = or t26, t27 t22: i32 = setcc t28, Constant:i64<0>, setne:ch t21: ch = brcond t0, t22, BasicBlock:ch<exit 0xaaaab28f7268> t18: ch = br t21, BasicBlock:ch<call 0xaaaab28f7170>
-; NOLSE-NEXT: eor x11, x9, x11 -; NOLSE-NEXT: eor x8, x10, x8 -; NOLSE-NEXT: orr x8, x8, x11 +; NOLSE-NEXT: mov x9, x8 ; NOLSE-NEXT: str x9, [sp, #8] // 8-byte Folded Spill +; NOLSE-NEXT: mov x10, x12 ; NOLSE-NEXT: str x10, [sp, #16] // 8-byte Folded Spill +; NOLSE-NEXT: subs x12, x12, x13 +; NOLSE-NEXT: ccmp x8, x11, #0, eq +; NOLSE-NEXT: cset w8, eq ; NOLSE-NEXT: str x10, [sp, #32] // 8-byte Folded Spill ; NOLSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill -; NOLSE-NEXT: cbnz x8, .LBB4_1 +; NOLSE-NEXT: tbnz w8, #0, .LBB4_1 |
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
119–121 | We can mostly ignore codesize at -O0. (I mean, it matters to the extent that really bloated code can start to impact compile-time, but that isn't relevant here.) |
relax the constraint N->use_begin()->getOpcode() != ISD::BRCOND
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
119–121 | Done, Thank you for your guidance. |
Could this be done during lowering, int AArch64TargetLowering::LowerSETCC, or does that not work?
The getNZCVToSatisfyCondCode method is useful for getting the constant needed for CCMP's.
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
13 | Are you sure this is correct? It doesn't look right. I think I would expect ccmp #0, eq; cset eq. | |
23–24 | And here it needs to set based on ne, so maybe ccmp #8, eq; cmp ne. Those two verify as equivalent. |
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
13 | Yes, I test the executive for the initial case in https://github.com/llvm/llvm-project/issues/58061 |
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
23–24 | can also find the cases in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104611 |
Thanks for your suggestion, I try to debug the function br_on_cmp_i128_eq in file CodeGen/AArch64/i128-cmp.ll, and find that the setcc is transform into br_cc in AArch64TargetLowering::LowerOperation
so I think it can also work in AArch64TargetLowering. Out of intresting, I'd like to know why you recommend processing in the AArch64TargetLowering?
I see. That sounds like a good reason not to do it in Lowering.
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
13 | How did you test that? Is it the code from https://gcc.godbolt.org/z/Tv1YP6bPc? Could it have been constant-folded away? |
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
13 | Yes, I test the executive very simple, just run the following cmd with and without the changes. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19495 | Leftover comment about brcond |
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
13 | Have you tried with -fno-inline? The example godbolt link has everything inlined into main, and constant folded into the arguments of the printf's. It doesn't look like it is really testing the codegen. |
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
13 | oh, thanks very much, you are right. But how can I get the #8 for case cmp_i128_ne ? it seem the input Code should be MI or LT when it return expect "#8" with function getNZCVToSatisfyCondCode ? |
llvm/test/CodeGen/AArch64/i128-cmp.ll | ||
---|---|---|
13 | Yeah, #8 is not the only choice. I think it can be any constant where the the Z bit is clear, so #0 is fine (as is #8). #4 is not because that is the Z bit. Because the eq is a && and the ne is a ||, it might be simpler to just use a constant of 0 in both cases, providing that works. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19503 | 0 -> AArch64CC::EQ. | |
19506 | I'm not sure if this should be MVT::Glue or MVT::i32. It seems to be created differently in different places. | |
19509–19510 | This comment doesn't seem very helpful as a code-comment. | |
19511 | AArch64CC::EQ -> 0 is probably better. It is not a condition, but the value the NZCV flags are set to. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19506 | I don't very sure this is the accurate answer, it seems the MVT::Glue implicit instructions are scheduled together? | |
19511 | Apply your comment, thanks |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19499–19500 | LHS should be OneUse also? | |
19506 | You haven't pass the glue to other instructions so the glue is useless. And I think we needn't use Glue here. | |
19510 | I am not sure if we can just combine to ISD::SETCC ? Maybe it can combine with some other op. |
need rebase as fail in case llvm/test/CodeGen/AArch64/bcmp.ll, which is new precommited in e95c74b423c
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19499–19500 | The LHS node itself is not used in the return value when the pattern matched, so I don't think the OneUse is needed, correct me if I'm wrong, thanks. | |
19506 | Thanks, I'll updated it. | |
19510 | sorry, I don't understand what is the ISD::SETCC, could you please show more detailedly? as I don't find it in my changes. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19499–19500 | for example: int use(int); int f(int a, int b, int c, int d) { int xor0 = a ^ b; int xor1 = c ^ d; int or0 = xor0 | xor1; if (or0 != 0) return use(or0); return a; } or0 is not one use. So we should keep all of the xor+or patterns. | |
19510 | The code should be simpler by combine to SetCC: SDValue XOR0 = LHS.getOperand(0); SDValue XOR1 = LHS.getOperand(1); SDValue Cmp0 = DAG.getSetCC(DL, VT, XOR0.getOperand(0), XOR0.getOperand(1), ISD::SETNE); SDValue Cmp1 = DAG.getSetCC(DL, VT, XOR1.getOperand(0), XOR1.getOperand(1), ISD::SETNE); SDValue Cmp = DAG.getNode(ISD::OR, DL, VT, Cmp0, Cmp1); return DAG.getSetCC(DL, VT, Cmp, DAG.getConstant(0, DL, VT), Cond); But may fall into potential dead loop if somewhere has the reverse combination. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19499–19500 |
|
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19499–19500 | It looks your case source instructions is less than dest? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19499–19500 | I'm just suggesting the multi-use is allowed base alive2 |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19499–19500 | I see. Can you add the testcase where the LHS has multiple uses: define i32 @multiuse(i32 %0, i32 %1, i32 %2, i32 %3) { %5 = xor i32 %1, %0 %6 = xor i32 %3, %2 %7 = or i32 %6, %5 %8 = icmp eq i32 %7, 0 br i1 %8, label %11, label %9 9: ; preds = %4 %10 = tail call i32 @use(i32 %7) #2 br label %11 11: ; preds = %4, %9 %12 = phi i32 [ %10, %9 ], [ %0, %4 ] ret i32 %12 } We just need to make sure it doesn't increase the number of instructions. | |
19506 | Glue is probably OK. Can you add this test case: define i32 @eq(i128 noundef %x, i128 noundef %y) { entry: %cmp3 = icmp eq i128 %x, %y %conv = trunc i128 %x to i64 %conv1 = trunc i128 %y to i64 %cmp = icmp eq i64 %conv, %conv1 %or7 = or i1 %cmp3, %cmp %or = zext i1 %or7 to i32 ret i32 %or } There may be issues with the CMP/CCMP with the scheduling of instructions that ISel will create out of the DAG, but I've not seen any happen yet. | |
19510 | Hmm. It is not worth it if we are taking two steps to do what we could do in one. But there could be further DAG combiners for the setcc. I'd say this method is fine so long as we don't do it too early. If we find cases where there are missing combines, we can always add extra folds for the CMP/CCMPs. |
hi, I think this patch is triggering an assert failure in SelectionDAG.
I've filed https://github.com/llvm/llvm-project/issues/58675 to track it, with the reproducer. I've bisected the issue to this commit, and reverting locally stops the assert from triggering on the reproducer.
Can you take a look a the issue, and revert this until a fix is available?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19506 | As the crash of https://github.com/llvm/llvm-project/issues/58675, we need update the MVT::glue into MVT::i32 |
Leftover comment about brcond