This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Generate tbz/tbnz when comparing against zero.
Closed, Public

Authored by mcrosier on Jul 9 2014, 1:07 PM.


Details

Reviewers

t.p.northover
jmolloy
Jiangning

Commits

rG579c02c9a52c: [AArch64] Generate tbz/tbnz when comparing against zero.
rL214518: [AArch64] Generate tbz/tbnz when comparing against zero.

Summary

This patch generates tbz/tbnz when comparing against zero. The tbz/tbnz instruction tests the sign bit directly, which converts

op w1, w1, w10
cmp w1, #0
b.lt .LBB0_0

to

op w1, w1, w10
tbnz w1, #31, .LBB0_0

Please have a look.

Chad
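
The rewrite in the summary relies on a signed 32-bit value being negative exactly when bit 31 is set. A minimal C++ sketch of that equivalence follows; the helper names are hypothetical and only model the two instruction sequences, they are not part of the patch.

```cpp
#include <cstdint>

// Hypothetical model of `cmp w1, #0` followed by `b.lt`:
// the branch is taken when the signed value is negative.
constexpr bool takenByCmpBlt(int32_t x) { return x < 0; }

// Hypothetical model of `tbnz w1, #31`:
// the branch is taken when bit 31 (the sign bit) is set.
constexpr bool takenByTbnz31(int32_t x) {
  return ((static_cast<uint32_t>(x) >> 31) & 1u) != 0;
}

// Both models agree on the boundary values.
static_assert(takenByCmpBlt(-1) == takenByTbnz31(-1), "");
static_assert(takenByCmpBlt(0) == takenByTbnz31(0), "");
static_assert(takenByCmpBlt(INT32_MIN) == takenByTbnz31(INT32_MIN), "");
static_assert(takenByCmpBlt(INT32_MAX) == takenByTbnz31(INT32_MAX), "");
```

The static_asserts only spot-check boundary values; the equivalence itself follows from the two's-complement representation.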

Diff Detail

Repository: rL LLVM

Event Timeline

mcrosier updated this revision to Diff 11215. (Jul 9 2014, 1:07 PM)

mcrosier retitled this revision to "[AArch64] Generate tbz/tbnz when comparing against zero."

mcrosier updated this object.

mcrosier edited the test plan for this revision.

mcrosier added reviewers: t.p.northover, jmolloy.

mcrosier added subscribers: Unknown Object (MLST), grosbach, Jiangning, HaoLiu.

Herald added subscribers: mcrosier, aemerson. (Jul 9 2014, 1:07 PM)

The patch looks good, and I am researching a similar issue. I think there should be more test cases, for example:

cmp      w0, #1                 // =1
b.lt    .LBB

I have tried your patch and test. I wonder whether it has something to do with add/sub versus adds/subs; I have also tried some test cases that I wrote myself.

void foo();
void gt(int tmp) {
  if (tmp >= 0)
    foo();
}

and the asm is below:

  ge:                                     // @ge 
// BB#0:                                // %entry 
        cmp      w0, #0                 // =0 
        b.lt    .LBB1_2                           
// BB#1:                                // %if.then 
        b       foo 
.LBB1_2:                                // %if.end 
        ret 
.Ltmp3:
      .size   ge, .Ltmp3-ge

I think the cmp and b.lt above can be combined into tbz/tbnz, and I think your patch should cover this case. When I change the C code to (tmp > 0), it generates

gt:                                     // @gt
// BB#0:                                // %entry
       cmp      w0, #1                 // =1
       b.lt    .LBB0_2
// BB#1:                                // %if.then
       b       foo
.LBB0_2:                                // %if.end
       ret

I think it may not be easy to generate TBZ/TBNZ for this case. What is your opinion?

Thanks
-David

In D4440#7, @xjwwm_cat wrote:

I have tried your patch and test. I wonder whether it has something to do with add/sub versus adds/subs; I have also tried some test cases that I wrote myself.

The difference between add/sub and adds/subs is that the latter sets the NZCV bits.

void foo();
void gt(int tmp) {
  if (tmp >= 0)
    foo();
}

and the asm is below:

  ge:                                     // @ge 
// BB#0:                                // %entry 
        cmp      w0, #0                 // =0 
        b.lt    .LBB1_2                           
// BB#1:                                // %if.then 
        b       foo 
.LBB1_2:                                // %if.end 
        ret 
.Ltmp3:
      .size   ge, .Ltmp3-ge

I think the cmp and b.lt above can be combined into tbz/tbnz, and I think your patch should cover this case. When I change the C code to (tmp > 0), it generates

AFAICT, that case can be handled by a similar combine, but it shouldn't block this patch. Feel free to submit a patch of your own.

gt:                                     // @gt
// BB#0:                                // %entry
       cmp      w0, #1                 // =1
       b.lt    .LBB0_2
// BB#1:                                // %if.then
       b       foo
.LBB0_2:                                // %if.end
       ret

I think it may not be easy to generate TBZ/TBNZ for this case. What is your opinion?

I don't think that this is possible, as you're not strictly checking the sign bit. You would need two checks: one to see if the value is zero and another to see if it's negative.

Thanks
-David

Chad
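
To illustrate why (tmp > 0) cannot be decided from the sign bit alone, here is a minimal C++ sketch. The helper names are hypothetical and are not taken from the patch; they only model what a single test-bit instruction on bit 31 can observe.

```cpp
#include <cstdint>

// What `tbz`/`tbnz wN, #31` can observe: the sign bit of a 32-bit value.
constexpr bool signBit(int32_t x) {
  return (static_cast<uint32_t>(x) >> 31) != 0;
}

// `tmp >= 0` depends on the sign bit only, so one test-bit branch suffices.
constexpr bool isNonNegative(int32_t x) { return !signBit(x); }

// `tmp > 0` needs a second check, because 0 and 1 share the same sign bit.
constexpr bool isPositive(int32_t x) { return !signBit(x) && x != 0; }

// 0 and 1 are indistinguishable by sign bit, yet differ for `> 0`.
static_assert(signBit(0) == signBit(1), "");
static_assert(isPositive(1) && !isPositive(0) && !isPositive(-1), "");
static_assert(isNonNegative(0) && isNonNegative(1) && !isNonNegative(-1), "");
```

This matches the generated code in the thread: the (tmp > 0) case is emitted as a compare against #1, not as a sign-bit test.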

Ping.

Hi Chad,

2014-07-10 4:08 GMT+08:00 Chad Rosier <mcrosier@codeaurora.org>:

Hi t.p.northover, jmolloy,

This patch generates tbz/tbnz when comparing against zero. The tbz/tbnz
checks the sign bit to convert

add/sub w1, w1, w10
cmp w1, #0
b.lt .LBB0_0

to

adds/subs w1, w1, w10
tbnz w1, #31, .LBB0_0

I think there are two cases around this:

1. LHS can support a flags update, like ADDS/SUBS, and we can cover more, such as ANDS, ORRS, and others.

2. LHS can't support a flags update and BR_CC is the only user. At present performBRCONDCombine can cover this scenario.

The patch would be more complete if you can add more instructions for case 1.

For case 2, it can be a separate patch, I think.

Thanks,
-Jiangning

Please have a look.

On an A53 processor this improves the following benchmarks:
office_suite/OAv2mark +3%
office_suite/rotatev2data* +6-10%

I also saw improvements in spec2000 parser, vortex, and vpr, but just
barely above noise, so I wouldn't bank on them. These numbers were based
on the community mainline.

Chad

http://reviews.llvm.org/D4440

Files:
lib/Target/AArch64/AArch64ISelLowering.cpp
test/CodeGen/AArch64/tbz-tbnz.ll

Hi Jiangning,
AFAICT, emitComparison only generates ADDS, SUBS, and ANDS. I didn't handle the ANDS case because I didn't want to break test8 (extracted from logical_shifted_reg.ll) and it wasn't applicable to the code I was trying to target. I'll need to investigate this case further before I enable the combine for ANDS.

According to the documentation I have, the only other scalar operations that set the condition flags are ADCS, BICS, NEGS, NGCS, and SBCS. I'll look into adding these operations as well, but I'm not sure we'll hit these cases very often, if at all.

I'll investigate adding an implementation in InstCombine. Thanks for the suggestion!

Chad

Jiangning,
I've updated the patch according to your feedback. Rather than enable the combine for just ADD/SUB it is now enabled in all cases except when the LHS operand is an AND. The emitComparison converts the AND to an ANDS and the test bit and branch instruction becomes redundant. It also increases register pressure because the ANDS instruction must write to a register (rather than WZR/XZR), which is consumed by the test bit and branch instruction. test8 provides examples of why we don't want to combine ANDs.

The performance numbers are slightly better, but still less than noise. All, please take a look.

Chad
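
The condition under which the revised combine fires can be sketched as a small predicate. This is a simplified C++ model under stated assumptions: the enums and the function name are hypothetical stand-ins for the SelectionDAG types, and the model covers only the sign-bit combine, not the existing EQ/NE handling.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the condition code and LHS opcode.
enum class Cond { EQ, NE, LT, GT, Other };
enum class LhsOp { And, Other };

// True when the sign-bit test-and-branch would be used:
//   LHS <  0   becomes a TBNZ on the sign bit
//   LHS > -1   becomes a TBZ on the sign bit
// AND is excluded because emitComparison turns it into ANDS (TST),
// which would make the bit test redundant.
constexpr bool useSignBitTest(Cond cc, bool rhsIsConst, int64_t rhs,
                              LhsOp lhs) {
  return rhsIsConst && lhs != LhsOp::And &&
         ((cc == Cond::LT && rhs == 0) || (cc == Cond::GT && rhs == -1));
}

static_assert(useSignBitTest(Cond::LT, true, 0, LhsOp::Other), "");
static_assert(useSignBitTest(Cond::GT, true, -1, LhsOp::Other), "");
static_assert(!useSignBitTest(Cond::LT, true, 0, LhsOp::And), "");
static_assert(!useSignBitTest(Cond::LT, true, 1, LhsOp::Other), "");
static_assert(!useSignBitTest(Cond::LT, false, 0, LhsOp::Other), "");
```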

In D4440#12, @mcrosier wrote:

The performance numbers are slightly better, but still less than noise. All, please take a look.

To be clear, the numbers are slightly better relative to the ADD/SUB only combine. Overall, we still see a large improvement in eembc/OAv2.

Hi Chad,

I'm happy with this, and your new patch looks good to me, provided you
add more comments in the code explaining why AND is excluded, because at
first glance it looks strange.

Thanks,
-Jiangning

2014-07-30 0:02 GMT+08:00 Chad Rosier <mcrosier@codeaurora.org>:

In D4440#12, @mcrosier wrote:

The performance numbers are slightly better, but still less than noise.

All, please take a look.

To be clear, the numbers are slightly better relative to the ADD/SUB only
combine. Overall, we still see a large improvement in eembc/OAv2.

http://reviews.llvm.org/D4440

Revised patch based on Jiangning's feedback. Please take a look.

Hi Chad,

LGTM now!

Thanks,
-Jiangning

2014-07-31 23:23 GMT+08:00 Chad Rosier <mcrosier@codeaurora.org>:

Revised patch based on Jiangning's feedback. Please take a look.

http://reviews.llvm.org/D4440

Files:
lib/Target/AArch64/AArch64ISelLowering.cpp
test/CodeGen/AArch64/tbz-tbnz.ll

Closed by commit rL214518 (authored by @mcrosier).

Revision Contents

Path    Size

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp    28 lines

llvm/trunk/test/CodeGen/AArch64/tbz-tbnz.ll    258 lines

Diff 12106

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

[2,910 preceding lines not shown; context: if (RHSC && RHSC->getZExtValue() == 0) {]

       // TBZ has a smaller branch displacement than CBZ. If the offset is
       // out of bounds, a late MI-layer pass rewrites branches.
       // 403.gcc is an example that hits this case.
       if (LHS.getOpcode() == ISD::AND &&
           isa<ConstantSDNode>(LHS.getOperand(1)) &&
           isPowerOf2_64(LHS.getConstantOperandVal(1))) {
         SDValue Test = LHS.getOperand(0);
         uint64_t Mask = LHS.getConstantOperandVal(1);
-
-        // TBZ only operates on i64's, but the ext should be free.
-        if (Test.getValueType() == MVT::i32)
-          Test = DAG.getAnyExtOrTrunc(Test, dl, MVT::i64);
-
         return DAG.getNode(AArch64ISD::TBZ, dl, MVT::Other, Chain, Test,
                            DAG.getConstant(Log2_64(Mask), MVT::i64), Dest);
       }

       return DAG.getNode(AArch64ISD::CBZ, dl, MVT::Other, Chain, LHS, Dest);
     } else if (CC == ISD::SETNE) {
       // See if we can use a TBZ to fold in an AND as well.
       // TBZ has a smaller branch displacement than CBZ. If the offset is
       // out of bounds, a late MI-layer pass rewrites branches.
       // 403.gcc is an example that hits this case.
       if (LHS.getOpcode() == ISD::AND &&
           isa<ConstantSDNode>(LHS.getOperand(1)) &&
           isPowerOf2_64(LHS.getConstantOperandVal(1))) {
         SDValue Test = LHS.getOperand(0);
         uint64_t Mask = LHS.getConstantOperandVal(1);
-
-        // TBNZ only operates on i64's, but the ext should be free.
-        if (Test.getValueType() == MVT::i32)
-          Test = DAG.getAnyExtOrTrunc(Test, dl, MVT::i64);
-
         return DAG.getNode(AArch64ISD::TBNZ, dl, MVT::Other, Chain, Test,
                            DAG.getConstant(Log2_64(Mask), MVT::i64), Dest);
       }

       return DAG.getNode(AArch64ISD::CBNZ, dl, MVT::Other, Chain, LHS, Dest);
-    }
+    } else if (CC == ISD::SETLT && LHS.getOpcode() != ISD::AND) {
+      // Don't combine AND since emitComparison converts the AND to an ANDS
+      // (a.k.a. TST) and the test in the test bit and branch instruction
+      // becomes redundant. This would also increase register pressure.
+      uint64_t Mask = LHS.getValueType().getSizeInBits() - 1;
+      return DAG.getNode(AArch64ISD::TBNZ, dl, MVT::Other, Chain, LHS,
+                         DAG.getConstant(Mask, MVT::i64), Dest);
+    }
+  }
+  if (RHSC && RHSC->getSExtValue() == -1 && CC == ISD::SETGT &&
+      LHS.getOpcode() != ISD::AND) {
+    // Don't combine AND since emitComparison converts the AND to an ANDS
+    // (a.k.a. TST) and the test in the test bit and branch instruction
+    // becomes redundant. This would also increase register pressure.
+    uint64_t Mask = LHS.getValueType().getSizeInBits() - 1;
+    return DAG.getNode(AArch64ISD::TBZ, dl, MVT::Other, Chain, LHS,
+                       DAG.getConstant(Mask, MVT::i64), Dest);
   }

   SDValue CCVal;
   SDValue Cmp = getAArch64Cmp(LHS, RHS, CC, CCVal, DAG, dl);
   return DAG.getNode(AArch64ISD::BRCOND, dl, MVT::Other, Chain, Dest, CCVal,
                      Cmp);
 }

[5,226 following lines not shown]
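
The added code computes the tested bit as the value's width in bits minus one, and treats a signed compare "greater than -1" as "sign bit clear". A minimal C++ sketch of both facts follows; the template names are hypothetical and stand in for `getSizeInBits() - 1` and the TBZ test, they are not LLVM APIs.

```cpp
#include <cstdint>
#include <type_traits>

// Index of the sign bit for a signed integer type:
// 31 for int32_t (w registers), 63 for int64_t (x registers).
template <typename T>
constexpr unsigned signBitIndex() {
  return static_cast<unsigned>(sizeof(T) * 8 - 1);
}

// Hypothetical model of `tbz reg, #signBitIndex`: true when the sign bit
// is clear, which is when the branch would be taken.
template <typename T>
constexpr bool signBitClear(T x) {
  return ((static_cast<typename std::make_unsigned<T>::type>(x) >>
           signBitIndex<T>()) & 1u) == 0;
}

static_assert(signBitIndex<int32_t>() == 31, "");
static_assert(signBitIndex<int64_t>() == 63, "");

// `x > -1` and "sign bit clear" agree on the boundary values.
static_assert(signBitClear<int32_t>(0) == (0 > -1), "");
static_assert(signBitClear<int32_t>(-1) == (-1 > -1), "");
static_assert(signBitClear<int64_t>(INT64_MAX) == (INT64_MAX > -1), "");
static_assert(signBitClear<int64_t>(INT64_MIN) == (INT64_MIN > -1), "");
```

The #31 and #63 operands in the test file's CHECK lines correspond to these two widths.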

llvm/trunk/test/CodeGen/AArch64/tbz-tbnz.ll

				; RUN: llc -O1 -march=aarch64 < %s | FileCheck %s

				declare void @t()

				define void @test1(i32 %a) {
				; CHECK-LABEL: @test1
				entry:
				%sub = add nsw i32 %a, -12
				%cmp = icmp slt i32 %sub, 0
				br i1 %cmp, label %if.then, label %if.end

				; CHECK: sub [[CMP:w[0-9]+]], w0, #12
				; CHECK: tbz [[CMP]], #31

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test2(i64 %a) {
				; CHECK-LABEL: @test2
				entry:
				%sub = add nsw i64 %a, -12
				%cmp = icmp slt i64 %sub, 0
				br i1 %cmp, label %if.then, label %if.end

				; CHECK: sub [[CMP:x[0-9]+]], x0, #12
				; CHECK: tbz [[CMP]], #63

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test3(i32 %a) {
				; CHECK-LABEL: @test3
				entry:
				%sub = add nsw i32 %a, -12
				%cmp = icmp sgt i32 %sub, -1
				br i1 %cmp, label %if.then, label %if.end

				; CHECK: sub [[CMP:w[0-9]+]], w0, #12
				; CHECK: tbnz [[CMP]], #31

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test4(i64 %a) {
				; CHECK-LABEL: @test4
				entry:
				%sub = add nsw i64 %a, -12
				%cmp = icmp sgt i64 %sub, -1
				br i1 %cmp, label %if.then, label %if.end

				; CHECK: sub [[CMP:x[0-9]+]], x0, #12
				; CHECK: tbnz [[CMP]], #63

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test5(i32 %a) {
				; CHECK-LABEL: @test5
				entry:
				%sub = add nsw i32 %a, -12
				%cmp = icmp sge i32 %sub, 0
				br i1 %cmp, label %if.then, label %if.end

				; CHECK: sub [[CMP:w[0-9]+]], w0, #12
				; CHECK: tbnz [[CMP]], #31

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test6(i64 %a) {
				; CHECK-LABEL: @test6
				entry:
				%sub = add nsw i64 %a, -12
				%cmp = icmp sge i64 %sub, 0
				br i1 %cmp, label %if.then, label %if.end

				; CHECK: sub [[CMP:x[0-9]+]], x0, #12
				; CHECK: tbnz [[CMP]], #63

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test7(i32 %a) {
				; CHECK-LABEL: @test7
				entry:
				%sub = sub nsw i32 %a, 12
				%cmp = icmp slt i32 %sub, 0
				br i1 %cmp, label %if.then, label %if.end

				; CHECK: sub [[CMP:w[0-9]+]], w0, #12
				; CHECK: tbz [[CMP]], #31

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test8(i64 %val1, i64 %val2, i64 %val3) {
				; CHECK-LABEL: @test8
				%and1 = and i64 %val1, %val2
				%tst1 = icmp slt i64 %and1, 0
				br i1 %tst1, label %if.then1, label %if.end

				; CHECK: tst x0, x1
				; CHECK-NEXT: b.ge .L

				if.then1:
				%and2 = and i64 %val2, %val3
				%tst2 = icmp sge i64 %and2, 0
				br i1 %tst2, label %if.then2, label %if.end

				; CHECK: and [[CMP:x[0-9]+]], x1, x2
				; CHECK-NOT: cmp
				; CHECK: tbnz [[CMP]], #63, .LBB7_5

				if.then2:
				%shifted_op1 = shl i64 %val2, 63
				%shifted_and1 = and i64 %val1, %shifted_op1
				%tst3 = icmp slt i64 %shifted_and1, 0
				br i1 %tst3, label %if.then3, label %if.end

				; CHECK: tst x0, x1, lsl #63
				; CHECK: b.ge .L

				if.then3:
				%shifted_op2 = shl i64 %val2, 62
				%shifted_and2 = and i64 %val1, %shifted_op2
				%tst4 = icmp sge i64 %shifted_and2, 0
				br i1 %tst4, label %if.then4, label %if.end

				; CHECK: tst x0, x1, lsl #62
				; CHECK: b.lt .L

				if.then4:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test9(i64 %val1) {
				; CHECK-LABEL: @test9
				%tst = icmp slt i64 %val1, 0
				br i1 %tst, label %if.then, label %if.end

				; CHECK-NOT: cmp
				; CHECK: tbz x0, #63, .L

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test10(i64 %val1) {
				; CHECK-LABEL: @test10
				%tst = icmp slt i64 %val1, 0
				br i1 %tst, label %if.then, label %if.end

				; CHECK-NOT: cmp
				; CHECK: tbz x0, #63, .L

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test11(i64 %val1, i64* %ptr) {
				; CHECK-LABEL: @test11

				; CHECK: ldr [[CMP:x[0-9]+]], [x1]
				; CHECK-NOT: cmp
				; CHECK: tbz [[CMP]], #63, .L

				%val = load i64* %ptr
				%tst = icmp slt i64 %val, 0
				br i1 %tst, label %if.then, label %if.end

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test12(i64 %val1) {
				; CHECK-LABEL: @test12
				%tst = icmp slt i64 %val1, 0
				br i1 %tst, label %if.then, label %if.end

				; CHECK-NOT: cmp
				; CHECK: tbz x0, #63, .L

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}

				define void @test13(i64 %val1, i64 %val2) {
				; CHECK-LABEL: @test13
				%or = or i64 %val1, %val2
				%tst = icmp slt i64 %or, 0
				br i1 %tst, label %if.then, label %if.end

				; CHECK: orr [[CMP:x[0-9]+]], x0, x1
				; CHECK-NOT: cmp
				; CHECK: tbz [[CMP]], #63, .L

				if.then:
				call void @t()
				br label %if.end

				if.end:
				ret void
				}
