This is an archive of the discontinued LLVM Phabricator instance.

AArch64FastISel: Use cbz/cbnz to branch on i1
ClosedPublic

Authored by MatzeB on Dec 1 2015, 10:37 AM.

Details

Summary

In the case of a conditional branch without a preceding cmp or similar we
used to emit a "and; cmp; b.eq/b.ne" sequence, use tbz/tbnz instead.

rdar://18783875

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB updated this revision to Diff 41535.Dec 1 2015, 10:37 AM
MatzeB retitled this revision from to AArch64FastISel: Use cbz/cbnz to branch on i1.
MatzeB updated this object.
MatzeB added reviewers: ributzka, t.p.northover.
MatzeB set the repository for this revision to rL LLVM.
MatzeB added a subscriber: llvm-commits.
t.p.northover requested changes to this revision.Dec 1 2015, 10:54 AM
t.p.northover edited edge metadata.

I don't think this is quite right. The i1 content of a register is only the low bit, isn't it? This is most easily seen outside Darwin (we're the only ones that extend incoming paramters in the caller), but I think this gets miscompiled even for us:

define i32 @foo(i32 %in) {
entry:
  %val = trunc i32 %in to i1
  br label %do_test

do_test:
  %tst = phi i1 [%val, %entry]
  br i1 %tst, label %true, label %false

true:

  ret i32 0

false:
  ret i32 1
}

So I think we really do need the TBZ for correctness.

This revision now requires changes to proceed.Dec 1 2015, 10:54 AM

I don't think this is quite right. The i1 content of a register is only the low bit, isn't it? This is most easily seen outside Darwin (we're the only ones that extend incoming paramters in the caller), but I think this gets miscompiled even for us:

define i32 @foo(i32 %in) {
entry:
  %val = trunc i32 %in to i1
  br label %do_test

do_test:
  %tst = phi i1 [%val, %entry]
  br i1 %tst, label %true, label %false

true:

  ret i32 0

false:
  ret i32 1
}

So I think we really do need the TBZ for correctness.

I tried your example and even the non-fast-isel produce by "llc -march arm64 -O0" seems to use plain cbz, is this a bug?

I was looking at TargetLowering::getBooleanContents() comments: "For targets without i1 registers, this gives the nature of the high-bits of boolean values held in types wider than i1." for aarch64 this is "ZeroOrOneBooleanContent".
Is it wrong to rely on this in this context? I admit this warrants an assert(getBooleanContents() != UndefinedBooleanContents) to express my reasoning.

MatzeB updated this revision to Diff 41666.Dec 2 2015, 1:36 PM
MatzeB updated this object.
MatzeB edited edge metadata.

After discussing this with Tim, I realized that the "getBooleanContents() properties" are only guaranteed only for the usual boolean inputs of branches/selects/... and only after selection DAG legalization. So we shouldn't rely on that in the FastISel variants and I changed the code to use tbz/tbnz which should be as efficient anyway.

This new version also removes a BrCond(Trunc) special case which was creating more complicated code than the default case now.

(I also see no regressions on the test-suite with this patch).

t.p.northover accepted this revision.Dec 2 2015, 6:21 PM
t.p.northover edited edge metadata.

This looks fine to me. Thanks for updating the patch!

Tim.

This revision is now accepted and ready to land.Dec 2 2015, 6:21 PM
This revision was automatically updated to reflect the committed changes.