This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.
ClosedPublic

Authored by efriedma on Jun 7 2018, 4:01 PM.

Details

Summary

The inline sequence is very long (about 70 bytes on Thumb1), so it's not a good idea to inline it, especially when optimizing for size.

Diff Detail

Repository
rL LLVM

Event Timeline

efriedma created this revision.Jun 7 2018, 4:01 PM
dmgreen added a subscriber: dmgreen.Jun 8 2018, 8:59 AM

I'm not too familiar with ARM/Thumb codegen. Add the extra RUN with the current output, so we can see the difference?

Should there also be a test to demonstrate the difference between regular codegen and codegen with the optsize/minsize attribute?

I'm not checking for optsize because I don't think it makes sense to inline even when optimizing for speed... although maybe that's not right. The current code for Thumb1 is something like the following (which is essentially computing popcount(nextpoweroftwo(x)-1)).

test:
        .fnstart
@ %bb.0:
        lsrs    r1, r0, #1
        orrs    r1, r0
        lsrs    r0, r1, #2
        orrs    r0, r1
        lsrs    r1, r0, #4
        orrs    r1, r0
        lsrs    r0, r1, #8
        orrs    r0, r1
        lsrs    r1, r0, #16
        orrs    r1, r0
        mvns    r0, r1
        lsrs    r1, r0, #1
        ldr     r2, .LCPI0_0
        ands    r2, r1
        subs    r0, r0, r2
        ldr     r1, .LCPI0_1
        lsrs    r2, r0, #2
        ands    r0, r1
        ands    r2, r1
        adds    r0, r0, r2
        lsrs    r1, r0, #4
        adds    r0, r0, r1
        ldr     r1, .LCPI0_2
        ands    r1, r0
        ldr     r0, .LCPI0_3
        muls    r0, r1, r0
        lsrs    r0, r0, #24
        bx      lr
        .p2align        2
@ %bb.1:
.LCPI0_0:
        .long   1431655765              @ 0x55555555
.LCPI0_1:
        .long   858993459               @ 0x33333333
.LCPI0_2:
        .long   252645135               @ 0xf0f0f0f
.LCPI0_3:
        .long   16843009                @ 0x1010101
.Lfunc_end0:
        .size   test, .Lfunc_end0-test

I like the idea I think. Should this be guarded by some sort of gnueabi though?

__clzsi2 is provided by libgcc/compiler-rt, so it should be generally available. I guess it might be possible to construct an environment where it isn't, but I'm not sure how.

dmgreen accepted this revision.Jun 13 2018, 8:21 AM

So we (Arm Compiler 6) don't ship/compile against compiler-rt. At least not at the moment. I'm not sure why, it's been like that from before my time, we have just survived like that for a long while. I'm guessing our c library has always just filled in the gaps (at least the parts that we need).

I asked some people around here, Peter S suggested this probably didn't count as gnueabi, seeing as it works on many platforms (darwin, windows, etc). We probably have to just presume these rt functions are present, which I think makes sense from a compiler perspective. From reading the libgcc docs, they are always presumed to be present (even with nostdlib, afaict)

I can fix this up downstream, either by adding the function to our c library, or more likely by partially reverting this just in AC6 until we have these functions.

With that out the way, LGTM. This looks like a good codesize improvement.

This revision is now accepted and ready to land.Jun 13 2018, 8:21 AM
This revision was automatically updated to reflect the committed changes.