The inline sequence is very long (about 70 bytes on Thumb1), so it's not a good idea to inline it, especially when optimizing for size.
I'm not too familiar with ARM/Thumb codegen. Add the extra RUN with the current output, so we can see the difference?
Should there also be a test to demonstrate the difference between regular codegen and codegen with the optsize/minsize attribute?
I'm not checking for optsize because I don't think it makes sense to inline even when optimizing for speed... although maybe that's not right. The current code for Thumb1 is something like the following (which is essentially computing popcount(nextpoweroftwo(x)-1)).
test: .fnstart @ %bb.0: lsrs r1, r0, #1 orrs r1, r0 lsrs r0, r1, #2 orrs r0, r1 lsrs r1, r0, #4 orrs r1, r0 lsrs r0, r1, #8 orrs r0, r1 lsrs r1, r0, #16 orrs r1, r0 mvns r0, r1 lsrs r1, r0, #1 ldr r2, .LCPI0_0 ands r2, r1 subs r0, r0, r2 ldr r1, .LCPI0_1 lsrs r2, r0, #2 ands r0, r1 ands r2, r1 adds r0, r0, r2 lsrs r1, r0, #4 adds r0, r0, r1 ldr r1, .LCPI0_2 ands r1, r0 ldr r0, .LCPI0_3 muls r0, r1, r0 lsrs r0, r0, #24 bx lr .p2align 2 @ %bb.1: .LCPI0_0: .long 1431655765 @ 0x55555555 .LCPI0_1: .long 858993459 @ 0x33333333 .LCPI0_2: .long 252645135 @ 0xf0f0f0f .LCPI0_3: .long 16843009 @ 0x1010101 .Lfunc_end0: .size test, .Lfunc_end0-test
So we (Arm Compiler 6) don't ship/compile against compiler-rt. At least not at the moment. I'm not sure why, it's been like that from before my time, we have just survived like that for a long while. I'm guessing our c library has always just filled in the gaps (at least the parts that we need).
I asked some people around here, Peter S suggested this probably didn't count as gnueabi, seeing as it works on many platforms (darwin, windows, etc). We probably have to just presume these rt functions are present, which I think makes sense from a compiler perspective. From reading the libgcc docs, they are always presumed to be present (even with nostdlib, afaict)
I can fix this up downstream, either by adding the function to our c library, or more likely by partially reverting this just in AC6 until we have these functions.
With that out the way, LGTM. This looks like a good codesize improvement.