I'm a long way from home on this one...
Believe it or not, this is one step towards solving PR24818:
https://llvm.org/bugs/show_bug.cgi?id=24818
Some background here:
http://reviews.llvm.org/rL248439
The immediate problem is that ARM uses the default TLI cost settings for count-leading-zeros and count-trailing-zeros. I think these should be considered cheap operations (and therefore fair game for speculation) on any implementation with V6T2 or later.
Another possibility is to simply invert the default settings of the base class hooks. Of the in-tree targets, I'm pretty sure that ARM64 and MIPS should also be treating these ops as cheap, but currently they are not.
With this speculation allowed, the new ARM regression tests in this patch produce this code:
  ctlz:
    clz   r0, r0
    bx    lr
  cttz:
    rbit  r0, r0
    clz   r0, r0
    bx    lr
Instead of:
  ctlz:
    cmp   r0, #0
    moveq r0, #32
    clzne r0, r0
    bx    lr
  cttz:
    cmp    r0, #0
    moveq  r0, #32
    rbitne r0, r0
    clzne  r0, r0
    bx     lr