Commute shift and select in the following pattern:
  shift lhs, (select cond, constant1, constant2) -->
  select cond, (shift lhs, constant1), (shift lhs, constant2)
This is beneficial on x86, where shifting by an immediate is faster than
shifting by a register.
Canonical example:
  return x << (cond ? 3 : 6);
Before this patch:
  mov eax, edi
  xor ecx, ecx
  test esi, esi
  sete cl
  lea ecx, [rcx + 2*rcx]
  add ecx, 3
  shl eax, cl
  ret
After this patch:
  lea eax, [8*rdi]
  shl edi, 6
  test esi, esi
  cmove eax, edi
  ret
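At the source level, the rewrite corresponds roughly to the following C sketch. The function names are hypothetical and exist only for illustration; the shift amounts 3 and 6 are the ones that appear in the assembly above. The actual transform operates on DAG nodes, not on C source.

```c
#include <stdint.h>

/* Original form: the select picks the shift amount, so the shift
   count is only known at run time (shl by cl on x86). */
static uint32_t shift_of_select(uint32_t x, int cond) {
    return x << (cond ? 3 : 6);
}

/* Commuted form: both shifts use constant amounts, and the select
   picks between the two results (cmov on x86). */
static uint32_t select_of_shifts(uint32_t x, int cond) {
    uint32_t by3 = x << 3;
    uint32_t by6 = x << 6;
    return cond ? by3 : by6;
}
```

Both functions return the same value for every input; the second simply does both constant shifts up front and selects afterwards.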
I enabled this folding only on x86. By my reading of the ARM Cortex-A75
optimization guide, this is not beneficial there. (I didn't check other ARM
processors.) I was unable to find a PPC optimization guide that listed
instruction latencies, so I didn't enable it there.
I think we generally have too many overly specific queries like this. Is there any real reason NOT to do this on any target?