GCC can lower a = b * C where C = (2^n + 1) * 2^m to

add w0, w0, w0, lsl n lsl w0, w0, m

also lower C = (2^n - 1) * 2^m to

lsl w1, w0, n sub w0, w1, w0 lsl w0, w0, m

LLVM cannot do either above transformations and generate code like this

mov w8, C mul w0, w0, w8

This change considers the first case, since the second case requires an extra instruction. The change is also very conservative to try not to touch the mul that can be folded into s(u)mull, madd(sub), s(u)madd(sub)l since their costs seem unknown during ISelLowering.

I am also open to suggestions about a better place (machine-combiner???) to implement the transformation. If I have the information of the cycles of 32 and 64bit mul, I can consider more constants such as C = (2^m + 1) * (2^n+1), C = (2^m + 1) * 2^n + 1, or C = ((2^m + 1) * 2^n + 1) * 2^p