As with casts, we need to subtract the cost of the lshr instruction
from the budget, and recurse into the LHS operand.
Seems "pretty obviously correct" to me?
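For illustration only, the budget accounting described above can be sketched as a toy model. This is not the actual expander code: the expression classes, the cost constants, and the name is_high_cost are all hypothetical, and charging a made-up higher cost for a general udiv is an assumption made purely so the example has a contrasting case.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical per-instruction costs; real values would come from the
# target's cost model, not from constants like these.
LSHR_COST = 1
UDIV_COST = 20


@dataclass
class Leaf:
    """An already-available value (free to use)."""
    name: str


@dataclass
class UDiv:
    """Unsigned division of an expression by a constant."""
    lhs: "Expr"
    rhs: int


Expr = Union[Leaf, UDiv]


def is_power_of_2(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def is_high_cost(expr: Expr, budget: int) -> bool:
    """Return True if expanding expr would exceed the budget."""
    if isinstance(expr, Leaf):
        return False
    # udiv by a power of two is expanded as lshr, so charge lshr's cost.
    # Assumption: any other udiv is charged the made-up UDIV_COST.
    budget -= LSHR_COST if is_power_of_2(expr.rhs) else UDIV_COST
    if budget < 0:
        return True
    # Like with casts, only the LHS operand needs to be costed further.
    return is_high_cost(expr.lhs, budget)
```

With these made-up costs, (x udiv 8) udiv 4 fits into a budget of 2 but not into a budget of 1, since each level consumes one unit for its lshr.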
Note that there are a number of other shortcuts we could cost-model:
- ... + (-1 * ...) -> ... - ... <- likely very frequent case
- x - (rem x, power-of-2), which is currently (x udiv power-of-2) * power-of-2 -> x & -power-of-2
- rem x, power-of-2, which is currently x - ((x udiv power-of-2) * power-of-2) -> x & (power-of-2 - 1)
- ... * power-of-2 -> ... << log2(power-of-2) <- likely not very beneficial
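
As a sanity check, the identities behind these shortcuts can be verified on 64-bit unsigned values. The sketch below is only an illustration: the helper name shortcuts_hold and the choice of 64 bits are assumptions, and Python integers are masked by hand to model wrap-around. For a power of two p = 2^k, the masks work out to -p (round down to a multiple of p) and p - 1 (remainder).

```python
MASK = (1 << 64) - 1  # model 64-bit unsigned wrap-around arithmetic


def shortcuts_hold(x: int, y: int, k: int) -> bool:
    """Check the listed rewrites for 64-bit unsigned x, y and p = 2**k."""
    p = 1 << k
    neg_one = MASK  # -1 as a 64-bit unsigned value
    return all([
        # ... + (-1 * ...)  ->  ... - ...
        (x + neg_one * y) & MASK == (x - y) & MASK,
        # (x udiv p) * p  ->  x & -p
        (x // p) * p == x & (-p & MASK),
        # x - ((x udiv p) * p)  ->  x & (p - 1)
        x - (x // p) * p == x & (p - 1),
        # ... * p  ->  ... << log2(p)
        (x * p) & MASK == (x << k) & MASK,
    ])
```

The helper expects 0 <= x, y < 2**64 and 0 <= k < 64; for example, shortcuts_hold(1234567, 89, 4) checks all four rewrites with p = 16.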