LSLFast - a logical shift left up to 3 places.
In Kryo, if a commutative instruction has an LSL on both operands and one LSL can be folded into the instruction's shifted-register form (e.g., add x0, x1, x2, lsl #3), then we should canonicalize the operands so that the one with the smaller shift amount is the operand that is folded.
For example, rather than
lsl x1, x1, #1
add x0, x1, x2, lsl #4
we should prefer
lsl x2, x2, #4
add x0, x2, x1, lsl #1
as this saves a cycle on the add instruction.
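To make the rule concrete, here is a minimal sketch in Python of the operand selection described above. It is only an illustration and not the actual patch: the function name `canonicalize_add` and its string-based output are hypothetical, and it assumes, as in the example, that the non-folded shift is applied in place to its source register.

```python
def canonicalize_add(dst, a, shift_a, b, shift_b):
    """Return the two instructions for dst = (a << shift_a) + (b << shift_b).

    Hypothetical helper: the operand with the smaller shift amount is the
    one folded into the add; the other operand gets a separate lsl.
    On a tie, the original order is kept (b stays folded).
    """
    if shift_a < shift_b:
        folded, folded_shift, plain, plain_shift = a, shift_a, b, shift_b
    else:
        folded, folded_shift, plain, plain_shift = b, shift_b, a, shift_a
    return [
        # Separate shift for the operand with the larger shift amount.
        f"lsl {plain}, {plain}, #{plain_shift}",
        # Shifted-register add folds the smaller shift.
        f"add {dst}, {plain}, {folded}, lsl #{folded_shift}",
    ]
```

Calling `canonicalize_add("x0", "x1", 1, "x2", 4)` yields the preferred sequence from the example, and passing the operands in the opposite order gives the same result, which is the point of canonicalizing.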
I will add commutative instructions after this patch is approved.
I would further narrow this such that we only commute the operands when we know the shift is 3 or fewer. Otherwise, code will be perturbed for no real reason.
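One possible reading of this narrower condition, sketched in Python. The names `FAST_LSL_LIMIT` and `should_commute` are hypothetical, and the extra requirement that the currently folded shift be larger than 3 is my assumption about what "no real reason" implies (swapping two shifts that are both fast, or both slow, gains nothing).

```python
# Assumed limit: LSLFast covers a logical shift left of up to 3 places.
FAST_LSL_LIMIT = 3

def should_commute(folded_shift, other_shift, limit=FAST_LSL_LIMIT):
    """Decide whether to swap which operand's LSL is folded.

    Hypothetical predicate: commute only when the swap moves the folded
    shift from the slow range (> limit) into the fast range (<= limit).
    """
    return other_shift <= limit < folded_shift
```

Under this reading, `should_commute(4, 1)` is true (the example above), while `should_commute(2, 1)` and `should_commute(5, 4)` are false, so code that would not benefit is left alone.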