As far as I can tell, FeatureLSLFast was originally added to specify that an lsl of <= 3 was cheap when folded into an addressing operand, and so should override the one-use checks that are usually intended to make sure we don't perform redundant work. At a later point it also came to mean that add x0, x1, x2, lsl N with N <= 4 was cheap, in that it takes a single cycle rather than the multiple cycles that more complex adds usually take.
This patch splits those two concepts out into separate subtarget features. The biggest change is to AArch64DAGToDAGISel::isWorthFoldingALU, which now makes ALU operations produce an ADDWrs if the shift is <= 4.
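
As a rough illustration of the split, here is a minimal standalone sketch. It is not the code in this patch: SubtargetModel, its two flags, and the three helper functions are hypothetical names chosen for the example, and the real checks in AArch64DAGToDAGISel take more into account than this.

```cpp
// Hypothetical model of the two concepts after the split. None of these
// names are taken from the LLVM sources; they only mirror the description.
struct SubtargetModel {
  bool AddrShiftCheap; // lsl <= 3 is cheap when folded into an address
  bool ALUShiftCheap;  // add/sub with lsl <= 4 takes a single cycle
};

// Addressing-mode case: shifts of up to 3 are treated as cheap.
bool isCheapAddrShift(const SubtargetModel &ST, unsigned ShiftAmt) {
  return ST.AddrShiftCheap && ShiftAmt <= 3;
}

// ALU case: shifts of up to 4 are treated as cheap.
bool isCheapALUShift(const SubtargetModel &ST, unsigned ShiftAmt) {
  return ST.ALUShiftCheap && ShiftAmt <= 4;
}

// A cheap shift overrides the usual one-use check: folding is worthwhile
// either when the shift has a single use (no redundant work) or when the
// subtarget says the folded form costs nothing extra.
bool isWorthFoldingSketch(bool HasOneUse, bool ShiftIsCheap) {
  return HasOneUse || ShiftIsCheap;
}
```

With the two flags separate, a CPU can set only one of them, so a shift of 4 can be cheap in an add without being treated as cheap in an addressing operand.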
Otherwise the patch is mostly NFC, as it tries to keep the subtarget features the same for each CPU. I believe that the Arm OoO CPUs should eventually be changed to a new subtarget feature that specifies that a shift of 2 or 3 with any extend should be treated as cheap (just not shifts of 1 or 4).