- Most CPUs have a dedicated adder and shifter to compute the base address of
loads and stores, hence these are always free to use
- Older CPUs incur 1 extra cycle when doing a load with a left shift by 2. Don't fold the LSL into the base address in these cases; add a new feature for this
Can we add Addr to the name of this feature, to explain that it is about address operands, not add+lsl's? Should we also use Scale2 or Shift1?
From looking at the optimization guides and what we model in the scheduling model (https://github.com/llvm/llvm-project/blob/f6bdfb0b92690403ceef8c1d58adf7a3a733b543/llvm/lib/Target/AArch64/AArch64SchedNeoverseN1.td#L490), it looks like this should be slower for scale2 and scale16; scale4 and scale8 (and scale1, that one's easy) are fast.
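
To make the Scale/Shift naming concrete, here is a rough C++ sketch of the classification described in the comment above. It is not LLVM code: `shiftForScale` and `isFastAddrShift` are hypothetical names, and the fast/slow split is only the reading of the scheduling model given above (scale1, scale4 and scale8 fast; scale2 and scale16 slower), so it may not hold for every core.

```cpp
// Hypothetical helper, not part of LLVM: returns log2(Scale) for a
// power-of-two scale, i.e. the LSL amount of a scaled register-offset address.
constexpr unsigned shiftForScale(unsigned Scale) {
  unsigned Shift = 0;
  while ((1u << Shift) < Scale)
    ++Shift;
  return Shift;
}

// Hypothetical predicate: is folding "LSL #Shift" into a load/store address
// assumed to be free? Assumption taken from the review comment only:
// scale1, scale4 and scale8 are fast; scale2 and scale16 are slower.
constexpr bool isFastAddrShift(unsigned Shift) {
  return Shift == 0 || Shift == 2 || Shift == 3;
}

// Scale2 and Shift1 describe the same operand, which is why either name works.
static_assert(shiftForScale(2) == 1, "scale2 corresponds to LSL #1");
static_assert(shiftForScale(16) == 4, "scale16 corresponds to LSL #4");
```

Under these assumptions a feature name would cover shifts 1 and 4 (scale2 and scale16) as the slow cases, whichever of the two naming schemes is picked.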