When loading an integer type whose size is not a power of two, we split the load into several non-overlapping loads of legal types.
For example, when legalizing an i56 load, it is split into three loads: i32 + i16 + i8.
This change instead uses two i32 loads with an 8-bit overlap to reduce the number of loads.
The motivation comes from ARM64EC (https://reviews.llvm.org/D125418#inline-1267564).
For now, we don't apply this to stores, because overlapping stores would introduce an extra dependency in the CPU load/store queue.
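
To make the two splitting strategies concrete, here is a minimal standalone sketch in plain C++ rather than the actual SelectionDAG legalization code. It assumes a little-endian host. The helper names (`loadAt`, `loadI56ThreeLoads`, `loadI56TwoLoads`) are hypothetical and exist only for this illustration, and placing the second i32 load at byte offset 3 is inferred from the 8-bit overlap described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a value of type T from p + offset.
// memcpy keeps the unaligned access well-defined.
template <typename T>
static T loadAt(const unsigned char *p, std::size_t offset) {
  T v;
  std::memcpy(&v, p + offset, sizeof(T));
  return v;
}

// Non-overlapping split: i32 at byte 0, i16 at byte 4, i8 at byte 6.
// Assumes a little-endian host.
std::uint64_t loadI56ThreeLoads(const unsigned char *p) {
  std::uint64_t lo = loadAt<std::uint32_t>(p, 0);
  std::uint64_t mid = loadAt<std::uint16_t>(p, 4);
  std::uint64_t hi = loadAt<std::uint8_t>(p, 6);
  return lo | (mid << 32) | (hi << 48);
}

// Overlapping split: i32 at byte 0 and i32 at byte 3, so byte 3 is read twice.
// Both loads carry the same bits for byte 3, so OR-ing them is safe.
std::uint64_t loadI56TwoLoads(const unsigned char *p) {
  std::uint64_t lo = loadAt<std::uint32_t>(p, 0);
  std::uint64_t hi = loadAt<std::uint32_t>(p, 3);
  return lo | (hi << 24);
}
```

Both functions read exactly the seven bytes of the i56 value, so neither touches memory past the end of the object. The overlapping version trades one redundant byte read for one fewer load and one fewer shift/or pair.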
The alignment you're passing in here doesn't seem right; you want to pass in the alignment of the load you're planning to generate, right?