Normalize the offset for endianness before checking
whether the store covers the load in ForwardStoreValueToDirectLoad.
Without this we miss some optimizations on big-endian
targets. For example, with a 4-byte store followed
by a 1-byte load that reads the least significant byte of the
stored value, the STCoversLD check would fail (see @test4 in
test/CodeGen/AArch64/load-store-forwarding.ll).
This patch also fixes a problem seen in an out-of-tree target.
The target is big endian, has i40 as a legal type,
and the StoreSize for i40 is 48 bits. So when normalizing
the offset for endianness we need to take the StoreSize into
account (assuming that any padding added when storing into
a larger StoreSize is always placed at the most significant
end).
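
A minimal standalone sketch of the normalization described above, not the code in the patch itself. The helper name normalizeOffset and its parameters are hypothetical; it assumes Offset is the byte distance from the store address to the load address and that padding sits at the most significant end.

```cpp
#include <cstdint>

// Hypothetical helper, not part of LLVM. Offset is the byte distance from the
// store address to the load address. The result counts bytes from the least
// significant byte of the stored value.
constexpr int64_t normalizeOffset(int64_t Offset, uint64_t STStoreSizeInBits,
                                  uint64_t LDStoreSizeInBits, bool IsBigEndian) {
  return IsBigEndian
             ? (static_cast<int64_t>(STStoreSizeInBits) -
                static_cast<int64_t>(LDStoreSizeInBits)) / 8 - Offset
             : Offset;
}

// 4-byte store, 1-byte load of the last byte in memory (big endian):
// that byte is the least significant one, so the normalized offset is 0.
static_assert(normalizeOffset(3, 32, 8, true) == 0, "least significant byte");
// i40 with a 48-bit StoreSize: the least significant byte of the value is at
// address offset 5, which only works out when the StoreSize (48) is used.
static_assert(normalizeOffset(5, 48, 8, true) == 0, "i40 least significant byte");
// Little endian needs no adjustment.
static_assert(normalizeOffset(0, 32, 8, false) == 0, "little endian");
```

Using the 40-bit value size instead of the 48-bit StoreSize in the second case would give (40 - 8) / 8 - 5 = -1, which is the kind of mismatch the StoreSize handling is meant to avoid.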
@niravd do you remember why we have the check (Offset * 8 <= LDMemType.getSizeInBits()) here?
From my point of view it looks wrong. Maybe it is supposed to be (Offset * 8 <= STMemType.getSizeInBits()), i.e. checking that the load starts before the last bit written by the store. But then I guess it is enough to check
(Offset >= 0) && (Offset * 8 + LDMemType.getSizeInBits() <= STMemType.getSizeInBits()). Or are we trying to catch some special case where the int64_t overflows?
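
To make the question concrete, here is a small standalone sketch comparing the two conditions. The function names are hypothetical, the sizes are plain integers rather than EVT queries, and it assumes Offset has already been normalized for endianness and is small enough that Offset * 8 does not overflow.

```cpp
#include <cstdint>

// Hypothetical: only the condition quoted in the question above.
constexpr bool questionedCheck(int64_t Offset, uint64_t LDSizeInBits) {
  return Offset >= 0 && static_cast<uint64_t>(Offset) * 8 <= LDSizeInBits;
}

// Hypothetical: the simplified check suggested above. The load is covered
// when all of its bits fall inside the bits written by the store.
constexpr bool storeCoversLoad(int64_t Offset, uint64_t STSizeInBits,
                               uint64_t LDSizeInBits) {
  return Offset >= 0 &&
         static_cast<uint64_t>(Offset) * 8 + LDSizeInBits <= STSizeInBits;
}

// 32-bit store, 8-bit load of byte 2 (counted from the least significant
// byte): the store covers the load, but the questioned condition rejects it.
static_assert(storeCoversLoad(2, 32, 8), "covered");
static_assert(!questionedCheck(2, 8), "rejected by the questioned condition");
// A load that extends past the stored bits is not covered.
static_assert(!storeCoversLoad(4, 32, 8), "not covered");
```

With these numbers the quoted condition alone would block forwarding for any 1-byte load above byte 1 of a 4-byte store, which is why it looks too strict to me.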