This patch combines endian-independent load sequences that look like this:
(x[3]<<24 | x[2]<<16 | x[1]<<8 | x[0])
into a single load (bswapped if the data endianness is different from the target endianness):
*((int*)x)
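To make the equivalence concrete, here is a small standalone C sketch (not code from the patch). The helper names `load_le32_bytes`, `load_le32_wide`, `bswap32`, and `host_is_little_endian` are made up for illustration. `memcpy` stands in for the 1-aligned wide load, since dereferencing `(int*)x` directly is only safe when `x` happens to be suitably aligned.

```c
#include <stdint.h>
#include <string.h>

/* The original pattern: assemble a little-endian 32-bit value byte by byte.
   Gives the same result on any host, whatever its endianness. */
static uint32_t load_le32_bytes(const unsigned char *x) {
    return ((uint32_t)x[3] << 24) | ((uint32_t)x[2] << 16) |
           ((uint32_t)x[1] << 8)  |  (uint32_t)x[0];
}

/* Hypothetical helper: reverse the four bytes of v. */
static uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical helper: detect host byte order at run time. */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* The combined form: one wide load with alignment 1, byte-swapped only
   when the data endianness (little) differs from the host endianness. */
static uint32_t load_le32_wide(const unsigned char *x) {
    uint32_t v;
    memcpy(&v, x, sizeof v); /* makes no alignment assumption about x */
    return host_is_little_endian() ? v : bswap32(v);
}
```

Both functions should return the same value for any 4-byte input, including a deliberately misaligned pointer such as `buf + 1`.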
One notable issue is alignment: this patch produces loads with alignment 1, regardless of size. In practice, this means that on some targets (ARM comes to mind), the load will be lowered back into the original shift/or sequence. I'm not sure if there's a way to discover a better alignment in this sort of situation. Either way, the combine should always be profitable.
The pattern doesn't show up very often in practice (LNT is still running), so I tried to keep the combine from being too expensive (it is pretty different from the other OR combines).
Also, the combine is aborted if there are *any* stores between the first and last load in the sequence. This needs a loop scanning up from the last instruction. As a safeguard, I capped the number of instructions that loop visits, but by then we already know the combine is otherwise valid, so I'm not sure it's a good idea to abort that late.
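For illustration only, the store check can be modeled on a toy instruction list. This is a sketch under assumptions, not the patch's code: `inst_kind`, `no_store_between`, and `max_scan` are hypothetical names, and a real implementation would walk the compiler's own instruction list rather than an array.

```c
#include <stddef.h>

/* Toy stand-in for an instruction's opcode class (hypothetical). */
enum inst_kind { INST_LOAD, INST_STORE, INST_OTHER };

/* Scan upward from the last load toward the first load of the sequence.
   Returns 1 if no store lies strictly between them and the scan visited
   at most max_scan instructions; returns 0 to abort the combine. */
static int no_store_between(const enum inst_kind *block,
                            size_t first, size_t last, size_t max_scan) {
    size_t scanned = 0;
    for (size_t i = last; i-- > first + 1; ) {
        if (++scanned > max_scan)
            return 0; /* safeguard: give up rather than scan too far */
        if (block[i] == INST_STORE)
            return 0; /* a store may clobber bytes the loads read */
    }
    return 1;
}
```

In this model, hitting the cap is treated the same as finding a store: the combine is simply skipped, which is conservative but can reject an otherwise valid sequence.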
Thanks!
- Ahmed
Since you require all of these loads to be in the same block as the initial instruction when you scan later, why not get the load's parent block here and bail out earlier?