This is a restricted version of the combine in DAGCombiner::MatchLoadCombine.
This tries to recognize patterns like below (assuming a little-endian target):
  s8 *a = ...
  s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
  ->
  s32 val = *((s32)a)

  s8 *a = ...
  s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
  ->
  s32 val = BSWAP(*((s32)a))
(This patch also handles the big-endian target case, in which the first example above requires a BSWAP, and the second does not.)
To recognize the pattern, this searches from the last G_OR in the expression tree:

  Reg   Reg
   \    /
    OR_1   Reg
     \    /
      OR_2
        \     Reg
         ..  /
        OR_Root
Each non-OR register in the tree is put in a list. Each register in the list is then checked to see if it is an appropriate load, optionally combined with a shift.

If every register is a load plus (potentially) a shift, the combine checks whether those shifted loads, when OR'd together, are equivalent to a single wide load (possibly followed by a BSWAP).
To simplify things, this patch
(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a specific location
An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from test/CodeGen/AArch64/load-combine.ll)
At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3, and a 0.4% improvement for CTMark/7zip-benchmark.