The attached code is miscompiled when targeting big-endian at all optimisation levels except for -O0. This should print "checksum = 00000008", but actually prints "checksum = 00000000". It is correctly compiled if I change the statement just before the function call to func_13 from l_15.f0 to l_15.f1 (the result of this expression is unused). The only change this causes in the IR is to change the parameter in the call to func_13 from 0x800000180 to 0x800018000.
The problem seems to be in the 'scalar replacement of aggregates' pass. The problem arises because we have a 7-byte type but the alloca is 8 bytes (because it's 4-byte aligned), which causes the aggregate to be split up into two 4-byte slices except one actually ends up being 3 bytes. The pass takes into account endianness, thus adds a shift instruction when inserting an integer to an alloca store:
if(DL.isBigEndian()) ShAmt = 8 * (DL.getTypeStoreSize(IntTy) - DL.getTypeStoreSize(Ty) - Offset);
In this particular example an integer value of wrong size ({i32}) is passed as parameter to the function that computes the shift amount (‘insertInteger’). This causes a zero shift amount since IntTy = {i64}, Ty = {i32}, and offset = 4 bytes. My patch passes a parameter of Ty = {i24} to ‘insertInteger’.
This line is longer than 80 characters, and needs to be reformatted so that you don't exceed that limit.