This test case (which I hope is free of UB) has two stores of 0 at offsets 20 and 24 into a chunk of memory:
store i32 0, i32* %helper.20.32
store i32 0, i32* %helper.24.32, align 8
and a 64-bit load, aligned to 4 bytes:
%load.helper.20.64 = load i64, i64* %helper.20.64, align 4
This is on AArch32, so during type legalisation the i64 load is split into two 32-bit loads. The second of them:
t35: i32,ch = load<(load 4 from %ir.helper.20.64 + 4)> t21, t37, undef:i32
gets marked as being align 8 (note: it is the base+offset address that is align 8, not the base). It is then deemed not to alias with the store to %helper.24.32, because the alignment just set is interpreted as the base alignment, not the base+offset alignment.
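A minimal sketch (not LLVM's actual code) of the alignment arithmetic involved, assuming power-of-two alignments: the alignment provable for base+offset from the base alignment alone is bounded by the offset, so recording an alignment inferred for base+4 as if it applied to the base is unsound.

```python
from math import gcd

def align_of_offset(offset: int) -> int:
    # Largest power of two dividing `offset`; offset 0 imposes no limit.
    return offset & -offset if offset else 1 << 30

def align_after_offset(base_align: int, offset: int) -> int:
    # Alignment provably held by (base + offset) given only base_align.
    # For powers of two, gcd picks the smaller of the two bounds.
    if offset == 0:
        return base_align
    return gcd(base_align, align_of_offset(offset))
```

For example, with a base known only to be align 8, base+4 can only be proven align 4, even though the concrete address here (offset 24 from an 8-aligned allocation) happens to be 8-aligned.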
The test case seems to need a <4 x i32>, which on ARM is converted to a VLD1_UPD. I believe this pushes certain optimisations until after legalisation. Originally the test needed -combiner-global-alias-analysis, but this version shows the same error without it.
Here I've set the updated alignment only if the alignment holds true for the base.
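The intent of the fix can be sketched as follows (hypothetical helper names, not the patch itself): an alignment inferred for base+offset also holds for the base exactly when the offset is a multiple of that alignment, so only then is it safe to record it as the base alignment.

```python
def maybe_raise_alignment(base_align: int, offset: int,
                          inferred_addr_align: int) -> int:
    # `inferred_addr_align` was proven for (base + offset). It transfers
    # to the base only when the offset is itself that aligned; otherwise
    # keep the original, weaker base alignment.
    if offset % inferred_addr_align == 0:
        return max(base_align, inferred_addr_align)
    return base_align
```

In the failing case above, align 8 was inferred for base+4; since 4 is not a multiple of 8, the base keeps its original align 4 and the bogus no-alias conclusion is avoided.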