This patch improves code generation for the case described in https://llvm.org/bugs/show_bug.cgi?id=28726.
In the int64 store sequence below, %int_tmp and %float_tmp are bundled together into a single i64 value before being stored to memory. If that i64 value has no uses other than the store, it is more efficient to generate separate stores for %int_tmp and %float_tmp.
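One plausible way to end up with such a sequence (a hypothetical reconstruction, not taken from the bug report) is returning a small struct of an int and a float by value; on x86-64, both 32-bit members fit in one eightbyte, so the return value is coerced to i64 and the members are combined with zext/shl/or before being stored into a caller-side temporary:

  /* Hypothetical C source that can lead to the merged-i64 pattern below.
   * The struct and function names are made up for illustration. */
  struct pair { int i; float f; };

  struct pair make_pair(int int_tmp, float float_tmp) {
    struct pair p = { int_tmp, float_tmp };
    return p;   /* coerced to a single i64 return value on x86-64 */
  }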
Instruction sequence of int64 store:

  %ref.tmp = alloca i64, align 8
  %1 = bitcast float %float_tmp to i32
  %sroa.1.ext = zext i32 %1 to i64
  %sroa.1.shift = shl nuw i64 %sroa.1.ext, 32
  %sroa.0.ext = zext i32 %int_tmp to i64
  %sroa.0.insert = or i64 %sroa.1.shift, %sroa.0.ext
  store i64 %sroa.0.insert, i64* %ref.tmp, align 8
Instruction sequence of separate stores:
  %ref.tmp = alloca i64, align 8
  %1 = bitcast i64* %ref.tmp to i32*
  store i32 %int_tmp, i32* %1, align 4
  %2 = getelementptr i32, i32* %1, i64 1
  %3 = bitcast i32* %2 to float*
  store float %float_tmp, float* %3, align 4
The alloca is not necessary for this example (and was misleading to me).
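As a sanity check on why the split is legal, here is a minimal, self-contained C sketch (assuming a little-endian target; all names are made up) showing that the merged i64 store and the two separate 32-bit stores write the same bytes to memory:

  #include <assert.h>
  #include <stdint.h>
  #include <string.h>

  int main(void) {
    int int_tmp = 42;
    float float_tmp = 3.5f;

    /* Merged store: zext/shl/or the two 32-bit values into one i64. */
    uint32_t float_bits;
    memcpy(&float_bits, &float_tmp, sizeof float_bits);
    uint64_t merged = ((uint64_t)float_bits << 32) | (uint32_t)int_tmp;
    unsigned char a[8], b[8];
    memcpy(a, &merged, sizeof a);

    /* Separate stores: write each 32-bit half directly. */
    memcpy(b, &int_tmp, 4);
    memcpy(b + 4, &float_tmp, 4);

    /* On a little-endian target the resulting memory is identical. */
    assert(memcmp(a, b, sizeof a) == 0);
    return 0;
  }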