This was motivated by a bug which caused code like this to be
miscompiled:
declare void @take_ptr(i8*)

define void @test() {
  %addr1.32 = alloca i8
  %addr2.32 = alloca i32, i32 1028
  call void @take_ptr(i8* %addr1.32)
  ret void
}
This was emitting the following assembly to get the value of %addr1:
  add r0, sp, #1020
  add r0, r0, #8
However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this
could not be assembled. The generated object file contained this,
resulting in r0 holding SP+8 rather than SP+1028:
  add r0, sp, #1020
  add r0, sp, #8
The function that generated this sequence looked like it could have caused miscompilations for
other combinations of registers and offsets (though I don't think it is
currently called with these), and the heuristic it used did not match
the emitted code in all cases.
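
For illustration, splitting an SP-relative offset into instructions that Thumb1
can actually encode looks roughly like the following. This is only a sketch:
emitSPPlusOffset and the printed mnemonics are invented for this example, it is
not the code in this patch, and it assumes the remainder after the largest
SP-relative add fits in an 8-bit immediate.

  #include <algorithm>
  #include <cassert>
  #include <cstdio>

  // Thumb1 provides "add Rd, sp, #imm" with imm a multiple of 4 in 0..1020,
  // and the flag-setting "adds Rd, Rd, #imm" with imm in 0..255.
  void emitSPPlusOffset(unsigned DestReg, unsigned Offset) {
    unsigned SPChunk = std::min(Offset & ~3u, 1020u); // largest SP-relative add
    unsigned Rest = Offset - SPChunk;
    assert(Rest <= 255 && "offset would need a longer instruction sequence");
    std::printf("add r%u, sp, #%u\n", DestReg, SPChunk);
    if (Rest != 0)
      std::printf("adds r%u, r%u, #%u\n", DestReg, DestReg, Rest);
  }

Calling emitSPPlusOffset(0, 1028) prints "add r0, sp, #1020" followed by
"adds r0, r0, #8", which is a sequence Thumb1 can encode.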
Why not just fix the implementation of RoundUpToAlignment? Clang seems quite capable of optimising "/Align * Align" when Align is a power of 2, and it looks like the function gets inlined at all of its callsites.
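
For reference, the pattern in question looks something like this (a sketch of
the usual round-up idiom, not necessarily the exact LLVM implementation):

  #include <cassert>
  #include <cstdint>

  // The divide-then-multiply form of rounding Value up to a multiple of Align.
  uint64_t RoundUpToAlignment(uint64_t Value, uint64_t Align) {
    return (Value + Align - 1) / Align * Align;
  }

  // When Align is a power of 2, the division and multiplication reduce to a
  // mask, which is what an optimising compiler is expected to produce anyway.
  uint64_t RoundUpToAlignmentPow2(uint64_t Value, uint64_t Align) {
    assert(Align && (Align & (Align - 1)) == 0 && "Align must be a power of 2");
    return (Value + Align - 1) & ~(Align - 1);
  }

Both forms return 1028 for Value = 1025, Align = 4.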