On win32 x86, SSE vector arguments were not aligned when placed on the stack in an inalloca block, so they lost their alignment.
This caused crashes at run time whenever other code working with arguments of SSE vector types assumed they were aligned.
Example C++ code:
#include <vector>
#include <functional>
#include <emmintrin.h>

struct Id { unsigned handle; };
struct QueryView { __m128 val; unsigned i; };

extern void perform_query(std::function<void(QueryView &)>);

template<typename BT>
void unaligned_stack_store(QueryView &qv, const BT &block) {
  block(Id{qv.i}, _mm_setzero_ps());
}

template<typename BT>
void aligned_read(const BT &block) {
  perform_query([&](QueryView &qv) { block(qv.val); });
}

void external_call(QueryView &qv) {
  unaligned_stack_store(qv, [&](Id id, const __m128 two) -> void {
    aligned_read([&](__m128 &arg) -> void { arg = two; });
  });
}
The body of the closure in std::function will (reasonably) perform an aligned read (movaps) on the argument of __m128 type,
while unaligned_stack_store stores the __m128 register on the stack unaligned, packed tightly after the Id structure.
This results in a crash at run time (unaligned load).
LLVM code (relevant part):
%3 = alloca inalloca <{ %struct.Id, <4 x float> }>, align 4
Assembly disasm (relevant part):
push eax
sub esp, 16
mov eax, esp
xorps xmm0, xmm0
mov dword ptr [eax], ecx
movups xmmword ptr [eax + 4], xmm0 #unaligned!
With this patch, the last line becomes
movups xmmword ptr [eax + 16], xmm0
which is similar to what MSVC produces.
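
The change of offset from 4 to 16 can be illustrated with a small offset calculation. This is only a sketch, not the code from the patch: the helper names (align_to, packed_vec_offset, padded_vec_offset, is_aligned) are hypothetical, and it assumes that the argument block itself starts at a 16-byte-aligned address, which the excerpt above does not show.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; none of these names come from LLVM.

// Round `offset` up to the next multiple of `align` (`align` must be a power of two).
constexpr std::size_t align_to(std::size_t offset, std::size_t align) {
  return (offset + align - 1) & ~(align - 1);
}

// True if `offset` is a multiple of `align`.
constexpr bool is_aligned(std::size_t offset, std::size_t align) {
  return offset % align == 0;
}

constexpr std::size_t kIdSize = 4;     // struct Id holds one 32-bit unsigned
constexpr std::size_t kVecAlign = 16;  // alignment that movaps requires for __m128

// Packed layout, as in <{ %struct.Id, <4 x float> }>: the vector directly follows Id.
constexpr std::size_t packed_vec_offset() { return kIdSize; }

// Padded layout: the vector offset is rounded up to the vector alignment.
constexpr std::size_t padded_vec_offset() { return align_to(kIdSize, kVecAlign); }

// Assumption: the block base is 16-byte aligned, so offsets decide alignment.
static_assert(packed_vec_offset() == 4, "matches [eax + 4] before the patch");
static_assert(padded_vec_offset() == 16, "matches [eax + 16] after the patch");
static_assert(!is_aligned(packed_vec_offset(), kVecAlign), "packed slot is misaligned");
static_assert(is_aligned(padded_vec_offset(), kVecAlign), "padded slot is aligned");
```

Under that assumption, the packed offset of 4 can never satisfy an aligned load, while the padded offset of 16 does.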