This is an archive of the discontinued LLVM Phabricator instance.

X86_32ABI change: fix unaligned vector arguments on stack in inalloca.
AbandonedPublic

Authored by AntonYudintsev on Dec 26 2019, 2:41 PM.

Details

Reviewers
rnk
Summary

On win32 x86, SSE vector arguments were not aligned when put to stack in inalloca, loosing alignment.

That resulted in code that crashes when some other code working with arguments of SSE vector types assuming their alignment

Example of c++ code:

#include <vector>
#include <functional>
#include <emmintrin.h>

struct Id { unsigned handle; };
struct QueryView { __m128 val; unsigned i; };

extern void perform_query(std::function<void(QueryView &)>);


template<typename BT> void unaligned_stack_store( QueryView &qv, const BT & block)
{
  	block( Id{qv.i}, _mm_setzero_ps() );
}

template<typename BT> void aligned_read( const BT & block)
{
	perform_query([&](QueryView& qv) { block(qv.val); });
}

void external_call ( QueryView & qv ) {
{
	unaligned_stack_store(qv,  [&](Id id, const __m128 two)->void{ aligned_read([&](__m128 & arg)->void{arg = two;});});
}}

The body of closure in std::function will (reasonably) make aligned read (movaps) on argument of m128 type.
While unaligned_stack_store will store
m128 register on stack unaligned, tightly with Id structure.
This results in crash during runtime (unaligned load).
LLVM code (relevant part)

%3 = alloca inalloca <{ %struct.Id, <4 x float> }>, align 4

Assembly disasm (relevant part):
push eax
sub esp, 16
mov eax, esp
xorps xmm0, xmm0
mov dword ptr [eax], ecx
movups xmmword ptr [eax + 4], xmm0 #unaligned!

With this patch the last line will be
movups xmmword ptr [eax + 16], xmm0
which is similar to what MSVC produces.

Diff Detail

Event Timeline

AntonYudintsev created this revision.Dec 26 2019, 2:41 PM
rnk added a comment.Dec 26 2019, 3:53 PM

When this code was written it was not possible to make MSVC pass such highly aligned things by value. MSVC would produce a compiler error, IIRC. So, this code was written to assume that fields were always only ever four byte aligned.

It needs tests and a more careful review, which will take some time. Thanks for the info, though.

I totally agree, it requires tests and review.

Today is first time I took a look into LLVM code, and in no way I pretend my fix is the most correct.

However, I would like to share some of information.
With that fix, two of our games (War Thunder and Cuisine Royale) are compiling and working (with another completely unrelated fix/workaround to clang-cl). Without that they were crashing (on a similar code as in example). Not that it is the most thorough test, ofc, but still a test.

I also recall, that back in the days of MSVC 2010 MSVC was not capable at all of passing more than three __m128 arguments to function (in stack), it wasn't compiling and required 4+ argument to be passed by reference.
In a current MSVC (at least 2015+) it is not longer a requirement.

rnk added a comment.Jan 2 2020, 3:28 PM

I totally agree, it requires tests and review.

Today is first time I took a look into LLVM code, and in no way I pretend my fix is the most correct.

However, I would like to share some of information.
With that fix, two of our games (War Thunder and Cuisine Royale) are compiling and working (with another completely unrelated fix/workaround to clang-cl). Without that they were crashing (on a similar code as in example). Not that it is the most thorough test, ofc, but still a test.

I also recall, that back in the days of MSVC 2010 MSVC was not capable at all of passing more than three __m128 arguments to function (in stack), it wasn't compiling and required 4+ argument to be passed by reference.
In a current MSVC (at least 2015+) it is not longer a requirement.

Np, thanks for the report, glad to hear that, this issue aside, things work well. :)

I put together a much larger change to try to handle all of this stuff:
https://reviews.llvm.org/D72114
It ended up being quite involved, since it revises many assumptions I made long ago.

Thank you so much, your fix is ofc way more accurate.
Looking forward to see it merged to trunk!