This is an archive of the discontinued LLVM Phabricator instance.

Differential D71915

X86_32ABI change: fix unaligned vector arguments on stack in inalloca.
AbandonedPublic

Authored by AntonYudintsev on Dec 26 2019, 2:41 PM.

Download Raw Diff

Details

Reviewers

rnk

Summary

On win32 x86, SSE vector arguments were not aligned when put to stack in inalloca, loosing alignment.

That resulted in code that crashes when some other code working with arguments of SSE vector types assuming their alignment

Example of c++ code:

#include <vector>
#include <functional>
#include <emmintrin.h>

struct Id { unsigned handle; };
struct QueryView { __m128 val; unsigned i; };

extern void perform_query(std::function<void(QueryView &)>);


template<typename BT> void unaligned_stack_store( QueryView &qv, const BT & block)
{
  	block( Id{qv.i}, _mm_setzero_ps() );
}

template<typename BT> void aligned_read( const BT & block)
{
	perform_query([&](QueryView& qv) { block(qv.val); });
}

void external_call ( QueryView & qv ) {
{
	unaligned_stack_store(qv,  [&](Id id, const __m128 two)->void{ aligned_read([&](__m128 & arg)->void{arg = two;});});
}}

The body of closure in std::function will (reasonably) make aligned read (movaps) on argument of m128 type.
While unaligned_stack_store will store m128 register on stack unaligned, tightly with Id structure.
This results in crash during runtime (unaligned load).
LLVM code (relevant part)

%3 = alloca inalloca <{ %struct.Id, <4 x float> }>, align 4

Assembly disasm (relevant part):
push eax
sub esp, 16
mov eax, esp
xorps xmm0, xmm0
mov dword ptr [eax], ecx
movups xmmword ptr [eax + 4], xmm0 #unaligned!

With this patch the last line will be
movups xmmword ptr [eax + 16], xmm0
which is similar to what MSVC produces.

Diff Detail

Event Timeline

AntonYudintsev created this revision.Dec 26 2019, 2:41 PM

When this code was written it was not possible to make MSVC pass such highly aligned things by value. MSVC would produce a compiler error, IIRC. So, this code was written to assume that fields were always only ever four byte aligned.

It needs tests and a more careful review, which will take some time. Thanks for the info, though.

I totally agree, it requires tests and review.

Today is first time I took a look into LLVM code, and in no way I pretend my fix is the most correct.

However, I would like to share some of information.
With that fix, two of our games (War Thunder and Cuisine Royale) are compiling and working (with another completely unrelated fix/workaround to clang-cl). Without that they were crashing (on a similar code as in example). Not that it is the most thorough test, ofc, but still a test.

I also recall, that back in the days of MSVC 2010 MSVC was not capable at all of passing more than three __m128 arguments to function (in stack), it wasn't compiling and required 4+ argument to be passed by reference.
In a current MSVC (at least 2015+) it is not longer a requirement.

rnk mentioned this in D72114: [MS] Overhaul how clang passes overaligned args on x86_32.Jan 2 2020, 3:25 PM

In D71915#1796961, @AntonYudintsev wrote:

I totally agree, it requires tests and review.

Today is first time I took a look into LLVM code, and in no way I pretend my fix is the most correct.

However, I would like to share some of information.
With that fix, two of our games (War Thunder and Cuisine Royale) are compiling and working (with another completely unrelated fix/workaround to clang-cl). Without that they were crashing (on a similar code as in example). Not that it is the most thorough test, ofc, but still a test.

I also recall, that back in the days of MSVC 2010 MSVC was not capable at all of passing more than three __m128 arguments to function (in stack), it wasn't compiling and required 4+ argument to be passed by reference.
In a current MSVC (at least 2015+) it is not longer a requirement.

Np, thanks for the report, glad to hear that, this issue aside, things work well. :)

I put together a much larger change to try to handle all of this stuff:
https://reviews.llvm.org/D72114
It ended up being quite involved, since it revises many assumptions I made long ago.

Thank you so much, your fix is ofc way more accurate.
Looking forward to see it merged to trunk!

rnk mentioned this in rG2af74e27ed7d: [MS] Overhaul how clang passes overaligned args on x86_32.Jan 23 2020, 4:12 PM

Was more correctly fixed in https://reviews.llvm.org/D72114

Revision Contents

Path

Size

clang/

lib/

CodeGen/

TargetInfo.cpp

19 lines

Diff 235377

clang/lib/CodeGen/TargetInfo.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

	Show First 20 Lines • Show All 1,854 Lines • ▼ Show 20 Lines
	void			void
	X86_32ABIInfo::addFieldToArgStruct(SmallVector<llvm::Type *, 6> &FrameFields,			X86_32ABIInfo::addFieldToArgStruct(SmallVector<llvm::Type *, 6> &FrameFields,
	CharUnits &StackOffset, ABIArgInfo &Info,			CharUnits &StackOffset, ABIArgInfo &Info,
	QualType Type) const {			QualType Type) const {
	// Arguments are always 4-byte-aligned.			// Arguments are always 4-byte-aligned.
	CharUnits FieldAlign = CharUnits::fromQuantity(4);			CharUnits FieldAlign = CharUnits::fromQuantity(4);

	assert(StackOffset.isMultipleOf(FieldAlign) && "unaligned inalloca struct");			assert(StackOffset.isMultipleOf(FieldAlign) && "unaligned inalloca struct");
				CharUnits FieldStart = StackOffset;
				unsigned StackArgTypeAlignment = getContext().getTypeAlign(Type) / 8;
				// if the type contains an SSE vector type, the alignment has to be 16.
				StackArgTypeAlignment = (StackArgTypeAlignment >= 16 &&
				(isSSEVectorType(getContext(), Type) \|\|
				isRecordWithSSEVectorType(getContext(), Type)))
				? 16
				: MinABIStackAlignInBytes;

				CharUnits argAlign = CharUnits::fromQuantity(StackArgTypeAlignment);

				StackOffset = FieldStart.alignTo(argAlign);
				if (StackOffset != FieldStart) {
				CharUnits NumBytes = StackOffset - FieldStart;
				llvm::Type *Ty = llvm::Type::getInt8Ty(getVMContext());
				Ty = llvm::ArrayType::get(Ty, NumBytes.getQuantity());
				FrameFields.push_back(Ty);
				}

	Info = ABIArgInfo::getInAlloca(FrameFields.size());			Info = ABIArgInfo::getInAlloca(FrameFields.size());
	FrameFields.push_back(CGT.ConvertTypeForMem(Type));			FrameFields.push_back(CGT.ConvertTypeForMem(Type));
	StackOffset += getContext().getTypeSizeInChars(Type);			StackOffset += getContext().getTypeSizeInChars(Type);

	// Insert padding bytes to respect alignment.			// Insert padding bytes to respect alignment.
	CharUnits FieldEnd = StackOffset;			CharUnits FieldEnd = StackOffset;
	StackOffset = FieldEnd.alignTo(FieldAlign);			StackOffset = FieldEnd.alignTo(FieldAlign);
	if (StackOffset != FieldEnd) {			if (StackOffset != FieldEnd) {
	▲ Show 20 Lines • Show All 8,199 Lines • Show Last 20 Lines