This is an archive of the discontinued LLVM Phabricator instance.

Fix AAPCS non-compliance caused by very large structs
ClosedPublic

Authored by olista01 on Jun 25 2014, 9:32 AM.

Summary

This is a fix to the code in clang which inserts padding arguments to ensure that the ARM backend can emit AAPCS-VFP compliant code. To do this, the code needs to track the number of registers which have been allocated. When passing a very large struct (>64 bytes) by value, clang emits IR which takes a pointer to the struct, but the backend converts this back to passing the struct in registers and on the stack. The bug was that clang counted such an argument as using only one register, so in some situations it emitted padding arguments incorrectly.
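The bookkeeping described in the summary can be illustrated with a minimal standalone sketch. GPRTracker and freeAfterLargeStruct are hypothetical names invented for this illustration and are not part of clang; the update rule mirrors ARMABIInfo::markAllocatedGPRs from the diff under review. The sketch only shows how many of r0-r3 the tracker believes are still free after a by-value struct larger than 64 bytes, depending on how many registers that struct is charged. Clang's actual padding decision is not modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for the GPR bookkeeping in ARMABIInfo (not clang's class).
struct GPRTracker {
  unsigned AllocatedGPRs = 0;

  // Alignment and NumRequired are measured in 4-byte registers.
  void markAllocatedGPRs(unsigned Alignment, unsigned NumRequired) {
    assert((Alignment == 1 || Alignment == 2) && "Alignment must be 4 or 8 bytes");
    // An 8-byte aligned argument skips an odd-numbered register.
    if (Alignment == 2 && (AllocatedGPRs & 0x1))
      AllocatedGPRs += 1;
    AllocatedGPRs += NumRequired;
  }

  // Number of r0-r3 the tracker still considers free.
  unsigned freeGPRs() const {
    return AllocatedGPRs >= 4 ? 0 : 4 - AllocatedGPRs;
  }
};

// Free GPRs remaining after a first argument that is a >64-byte byval struct,
// when the tracker charges it RegsCharged registers.
unsigned freeAfterLargeStruct(unsigned RegsCharged) {
  GPRTracker T;
  T.markAllocatedGPRs(1, RegsCharged);
  return T.freeGPRs();
}
```

Charging the struct one register leaves the tracker believing three GPRs are still free, which is the state the summary describes as leading to incorrect padding; charging it four leaves none.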

Event Timeline

olista01 updated this revision to Diff 10837. Jun 25 2014, 9:32 AM

olista01 retitled this revision to "Fix AAPCS non-compliance caused by very large structs".

olista01 updated this object.

olista01 edited the test plan for this revision.

olista01 added a subscriber: Unknown Object (MLST).

Herald added subscribers: mroth, aemerson. Jun 25 2014, 9:32 AM

rengolin added a subscriber: rengolin. Jun 27 2014, 4:54 AM

rengolin added inline comments.

lib/CodeGen/TargetInfo.cpp
4206	Couldn't this be: NumRegs = (getContext().getTypeSize(Ty) + 63) / 32; markAllocatedGPRs(2, NumRegs); and avoid the extra multiplication?

olista01 added inline comments.

lib/CodeGen/TargetInfo.cpp
4206	The division by 64 is part of rounding up to the nearest multiple of 64 [0], and this simplification would give different results if the size is, for example, 64 bits. That said, this code is only used when the size is greater than 64 bytes, meaning it will always use all available GPRs, so I only left the calculations in to make the intention clear. I wouldn't be averse to replacing both arms of the "if" with markAllocatedGPRs(1, 4) and a comment, if you would prefer that? [0] This is because all types in the AAPCS have a size which is a multiple of their alignment.
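The difference between the two calculations can be checked with a short sketch. The exact expression in the earlier diff is not shown in this archive, so regsRoundedTo64 below is an assumed reconstruction based on the description above (round the size up to a whole number of 64-bit units, then count two 32-bit registers per unit); regsSimplified is the form suggested in the review. Both function names are hypothetical.

```cpp
#include <cstdint>

// Assumed shape of the original calculation: round the size up to whole
// 64-bit units, then count two 32-bit registers per unit.
constexpr uint64_t regsRoundedTo64(uint64_t SizeInBits) {
  return ((SizeInBits + 63) / 64) * 2;
}

// The simplification suggested in the review.
constexpr uint64_t regsSimplified(uint64_t SizeInBits) {
  return (SizeInBits + 63) / 32;
}

// For a 64-bit type the two forms disagree: 2 registers versus 3.
static_assert(regsRoundedTo64(64) == 2, "one 64-bit unit is two registers");
static_assert(regsSimplified(64) == 3, "127 / 32 truncates to 3");

// For a 17-int struct (544 bits) both exceed the 4 available GPRs anyway.
static_assert(regsRoundedTo64(544) == 18, "nine 64-bit units");
static_assert(regsSimplified(544) == 18, "607 / 32 truncates to 18");
```

Because this code path is only reached for types larger than 64 bytes, either result is far above 4, which is why a constant of 4 is sufficient.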

rengolin added inline comments. Jun 27 2014, 6:02 AM

lib/CodeGen/TargetInfo.cpp
4206	Of course, silly me. I think making it 4 and explaining why would make more sense.

Simplify, with comment explaining why this can be done.

This patch also fixes the (textual) alignment of the NumRequired parameter in the definition of ARMABIInfo::markAllocatedGPRs, which is present in the patch I uploaded but not showing up in Phab. This may be because it is a whitespace-only change?

LGTM.

Thanks!
--renato

This revision is now accepted and ready to land. Jun 27 2014, 6:58 AM

Thanks, Committed revision 211898.

Revision Contents

Path                            Size
lib/CodeGen/TargetInfo.cpp      9 lines
test/CodeGen/arm-aapcs-vfp.c    6 lines

Diff 10929

lib/CodeGen/TargetInfo.cpp

  for (unsigned I = 0; I < 16; I++)
    VFPRegs[I] = 1;
  AllocatedVFPs = 17; // We do not have enough VFP registers.
}

/// Update AllocatedGPRs to record the number of general purpose registers
/// which have been allocated. It is valid for AllocatedGPRs to go above 4,
/// this represents arguments being stored on the stack.
void ARMABIInfo::markAllocatedGPRs(unsigned Alignment,
                                   unsigned NumRequired) const {
  assert((Alignment == 1 || Alignment == 2) && "Alignment must be 4 or 8 bytes");

  if (Alignment == 2 && AllocatedGPRs & 0x1)
    AllocatedGPRs += 1;

  AllocatedGPRs += NumRequired;
}

  ABIArgInfo ARMABIInfo::classifyArgumentType(QualType Ty, bool isVariadic,
  [...]
    // most 8-byte. We realign the indirect argument if type alignment is bigger
    // than ABI alignment.
    uint64_t ABIAlign = 4;
    uint64_t TyAlign = getContext().getTypeAlign(Ty) / 8;
    if (getABIKind() == ARMABIInfo::AAPCS_VFP ||
        getABIKind() == ARMABIInfo::AAPCS)
      ABIAlign = std::min(std::max(TyAlign, (uint64_t)4), (uint64_t)8);
    if (getContext().getTypeSizeInChars(Ty) > CharUnits::fromQuantity(64)) {
-     // Update Allocated GPRs
-     markAllocatedGPRs(1, 1);
+     // Update Allocated GPRs. Since this is only used when the size of the
+     // argument is greater than 64 bytes, this will always use up any available
+     // registers (of which there are 4). We also don't care about getting the
+     // alignment right, because general-purpose registers cannot be back-filled.
+     markAllocatedGPRs(1, 4);
      return ABIArgInfo::getIndirect(TyAlign, /*ByVal=*/true,
                                     /*Realign=*/TyAlign > ABIAlign);
    }

    // Otherwise, pass by coercing to a structure of the appropriate size.
    llvm::Type* ElemTy;
    unsigned SizeRegs;
    // FIXME: Try to match the types of the arguments more accurately where
    // we can.
    if (getContext().getTypeAlign(Ty) <= 32) {
  [...]

test/CodeGen/arm-aapcs-vfp.c

  [...]
  typedef struct { int a; int b:4; int c; } struct_int_bitfield_int;
  // CHECK: define arm_aapcs_vfpcc void @test_test_vfp_stack_gpr_split_bitfield(double %a, double %b, double %c, double %d, double %e, double %f, double %g, double %h, double %i, i32 %j, i32 %k, [2 x i32], { [3 x i32] } %l.coerce)
  void test_test_vfp_stack_gpr_split_bitfield(double a, double b, double c, double d, double e, double f, double g, double h, double i, int j, int k, struct_int_bitfield_int l) {}

  // Note: this struct requires internal padding
  typedef struct { int x; long long y; } struct_int_long_long;
  // CHECK: define arm_aapcs_vfpcc void @test_vfp_stack_gpr_split_4(double %a, double %b, double %c, double %d, double %e, double %f, double %g, double %h, double %i, i32 %j, [3 x i32], { [2 x i64] } %k.coerce)
  void test_vfp_stack_gpr_split_4(double a, double b, double c, double d, double e, double f, double g, double h, double i, int j, struct_int_long_long k) {}
+
+ // This very large struct (passed byval) uses up the GPRs, so no padding is needed
+ typedef struct { int x[17]; } struct_seventeen_ints;
+ typedef struct { int x[4]; } struct_four_ints;
+ // CHECK: define arm_aapcs_vfpcc void @test_vfp_stack_gpr_split_5(%struct.struct_seventeen_ints* byval align 4 %a, double %b, double %c, double %d, double %e, double %f, double %g, double %h, double %i, double %j, { [4 x i32] } %k.coerce)
+ void test_vfp_stack_gpr_split_5(struct_seventeen_ints a, double b, double c, double d, double e, double f, double g, double h, double i, double j, struct_four_ints k) {}
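
As a rough check on why the new test exercises the changed code path, the sketch below confirms that a 17-int struct is 68 bytes and so exceeds the 64-byte threshold tested in classifyArgumentType, while the 4-int struct does not. exceedsByValThreshold is a hypothetical helper written for this illustration, and int32_t stands in for ARM's 4-byte int so the sizes hold on any host.

```cpp
#include <cstddef>
#include <cstdint>

// int32_t stands in for ARM's 4-byte int so the sizes match on any host.
struct struct_seventeen_ints { int32_t x[17]; };
struct struct_four_ints { int32_t x[4]; };

// Hypothetical helper mirroring the "> 64 bytes" comparison in the diff.
constexpr bool exceedsByValThreshold(std::size_t SizeInBytes) {
  return SizeInBytes > 64;
}

static_assert(sizeof(struct_seventeen_ints) == 68, "17 * 4 bytes");
static_assert(sizeof(struct_four_ints) == 16, "4 * 4 bytes");
static_assert(exceedsByValThreshold(sizeof(struct_seventeen_ints)),
              "the large struct takes the changed code path");
static_assert(!exceedsByValThreshold(sizeof(struct_four_ints)),
              "the small struct does not");
```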