This is an archive of the discontinued LLVM Phabricator instance.

Use Store Size not Alloc Size when Coercing
ClosedPublic

Authored by tjablin on Aug 27 2014, 6:38 PM.


Details

Reviewers

Summary

Previously, EnterStructPointerForCoercedAccess used the alloc size when determining how to convert. This was problematic because there were situations where the alloc size was larger than the store size. For example, if the first element of a structure were i24 and the destination type were i32, the old code would generate a GEP and a load i24. The code should compare store sizes to ensure the whole object is loaded. I have attached a test case.
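To make the i24 example concrete, here is a minimal sketch of how the two sizes differ. It is a simplified model, not LLVM's DataLayout: the names IntTypeModel, storeSize, and allocSize are hypothetical, and the 4-byte ABI alignment for i24 is an assumption (it matches typical x86-64 data layouts).

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (assumption): an integer type described only by its bit
// width and its ABI alignment in bytes. This is not LLVM's DataLayout class.
struct IntTypeModel {
  std::uint64_t bits;
  std::uint64_t abiAlignBytes;
};

// Bytes touched by a load or store: the bit width rounded up to whole bytes.
constexpr std::uint64_t storeSize(IntTypeModel t) { return (t.bits + 7) / 8; }

// Bytes reserved per object: the store size rounded up to the ABI alignment.
constexpr std::uint64_t allocSize(IntTypeModel t) {
  return (storeSize(t) + t.abiAlignBytes - 1) / t.abiAlignBytes *
         t.abiAlignBytes;
}

// Assumed alignments: i24 and i32 are both 4-byte aligned in this sketch.
constexpr IntTypeModel I24{24, 4};
constexpr IntTypeModel I32{32, 4};

void checkSizes() {
  assert(storeSize(I24) == 3);  // a load of i24 reads 3 bytes
  assert(allocSize(I24) == 4);  // but an i24 object occupies 4 bytes
  assert(storeSize(I32) == 4);
  assert(allocSize(I32) == 4);
}
```

Under these assumptions the alloc size of i24 equals the size of i32 while its store size does not, which is the situation the summary describes.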

This patch modifies the output of the arm64-be-bitfield.c test case, but the new IR seems to be equivalent, and after -O3, the compiler generates identical ARM assembly (asr x0, x0, #54). All tests pass with r216480. Thanks!

Diff Detail

Event Timeline

tjablin updated this revision to Diff 13016. Aug 27 2014, 6:38 PM

tjablin retitled this revision from to Use Store Size not Alloc Size when Coercing.

tjablin updated this object.

tjablin edited the test plan for this revision.

tjablin added a reviewer: jmolloy.

tjablin added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Aug 27 2014, 6:38 PM

Hi Thomas,

This generally looks good to me with one change.

Cheers,

James

lib/CodeGen/CGCall.cpp
Lines 647–649: This comment is confusing. "Use the store size and not the alloca size here to ensure we will actually load the whole object" - But the alloca size is always greater than or equal to the store size. So the comment seems wrong - if we use the alloca size, we are also guaranteed to load the whole object. Also please terminate sentences with a full stop (.).

This revision now requires changes to proceed. Aug 28 2014, 2:01 AM

I have adjusted the comment as per your request. If you are satisfied, could you please push it upstream for me? Thanks.

Add REQUIRES: aarch64-registered-target since the new test looks at the ARM assembly.

Review closed - code was committed.

This revision is now accepted and ready to land. Sep 2 2014, 2:42 AM

jmolloy closed this revision. Sep 2 2014, 2:42 AM

Revision Contents

Path                                Size
lib/CodeGen/CGCall.cpp              8 lines
test/CodeGen/24-bit.c               14 lines
test/CodeGen/arm64-be-bitfield.c    12 lines

Diff 13063

lib/CodeGen/CGCall.cpp

Context not available.
   llvm::Type *FirstElt = SrcSTy->getElementType(0);

   // If the first elt is at least as large as what we're looking for, or if the
-  // first element is the same size as the whole struct, we can enter it.
+  // first element is the same size as the whole struct, we can enter it. The
+  // comparison must be made on the store size and not the alloca size. Using
+  // the alloca size may overstate the size of the load.
   uint64_t FirstEltSize =
-    CGF.CGM.getDataLayout().getTypeAllocSize(FirstElt);
+    CGF.CGM.getDataLayout().getTypeStoreSize(FirstElt);
   if (FirstEltSize < DstSize &&
-      FirstEltSize < CGF.CGM.getDataLayout().getTypeAllocSize(SrcSTy))
+      FirstEltSize < CGF.CGM.getDataLayout().getTypeStoreSize(SrcSTy))
     return SrcPtr;

   // GEP into the first element.
Context not available.
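The effect of the CGCall.cpp change can be sketched in isolation. The helper entersFirstElement below is hypothetical and only mirrors the condition in the diff above; the byte sizes passed to it are assumed for illustration (an i24 first element with store size 3 and alloc size 4, a 4-byte destination, and a 4-byte struct).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the condition in the diff: returns true when
// the coercion code would GEP into the first struct element, and false when
// it would keep the pointer to the whole struct (the `return SrcPtr` path).
constexpr bool entersFirstElement(std::uint64_t firstEltSize,
                                  std::uint64_t dstSize,
                                  std::uint64_t structSize) {
  return !(firstEltSize < dstSize && firstEltSize < structSize);
}

void checkCoercionDecision() {
  // Assumed sizes in bytes for illustration only.
  const std::uint64_t dstSize = 4;     // destination type i32
  const std::uint64_t structSize = 4;  // enclosing struct
  // Old behaviour: the alloc size of i24 (4) is not smaller than the
  // destination, so the code enters the element and loads only an i24.
  assert(entersFirstElement(4, dstSize, structSize));
  // New behaviour: the store size of i24 (3) is smaller than both, so the
  // code keeps the struct pointer and the full 4 bytes get loaded.
  assert(!entersFirstElement(3, dstSize, structSize));
}
```

With these assumed sizes, switching the first argument from the alloc size to the store size is what flips the decision.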

test/CodeGen/24-bit.c

+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -O0 -o - %s | FileCheck %s
+
+static union ibtt2
+{
+  struct ibtt0 { signed ibt0:10; unsigned short ibt1; } ibt5;
+  struct ibtt1 { signed ibt2:3; signed ibt3:9; signed ibt4:9; } ibt6;
+} ibt15 = {{267, 15266}};
+
+void callee_ibt0f(union ibtt2 ibtp5);
+
+void test(void) {
+// CHECK: = load i32*
+  callee_ibt0f(ibt15);
+}

test/CodeGen/arm64-be-bitfield.c

-// RUN: %clang_cc1 -triple aarch64_be-linux-gnu -ffreestanding -emit-llvm -O0 -o - %s | FileCheck %s
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64_be-linux-gnu -ffreestanding -emit-llvm -O0 -o - %s | FileCheck --check-prefix IR %s
+// RUN: %clang_cc1 -triple aarch64_be-linux-gnu -ffreestanding -S -O1 -o - %s | FileCheck --check-prefix ARM %s

 struct bt3 { signed b2:10; signed b3:10; } b16;

-// The correct right-shift amount is 40 bits for big endian.
+// Get the high 32-bits and then shift appropriately for big-endian.
 signed callee_b0f(struct bt3 bp11) {
-// CHECK: = lshr i64 %{{.*}}, 40
+// IR: callee_b0f(i64 [[ARG:%.*]])
+// IR: store i64 [[ARG]], i64* [[PTR:%.*]]
+// IR: [[BITCAST:%.*]] = bitcast i64* [[PTR]] to i8*
+// IR: call void @llvm.memcpy.p0i8.p0i8.i64(i8* {{.*}}, i8* [[BITCAST]], i64 4
+// ARM: asr x0, x0, #54
   return bp11.b2;
 }
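
The ARM check line expects asr x0, x0, #54. That shift is consistent with the signed 10-bit field b2 occupying the top 10 bits of the 64-bit argument register, since 64 - 10 = 54. A minimal sketch of that extraction follows; extractTopSigned is a hypothetical helper, and it assumes two's-complement conversion and an arithmetic right shift on signed values (guaranteed since C++20, typical of earlier compilers).

```cpp
#include <cassert>
#include <cstdint>

// Sign-extending extraction of the top `width` bits of a 64-bit value. For
// width == 10 this is the operation `asr x0, x0, #54` performs.
// Assumes 1 <= width <= 64 and an arithmetic right shift on signed values.
inline std::int64_t extractTopSigned(std::uint64_t reg, unsigned width) {
  return static_cast<std::int64_t>(reg) >> (64 - width);
}

void checkBitfieldExtract() {
  // 267 placed in the top 10 bits reads back as 267.
  assert(extractTopSigned(std::uint64_t{267} << 54, 10) == 267);
  // All ones in the top 10 bits reads back as -1.
  assert(extractTopSigned(std::uint64_t{0x3FF} << 54, 10) == -1);
}
```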