
Details

Reviewers

jdoerfert
tianshilei1992
uenoku
homerdin
sstefan1
baziotis
lebedev.ri

Commits

rG38fc89623b3e: [Attributor][Fix] Add alignment return attribute to HeapToStack

Summary

This patch changes the HeapToStack optimization to attach the return alignment
attribute information to the created alloca instruction. Previously, replacing
the heap allocation with an alloca could cause problems because the alloca did
not respect the alignment of the original heap allocation, which would
typically be aligned on an 8 or 16 byte boundary. Malloc calls now contain
alignment attributes, so we can use that information here.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60 ms	x64 debian > LLVM.Bindings/Go::go.test

Event Timeline

jhuber6 created this revision. · Dec 16 2021, 10:02 AM

Herald added a reviewer: uenoku. · Dec 16 2021, 10:02 AM

Herald added a reviewer: homerdin.

Herald added subscribers: ormris, okura, kuter and 2 others.

jhuber6 requested review of this revision. · Dec 16 2021, 10:02 AM

Herald added a reviewer: sstefan1. · Dec 16 2021, 10:02 AM

Herald added a reviewer: baziotis.

Herald added a project: Restricted Project.

Herald added a subscriber: llvm-commits.

Isn't the default alignment a target-dependent attribute?

Shouldn't/can't you query the alignment for that pointer?
Hardcoding anything like this is a sign of a problem.

In D115888#3198139, @lebedev.ri wrote:

Shouldn't/can't you query the alignment for that pointer?
Hardcoding anything like this is a sign of a problem.

I haven't found an interface in LLVM to query default alignment information from malloc. As far as I know, the malloc functions are defined by the GNU documentation to be aligned to 16 on 64-bit systems, and 8 on 32-bit systems and Clang behaves similarly. A more complete solution would be to change the __kmpc_alloc_shared RTL function to be an aligned malloc function. That way in Clang we can just use the natural alignment of the underlying type, or query the target info from there. But I don't think that is explicitly necessary because we just need to mimic the pointer's alignment as it would be from the malloc call since we're replacing it with an alloca. If you know of somewhere I can query the default malloc alignment from within LLVM I can add that, or I can just check the data layout and use the 8 / 16 distinction used in https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html#Aligned-Memory-Blocks. I'd prefer not to change the OpenMP RTL to include the alignment, as that would change a lot of tests and code for little gain I can see.
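The 8 / 16 fallback mentioned in this comment can be sketched as follows. This is only an illustration of the rule from the glibc manual; `defaultMallocAlignment` and its `pointerSizeInBits` parameter are hypothetical names, not an existing LLVM interface, and the pointer width is assumed to come from the module's data layout.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): fallback malloc alignment in bytes,
// following the 8 / 16 distinction described in the glibc manual.
// pointerSizeInBits is assumed to be read from the module's data layout.
constexpr std::uint64_t defaultMallocAlignment(unsigned pointerSizeInBits) {
  return pointerSizeInBits >= 64 ? 16 : 8;
}

static_assert(defaultMallocAlignment(64) == 16, "64-bit systems: 16 bytes");
static_assert(defaultMallocAlignment(32) == 8, "32-bit systems: 8 bytes");
```

A rule like this still hardcodes two constants, which is the concern raised in the review.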

In D115888#3198155, @jhuber6 wrote:

In D115888#3198139, @lebedev.ri wrote:

Shouldn't/can't you query the alignment for that pointer?
Hardcoding anything like this is a sign of a problem.

I haven't found an interface in LLVM to query default alignment information from malloc.

I'm talking about attributor's AAAlign attribute here.

This revision now requires changes to proceed. · Dec 16 2021, 10:28 AM

In D115888#3198157, @lebedev.ri wrote:

In D115888#3198155, @jhuber6 wrote:

In D115888#3198139, @lebedev.ri wrote:

Shouldn't/can't you query the alignment for that pointer?
Hardcoding anything like this is a sign of a problem.

I haven't found an interface in LLVM to query default alignment information from malloc.

I'm talking about attributor's AAAlign attribute here.

I'm not sure if deriving it is a solution here, considering that we are replacing a runtime call with defined default alignment with an alloca that should always at least match that. I can try using AAAlign to query it. My first thought was to use the element type of the bitcast that always follows the __kmpc_alloc_shared call, but @jdoerfert told me to just pick whatever the default is when I asked. This issue comes from https://github.com/kokkos/kokkos/issues/4224.

Then i guess you need to basically introduce an interface to do what https://en.cppreference.com/w/cpp/types/max_align_t does, but based on a datalayout.

Harbormaster completed remote builds in B139693: Diff 394915. · Dec 16 2021, 11:29 AM

In D115888#3198178, @lebedev.ri wrote:

Then i guess you need to basically introduce an interface to do what https://en.cppreference.com/w/cpp/types/max_align_t does, but based on a datalayout.

Seems reasonable. I know we can query this information from clang, e.g. https://clang.llvm.org/doxygen/classclang_1_1TargetInfo.html#a01403a5106161d4d3cd0c50c43150f89, but I don't think there is an existing string in the data layout to encode this. Will I be adding a new format for this? That would be a reasonably large change so I just want to make sure I'm on the right page.

In D115888#3198322, @jhuber6 wrote:

In D115888#3198178, @lebedev.ri wrote:

Then i guess you need to basically introduce an interface to do what https://en.cppreference.com/w/cpp/types/max_align_t does, but based on a datalayout.

Seems reasonable. I know we can query this information from clang, e.g. https://clang.llvm.org/doxygen/classclang_1_1TargetInfo.html#a01403a5106161d4d3cd0c50c43150f89, but I don't think there is an existing string in the data layout to encode this. Will I be adding a new format for this? That would be a reasonably large change so I just want to make sure I'm on the right page.

Err, no. I'm simply thinking that datalayout already specifies the primitive [scalar] types, so you should just need to go through them and pick the one with maximal alignment requirement, and pick it.

tschuett added a subscriber: tschuett. · Dec 16 2021, 12:04 PM

tschuett added inline comments.

llvm/lib/Transforms/IPO/AttributorAttributes.cpp
5931–5933	Would a comment help to explain what the hard-coded 16 means?

In D115888#3198329, @lebedev.ri wrote:

In D115888#3198322, @jhuber6 wrote:

In D115888#3198178, @lebedev.ri wrote:

Then i guess you need to basically introduce an interface to do what https://en.cppreference.com/w/cpp/types/max_align_t does, but based on a datalayout.

Seems reasonable. I know we can query this information from clang, e.g. https://clang.llvm.org/doxygen/classclang_1_1TargetInfo.html#a01403a5106161d4d3cd0c50c43150f89, but I don't think there is an existing string in the data layout to encode this. Will I be adding a new format for this? That would be a reasonably large change so I just want to make sure I'm on the right page.

Err, no. I'm simply thinking that datalayout already specifies the primitive [scalar] types, so you should just need to go through them and pick the one with maximal alignment requirement, and pick it.

The default data layout contains a 128 bit float, so if we just check the maximum alignment we'll always get at least 16, even on 32-bit architectures. I could only consider the ones set explicitly by the data layout string, but doesn't that go against the purpose of the defaults?

In D115888#3198672, @jhuber6 wrote:

In D115888#3198329, @lebedev.ri wrote:

In D115888#3198322, @jhuber6 wrote:

In D115888#3198178, @lebedev.ri wrote:

Then i guess you need to basically introduce an interface to do what https://en.cppreference.com/w/cpp/types/max_align_t does, but based on a datalayout.

Seems reasonable. I know we can query this information from clang, e.g. https://clang.llvm.org/doxygen/classclang_1_1TargetInfo.html#a01403a5106161d4d3cd0c50c43150f89, but I don't think there is an existing string in the data layout to encode this. Will I be adding a new format for this? That would be a reasonably large change so I just want to make sure I'm on the right page.

Err, no. I'm simply thinking that datalayout already specifies the primitive [scalar] types, so you should just need to go through them and pick the one with maximal alignment requirement, and pick it.

The default data layout contains a 128 bit float,

For which target/architecture? What happens on other target/architectures?

so if we just check the maximum alignment we'll always get at least 16, even on 32-bit architectures. I could only consider the ones set explicitly by the data layout string, but doesn't that go against the purpose of the defaults?

In D115888#3198696, @lebedev.ri wrote:

For which target/architecture? What happens on other target/architectures?

This is from the documentation on the data layout. It defines the default values used when initializing the data layout. It seems these can only be overridden, and I can't imagine a situation where someone would override it to define a 128 bit float to have 64-bit alignment, so the largest alignment we'll have in the data layout will always be at least 16 bytes.

When constructing the data layout for a given target, LLVM starts with a default set of specifications which are then (possibly) overridden by the specifications in the datalayout keyword. The default specifications are given in this list:

e - little endian
p:64:64:64 - 64-bit pointers with 64-bit alignment.
p[n]:64:64:64 - Other address spaces are assumed to be the same as the default address space.
S0 - natural stack alignment is unspecified
i1:8:8 - i1 is 8-bit (byte) aligned
i8:8:8 - i8 is 8-bit (byte) aligned
i16:16:16 - i16 is 16-bit aligned
i32:32:32 - i32 is 32-bit aligned
i64:32:64 - i64 has ABI alignment of 32-bits but preferred alignment of 64-bits
f16:16:16 - half is 16-bit aligned
f32:32:32 - float is 32-bit aligned
f64:64:64 - double is 64-bit aligned
f128:128:128 - quad is 128-bit aligned
v64:64:64 - 64-bit vector is 64-bit aligned
v128:128:128 - 128-bit vector is 128-bit aligned
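To make the objection concrete, here is a small standalone sketch that takes the maximum ABI alignment over the default specifications listed above. `TypeSpec`, `defaultSpecs`, and `maxAbiAlignBytes` are illustrative names only; this is a simplified model and does not use LLVM's `DataLayout` class.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Simplified, hypothetical model of one "<kind><size>:<abi>:<pref>" entry.
struct TypeSpec {
  char kind;         // 'i' integer, 'f' float, 'v' vector
  unsigned sizeBits; // size of the type in bits
  unsigned abiBits;  // ABI alignment in bits
};

// The default scalar and vector specifications quoted from the documentation.
inline std::vector<TypeSpec> defaultSpecs() {
  return {{'i', 1, 8},   {'i', 8, 8},   {'i', 16, 16},  {'i', 32, 32},
          {'i', 64, 32}, {'f', 16, 16}, {'f', 32, 32},  {'f', 64, 64},
          {'f', 128, 128}, {'v', 64, 64}, {'v', 128, 128}};
}

// Largest ABI alignment, in bytes, over all given specifications.
inline std::uint64_t maxAbiAlignBytes(const std::vector<TypeSpec> &specs) {
  unsigned maxBits = 8; // never report less than one byte
  for (const TypeSpec &s : specs)
    maxBits = std::max(maxBits, s.abiBits);
  return maxBits / 8;
}
```

With the defaults, the `f128` and `v128` entries push the result to 16 bytes regardless of pointer width, which is why this approach cannot distinguish 32-bit from 64-bit targets unless a target overrides those entries.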

I think the most straightforward way to solve this is to add an alignment attribute to the return value when we generate code, then just copy that when we replace it. It'll change some tests but I'll try that.

Changing the method. Adding alignment information to the runtime call and using
it when we create the alloca. I might need to add the alignment information to
the runtime call to make the implementation sound, but I haven't encountered any
problems with the runtime implementation.

Herald added a project: Restricted Project. · Dec 16 2021, 4:03 PM

Herald added subscribers: cfe-commits, asavonic.

Harbormaster completed remote builds in B139768: Diff 395023. · Dec 16 2021, 4:58 PM

jhuber6 retitled this revision from [Attributor][Fix] Add default alignment to HeapToStack to [Attributor][Fix] Add alignment return attribute to HeapToStack. · Dec 17 2021, 8:30 AM

jhuber6 edited the summary of this revision.

jdoerfert added inline comments. · Dec 17 2021, 9:47 AM

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1411 ↗	(On Diff #395023)	This doesn't work. If the type alignment is > 8, the stack won't fulfill it unless you modify `constexpr const uint32_t Alignment = 8;` (documented as `/// Add worst-case padding so that future allocations are properly aligned.`) in `openmp/libomptarget/DeviceRTL/src/State.cpp`. The fact that the state has a fixed alignment right now makes it impossible to allocate higher aligned types anyway. Proposal: Add an argument to _alloc_shared that is the alignment as computed above, effectively making it _alloc_shared_aligned. Modify the stack to actually align the base pointer rather than extend the allocation based on the alignment passed in. Then any type alignment can be handled, including user aligned types.
1475 ↗	(On Diff #395023)	Not needed. Will cause a warning, no?
llvm/lib/Transforms/IPO/AttributorAttributes.cpp
5942	This is sensible but needs a test. You can even do it without the else for all allocations. With the proposed changes above alloc_shared would also fall into the aligned_alloc case.
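The proposal to align the base pointer, rather than pad every allocation, can be illustrated with a generic bump allocator. This is a sketch under stated assumptions: `BumpStack`, `push`, and `alignUp` are hypothetical names, and the code is not the DeviceRTL implementation in `State.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round addr up to the next multiple of align (align must be a power of two).
inline std::uintptr_t alignUp(std::uintptr_t addr, std::uintptr_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "power of two expected");
  return (addr + align - 1) & ~(align - 1);
}

// Hypothetical bump-allocated stack over a fixed region of memory.
struct BumpStack {
  std::uintptr_t base; // start address of the backing storage
  std::size_t size;    // capacity in bytes
  std::size_t top;     // offset of the first free byte

  // Aligns the start of the new allocation to the requested alignment.
  // Returns the aligned address, or 0 if the request does not fit.
  std::uintptr_t push(std::size_t bytes, std::size_t align) {
    std::uintptr_t start = alignUp(base + top, align);
    std::size_t offset = start - base;
    if (offset > size || bytes > size - offset)
      return 0;
    top = offset + bytes;
    return start;
  }
};
```

Because the alignment is applied per request, a caller asking for 32- or 64-byte alignment is served correctly, while small requests do not pay worst-case padding.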

I will split this into two revisions, one handling the return alignment attribute in the Attributor, and one adding alignment information to the __kmpc_alloc_shared OpenMP runtime call, turning it into an aligned allocation.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1411 ↗	(On Diff #395023)	That was an original thought. I was hoping to avoid the extra work, but I think this is definitely the only way to solve this reasonably, and it might also allow us to use the stack more efficiently. We'll still want this alignment information, but we'll need to inform the runtime of the expected alignment.
1475 ↗	(On Diff #395023)	Forgot about this, not intended to be included.
llvm/lib/Transforms/IPO/AttributorAttributes.cpp
5942	Yes, we want this regardless because all `malloc` like calls now seem to have alignment attributes, which makes sure we respect the alignment of the original malloc call. I can probably split this into another patch.

Removing OpenMP code, only adding support for return alignments. Fixing OpenMP
will occur in a following patch.

Removing else if, we should be able to check for all allocations.
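The resulting rule, where the return alignment is consulted for every allocation kind and the `aligned_alloc` argument is folded in on top, can be modeled outside of LLVM as below. Plain integers stand in for LLVM's `Align` and `MaybeAlign`, with 0 meaning "no alignment known"; `pickAllocaAlignment` and `AllocationKind` here are illustrative names for this sketch only.

```cpp
#include <algorithm>
#include <cstdint>

enum class AllocationKind { Malloc, Calloc, AlignedAlloc };

// Standalone model of the alignment chosen for the replacement alloca.
// retAlign: alignment from the call's return attribute, 0 if absent.
// alignedAllocArg: constant alignment argument of aligned_alloc, 0 if unknown.
inline std::uint64_t pickAllocaAlignment(std::uint64_t retAlign,
                                         AllocationKind kind,
                                         std::uint64_t alignedAllocArg) {
  std::uint64_t alignment = 1; // an alloca is at least byte aligned
  // Checked for all allocation kinds, not only in an else branch.
  if (retAlign != 0)
    alignment = std::max(alignment, retAlign);
  // aligned_alloc additionally contributes its explicit alignment argument.
  if (kind == AllocationKind::AlignedAlloc && alignedAllocArg != 0)
    alignment = std::max(alignment, alignedAllocArg);
  return alignment;
}
```

For example, a `malloc` call annotated with `align 16` yields 16, matching the `alloca i8, i64 4, align 16` expected in the updated test.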

LG, don't forget to update the commit message.

jhuber6 edited the summary of this revision. · Dec 17 2021, 11:45 AM

Harbormaster completed remote builds in B139889: Diff 395187. · Dec 17 2021, 12:26 PM

jhuber6 added a child revision: D115971: [OpenMP][FIX] Change globalization alignment to 16. · Dec 17 2021, 2:04 PM

Still LG

This revision was not accepted when it landed; it landed in state Needs Review. · Dec 27 2021, 1:58 PM

This revision was landed with ongoing or failed builds.

Closed by commit rG38fc89623b3e: [Attributor][Fix] Add alignment return attribute to HeapToStack (authored by jhuber6).

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG38fc89623b3e: [Attributor][Fix] Add alignment return attribute to HeapToStack.

Diff 395187

llvm/lib/Transforms/IPO/AttributorAttributes.cpp


(5,922 lines not shown; enclosing context: for (auto &It : AllocationInfos) {)
          IRBuilder<> B(AI.CB);
          Size = B.CreateMul(Num, SizeT, "h2s.calloc.size");
        } else if (AI.Kind == AllocationInfo::AllocationKind::ALIGNED_ALLOC) {
          Size = AI.CB->getOperand(1);
        } else {
          Size = AI.CB->getOperand(0);
        }

        Align Alignment(1);
+       if (MaybeAlign RetAlign = AI.CB->getRetAlign())
+         Alignment = max(Alignment, RetAlign);
        if (AI.Kind == AllocationInfo::AllocationKind::ALIGNED_ALLOC) {
          Optional<APInt> AlignmentAPI =
              getAPInt(A, this, AI.CB->getArgOperand(0));
          assert(AlignmentAPI.hasValue() &&
                 "Expected an alignment during manifest!");
          Alignment =
              max(Alignment, MaybeAlign(AlignmentAPI.getValue().getZExtValue()));
        }

        unsigned AS = cast<PointerType>(AI.CB->getType())->getAddressSpace();
        Instruction *Alloca =
            new AllocaInst(Type::getInt8Ty(F->getContext()), AS, Size, Alignment,
                           "", AI.CB->getNextNode());

        if (Alloca->getType() != AI.CB->getType())
          Alloca = new BitCastInst(Alloca, AI.CB->getType(), "malloc_bc",
                                   Alloca->getNextNode());
(4,000 lines not shown)

llvm/test/Transforms/Attributor/heap_to_stack.ll

(28 lines not shown)
  declare void @free(i8* nocapture)

  declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) nounwind

  define void @h2s_value_simplify_interaction(i1 %c, i8* %A) {
  ; IS________OPM-LABEL: define {{[^@]+}}@h2s_value_simplify_interaction
  ; IS________OPM-SAME: (i1 [[C:%.*]], i8* nocapture nofree readnone [[A:%.*]]) {
  ; IS________OPM-NEXT: entry:
- ; IS________OPM-NEXT: [[M:%.*]] = tail call noalias i8* @malloc(i64 noundef 4)
+ ; IS________OPM-NEXT: [[M:%.*]] = tail call noalias align 16 i8* @malloc(i64 noundef 4)
  ; IS________OPM-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]]
  ; IS________OPM: t:
  ; IS________OPM-NEXT: br i1 false, label [[DEAD:%.*]], label [[F2:%.*]]
  ; IS________OPM: f:
  ; IS________OPM-NEXT: br label [[J:%.*]]
  ; IS________OPM: f2:
  ; IS________OPM-NEXT: [[C1:%.*]] = bitcast i8* [[M]] to i32*
  ; IS________OPM-NEXT: [[C2:%.*]] = bitcast i32* [[C1]] to i8*
- ; IS________OPM-NEXT: [[L:%.*]] = load i8, i8* [[C2]], align 1
+ ; IS________OPM-NEXT: [[L:%.*]] = load i8, i8* [[C2]], align 16
  ; IS________OPM-NEXT: call void @usei8(i8 [[L]])
- ; IS________OPM-NEXT: call void @no_sync_func(i8* nocapture nofree noundef [[C2]]) #[[ATTR5:[0-9]+]]
+ ; IS________OPM-NEXT: call void @no_sync_func(i8* nocapture nofree noundef align 16 [[C2]]) #[[ATTR5:[0-9]+]]
  ; IS________OPM-NEXT: br label [[J]]
  ; IS________OPM: dead:
  ; IS________OPM-NEXT: unreachable
  ; IS________OPM: j:
  ; IS________OPM-NEXT: [[PHI:%.*]] = phi i8* [ [[M]], [[F]] ], [ null, [[F2]] ]
- ; IS________OPM-NEXT: tail call void @no_sync_func(i8* nocapture nofree noundef [[PHI]]) #[[ATTR5]]
+ ; IS________OPM-NEXT: tail call void @no_sync_func(i8* nocapture nofree noundef align 16 [[PHI]]) #[[ATTR5]]
  ; IS________OPM-NEXT: ret void
  ;
  ; IS________NPM-LABEL: define {{[^@]+}}@h2s_value_simplify_interaction
  ; IS________NPM-SAME: (i1 [[C:%.*]], i8* nocapture nofree readnone [[A:%.*]]) {
  ; IS________NPM-NEXT: entry:
- ; IS________NPM-NEXT: [[TMP0:%.*]] = alloca i8, i64 4, align 1
+ ; IS________NPM-NEXT: [[TMP0:%.*]] = alloca i8, i64 4, align 16
  ; IS________NPM-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]]
  ; IS________NPM: t:
  ; IS________NPM-NEXT: br i1 false, label [[DEAD:%.*]], label [[F2:%.*]]
  ; IS________NPM: f:
  ; IS________NPM-NEXT: br label [[J:%.*]]
  ; IS________NPM: f2:
- ; IS________NPM-NEXT: [[L:%.*]] = load i8, i8* [[TMP0]], align 1
+ ; IS________NPM-NEXT: [[L:%.*]] = load i8, i8* [[TMP0]], align 16
  ; IS________NPM-NEXT: call void @usei8(i8 [[L]])
- ; IS________NPM-NEXT: call void @no_sync_func(i8* nocapture nofree noundef [[TMP0]]) #[[ATTR6:[0-9]+]]
+ ; IS________NPM-NEXT: call void @no_sync_func(i8* nocapture nofree noundef align 16 [[TMP0]]) #[[ATTR6:[0-9]+]]
  ; IS________NPM-NEXT: br label [[J]]
  ; IS________NPM: dead:
  ; IS________NPM-NEXT: unreachable
  ; IS________NPM: j:
  ; IS________NPM-NEXT: [[PHI:%.*]] = phi i8* [ [[TMP0]], [[F]] ], [ null, [[F2]] ]
- ; IS________NPM-NEXT: tail call void @no_sync_func(i8* nocapture nofree noundef [[PHI]]) #[[ATTR6]]
+ ; IS________NPM-NEXT: tail call void @no_sync_func(i8* nocapture nofree noundef align 16 [[PHI]]) #[[ATTR6]]
  ; IS________NPM-NEXT: ret void
  ;
  entry:
    %add = add i64 2, 2
-   %m = tail call noalias i8* @malloc(i64 %add)
+   %m = tail call align 16 noalias i8* @malloc(i64 %add)
    br i1 %c, label %t, label %f
  t:
    br i1 false, label %dead, label %f2
  f:
    br label %j
  f2:
    %c1 = bitcast i8* %m to i32*
    %c2 = bitcast i32* %c1 to i8*
(761 lines not shown)

This is an archive of the discontinued LLVM Phabricator instance.

[Attributor][Fix] Add alignment return attribute to HeapToStack
Closed, Public
