This is an archive of the discontinued LLVM Phabricator instance.

[Attributor][Fix] Add alignment return attribute to HeapToStack
ClosedPublic

Authored by jhuber6 on Dec 16 2021, 10:02 AM.

Details

Summary

This patch changes the HeapToStack optimization to attach the return alignment
attribute information to the created alloca instruction. This would cause
problems when replacing the heap allocation with an alloca did not respect the
alignment of the original heap allocation, which would typically be aligned on
an 8 or 16 byte boundary. Malloc calls now contain alignment attributes,
so we can use that information here.

Diff Detail

Event Timeline

jhuber6 created this revision.Dec 16 2021, 10:02 AM
jhuber6 requested review of this revision.Dec 16 2021, 10:02 AM
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript

Isn't the default alignment is a target dependent attribute?

Shouldn't/can't you query the alignment for that pointer?
Hardcoding anything like this is a sign of a problem.

Shouldn't/can't you query the alignment for that pointer?
Hardcoding anything like this is a sign of a problem.

I haven't found an interface in LLVM to query default alignment information from malloc. As far as I know, the malloc functions are defined by the GNU documentation to be aligned to 16 on 64-bit systems, and 8 on 32-bit systems and Clang behaves similarly. A more complete solution would be to change the __kmpc_alloc_shared RTL function to be an aligned malloc function. That way in Clang we can just use the natural alignment of the underlying type, or query the target info from there. But I don't think that is explicitly necessary because we just need to mimic the pointer's alignment as it would be from the malloc call since we're replacing it with an alloca. If you know of somewhere I can query the default malloc alignment from within LLVM I can add that, or I can just check the data layout and use the 8 / 16 distinction used in https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html#Aligned-Memory-Blocks. I'd prefer not to change the OpenMP RTL to include the alignment, as that would change a lot of tests and code for little gain I can see.

lebedev.ri requested changes to this revision.Dec 16 2021, 10:28 AM

Shouldn't/can't you query the alignment for that pointer?
Hardcoding anything like this is a sign of a problem.

I haven't found an interface in LLVM to query default alignment information from malloc.

I'm talking about attributor's AAAlign attribute here.

This revision now requires changes to proceed.Dec 16 2021, 10:28 AM

Shouldn't/can't you query the alignment for that pointer?
Hardcoding anything like this is a sign of a problem.

I haven't found an interface in LLVM to query default alignment information from malloc.

I'm talking about attributor's AAAlign attribute here.

I'm not sure if deriving it is a solution here, considering that we are replacing a runtime call with defined default alignment with an alloca that should always at least match that. I can try using AAAlign to query it, my first thought was to use the element type of the bitcast that always follows the __kmpc_alloc_shared call, but @jdoerfert just told me to just pick whatever the default is when I asked. This issue comes from https://github.com/kokkos/kokkos/issues/4224.

Then i guess you need to basically introduce an interface to do what https://en.cppreference.com/w/cpp/types/max_align_t does, but based on a datalayout.

Then i guess you need to basically introduce an interface to do what https://en.cppreference.com/w/cpp/types/max_align_t does, but based on a datalayout.

Seems reasonable. I know we can query this information from clang, e.g. https://clang.llvm.org/doxygen/classclang_1_1TargetInfo.html#a01403a5106161d4d3cd0c50c43150f89, but I don't think there is an existing string in the data layout to encode this. Will I be adding a new format this? That would be a reasonably large change so I just want to make sure I'm on the right page.

Then i guess you need to basically introduce an interface to do what https://en.cppreference.com/w/cpp/types/max_align_t does, but based on a datalayout.

Seems reasonable. I know we can query this information from clang, e.g. https://clang.llvm.org/doxygen/classclang_1_1TargetInfo.html#a01403a5106161d4d3cd0c50c43150f89, but I don't think there is an existing string in the data layout to encode this. Will I be adding a new format this? That would be a reasonably large change so I just want to make sure I'm on the right page.

Err, no. I'm simply thinking that datalayout already specifies the primitive [scalar] types, so you should just need to go through them and pick the one with maximal alignment requirement, and pick it.

tschuett added inline comments.
llvm/lib/Transforms/IPO/AttributorAttributes.cpp
5931–5933

Would a comment help to explain what the hard-coded 16 means?

Then i guess you need to basically introduce an interface to do what https://en.cppreference.com/w/cpp/types/max_align_t does, but based on a datalayout.

Seems reasonable. I know we can query this information from clang, e.g. https://clang.llvm.org/doxygen/classclang_1_1TargetInfo.html#a01403a5106161d4d3cd0c50c43150f89, but I don't think there is an existing string in the data layout to encode this. Will I be adding a new format this? That would be a reasonably large change so I just want to make sure I'm on the right page.

Err, no. I'm simply thinking that datalayout already specifies the primitive [scalar] types, so you should just need to go through them and pick the one with maximal alignment requirement, and pick it.

The default data layout contains a 128 bit float, so if we just check the maximum alignment we'll always get at least 16, even on 32-bit architectures. I could only consider the ones set explicitly by the data layout string, but doesn't that go against the purpose of the defaults?

Then i guess you need to basically introduce an interface to do what https://en.cppreference.com/w/cpp/types/max_align_t does, but based on a datalayout.

Seems reasonable. I know we can query this information from clang, e.g. https://clang.llvm.org/doxygen/classclang_1_1TargetInfo.html#a01403a5106161d4d3cd0c50c43150f89, but I don't think there is an existing string in the data layout to encode this. Will I be adding a new format this? That would be a reasonably large change so I just want to make sure I'm on the right page.

Err, no. I'm simply thinking that datalayout already specifies the primitive [scalar] types, so you should just need to go through them and pick the one with maximal alignment requirement, and pick it.

The default data layout contains a 128 bit float,

For which target/architecture? What happens on other target/architectures?

so if we just check the maximum alignment we'll always get at least 16, even on 32-bit architectures. I could only consider the ones set explicitly by the data layout string, but doesn't that go against the purpose of the defaults?

For which target/architecture? What happens on other target/architectures?

This is from the documentation on the data layout. It defines the default values used when initializing the data layout. It seems these can only be overridden, and I can't imagine a situation where someone would override it to define a 128 bit float to have 64-bit alignment, so the largest alignment we'll have in the data layout will always be at least 16 bytes.

When constructing the data layout for a given target, LLVM starts with a default set of specifications which are then (possibly) overridden by the specifications in the datalayout keyword. The default specifications are given in this list:

e - little endian
p:64:64:64 - 64-bit pointers with 64-bit alignment.
p[n]:64:64:64 - Other address spaces are assumed to be the same as the default address space.
S0 - natural stack alignment is unspecified
i1:8:8 - i1 is 8-bit (byte) aligned
i8:8:8 - i8 is 8-bit (byte) aligned
i16:16:16 - i16 is 16-bit aligned
i32:32:32 - i32 is 32-bit aligned
i64:32:64 - i64 has ABI alignment of 32-bits but preferred alignment of 64-bits
f16:16:16 - half is 16-bit aligned
f32:32:32 - float is 32-bit aligned
f64:64:64 - double is 64-bit aligned
f128:128:128 - quad is 128-bit aligned
v64:64:64 - 64-bit vector is 64-bit aligned
v128:128:128 - 128-bit vector is 128-bit aligned

I think the most straightforward way to solve this is to add an alignment attribute to the return value when we generate code, then just copy that when we replace it. It'll change some tests but I'll try that.

jhuber6 updated this revision to Diff 395023.Dec 16 2021, 4:03 PM

Changing the method. Adding alignment information to the runtime call and using
it when we create the alloca. I might need to add the alignment information to
the runtime call to make the implementation sound, but I haven't encountered any
problems with the runtime implementation.

Herald added a project: Restricted Project. · View Herald TranscriptDec 16 2021, 4:03 PM
jhuber6 retitled this revision from [Attributor][Fix] Add default alignment to HeapToStack to [Attributor][Fix] Add alignment return attribute to HeapToStack.Dec 17 2021, 8:30 AM
jhuber6 edited the summary of this revision. (Show Details)
jdoerfert added inline comments.Dec 17 2021, 9:47 AM
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1411 ↗(On Diff #395023)

This doesn't work. If the type alignment is > 8 the stack won't fulfill it unless you modify

/// Add worst-case padding so that future allocations are properly aligned.
constexpr const uint32_t Alignment = 8;

in openmp/libomptarget/DeviceRTL/src/State.cpp.
The fact that the state has a fixed alignment right now makes it impossible to allocate higher aligned types anyway.

Proposal:
Add an argument to _alloc_shared that is the alignment as computed above, effecitively making it _alloc_shared_aligned. Modify the stack to actually align the base pointer rather than extend the allocation based on the alignment passed in. Then any type alignment can be handled, including user aligned types.

1475 ↗(On Diff #395023)

Not needed. Will cause a warning, no?

llvm/lib/Transforms/IPO/AttributorAttributes.cpp
5942

This is sensible but needs a test. You can even do it without the else for all allocations. With the proposed changes above alloc_shared would also fall into the aligned_alloc case.

I will split this into two revisions, one handling the return alignment attribute in the Attributor, and one adding alignment information to the __kmpc_alloc_shared OpenMP runtime call, turning it into an aligned allocation.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1411 ↗(On Diff #395023)

That was an original though, I was hoping to avoid the extra work, but I think this is definitely the only way to solve this reasonably, it might also allow us to use the stack more efficiently. We'll still want this alignment information, but we'll need to inform the runtime of the expected alignment.

1475 ↗(On Diff #395023)

Forgot about this, not intended to be included.

llvm/lib/Transforms/IPO/AttributorAttributes.cpp
5942

Yes, we want this regardless because all malloc like calls now seem to have alignment attributes, which makes sure we respect the alignment of the original malloc call. I can probably split this into another patch.

jhuber6 updated this revision to Diff 395182.Dec 17 2021, 11:37 AM

Removing OpenMP code, only adding support for return alignments. Fixing OpenMP
will occur in a following patch.

jhuber6 updated this revision to Diff 395187.Dec 17 2021, 11:39 AM

Removing else if, we should be able to check for all allocations.

jdoerfert accepted this revision.Dec 17 2021, 11:42 AM

LG, don't forget to update the commit message.

jhuber6 edited the summary of this revision. (Show Details)Dec 17 2021, 11:45 AM
This revision was not accepted when it landed; it landed in state Needs Review.Dec 27 2021, 1:58 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.