This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/docs/
-
docs/
2
LangRef.rst
-
ReleaseNotes.rst

Differential D81311

[RFC] LangRef: Define byref parameter attribute
ClosedPublic

Authored by arsenm on Jun 5 2020, 3:22 PM.

Download Raw Diff

Details

Reviewers

rjmccall
jdoerfert
efriedma
t-tye
yaxunl
scott.linder
rnk
spatel
lebedev.ri
nlopes
fhahn
hfinkel
Anastasia

Summary

This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.

My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the argument,
although nothing does this in reality. There is also talk of removing
and replacing the byval attribute, so a new attribute would need to
take its place anyway.

I went with the name inmem to mirror inreg. Other name ideas I had are
"indirect", "abitype", or "pointee". One question I have is whether
this attribute is necessary, or if the definition of preallocated can
be refined to fit this case. The current description is heavy on what
it means for the call site, but I didn't understand the implications
for the callee. For the amdgpu_kernel use case, calls are illegal so
all of the details about call setup are irrelevant.

This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.

Background:

There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.

It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.

The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.

This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.

Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.

The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.

Additional context is in D79744.

Diff Detail

Event Timeline

arsenm created this revision.Jun 5 2020, 3:22 PM

Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2020, 3:22 PM

Herald added subscribers: tpr, wdng. · View Herald Transcript

On a normal, stack-based machine, there are two ways to pass an aggregate to a function: directly, or indirectly. Directly means something like "byval" or "preallocated": the value of the pointer is not a value the caller can control, but instead refers to some fixed offset on the stack. Indirectly means that the caller actually passes a pointer to the function; essentially, the function actually takes a pointer as an argument, not an aggregate.

For byval/preallocated/inalloca, the callee "owns" the memory, loosely speaking. If that's not the case here, it's probably a bad idea to use the existing markings.

However, it is technically permitted for the callee to introduce a write to the argument, although nothing does this in reality

Realistically, there isn't much incentive for optimizations to introduce new writes to byval arguments, but they definitely could in the future. (It might make sense to optimize out a memcpy from a byval argument to an alloca in more cases.)

The optimizer never pieces this back together to see that this is really just a copy from constant memory, so we end up stuck with expensive stack usage.

If the code writes to the variable, or the address escapes, I think there's no getting around the expensive stack usage. If we can prove there are only reads, we can eliminate the alloca in more cases, I guess.

Overall, this approach seems reasonable.

llvm/docs/LangRef.rst
1081	Not sure "unless it is known this will not produce an observable change in the caller" is really useful. "It is not generally permissible to introduce a write" should be clear enough; the memory model implies that a "write" might actually correspond to multiple store instructions.
1089	As far as I can tell, this doesn't include any provision for the caller to actually allocate "inmem" memory. Maybe we should forbid calls with "inmem" arguments. (The function would have to be called by some code not written in LLVM IR.)

Most of the generalized optimization properties of this attribute seem to be redundant with existing attributes. What are the properties you're trying to convey exactly? The argument is a pointer to memory of a certain size and alignment, and that memory is known to be exclusive to this call and also constant?

I don't think in practice those other attributes would ever not be preserved through IR passes, especially for kernels. I guess they could be introduced by other analyses, and you wouldn't want this property to be confused with that? Still, maybe it would be less redundant to just have this attribute require the existence of those other attributes (particularly dereferenceable and align) rather than providing its own structure.

Do you actually need any of the structure of the type beyond a size and alignment?

I think the important bit here is that we need to distinguish between the C function types void f(A) and void f(A*). If they're both lowered to pointers with the same type, we need an attribute to distinguish them.

In D81311#2077868, @rjmccall wrote:

Most of the generalized optimization properties of this attribute seem to be redundant with existing attributes. What are the properties you're trying to convey exactly? The argument is a pointer to memory of a certain size and alignment, and that memory is known to be exclusive to this call and also constant?

It's really just the in memory size and alignment. In this caseonstantness is implicit since I'll use a constant address space for the pointer (and probably mark the parameter with readonly for good measure)

I don't think in practice those other attributes would ever not be preserved through IR passes, especially for kernels. I guess they could be introduced by other analyses, and you wouldn't want this property to be confused with that? Still, maybe it would be less redundant to just have this attribute require the existence of those other attributes (particularly dereferenceable and align) rather than providing its own structure.

I believe byval implies dereferencable already, and I was just trying to be more explicit than what the others state. The alignment behavior also matches what the other attributes already do. I expect in practice to always emit an align attribute

Do you actually need any of the structure of the type beyond a size and alignment?

Not really. I just need to distinguish whether you are are using sizeof(struct Foo) or a pointer sized slot in the argument buffer

In D81311#2077944, @arsenm wrote:

In D81311#2077868, @rjmccall wrote:

Most of the generalized optimization properties of this attribute seem to be redundant with existing attributes. What are the properties you're trying to convey exactly? The argument is a pointer to memory of a certain size and alignment, and that memory is known to be exclusive to this call and also constant?

It's really just the in memory size and alignment. In this case, constantness is implicit since I'll use a constant address space for the pointer (and probably mark the parameter with readonly for good measure)

In principle, readonly has the same issues that lead to this attribute being necessary: it's not really a semantic contract, it's something that can be placed on the argument by IR. But it's probably to separate that.

I don't think in practice those other attributes would ever not be preserved through IR passes, especially for kernels. I guess they could be introduced by other analyses, and you wouldn't want this property to be confused with that? Still, maybe it would be less redundant to just have this attribute require the existence of those other attributes (particularly dereferenceable and align) rather than providing its own structure.

I believe byval implies dereferencable already, and I was just trying to be more explicit than what the others state. The alignment behavior also matches what the other attributes already do. I expect in practice to always emit an align attribute

It's generally best to insist on strong rules up front and then weaken them if there's really a use case. There's no good reason not to carry an alignment.

Do you actually need any of the structure of the type beyond a size and alignment?

Not really. I just need to distinguish whether you are are using sizeof(struct Foo) or a pointer sized slot in the argument buffer

Okay. I think just carrying numbers is probably less representationally problematic.

My thinking here is that we can make this attribute something that we'd just add by default to all indirect arguments, and then the only special case behavior you need for kernels is that (1) there's a stable size and alignment that you can extract into the kernel-invocation system, which is probably not otherwise useful to IR beyond the existing attributes, and (2) the frontend needs to copy from the argument to form the mutable parameter variable.

I wonder if byref would be a better name for this, as a way to say that the object is semantically a direct argument that's being passed by implicit reference.

In D81311#2078235, @rjmccall wrote:

I wonder if byref would be a better name for this, as a way to say that the object is semantically a direct argument that's being passed by implicit reference.

This is probably a better name, but potentially more easily confused with byval.

As far as switching to just the raw number, I think there's value in being consistent with the other growing family of type-carrying parameter attributes but I don't really care about the type itself. I don't understand inalloca/preallocated well enough to know if those should also really only carry a size.

In D81311#2083295, @arsenm wrote:

In D81311#2078235, @rjmccall wrote:

I wonder if byref would be a better name for this, as a way to say that the object is semantically a direct argument that's being passed by implicit reference.

This is probably a better name, but potentially more easily confused with byval.

That seems like an unlikely confusion.

As far as switching to just the raw number, I think there's value in being consistent with the other growing family of type-carrying parameter attributes but I don't really care about the type itself. I don't understand inalloca/preallocated well enough to know if those should also really only carry a size.

I think carrying a type is probably an attempt to insulate them against the future removal of pointer element types. I don't think it's actually necessary in either case and could certainly just be a size and alignment. But if you want to use a type, I agree it's not inconsistent, and as long as you honor an explicit alignment it's fine.

In D81311#2084066, @rjmccall wrote:

In D81311#2083295, @arsenm wrote:

In D81311#2078235, @rjmccall wrote:

I wonder if byref would be a better name for this, as a way to say that the object is semantically a direct argument that's being passed by implicit reference.

This is probably a better name, but potentially more easily confused with byval.

That seems like an unlikely confusion.

As far as switching to just the raw number, I think there's value in being consistent with the other growing family of type-carrying parameter attributes but I don't really care about the type itself. I don't understand inalloca/preallocated well enough to know if those should also really only carry a size.

I think carrying a type is probably an attempt to insulate them against the future removal of pointer element types. I don't think it's actually necessary in either case and could certainly just be a size and alignment. But if you want to use a type, I agree it's not inconsistent, and as long as you honor an explicit alignment it's fine.

The alignment is also technically a separate attribute and we generally don't require attributes to always be paired. I guess the verifier could require you to specify align if you use this, but that's another inconsistency

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.
I ask because front-ends might use this as a way to add type information to a pointer (especially after we strip them).
Or am I confusing something here?

byref

Would be fine with me too.

I guess the verifier could require you to specify align if you use this, but that's another inconsistency

This would be in line with our efforts to *always* require an alignment. Should write it down, make the verifier aware and apply it to byval as well, maybe others too. The entire "target specific alignment" stuff should be resolved as early as possible to not cause confusion and subtle bugs.

In D81311#2086227, @jdoerfert wrote:

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

In D81311#2086326, @rjmccall wrote:

In D81311#2086227, @jdoerfert wrote:

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Where does it say it is limited to indirectly passed arguments?

In D81311#2087592, @jdoerfert wrote:

In D81311#2086326, @rjmccall wrote:

In D81311#2086227, @jdoerfert wrote:

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Where does it say it is limited to indirectly passed arguments?

The argument does have to be a pointer. And passes aren't allowed to infer this or it becomes useless for the original purpose.

In D81311#2088075, @rjmccall wrote:

In D81311#2087592, @jdoerfert wrote:

In D81311#2086326, @rjmccall wrote:

In D81311#2086227, @jdoerfert wrote:

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Where does it say it is limited to indirectly passed arguments?

The argument does have to be a pointer. And passes aren't allowed to infer this or it becomes useless for the original purpose.

That is what I'm trying to get at. As of right now, I don't see any reason a pass could not add this, or a front-end for that matter, for any call, assuming they now it won't mess with the ABI for the target. We might want to add language to this end?

In D81311#2088568, @jdoerfert wrote:

In D81311#2088075, @rjmccall wrote:

In D81311#2087592, @jdoerfert wrote:

In D81311#2086326, @rjmccall wrote:

In D81311#2086227, @jdoerfert wrote:

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Where does it say it is limited to indirectly passed arguments?

The argument does have to be a pointer. And passes aren't allowed to infer this or it becomes useless for the original purpose.

That is what I'm trying to get at. As of right now, I don't see any reason a pass could not add this, or a front-end for that matter, for any call, assuming they now it won't mess with the ABI for the target. We might want to add language to this end?

But why would you want to add it? The intent is to express the ABI

In D81311#2088568, @jdoerfert wrote:

In D81311#2088075, @rjmccall wrote:

In D81311#2087592, @jdoerfert wrote:

In D81311#2086326, @rjmccall wrote:

In D81311#2086227, @jdoerfert wrote:

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Where does it say it is limited to indirectly passed arguments?

The argument does have to be a pointer. And passes aren't allowed to infer this or it becomes useless for the original purpose.

That is what I'm trying to get at. As of right now, I don't see any reason a pass could not add this, or a front-end for that matter, for any call, assuming they now it won't mess with the ABI for the target. We might want to add language to this end?

But it does mess with the ABI; that's why it's being added. All its optimization properties are redundant with existing attributes, and those attributes can't be used for Matt's purposes because he's rightly worried about them being inferred by passes. This attribute specifically provides the extra information that this is semantically a value parameter being passed by reference, which can only be added by a frontend that has that information, which an ordinary LLVM pass cannot possibly have.

But it does mess with the ABI; that's why it's being added.

My worry is, that we do not clearly sate it does. I'm worried front-ends will use it to attach types to pointers once pointers do not have types anymore. I'll stop arguing on this one now, if you think this is restrictive enough, feel free to go ahead :)

Rename to byref. Specify more explicitly this is for the ABI, and should not be inferred.

I'm conflicted on removing the type in favor of a size value. I guess I'll see how much more painful it is to deviate from what byval does when I try to implement this

LGTM, @rjmccall wdyt?

This LGTM.

This revision is now accepted and ready to land.Jun 18 2020, 10:50 AM

I did realize one edge case that is different between carrying the size and the type. You can have a zero sized type with an alignment, but 0 is naturally an invalid value for the attribute (and all the others that carry an integer)

arsenm added a child revision: D83518: IR: Define byref attribute.Jul 9 2020, 3:42 PM

Merged with D83518 as 5e999cbe8db0b50dc9828a1c062b4ffe84c5b137

Revision Contents

Path

Size

llvm/

docs/

LangRef.rst

24 lines

ReleaseNotes.rst

6 lines

Diff 271732

llvm/docs/LangRef.rst

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 1,060 Lines • ▼ Show 20 Lines	``byval`` or ``byval(<ty>)``
the same as the pointee type of the argument.		the same as the pointee type of the argument.

The byval attribute also supports specifying an alignment with the		The byval attribute also supports specifying an alignment with the
align attribute. It indicates the alignment of the stack slot to		align attribute. It indicates the alignment of the stack slot to
form and the known alignment of the pointer specified to the call		form and the known alignment of the pointer specified to the call
site. If the alignment is not specified, then the code generator		site. If the alignment is not specified, then the code generator
makes a target-specific assumption.		makes a target-specific assumption.

		.. _attr_byref:

		``byref(<ty>)``

		The ``byref`` argument attribute allows specifying the pointee
		memory type of an argument. This is similar to ``byval``, but does
		not imply a copy is made anywhere, or that the argument is passed
		on the stack. This implies the pointer is dereferenceable up to
		the storage size of the type.

		It is not generally permissible to introduce a write to an
		``byref`` pointer. The pointer may have any address space and may
		be read only.
		efriedmaUnsubmitted Not Done Reply Inline Actions Not sure "unless it is known this will not produce an observable change in the caller" is really useful. "It is not generally permissible to introduce a write" should be clear enough; the memory model implies that a "write" might actually correspond to multiple store instructions. efriedma: Not sure "unless it is known this will not produce an observable change in the caller" is…

		This is not a valid attribute for return values.

		The alignment for an ``byref`` parameter can be explicitly
		specified by combining it with the ``align`` attribute, similar to
		``byval``. If the alignment is not specified, then the code generator
		makes a target-specific assumption.

		efriedmaUnsubmitted Not Done Reply Inline Actions As far as I can tell, this doesn't include any provision for the caller to actually allocate "inmem" memory. Maybe we should forbid calls with "inmem" arguments. (The function would have to be called by some code not written in LLVM IR.) efriedma: As far as I can tell, this doesn't include any provision for the caller to actually allocate…
		This is intended for representing ABI constraints, and is not
		intended to be inferred for optimization use.

.. _attr_preallocated:		.. _attr_preallocated:

``preallocated(<ty>)``		``preallocated(<ty>)``
This indicates that the pointer parameter should really be passed by		This indicates that the pointer parameter should really be passed by
value to the function, and that the pointer parameter's pointee has		value to the function, and that the pointer parameter's pointee has
already been initialized before the call instruction. This attribute		already been initialized before the call instruction. This attribute
is only valid on LLVM pointer arguments. The argument must be the value		is only valid on LLVM pointer arguments. The argument must be the value
returned by the appropriate		returned by the appropriate
▲ Show 20 Lines • Show All 19,191 Lines • Show Last 20 Lines

llvm/docs/ReleaseNotes.rst

Show First 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	* `dereferenceable` attributes and metadata on pointers no longer imply
optimizations would make assumptions based on the type of the pointer. This		optimizations would make assumptions based on the type of the pointer. This
behavior was undocumented. To preserve optimizations, frontends may need to		behavior was undocumented. To preserve optimizations, frontends may need to
be updated to generate appropriate `align` attributes and metadata.		be updated to generate appropriate `align` attributes and metadata.

* The DIModule metadata is extended to contain file and line number		* The DIModule metadata is extended to contain file and line number
information. This information is used to represent Fortran modules debug		information. This information is used to represent Fortran modules debug
info at IR level.		info at IR level.

		* Added the ``byref`` attribute to better represent argument passing
		for the `amdgpu_kernel` calling convention.

Changes to building LLVM		Changes to building LLVM
------------------------		------------------------

Changes to the ARM Backend		Changes to the ARM Backend
--------------------------		--------------------------

During this release ...		During this release ...

▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines
Changes to the AMDGPU Target		Changes to the AMDGPU Target
-----------------------------		-----------------------------

* The backend default denormal handling mode has been switched to on		* The backend default denormal handling mode has been switched to on
for all targets for all compute function types. Frontends wishing to		for all targets for all compute function types. Frontends wishing to
retain the old behavior should explicitly request f32 denormal		retain the old behavior should explicitly request f32 denormal
flushing.		flushing.

		* The new ``byref`` attribute is now the preferred method for
		representing aggregate kernel arguments.

Changes to the AVR Target		Changes to the AVR Target
-----------------------------		-----------------------------

* Moved from an experimental backend to an official backend. AVR support is now		* Moved from an experimental backend to an official backend. AVR support is now
included by default in all LLVM builds and releases and is available under		included by default in all LLVM builds and releases and is available under
the "avr-unknown-unknown" target triple.		the "avr-unknown-unknown" target triple.

Changes to the WebAssembly Target		Changes to the WebAssembly Target
▲ Show 20 Lines • Show All 66 Lines • Show Last 20 Lines