Page MenuHomePhabricator

[RFC] LangRef: Define byref parameter attribute
ClosedPublic

Authored by arsenm on Jun 5 2020, 3:22 PM.

Details

Summary

This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.

My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the argument,
although nothing does this in reality. There is also talk of removing
and replacing the byval attribute, so a new attribute would need to
take its place anyway.

I went with the name inmem to mirror inreg. Other name ideas I had are
"indirect", "abitype", or "pointee". One question I have is whether
this attribute is necessary, or if the definition of preallocated can
be refined to fit this case. The current description is heavy on what
it means for the call site, but I didn't understand the implications
for the callee. For the amdgpu_kernel use case, calls are illegal so
all of the details about call setup are irrelevant.

This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.

Background:

There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.

It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.

The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.

This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.

Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.

The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.

Additional context is in D79744.

Diff Detail

Event Timeline

arsenm created this revision.Jun 5 2020, 3:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2020, 3:22 PM
Herald added subscribers: tpr, wdng. · View Herald Transcript

On a normal, stack-based machine, there are two ways to pass an aggregate to a function: directly, or indirectly. Directly means something like "byval" or "preallocated": the value of the pointer is not a value the caller can control, but instead refers to some fixed offset on the stack. Indirectly means that the caller actually passes a pointer to the function; essentially, the function actually takes a pointer as an argument, not an aggregate.

For byval/preallocated/inalloca, the callee "owns" the memory, loosely speaking. If that's not the case here, it's probably a bad idea to use the existing markings.

However, it is technically permitted for the callee to introduce a write to the argument, although nothing does this in reality

Realistically, there isn't much incentive for optimizations to introduce new writes to byval arguments, but they definitely could in the future. (It might make sense to optimize out a memcpy from a byval argument to an alloca in more cases.)

The optimizer never pieces this back together to see that this is really just a copy from constant memory, so we end up stuck with expensive stack usage.

If the code writes to the variable, or the address escapes, I think there's no getting around the expensive stack usage. If we can prove there are only reads, we can eliminate the alloca in more cases, I guess.


Overall, this approach seems reasonable.

llvm/docs/LangRef.rst
1081

Not sure "unless it is known this will not produce an observable change in the caller" is really useful. "It is not generally permissible to introduce a write" should be clear enough; the memory model implies that a "write" might actually correspond to multiple store instructions.

1089

As far as I can tell, this doesn't include any provision for the caller to actually allocate "inmem" memory. Maybe we should forbid calls with "inmem" arguments. (The function would have to be called by some code not written in LLVM IR.)

Most of the generalized optimization properties of this attribute seem to be redundant with existing attributes. What are the properties you're trying to convey exactly? The argument is a pointer to memory of a certain size and alignment, and that memory is known to be exclusive to this call and also constant?

I don't think in practice those other attributes would ever not be preserved through IR passes, especially for kernels. I guess they could be introduced by other analyses, and you wouldn't want this property to be confused with that? Still, maybe it would be less redundant to just have this attribute require the existence of those other attributes (particularly dereferenceable and align) rather than providing its own structure.

Do you actually need any of the structure of the type beyond a size and alignment?

I think the important bit here is that we need to distinguish between the C function types void f(A) and void f(A*). If they're both lowered to pointers with the same type, we need an attribute to distinguish them.

arsenm added a comment.Jun 5 2020, 7:20 PM

Most of the generalized optimization properties of this attribute seem to be redundant with existing attributes. What are the properties you're trying to convey exactly? The argument is a pointer to memory of a certain size and alignment, and that memory is known to be exclusive to this call and also constant?

It's really just the in memory size and alignment. In this caseonstantness is implicit since I'll use a constant address space for the pointer (and probably mark the parameter with readonly for good measure)

I don't think in practice those other attributes would ever not be preserved through IR passes, especially for kernels. I guess they could be introduced by other analyses, and you wouldn't want this property to be confused with that? Still, maybe it would be less redundant to just have this attribute require the existence of those other attributes (particularly dereferenceable and align) rather than providing its own structure.

I believe byval implies dereferencable already, and I was just trying to be more explicit than what the others state. The alignment behavior also matches what the other attributes already do. I expect in practice to always emit an align attribute

Do you actually need any of the structure of the type beyond a size and alignment?

Not really. I just need to distinguish whether you are are using sizeof(struct Foo) or a pointer sized slot in the argument buffer

Most of the generalized optimization properties of this attribute seem to be redundant with existing attributes. What are the properties you're trying to convey exactly? The argument is a pointer to memory of a certain size and alignment, and that memory is known to be exclusive to this call and also constant?

It's really just the in memory size and alignment. In this case, constantness is implicit since I'll use a constant address space for the pointer (and probably mark the parameter with readonly for good measure)

In principle, readonly has the same issues that lead to this attribute being necessary: it's not really a semantic contract, it's something that can be placed on the argument by IR. But it's probably to separate that.

I don't think in practice those other attributes would ever not be preserved through IR passes, especially for kernels. I guess they could be introduced by other analyses, and you wouldn't want this property to be confused with that? Still, maybe it would be less redundant to just have this attribute require the existence of those other attributes (particularly dereferenceable and align) rather than providing its own structure.

I believe byval implies dereferencable already, and I was just trying to be more explicit than what the others state. The alignment behavior also matches what the other attributes already do. I expect in practice to always emit an align attribute

It's generally best to insist on strong rules up front and then weaken them if there's really a use case. There's no good reason not to carry an alignment.

Do you actually need any of the structure of the type beyond a size and alignment?

Not really. I just need to distinguish whether you are are using sizeof(struct Foo) or a pointer sized slot in the argument buffer

Okay. I think just carrying numbers is probably less representationally problematic.

My thinking here is that we can make this attribute something that we'd just add by default to all indirect arguments, and then the only special case behavior you need for kernels is that (1) there's a stable size and alignment that you can extract into the kernel-invocation system, which is probably not otherwise useful to IR beyond the existing attributes, and (2) the frontend needs to copy from the argument to form the mutable parameter variable.

I wonder if byref would be a better name for this, as a way to say that the object is semantically a direct argument that's being passed by implicit reference.

arsenm added a comment.Jun 9 2020, 1:38 PM

I wonder if byref would be a better name for this, as a way to say that the object is semantically a direct argument that's being passed by implicit reference.

This is probably a better name, but potentially more easily confused with byval.

As far as switching to just the raw number, I think there's value in being consistent with the other growing family of type-carrying parameter attributes but I don't really care about the type itself. I don't understand inalloca/preallocated well enough to know if those should also really only carry a size.

I wonder if byref would be a better name for this, as a way to say that the object is semantically a direct argument that's being passed by implicit reference.

This is probably a better name, but potentially more easily confused with byval.

That seems like an unlikely confusion.

As far as switching to just the raw number, I think there's value in being consistent with the other growing family of type-carrying parameter attributes but I don't really care about the type itself. I don't understand inalloca/preallocated well enough to know if those should also really only carry a size.

I think carrying a type is probably an attempt to insulate them against the future removal of pointer element types. I don't think it's actually necessary in either case and could certainly just be a size and alignment. But if you want to use a type, I agree it's not inconsistent, and as long as you honor an explicit alignment it's fine.

I wonder if byref would be a better name for this, as a way to say that the object is semantically a direct argument that's being passed by implicit reference.

This is probably a better name, but potentially more easily confused with byval.

That seems like an unlikely confusion.

As far as switching to just the raw number, I think there's value in being consistent with the other growing family of type-carrying parameter attributes but I don't really care about the type itself. I don't understand inalloca/preallocated well enough to know if those should also really only carry a size.

I think carrying a type is probably an attempt to insulate them against the future removal of pointer element types. I don't think it's actually necessary in either case and could certainly just be a size and alignment. But if you want to use a type, I agree it's not inconsistent, and as long as you honor an explicit alignment it's fine.

The alignment is also technically a separate attribute and we generally don't require attributes to always be paired. I guess the verifier could require you to specify align if you use this, but that's another inconsistency

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.
I ask because front-ends might use this as a way to add type information to a pointer (especially after we strip them).
Or am I confusing something here?

byref

Would be fine with me too.

I guess the verifier could require you to specify align if you use this, but that's another inconsistency

This would be in line with our efforts to *always* require an alignment. Should write it down, make the verifier aware and apply it to byval as well, maybe others too. The entire "target specific alignment" stuff should be resolved as early as possible to not cause confusion and subtle bugs.

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Where does it say it is limited to indirectly passed arguments?

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Where does it say it is limited to indirectly passed arguments?

The argument does have to be a pointer. And passes aren't allowed to infer this or it becomes useless for the original purpose.

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Where does it say it is limited to indirectly passed arguments?

The argument does have to be a pointer. And passes aren't allowed to infer this or it becomes useless for the original purpose.

That is what I'm trying to get at. As of right now, I don't see any reason a pass could not add this, or a front-end for that matter, for any call, assuming they now it won't mess with the ABI for the target. We might want to add language to this end?

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Where does it say it is limited to indirectly passed arguments?

The argument does have to be a pointer. And passes aren't allowed to infer this or it becomes useless for the original purpose.

That is what I'm trying to get at. As of right now, I don't see any reason a pass could not add this, or a front-end for that matter, for any call, assuming they now it won't mess with the ABI for the target. We might want to add language to this end?

But why would you want to add it? The intent is to express the ABI

Do we allow inmem to be used for other purposes? I would assume the answer is yes, as we do not forbid it.

I don't know what else we might use it for off-hand, but yes, I think the frontend could put this down on all value arguments that are actually passed indirectly.

Where does it say it is limited to indirectly passed arguments?

The argument does have to be a pointer. And passes aren't allowed to infer this or it becomes useless for the original purpose.

That is what I'm trying to get at. As of right now, I don't see any reason a pass could not add this, or a front-end for that matter, for any call, assuming they now it won't mess with the ABI for the target. We might want to add language to this end?

But it does mess with the ABI; that's why it's being added. All its optimization properties are redundant with existing attributes, and those attributes can't be used for Matt's purposes because he's rightly worried about them being inferred by passes. This attribute specifically provides the extra information that this is semantically a value parameter being passed by reference, which can only be added by a frontend that has that information, which an ordinary LLVM pass cannot possibly have.

But it does mess with the ABI; that's why it's being added.

My worry is, that we do not clearly sate it does. I'm worried front-ends will use it to attach types to pointers once pointers do not have types anymore. I'll stop arguing on this one now, if you think this is restrictive enough, feel free to go ahead :)

arsenm updated this revision to Diff 271732.Jun 18 2020, 8:31 AM
arsenm retitled this revision from [RFC] LangRef: Define inmem parameter attribute to [RFC] LangRef: Define byref parameter attribute.

Rename to byref. Specify more explicitly this is for the ABI, and should not be inferred.

I'm conflicted on removing the type in favor of a size value. I guess I'll see how much more painful it is to deviate from what byval does when I try to implement this

rjmccall accepted this revision.Jun 18 2020, 10:50 AM

This LGTM.

This revision is now accepted and ready to land.Jun 18 2020, 10:50 AM

I did realize one edge case that is different between carrying the size and the type. You can have a zero sized type with an alignment, but 0 is naturally an invalid value for the attribute (and all the others that carry an integer)