This is an archive of the discontinued LLVM Phabricator instance.

Paths

- clang/include/clang/CodeGen/CGFunctionInfo.h
- clang/lib/CodeGen/CGCall.cpp
- clang/lib/CodeGen/TargetInfo.cpp
- clang/test/CodeGenCUDA/kernel-args.cu
- clang/test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl

Differential D79744

clang: Use byref for aggregate kernel arguments
Closed, Public

Authored by arsenm on May 11 2020, 1:55 PM.


Details

Reviewers

yaxunl
hliao
jdoerfert
rjmccall
Anastasia
rampitec

Summary

Add address space to indirect abi info and use it for kernels.

Previously, indirect arguments assumed assumed a stack passed object
in the alloca address space using byval. A stack pointer is unsuitable
for kernel arguments, which are passed in a separate, constant buffer
with a different address space.

Start using the new byref for aggregate kernel arguments. Previously
these were emitted as raw struct arguments, and turned into loads in
the backend. These will lower identically, although with byref you now
have the option of applying an explicit alignment. In the future, a
reasonable implementation would use byref for all kernel arguments
(this would be a practical problem at the moment due to losing things
like noalias on pointer arguments).

This is mostly to avoid fighting the optimizer's treatment of
aggregate load/store. SROA and instcombine both turn aggregate loads
and stores into a long sequence of element loads and stores, rather
than the optimizable memcpy I would expect in this situation. Now an
explicit memcpy will be introduced up-front which is better understood
and helps eliminate the alloca in more situations.
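As a rough C analogue (not the IR itself), here are the two shapes of copy the summary contrasts. The first is a per-element copy, standing in for what SROA and instcombine turn an aggregate load/store into. The second is a single `memcpy` of the whole object. The struct `Agg` and both function names are hypothetical. The sketch only illustrates that the two have the same effect, while the second stays one operation the optimizer can treat as a unit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical aggregate kernel argument, for illustration only. */
struct Agg {
    int a;
    float b;
    double c[4];
};

/* Per-element copy: roughly the shape an aggregate load/store takes
   after being split into element loads and stores. */
void copy_elementwise(struct Agg *dst, const struct Agg *src) {
    dst->a = src->a;
    dst->b = src->b;
    for (int i = 0; i < 4; ++i)
        dst->c[i] = src->c[i];
}

/* Whole-object copy: a single explicit memcpy. */
void copy_memcpy(struct Agg *dst, const struct Agg *src) {
    /* catches the obvious overlap case; memcpy requires non-overlapping buffers */
    assert(dst != src);
    memcpy(dst, src, sizeof *dst);
}
```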

This skips using byref in the case where HIP kernel pointer arguments
in structs are promoted to global pointers. At minimum an additional
patch is needed to allow coercion with indirect arguments.


Event Timeline

arsenm created this revision. May 11 2020, 1:55 PM

Herald added subscribers: kerbowa, nhaehnle, wdng, jvesely. May 11 2020, 1:55 PM

arsenm added parent revisions: D79732: AMDGPU/HIP: Don't replace pointer types in kernel argument structs, D79630: AMDGPU: Start interpreting byref on kernel arguments, D79593: Verifier: Disallow byval and similar for AMDGPU calling conventions. May 11 2020, 1:55 PM

Forgot to commit a new test

Typo in commit message: "Previously, indirect arguments assumed assumed".

The parameter variable is formally considered to be in a particular address space, and taking the address of it yields a pointer in that address space. That can only work for an indirect argument if either (1) the address space of the pointer that's actually passed can be freely converted to that formal address space (because it's a subspace, or because it's a superspace but known to be in the right subspace) or (2) we're willing to copy the object into the right address space on the callee side (and able to — obviously this only works for POD types). I do see the merit of allowing an address space to be specified for targets that consider locals to be in a different formal address space than the stack would naturally be (e.g. a generic address space); I don't remember if that applies to AMDGPU.

byval in LLVM is not a generic "this is a by-value argument" annotation; it specifically means that the value is actually passed on the stack despite the formally indirect convention, and therefore there's an implicit memcpy on the caller side. That is why byval is inherently tied to the alloca address space: there's no actual pointer being passed, so it doesn't make sense to pretend it might have been promoted into a different address space. That is also why there is no restriction to writing into the pointer: there's no reason to prevent writing to the argument slot.

In D79744#2030294, @rjmccall wrote:

> The parameter variable is formally considered to be in a particular address space, and taking the address of it yields a pointer in that address space. [...]

Kernel arguments aren't directly visible to the user program, so this is an implementation detail. The user variable is the alloca that we need to explicitly copy to as you mentioned, which this patch does. It's possible to poke at these through intrinsics, but the kernel address space isn't part of the language. POD type or not, we're going to have to do something to unload values from the constant buffer onto the stack.

> byval in LLVM is not a generic "this is a by-value argument" annotation; it specifically means that the value is actually passed on the stack despite the formally indirect convention, and therefore there's an implicit memcpy on the caller side. [...]

In this case there is never a call. Only the callee read exists as this is the entry point. What would the alternative be? Add another flavor of byval attribute that's nearly identical?

In D79744#2030481, @arsenm wrote:

> Kernel arguments aren't directly visible to the user program, so this is an implementation detail. The user variable is the alloca that we need to explicitly copy to as you mentioned, which this patch does.

It's usually a goal of indirect arguments to not need this copy. We usually bind parameters directly to the pointer that was passed in.

> It's possible to poke at these through intrinsics, but the kernel address space isn't part of the language. POD type or not, we're going to have to do something to unload values from the constant buffer onto the stack.
>
> In this case there is never a call. Only the callee read exists as this is the entry point. What would the alternative be? Add another flavor of byval attribute that's nearly identical?

It's hard to answer this because I'm not sure what you're trying to accomplish by marking the arguments byval.

In D79744#2030535, @rjmccall wrote:

> It's hard to answer this because I'm not sure what you're trying to accomplish by marking the arguments byval.

The short answer is I'm trying to avoid fighting the optimizer's handling of aggregate loads and stores. Passes already understand byval, but are actively harmful to aggregate loads and stores. Having explicit loads in the IR also brings it closer to what these are ultimately lowered to. Currently we emit a raw struct argument, which produces an aggregate store to an alloca. Both SROA and instcombine unhelpfully split up the aggregate store, which doesn't optimize nicely the way a memcpy from constant memory to an alloca does. The end result is more allocas and more copies from constant memory, where we could have just read directly from the constant pointer.

The most honest calling convention handling would be to have an explicit constant pointer that all arguments are read from, and the function would have an empty argument list. This is somewhat impractical, since we still need to track the argument sizes and offsets somehow in the IR. It also becomes much more difficult to emit nicely annotated IR, since things like noalias still largely expect to be attached to a function argument. We do have an optimization pass that rewrites the arguments to look like this to enable vectorization later in the backend, but it suffers from losing useful alias annotations.
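The comment above notes that the argument sizes and offsets still have to be tracked somewhere. Here is a minimal sketch of that bookkeeping. It assumes a simple rule in which each argument is placed at the next offset aligned to its own alignment. That rule is an illustrative assumption, not a statement of the AMDGPU kernel ABI. `struct KernArg`, `align_to`, and `layout_kernargs` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one kernel argument. */
struct KernArg {
    size_t size;
    size_t align; /* assumed to be a power of two */
};

/* Round value up to the next multiple of align. */
static size_t align_to(size_t value, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Assign each argument an offset in the buffer; returns the total size. */
size_t layout_kernargs(const struct KernArg *args, size_t count, size_t *offsets) {
    size_t offset = 0;
    for (size_t i = 0; i < count; ++i) {
        offset = align_to(offset, args[i].align);
        offsets[i] = offset;
        offset += args[i].size;
    }
    return offset;
}
```

Under this rule, arguments of 4, 16, and 1 bytes with alignments 4, 8, and 1 land at offsets 0, 8, and 24, giving a 25-byte buffer.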

Okay. So the only real ABI here is the layout of the memory that the arguments are actually written into? And that memory needs to be treated as constant?

Unfortunately, I think byval just doesn't match what you want because of the mutability — the frontend *has* to have a copy into a local to get IR with correct semantics, because byval is assumed to be locally mutable by both IR-generation and (potentially) LLVM optimization. And I don't think you really want non-byval indirect. So I guess the question is what we can do in the frontend to get the optimizer behavior you need.
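To make the mutability point concrete, here is a C sketch contrasting the two conventions under discussion. With a byval-like parameter, the callee owns a private copy it may write. With a readonly indirect parameter, the callee only has a pointer into memory it must not modify, so it has to copy into a local before mutating. `struct Params` and both functions are hypothetical. C pass-by-value only stands in for byval here and says nothing about how either convention is actually lowered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel parameters. */
struct Params {
    int n;
    int scale;
};

/* byval-like: the callee receives its own copy and may write to it. */
int scaled_byval_like(struct Params p) {
    p.n *= p.scale; /* writing the argument slot is allowed */
    return p.n;
}

/* readonly-indirect-like: the callee may only read *in, so a mutable
   parameter needs an explicit local copy first. */
int scaled_byref_like(const struct Params *in) {
    assert(in != NULL);
    struct Params p;
    memcpy(&p, in, sizeof p); /* the copy the frontend has to emit */
    p.n *= p.scale;
    return p.n;
}
```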

In D79744#2030710, @rjmccall wrote:

> Okay. So the only real ABI here is the layout of the memory that the arguments are actually written into? And that memory needs to be treated as constant?

Yes, the actual kernel ABI is supposed to be invisible.

> [...] the frontend *has* to have a copy into a local to get IR with correct semantics, because byval is assumed to be locally mutable by both IR-generation and (potentially) LLVM optimization. [...]

You are allowed to have readonly on a byval pointer argument, in which case optimizations wouldn't be allowed to write into it. Is just adding readonly parameter attributes sufficient? It would be somewhat contrived, but we could also define byval as constant if it's not in the alloca address space.

In D79744#2035109, @arsenm wrote:

> You are allowed to have readonly on a byval pointer argument, in which case optimizations wouldn't be allowed to write into it. [...]

Another option is to add a form of indirect argument passing that just loads from a constant offset from an intrinsic call. We would still need to leave an unused struct argument in the function argument list for size bookkeeping, and we would lose the ability to add an explicit alignment. We would also be mixing multiple ways of accessing arguments, which is gross but survivable.

In D79744#2035109, @arsenm wrote:

> You are allowed to have readonly on a byval pointer argument, in which case optimizations wouldn't be allowed to write into it. Is just adding readonly parameter attributes sufficient? It would be somewhat contrived, but we could also define byval as constant if it's not in the alloca address space.

byval is fundamentally about expressing that something is passed on the stack. If you want an indirect readonly noalias argument, you can make one; however, to convince IRGen to do it, you really need to think of this as a new kind of argument-passing, because "pass the address of an immutable object that the callee can't modify" is not something that we normally need to do in calling-convention lowering. That feels like a lot of complexity to solve a rather narrow problem.

A completely different approach: OpenMP has to solve some very similar problems and just lowers them completely in the frontend; have you considered just doing that? Kernels need a ton of special-case handling anyway, and IIUC you can never optimize over the boundary anyway.

In D79744#2035283, @rjmccall wrote:

> A completely different approach: OpenMP has to solve some very similar problems and just lowers them completely in the frontend; have you considered just doing that? [...]

Yes, this is what I was describing. The problem is this ends up hurting optimizations because now we end up losing things like noalias on the pointer arguments. I also would still need to track the argument size/offset/align information in the IR, but for that we could keep on doing what we're doing and just never use the kernel arguments. One of the advantages of byval was an explicit align field to track this.

I don't understand why noalias is even a concern if the whole buffer passed to the kernel is read-only. noalias is primarily about proving non-interference, but if you can tell IR that the buffer is read-only, that's a much more powerful statement.

Regardless, if you do need noalias, you should be able to emit the same IR that we'd emit if pointers to all the fields were assigned into restrict local variables and then only those variables were subsequently used.
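rjmccall's suggestion can be pictured at the source level. Copy each pointer field out of the argument aggregate into a `restrict`-qualified local, then use only those locals. This C sketch shows the pattern only. `struct Args` and `scale_kernel` are hypothetical, and whether the optimizer actually takes advantage of local `restrict` is discussed further in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel arguments holding two pointers. */
struct Args {
    const float *in;
    float *out;
    size_t n;
};

void scale_kernel(const struct Args *args) {
    assert(args != NULL);
    /* Bind each pointer field to a restrict-qualified local and use only
       the locals from here on; this promises the two do not alias. */
    const float *restrict in = args->in;
    float *restrict out = args->out;
    size_t n = args->n;
    for (size_t i = 0; i < n; ++i)
        out[i] = 2.0f * in[i];
}
```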

Drive-by; I haven't read the entire history.

In D79744#2040208, @rjmccall wrote:

> I don't understand why noalias is even a concern if the whole buffer passed to the kernel is read-only. noalias is primarily about proving non-interference, but if you can tell IR that the buffer is read-only, that's a much more powerful statement.

The problem is that it is a "per-pointer" attribute and not "per-object". Two argument pointers may still alias even if one of them is marked readonly. Similarly, an access to a global, an indirect access, and so on can write the "readonly" memory. Hence, readonly is pretty useless *in the callee* if other accesses can write the memory. readonly is useful in the caller, though, usually if we basically have noalias information there (e.g., it is an alloca). noalias is useful in the callee regardless of readonly, but even better with it.

> Regardless, if you do need noalias, you should be able to emit the same IR that we'd emit if pointers to all the fields were assigned into restrict local variables and then only those variables were subsequently used.

We are still debating the local restrict pointer support. Once we merge that in, it should be usable here.

In D79744#2040348, @jdoerfert wrote:

> The problem is that it is a "per-pointer" attribute and not "per-object". [...] Similarly, an access to a global, indirect accesses, ... can write the "readonly" memory.

Oh, do we really not have a way to mark that memory is known to be truly immutable for a time? That seems like a really bad limitation. It should be doable with a custom alias analysis at least.

> We are still debating the local restrict pointer support. Once we merge that in it should be usable here.

I thought that was finished a few years ago; is it really not considered usable yet? Or does "we" not just mean LLVM here?

In D79744#2040380, @rjmccall wrote:

> Oh, do we really not have a way to mark that memory is known to be truly immutable for a time? [...]

noalias + readonly on an argument basically implies immutability for the function scope. I think we have invariant intrinsics that could do the trick as well, though I'm not too familiar with those. I was eventually hoping for paired/scoped llvm.assumes, which would allow noalias + readnone again. Then there is invariant, which can be placed on a load instruction. Finally, TBAA has a "constant memory" tag (or something like that), but again it is a per-access thing. Those are all the in-tree ways I can think of right now.

Custom alias analysis can probably be used to some degree, but except for address spaces I'm unsure we have much that you can attach to a pointer that "really sticks".

> I thought that was finished a few years ago; is it really not considered usable yet? Or does "we" not just mean LLVM here?

Yes, I meant "we" = LLVM here. Maybe we are talking about different things. I was referring to local restrict-qualified variables, e.g., double * __restrict Ptr = .... The proposal to stop just ignoring the restrict (see https://godbolt.org/z/jLzjR3) came last year; it was a big one and progress unfortunately stalled (partly my fault). Now we are just about to see a second push to get it done.
Is that what you meant too?
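For comparison with the noalias + readonly combination mentioned above, C's closest source-level counterpart is a `restrict`-qualified pointer to `const`. The sketch below relies on the standard C rule that if an object accessed through a `restrict` pointer is modified during the call, every access to it must go through that pointer. So if nothing writes through `ro`, nothing else may modify `*ro` either, and the compiler is free to reuse the first load. The function `read_twice_restrict` is hypothetical. The mapping to IR attributes is only an analogy. Calling it with aliasing arguments would be undefined behavior, so only distinct objects should be passed.

```c
#include <assert.h>

/* Both parameters are restrict-qualified, so *ro cannot be modified
   through rw (or any pointer not based on ro) during the call; this is
   roughly the guarantee noalias + readonly expresses for an IR argument. */
int read_twice_restrict(const int *restrict ro, int *restrict rw) {
    assert((const void *)ro != (const void *)rw); /* aliasing here would be UB */
    int before = *ro;
    *rw = before + 1;
    int after = *ro; /* may be folded into `before` */
    return after - before;
}
```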

In D79744#2040434, @jdoerfert wrote:

> Yes, I meant "we" = LLVM here. [...] I was referring to local restrict qualified variables, e.g., double * __restrict Ptr = .... [...] Is that what you meant too?

I thought I remembered Hal doing a lot of work on local restrict a few years ago. I'm probably just misremembering, or I didn't realize that the work never landed or got pulled out later.

Okay. So where we're at is that you'd like to add a new argument-passing convention that's basically "passed indirect but immutable", implying that the frontend has to copy it in order to create the mutable local parameter. That's not actually a totally ridiculous convention in principle, although it has poor worst-case behavior (copies on both sides), and that happens to be what the frontend will often have to conservatively emit. I would still prefer not to add new argument-passing conventions just to satisfy short-term optimization needs, though. Are there any other reasonable options?

In D79744#2040731, @rjmccall wrote:

> [...] I would still prefer not to add new argument-passing conventions just to satisfy short-term optimization needs, though. Are there any other reasonable options?

For the purpose here, only the callee exists. This is essentially a freestanding function, the entry point to the program. There is no caller function, and in the future I would like to make a call to amdgpu_kernel an IR verifier error. (Technically, OpenCL device enqueue is an exception to this, but we don't treat it as a call; instead there's a lot of library magic to invoke the kernel. From the perspective of the callee nothing changes, since it's still not allowed to modify the incoming argument buffer and isn't aware it was called this way.)

The load-from-constant nature needs to be exposed earlier, so I think this necessarily involves changing the convention lowering in some way; it's just a question of what it looks like. To summarize, the 2.5 options I've come up with are:

1. Use constant byval parameters, as this patch does. This requires the least implementation effort but doesn't exactly fit in with byval as defined.
2. Replace all IR argument uses with loads from a constant offset from an intrinsic call. This still needs to leave the IR arguments in place because we do need to know the original argument sizes and offsets, but they would never have a use (or I would need to invent some other method of tracking this information).
3. Keep clang IR generation unchanged, but move the pass that lowers arguments to loads earlier and hack out aggregate IR loads before SROA makes things worse. This is really just a kludgier version of option 2. We do ultimately do this late in the backend to enable vectorization, but it does seem to make the middle-end optimizer unhappy.
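Option 2 amounts to reading each argument out of the constant buffer at a known offset. Here is a minimal C sketch, assuming `base` stands in for the pointer the intrinsic call would return (the intrinsic itself is not modeled). `struct Pair` and `load_pair_arg` are hypothetical. Using `memcpy` rather than dereferencing a cast pointer keeps the read well-defined regardless of how the buffer was declared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical aggregate kernel argument. */
struct Pair {
    int x;
    int y;
};

/* base stands in for the intrinsic's result; the argument is read from a
   fixed offset into that constant buffer with one explicit copy. */
struct Pair load_pair_arg(const unsigned char *base, size_t offset) {
    assert(base != NULL);
    struct Pair p;
    memcpy(&p, base + offset, sizeof p);
    return p;
}
```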

In D79744#2047482, @arsenm wrote:

For the purpose here, only the callee exists. This is essentially a freestanding function, the entry point to the program. There is no caller function, and in the future I would like to make a call to amdgpu_kernel an IR verifier error (technically OpenCL device enqueue is an exception to this, but we don't treat this as a call. Instead there's a lot of library magic to invoke the kernel. From the perspective of the callee nothing changes, since it's still not allowed to modify the incoming argument buffer or aware it was called this way).

Did you consider callback annotation for the device enqueue call? While that might not change anything *now*, I'm expecting interesting optimization opportunities there at some point "soon".

The load-from-constant nature needs to be exposed earlier, so I think this necessarily involves changing the convention lowering in some way and it's just a question of what it looks like. To summarize the 2.5 options I've come up with are

Use constant byval parameters, as this patch does. This requires the least implementation effort but doesn't exactly fit in with byval as defined.

And, as was noted in the byval lang ref patch (D79636), there is a reasonable argument to be made to phase-out byval in favor of some explicit copying. If that happens, this solution should not be "the last byval user". Also, byval arguments could be used as scratchpad by smart optimizations. I somehow don't believe this is a great choice but I can by now see that the others are neither.

Replace all IR argument uses with loads from a constant offset from an intrinsic call. This still needs to leave the IR arguments in place because we do need to know the original argument sizes and offsets, but they would never have a use (or I would need to invent some other method of tracking this information)

Keep clang IR generation unchanged, but move the pass that lowers arguments to loads earlier and hack out aggregate IR loads before SROA makes things worse. This is really just a kludgier version of option 2. We do ultimately do this late in the backend to enable vectorization, but it does seem to make the middle-end optimizer unhappy.

In D79744#2047482, @arsenm wrote:

In D79744#2040731, @rjmccall wrote:

In D79744#2040434, @jdoerfert wrote:

In D79744#2040380, @rjmccall wrote:

In D79744#2040348, @jdoerfert wrote:

Drive by, I haven't read the entire history.

In D79744#2040208, @rjmccall wrote:

I don't understand why noalias is even a concern if the whole buffer passed to the kernel is read-only. noalias is primarily about proving non-interference, but if you can tell IR that the buffer is read-only, that's a much more powerful statement.

The problem is that it is a "per-pointer" attribute and not "per-object". Two argument pointers may still alias even if one of them is marked readonly. Similarly, an access to a global, an indirect access, ... can write the "readonly" memory.

Oh, do we really not have a way to mark that memory is known to be truly immutable for a time? That seems like a really bad limitation. It should be doable with a custom alias analysis at least.

noalias + readonly on an argument basically implies immutable for the function scope. I think we have invariant intrinsics that could do the trick as well, though I'm not too familiar with those. I was eventually hoping for paired/scoped llvm.assumes which would allow noalias + readnone again. Then there is invariant which can be placed on a load instruction. Finally, TBAA has a "constant memory" tag (or something like that), but again it is a per-access thing. Those are all the in-tree ways I can think of right now.

Custom alias analysis can probably be used to some degree, but apart from address spaces I'm unsure we have much that you can attach to a pointer and that "really sticks".

Regardless, if you do need noalias, you should be able to emit the same IR that we'd emit if pointers to all the fields were assigned into restrict local variables and then only those variables were subsequently used.

We are still debating the local restrict pointer support. Once we merge that in it should be usable here.

I thought that was finished a few years ago; is it really not considered usable yet? Or does "we" not just mean LLVM here?

Yes, I meant "we" = LLVM here. Maybe we're talking about different things. I was referring to local restrict qualified variables, e.g., double * __restrict Ptr = .... The proposal to not just ignore the restrict (see https://godbolt.org/z/jLzjR3) came last year; it was a big one, and progress unfortunately stalled (partly my fault). Now we are just about to see a second push to get it done.
Is that what you meant too?

I thought I remembered Hal doing a lot of work on local restrict a few years ago. I'm probably just misremembering, or I didn't realize that the work never landed or got pulled out later.

Okay. So where we're at is that you'd like to add a new argument-passing convention that's basically "passed indirect but immutable", implying that the frontend has to copy it in order to create the mutable local parameter. That's not actually a totally ridiculous convention in principle, although it has poor worst-case behavior (copies on both sides), and that happens to be what the frontend will often have to conservatively emit. I would still prefer not to add new argument-passing conventions just to satisfy short-term optimization needs, though. Are there any other reasonable options?

For the purpose here, only the callee exists. This is essentially a freestanding function, the entry point to the program.

I'm definitely not going to let you add a new "generic" argument-passing convention and then only implement exactly the parts you need; that's just adding technical debt.

The load-from-constant nature needs to be exposed earlier, so I think this necessarily involves changing the convention lowering in some way and it's just a question of what it looks like. To summarize, the 2.5 options I've come up with are:

Use constant byval parameters, as this patch does. This requires the least implementation effort but doesn't exactly fit in with byval as defined.

I assume you generate a byval caller in some later pass, which will then implicitly copy. Or do you lower byval in a nonstandard way in your backend?

Replace all IR argument uses with loads from a constant offset from an intrinsic call. This still needs to leave the IR arguments in place because we do need to know the original argument sizes and offsets, but they would never have a use (or I would need to invent some other method of tracking this information).

I guess passing aggregate arguments using a normal indirect convention has this same tracking problem. You'd also have to copy on the caller side to maintain semantics, which is probably hard to eliminate, but I would guess byval has the same problem?

Keep clang IR generation unchanged, but move the pass that lowers arguments to loads earlier and hack out aggregate IR loads before SROA makes things worse. This is really just a kludgier version of option 2. We do ultimately do this late in the backend to enable vectorization, but it does seem to make the middle-end optimizer unhappy.

Yeah, Clang tries to avoid first-class aggregates for exactly this reason, LLVM does a terrible job generating code for them.
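The two conventions being weighed above ("passed indirect but immutable" versus an ordinary indirect pointer where the caller must copy to maintain semantics) can be contrasted in plain C++. This is a hypothetical illustration only: `Agg`, `calleeIndirect`, `callerIndirect`, `calleeAliased` and `calleeAliasedReadOnly` are invented names, and the code models the copying obligations, not the IR Clang actually emits.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical aggregate argument, for illustration only.
struct Agg {
  int x;
  int y;
};

// Ordinary indirect callee: it owns the pointee, so it may modify it freely.
static void calleeIndirect(Agg *arg) { arg->x += 1; }

// Ordinary indirect caller: copies first so its own object is left untouched.
static int callerIndirect(const Agg &value) {
  Agg temp = value;  // caller-side copy
  calleeIndirect(&temp);
  return temp.x;
}

// "Passed indirect but immutable" callee (the idea behind byref here): the
// pointee must not be written, so the callee makes a local copy before it
// mutates the parameter.
static int calleeAliased(const Agg *arg) {
  Agg local;
  std::memcpy(&local, arg, sizeof(Agg));  // callee-side defensive copy
  local.x += 1;
  return local.x;
}

// A callee that only reads the parameter needs no copy at all.
static int calleeAliasedReadOnly(const Agg *arg) { return arg->x + arg->y; }
```

In the first pair the copy always happens on the caller side; in the second it happens only in callees that need a mutable (or uniquely addressed) parameter, which is the trade-off discussed in this exchange.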

In D79744#2047788, @jdoerfert wrote:

In D79744#2047482, @arsenm wrote:

For the purpose here, only the callee exists. This is essentially a freestanding function, the entry point to the program. There is no caller function, and in the future I would like to make a call to amdgpu_kernel an IR verifier error. (Technically OpenCL device enqueue is an exception to this, but we don't treat it as a call; instead there's a lot of library magic to invoke the kernel. From the perspective of the callee nothing changes, since it's still not allowed to modify the incoming argument buffer and isn't aware it was called this way.)

Did you consider callback annotation for the device enqueue call? While that might not change anything *now*, I'm expecting interesting optimization opportunities there at some point "soon".

I'm not sure what you mean by this exactly. I'm assuming this means "move device enqueue implementation into the backend". I don't know all the details of how device enqueue is implemented, but there's so much code in the library to support this that I don't think this would end up looking like a normal calling convention lowering. I would guess it would end up looking like an IR pass that would need a calling convention with this restriction, and then a pass to insert the code the library uses now.

The load-from-constant nature needs to be exposed earlier, so I think this necessarily involves changing the convention lowering in some way and it's just a question of what it looks like. To summarize, the 2.5 options I've come up with are:

Use constant byval parameters, as this patch does. This requires the least implementation effort but doesn't exactly fit in with byval as defined.

And, as was noted in the byval lang ref patch (D79636), there is a reasonable argument to be made to phase out byval in favor of some explicit copying. If that happens, this solution should not be "the last byval user". Also, byval arguments could be used as scratchpad by smart optimizations. I somehow don't believe this is a great choice, but by now I can see that the others aren't either.

I assume this would need replacement with a slew of other attributes to capture the type/size/alignment/dereferencability, or a new attribute entirely?

In D79744#2050498, @rjmccall wrote:

For the purpose here, only the callee exists. This is essentially a freestanding function, the entry point to the program.

I'm definitely not going to let you add a new "generic" argument-passing convention and then only implement exactly the parts you need; that's just adding technical debt.

I'm not sure what you mean here. I don't really want or need a totally new generic argument passing convention. Not every IR feature is expected to be implemented by every backend. For instance, inalloca/preallocated exist solely to implement the x86 Windows ABI and no other target tries to handle them. This is just another flavor of the same fundamental concept of a parameter attribute needed for a target-specific calling convention.

The load-from-constant nature needs to be exposed earlier, so I think this necessarily involves changing the convention lowering in some way and it's just a question of what it looks like. To summarize, the 2.5 options I've come up with are:

Use constant byval parameters, as this patch does. This requires the least implementation effort but doesn't exactly fit in with byval as defined.

I assume you generate a byval caller in some later pass, which will then implicitly copy. Or do you lower byval in a nonstandard way in your backend?

No, the caller is only an external driver of some kind. Since the address spaces don't match (and the source address space is read-only), anything that behaves like a stack byval doesn't really make sense from the beginning, which is why this patch changes the address space. If we were to leave this as a stack address space, we would have to add an alloca and insert a memcpy anyway. This would still leave an unusable byval argument around that a pass could still theoretically try to write into.

D79630 adds the lowering that uses this. For various reasons (some of which I referenced in the previous comments), we have 3 different implementations of the ABI. The byval pointer value is simply replaced with a new pointer derived from a constant offset from an intrinsic call (this is most obvious in the AMDGPULowerKernelArguments IR version).

Replace all IR argument uses with loads from a constant offset from an intrinsic call. This still needs to leave the IR arguments in place because we do need to know the original argument sizes and offsets, but they would never have a use (or I would need to invent some other method of tracking this information).

I guess passing aggregate arguments using a normal indirect convention has this same tracking problem. You'd also have to copy on the caller side to maintain semantics, which is probably hard to eliminate, but I would guess byval has the same problem?

Isn't using byval the normal indirect convention? Really the only properties I want that byval gives me are a pointee size and alignment. The most abstract attribute I could probably come up with is a pointee(type) annotation that I could use, without the stack-copying implications of byval.
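As a rough model of the lowering described above (the argument pointer replaced by a pointer at a constant offset from the argument-buffer base), the following sketch computes each argument's offset from its size and alignment alone, which is the information the pointee-type annotation would carry, and then reads an aggregate out of a byte buffer. The in-order, naturally aligned layout and every name here (`Pair`, `ArgDesc`, `alignTo`, `computeOffsets`, `loadArg`) are assumptions for illustration; this is not the AMDGPU kernel-argument ABI or the AMDGPULowerKernelArguments implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical aggregate kernel argument.
struct Pair {
  int32_t a;
  int32_t b;
};

// Size and alignment of one argument: all that is needed to locate it.
struct ArgDesc {
  size_t Size;
  size_t Align;
};

// Round Offset up to a multiple of Align (Align must be a power of two).
static size_t alignTo(size_t Offset, size_t Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// Assumed layout: arguments packed in order, each at its natural alignment.
static std::vector<size_t> computeOffsets(const std::vector<ArgDesc> &Args) {
  std::vector<size_t> Offsets;
  size_t Cur = 0;
  for (const ArgDesc &A : Args) {
    Cur = alignTo(Cur, A.Align);
    Offsets.push_back(Cur);
    Cur += A.Size;
  }
  return Offsets;
}

// The argument "pointer" is just Base + Offset; reading through it copies
// the value out of the read-only buffer into a local.
template <typename T>
static T loadArg(const unsigned char *Base, size_t Offset) {
  T Value;
  std::memcpy(&Value, Base + Offset, sizeof(T));
  return Value;
}
```

Because every offset is a compile-time constant, the IR arguments themselves end up with no uses, which is the tracking problem mentioned for option 2.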

In D79744#2069324, @arsenm wrote:

In D79744#2050498, @rjmccall wrote:

For the purpose here, only the callee exists. This is essentially a freestanding function, the entry point to the program.

I'm definitely not going to let you add a new "generic" argument-passing convention and then only implement exactly the parts you need; that's just adding technical debt.

I'm not sure what you mean here. I don't really want or need a totally new generic argument passing convention. Not every IR feature is expected to be implemented by every backend. For instance, inalloca/preallocated exist solely to implement the x86 Windows ABI and no other target tries to handle them. This is just another flavor of the same fundamental concept of a parameter attribute needed for a target-specific calling convention.

I mean the Clang code for supporting this new convention, not the IR support. Of course LLVM has target-specific attributes.

The load-from-constant nature needs to be exposed earlier, so I think this necessarily involves changing the convention lowering in some way and it's just a question of what it looks like. To summarize, the 2.5 options I've come up with are:

Use constant byval parameters, as this patch does. This requires the least implementation effort but doesn't exactly fit in with byval as defined.

I assume you generate a byval caller in some later pass, which will then implicitly copy. Or do you lower byval in a nonstandard way in your backend?

No, the caller is only an external driver of some kind. Since the address spaces don't match (and the source address space is read-only), anything that behaves like a stack byval doesn't really make sense from the beginning, which is why this patch changes the address space. If we were to leave this as a stack address space, we would have to add an alloca and insert a memcpy anyway. This would still leave an unusable byval argument around that a pass could still theoretically try to write into.

D79630 adds the lowering that uses this. For various reasons (some of which I referenced in the previous comments), we have 3 different implementations of the ABI. The byval pointer value is simply replaced with a new pointer derived from a constant offset from an intrinsic call (this is most obvious in the AMDGPULowerKernelArguments IR version).

Replace all IR argument uses with loads from a constant offset from an intrinsic call. This still needs to leave the IR arguments in place because we do need to know the original argument sizes and offsets, but they would never have a use (or I would need to invent some other method of tracking this information).

I guess passing aggregate arguments using a normal indirect convention has this same tracking problem. You'd also have to copy on the caller side to maintain semantics, which is probably hard to eliminate, but I would guess byval has the same problem?

Isn't using byval the normal indirect convention? Really the only properties I want that byval gives me are a pointee size and alignment. The most abstract attribute I could probably come up with is a pointee(type) annotation that I could use, without the stack-copying implications of byval.

byval is not an indirect convention. It looks like one in IR, but it means "it's passed in the argument area of the stack", which is essentially more like being passed directly than otherwise.

In IR, the normal indirect convention is just to pass a pointer without any extra treatment. Clang does set nonnull dereferenceable(size) align 4 for optimization purposes, though.

In D79744#2069620, @rjmccall wrote:

In D79744#2069324, @arsenm wrote:

In D79744#2050498, @rjmccall wrote:

For the purpose here, only the callee exists. This is essentially a freestanding function, the entry point to the program.

I'm definitely not going to let you add a new "generic" argument-passing convention and then only implement exactly the parts you need; that's just adding technical debt.

I'm not sure what you mean here. I don't really want or need a totally new generic argument passing convention. Not every IR feature is expected to be implemented by every backend. For instance, inalloca/preallocated exist solely to implement the x86 Windows ABI and no other target tries to handle them. This is just another flavor of the same fundamental concept of a parameter attribute needed for a target-specific calling convention.

I mean the Clang code for supporting this new convention, not the IR support. Of course LLVM has target-specific attributes.

I think this is converging to adding a new IR attribute that essentially just provides the pointee type for ABI purposes. I guess my name ideas for this would be "indirect", "value", "memoryvalue", "abitype"?

I forget that frontends exist sometimes, so I'm not sure I understand your clang side objection.

In D79744#2069774, @arsenm wrote:

I think this is converging to adding a new IR attribute that essentially just provides the pointee type for ABI purposes. I guess my name ideas for this would be "indirect", "value", "memoryvalue", "abitype"?

My main question for a new attribute is whether it needs to be explicitly read-only. I think the answer is no, since you can't ordinarily write to any random pointer argument.

In D79744#2069786, @arsenm wrote:

In D79744#2069774, @arsenm wrote:

I think this is converging to adding a new IR attribute that essentially just provides the pointee type for ABI purposes. I guess my name ideas for this would be "indirect", "value", "memoryvalue", "abitype"?

My main question for a new attribute is whether it needs to be explicitly read-only. I think the answer is no, since you can't ordinarily write to any random pointer argument.

I think this is "in practice" correct (assuming you don't see a write to that location in which case you can do some crazy stuff).

In D79744#2069962, @jdoerfert wrote:

In D79744#2069786, @arsenm wrote:

In D79744#2069774, @arsenm wrote:

I think this is converging to adding a new IR attribute that essentially just provides the pointee type for ABI purposes.

If you don't feel like you can rely on the dereferenceable attributes, then I guess so, yes.

sameerds added a subscriber: sameerds.Jun 5 2020, 8:32 AM

arsenm mentioned this in D81311: [RFC] LangRef: Define byref parameter attribute.Jun 5 2020, 3:22 PM

arsenm mentioned this in rG5e999cbe8db0: IR: Define byref parameter attribute.Jul 20 2020, 7:23 AM

Switch to byref. Doesn't handle the HIP kernel argument promotion case, since that requires more work

arsenm retitled this revision from clang: Use byref for kernel arguments to clang: Use byref for aggregate kernel arguments.Jul 21 2020, 2:49 PM

Arguably we should add this attribute to all indirect arguments. I can understand not wanting to update all the test cases, but you could probably avoid adding a new IndirectByRef kind of ABIArgInfo by treating kernels specially in ConstructAttributeList.

Or, sorry, I forget — is this semantically necessary because the argument is to constant memory and the callee has to copy it to form the mutable local? If so, I think (1) the above statement about theoretically using byref on all arguments still applies and (2) we do need a new ABIArgInfo kind, but we should name it something like IndirectAliased.

In D79744#2165929, @rjmccall wrote:

Arguably we should add this attribute to all indirect arguments. I can understand not wanting to update all the test cases, but you could probably avoid adding a new IndirectByRef kind of ABIArgInfo by treating kernels specially in ConstructAttributeList.

Or, sorry, I forget — is this semantically necessary because the argument is to constant memory and the callee has to copy it to form the mutable local? If so, I think (1) the above statement about theoretically using byref on all arguments still applies and (2) we do need a new ABIArgInfo kind, but we should name it something like IndirectAliased.

Yes, it's semantically needed to insert the copy from constant memory. I originally interpreted a copy as necessary if the indirect addrspace did not match the stack address space, which is a sort of roundabout way of achieving the same thing

Use distinct ABIArgInfo::Kind. Also don't enable this for OpenCL yet, since that requires fixing the callable kernel workaround

rjmccall added inline comments.Jul 24 2020, 9:30 PM

clang/include/clang/CodeGen/CGFunctionInfo.h
52	Hmm. I guess there are actually two different potential conventions here:
	- The caller can provide the address of a known-immutable object that has the right value, and the callee has to copy it if it needs the object to have a unique address or wants to mutate it.
	- The caller can provide the address of any object that has the right value, and the callee has to copy it if it needs the object to have a unique address, wants to mutate it, or needs the value to stick around across call boundaries.
	The advantage of the second is that IRGen could avoid some copies on the caller side, which could be advantageous when the callee is sufficiently trivial. The disadvantage is that the callee would have to defensively copy in more situations. Probably we should use the former. Please be explicit about it, though:
	"Similar to Indirect, but the pointer may be to an object that is otherwise referenced. The object is known to not be modified through any other references for the duration of the call, and the callee must not itself modify the object. Because C allows parameter variables to be modified and guarantees that they have unique addresses, the callee must defensively copy the object into a local variable if it might be modified or its address might be compared. Since those are uncommon, in principle this convention allows programs to avoid copies in more situations. However, it may introduce extra copies if the callee fails to prove that a copy is unnecessary and the caller naturally produces an unaliased object for the argument."
clang/lib/CodeGen/CGCall.cpp
2216	Please add a TODO here that we could add the `byref` attribute if we're willing to update the test cases. Maybe whoever does that can add alignments at the same time.
2465	"copy it to ensure that the parameter variable is mutable and has a unique address, as C requires". I've wanted Sema to track whether local variables are mutated or have their address taken for a long time; maybe someday we can do that and then take advantage of it here. Just a random thought, sorry.
4700	Please just make this use the Indirect code. If we gave it special attention, we could optimize it better, but conservatively doing what Indirect does should still work.
clang/lib/CodeGen/TargetInfo.cpp
1997	In principle, this can be `inreg` just as much as Indirect can.
8814	I don't see why you'd use `byref` when promoting pointers in structs. Maybe it works as a hack with your backend, but it seems extremely special-case and should not be hacked into the general infrastructure.
9382	No reason not to use the Indirect code here.
9752	Same.
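For readers following the CGFunctionInfo.h and TargetInfo.cpp comments, here is a heavily simplified, hypothetical stand-in for `ABIArgInfo` showing only the pieces discussed: a separate `IndirectAliased` kind that carries an alignment and an explicit address space but no byval flag, and a made-up `classifyAggregate` helper that picks it for kernel arguments. The class name `ArgInfo`, the helper, and the address-space value used in the usage lines are all assumptions; the real target hooks in this patch differ (for example, the HIP pointer-promotion case is skipped).

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for clang::CodeGen::ABIArgInfo (hypothetical).
class ArgInfo {
public:
  enum Kind : uint8_t { Direct, Indirect, IndirectAliased };

  // Stack-based indirect convention (byval by default).
  static ArgInfo getIndirect(unsigned Alignment, bool IsByVal = true) {
    ArgInfo AI(Indirect);
    AI.Align = Alignment;
    AI.ByVal = IsByVal;
    return AI;
  }

  // Pointer to an immutable object in the given address space (byref idea).
  static ArgInfo getIndirectAliased(unsigned Alignment, unsigned AS) {
    ArgInfo AI(IndirectAliased);
    AI.Align = Alignment;
    AI.AddrSpace = AS;
    return AI;
  }

  Kind getKind() const { return TheKind; }
  unsigned getAlign() const { return Align; }
  unsigned getAddrSpace() const { return AddrSpace; }
  bool isByVal() const { return ByVal; }

private:
  explicit ArgInfo(Kind K) : TheKind(K), Align(0), AddrSpace(0), ByVal(false) {}

  Kind TheKind;
  unsigned Align;
  unsigned AddrSpace : 24;  // mirrors the 24-bit IndirectAddrSpace field
  bool ByVal;
};

// Hypothetical classification of an aggregate argument: kernel arguments live
// in a separate read-only buffer (address space KernArgAS), everything else
// keeps the stack-based byval convention.
static ArgInfo classifyAggregate(bool IsKernel, unsigned Alignment,
                                 unsigned KernArgAS) {
  if (IsKernel)
    return ArgInfo::getIndirectAliased(Alignment, KernArgAS);
  return ArgInfo::getIndirect(Alignment, /*IsByVal=*/true);
}
```

Keeping a distinct kind, rather than a flag on `Indirect`, makes every switch over the kinds decide explicitly what the aliased convention means, which is what the review asked for.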

Address comments

clang/lib/CodeGen/TargetInfo.cpp
1997	The IR verifier currently rejects byref + inreg.
8814	The whole point is to reinterpret the address space of the pointers in memory since we know if it's a kernel argument it has to be an addrspace(1) pointer or garbage. We can't infer the address space of a generic pointer loaded from memory. byref doesn't change that, it just makes the fact that these are passed in memory explicit
9382	I generally don't like speculatively adding handling for features I can't write a testcase for, but I've moved these

rjmccall added inline comments.Aug 3 2020, 10:31 AM

clang/lib/CodeGen/TargetInfo.cpp
1997	Why? `inreg` is essentially orthogonal.

arsenm added inline comments.Aug 4 2020, 11:49 AM

clang/lib/CodeGen/TargetInfo.cpp
1997	Mostly inherited from the other similar attribute handling. It can be lifted if there's a use

arsenm added inline comments.Aug 5 2020, 6:03 AM

clang/lib/CodeGen/TargetInfo.cpp
1997	Plus the name here is isArgInAlloca; this is not necessarily passed in an alloca

rjmccall added inline comments.Aug 5 2020, 11:25 AM

clang/lib/CodeGen/TargetInfo.cpp
1997	I agree that we don't need to update this.
8814	`byref` is interpreted by your backend passes as an instruction that the argument value is actually the address of an object that's passed to the kernel by value, so you need to expand the memory in the kernel argument marshalling. Why would that be something you'd want to trigger when passing a struct with a pointer in it? You're not going to recursively copy and pass down the pointee values of those pointers.

arsenm added inline comments.Aug 5 2020, 11:36 AM

clang/lib/CodeGen/TargetInfo.cpp
8814	Because all arguments are really passed byref, we're just not at the point yet where we can switch all IR arguments to use byref for all arguments. All of the relevant properties are really always on the in-memory value. The promotion this is talking about is really orthogonal to the IR mechanism used for passing kernel arguments. This promotion is because the language only exposes generic pointers. In the context of a pointer inside a struct passed as a kernel argument, we semantically know the address space of any valid pointers must be global. You could not produce a valid generic pointer from another address space here. The pointers/structs are still the same size and layout, but coercing the in-memory address space is semantically more useful to the optimizer

rjmccall added inline comments.Aug 5 2020, 11:46 AM

clang/lib/CodeGen/TargetInfo.cpp
8814	I understand that the promotion is orthogonal to the IR mechanism used for passing kernel arguments, which is exactly why I'm asking why there's a comment saying that we should "use byref when promoting pointers in struct", because I have no idea what that's supposed to mean when the pointer is just a part of the value being passed. It sounds like what you want is to maybe customize the code that's emitted to copy a byref parameter into a parameter variable when the parameter type is a struct containing a pointer you want to promote. But that doesn't really have anything to do with `byref`; if you weren't using `byref`, you'd still want a similar customization when creating the parameter variable. So it seems to me that the comment is still off-target.
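The promotion argued over in this exchange only changes the type under which the argument's in-memory bytes are read; size and layout stay identical, independent of whether the argument is passed with byref. Standard C++ has no address spaces, so this sketch models them with hypothetical tag types (`TaggedPtr<0>` for generic and `TaggedPtr<1>` for global are assumed numberings); `ArgAsWritten`, `ArgPromoted` and `promote` are likewise invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical pointer wrapper: the tag records which memory it points into,
// while the representation is the same 64-bit value for every tag.
template <unsigned AS> struct TaggedPtr {
  uint64_t Addr;
};

using GenericPtr = TaggedPtr<0>;  // assumed: generic (flat) pointer
using GlobalPtr = TaggedPtr<1>;   // assumed: global pointer

// The struct as the source language exposes it: only generic pointers.
struct ArgAsWritten {
  GenericPtr Data;
  int32_t Count;
};

// The promoted view for kernel arguments: same layout, global pointer.
struct ArgPromoted {
  GlobalPtr Data;
  int32_t Count;
};

static_assert(sizeof(ArgAsWritten) == sizeof(ArgPromoted), "same size");
static_assert(offsetof(ArgAsWritten, Count) == offsetof(ArgPromoted, Count),
              "same layout");
static_assert(std::is_trivially_copyable<ArgPromoted>::value, "memcpy-able");

// Reinterpret the argument bytes under the promoted type. No bits change;
// only the static type (and what an optimizer may assume) does.
static ArgPromoted promote(const ArgAsWritten &In) {
  ArgPromoted Out;
  std::memcpy(&Out, &In, sizeof(Out));
  return Out;
}
```

In this model the reinterpretation happens when the parameter variable is created from the argument bytes, which matches the point that the same customization would be needed with or without byref.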

Reword comment

Thanks, LGTM.

This revision is now accepted and ready to land.Aug 6 2020, 12:26 PM

30eeb742f1d11d7a7036e3b8a3bffc1dfd252082

atrosinenko mentioned this in D86020: [MemCpyOptimizer] Optimize passing byref function arguments down the stack.Aug 17 2020, 11:39 AM

Revision Contents

Path                                                    Size
clang/include/clang/CodeGen/CGFunctionInfo.h            60 lines
clang/lib/CodeGen/CGCall.cpp                            40 lines
clang/lib/CodeGen/TargetInfo.cpp                        31 lines
clang/test/CodeGenCUDA/kernel-args.cu                   8 lines
clang/test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl    3 lines

Diff 283345

clang/include/clang/CodeGen/CGFunctionInfo.h

Show All 38 Lines	enum Kind : uint8_t {
/// representation. A dummy argument is emitted before the real argument		/// representation. A dummy argument is emitted before the real argument
/// if the specified type stored in "PaddingType" is not zero.		/// if the specified type stored in "PaddingType" is not zero.
Direct,		Direct,

/// Extend - Valid only for integer argument types. Same as 'direct'		/// Extend - Valid only for integer argument types. Same as 'direct'
/// but also emit a zero/sign extension attribute.		/// but also emit a zero/sign extension attribute.
Extend,		Extend,

/// Indirect - Pass the argument indirectly via a hidden pointer		/// Indirect - Pass the argument indirectly via a hidden pointer with the
/// with the specified alignment (0 indicates default alignment).		/// specified alignment (0 indicates default alignment) and address space.
Indirect,		Indirect,

		/// IndirectAliased - Similar to Indirect, but the pointer may be to an
		/// object that is otherwise referenced. The object is known to not be
		rjmccallUnsubmitted Done Reply Inline Actions Hmm. I guess there's actually two different potential conventions here: The caller can provide the address of a known-immutable object that has the right value, and the callee has to copy it if it needs the object to have a unique address or wants to mutate it. The caller can provide the address of any object that has the right value, and the callee has to copy it if it needs the object to have a unique address, wants to mutate it, or needs the value to stick around across call boundaries. The advantage of the second is that IRGen could avoid some copies on the caller side, which could be advantageous when the callee is sufficiently trivial. The disadvantage is that the callee would have to defensively copy in more situations. Probably we should use the former. Please be explicit about it, though: Similar to Indirect, but the pointer may be to an object that is otherwise referenced. The object is known to not be modified through any other references for the duration of the call, and the callee must not itself modify the object. Because C allows parameter variables to be modified and guarantees that they have unique addresses, the callee must defensively copy the object into a local variable if it might be modified or its address might be compared. Since those are uncommon, in principle this convention allows programs to avoid copies in more situations. However, it may introduce extra copies if the callee fails to prove that a copy is unnecessary and the caller naturally produces an unaliased object for the argument. rjmccall: Hmm. I guess there's actually two different potential conventions here: - The caller can…
		/// modified through any other references for the duration of the call, and
		/// the callee must not itself modify the object. Because C allows
		/// parameter variables to be modified and guarantees that they have unique
		/// addresses, the callee must defensively copy the object into a local
		/// variable if it might be modified or its address might be compared.
		/// Since those are uncommon, in principle this convention allows programs
		/// to avoid copies in more situations. However, it may introduce extra
		/// copies if the callee fails to prove that a copy is unnecessary and the
		/// caller naturally produces an unaliased object for the argument.
		IndirectAliased,

/// Ignore - Ignore the argument (treat as void). Useful for void and		/// Ignore - Ignore the argument (treat as void). Useful for void and
/// empty structs.		/// empty structs.
Ignore,		Ignore,

/// Expand - Only valid for aggregate argument types. The structure should		/// Expand - Only valid for aggregate argument types. The structure should
/// be expanded into consecutive arguments for its constituent fields.		/// be expanded into consecutive arguments for its constituent fields.
/// Currently expand is only allowed on structures whose fields		/// Currently expand is only allowed on structures whose fields
/// are all scalar types or are themselves expandable types.		/// are all scalar types or are themselves expandable types.
Show All 22 Lines	union {
llvm::Type *UnpaddedCoerceAndExpandType; // isCoerceAndExpand()		llvm::Type *UnpaddedCoerceAndExpandType; // isCoerceAndExpand()
};		};
union {		union {
unsigned DirectOffset; // isDirect() \|\| isExtend()		unsigned DirectOffset; // isDirect() \|\| isExtend()
unsigned IndirectAlign; // isIndirect()		unsigned IndirectAlign; // isIndirect()
unsigned AllocaFieldIndex; // isInAlloca()		unsigned AllocaFieldIndex; // isInAlloca()
};		};
Kind TheKind;		Kind TheKind;
		unsigned IndirectAddrSpace : 24; // isIndirect()
bool PaddingInReg : 1;		bool PaddingInReg : 1;
bool InAllocaSRet : 1; // isInAlloca()		bool InAllocaSRet : 1; // isInAlloca()
bool InAllocaIndirect : 1;// isInAlloca()		bool InAllocaIndirect : 1;// isInAlloca()
bool IndirectByVal : 1; // isIndirect()		bool IndirectByVal : 1; // isIndirect()
bool IndirectRealign : 1; // isIndirect()		bool IndirectRealign : 1; // isIndirect()
bool SRetAfterThis : 1; // isIndirect()		bool SRetAfterThis : 1; // isIndirect()
bool InReg : 1; // isDirect() \|\| isExtend() \|\| isIndirect()		bool InReg : 1; // isDirect() \|\| isExtend() \|\| isIndirect()
bool CanBeFlattened: 1; // isDirect()		bool CanBeFlattened: 1; // isDirect()
bool SignExt : 1; // isExtend()		bool SignExt : 1; // isExtend()

bool canHavePaddingType() const {		bool canHavePaddingType() const {
return isDirect() \|\| isExtend() \|\| isIndirect() \|\| isExpand();		return isDirect() \|\| isExtend() \|\| isIndirect() \|\| isIndirectAliased() \|\|
		isExpand();
}		}
void setPaddingType(llvm::Type *T) {		void setPaddingType(llvm::Type *T) {
assert(canHavePaddingType());		assert(canHavePaddingType());
PaddingType = T;		PaddingType = T;
}		}

void setUnpaddedCoerceToType(llvm::Type *T) {		void setUnpaddedCoerceToType(llvm::Type *T) {
assert(isCoerceAndExpand());		assert(isCoerceAndExpand());
UnpaddedCoerceAndExpandType = T;		UnpaddedCoerceAndExpandType = T;
}		}

public:		public:
ABIArgInfo(Kind K = Direct)		ABIArgInfo(Kind K = Direct)
: TypeData(nullptr), PaddingType(nullptr), DirectOffset(0), TheKind(K),		: TypeData(nullptr), PaddingType(nullptr), DirectOffset(0), TheKind(K),
PaddingInReg(false), InAllocaSRet(false), InAllocaIndirect(false),		IndirectAddrSpace(0), PaddingInReg(false), InAllocaSRet(false),
IndirectByVal(false), IndirectRealign(false), SRetAfterThis(false),		InAllocaIndirect(false), IndirectByVal(false), IndirectRealign(false),
InReg(false), CanBeFlattened(false), SignExt(false) {}		SRetAfterThis(false), InReg(false), CanBeFlattened(false),
		SignExt(false) {}

static ABIArgInfo getDirect(llvm::Type *T = nullptr, unsigned Offset = 0,		static ABIArgInfo getDirect(llvm::Type *T = nullptr, unsigned Offset = 0,
llvm::Type *Padding = nullptr,		llvm::Type *Padding = nullptr,
bool CanBeFlattened = true) {		bool CanBeFlattened = true) {
auto AI = ABIArgInfo(Direct);		auto AI = ABIArgInfo(Direct);
AI.setCoerceToType(T);		AI.setCoerceToType(T);
AI.setPaddingType(Padding);		AI.setPaddingType(Padding);
AI.setDirectOffset(Offset);		AI.setDirectOffset(Offset);
[49 lines hidden]	static ABIArgInfo getIndirect(CharUnits Alignment, bool ByVal = true,
auto AI = ABIArgInfo(Indirect);		auto AI = ABIArgInfo(Indirect);
AI.setIndirectAlign(Alignment);		AI.setIndirectAlign(Alignment);
AI.setIndirectByVal(ByVal);		AI.setIndirectByVal(ByVal);
AI.setIndirectRealign(Realign);		AI.setIndirectRealign(Realign);
AI.setSRetAfterThis(false);		AI.setSRetAfterThis(false);
AI.setPaddingType(Padding);		AI.setPaddingType(Padding);
return AI;		return AI;
}		}

		/// Pass this in memory using the IR byref attribute.
		static ABIArgInfo getIndirectAliased(CharUnits Alignment, unsigned AddrSpace,
		bool Realign = false,
		llvm::Type *Padding = nullptr) {
		auto AI = ABIArgInfo(IndirectAliased);
		AI.setIndirectAlign(Alignment);
		AI.setIndirectRealign(Realign);
		AI.setPaddingType(Padding);
		AI.setIndirectAddrSpace(AddrSpace);
		return AI;
		}

static ABIArgInfo getIndirectInReg(CharUnits Alignment, bool ByVal = true,		static ABIArgInfo getIndirectInReg(CharUnits Alignment, bool ByVal = true,
bool Realign = false) {		bool Realign = false) {
auto AI = getIndirect(Alignment, ByVal, Realign);		auto AI = getIndirect(Alignment, ByVal, Realign);
AI.setInReg(true);		AI.setInReg(true);
return AI;		return AI;
}		}
static ABIArgInfo getInAlloca(unsigned FieldIndex, bool Indirect = false) {		static ABIArgInfo getInAlloca(unsigned FieldIndex, bool Indirect = false) {
auto AI = ABIArgInfo(InAlloca);		auto AI = ABIArgInfo(InAlloca);
[63 lines hidden]	#endif
}		}

Kind getKind() const { return TheKind; }		Kind getKind() const { return TheKind; }
bool isDirect() const { return TheKind == Direct; }		bool isDirect() const { return TheKind == Direct; }
bool isInAlloca() const { return TheKind == InAlloca; }		bool isInAlloca() const { return TheKind == InAlloca; }
bool isExtend() const { return TheKind == Extend; }		bool isExtend() const { return TheKind == Extend; }
bool isIgnore() const { return TheKind == Ignore; }		bool isIgnore() const { return TheKind == Ignore; }
bool isIndirect() const { return TheKind == Indirect; }		bool isIndirect() const { return TheKind == Indirect; }
		bool isIndirectAliased() const { return TheKind == IndirectAliased; }
bool isExpand() const { return TheKind == Expand; }		bool isExpand() const { return TheKind == Expand; }
bool isCoerceAndExpand() const { return TheKind == CoerceAndExpand; }		bool isCoerceAndExpand() const { return TheKind == CoerceAndExpand; }

bool canHaveCoerceToType() const {		bool canHaveCoerceToType() const {
return isDirect() || isExtend() || isCoerceAndExpand();		return isDirect() || isExtend() || isCoerceAndExpand();
}		}

// Direct/Extend accessors		// Direct/Extend accessors
[63 lines hidden]	#endif

void setInReg(bool IR) {		void setInReg(bool IR) {
assert((isDirect() || isExtend() || isIndirect()) && "Invalid kind!");		assert((isDirect() || isExtend() || isIndirect()) && "Invalid kind!");
InReg = IR;		InReg = IR;
}		}

// Indirect accessors		// Indirect accessors
CharUnits getIndirectAlign() const {		CharUnits getIndirectAlign() const {
assert(isIndirect() && "Invalid kind!");		assert((isIndirect() || isIndirectAliased()) && "Invalid kind!");
return CharUnits::fromQuantity(IndirectAlign);		return CharUnits::fromQuantity(IndirectAlign);
}		}
void setIndirectAlign(CharUnits IA) {		void setIndirectAlign(CharUnits IA) {
assert(isIndirect() && "Invalid kind!");		assert((isIndirect() || isIndirectAliased()) && "Invalid kind!");
IndirectAlign = IA.getQuantity();		IndirectAlign = IA.getQuantity();
}		}

bool getIndirectByVal() const {		bool getIndirectByVal() const {
assert(isIndirect() && "Invalid kind!");		assert(isIndirect() && "Invalid kind!");
return IndirectByVal;		return IndirectByVal;
}		}
void setIndirectByVal(bool IBV) {		void setIndirectByVal(bool IBV) {
assert(isIndirect() && "Invalid kind!");		assert(isIndirect() && "Invalid kind!");
IndirectByVal = IBV;		IndirectByVal = IBV;
}		}

		unsigned getIndirectAddrSpace() const {
		assert(isIndirectAliased() && "Invalid kind!");
		return IndirectAddrSpace;
		}

		void setIndirectAddrSpace(unsigned AddrSpace) {
		assert(isIndirectAliased() && "Invalid kind!");
		IndirectAddrSpace = AddrSpace;
		}

bool getIndirectRealign() const {		bool getIndirectRealign() const {
assert(isIndirect() && "Invalid kind!");		assert((isIndirect() || isIndirectAliased()) && "Invalid kind!");
return IndirectRealign;		return IndirectRealign;
}		}
void setIndirectRealign(bool IR) {		void setIndirectRealign(bool IR) {
assert(isIndirect() && "Invalid kind!");		assert((isIndirect() || isIndirectAliased()) && "Invalid kind!");
IndirectRealign = IR;		IndirectRealign = IR;
}		}

bool isSRetAfterThis() const {		bool isSRetAfterThis() const {
assert(isIndirect() && "Invalid kind!");		assert(isIndirect() && "Invalid kind!");
return SRetAfterThis;		return SRetAfterThis;
}		}
void setSRetAfterThis(bool AfterThis) {		void setSRetAfterThis(bool AfterThis) {
[360 lines hidden]

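To make the bookkeeping in this header easier to follow, here is a simplified, hypothetical model of the affected parts of `ABIArgInfo`. The class name `ArgInfoModel` is invented, and padding, realignment, coercion types and the behavior of most kinds are left out. It shows how the accessors split after the patch: the alignment is shared by `Indirect` and `IndirectAliased`, `byval` applies only to `Indirect`, and the new address space applies only to `IndirectAliased`. The 24-bit width of the address-space field matches the limit LLVM places on address-space numbers.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of clang's ABIArgInfo; only the
// pieces touched by this patch are represented. Not the real clang class.
class ArgInfoModel {
public:
  enum Kind { Direct, Extend, Indirect, IndirectAliased, Ignore, Expand,
              CoerceAndExpand, InAlloca };

  static ArgInfoModel getIndirect(unsigned Align, bool ByVal = true) {
    ArgInfoModel AI(Indirect);
    AI.IndirectAlign = Align;
    AI.IndirectByVal = ByVal;
    return AI;
  }

  // Mirrors getIndirectAliased(): passed in memory (byref) in AddrSpace.
  static ArgInfoModel getIndirectAliased(unsigned Align, unsigned AddrSpace) {
    ArgInfoModel AI(IndirectAliased);
    AI.IndirectAlign = Align;
    AI.IndirectAddrSpace = AddrSpace;
    return AI;
  }

  Kind getKind() const { return TheKind; }
  bool isIndirect() const { return TheKind == Indirect; }
  bool isIndirectAliased() const { return TheKind == IndirectAliased; }

  // Alignment is meaningful for both in-memory kinds...
  unsigned getIndirectAlign() const {
    assert((isIndirect() || isIndirectAliased()) && "Invalid kind!");
    return IndirectAlign;
  }
  // ...byval only for Indirect...
  bool getIndirectByVal() const {
    assert(isIndirect() && "Invalid kind!");
    return IndirectByVal;
  }
  // ...and the address space only for IndirectAliased.
  unsigned getIndirectAddrSpace() const {
    assert(isIndirectAliased() && "Invalid kind!");
    return IndirectAddrSpace;
  }

private:
  explicit ArgInfoModel(Kind K)
      : TheKind(K), IndirectAlign(0), IndirectAddrSpace(0),
        IndirectByVal(false) {}

  Kind TheKind;
  unsigned IndirectAlign;
  unsigned IndirectAddrSpace : 24; // same width as the field in the patch
  bool IndirectByVal : 1;
};
```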
clang/lib/CodeGen/CGCall.cpp

[1,464 lines hidden]	case ABIArgInfo::Direct: {
if (AI.isDirect() && AI.getCanBeFlattened() && STy) {		if (AI.isDirect() && AI.getCanBeFlattened() && STy) {
IRArgs.NumberOfArgs = STy->getNumElements();		IRArgs.NumberOfArgs = STy->getNumElements();
} else {		} else {
IRArgs.NumberOfArgs = 1;		IRArgs.NumberOfArgs = 1;
}		}
break;		break;
}		}
case ABIArgInfo::Indirect:		case ABIArgInfo::Indirect:
		case ABIArgInfo::IndirectAliased:
IRArgs.NumberOfArgs = 1;		IRArgs.NumberOfArgs = 1;
break;		break;
case ABIArgInfo::Ignore:		case ABIArgInfo::Ignore:
case ABIArgInfo::InAlloca:		case ABIArgInfo::InAlloca:
// ignore and inalloca doesn't have matching LLVM parameters.		// ignore and inalloca doesn't have matching LLVM parameters.
IRArgs.NumberOfArgs = 0;		IRArgs.NumberOfArgs = 0;
break;		break;
case ABIArgInfo::CoerceAndExpand:		case ABIArgInfo::CoerceAndExpand:
[74 lines hidden]	CodeGenTypes::GetFunctionType(const CGFunctionInfo &FI) {
bool Inserted = FunctionsBeingProcessed.insert(&FI).second;		bool Inserted = FunctionsBeingProcessed.insert(&FI).second;
(void)Inserted;		(void)Inserted;
assert(Inserted && "Recursively being processed?");		assert(Inserted && "Recursively being processed?");

llvm::Type *resultType = nullptr;		llvm::Type *resultType = nullptr;
const ABIArgInfo &retAI = FI.getReturnInfo();		const ABIArgInfo &retAI = FI.getReturnInfo();
switch (retAI.getKind()) {		switch (retAI.getKind()) {
case ABIArgInfo::Expand:		case ABIArgInfo::Expand:
		case ABIArgInfo::IndirectAliased:
llvm_unreachable("Invalid ABI kind for return argument");		llvm_unreachable("Invalid ABI kind for return argument");

case ABIArgInfo::Extend:		case ABIArgInfo::Extend:
case ABIArgInfo::Direct:		case ABIArgInfo::Direct:
resultType = retAI.getCoerceToType();		resultType = retAI.getCoerceToType();
break;		break;

case ABIArgInfo::InAlloca:		case ABIArgInfo::InAlloca:
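A minimal sketch of the two switches above, using a stand-in enum rather than clang's `ABIArgInfo::Kind`: an `IndirectAliased` argument occupies one IR parameter (a pointer), exactly like `Indirect`, and it is rejected as a return kind. Flattening of `Direct` structs into several IR arguments and the type-dependent `Expand`/`CoerceAndExpand` counts are not modeled, and the function names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for ABIArgInfo::Kind; not clang's actual enum.
enum class ArgKind { Direct, Extend, Indirect, IndirectAliased, Ignore,
                     InAlloca, Expand, CoerceAndExpand };

// Number of IR parameters one source argument maps to. Direct/Extend are
// treated here as a single, non-flattened value.
unsigned numIRArgs(ArgKind K) {
  switch (K) {
  case ArgKind::Direct:
  case ArgKind::Extend:
  case ArgKind::Indirect:
  case ArgKind::IndirectAliased: // a single pointer, same as Indirect
    return 1;
  case ArgKind::Ignore:
  case ArgKind::InAlloca:        // no matching LLVM parameter
    return 0;
  case ArgKind::Expand:
  case ArgKind::CoerceAndExpand:
    break;                       // depends on the argument type
  }
  throw std::invalid_argument("count depends on the argument type");
}

// The patch lists IndirectAliased next to Expand among the
// "Invalid ABI kind for return argument" cases.
bool isValidReturnKind(ArgKind K) {
  return K != ArgKind::Expand && K != ArgKind::IndirectAliased;
}
```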
[61 lines hidden]	for (; it != ie; ++it, ++ArgNo) {
case ABIArgInfo::Indirect: {		case ABIArgInfo::Indirect: {
assert(NumIRArgs == 1);		assert(NumIRArgs == 1);
// indirect arguments are always on the stack, which is alloca addr space.		// indirect arguments are always on the stack, which is alloca addr space.
llvm::Type *LTy = ConvertTypeForMem(it->type);		llvm::Type *LTy = ConvertTypeForMem(it->type);
ArgTypes[FirstIRArg] = LTy->getPointerTo(		ArgTypes[FirstIRArg] = LTy->getPointerTo(
CGM.getDataLayout().getAllocaAddrSpace());		CGM.getDataLayout().getAllocaAddrSpace());
break;		break;
}		}
		case ABIArgInfo::IndirectAliased: {
		assert(NumIRArgs == 1);
		llvm::Type *LTy = ConvertTypeForMem(it->type);
		ArgTypes[FirstIRArg] = LTy->getPointerTo(ArgInfo.getIndirectAddrSpace());
		break;
		}
case ABIArgInfo::Extend:		case ABIArgInfo::Extend:
case ABIArgInfo::Direct: {		case ABIArgInfo::Direct: {
// Fast-isel and the optimizer generally like scalar values better than		// Fast-isel and the optimizer generally like scalar values better than
// FCAs, so we flatten them if this is safe to do for this argument.		// FCAs, so we flatten them if this is safe to do for this argument.
llvm::Type *argType = ArgInfo.getCoerceToType();		llvm::Type *argType = ArgInfo.getCoerceToType();
llvm::StructType *st = dyn_cast<llvm::StructType>(argType);		llvm::StructType *st = dyn_cast<llvm::StructType>(argType);
if (st && ArgInfo.isDirect() && ArgInfo.getCanBeFlattened()) {		if (st && ArgInfo.isDirect() && ArgInfo.getCanBeFlattened()) {
assert(NumIRArgs == st->getNumElements());		assert(NumIRArgs == st->getNumElements());
[447 lines hidden]	FuncAttrs.removeAttribute(llvm::Attribute::ReadOnly)
.removeAttribute(llvm::Attribute::ReadNone);		.removeAttribute(llvm::Attribute::ReadNone);
break;		break;
}		}

case ABIArgInfo::CoerceAndExpand:		case ABIArgInfo::CoerceAndExpand:
break;		break;

case ABIArgInfo::Expand:		case ABIArgInfo::Expand:
		case ABIArgInfo::IndirectAliased:
llvm_unreachable("Invalid ABI kind for return argument");		llvm_unreachable("Invalid ABI kind for return argument");
}		}

if (const auto *RefTy = RetTy->getAs<ReferenceType>()) {		if (const auto *RefTy = RetTy->getAs<ReferenceType>()) {
QualType PTy = RefTy->getPointeeType();		QualType PTy = RefTy->getPointeeType();
if (!PTy->isIncompleteType() && PTy->isConstantSizeType())		if (!PTy->isIncompleteType() && PTy->isConstantSizeType())
RetAttrs.addDereferenceableAttr(		RetAttrs.addDereferenceableAttr(
getMinimumObjectSize(PTy).getQuantity());		getMinimumObjectSize(PTy).getQuantity());
[67 lines hidden]	for (CGFunctionInfo::const_arg_iterator I = FI.arg_begin(),

case ABIArgInfo::Indirect: {		case ABIArgInfo::Indirect: {
if (AI.getInReg())		if (AI.getInReg())
Attrs.addAttribute(llvm::Attribute::InReg);		Attrs.addAttribute(llvm::Attribute::InReg);

if (AI.getIndirectByVal())		if (AI.getIndirectByVal())
Attrs.addByValAttr(getTypes().ConvertTypeForMem(ParamType));		Attrs.addByValAttr(getTypes().ConvertTypeForMem(ParamType));

		// TODO: We could add the byref attribute if not byval, but it would
		// require updating many testcases.

CharUnits Align = AI.getIndirectAlign();		CharUnits Align = AI.getIndirectAlign();

// In a byval argument, it is important that the required		// In a byval argument, it is important that the required
// alignment of the type is honored, as LLVM might be creating a		// alignment of the type is honored, as LLVM might be creating a
// new stack object, and needs to know what alignment to give		// new stack object, and needs to know what alignment to give
// it. (Sometimes it can deduce a sensible alignment on its own,		// it. (Sometimes it can deduce a sensible alignment on its own,
// but not if clang decides it must emit a packed struct, or the		// but not if clang decides it must emit a packed struct, or the
// user specifies increased alignment requirements.)		// user specifies increased alignment requirements.)
//		//
// This is different from indirect not byval, where the object		// This is different from indirect not byval, where the object
// exists already, and the align attribute is purely		// exists already, and the align attribute is purely
// informative.		// informative.
assert(!Align.isZero());		assert(!Align.isZero());

// For now, only add this when we have a byval argument.		// For now, only add this when we have a byval argument.
// TODO: be less lazy about updating test cases.		// TODO: be less lazy about updating test cases.
if (AI.getIndirectByVal())		if (AI.getIndirectByVal())
Attrs.addAlignmentAttr(Align.getQuantity());		Attrs.addAlignmentAttr(Align.getQuantity());

		rjmccall (Done): Please add a TODO here that we could add the `byref` attribute if we're willing to update the test cases. Maybe whoever does that can add alignments at the same time.
// byval disables readnone and readonly.		// byval disables readnone and readonly.
FuncAttrs.removeAttribute(llvm::Attribute::ReadOnly)		FuncAttrs.removeAttribute(llvm::Attribute::ReadOnly)
.removeAttribute(llvm::Attribute::ReadNone);		.removeAttribute(llvm::Attribute::ReadNone);

		break;
		}
		case ABIArgInfo::IndirectAliased: {
		CharUnits Align = AI.getIndirectAlign();
		Attrs.addByRefAttr(getTypes().ConvertTypeForMem(ParamType));
		Attrs.addAlignmentAttr(Align.getQuantity());
break;		break;
}		}
case ABIArgInfo::Ignore:		case ABIArgInfo::Ignore:
case ABIArgInfo::Expand:		case ABIArgInfo::Expand:
case ABIArgInfo::CoerceAndExpand:		case ABIArgInfo::CoerceAndExpand:
break;		break;

case ABIArgInfo::InAlloca:		case ABIArgInfo::InAlloca:
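The parameter-type and attribute hunks above decide what an in-memory argument looks like in the IR. The helpers below are hypothetical and only build strings (they do not call LLVM); they sketch roughly the resulting parameter shape in the typed-pointer IR syntax of the time. `Indirect` with byval gives a pointer in the alloca address space carrying `byval` and `align`, while `IndirectAliased` gives a pointer in the address space stored in the `ABIArgInfo` carrying `byref` and `align`. The type name `%struct.A` and the address-space numbers used in the tests (4 for the AMDGPU constant address space, 5 for private/alloca) are illustrative assumptions.

```cpp
#include <cassert>
#include <string>

// Address space 0 is printed without an addrspace() qualifier.
std::string addrSpaceSuffix(unsigned AS) {
  if (AS == 0)
    return "";
  return " addrspace(" + std::to_string(AS) + ")";
}

// Indirect with byval: pointer into a stack copy (alloca address space),
// plus byval(<type>) and the required alignment.
std::string byvalParam(const std::string &Ty, unsigned AllocaAS,
                       unsigned Align) {
  return Ty + addrSpaceSuffix(AllocaAS) + "* byval(" + Ty + ") align " +
         std::to_string(Align);
}

// IndirectAliased: pointer in the address space recorded in ABIArgInfo
// (the constant address space for AMDGPU kernels), plus byref(<type>) and
// the alignment, which this kind always emits.
std::string byrefParam(const std::string &Ty, unsigned AS, unsigned Align) {
  return Ty + addrSpaceSuffix(AS) + "* byref(" + Ty + ") align " +
         std::to_string(Align);
}
```

For an 8-byte-aligned aggregate kernel argument, `byrefParam("%struct.A", 4, 8)` produces `%struct.A addrspace(4)* byref(%struct.A) align 8`.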
[212 lines hidden]	case ABIArgInfo::InAlloca: {
Builder.CreateStructGEP(ArgStruct, FieldIndex, Arg->getName());		Builder.CreateStructGEP(ArgStruct, FieldIndex, Arg->getName());
if (ArgI.getInAllocaIndirect())		if (ArgI.getInAllocaIndirect())
V = Address(Builder.CreateLoad(V),		V = Address(Builder.CreateLoad(V),
getContext().getTypeAlignInChars(Ty));		getContext().getTypeAlignInChars(Ty));
ArgVals.push_back(ParamValue::forIndirect(V));		ArgVals.push_back(ParamValue::forIndirect(V));
break;		break;
}		}

case ABIArgInfo::Indirect: {		case ABIArgInfo::Indirect:
		case ABIArgInfo::IndirectAliased: {
assert(NumIRArgs == 1);		assert(NumIRArgs == 1);
Address ParamAddr =		Address ParamAddr =
Address(Fn->getArg(FirstIRArg), ArgI.getIndirectAlign());		Address(Fn->getArg(FirstIRArg), ArgI.getIndirectAlign());

if (!hasScalarEvaluationKind(Ty)) {		if (!hasScalarEvaluationKind(Ty)) {
// Aggregates and complex variables are accessed by reference. All we		// Aggregates and complex variables are accessed by reference. All we
// need to do is realign the value, if requested.		// need to do is realign the value, if requested. Also, if the address
		// may be aliased, copy it to ensure that the parameter variable is
		// mutable and has a unique adress, as C requires.
		rjmccall (Done): "copy it to ensure that the parameter variable is mutable and has a unique address, as C requires". I've wanted Sema to track whether local variables are mutated or have their address taken for a long time; maybe someday we can do that and then take advantage of it here. Just a random thought, sorry.
Address V = ParamAddr;		Address V = ParamAddr;
if (ArgI.getIndirectRealign()) {		if (ArgI.getIndirectRealign() || ArgI.isIndirectAliased()) {
Address AlignedTemp = CreateMemTemp(Ty, "coerce");		Address AlignedTemp = CreateMemTemp(Ty, "coerce");

// Copy from the incoming argument pointer to the temporary with the		// Copy from the incoming argument pointer to the temporary with the
// appropriate alignment.		// appropriate alignment.
//		//
// FIXME: We should have a common utility for generating an aggregate		// FIXME: We should have a common utility for generating an aggregate
// copy.		// copy.
CharUnits Size = getContext().getTypeSizeInChars(Ty);		CharUnits Size = getContext().getTypeSizeInChars(Ty);
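The prolog change above copies an `IndirectAliased` argument into a fresh temporary even when no realignment is requested. A plain C++ analogy, with an invented `Params` struct and `applyKernel` function standing in for a kernel and its argument buffer: the callee receives only a pointer into memory it must not modify, so it copies the bytes into a local object before treating it as an ordinary, mutable parameter with its own address.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical kernel-argument struct; the name is illustrative only.
struct Params {
  int Scale;
  float Bias;
};

// ArgSegment plays the role of the byref pointer into a read-only buffer.
float applyKernel(const Params *ArgSegment, float X) {
  Params P;                                    // the "coerce" temporary
  std::memcpy(&P, ArgSegment, sizeof(Params)); // explicit up-front copy
  P.Scale *= 2; // mutates the local copy, never the argument buffer
  return X * P.Scale + P.Bias;
}
```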
[825 lines hidden]	// Otherwise, we need to make a first-class aggregate.

RV = llvm::UndefValue::get(returnType);		RV = llvm::UndefValue::get(returnType);
for (unsigned i = 0, e = results.size(); i != e; ++i) {		for (unsigned i = 0, e = results.size(); i != e; ++i) {
RV = Builder.CreateInsertValue(RV, results[i], i);		RV = Builder.CreateInsertValue(RV, results[i], i);
}		}
}		}
break;		break;
}		}

case ABIArgInfo::Expand:		case ABIArgInfo::Expand:
		case ABIArgInfo::IndirectAliased:
llvm_unreachable("Invalid ABI kind for return argument");		llvm_unreachable("Invalid ABI kind for return argument");
}		}

llvm::Instruction *Ret;		llvm::Instruction *Ret;
if (RV) {		if (RV) {
if (CurFuncDecl && CurFuncDecl->hasAttr<CmseNSEntryAttr>()) {		if (CurFuncDecl && CurFuncDecl->hasAttr<CmseNSEntryAttr>()) {
// For certain return types, clear padding bits, as they may reveal		// For certain return types, clear padding bits, as they may reveal
// sensitive information.		// sensitive information.
[1,110 lines hidden]	case ABIArgInfo::InAlloca: {
// from {}* to (%struct.foo).		// from {}* to (%struct.foo).
if (Addr.getType() != MemType)		if (Addr.getType() != MemType)
Addr = Builder.CreateBitCast(Addr, MemType);		Addr = Builder.CreateBitCast(Addr, MemType);
I->copyInto(*this, Addr);		I->copyInto(*this, Addr);
}		}
break;		break;
}		}

case ABIArgInfo::Indirect: {		case ABIArgInfo::Indirect:
		case ABIArgInfo::IndirectAliased: {
assert(NumIRArgs == 1);		assert(NumIRArgs == 1);
if (!I->isAggregate()) {		if (!I->isAggregate()) {
// Make a temporary alloca to pass the argument.		// Make a temporary alloca to pass the argument.
Address Addr = CreateMemTempWithoutCast(		Address Addr = CreateMemTempWithoutCast(
I->Ty, ArgInfo.getIndirectAlign(), "indirect-arg-temp");		I->Ty, ArgInfo.getIndirectAlign(), "indirect-arg-temp");
IRCallArgs[FirstIRArg] = Addr.getPointer();		IRCallArgs[FirstIRArg] = Addr.getPointer();

I->copyInto(*this, Addr);		I->copyInto(*this, Addr);
[238 lines hidden]	case ABIArgInfo::CoerceAndExpand: {

if (tempSize) {		if (tempSize) {
EmitLifetimeEnd(tempSize, AllocaAddr.getPointer());		EmitLifetimeEnd(tempSize, AllocaAddr.getPointer());
}		}

break;		break;
}		}

case ABIArgInfo::Expand:		case ABIArgInfo::Expand: {
unsigned IRArgPos = FirstIRArg;		unsigned IRArgPos = FirstIRArg;
ExpandTypeToArgs(I->Ty, *I, IRFuncTy, IRCallArgs, IRArgPos);		ExpandTypeToArgs(I->Ty, *I, IRFuncTy, IRCallArgs, IRArgPos);
assert(IRArgPos == FirstIRArg + NumIRArgs);		assert(IRArgPos == FirstIRArg + NumIRArgs);
break;		break;
}		}
}		}
		}
		rjmccall (Done): Please just make this use the Indirect code. If we gave it special attention, we could optimize it better, but conservatively doing what Indirect does should still work.

const CGCallee &ConcreteCallee = Callee.prepareConcreteCallee(*this);		const CGCallee &ConcreteCallee = Callee.prepareConcreteCallee(*this);
llvm::Value *CalleePtr = ConcreteCallee.getFunctionPointer();		llvm::Value *CalleePtr = ConcreteCallee.getFunctionPointer();

// If we're using inalloca, set up that argument.		// If we're using inalloca, set up that argument.
if (ArgMemory.isValid()) {		if (ArgMemory.isValid()) {
llvm::Value *Arg = ArgMemory.getPointer();		llvm::Value *Arg = ArgMemory.getPointer();
if (CallInfo.isVariadic()) {		if (CallInfo.isVariadic()) {
[393 lines hidden]	case ABIArgInfo::Direct: {
// If the value is offset in memory, apply the offset now.		// If the value is offset in memory, apply the offset now.
Address StorePtr = emitAddressAtOffset(*this, DestPtr, RetAI);		Address StorePtr = emitAddressAtOffset(*this, DestPtr, RetAI);
CreateCoercedStore(CI, StorePtr, DestIsVolatile, *this);		CreateCoercedStore(CI, StorePtr, DestIsVolatile, *this);

return convertTempToRValue(DestPtr, RetTy, SourceLocation());		return convertTempToRValue(DestPtr, RetTy, SourceLocation());
}		}

case ABIArgInfo::Expand:		case ABIArgInfo::Expand:
		case ABIArgInfo::IndirectAliased:
llvm_unreachable("Invalid ABI kind for return argument");		llvm_unreachable("Invalid ABI kind for return argument");
}		}

llvm_unreachable("Unhandled ABIArgInfo::Kind");		llvm_unreachable("Unhandled ABIArgInfo::Kind");
} ();		} ();

// Emit the assume_aligned check on the return value.		// Emit the assume_aligned check on the return value.
if (Ret.isScalar() && TargetDecl) {		if (Ret.isScalar() && TargetDecl) {
[39 lines hidden]

clang/lib/CodeGen/TargetInfo.cpp

[251 lines hidden]	LLVM_DUMP_METHOD void ABIArgInfo::dump() const {
case InAlloca:		case InAlloca:
OS << "InAlloca Offset=" << getInAllocaFieldIndex();		OS << "InAlloca Offset=" << getInAllocaFieldIndex();
break;		break;
case Indirect:		case Indirect:
OS << "Indirect Align=" << getIndirectAlign().getQuantity()		OS << "Indirect Align=" << getIndirectAlign().getQuantity()
<< " ByVal=" << getIndirectByVal()		<< " ByVal=" << getIndirectByVal()
<< " Realign=" << getIndirectRealign();		<< " Realign=" << getIndirectRealign();
break;		break;
		case IndirectAliased:
		OS << "Indirect Align=" << getIndirectAlign().getQuantity()
		<< " AadrSpace=" << getIndirectAddrSpace()
		<< " Realign=" << getIndirectRealign();
		break;
case Expand:		case Expand:
OS << "Expand";		OS << "Expand";
break;		break;
case CoerceAndExpand:		case CoerceAndExpand:
OS << "CoerceAndExpand Type=";		OS << "CoerceAndExpand Type=";
getCoerceAndExpandType()->print(OS);		getCoerceAndExpandType()->print(OS);
break;		break;
}		}
[1,716 lines hidden]
}		}

static bool isArgInAlloca(const ABIArgInfo &Info) {		static bool isArgInAlloca(const ABIArgInfo &Info) {
// Leave ignored and inreg arguments alone.		// Leave ignored and inreg arguments alone.
switch (Info.getKind()) {		switch (Info.getKind()) {
case ABIArgInfo::InAlloca:		case ABIArgInfo::InAlloca:
return true;		return true;
case ABIArgInfo::Ignore:		case ABIArgInfo::Ignore:
		case ABIArgInfo::IndirectAliased:
		rjmccall (Not Done): In principle, this can be `inreg` just as much as Indirect can.
		arsenm (author, Done): The IR verifier currently will reject byref + inreg
		rjmccall (Not Done): Why? `inreg` is essentially orthogonal.
		arsenm (author, Done): Mostly inherited from the other similar attribute handling. It can be lifted if there's a use
		arsenm (author, Done): Plus the name here is isArgInAlloca; this is not necessarily passed in an alloca
		rjmccall (Not Done): I agree that we don't need to update this.
return false;		return false;
case ABIArgInfo::Indirect:		case ABIArgInfo::Indirect:
case ABIArgInfo::Direct:		case ABIArgInfo::Direct:
case ABIArgInfo::Extend:		case ABIArgInfo::Extend:
return !Info.getInReg();		return !Info.getInReg();
case ABIArgInfo::Expand:		case ABIArgInfo::Expand:
case ABIArgInfo::CoerceAndExpand:		case ABIArgInfo::CoerceAndExpand:
// These are aggregate types which are never passed in registers when		// These are aggregate types which are never passed in registers when
[6,785 lines hidden]

/// For kernels all parameters are really passed in a special buffer. It doesn't		/// For kernels all parameters are really passed in a special buffer. It doesn't
/// make sense to pass anything byval, so everything must be direct.		/// make sense to pass anything byval, so everything must be direct.
ABIArgInfo AMDGPUABIInfo::classifyKernelArgumentType(QualType Ty) const {		ABIArgInfo AMDGPUABIInfo::classifyKernelArgumentType(QualType Ty) const {
Ty = useFirstFieldIfTransparentUnion(Ty);		Ty = useFirstFieldIfTransparentUnion(Ty);

// TODO: Can we omit empty structs?		// TODO: Can we omit empty structs?

llvm::Type *LTy = nullptr;
if (const Type *SeltTy = isSingleElementStruct(Ty, getContext()))		if (const Type *SeltTy = isSingleElementStruct(Ty, getContext()))
LTy = CGT.ConvertType(QualType(SeltTy, 0));		Ty = QualType(SeltTy, 0);

		llvm::Type *OrigLTy = CGT.ConvertType(Ty);
		llvm::Type *LTy = OrigLTy;
if (getContext().getLangOpts().HIP) {		if (getContext().getLangOpts().HIP) {
if (!LTy)
LTy = CGT.ConvertType(Ty);
LTy = coerceKernelArgumentType(		LTy = coerceKernelArgumentType(
LTy, /*FromAS=*/getContext().getTargetAddressSpace(LangAS::Default),		OrigLTy, /*FromAS=*/getContext().getTargetAddressSpace(LangAS::Default),
/*ToAS=*/getContext().getTargetAddressSpace(LangAS::cuda_device));		/*ToAS=*/getContext().getTargetAddressSpace(LangAS::cuda_device));
}		}

		// FIXME: Should also use this for OpenCL, but it requires addressing the
		// problem of kernels being called.
		//
		// FIXME: This doesn't apply the optimization of coercing pointers in structs
		// to global address space when using byref. This would require implementing a
		rjmccall (Not Done): I don't see why you'd use `byref` when promoting pointers in structs. Maybe it works as a hack with your backend, but it seems extremely special-case and should not be hacked into the general infrastructure.
		arsenm (author, Done): The whole point is to reinterpret the address space of the pointers in memory since we know if it's a kernel argument it has to be an addrspace(1) pointer or garbage. We can't infer the address space of a generic pointer loaded from memory. byref doesn't change that, it just makes the fact that these are passed in memory explicit
		rjmccall (Not Done): `byref` is interpreted by your backend passes as an instruction that the argument value is actually the address of an object that's passed to the kernel by value, so you need to expand the memory in the kernel argument marshalling. Why would that be something you'd want to trigger when passing a struct with a pointer in it? You're not going to recursively copy and pass down the pointee values of those pointers.
		arsenm (author, Done): Because all arguments are really passed byref, we're just not at the point yet where we can switch all IR arguments to use byref for all arguments. All of the relevant properties are really always on the in-memory value. The promotion this is talking about is really orthogonal to the IR mechanism used for passing kernel arguments. This promotion is because the language only exposes generic pointers. In the context of a pointer inside a struct passed as a kernel argument, we semantically know the address space of any valid pointers must be global. You could not produce a valid generic pointer from another address space here. The pointers/structs are still the same size and layout, but coercing the in-memory address space is semantically more useful to the optimizer
		rjmccall (Not Done): I understand that the promotion is orthogonal to the IR mechanism used for passing kernel arguments, which is exactly why I'm asking why there's a comment saying that we should "use byref when promoting pointers in struct", because I have no idea what that's supposed to mean when the pointer is just a part of the value being passed. It sounds like what you want is to maybe customize the code that's emitted to copy a byref parameter into a parameter variable when the parameter type is a struct containing a pointer you want to promote. But that doesn't really have anything to do with `byref`; if you weren't using `byref`, you'd still want a similar customization when creating the parameter variable. So it seems to me that the comment is still off-target.
		// new kind of coercion of the in-memory type when for indirect arguments.
		if (!getContext().getLangOpts().OpenCL && LTy == OrigLTy &&
		isAggregateTypeForABI(Ty)) {
		return ABIArgInfo::getIndirectAliased(
		getContext().getTypeAlignInChars(Ty),
		getContext().getTargetAddressSpace(LangAS::opencl_constant),
		false /*Realign*/, nullptr /*Padding*/);
		}

// If we set CanBeFlattened to true, CodeGen will expand the struct to its		// If we set CanBeFlattened to true, CodeGen will expand the struct to its
// individual elements, which confuses the Clover OpenCL backend; therefore we		// individual elements, which confuses the Clover OpenCL backend; therefore we
// have to set it to false here. Other args of getDirect() are just defaults.		// have to set it to false here. Other args of getDirect() are just defaults.
return ABIArgInfo::getDirect(LTy, 0, nullptr, false);		return ABIArgInfo::getDirect(LTy, 0, nullptr, false);
}		}

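The new classification logic can be summarized as a small decision procedure. This is a hypothetical model, not clang code: `TypeNode` is a toy type tree, `coercePointers` imitates what the HIP-only `coerceKernelArgumentType` does (rewriting generic pointers to global ones, assumed here to be address spaces 0 and 1 as on amdgcn), the aggregate test is a crude stand-in for `isAggregateTypeForABI`, and single-element-struct unwrapping is not modeled. The key point is the `LTy == OrigLTy` check: `byref` is chosen only when the coercion left the type unchanged, so structs containing promoted pointers still take the `Direct` path.

```cpp
#include <cassert>
#include <vector>

// Toy type tree (hypothetical): a pointer leaf, or a struct with fields.
struct TypeNode {
  bool IsPointer = false;
  unsigned AddrSpace = 0;       // meaningful only for pointers
  std::vector<TypeNode> Fields; // meaningful only for structs
};

// Imitates the HIP pointer promotion: rewrites every pointer in FromAS to
// ToAS and reports whether the type changed.
bool coercePointers(TypeNode &T, unsigned FromAS, unsigned ToAS) {
  if (T.IsPointer) {
    if (T.AddrSpace != FromAS)
      return false;
    T.AddrSpace = ToAS;
    return true;
  }
  bool Changed = false;
  for (TypeNode &F : T.Fields)
    Changed |= coercePointers(F, FromAS, ToAS);
  return Changed;
}

enum class KernelArgKind { Direct, IndirectAliased };

// Ty is taken by value so the coercion only touches a copy.
KernelArgKind classifyKernelArg(TypeNode Ty, bool IsHIP, bool IsOpenCL) {
  // Crude stand-in for isAggregateTypeForABI().
  bool IsAggregate = !Ty.IsPointer && !Ty.Fields.empty();
  // Assumed amdgcn numbering: 0 = generic (Default), 1 = global (cuda_device).
  bool Coerced = IsHIP && coercePointers(Ty, /*FromAS=*/0, /*ToAS=*/1);
  // byref only if not OpenCL, the coercion changed nothing (LTy == OrigLTy),
  // and the type is an aggregate; otherwise Direct (not flattened).
  if (!IsOpenCL && !Coerced && IsAggregate)
    return KernelArgKind::IndirectAliased;
  return KernelArgKind::Direct;
}
```

In this model a HIP struct holding a generic pointer stays `Direct` because its pointer was promoted, which is the case the FIXME in the hunk above leaves for follow-up work.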
ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType Ty,		ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType Ty,
unsigned &NumRegsLeft) const {		unsigned &NumRegsLeft) const {
[542 lines hidden]	Address SparcV9ABIInfo::EmitVAArg(CodeGenFunction &CGF, Address VAListAddr,
auto TypeInfo = getContext().getTypeInfoInChars(Ty);		auto TypeInfo = getContext().getTypeInfoInChars(Ty);

Address ArgAddr = Address::invalid();		Address ArgAddr = Address::invalid();
CharUnits Stride;		CharUnits Stride;
switch (AI.getKind()) {		switch (AI.getKind()) {
case ABIArgInfo::Expand:		case ABIArgInfo::Expand:
case ABIArgInfo::CoerceAndExpand:		case ABIArgInfo::CoerceAndExpand:
case ABIArgInfo::InAlloca:		case ABIArgInfo::InAlloca:
llvm_unreachable("Unsupported ABI kind for va_arg");		llvm_unreachable("Unsupported ABI kind for va_arg");
		rjmccall (Not Done): No reason not to use the Indirect code here.
		arsenm (author, Done): I generally don't like speculatively adding handling for features I can't write a testcase for, but I've moved these

case ABIArgInfo::Extend: {		case ABIArgInfo::Extend: {
Stride = SlotSize;		Stride = SlotSize;
CharUnits Offset = SlotSize - TypeInfo.first;		CharUnits Offset = SlotSize - TypeInfo.first;
ArgAddr = Builder.CreateConstInBoundsByteGEP(Addr, Offset, "extend");		ArgAddr = Builder.CreateConstInBoundsByteGEP(Addr, Offset, "extend");
break;		break;
}		}

case ABIArgInfo::Direct: {		case ABIArgInfo::Direct: {
auto AllocSize = getDataLayout().getTypeAllocSize(AI.getCoerceToType());		auto AllocSize = getDataLayout().getTypeAllocSize(AI.getCoerceToType());
Stride = CharUnits::fromQuantity(AllocSize).alignTo(SlotSize);		Stride = CharUnits::fromQuantity(AllocSize).alignTo(SlotSize);
ArgAddr = Addr;		ArgAddr = Addr;
break;		break;
}		}

case ABIArgInfo::Indirect:		case ABIArgInfo::Indirect:
		case ABIArgInfo::IndirectAliased:
Stride = SlotSize;		Stride = SlotSize;
ArgAddr = Builder.CreateElementBitCast(Addr, ArgPtrTy, "indirect");		ArgAddr = Builder.CreateElementBitCast(Addr, ArgPtrTy, "indirect");
ArgAddr = Address(Builder.CreateLoad(ArgAddr, "indirect.arg"),		ArgAddr = Address(Builder.CreateLoad(ArgAddr, "indirect.arg"),
TypeInfo.second);		TypeInfo.second);
break;		break;

case ABIArgInfo::Ignore:		case ABIArgInfo::Ignore:
return Address(llvm::UndefValue::get(ArgPtrTy), TypeInfo.second);		return Address(llvm::UndefValue::get(ArgPtrTy), TypeInfo.second);
@@ 336 lines not shown @@ Address XCoreABIInfo::EmitVAArg(CodeGenFunction &CGF, Address VAListAddr,
   llvm::Type *ArgPtrTy = llvm::PointerType::getUnqual(ArgTy);
 
   Address Val = Address::invalid();
   CharUnits ArgSize = CharUnits::Zero();
   switch (AI.getKind()) {
   case ABIArgInfo::Expand:
   case ABIArgInfo::CoerceAndExpand:
   case ABIArgInfo::InAlloca:
     llvm_unreachable("Unsupported ABI kind for va_arg");
    rjmccall (inline comment): Same.
   case ABIArgInfo::Ignore:
     Val = Address(llvm::UndefValue::get(ArgPtrTy), TypeAlign);
     ArgSize = CharUnits::Zero();
     break;
   case ABIArgInfo::Extend:
   case ABIArgInfo::Direct:
     Val = Builder.CreateBitCast(AP, ArgPtrTy);
     ArgSize = CharUnits::fromQuantity(
         getDataLayout().getTypeAllocSize(AI.getCoerceToType()));
     ArgSize = ArgSize.alignTo(SlotSize);
     break;
   case ABIArgInfo::Indirect:
+  case ABIArgInfo::IndirectAliased:
     Val = Builder.CreateElementBitCast(AP, ArgPtrTy);
     Val = Address(Builder.CreateLoad(Val), TypeAlign);
     ArgSize = SlotSize;
     break;
   }
 
   // Increment the VAList.
   if (!ArgSize.isZero()) {
@@ 1,341 lines not shown @@
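Both va_arg lowerings send IndirectAliased through the existing Indirect path, as suggested in the inline comments. That sharing works because in the Indirect case the va_arg slot holds only a pointer to the argument: the code loads that pointer and advances by one slot, while Direct reads the value in place and advances by its size rounded up to whole slots. The host-side C++ sketch below models that slot walk. It assumes an 8-byte slot and a plain byte buffer in place of a real va_list, and it does not model the Extend offset adjustment. `SlotVAList`, `vaArgDirect` and `vaArgIndirect` are hypothetical names, not Clang APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed slot size for this sketch only.
constexpr std::size_t kSlotSize = 8;
static_assert(sizeof(void *) <= kSlotSize, "a pointer must fit in one slot");

// Hypothetical va_list: a cursor walking a buffer of fixed-size slots.
struct SlotVAList {
  const unsigned char *cursor;
};

// Models the Direct case: the value is stored inline in the argument area,
// and the cursor advances by its size rounded up to a whole number of slots.
template <typename T>
T vaArgDirect(SlotVAList &ap) {
  T value;
  std::memcpy(&value, ap.cursor, sizeof(T));
  ap.cursor += (sizeof(T) + kSlotSize - 1) / kSlotSize * kSlotSize;
  return value;
}

// Models the shared Indirect/IndirectAliased case: the slot holds a pointer
// to the argument, so load that pointer and advance by exactly one slot.
template <typename T>
const T *vaArgIndirect(SlotVAList &ap) {
  const T *ptr;
  std::memcpy(&ptr, ap.cursor, sizeof(ptr));
  ap.cursor += kSlotSize;
  return ptr;
}
```

In this model an aggregate of any size costs one slot when passed indirectly, which is why the Stride/ArgSize for both indirect kinds is simply `SlotSize` in the hunks.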

clang/test/CodeGenCUDA/kernel-args.cu

 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device \
 // RUN: -emit-llvm %s -o - | FileCheck -check-prefix=AMDGCN %s
 // RUN: %clang_cc1 -triple nvptx64-nvidia-cuda- -fcuda-is-device \
 // RUN: -emit-llvm %s -o - | FileCheck -check-prefix=NVPTX %s
 #include "Inputs/cuda.h"
 
 struct A {
   int a[32];
 };
 
-// AMDGCN: define amdgpu_kernel void @_Z6kernel1A(%struct.A %x.coerce)
+// AMDGCN: define amdgpu_kernel void @_Z6kernel1A(%struct.A addrspace(4)* byref(%struct.A) align 4 %{{.+}})
 // NVPTX: define void @_Z6kernel1A(%struct.A* byval(%struct.A) align 4 %x)
 __global__ void kernel(A x) {
 }
 
 class Kernel {
 public:
-  // AMDGCN: define amdgpu_kernel void @_ZN6Kernel12memberKernelE1A(%struct.A %x.coerce)
+  // AMDGCN: define amdgpu_kernel void @_ZN6Kernel12memberKernelE1A(%struct.A addrspace(4)* byref(%struct.A) align 4 %{{.+}})
   // NVPTX: define void @_ZN6Kernel12memberKernelE1A(%struct.A* byval(%struct.A) align 4 %x)
   static __global__ void memberKernel(A x){}
   template<typename T> static __global__ void templateMemberKernel(T x) {}
 };
 
 
 template <typename T>
 __global__ void templateKernel(T x) {}
 
 void launch(void*);
 
 void test() {
   Kernel K;
-  // AMDGCN: define amdgpu_kernel void @_Z14templateKernelI1AEvT_(%struct.A %x.coerce)
+  // AMDGCN: define amdgpu_kernel void @_Z14templateKernelI1AEvT_(%struct.A addrspace(4)* byref(%struct.A) align 4 %{{.+}}
   // NVPTX: define void @_Z14templateKernelI1AEvT_(%struct.A* byval(%struct.A) align 4 %x)
   launch((void*)templateKernel<A>);
 
-  // AMDGCN: define amdgpu_kernel void @_ZN6Kernel20templateMemberKernelI1AEEvT_(%struct.A %x.coerce)
+  // AMDGCN: define amdgpu_kernel void @_ZN6Kernel20templateMemberKernelI1AEEvT_(%struct.A addrspace(4)* byref(%struct.A) align 4 %{{.+}}
   // NVPTX: define void @_ZN6Kernel20templateMemberKernelI1AEEvT_(%struct.A* byval(%struct.A) align 4 %x)
   launch((void*)Kernel::templateMemberKernel<A>);
 }
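In the updated AMDGCN checks the aggregate now arrives as a pointer into address space 4 carrying `byref(%struct.A)`, while the NVPTX checks keep `byval`. Roughly, per the LLVM LangRef, `byval` implies a hidden copy that the callee owns, whereas `byref` passes a pointer without implying any copy, and the callee should not write through it. The host-side C++ analogy below assumes a plain `const` pointer can stand in for the read-only kernel argument buffer; `byvalLike` and `byrefLike` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstring>

// Same shape as struct A in the test above.
struct A {
  int a[32];
};

// byval analogy: the callee receives its own copy, so local writes never
// reach the caller's object.
int byvalLike(A x) {
  x.a[0] += 1;
  return x.a[0];
}

// byref analogy: the callee receives a pointer to caller-owned storage that
// it treats as read-only, so it makes an explicit copy before modifying it.
int byrefLike(const A *x) {
  A local;
  std::memcpy(&local, x, sizeof(A));
  local.a[0] += 1;
  return local.a[0];
}
```

The explicit copy in `byrefLike` plays the role of the up-front memcpy that the summary expects to replace the aggregate loads and stores previously produced for `%x.coerce`.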

clang/test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl

@@ 61 lines not shown @@
 typedef struct struct_of_structs_arg
 {
   int i1;
   float f1;
   struct_arg_t s1;
   int i2;
 } struct_of_structs_arg_t;
 
-// CHECK: %union.transparent_u = type { i32 }
 typedef union
 {
   int b1;
   float b2;
 } transparent_u __attribute__((__transparent_union__));
 
 // CHECK: %struct.single_array_element_struct_arg = type { [4 x i32] }
 typedef struct single_array_element_struct_arg
@@ 153 lines not shown @@
 __kernel void kernel_struct_padding_arg(struct_padding_arg arg1) { }
 
 // CHECK: void @kernel_test_struct_of_arrays_arg(%struct.struct_of_arrays_arg %arg1.coerce)
 __kernel void kernel_test_struct_of_arrays_arg(struct_of_arrays_arg_t arg1) { }
 
 // CHECK: void @kernel_struct_of_structs_arg(%struct.struct_of_structs_arg %arg1.coerce)
 __kernel void kernel_struct_of_structs_arg(struct_of_structs_arg_t arg1) { }
 
-// CHECK: void @test_kernel_transparent_union_arg(%union.transparent_u %u.coerce)
+// CHECK: void @test_kernel_transparent_union_arg(i32 %u.coerce)
 __kernel void test_kernel_transparent_union_arg(transparent_u u) { }
 
 // CHECK: void @kernel_single_array_element_struct_arg(%struct.single_array_element_struct_arg %arg1.coerce)
 __kernel void kernel_single_array_element_struct_arg(single_array_element_struct_arg_t arg1) { }
 
 // CHECK: void @kernel_single_struct_element_struct_arg(%struct.single_struct_element_struct_arg %arg1.coerce)
 __kernel void kernel_single_struct_element_struct_arg(single_struct_element_struct_arg_t arg1) { }
 
@@ 275 lines not shown @@