This is an archive of the discontinued LLVM Phabricator instance.

[CFE][Codegen] Make sure to maintain the contiguity of all the static allocas
ClosedPublic

Authored by hsmhsm on Sep 22 2021, 7:48 AM.

Details

Summary

Make sure to maintain the contiguity of all the static allocas at the start of the entry block, which in turn would aid better code transformation/optimization.

Diff Detail

Event Timeline

hsmhsm created this revision.Sep 22 2021, 7:48 AM
hsmhsm requested review of this revision.Sep 22 2021, 7:48 AM
Herald added a project: Restricted Project.Sep 22 2021, 7:48 AM
hsmhsm retitled this revision from [CFE][Codegen] Do not the break contiguity of static allocas. to [CFE][Codegen] Do not break the contiguity of static allocas..Sep 22 2021, 8:12 AM
hsmhsm updated this revision to Diff 374252.Sep 22 2021, 8:55 AM

Update source comment.

I do think it's cleaner/more canonical IR to cluster these at the top of the block, but I don't understand this comment:

> otherwise, inliner's attempt to move static allocas from callee to caller will fail,

The inliner successfully moves allocas to the caller's entry block, even with addrspacecasts interspersed.

clang/lib/CodeGen/CGExpr.cpp
106–110

Where are the addrspacecasts inserted? Could you just adjust where those are inserted instead?

> I do think it's cleaner/more canonical IR to cluster these at the top of the block, but I don't understand this comment:
>
>> otherwise, inliner's attempt to move static allocas from callee to caller will fail,
>
> The inliner successfully moves allocas to the caller's entry block, even with addrspacecasts interspersed.

The logic at https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/InlineFunction.cpp#L2093 assumes that static allocas (within callee) are contiguous.

clang/lib/CodeGen/CGExpr.cpp
106–110

The address space casts are inserted immediately after all static allocas (the top static alloca cluster).

An example:

Before this patch:

entry:
  %N.addr = alloca i64, align 8, addrspace(5)
  %N.addr.ascast = addrspacecast i64 addrspace(5)* %N.addr to i64*
  %vla.addr = alloca i64, align 8, addrspace(5)
  %vla.addr.ascast = addrspacecast i64 addrspace(5)* %vla.addr to i64*
  %a.addr = alloca i32*, align 8, addrspace(5)
  %a.addr.ascast = addrspacecast i32* addrspace(5)* %a.addr to i32**
  %vla.addr2 = alloca i64, align 8, addrspace(5)
  %vla.addr2.ascast = addrspacecast i64 addrspace(5)* %vla.addr2 to i64*
  %b.addr = alloca i32*, align 8, addrspace(5)
  %b.addr.ascast = addrspacecast i32* addrspace(5)* %b.addr to i32**
  %N.casted = alloca i64, align 8, addrspace(5)
  %N.casted.ascast = addrspacecast i64 addrspace(5)* %N.casted to i64*
  %.zero.addr = alloca i32, align 4, addrspace(5)
  %.zero.addr.ascast = addrspacecast i32 addrspace(5)* %.zero.addr to i32*
  %.threadid_temp. = alloca i32, align 4, addrspace(5)
  %.threadid_temp..ascast = addrspacecast i32 addrspace(5)* %.threadid_temp. to i32*  
  store i64 %N, i64* %N.addr.ascast, align 8

With this patch:

entry:
  %N.addr = alloca i64, align 8, addrspace(5)
  %vla.addr = alloca i64, align 8, addrspace(5)
  %a.addr = alloca i32*, align 8, addrspace(5)
  %vla.addr2 = alloca i64, align 8, addrspace(5)
  %b.addr = alloca i32*, align 8, addrspace(5)
  %N.casted = alloca i64, align 8, addrspace(5)
  %.zero.addr = alloca i32, align 4, addrspace(5)
  %.threadid_temp. = alloca i32, align 4, addrspace(5)
  %.threadid_temp..ascast = addrspacecast i32 addrspace(5)* %.threadid_temp. to i32*
  %.zero.addr.ascast = addrspacecast i32 addrspace(5)* %.zero.addr to i32*
  %N.casted.ascast = addrspacecast i64 addrspace(5)* %N.casted to i64*
  %b.addr.ascast = addrspacecast i32* addrspace(5)* %b.addr to i32**
  %vla.addr2.ascast = addrspacecast i64 addrspace(5)* %vla.addr2 to i64*
  %a.addr.ascast = addrspacecast i32* addrspace(5)* %a.addr to i32**
  %vla.addr.ascast = addrspacecast i64 addrspace(5)* %vla.addr to i64*
  %N.addr.ascast = addrspacecast i64 addrspace(5)* %N.addr to i64*
  store i64 %N, i64* %N.addr.ascast, align 8

>> I do think it's cleaner/more canonical IR to cluster these at the top of the block, but I don't understand this comment:
>>
>>> otherwise, inliner's attempt to move static allocas from callee to caller will fail,
>>
>> The inliner successfully moves allocas to the caller's entry block, even with addrspacecasts interspersed.
>
> The logic at https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/InlineFunction.cpp#L2093 assumes that static allocas (within callee) are contiguous.

True. Even worse: it will bail if a static alloca is used as an inalloca argument.
So, if you now interleave allocas that may be used in inalloca, you also break the "canonical form".
I assume this hits mostly Windows, but still.

arsenm added inline comments.Sep 22 2021, 11:12 AM
clang/lib/CodeGen/CGExpr.cpp
106–110

I meant: where in the clang code are these emitted, and how is that insert point found?

jdoerfert added a subscriber: rnk.Sep 22 2021, 11:26 AM

>>> I do think it's cleaner/more canonical IR to cluster these at the top of the block, but I don't understand this comment:
>>>
>>>> otherwise, inliner's attempt to move static allocas from callee to caller will fail,
>>>
>>> The inliner successfully moves allocas to the caller's entry block, even with addrspacecasts interspersed.
>>
>> The logic at https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/InlineFunction.cpp#L2093 assumes that static allocas (within callee) are contiguous.
>
> True. Even worse: it will bail if a static alloca is used as an inalloca argument.
> So, if you now interleave allocas that may be used in inalloca, you also break the "canonical form".
> I assume this hits mostly Windows, but still.

@rnk This might be of interest to you. Any thoughts?

rnk added a comment.Sep 22 2021, 11:59 AM

> The logic at https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/InlineFunction.cpp#L2093 assumes that static allocas (within callee) are contiguous.

No, it doesn't make that assumption. That code is an attempt to optimize the linked list splicing, so that batches of contiguous allocas can be spliced over together. I believe the code would still be correct if you removed this inner loop and left behind the outer loop, which iterates over all allocas in the entry block.
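For reference, the loop structure described above has roughly the following shape (a simplified paraphrase of the InlineFunction.cpp logic, not the verbatim source; allocaWouldBeStaticInEntry is a helper in that file, while Caller, FirstNewBlock, and InsertPoint stand in for the surrounding inliner state):

// Outer loop: walks every instruction in the inlined entry block, so
// interleaved non-allocas (e.g. addrspacecasts) are simply skipped
// rather than causing a failure.
for (BasicBlock::iterator I = FirstNewBlock->begin(),
                          E = FirstNewBlock->end(); I != E;) {
  auto *AI = dyn_cast<AllocaInst>(&*I++);
  if (!AI || !allocaWouldBeStaticInEntry(AI))
    continue;

  // Inner loop: the splicing optimization. Extend the batch while the
  // following instructions are also static allocas, so each contiguous
  // run moves to the caller with a single list operation.
  while (I != E && isa<AllocaInst>(&*I) &&
         allocaWouldBeStaticInEntry(cast<AllocaInst>(&*I)))
    ++I;

  // Splice the whole batch [AI, I) in front of the insertion point in
  // the caller's entry block.
  Caller->getEntryBlock().getInstList().splice(
      InsertPoint, FirstNewBlock->getInstList(), AI->getIterator(), I);
}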

>> The logic at https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/InlineFunction.cpp#L2093 assumes that static allocas (within callee) are contiguous.
>
> No, it doesn't make that assumption. That code is an attempt to optimize the linked list splicing, so that batches of contiguous allocas can be spliced over together. I believe the code would still be correct if you removed this inner loop and left behind the outer loop, which iterates over all allocas in the entry block.

Thanks, understood it now. So, the actual logic (in the outer loop) is: scan the (contiguous) batches of static allocas within the callee's entry block, and move those batches to the caller one chunk at a time.

That said, it is still a good idea (even though it is not an explicitly mandated requirement) to maintain the contiguity of the static allocas at the top of the basic block as one cluster, and it should start from the FE itself. So, this patch is still relevant.

hsmhsm edited the summary of this revision. (Show Details)Sep 22 2021, 7:11 PM
hsmhsm added a reviewer: rnk.
hsmhsm edited the summary of this revision. (Show Details)
hsmhsm updated this revision to Diff 374419.Sep 22 2021, 7:18 PM

Update source comment.

xgupta added a subscriber: xgupta.Sep 22 2021, 7:29 PM
hsmhsm added inline comments.Sep 22 2021, 7:31 PM
clang/lib/CodeGen/CGExpr.cpp
106–110

I understand your question as: where exactly is this code emitted?

It is in the Clang CodeGen part. Clang CodeGen maintains a builder (function specific?) which builds and emits code. The builder always maintains a default current insertion position, and it can also be asked to insert at some other place by calling the API SetInsertPoint().

In this particular case, the address space casts are emitted (by calling the builder) at https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/TargetInfo.cpp#445

All I am doing here is directing the builder to insert the address space casts just after all static allocas, by appropriately setting the insertion position via SetInsertPoint().
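As a rough sketch of what that redirection looks like with the IRBuilder API (illustrative only, not the actual CGExpr.cpp change; Builder, AllocaInsertPt, Alloca, and DestTy stand in for the CodeGenFunction state):

// Save the builder's current position and restore it on scope exit.
llvm::IRBuilderBase::InsertPointGuard Guard(Builder);

// Park the builder immediately after the static-alloca cluster, so the
// cast is emitted there instead of at the default insertion position.
Builder.SetInsertPoint(AllocaInsertPt->getParent(),
                       std::next(AllocaInsertPt->getIterator()));
llvm::Value *Casted = Builder.CreateAddrSpaceCast(
    Alloca, DestTy, Alloca->getName() + ".ascast");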

Please make sure the patch passes internal CI. Thanks.

clang/test/OpenMP/distribute_parallel_for_simd_if_codegen.cpp
264 (On Diff #374486)

Is the test updated by a script? If the original test does not check !llvm.access.group, the updated test should not check it either. This makes the test less stable.

hsmhsm added inline comments.Sep 23 2021, 7:10 AM
clang/test/OpenMP/distribute_parallel_for_simd_if_codegen.cpp
264 (On Diff #374486)

Yes, most of the tests here are updated by the script only. I might have missed a few command-line options to the script - I have not passed any options to the script while updating it. Let me check.

hsmhsm added inline comments.Sep 23 2021, 8:20 AM
clang/test/OpenMP/distribute_parallel_for_simd_if_codegen.cpp
264 (On Diff #374486)

I have used the exact command line as in the test file, plus the extra option "--force-update". But its presence or absence is not making any difference here.

But the !llvm.access.group metadata is newly added, and I am not finding any way of disabling it.

Do you have any idea? Or anyone else, for that matter?

rnk added a comment.Sep 23 2021, 8:59 AM

> That said, it is still a good idea (even though it is not an explicitly mandated requirement) to maintain the contiguity of the static allocas at the top of the basic block as one cluster, and it should start from the FE itself. So, this patch is still relevant.

The patch mostly affects GPU tests, so as long as the GPU folks (AMD/nvptx) are happy with the regenerated test cases, this seems fine to me.

arsenm added inline comments.Sep 23 2021, 2:14 PM
clang/lib/CodeGen/CGExpr.cpp
107

Why is there a special AllocaInsertPt iterator in the first place? Can you avoid any iteration logic by just always inserting at the block start?

jdoerfert added inline comments.Sep 23 2021, 3:13 PM
clang/lib/CodeGen/CGExpr.cpp
107

Right. The alloca insertion point is sometimes changed to a non-entry block, and we should keep that ability.
From all the use cases I know it would suffice to insert at the beginning of the alloca insertion point block though.

hsmhsm added inline comments.Sep 23 2021, 7:31 PM
clang/lib/CodeGen/CGExpr.cpp
107

I really do not understand this comment fully.

This block of code inserts an "address space cast" of the recently inserted alloca, not the alloca itself. The alloca is already inserted; please look at the start of this function.

The old logic (on the left) inserts the address space cast of the recently inserted alloca immediately after that alloca, using AllocaInsertPt. As a side effect, it also makes AllocaInsertPt point to this newly inserted address space cast. Hence, the next alloca is inserted after this new address space cast.

The new logic (here) fixes that by moving the insertion point just beyond the current AllocaInsertPt without updating AllocaInsertPt.

How can I insert the "address space cast" of an alloca at the beginning of the block, before the alloca itself?

As I understand it, AllocaInsertPt is maintained to insert static allocas at the start of the entry block; otherwise there is no special reason to maintain such a special insertion point. Please look at https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenFunction.h#L378.

That said, I am not really sure whether I have completely misunderstood the comment above. If that is the case, then I need better clarification about what is really expected here.

hsmhsm updated this revision to Diff 374869.Sep 24 2021, 8:58 AM

Remove "!llvm.access.group" metadata from check lines in test files.

rnk added inline comments.Sep 24 2021, 9:23 AM
clang/lib/CodeGen/CGExpr.cpp
107

Well, inserting at the top of the entry block would reverse the order of the allocas. Currently they appear in source/IRGen order, which is nice. Maintaining the order requires appending, which requires a cursor of some kind.
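To make the two strategies concrete (hypothetical helpers for illustration, not clang code):

// Inserting each new alloca at the very top reverses emission order:
// emitting a, b, c leaves the block ordered c, b, a.
void insertAtTop(llvm::BasicBlock &Entry, llvm::AllocaInst *AI) {
  AI->insertBefore(&Entry.front());
}

// Appending at a moving cursor preserves source/IRGen order: emitting
// a, b, c leaves the block ordered a, b, c.
void insertAtCursor(llvm::Instruction *&Cursor, llvm::AllocaInst *AI) {
  AI->insertAfter(Cursor);
  Cursor = AI; // advance so the next alloca lands after this one
}

// Equivalent order-preserving scheme: keep one fixed marker instruction
// and always insert new allocas immediately before it; clang's
// AllocaInsertPt is such a cursor/marker.
void insertBeforeMarker(llvm::Instruction *Marker, llvm::AllocaInst *AI) {
  AI->insertBefore(Marker);
}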

hsmhsm added inline comments.Sep 24 2021, 11:01 AM
clang/lib/CodeGen/CGExpr.cpp
107

Yes, correct.

And I am waiting for further inputs from @yaxunl / @arsenm / @jdoerfert

foad added a subscriber: foad.Sep 25 2021, 12:05 AM
hsmhsm added a reviewer: tra.Sep 27 2021, 10:13 PM

@yaxunl / @jdoerfert / @tra

Can I expect your further comment/decision on this patch?

While I understand people are eager to receive feedback on their patches, it is not helpful to ping/remind the reviewers constantly. This does generate noise for them and can consequently also reduce their interest in a patch. The recommendation for the time without review before a "ping" is sent is still one week.

clang/lib/CodeGen/CGExpr.cpp
133

I'm not even sure this is necessarily correct.

How do we know the new store is not inside a loop and might write the value more than once, potentially overwriting a value set later?

Aside from that (important) question, you need to update the documentation of the function. It doesn't correspond to the new semantics anymore.

clang/test/CodeGenCUDA/builtins-amdgcn.cu
2

Please prepare a pre-commit that adds auto-generated check lines. Adding them as part of a commit makes it impossible to see the difference.

> While I understand people are eager to receive feedback on their patches, it is not helpful to ping/remind the reviewers constantly. This does generate noise for them and can consequently also reduce their interest in a patch. The recommendation for the time without review before a "ping" is sent is still one week.

Agreed.

clang/lib/CodeGen/CGExpr.cpp
133

I am not convinced by the above comments, because this code neither changes the address nor the initializer (https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenFunction.h#L2548), but only the place where the initialization happens.

Further, as the documentation says, the initializer must be a constant or a function argument (and, surprisingly, I do not see any assertion about it). Hence, even if the initialization happens within a loop, the loop-invariant code motion pass should detect it.

That said, if we think it is better to keep the initialization within the entry block, we can do it at the end of the block.

clang/test/CodeGenCUDA/builtins-amdgcn.cu
2

Will do

jdoerfert added inline comments.Sep 28 2021, 9:46 AM
clang/lib/CodeGen/CGExpr.cpp
133

What the initial value is, is irrelevant. Arguing that LICM should take care of it is also not helpful. Changing the location of the initialization is in itself a problem:

Before:

a = alloca
a = 0;                // init is in the entry block
for (...) {
  use(a);
  a = a + 1;
}

After, potentially:

a = alloca
for (...) {
  a = 0;              // init is now at the builder insertion point
  use(a);
  a = a + 1;
}

hsmhsm added inline comments.Sep 28 2021, 10:13 AM
clang/lib/CodeGen/CGExpr.cpp
133

Clarification: I am not trying to argue with anybody here; I am trying to defend myself where I feel I am right as per my knowledge. If I later realize that I am wrong, then I correct myself - but I am not arguing, and never will.

Ok, I will try to fix it from the practical point of view.

But I still think as follows: from a theoretical/good-programming/good-front-end-code-generation perspective, initialization of something should never happen within a construct like a loop, otherwise the initialization has no meaning at all.

jdoerfert added inline comments.Sep 28 2021, 10:35 AM
clang/lib/CodeGen/CGExpr.cpp
133

> [...] initialization of something should never happen within a construct like a loop, otherwise the initialization has no meaning at all.

Why would we not create new temporary allocas inside of a loop? There is nothing in any of the APIs or descriptions that would indicate you could not create a temporary alloca and initialize it while you are generating code in a loop. I cannot see why initialization would be meaningless, given that it was inserted in the entry block prior to this change.

hsmhsm updated this revision to Diff 375789.Sep 28 2021, 11:45 PM

Fix review comments by @jdoerfert.

This patch seems to be confused. You're making two changes. In one of them, you're trying to prevent addrspacecasts from being interleaved with the sequence of allocas in the function prologue. In the other, you're moving the store emitted by InitTempAlloca so that it becomes interleaved with the sequence of allocas, but only when there's an addrspacecast.

Now, I should say that InitTempAlloca is generally a problematic function. Its purpose is to put an initialization in the prologue of the function so that it always happens prior to some other code executing. This is rarely useful, though, because the memory is usually tied to some specific feature in the code, and as Johannes says, that place in the code may be reachable multiple times, and the memory typically needs to be freshly initialized each time. Using InitTempAlloca is therefore frequently wrong, and I'm honestly not sure there's any good reason to use it. Looking at the calls, some of them know that they're in the prologue, and so it should be fine to simply emit a store normally. Other calls seem quite suspect, like the one in CGObjCGNU.cpp. And even if it's semantically okay, it's potentially doing unnecessary work up-front when it only really needs to happen if that path is taken.

So I don't really care about aesthetic differences in the code created by InitTempAlloca, because we should just remove it completely.

If we really care about not interleaving things with the alloca sequence — and right now I am not convinced that we do, because contra the comments and description of this patch, this is not an LLVM requirement of any sort — I think we should lazily create a second InsertPt instruction after the AllocaInsertPt and insert all the secondary instructions prior to that so that they appear in source order.

hsmhsm marked 8 inline comments as done.Oct 6 2021, 11:08 PM

> In the other, you're moving the store emitted by InitTempAlloca so that it becomes interleaved with the sequence of allocas, but only when there's an addrspacecast.

Not really.

In the absence of this patch, addrspacecasts are interleaved with the sequence of allocas, and AllocaInsertPt always points to the end of this *interleaved sequence*. Within InitTempAlloca(), any init of an alloca (or of its addrspacecast, in the addrspacecast case) happens just after AllocaInsertPt, which is fine.

Now, in the presence of this patch, AllocaInsertPt points to the end of the contiguous allocas but *BEFORE* any addrspacecast. This forces the changes to InitTempAlloca(); otherwise, the init of an addrspacecast would happen before the addrspacecast itself.

> Now, I should say that InitTempAlloca is generally a problematic function. Its purpose is to put an initialization in the prologue of the function so that it always happens prior to some other code executing. This is rarely useful, though, because the memory is usually tied to some specific feature in the code, and as Johannes says, that place in the code may be reachable multiple times, and the memory typically needs to be freshly initialized each time. Using InitTempAlloca is therefore frequently wrong, and I'm honestly not sure there's any good reason to use it. Looking at the calls, some of them know that they're in the prologue, and so it should be fine to simply emit a store normally. Other calls seem quite suspect, like the one in CGObjCGNU.cpp. And even if it's semantically okay, it's potentially doing unnecessary work up-front when it only really needs to happen if that path is taken.

> So I don't really care about aesthetic differences in the code created by InitTempAlloca, because we should just remove it completely.

> If we really care about not interleaving things with the alloca sequence — and right now I am not convinced that we do, because contra the comments and description of this patch, this is not an LLVM requirement of any sort — I think we should lazily create a second InsertPt instruction after the AllocaInsertPt and insert all the secondary instructions prior to that so that they appear in source order.

Agree. I will give it a try and make the changes as you suggest, though it may take some time, since it requires a bit of cleanup and handling of the changes to (possibly many) lit tests as a side effect.

hsmhsm updated this revision to Diff 378424.Oct 9 2021, 2:04 AM

Introduce a new insertion point instruction which aids all the address space casts of static allocas to appear in source order immediately after all static allocas.

hsmhsm retitled this revision from [CFE][Codegen] Do not break the contiguity of static allocas. to [CFE][Codegen] Make sure to maintain the contiguity of all the static allocas.Oct 9 2021, 2:06 AM
hsmhsm edited the summary of this revision. (Show Details)
hsmhsm marked an inline comment as done.Oct 9 2021, 2:11 AM
rjmccall added inline comments.Oct 11 2021, 10:49 PM
clang/lib/CodeGen/CodeGenFunction.h
392

Please call this something like PostAllocaInsertPt. Instead of eagerly creating it, please create it lazily: add an accessor like getPostAllocaInsertPoint() which creates (and saves here) an instruction that immediately follows AllocaInsertPt if this is currently null. I think you should make your own instruction so that we don't mess up any code that might rely on temporarily creating and then removing dead instructions.

Please use an llvm::AssertingVH<llvm::Instruction>. I think that can handle holding a null value.

The comment here should be more general, like "a place in the prologue where code can be inserted that will be dominated by all the static allocas." You don't need to talk about addrspacecasts specifically; that's just one possible use case for this.

Also, it's either "addrspacecast" (using the IR name of the operation) or "address space cast" (writing out the operation in normal English words); don't run together addressspace as if it were a keyword when it isn't.
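A minimal sketch of the suggested accessor (an assumed shape, not necessarily the code that eventually landed; Int32Ty and AllocaInsertPt stand for the corresponding CodeGenFunction members):

// Lazily created marker that immediately follows AllocaInsertPt; any
// code inserted before it is dominated by all the static allocas.
llvm::AssertingVH<llvm::Instruction> PostAllocaInsertPt = nullptr;

llvm::Instruction *getPostAllocaInsertPoint() {
  if (!PostAllocaInsertPt) {
    // A dead bitcast of undef serves as a dedicated marker instruction,
    // so nothing else can accidentally rely on it as a "real" value.
    PostAllocaInsertPt = new llvm::BitCastInst(
        llvm::UndefValue::get(Int32Ty), Int32Ty, "postallocapt");
    PostAllocaInsertPt->insertAfter(AllocaInsertPt);
  }
  return PostAllocaInsertPt;
}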

arichardson added inline comments.
clang/lib/CodeGen/CGExpr.cpp
104

Shouldn't this use cast instead?

hsmhsm updated this revision to Diff 379682.Oct 14 2021, 5:52 AM

Fix review comments by @rjmccall.

hsmhsm marked 2 inline comments as done.Oct 14 2021, 5:55 AM
hsmhsm added inline comments.
clang/lib/CodeGen/CGExpr.cpp
104

You are right. But, in the updated patch, this code does not exist anymore.

clang/lib/CodeGen/CodeGenFunction.h
392

Thanks. Fixed the above review comments.

hsmhsm updated this revision to Diff 379918.Oct 14 2021, 10:40 PM
hsmhsm marked 2 inline comments as done.

Rebase + minor clean-up to patch.

@rjmccall I assume I have addressed all your review comments. In case I have missed something, or if you think a few more changes are required for the patch, please do let me know, so that I can proceed as per the comments/suggestions. I would like to bring this patch to closure.

xgupta removed a subscriber: xgupta.Oct 28 2021, 2:34 AM

This patch has been waiting for action for a long time. I am expecting at least one of the reviewers to say "yes", "no", or "further improvements required" on this patch so that I can take further action. If you say "no" to this patch with a strong valid reason, then I will abandon this patch and move on, instead of waiting indefinitely.

LGTM. It seems all concerns have been addressed. Shall we move ahead and land this patch? Thanks.

rnk accepted this revision.Nov 9 2021, 11:44 AM

lgtm

I believe you have addressed John's comments, and I think the IR changes mainly affect AMDGPU users, so I don't think this will be too disruptive.

Sorry about the delay, there's a bit of a bystander effect at play here with multiple reviewers.

This revision is now accepted and ready to land.Nov 9 2021, 11:44 AM

Please change the commit message to say why this change is necessary / an improvement on what we have now.

My recollection is that the amdgpu backend crashes on some IR and this decreases the probability of that IR pattern occurring, which still sounds like fixing the wrong place to me. Was this the one where hoisting static-size allocas into the entry block near the backend was the problem?

I think this patch is missing a documentation update adding the new constraint that allocas must be contiguous in IR. That would help to answer questions about which allocas must be contiguous and which can occur separated by instructions, as currently none of them need to be adjacent. Also, is this only intended to constrain the entry basic block?

Please update the documentation with this new constraint. It would be helpful to know exactly when we now require alloca instructions to be adjacent to one another. If you wish to avoid other passes breaking the invariant in future, I think it needs to be added to the IR verifier, and should have been as part of this patch.

> Please change the commit message to say why this change is necessary / an improvement on what we have now.
>
> My recollection is that the amdgpu backend crashes on some IR and this decreases the probability of that IR pattern occurring, which still sounds like fixing the wrong place to me. Was this the one where hoisting static-size allocas into the entry block near the backend was the problem?
>
> I think this patch is missing a documentation update adding the new constraint that allocas must be contiguous in IR. That would help to answer questions about which allocas must be contiguous and which can occur separated by instructions, as currently none of them need to be adjacent. Also, is this only intended to constrain the entry basic block?

The current commit message reflects the semantics of the patch - there is nothing that needs to change here. The goal here is to make sure that the FE keeps all the static allocas as one cluster at the start of the entry block, which is a good canonical form from the perspective of better code transformation/optimization.

This is not something specific to the AMDGPU backend, but the AMDGPU backend at present requires this canonical form.

> This is not something specific to the AMDGPU backend, but the AMDGPU backend at present requires this canonical form.

Undocumented and not checked by the IR verifier. Canonical form seems to be overstating it until at least one of those is addressed.

>> This is not something specific to the AMDGPU backend, but the AMDGPU backend at present requires this canonical form.
>
> Undocumented and not checked by the IR verifier. Canonical form seems to be overstating it until at least one of those is addressed.

We already discussed that this canonical form is not something that the IR verifier can verify, but it is good enough for better code transformation/optimization. Please refer to the llvm-dev email discussion about it.

>>> This is not something specific to the AMDGPU backend, but the AMDGPU backend at present requires this canonical form.
>>
>> Undocumented and not checked by the IR verifier. Canonical form seems to be overstating it until at least one of those is addressed.
>
> We already discussed that this canonical form is not something that the IR verifier can verify, but it is good enough for better code transformation/optimization. Please refer to the llvm-dev email discussion about it.

If the new invariant is that all allocas must be adjacent to one another, that's a trivial thing for the verifier to check. So I guess it's something else? Please write down what this new invariant is intended to be, preferably in the documentation, perhaps of the alloca instruction.

What llvm-dev discussion do you refer to?
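For concreteness, the adjacency check referred to above could look something like this (hypothetical illustration only; no such check exists in the LLVM verifier):

// Returns true iff every static alloca in the entry block belongs to
// one contiguous cluster at the start of the block.
static bool staticAllocasAreContiguous(const llvm::Function &F) {
  bool ClusterEnded = false;
  for (const llvm::Instruction &I : F.getEntryBlock()) {
    if (const auto *AI = llvm::dyn_cast<llvm::AllocaInst>(&I)) {
      if (AI->isStaticAlloca() && ClusterEnded)
        return false; // a static alloca appears after the cluster ended
    } else {
      ClusterEnded = true;
    }
  }
  return true;
}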

>>>> This is not something specific to the AMDGPU backend, but the AMDGPU backend at present requires this canonical form.
>>>
>>> Undocumented and not checked by the IR verifier. Canonical form seems to be overstating it until at least one of those is addressed.
>>
>> We already discussed that this canonical form is not something that the IR verifier can verify, but it is good enough for better code transformation/optimization. Please refer to the llvm-dev email discussion about it.
>
> If the new invariant is that all allocas must be adjacent to one another, that's a trivial thing for the verifier to check. So I guess it's something else? Please write down what this new invariant is intended to be, preferably in the documentation, perhaps of the alloca instruction.

Please check with llvm-dev.

> What llvm-dev discussion do you refer to?

I do not remember; please search for keywords like "static allocas" and figure it out.

So you won't articulate or document the new invariant, and you think there's an llvm-dev discussion that says we can't verify the invariant, which you won't reference, but which means you won't add this to the verifier.

Request changes doesn't really work after you've applied the patch.

@rnk do you object to me reverting this? I don't think we can add an invariant to IR which is undocumented and unverified/unverifiable and the patch author seems opposed to fixing either omission.

> This is not something specific to the AMDGPU backend, but the AMDGPU backend at present requires this canonical form.

I must emphasize this is not a hard requirement, just a nice-to-have.

> So you won't articulate or document the new invariant, and you think there's an llvm-dev discussion that says we can't verify the invariant, which you won't reference, but which means you won't add this to the verifier.

The verifier does not check whether or not things are canonical. We don't really have formal definitions for what's considered canonical; it's just what people think makes later optimizations easier. This is not a change in the IR rules.

> Request changes doesn't really work after you've applied the patch.
>
> @rnk do you object to me reverting this? I don't think we can add an invariant to IR which is undocumented and unverified/unverifiable and the patch author seems opposed to fixing either omission.

I object to reverting this, as there is nothing here that the verifier should be checking.

If the amdgpu backend doesn't require this, then it doesn't matter much if other passes undo it. If it's not an invariant, we don't need the verifier to alert people to passes that break it.

Git blame on the new code in clang will lead people here, where they may be able to work out from the comments why this change was made.

rnk added a comment.Nov 16 2021, 9:13 AM

> So you won't articulate or document the new invariant, and you think there's an llvm-dev discussion that says we can't verify the invariant, which you won't reference, but which means you won't add this to the verifier.
>
> Request changes doesn't really work after you've applied the patch.
>
> @rnk do you object to me reverting this? I don't think we can add an invariant to IR which is undocumented and unverified/unverifiable and the patch author seems opposed to fixing either omission.

Is this patch actually causing issues in practice? I think the decision to revert should be based on that.

I don't think this patch creates a new invariant that other passes have to respect, if that's what you're worried about. The way I see it, this patch just makes AMDGPU IR output look "nicer". Middle-end passes are free to insert casts between static allocas if they want.