This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Introduce more scratch registers in the ABI.
ClosedPublic

Authored by cdevadas on Mar 18 2020, 5:58 AM.

Details

Summary

The AMDGPU target has a convention that defines all VGPRs
(except the initial 32 argument registers) as callee-saved.
This convention is not always efficient: a callee that
requires many registers ends up emitting a large number of
spills, even though its caller may need only a few.

This patch revises the ABI by introducing more scratch registers
that a callee can freely use.
The 256 VGPR registers now become:

32 argument registers
112 scratch registers and
112 callee-saved registers.

The scratch registers and the CSRs are intermixed at regular
intervals (a split boundary of 8) to obtain better occupancy.
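
The arithmetic of this split can be sanity-checked with a small standalone C++ sketch (for illustration only; this is not code from the patch, and it assumes the first post-argument block of eight, v32-v39, is scratch; only the 32/112/112 totals come from the summary above):

// Illustration only: classify a VGPR number under the convention described
// above. v0-v31 are argument registers; the remaining 224 VGPRs alternate
// between scratch and callee-saved blocks of 8. Treating the first
// post-argument block (v32-v39) as scratch is an assumption of this sketch.
#include <cstdio>

enum class VGPRKind { Argument, Scratch, CalleeSaved };

static VGPRKind classifyVGPR(unsigned Reg) {   // Reg is the VGPR number, 0-255
  if (Reg < 32)
    return VGPRKind::Argument;                 // v0-v31: argument registers
  unsigned Block = (Reg - 32) / 8;             // 28 blocks of 8 above the args
  return (Block % 2 == 0) ? VGPRKind::Scratch  // even blocks: scratch (assumed)
                          : VGPRKind::CalleeSaved;
}

int main() {
  unsigned Counts[3] = {0, 0, 0};
  for (unsigned Reg = 0; Reg < 256; ++Reg)
    ++Counts[static_cast<unsigned>(classifyVGPR(Reg))];
  // Prints: args=32 scratch=112 csr=112
  std::printf("args=%u scratch=%u csr=%u\n", Counts[0], Counts[1], Counts[2]);
  return 0;
}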

Diff Detail

Event Timeline

cdevadas created this revision.Mar 18 2020, 5:58 AM
Herald added a project: Restricted Project.Mar 18 2020, 5:58 AM
arsenm added inline comments.Mar 18 2020, 7:26 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
220–223 ↗(On Diff #251047)

This part should be split into a separate change

cdevadas marked an inline comment as done.Mar 18 2020, 8:13 AM
cdevadas added inline comments.
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
220–223 ↗(On Diff #251047)

Do you prefer it as a separate commit?

arsenm added inline comments.Mar 18 2020, 8:42 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
220–223 ↗(On Diff #251047)

Yes, this can be committed independently

cdevadas updated this revision to Diff 251138.Mar 18 2020, 11:35 AM
cdevadas edited the summary of this revision.

Reverted the changes made on the CostPerUse value.
It will go in a follow-up commit after this review.

I would expect this to be the preliminary commit

arsenm accepted this revision.Mar 20 2020, 9:08 AM
arsenm added inline comments.
llvm/docs/AMDGPUUsage.rst
6511

A description of why it's split this way may be helpful

This revision is now accepted and ready to land.Mar 20 2020, 9:08 AM
t-tye added inline comments.Mar 20 2020, 10:26 AM
llvm/docs/AMDGPUUsage.rst
6511

Is the striping being picked at 4 VGPRs to match the hardware VGPR allocation granularity (4 for <=GFX9 and 8 for >=GFX10)? How does this striping impact register file fragmentation? What is the impact on objects larger than 4 VGPRs being promoted to registers?

arsenm added inline comments.Mar 20 2020, 10:52 AM
llvm/docs/AMDGPUUsage.rst
6511

These aren't used for argument passing, so there's no concept of objects to consider

t-tye added inline comments.
llvm/docs/AMDGPUUsage.rst
6511

Also, this is an ABI-breaking change (as is the change to the handling of the wave scratch offset), so should the EI_ABIVERSION for each EI_OSABI in the ELF header be bumped? My thinking is no, since AMD has not yet started to support ISA-level linking or function pointers, so this change cannot affect any existing programs.

arsenm added inline comments.Mar 20 2020, 11:26 AM
llvm/docs/AMDGPUUsage.rst
6511

We didn't have, and still don't have, an ABI worth considering for this

b-sumner marked an inline comment as done.Mar 20 2020, 11:44 AM
b-sumner added inline comments.
llvm/docs/AMDGPUUsage.rst
6511

Seems OK to me to not bump the version since we're not touching the kernel ABI and we don't have ISA linking.

cdevadas updated this revision to Diff 251739.Mar 20 2020, 1:06 PM

Added a brief description of the register split (AMDGPUUsage.rst).

arsenm accepted this revision.Mar 20 2020, 2:22 PM
arsenm added inline comments.
llvm/docs/AMDGPUUsage.rst
6537

guarantee is a bit too strong of a claim

t-tye added inline comments.Mar 20 2020, 5:35 PM
llvm/docs/AMDGPUUsage.rst
6511

These aren't used for argument passing, so there's no concept of objects to consider

But if the compiler wants to promote an object to contiguous registers, an object larger than the granularity will end up spanning both clobbered and non-clobbered registers, forcing some spill/restore code for the parts that are in non-clobbered registers. If the striping were at a coarser granularity, this might be unnecessary whenever the object fits within that coarser granularity.

Do we think this could be an issue?

cdevadas marked 2 inline comments as done.Mar 21 2020, 7:35 AM
cdevadas added inline comments.
llvm/docs/AMDGPUUsage.rst
6511

If you are still talking about passing the object: the compiler won't promote a large object into registers beyond the range defined by the convention (the first 32 VGPRs in our case, which we didn't split anyway).

6537

Will change it to 'get a better occupancy'

arsenm requested changes to this revision.Mar 21 2020, 9:04 AM

Can you add some tests stressing very wide VGPR tuples with calls? These would be dynamic vector indexing and some image intrinsics (or you could just use inline asm)

llvm/docs/AMDGPUUsage.rst
6511

We have a few contexts with single VGPR tuples > 4: some image instructions and indirect register indexing. We probably don't have any tests with calls stressing these cases

This revision now requires changes to proceed.Mar 21 2020, 9:04 AM
cdevadas marked an inline comment as done.Mar 21 2020, 10:29 AM
cdevadas added inline comments.
llvm/docs/AMDGPUUsage.rst
6511

Are you saying there are cases where more than 4 contiguous VGPRs need to be allocated by RA? If so, can you tell me the instructions?
I was under the impression that we have instructions requiring at most 4 contiguous VGPRs (for instance, FLAT_LOAD_DWORDX4)

t-tye added inline comments.Mar 21 2020, 11:44 AM
llvm/docs/AMDGPUUsage.rst
6511

No, I am not talking about argument passing; I am talking about generating code for the body of the function. It would presumably be best for it to use clobbered registers only for values that do not need to be live across the calls it makes. So for objects larger than 4 dwords, that will mean prologue/epilogue spilling/restoring and register shuffling at call sites.

6511

We do have T# and V# values used in image instructions that are larger than 4 dwords. The instructions that use them currently need them in SGPRs, but they may need to be moved from VGPRs (which would not need them to be contiguous). Do we also want to be sure the ABI will work for future hardware that may change, or are we OK with different ABIs for different targets?

t-tye requested changes to this revision.Mar 21 2020, 12:33 PM

Added @mjbedy to review.

t-tye added a reviewer: tpr.Mar 21 2020, 12:42 PM
rampitec added inline comments.Mar 23 2020, 1:13 PM
llvm/docs/AMDGPUUsage.rst
6511

There are some image instructions which have the address in a much longer VGPR tuple. If you look into SIRegisterInfo.td, our largest VGPR register class is 32 dwords long. I'd say this should also be the minimal interleave value.

arsenm added inline comments.Mar 23 2020, 1:50 PM
llvm/docs/AMDGPUUsage.rst
6511

But those are AGPRs, which last I checked were not considered preserved across calls anyway. The question is also how often those need to be live across calls, which is probably very rare.

rampitec added inline comments.Mar 23 2020, 2:12 PM
llvm/docs/AMDGPUUsage.rst
6511

AGPRs can be copied to VGPRs. Anyway, there are image addresses too which are pretty big.

Thank you all for the comments.
I can see that there are concerns with the current split boundary (4 VGPRs together), given that we have wide VGPR uses in certain scenarios (image instructions).
But, as Matt mentioned, how frequently do such scenarios occur?
Changing the split boundary to a large value would probably defeat the whole purpose of this patch: reducing CSR spills and trying to ensure better occupancy.

What will happen if you need to pass VReg_1024 into a function? It might not be a common case, but will it even work?

It works, but that isn't changed by this patch. This is not changing the argument registers, which are all still in v0-v31. I believe the largest argument type we pass in registers is <8 x i32>, and we force <32 x i32> to be stack-passed anyway

OK. What if such a register needs to be preserved by a caller? I guess there is no safe window for it, so it will be forced to spill. Then we will spill a whole huge register, not just a part of it, because we do not use sublane spilling (except to AGPRs), right?

Yes, that's what I would expect

Then interleave 4 is probably not the best choice. We may also need to adjust the cost of tuples to make aligned allocation more likely.

The cost for VGPR registers has been handled in https://reviews.llvm.org/D76417. This will ensure a balanced allocation of scratch registers and CSRs at every split (once the current patch is upstream).

It's a tradeoff between more CSR spills in general, and the case where a large tuple needs to live across a call (which I still expect to be extremely rare). AGPRs are never treated as CSRs anyway (and for the real AGPR use cases, calls are unlikely to be used)

D76417 changes the cost of VGPRs; I am speaking about tuples. Assume you interleave by 4. Now you can allocate something into v[64:68] or v[65:69]. If you use the latter, you will have a spill of 4 registers around any call site where it is live. If you use the former, you have a chance not to spill it.
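
To make the alignment point concrete, here is a small C++ sketch (illustration only, not code from the patch; the interleave value and the choice of which block class comes first after v31 are assumptions of the example) that counts how many callee-saved VGPRs a contiguous tuple overlaps:

// Illustration only: with an interleave of 4 and scratch blocks assumed to
// start at v32, a 4-register tuple placed at v[64:67] stays inside a scratch
// block, while the same tuple at v[65:68] overlaps one callee-saved register
// (v68), so the whole tuple has to be saved/restored if live across a call.
#include <cstdio>

static bool isCalleeSaved(unsigned Reg, unsigned Interleave) {
  if (Reg < 32)
    return false;                               // argument registers
  return ((Reg - 32) / Interleave) % 2 == 1;    // odd blocks: callee-saved
}

static unsigned csrRegsTouched(unsigned Start, unsigned Width,
                               unsigned Interleave) {
  unsigned N = 0;
  for (unsigned Reg = Start; Reg < Start + Width; ++Reg)
    N += isCalleeSaved(Reg, Interleave);
  return N;
}

int main() {
  std::printf("v[64:67] overlaps %u CSRs, v[65:68] overlaps %u CSRs\n",
              csrRegsTouched(64, 4, 4), csrRegsTouched(65, 4, 4));
  return 0;
}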

mjbedy added inline comments.Mar 30 2020, 6:47 AM
llvm/docs/AMDGPUUsage.rst
6511

An additional concern I would have is the additional register fragmentation this might cause. This essentially introduces an alignment requirement for wider VGPR allocations in these cases, which could increase register requirements. Has there been any investigation of the impact of this?

cdevadas updated this revision to Diff 261054.Apr 29 2020, 3:10 PM
cdevadas edited the summary of this revision.

Divided the VGPRs into an equal number of CSRs and scratch registers. Also added a test case for VGPR tuple allocation.

arsenm accepted this revision.May 1 2020, 11:21 AM

LGTM with test nits. This isn't perfect but we can't do much better now

llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll
7

inreg currently doesn't do anything for non-shaders, so you should remove it here.

9

Typo: presreved

cdevadas updated this revision to Diff 261700.May 3 2020, 10:07 AM

Fixed the testcase.

arsenm accepted this revision.May 4 2020, 7:38 AM
t-tye accepted this revision.May 5 2020, 9:31 AM

LGTM. The calling convention is still open to further refinement as more information is collected, but this appears to be an improvement, so it is a good starting point.

This revision is now accepted and ready to land.May 5 2020, 9:31 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.May 5 2020, 10:53 AM

This breaks check-llvm on Windows: http://45.33.8.238/win/14566/step_11.txt

Please take a look, and revert if it takes a while to investigate. (Maybe the test just needs a triple?)