This is an archive of the discontinued LLVM Phabricator instance.

Redefine deref(N) attribute family as point-in-time semantics (aka deref-at-point)
Abandoned · Public

Authored by reames on Sep 29 2021, 11:23 AM.

Details

Summary

This change shifts all of our mechanisms for specifying dereferenceability to be explicitly point-in-time. It covers the argument and return attribute forms of both dereferenceable(N) and dereferenceable_or_null(N), as well as the parallel family of metadata.
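For reference (names here are purely illustrative), the forms in question look roughly like this in IR:

  ; Argument attribute forms.
  define i32 @arg_form(i32* dereferenceable(4) %p,
                       i32* dereferenceable_or_null(8) %q) {
    %v = load i32, i32* %p
    ret i32 %v
  }

  ; Return attribute form.
  declare dereferenceable(8) i8* @ret_form()

  ; Metadata form, attached to loads of pointer values.
  define i32* @md_form(i32** %pp) {
    %p = load i32*, i32** %pp, !dereferenceable !0
    ret i32* %p
  }
  !0 = !{i64 8}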

The change in semantics is believed to be backwards compatible - that is, legacy bitcode that expected the previous semantics will not miscompile. There may be a non-trivial performance (i.e. optimization quality) impact; see the detailed discussion below.

A bit of history for context...

Our existing semantics (as implemented in the optimizer) were primarily scope-based. Once an argument was marked dereferenceable, the underlying memory was assumed to *remain* dereferenceable until the end of the dynamic scope. It has been pointed out a couple of times that this does not align with the way clang uses the attribute, and there have even been mentions of potential miscompiles.
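A minimal sketch of the difference (all names hypothetical; @unknown stands for an arbitrary call that might free memory):

  define i32 @example(i32* dereferenceable(4) %p, i1 %c) {
  entry:
    call void @unknown()      ; may free the memory %p points to
    br i1 %c, label %use, label %done
  use:
    %v = load i32, i32* %p
    ret i32 %v
  done:
    ret i32 0
  }
  declare void @unknown()

Under the old scoped semantics, %p was assumed to stay dereferenceable across @unknown, so the load in %use could be speculated up into %entry, past the call. Under point-in-time semantics, %p is only known dereferenceable on entry to @example, and that speculation is invalid unless @unknown can be shown not to free it.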

The tension arises from the fact that the legacy semantics happen to be really useful for other frontends, and we have been hesitant to give up the optimization power they imply. We don't really have a great mechanism for context-sensitive dereferenceability.

The last iteration of this - from 2019 - eventually evolved into a proposal to split the attributes into two variants: one scoped, one point-in-time. You can see the history and discussion on D61652, but the proposal never landed.

Earlier this year, I returned to the topic and worked with Johannes (the author of the last attempt) to come up with a proposal which allowed us - we thought - to use the point-in-time semantics while recovering the scoped semantics in all practical cases. Unfortunately, this attempt failed: it turns out we don't have broad agreement on what some of our function and argument attributes actually mean. See D99100 and linked reviews/RFC for the history.

While this latest effort didn't succeed for *all* cases where the scoped semantics might be useful, it did succeed for one in particular: the GC'd language use case which motivated my strong objections to Johannes' earlier proposals.

All of this is to say, we're going back to the original idea of simply redefining the attributes to the weaker semantics, and giving up some optimization potential in the process.

Expected code quality impact

There may be a negative code quality (performance) impact on some workloads. In particular, there is great uncertainty about the impact on Rust code compiled by rustc. On the C/C++/Clang side, no large impact has been identified to date, but we also have not run extensive benchmarks. (Reviewer help with that would be greatly appreciated!)

The risk of negative code quality impact is reduced for LTO and languages with larger modules. We have managed to strengthen the (on-by-default) inference of the nosync and nofree function attributes, and one of the few cases we got everyone on board with was that an argument to such a function couldn't be freed within its dynamic scope. As a result, attribute inference will tend to extend point-in-time semantics to scoped semantics in many common cases, provided we can see the whole call graph.
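A sketch of what that inference buys us (hypothetical function; assume whole-module visibility let us infer the function attributes shown):

  ; Inferred nofree nosync means nothing can free %p for the duration
  ; of the call, so %p is effectively dereferenceable throughout @g -
  ; the old scoped guarantee, recovered from point-in-time facts.
  define i32 @g(i32* dereferenceable(4) %p, i1 %c) nofree nosync {
    br i1 %c, label %use, label %done
  use:
    %v = load i32, i32* %p    ; speculatable anywhere in @g again
    ret i32 %v
  done:
    ret i32 0
  }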

However, once we inline that function out of existence, we will tend to lose the strong deref fact in the caller's scope. This means there's a pass-ordering interaction which is undesirable, but also hard to avoid, as we don't yet have a fully context-sensitive deref analysis. We may, in the future, need to consider implementing one. (To be clear, this is not a totally new problem. The old scoped semantics had the same issue for the argument attributes, which were the most heavily used.)
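Concretely (hypothetical names), the fact lives on the callee's signature and dissolves with it:

  define void @callee(i32* dereferenceable(4) %p) nofree nosync {
    %v = load i32, i32* %p    ; freely speculatable while @callee exists
    ret void
  }

  define void @caller(i32* %p) {
    call void @callee(i32* %p)
    ret void
  }
  ; After inlining, @callee's body lands in @caller with no record that
  ; %p was dereferenceable for the (now dissolved) call's entire scope.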

If you think you might have a code quality impact, you can test using the -use-dereferenceable-at-point-semantics=false flag. Once confirmed, help reproducing and reducing would be much appreciated.
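For example (assuming the flag is a standard cl::opt, so it reaches clang via -mllvm):

  # Directly on IR:
  opt -O2 -use-dereferenceable-at-point-semantics=false in.ll -S -o out.ll

  # Through the clang driver:
  clang -O2 -mllvm -use-dereferenceable-at-point-semantics=false foo.c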

On test changes

I had previously made an effort to go through and add "nofree nosync" to tests where doing so seemed to preserve the spirit of the test. What's left are cases where (a) the semantic change is meaningful, or (b) the test has been updated since my last sweep. Of these, the only ones I really haven't investigated are the AMDGPU changes.

Diff Detail

Event Timeline

reames created this revision.Sep 29 2021, 11:23 AM
reames requested review of this revision.Sep 29 2021, 11:23 AM
Herald added projects: Restricted Project, Restricted Project.Sep 29 2021, 11:23 AM
reames edited the summary of this revision.Sep 29 2021, 11:42 AM
nlopes accepted this revision.Oct 1 2021, 2:43 AM

LGTM. We have discussed this before. It's important to fix the miscompile with reference types in C++.

llvm/test/Analysis/BasicAA/dereferenceable.ll
66

This one is unfortunate. They can't alias, because objsize(%obj) < the dereferenceable size of %ret. This case should be trivial to catch.
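(A hypothetical reduction of the situation; function names are just for illustration:)

  define i1 @cmp() {
    ; %obj is a 4-byte object; %ret has 8 dereferenceable bytes at the
    ; point of the call, so %ret cannot point into %obj: no alias.
    %obj = alloca i32
    %ret = call dereferenceable(8) i8* @get()
    %obj.i8 = bitcast i32* %obj to i8*
    %eq = icmp eq i8* %obj.i8, %ret
    ret i1 %eq
  }
  declare dereferenceable(8) i8* @get()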

This revision is now accepted and ready to land.Oct 1 2021, 2:43 AM
nikic requested changes to this revision.Oct 1 2021, 3:26 AM

Sorry, but the fact that there is still no way to opt-in to the old behavior is still a blocker from my side. If we can't use dereferenceable + nofree arguments for that purpose, then we need to provide a different way to do that. Like dereferenceable + really_nofree. It looks like the current implementation doesn't even accept the dereferenceable + nofree + noalias case originally proposed (which is pretty bad from a design perspective, but would at least work fairly well for rustc in practice). I don't think that our current analysis capabilities are sufficient to land this change at this time.

This revision now requires changes to proceed.Oct 1 2021, 3:26 AM
reames added a comment.Oct 1 2021, 3:25 PM

Sorry, but the fact that there is still no way to opt-in to the old behavior is still a blocker from my side. If we can't use dereferenceable + nofree arguments for that purpose, then we need to provide a different way to do that. Like dereferenceable + really_nofree. It looks like the current implementation doesn't even accept the dereferenceable + nofree + noalias case originally proposed (which is pretty bad from a design perspective, but would at least work fairly well for rustc in practice). I don't think that our current analysis capabilities are sufficient to land this change at this time.

@nikic Do you have any specific examples of where this causes a workload to regress? At this point, I really need something specific as opposed to a general concern. We're at the point where perfection is very much the enemy of the good here. As noted, I've already spent a lot of time trying to minimize impact.

ychen added a subscriber: ychen.Oct 2 2021, 12:39 PM

This really needs to be properly benchmarked.

llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
9

No longer happens.

llvm/test/Transforms/LICM/hoist-deref-load.ll
419

Regression in that C code?

@nikic ping on previous question. It's been a month, and this has been LGTMed. Without response, I plan to land this.

This really needs to be properly benchmarked.

This has been benchmarked on every workload I care about, and shows no interesting regressions. Unfortunately, those are all non-public Java workloads.

On the C/C++ side, I don't have a ready environment in which to run anything representative. Given the semantic change, I wouldn't expect C++ to show much difference, and besides, this is fixing a long-standing, fairly major correctness issue. If you have particular suites you care about, please run them and share results.

At this point, I strongly lean towards committing and letting regressions be reported. We might revert, or we might simply fix forward, depending on what comes up.

nikic added a subscriber: aeubanks.Nov 13 2021, 8:55 AM

@nikic ping on previous question. It's been a month, and this has been LGTMed. Without response, I plan to land this.

Sorry, I did do some measurements but forgot to report back. The only run-time workload I can easily measure is rustc check builds, where I observed regressions ranging between -0.6% and +4.8% across different projects (the -0.6% "regressions" indicate an improvement, i.e. that some deref-based optimization happens to be non-profitable). I'm not sure that helps you much apart from saying that yes indeed, this does have some impact on Rust code. It's not catastrophic (though caveat emptor: the impact on "benchmarky" code may well be higher), but it's also avoidable.

This really needs to be properly benchmarked.

This has been benchmarked on every workload I care about, and shows no interesting regressions. Unfortunately, those are all non-public Java workloads.

I think something important to mention here is that the Java (i.e. GC'd language) case is also the only one where we don't really expect a regression, because it has good modeling under the proposed patch: GC'd pointers can't be freed during the optimization pipeline (until statepoint lowering), so they're simply not affected by this change. For that reason, I don't think the fact that Java workloads don't see regressions tells us anything about how this would behave for other frontends, which are mostly not GC'd.
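(A sketch, using the statepoint-example conventions: collected-heap pointers live in a distinct address space, and nothing in the IR can free them, so point-in-time and scoped semantics coincide until statepoint lowering:)

  define i32 @java_like(i32 addrspace(1)* dereferenceable(4) %obj)
      gc "statepoint-example" {
    %v = load i32, i32 addrspace(1)* %obj
    ret i32 %v
  }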

On the C/C++ side, I don't have a ready environment in which to run anything representative. Given the semantic change, I wouldn't expect C++ to show much difference, and besides, this is fixing a long-standing, fairly major correctness issue. If you have particular suites you care about, please run them and share results.

Maybe @asbirlea or @aeubanks can run some benchmarks? I would expect regressions for C++, because nofree inference is very limited (e.g. won't be inferred for pretty much any code using growing STL containers). Though at least in the C++ case regressions may be somewhat justified, in that this fixes a correctness issue in some cases.

At this point, I strongly lean towards committing and letting regressions be reported. We might revert, or we might simply fix forward, depending on what comes up.

I'm not sure what you're expecting from that. At least as far as Rust is concerned, the problem here seems pretty well understood to me: you are dropping (without replacement) the ability to specify that an argument is dereferenceable within a function. I'm perfectly happy with the change in default behavior; all I want is a way to get the old one back. I don't think that having an example of that in "real" code is going to add any useful information.
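(To illustrate the ask with existing attributes - roughly the shape rustc could emit for a shared-reference argument; the exact attribute set here is an assumption:)

  ; Parameter-level nofree + dereferenceable as the opt-in: "nobody may
  ; free this during the call, and it has 4 dereferenceable bytes".
  define i32 @use_ref(i32* noalias nocapture nofree dereferenceable(4) %x) {
    %v = load i32, i32* %x
    ret i32 %v
  }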

reames abandoned this revision.Dec 1 2021, 9:45 AM

I'm stopping work on this. It has already taken more work than is worthwhile for me, and it seems yet more is needed.

On @nikic's prompting, I finally went ahead, got the test-suite set up, and tested a clang version with and without the flag thrown. Unfortunately, the results were not pretty. We have significant regressions in some of the memset/memcpy tests. I did not dig into why in detail, but I suspect the lack of context sensitivity is biting us.

Between the measured regressions on both C++ and Rust, I don't think this can go in.

At this point, I've done everything I reasonably can to drive this to a conclusion. My actual motivation for this was a purely defensive effort to avoid regressing Java performance when this someday got fixed, and to make a good-faith effort to justify my objections to Johannes' original patches. That is complete.

Frankly, I think it's incredibly unfortunate that clang has an active miscompile and no one seems motivated to fix it after *years* of it being there. However, I have no commercial interest in clang, and the amount of work that seems to remain is well beyond anything I'm willing to do on a volunteer basis.

Let me summarize some ideas on future directions for the next poor person who stumbles into this rat's nest.

The approach taken in this round - inferring scoped dereferenceability from existing attributes, and strengthening that inference to cover practical cases - has been partially successful, but I no longer believe it can be pushed across the finish line. The problem here is not technical, but political. We appear to have unresolved disagreements about the semantics of attributes, and the review process towards resolving those disagreements touches many otherwise disjoint parts of the project. I would definitely not advise moving further in this direction unless you greatly enjoy herding cats.

We could implement a contextual dereferenceability analysis. This is useful to have no matter what, but requires extending the current must-execute logic and finding ways to make that information cheaply available through much of the pass pipeline. I have some ideas on that, and if someone wants to brainstorm, feel free to reach out. However, it needs to be said that it's unclear whether even a "perfect" version of this analysis is enough to recover the scoped facts in all cases. This is a fairly speculative approach, and it might not be enough.

The approach taken in D61652 - splitting the dereferenceability attribute into two - is a bit ugly. The objection to it in this round was mostly driven by the observation that the "alloc-size" attribute has the same semantic split over whether the implied dereferenceability is scoped. The good news is that the work done in this round was enough to cover the performance regressions from the "alloc-size" version, and at this point the only checked-in code for "alloc-size" uses the non-globally-dereferenceable semantics. (We had to, because it was actively miscompiling otherwise.)
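(For context, the situation referred to, sketched with the existing allocsize attribute; the callee name is hypothetical:)

  ; allocsize(0) says the return value points to as many dereferenceable
  ; bytes as the first argument specifies *at the point of return*;
  ; whether that fact persists for the rest of the scope is exactly the
  ; same scoped-vs-point question.
  declare i8* @my_alloc(i64) allocsize(0)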

Personally, if I were motivated to continue working on this, I'd probably resurrect D61652 and call it a day.

@nikic - Since you now have the sole remaining frontend for which dropping global deref is a performance regression without also being a correctness fix, any chance you're interested in driving this further?