For the mem ops clustering logic, keep both the old and the new logic, and guard the new logic behind a flag. By default the flag is disabled and the old logic remains in place. When the flag is enabled, the new logic is used. The flag to enable the new logic is:
--amdgpu-enable-mem-ops-cluster-heuristic-based-on-clustered-bytes=true.
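For illustration only, here is a minimal sketch of what such a flag guard could look like; the variable name, helper names, and placeholder bodies are assumptions, and only the option string comes from this patch:

```cpp
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Hypothetical sketch: only the option string is from the patch description.
static cl::opt<bool> EnableClusteredBytesHeuristic(
    "amdgpu-enable-mem-ops-cluster-heuristic-based-on-clustered-bytes",
    cl::desc("Use the clustered-bytes based heuristic when clustering "
             "memory operations"),
    cl::init(false), cl::Hidden);

// Placeholder bodies; the real heuristics live in SIInstrInfo.cpp.
static bool clusterWithOldHeuristic(unsigned /*NumLoads*/, unsigned NumBytes) {
  return NumBytes <= 64; // placeholder threshold, not the actual old logic
}
static bool clusterWithNewHeuristic(unsigned NumLoads, unsigned /*NumBytes*/) {
  return NumLoads <= 4; // placeholder condition, not the actual new logic
}

// The guard: with the flag at its default (false) the old logic stays in place.
static bool shouldClusterMemOpsSketch(unsigned NumLoads, unsigned NumBytes) {
  if (EnableClusteredBytesHeuristic)
    return clusterWithNewHeuristic(NumLoads, NumBytes);
  return clusterWithOldHeuristic(NumLoads, NumBytes);
}
```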
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:103
Is there a more useful description than "new heuristic"? If we're asking people to choose between two heuristics then the fact that one of them is currently "new" isn't really helpful.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:562
You have a problem with extremely wide loads. I am not sure what was in the regression case, but it was probably something like 8 longs or so. Isn't it better to tweak the heuristic instead and just clamp based on NumBytes, as it is supposed to? You say you are checking NumBytes, but the return is based solely on NumLoads.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:562
Let's consider the example you gave in your reply to my email.

Stas' reply: Now say you have 4 incoming loads of <8 x i64>. According to your logic this is 8*64*4/8 = 256 bytes, MaxNumLoads = 4. You would say it is OK to cluster, because 4 <= 4 == true. Now say you have 8 loads of <8 x i32>. It is the same 256 bytes, but you inhibit clustering because 8 <= 4 == false. It is not really based on the number of bytes, it is based on the number of loads.

As I understand it, the crux of the logic is this: when the clustered byte count is large but shared by only a few load instructions, the loads are still considered for clustering. That is where the clustered byte count really comes into play. From that point of view, your first example is fine for the heuristic, but the second one is not.
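To make that disagreement concrete, here is a small standalone sketch of the decision shape under discussion; the cap formula (1024 / NumBytes) is purely an assumption chosen so that the two examples above reproduce, and is not the actual patch code:

```cpp
#include <cassert>

// Hypothetical sketch: MaxNumLoads is derived from the clustered byte count,
// but the final comparison is on the number of loads.
static bool shouldCluster(unsigned NumLoads, unsigned NumBytes) {
  unsigned MaxNumLoads = 1024 / NumBytes; // assumed cap formula
  return NumLoads <= MaxNumLoads;
}

int main() {
  // 4 loads of <8 x i64>: 8 * 64 * 4 / 8 = 256 bytes, cap 4 -> clustered.
  assert(shouldCluster(/*NumLoads=*/4, /*NumBytes=*/256));
  // 8 loads of <8 x i32>: 8 * 32 * 8 / 8 = 256 bytes, cap 4 -> not clustered.
  assert(!shouldCluster(/*NumLoads=*/8, /*NumBytes=*/256));
  // Same byte count, different outcome: the gate is really NumLoads.
  return 0;
}
```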
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:562
And, in any case, I am not sure what you mean by "just clamp based on NumBytes as it is supposed to be". I welcome your ideas in a bit more detail so that I understand them fully and can experiment with them quickly. Also, reworking the heuristic probably means running all the must-pass performance benchmarks again and making sure we do not regress there, which means the patch will take its own time before it is committed to master.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:562
I mean that your logic can allow extra-wide loads, which is not desired. I suggest limiting that. Did you look at the regression IR? If you are afraid of further regressions and think more testing is needed, just revert the change that led to the regression.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:562
My honest opinion is this: my hands are currently tied with other urgent issues, and I have no time left to think about it. So the safe bet for now is to just continue with the old logic as it is. Since we are not permanently reverting the change, only disabling it, we can come back to it once I get some breathing room and rework it sooner rather than later.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:562
It simply does not do what you wrote it does. You wrote that it makes a decision based on the number of bytes. It does not.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:562
I had to come up with this heuristic in order to satisfy all the workloads and, more importantly, to make sure that the workload (rocSPARSE) for which the work was primarily undertaken behaves as expected. But yes, I think I am missing some key insights here. Any change now means re-evaluating everything from scratch, and it is not going to be a very straightforward change considering all the workloads. So let's disable it for the time being. If you are otherwise fine with this patch, let's merge it first so that it does not block some key projects.
Ping: please give an LGTM if the patch looks fine.
And, as I said, let's rework the heuristic in the background, since the issues cannot be fixed quickly. There are also other tickets reporting that the PSDB CI run for this patch is causing CEFFE2 tests to fail. So a quick fix is not possible here; even if we made one, it is very likely that something else would pop up again.
I don't like the fact that we're solving problems with such switches.
Fundamentally, more clustering should never hurt from the perspective of the memory system, as long as we really properly cluster by locality. The way it could hurt is by increasing register pressure (and reducing occupancy). So the correct long-term solution is to make the scheduler better at avoiding the register pressure.
I think we can live with this ugliness for now, but I implore you to look at better solutions.
One must-fix inline.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:103
The name of the cl::opt variable must match the name of the command-line option.
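For reference, a minimal sketch of the convention being asked for; the variable name below is hypothetical, chosen only to mirror the option string:

```cpp
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Variable name mirrors the command-line option string.
static cl::opt<bool> AMDGPUEnableMemOpsClusterHeuristicBasedOnClusteredBytes(
    "amdgpu-enable-mem-ops-cluster-heuristic-based-on-clustered-bytes",
    cl::desc("Use the clustered-bytes based heuristic when clustering "
             "memory operations"),
    cl::init(false), cl::Hidden);
```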
Hi @nhaehnle
I agree with your comments (and also with the points @rampitec made earlier).
We ended up with this heuristic based on the workloads we investigated, trying to satisfy all of them. But now I realize that there is a serious loophole in this patch, as @rampitec pointed out earlier.
If you look carefully at the original (old) heuristic, it also claims to be based on clustered bytes, but the byte count it uses is not very accurate because of the way it is implemented; there is a FIXME comment about it. My initial journey with this patch started with two goals: (1) remove that FIXME by adding the necessary support, and (2) then fix the performance issues that were raised internally.
But now I realize that working on (1) and (2) simultaneously led to the current issues. So my goal now is to make sure that the system is as stable after (1) as it was before. Once that is achieved, we have 1:1 parity between the old heuristic and its cleaned-up version. Only then can I think about (2).
I am working on (1) right now and will put it up for review once it is ready. From this angle, this patch is no longer required.
I appreciate all the comments made here so far.
Further, I reverted my original commit (cc9d69385659be32178506a38b4f2e112ed01ad4), which introduced the faulty heuristic, since I am not finding any quick solution here. Now none of the issues should be blocked by this faulty patch. I will start from scratch again to arrive at a heuristic that is hopefully based purely on the number of clustered bytes, without mixing in the number of clustered instructions.
Reverted commit is: 4905536086ee47f26cd13d716eff8aa6424dfdd7