This is an archive of the discontinued LLVM Phabricator instance.

Differential D72487

[AMDGPU] Fix bundle scheduling
Closed · Public

Authored by rampitec on Jan 9 2020, 3:42 PM.

Details

Reviewers

foad
kerbowa

Commits

rGcd69e4c74c17: [AMDGPU] Fix bundle scheduling

Summary

Bundles coming to the scheduler were considered free, i.e. they had zero latency.
Fixed.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rampitec created this revision. Jan 9 2020, 3:42 PM

Herald added a project: Restricted Project. Jan 9 2020, 3:42 PM

Herald added subscribers: hiraditya, t-tye, tpr and 8 others.

arsenm added inline comments. Jan 9 2020, 3:48 PM

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
758	This seems like a core scheduler flaw, not something to put in a target dag mutation

I'm surprised this isn't already a standard mutation that targets can override but,

LGTM

This revision is now accepted and ready to land. Jan 9 2020, 3:49 PM

rampitec marked an inline comment as done. Jan 9 2020, 3:59 PM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
758	That is not so general. Different targets have different levels of parallelism here, so latencies can add up, run in parallel, or follow some more peculiar pattern. In the general case only the target knows what the latency will be. There could be a way to describe it, but it is just not there.
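
To make the point about target-specific bundle latency concrete, here is a minimal sketch of three ways a target might combine the latencies of the instructions inside a bundle. The `BundlePolicy` enum and `bundleLatency` function are hypothetical names used only for illustration; they are not LLVM APIs. The `Pipelined` case has the same shape as the `Lat + Count - 1` expression in this revision's `SIInstrInfo::getInstrLatency`, but treating it as a general "pipelined issue" model is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical combination policies, for illustration only (not LLVM APIs).
enum class BundlePolicy { Serial, Parallel, Pipelined };

// Combine the per-instruction latencies of one bundle under a given policy.
inline unsigned bundleLatency(const std::vector<unsigned> &Lats,
                              BundlePolicy P) {
  if (Lats.empty())
    return 0;
  unsigned Sum = std::accumulate(Lats.begin(), Lats.end(), 0u);
  unsigned Max = *std::max_element(Lats.begin(), Lats.end());
  switch (P) {
  case BundlePolicy::Serial:
    return Sum; // latencies add up
  case BundlePolicy::Parallel:
    return Max; // members run side by side
  case BundlePolicy::Pipelined:
    // Largest member latency plus one extra cycle per additional member.
    return Max + static_cast<unsigned>(Lats.size()) - 1;
  }
  return 0;
}
```

For a bundle of three single-cycle instructions the three policies give 3, 1 and 3 respectively, so a single target-independent rule cannot be right for every target.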

Closed by commit rGcd69e4c74c17: [AMDGPU] Fix bundle scheduling (authored by rampitec). Jan 9 2020, 5:48 PM

This revision was automatically updated to reflect the committed changes.

This is only relevant for post-ra scheduling because we don't have any bundles when we do pre-ra scheduling, right?

Doing it as a DAG mutation seems a bit late. One problem is that you set the latency for the bundle's succ, but not for the corresponding pred, so you see odd mismatches like this:

SU(5):   BUNDLE implicit-def $sgpr6_sgpr7, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $scc
  # preds left       : 2
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 3
  Depth              : 0
  Height             : 1
  Predecessors:
    SU(0): Anti Latency=0
    SU(0): Anti Latency=0
  Successors:
    SU(11): Data Latency=1 Reg=$sgpr6_sgpr7
...
SU(11):   S_NOP 0, implicit killed $sgpr6_sgpr7, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3
  # preds left       : 6
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 2
  Height             : 0
  Predecessors:
    SU(10): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
    SU(9): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
    SU(8): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
    SU(7): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
    SU(6): Data Latency=0 Reg=$sgpr4
    SU(5): Data Latency=0 Reg=$sgpr6_sgpr7

(Latency for succ of SU(5) is 1, but latency for pred of SU(11) is 0.)
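
The mismatch in the dump can be reproduced with a toy graph in which each end of an edge keeps its own copy of the latency. This is a simplified sketch: `ToyDep`, `ToyNode` and the helper functions are hypothetical stand-ins, not the LLVM classes, and the assumption that both ends store separate copies is taken from what the dump shows.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for scheduler data structures (hypothetical, not LLVM classes).
struct ToyDep {
  int Other;        // index of the node at the other end of the edge
  unsigned Latency; // latency recorded on this copy of the edge
};

struct ToyNode {
  std::vector<ToyDep> Preds; // edges as seen from the consumer
  std::vector<ToyDep> Succs; // edges as seen from the producer
};

// Record a def->use edge; each end stores its own copy of the latency.
inline void addEdge(std::vector<ToyNode> &G, int Def, int Use, unsigned Lat) {
  G[Def].Succs.push_back({Use, Lat});
  G[Use].Preds.push_back({Def, Lat});
}

// Patch only the producer-side copies, as a successor-only mutation would.
inline void bumpSuccsOnly(std::vector<ToyNode> &G, int Def, unsigned Lat) {
  for (ToyDep &D : G[Def].Succs)
    D.Latency = Lat;
}

// Patch both copies of every outgoing edge so the two views agree.
inline void bumpBothEnds(std::vector<ToyNode> &G, int Def, unsigned Lat) {
  for (ToyDep &D : G[Def].Succs) {
    D.Latency = Lat;
    for (ToyDep &P : G[D.Other].Preds)
      if (P.Other == Def)
        P.Latency = Lat;
  }
}
```

After `bumpSuccsOnly` the producer reports the new latency while the consumer still reports the old one, which is the same kind of disagreement as between SU(5) and SU(11) above.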

Instead how about doing this and implementing it in adjustSchedDependency for AMDGPU?

diff --git a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
index 96a1f86c3e0..ef5926e4f8f 100644
--- a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
+++ b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
@@ -269,9 +269,9 @@ void ScheduleDAGInstrs::addPhysRegDataDeps(SUnit *SU, unsigned OperIdx) {
       if (!ImplicitPseudoDef && !ImplicitPseudoUse) {
         Dep.setLatency(SchedModel.computeOperandLatency(SU->getInstr(), OperIdx,
                                                         RegUse, UseOp));
-        ST.adjustSchedDependency(SU, UseSU, Dep);
       } else
         Dep.setLatency(0);
+      ST.adjustSchedDependency(SU, UseSU, Dep);
 
       UseSU->addPred(Dep);
     }
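
As a rough illustration of why adjusting the latency before the edge is recorded avoids the mismatch, the sketch below calls a target hook on both the regular and the implicit-pseudo path and only then stores the edge on both sides. `ToyDAG`, `ToyEdge` and `AdjustHook` are hypothetical names for this example and do not reproduce the real `ScheduleDAGInstrs` or `adjustSchedDependency` signatures.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// One recorded dependency edge (hypothetical toy type).
struct ToyEdge {
  int Def;
  int Use;
  unsigned Latency;
};

// Hypothetical target hook: may rewrite an edge latency before it is stored.
using AdjustHook = std::function<void(int Def, int Use, unsigned &Latency)>;

struct ToyDAG {
  std::vector<ToyEdge> Succs; // producer-side records
  std::vector<ToyEdge> Preds; // consumer-side records

  void addDataDep(int Def, int Use, bool ImplicitPseudo, unsigned ModelLat,
                  const AdjustHook &Adjust) {
    // Implicit pseudo operands start at zero latency, others use the model.
    unsigned Lat = ImplicitPseudo ? 0 : ModelLat;
    // The hook runs on both paths, mirroring the proposed diff.
    Adjust(Def, Use, Lat);
    // Both sides are written from the same adjusted value.
    Succs.push_back({Def, Use, Lat});
    Preds.push_back({Def, Use, Lat});
  }
};
```

Because both records are created from one adjusted value, no later pass has to keep the two views in sync.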

Also, just curious, why did you update so many tests? I only found 4 that were failing with the patch:

Failing Tests (4):
    LLVM :: CodeGen/AMDGPU/llvm.amdgcn.ds.gws.barrier.ll
    LLVM :: CodeGen/AMDGPU/llvm.amdgcn.ds.gws.init.ll
    LLVM :: CodeGen/AMDGPU/misched-killflags.mir
    LLVM :: CodeGen/AMDGPU/mul24-pass-ordering.ll

In D72487#1813828, @foad wrote:

This is only relevant for post-ra scheduling because we don't have any bundles when we do pre-ra scheduling, right?

In fact we have many more bundles coming to the pre-RA scheduler when we have xnack and form memory clauses. However, I have tried to use the same mutation in the pre-RA scheduler and did not notice any scheduling difference at all, even with memory clause tests.

Doing it as a DAG mutation seems a bit late. One problem is that you set the latency for the bundle's succ, but not for the corresponding pred, so you see odd mismatches like this:

SU(5):   BUNDLE implicit-def $sgpr6_sgpr7, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $scc
  # preds left       : 2
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 3
  Depth              : 0
  Height             : 1
  Predecessors:
    SU(0): Anti Latency=0
    SU(0): Anti Latency=0
  Successors:
    SU(11): Data Latency=1 Reg=$sgpr6_sgpr7
...
SU(11):   S_NOP 0, implicit killed $sgpr6_sgpr7, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3
  # preds left       : 6
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 2
  Height             : 0
  Predecessors:
    SU(10): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
    SU(9): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
    SU(8): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
    SU(7): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
    SU(6): Data Latency=0 Reg=$sgpr4
    SU(5): Data Latency=0 Reg=$sgpr6_sgpr7

(Latency for succ of SU(5) is 1, but latency for pred of SU(11) is 0.)

Instead how about doing this and implementing it in adjustSchedDependency for AMDGPU?

diff --git a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
index 96a1f86c3e0..ef5926e4f8f 100644
--- a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
+++ b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
@@ -269,9 +269,9 @@ void ScheduleDAGInstrs::addPhysRegDataDeps(SUnit *SU, unsigned OperIdx) {
       if (!ImplicitPseudoDef && !ImplicitPseudoUse) {
         Dep.setLatency(SchedModel.computeOperandLatency(SU->getInstr(), OperIdx,
                                                         RegUse, UseOp));
-        ST.adjustSchedDependency(SU, UseSU, Dep);
       } else
         Dep.setLatency(0);
+      ST.adjustSchedDependency(SU, UseSU, Dep);
 
       UseSU->addPred(Dep);
     }

You are right, it seems I get correct scheduling with this change, but the preds are still wrong. I can either extend the mutation or try this adjustment. I will explore it.

Also, just curious, why did you update so many tests? I only found 4 that were failing with the patch:
Failing Tests (4):
    LLVM :: CodeGen/AMDGPU/llvm.amdgcn.ds.gws.barrier.ll
    LLVM :: CodeGen/AMDGPU/llvm.amdgcn.ds.gws.init.ll
    LLVM :: CodeGen/AMDGPU/misched-killflags.mir
    LLVM :: CodeGen/AMDGPU/mul24-pass-ordering.ll

Yeah, this is a split from another change. I am preparing a patch which will send memory operation clusters to the post-RA scheduler instead of the current DAG mutation, so it looks like some tests from that patch have sneaked in here.

In D72487#1813828, @foad wrote:

Instead how about doing this and implementing it in adjustSchedDependency for AMDGPU?

diff --git a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
index 96a1f86c3e0..ef5926e4f8f 100644
--- a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
+++ b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
@@ -269,9 +269,9 @@ void ScheduleDAGInstrs::addPhysRegDataDeps(SUnit *SU, unsigned OperIdx) {
       if (!ImplicitPseudoDef && !ImplicitPseudoUse) {
         Dep.setLatency(SchedModel.computeOperandLatency(SU->getInstr(), OperIdx,
                                                         RegUse, UseOp));
-        ST.adjustSchedDependency(SU, UseSU, Dep);
       } else
         Dep.setLatency(0);
+      ST.adjustSchedDependency(SU, UseSU, Dep);
 
       UseSU->addPred(Dep);
     }

Looks like this patch would break some internal logic inside Hexagon's adjustSchedDependency()...

In D72487#1814764, @rampitec wrote:

Looks like this patch would break some internal logic inside Hexagon's adjustSchedDependency()...

D72535

rampitec mentioned this in rG987bf8b6c146: Let targets adjust operand latency of bundles. Jan 10 2020, 3:02 PM

Revision Contents

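Path                                                      Size
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp                40 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.h                      4 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp                    17 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.barrier.ll    2 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.init.ll       2 lines
llvm/test/CodeGen/AMDGPU/min.ll                           2 lines
llvm/test/CodeGen/AMDGPU/misched-killflags.mir            2 lines
llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll           2 lines
llvm/test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll             2 lines
llvm/test/CodeGen/AMDGPU/packed-op-sel.ll                 6 lines
llvm/test/CodeGen/AMDGPU/scratch-simple.ll                2 lines
llvm/test/CodeGen/AMDGPU/selectcc-opt.ll                  2 lines
llvm/test/CodeGen/AMDGPU/setcc-opt.ll                     16 lines
llvm/test/CodeGen/AMDGPU/sint_to_fp.ll                    2 lines
llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr.ll            2 lines
llvm/test/CodeGen/AMDGPU/sub.i16.ll                       4 lines
llvm/test/CodeGen/AMDGPU/uint_to_fp.ll                    2 lines
llvm/test/CodeGen/AMDGPU/wave32.ll                        8 lines
llvm/test/CodeGen/AMDGPU/zero_extend.ll                   6 lines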

Diff 237230

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

@@ ... @@ for (SUnit &SU : DAG->SUnits) {
         }
       }
 
       SUa = &SU;
     }
   }
 };
 
+struct FixBundleLatencyMutation : ScheduleDAGMutation {
+  const SIInstrInfo *TII;
+
+  const TargetSchedModel *TSchedModel;
+
+  FixBundleLatencyMutation(const SIInstrInfo *tii) : TII(tii) {}
+
+  unsigned computeLatency(const MachineInstr &MI, unsigned Reg) const {
+    const SIRegisterInfo &TRI = TII->getRegisterInfo();
+    MachineBasicBlock::const_instr_iterator I(MI.getIterator());
+    MachineBasicBlock::const_instr_iterator E(MI.getParent()->instr_end());
+    unsigned Lat = 0;
+    for (++I; I != E && I->isBundledWithPred(); ++I) {
+      if (!I->modifiesRegister(Reg, &TRI))
+        continue;
+      Lat = TSchedModel->computeInstrLatency(&*I);
+      break;
+    }
+    return Lat;
+  }
+
+  void apply(ScheduleDAGInstrs *DAGInstrs) override {
+    ScheduleDAGMI *DAG = static_cast<ScheduleDAGMI*>(DAGInstrs);
+    TSchedModel = DAGInstrs->getSchedModel();
+    if (!TSchedModel || DAG->SUnits.empty())
+      return;
+
+    for (SUnit &SU : DAG->SUnits) {
+      if (!SU.isInstr() || !SU.getInstr()->isBundle())
+        continue;
+      for (SDep &Dep : SU.Succs) {
+        if (Dep.getKind() == SDep::Kind::Data && Dep.getReg())
+          if (unsigned Lat = computeLatency(*SU.getInstr(), Dep.getReg()))
+            Dep.setLatency(Lat);
+      }
+    }
+  }
+};
+
 struct FillMFMAShadowMutation : ScheduleDAGMutation {
   const SIInstrInfo *TII;
 
   ScheduleDAGMI *DAG;
 
   FillMFMAShadowMutation(const SIInstrInfo *tii) : TII(tii) {}
 
   bool isSALU(const SUnit *SU) const {
@@ ... @@ for (SUnit &SU : DAG->SUnits) {
       }
     }
   }
 };
 } // namespace
 
 void GCNSubtarget::getPostRAMutations(
     std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations) const {
+  Mutations.push_back(std::make_unique<FixBundleLatencyMutation>(&InstrInfo));
   Mutations.push_back(std::make_unique<MemOpClusterMutation>(&InstrInfo));
   Mutations.push_back(std::make_unique<FillMFMAShadowMutation>(&InstrInfo));
 }
 
 const AMDGPUSubtarget &AMDGPUSubtarget::get(const MachineFunction &MF) {
   if (MF.getTarget().getTargetTriple().getArch() == Triple::amdgcn)
     return static_cast<const AMDGPUSubtarget&>(MF.getSubtarget<GCNSubtarget>());
   else

llvm/lib/Target/AMDGPU/SIInstrInfo.h

@@ ... @@ public:
   void fixImplicitOperands(MachineInstr &MI) const;
 
   MachineInstr *foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
                                       ArrayRef<unsigned> Ops,
                                       MachineBasicBlock::iterator InsertPt,
                                       int FrameIndex,
                                       LiveIntervals *LIS = nullptr,
                                       VirtRegMap *VRM = nullptr) const override;
+
+  unsigned getInstrLatency(const InstrItineraryData *ItinData,
+                           const MachineInstr &MI,
+                           unsigned *PredCost) const override;
 };
 
 /// \brief Returns true if a reg:subreg pair P has a TRC class
 inline bool isOfRegClass(const TargetInstrInfo::RegSubRegPair &P,
                          const TargetRegisterClass &TRC,
                          MachineRegisterInfo &MRI) {
   auto *RC = MRI.getRegClass(P.Reg);
   if (!P.SubReg)

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

@@ ... @@ if (MI.isFullCopy()) {
     if (SrcReg == AMDGPU::M0 && DstReg.isVirtual()) {
       MF.getRegInfo().constrainRegClass(DstReg, &AMDGPU::SReg_32_XM0RegClass);
       return nullptr;
     }
   }
 
   return nullptr;
 }
+
+unsigned SIInstrInfo::getInstrLatency(const InstrItineraryData *ItinData,
+                                      const MachineInstr &MI,
+                                      unsigned *PredCost) const {
+  if (MI.isBundle()) {
+    MachineBasicBlock::const_instr_iterator I(MI.getIterator());
+    MachineBasicBlock::const_instr_iterator E(MI.getParent()->instr_end());
+    unsigned Lat = 0, Count = 0;
+    for (++I; I != E && I->isBundledWithPred(); ++I) {
+      ++Count;
+      Lat = std::max(Lat, getInstrLatency(ItinData, *I, PredCost));
+    }
+    return Lat + Count - 1;
+  }
+
+  return AMDGPUGenInstrInfo::getInstrLatency(ItinData, MI, PredCost);
+}

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.barrier.ll

 ; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,LOOP %s
 ; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=hawaii -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,LOOP %s
 ; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=fiji -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,LOOP %s
 ; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,NOLOOP %s
-; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,NOLOOP,GFX10 %s
+; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -asm-verbose=0 -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,NOLOOP,GFX10 %s
 
 ; Make sure the op is emitted bundled with a waitcnt with and without the retry loop, and the bundle is not removed by ExpandPostRAPseudos.
 ; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -stop-after=postrapseudos -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=MIR %s
 ; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -stop-after=postrapseudos -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=MIR %s
 
 
 ; Minimum offset
 ; GCN-LABEL: {{^}}gws_barrier_offset0:

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.init.ll

 ; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,LOOP %s
 ; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=hawaii -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,LOOP %s
 ; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=fiji -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,LOOP %s
 ; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,NOLOOP %s
-; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,NOLOOP %s
+; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -asm-verbose=0 -o - -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,NOLOOP %s
 
 ; Minimum offset
 ; GCN-LABEL: {{^}}gws_init_offset0:
 ; GCN-DAG: s_load_dword [[BAR_NUM:s[0-9]+]]
 ; GCN-DAG: s_mov_b32 m0, 0{{$}}
 ; GCN: v_mov_b32_e32 v0, [[BAR_NUM]]
 ; NOLOOP: ds_gws_init v0 gds{{$}}
 

llvm/test/CodeGen/AMDGPU/min.ll

@@ ... @@ define amdgpu_kernel void @v_test_umin_ult_i32_multi_use(i32 addrspace(1)* %out0, i1 addrspace(1)* %out1, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) #0 {
   store i32 %val, i32 addrspace(1)* %out0, align 4
   store i1 %cmp, i1 addrspace(1)* %out1
   ret void
 }
 
 ; FUNC-LABEL: @v_test_umin_ult_i16_multi_use
 ; GCN-NOT: v_min
 ; GCN: v_cmp_lt_u32
-; GCN-NEXT: v_cndmask_b32
+; GCN: v_cndmask_b32
 ; GCN-NOT: v_min
 ; GCN: s_endpgm
 
 ; EG-NOT: MIN_UINT
 define amdgpu_kernel void @v_test_umin_ult_i16_multi_use(i16 addrspace(1)* %out0, i1 addrspace(1)* %out1, i16 addrspace(1)* %aptr, i16 addrspace(1)* %bptr) #0 {
   %a = load i16, i16 addrspace(1)* %aptr, align 2
   %b = load i16, i16 addrspace(1)* %bptr, align 2
   %cmp = icmp ult i16 %a, %b

llvm/test/CodeGen/AMDGPU/misched-killflags.mir

@@ ... @@ bb.0:
     S_ENDPGM 0
 ...
 # CHECK-LABEL: name: func0
 # CHECK-DAG: $sgpr10 = S_MOV_B32 5
 # CHECK-DAG: $sgpr9 = S_MOV_B32 4
 # CHECK-DAG: $sgpr8 = S_MOV_B32 3
 # CHECK-DAG: $sgpr33 = S_MOV_B32 $sgpr7
 # CHECK: $vgpr0 = V_MOV_B32_e32 $sgpr8, implicit $exec, implicit-def $vgpr0_vgpr1_vgpr2_vgpr3, implicit $sgpr8_sgpr9_sgpr10_sgpr11
-# CHECK: $sgpr32 = S_MOV_B32 $sgpr33
 # CHECK: BUNDLE implicit-def $sgpr6_sgpr7, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $scc {
 # CHECK: $sgpr6_sgpr7 = S_GETPC_B64
 # CHECK: $sgpr6 = S_ADD_U32 internal $sgpr6, 0, implicit-def $scc
 # CHECK: $sgpr7 = S_ADDC_U32 internal $sgpr7, 0, implicit-def $scc, implicit internal $scc
 # CHECK: }
+# CHECK: $sgpr32 = S_MOV_B32 $sgpr33
 # CHECK: $sgpr4 = S_MOV_B32 killed $sgpr33
 # CHECK: $vgpr1 = V_MOV_B32_e32 $sgpr9, implicit $exec, implicit $sgpr8_sgpr9_sgpr10_sgpr11
 # CHECK: $vgpr2 = V_MOV_B32_e32 $sgpr10, implicit $exec, implicit $sgpr8_sgpr9_sgpr10_sgpr11
 # CHECK: $vgpr3 = V_MOV_B32_e32 killed $sgpr11, implicit $exec, implicit $sgpr8_sgpr9_sgpr10_sgpr11, implicit $exec
 # CHECK: S_NOP 0, implicit $sgpr6_sgpr7, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit killed $sgpr4, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3
 # CHECK: S_ENDPGM 0

llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll

@@ ... @@
 ; GFX9-NEXT: s_mov_b64 exec, s[4:5]
 ; GFX9-NEXT: v_writelane_b32 v35, s34, 4
 ; GFX9-NEXT: s_mov_b32 s34, s32
 ; GFX9-NEXT: s_add_u32 s32, s32, 0x800
 ; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s34 offset:8 ; 4-byte Folded Spill
 ; GFX9-NEXT: buffer_store_dword v33, off, s[0:3], s34 offset:4 ; 4-byte Folded Spill
 ; GFX9-NEXT: buffer_store_dword v34, off, s[0:3], s34 ; 4-byte Folded Spill
 ; GFX9-NEXT: v_writelane_b32 v35, s36, 0
-; GFX9-NEXT: v_writelane_b32 v35, s37, 1
 ; GFX9-NEXT: s_getpc_b64 s[4:5]
 ; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4
 ; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+4
+; GFX9-NEXT: v_writelane_b32 v35, s37, 1
 ; GFX9-NEXT: s_load_dwordx2 s[36:37], s[4:5], 0x0
 ; GFX9-NEXT: v_mov_b32_e32 v32, v1
 ; GFX9-NEXT: v_mov_b32_e32 v33, v0
 ; GFX9-NEXT: v_writelane_b32 v35, s30, 2
 ; GFX9-NEXT: v_mul_u32_u24_e32 v0, v33, v32
 ; GFX9-NEXT: v_writelane_b32 v35, s31, 3
 ; GFX9-NEXT: v_and_b32_e32 v34, 0xffffff, v32
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)

llvm/test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll

@@ ... @@
 }
 
 ; FUNC-LABEL: {{^}}test_umulhi24_i33:
 ; GCN: s_load_dword s
 ; GCN: s_load_dword s
 ; GCN-NOT: and
 ; GCN-NOT: lshr
 ; GCN: v_mul_hi_u32_u24_e32 v[[MUL_HI:[0-9]+]],
-; GCN-NEXT: v_and_b32_e32 v[[HI:[0-9]+]], 1, v[[MUL_HI]]
+; GCN: v_and_b32_e32 v[[HI:[0-9]+]], 1, v[[MUL_HI]]
 ; GCN-NEXT: buffer_store_dword v[[HI]]
 define amdgpu_kernel void @test_umulhi24_i33(i32 addrspace(1)* %out, i33 %a, i33 %b) {
 entry:
   %tmp0 = shl i33 %a, 9
   %a_24 = lshr i33 %tmp0, 9
   %tmp1 = shl i33 %b, 9
   %b_24 = lshr i33 %tmp1, 9
   %tmp2 = mul i33 %a_24, %b_24

llvm/test/CodeGen/AMDGPU/packed-op-sel.ll

@@ ... @@
 
 ; GCN-LABEL: {{^}}fma_vector_vector_scalar_lo_neg_scalar_hi:
 ; GCN: ds_read_b32 [[VEC0:v[0-9]+]]
 ; GCN: ds_read_b32 [[VEC1:v[0-9]+]]
 ; GCN: ds_read_u16 [[SCALAR0:v[0-9]+]]
 ; GCN: ds_read_u16 [[SCALAR1:v[0-9]+]]
 
 ; FIXME: Remove and
-; GCN: v_and_b32_e32 [[SCALAR0]], 0xffff, [[SCALAR0]]
-; GCN: v_xor_b32_e32 [[SCALAR1]], 0x8000, [[SCALAR1]]
+; GCN-DAG: v_and_b32_e32 [[SCALAR0]], 0xffff, [[SCALAR0]]
+; GCN-DAG: v_xor_b32_e32 [[SCALAR1]], 0x8000, [[SCALAR1]]
 ; GCN: v_lshl_or_b32 [[PACKED:v[0-9]+]], [[SCALAR1]], 16, [[SCALAR0]]
 
 ; GCN: v_pk_fma_f16 v{{[0-9]+}}, [[VEC0]], [[VEC1]], [[PACKED]]{{$}}
 define amdgpu_kernel void @fma_vector_vector_scalar_lo_neg_scalar_hi(<2 x half> addrspace(1)* %out, <2 x half> addrspace(3)* %lds, half addrspace(3)* %arg2) #0 {
 bb:
   %lds.gep1 = getelementptr inbounds <2 x half>, <2 x half> addrspace(3)* %lds, i32 1
   %arg2.gep = getelementptr inbounds half, half addrspace(3)* %arg2, i32 2
 
@@ ... @@
 ; GCN: ds_read_b32 [[VEC2:v[0-9]+]]
 
 ; GCN-NOT: pack
 ; GCN-NOT: and
 ; GCN-NOT: shl
 ; GCN-NOT: _or
 
 ; GCN: v_pk_add_f16 [[FADD:v[0-9]+]]
-; GCN-NEXT: v_pk_fma_f16 v{{[0-9]+}}, [[VEC0]], [[VEC1]], [[FADD]] op_sel:[0,0,1] op_sel_hi:[1,1,0]{{$}}
+; GCN: v_pk_fma_f16 v{{[0-9]+}}, [[VEC0]], [[VEC1]], [[FADD]] op_sel:[0,0,1] op_sel_hi:[1,1,0]{{$}}
 define amdgpu_kernel void @mix_elt_types_op_sel(<2 x half> addrspace(1)* %out, <2 x half> addrspace(3)* %lds) #0 {
 bb:
   %lds.gep1 = getelementptr inbounds <2 x half>, <2 x half> addrspace(3)* %lds, i32 1
   %lds.gep2 = getelementptr inbounds <2 x half>, <2 x half> addrspace(3)* %lds, i32 2
 
   %vec0 = load volatile <2 x half>, <2 x half> addrspace(3)* %lds, align 4
   %vec1 = load volatile <2 x half>, <2 x half> addrspace(3)* %lds.gep1, align 4
   %vec2 = load volatile <2 x half>, <2 x half> addrspace(3)* %lds.gep2, align 4

llvm/test/CodeGen/AMDGPU/scratch-simple.ll

@@ ... @@
 ; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
 ; GCN-DAG: s_mov_b32 s5, SCRATCH_RSRC_DWORD1
 ; GCN-DAG: s_mov_b32 s6, -1
 ; SI-DAG: s_mov_b32 s7, 0xe8f000
 ; VI-DAG: s_mov_b32 s7, 0xe80000
 ; GFX9-DAG: s_mov_b32 s7, 0xe00000
 ; GFX10_W32-DAG: s_mov_b32 s7, 0x31c16000
 ; GFX10_W64-DAG: s_mov_b32 s7, 0x31e16000
-; GCN-NOT: s_mov_b32 s0
 ; GCN-DAG: v_lshlrev_b32_e32 [[BYTES:v[0-9]+]], 2, v0
 ; GCN-DAG: v_and_b32_e32 [[CLAMP_IDX:v[0-9]+]], 0x1fc, [[BYTES]]
+; GCN-NOT: s_mov_b32 s0
 
 ; GCN-DAG: v_or_b32_e32 [[LO_OFF:v[0-9]+]], 0x200, [[CLAMP_IDX]]
 ; GCN-DAG: v_or_b32_e32 [[HI_OFF:v[0-9]+]], 0x400, [[CLAMP_IDX]]
 
 ; GCN: buffer_load_dword {{v[0-9]+}}, [[LO_OFF]], {{s\[[0-9]+:[0-9]+\]}}, s0 offen
 ; GCN: buffer_load_dword {{v[0-9]+}}, [[HI_OFF]], {{s\[[0-9]+:[0-9]+\]}}, s0 offen
 define amdgpu_ps float @ps_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx

llvm/test/CodeGen/AMDGPU/selectcc-opt.ll

entry:
%0 = icmp sgt i32 %in, 0		%0 = icmp sgt i32 %in, 0
%1 = select i1 %0, float 2.0, float 3.0		%1 = select i1 %0, float 2.0, float 3.0
store float %1, float addrspace(1)* %out		store float %1, float addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}selectcc_bool:		; FUNC-LABEL: {{^}}selectcc_bool:
; SI: v_cmp_ne_u32		; SI: v_cmp_ne_u32
; SI-NEXT: v_cndmask_b32_e64		; SI: v_cndmask_b32_e64
; SI-NOT: cmp		; SI-NOT: cmp
; SI-NOT: cndmask		; SI-NOT: cndmask
define amdgpu_kernel void @selectcc_bool(i32 addrspace(1)* %out, i32 %a, i32 %b) nounwind {		define amdgpu_kernel void @selectcc_bool(i32 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
%icmp0 = icmp ne i32 %a, %b		%icmp0 = icmp ne i32 %a, %b
%ext = select i1 %icmp0, i32 -1, i32 0		%ext = select i1 %icmp0, i32 -1, i32 0
store i32 %ext, i32 addrspace(1)* %out		store i32 %ext, i32 addrspace(1)* %out
ret void		ret void
}		}

llvm/test/CodeGen/AMDGPU/setcc-opt.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s \| FileCheck -check-prefix=VI -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s \| FileCheck -check-prefix=VI -check-prefix=GCN -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s \| FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s \| FileCheck -check-prefix=EG -check-prefix=FUNC %s

	; FUNC-LABEL: {{^}}sext_bool_icmp_eq_0:			; FUNC-LABEL: {{^}}sext_bool_icmp_eq_0:
	; GCN-NOT: v_cmp			; GCN-NOT: v_cmp
	; GCN: v_cmp_ne_u32_e32 vcc,			; GCN: v_cmp_ne_u32_e32 vcc,
	; GCN-NEXT: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc			; GCN: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc
	; GCN-NEXT:buffer_store_byte [[RESULT]]			; GCN-NEXT:buffer_store_byte [[RESULT]]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm

	; EG: SETNE_INT * [[CMP:T[0-9]+]].[[CMPCHAN:[XYZW]]], KC0[2].Z, KC0[2].W			; EG: SETNE_INT * [[CMP:T[0-9]+]].[[CMPCHAN:[XYZW]]], KC0[2].Z, KC0[2].W
	; EG: AND_INT T{{[0-9]+.[XYZW]}}, PS, 1			; EG: AND_INT T{{[0-9]+.[XYZW]}}, PS, 1
	define amdgpu_kernel void @sext_bool_icmp_eq_0(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {			define amdgpu_kernel void @sext_bool_icmp_eq_0(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
	%icmp0 = icmp eq i32 %a, %b			%icmp0 = icmp eq i32 %a, %b
	%ext = sext i1 %icmp0 to i32			%ext = sext i1 %icmp0 to i32
	%icmp1 = icmp eq i32 %ext, 0			%icmp1 = icmp eq i32 %ext, 0
	store i1 %icmp1, i1 addrspace(1)* %out			store i1 %icmp1, i1 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}sext_bool_icmp_ne_0:			; FUNC-LABEL: {{^}}sext_bool_icmp_ne_0:
	; GCN-NOT: v_cmp			; GCN-NOT: v_cmp
	; GCN: v_cmp_ne_u32_e32 vcc,			; GCN: v_cmp_ne_u32_e32 vcc,
	; GCN-NEXT: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc			; GCN: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc
	; GCN-NEXT: buffer_store_byte [[RESULT]]			; GCN-NEXT: buffer_store_byte [[RESULT]]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm

	; EG: SETNE_INT * [[CMP:T[0-9]+]].[[CMPCHAN:[XYZW]]], KC0[2].Z, KC0[2].W			; EG: SETNE_INT * [[CMP:T[0-9]+]].[[CMPCHAN:[XYZW]]], KC0[2].Z, KC0[2].W
	; EG: AND_INT T{{[0-9]+.[XYZW]}}, PS, 1			; EG: AND_INT T{{[0-9]+.[XYZW]}}, PS, 1
	define amdgpu_kernel void @sext_bool_icmp_ne_0(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {			define amdgpu_kernel void @sext_bool_icmp_ne_0(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
	%icmp0 = icmp ne i32 %a, %b			%icmp0 = icmp ne i32 %a, %b
	%ext = sext i1 %icmp0 to i32			%ext = sext i1 %icmp0 to i32
	%icmp1 = icmp ne i32 %ext, 0			%icmp1 = icmp ne i32 %ext, 0
	store i1 %icmp1, i1 addrspace(1)* %out			store i1 %icmp1, i1 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}sext_bool_icmp_eq_neg1:			; FUNC-LABEL: {{^}}sext_bool_icmp_eq_neg1:
	; GCN-NOT: v_cmp			; GCN-NOT: v_cmp
	; GCN: v_cmp_eq_u32_e32 vcc,			; GCN: v_cmp_eq_u32_e32 vcc,
	; GCN-NEXT: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc			; GCN: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc
	; GCN-NEXT: buffer_store_byte [[RESULT]]			; GCN-NEXT: buffer_store_byte [[RESULT]]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define amdgpu_kernel void @sext_bool_icmp_eq_neg1(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {			define amdgpu_kernel void @sext_bool_icmp_eq_neg1(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
	%icmp0 = icmp eq i32 %a, %b			%icmp0 = icmp eq i32 %a, %b
	%ext = sext i1 %icmp0 to i32			%ext = sext i1 %icmp0 to i32
	%icmp1 = icmp eq i32 %ext, -1			%icmp1 = icmp eq i32 %ext, -1
	store i1 %icmp1, i1 addrspace(1)* %out			store i1 %icmp1, i1 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}sext_bool_icmp_ne_neg1:			; FUNC-LABEL: {{^}}sext_bool_icmp_ne_neg1:
	; GCN-NOT: v_cmp			; GCN-NOT: v_cmp
	; GCN: v_cmp_eq_u32_e32 vcc,			; GCN: v_cmp_eq_u32_e32 vcc,
	; GCN-NEXT: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc			; GCN: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc
	; GCN-NEXT: buffer_store_byte [[RESULT]]			; GCN-NEXT: buffer_store_byte [[RESULT]]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define amdgpu_kernel void @sext_bool_icmp_ne_neg1(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {			define amdgpu_kernel void @sext_bool_icmp_ne_neg1(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
	%icmp0 = icmp ne i32 %a, %b			%icmp0 = icmp ne i32 %a, %b
	%ext = sext i1 %icmp0 to i32			%ext = sext i1 %icmp0 to i32
	%icmp1 = icmp ne i32 %ext, -1			%icmp1 = icmp ne i32 %ext, -1
	store i1 %icmp1, i1 addrspace(1)* %out			store i1 %icmp1, i1 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}zext_bool_icmp_eq_0:			; FUNC-LABEL: {{^}}zext_bool_icmp_eq_0:
	; GCN-NOT: v_cmp			; GCN-NOT: v_cmp
	; GCN: v_cmp_ne_u32_e32 vcc,			; GCN: v_cmp_ne_u32_e32 vcc,
	; GCN-NEXT: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc			; GCN: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc
	; GCN-NEXT: buffer_store_byte [[RESULT]]			; GCN-NEXT: buffer_store_byte [[RESULT]]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define amdgpu_kernel void @zext_bool_icmp_eq_0(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {			define amdgpu_kernel void @zext_bool_icmp_eq_0(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
	%icmp0 = icmp eq i32 %a, %b			%icmp0 = icmp eq i32 %a, %b
	%ext = zext i1 %icmp0 to i32			%ext = zext i1 %icmp0 to i32
	%icmp1 = icmp eq i32 %ext, 0			%icmp1 = icmp eq i32 %ext, 0
	store i1 %icmp1, i1 addrspace(1)* %out			store i1 %icmp1, i1 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}zext_bool_icmp_ne_0:			; FUNC-LABEL: {{^}}zext_bool_icmp_ne_0:
	; GCN-NOT: v_cmp			; GCN-NOT: v_cmp
	; GCN: v_cmp_ne_u32_e32 vcc,			; GCN: v_cmp_ne_u32_e32 vcc,
	; GCN-NEXT: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc			; GCN: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc
	; GCN-NEXT: buffer_store_byte [[RESULT]]			; GCN-NEXT: buffer_store_byte [[RESULT]]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define amdgpu_kernel void @zext_bool_icmp_ne_0(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {			define amdgpu_kernel void @zext_bool_icmp_ne_0(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
	%icmp0 = icmp ne i32 %a, %b			%icmp0 = icmp ne i32 %a, %b
	%ext = zext i1 %icmp0 to i32			%ext = zext i1 %icmp0 to i32
	%icmp1 = icmp ne i32 %ext, 0			%icmp1 = icmp ne i32 %ext, 0
	store i1 %icmp1, i1 addrspace(1)* %out			store i1 %icmp1, i1 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}zext_bool_icmp_eq_1:			; FUNC-LABEL: {{^}}zext_bool_icmp_eq_1:
	; GCN-NOT: v_cmp			; GCN-NOT: v_cmp
	; GCN: v_cmp_eq_u32_e32 vcc,			; GCN: v_cmp_eq_u32_e32 vcc,
	; GCN-NEXT: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc			; GCN: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc
	; GCN-NEXT: buffer_store_byte [[RESULT]]			; GCN-NEXT: buffer_store_byte [[RESULT]]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define amdgpu_kernel void @zext_bool_icmp_eq_1(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {			define amdgpu_kernel void @zext_bool_icmp_eq_1(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
	%icmp0 = icmp eq i32 %a, %b			%icmp0 = icmp eq i32 %a, %b
	%ext = zext i1 %icmp0 to i32			%ext = zext i1 %icmp0 to i32
	%icmp1 = icmp eq i32 %ext, 1			%icmp1 = icmp eq i32 %ext, 1
	store i1 %icmp1, i1 addrspace(1)* %out			store i1 %icmp1, i1 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}zext_bool_icmp_ne_1:			; FUNC-LABEL: {{^}}zext_bool_icmp_ne_1:
	; GCN-NOT: v_cmp			; GCN-NOT: v_cmp
	; GCN: v_cmp_eq_u32_e32 vcc,			; GCN: v_cmp_eq_u32_e32 vcc,
	; GCN-NEXT: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc			; GCN: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc
	; GCN-NEXT: buffer_store_byte [[RESULT]]			; GCN-NEXT: buffer_store_byte [[RESULT]]
	define amdgpu_kernel void @zext_bool_icmp_ne_1(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {			define amdgpu_kernel void @zext_bool_icmp_ne_1(i1 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
	%icmp0 = icmp ne i32 %a, %b			%icmp0 = icmp ne i32 %a, %b
	%ext = zext i1 %icmp0 to i32			%ext = zext i1 %icmp0 to i32
	%icmp1 = icmp ne i32 %ext, 1			%icmp1 = icmp ne i32 %ext, 1
	store i1 %icmp1, i1 addrspace(1)* %out			store i1 %icmp1, i1 addrspace(1)* %out
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/sint_to_fp.ll

define amdgpu_kernel void @v_sint_to_fp_v4i32(<4 x float> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) #0 {
%value = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep		%value = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep
%result = sitofp <4 x i32> %value to <4 x float>		%result = sitofp <4 x i32> %value to <4 x float>
store <4 x float> %result, <4 x float> addrspace(1)* %out.gep		store <4 x float> %result, <4 x float> addrspace(1)* %out.gep
ret void		ret void
}		}

; FUNC-LABEL: {{^}}s_sint_to_fp_i1_f32:		; FUNC-LABEL: {{^}}s_sint_to_fp_i1_f32:
; SI: v_cmp_eq_u32_e64 [[CMP:s\[[0-9]+:[0-9]\]]],		; SI: v_cmp_eq_u32_e64 [[CMP:s\[[0-9]+:[0-9]\]]],
; SI-NEXT: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1.0, [[CMP]]		; SI: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1.0, [[CMP]]
; SI: buffer_store_dword [[RESULT]],		; SI: buffer_store_dword [[RESULT]],
; SI: s_endpgm		; SI: s_endpgm
define amdgpu_kernel void @s_sint_to_fp_i1_f32(float addrspace(1)* %out, i32 %in) #0 {		define amdgpu_kernel void @s_sint_to_fp_i1_f32(float addrspace(1)* %out, i32 %in) #0 {
%cmp = icmp eq i32 %in, 0		%cmp = icmp eq i32 %in, 0
%fp = uitofp i1 %cmp to float		%fp = uitofp i1 %cmp to float
store float %fp, float addrspace(1)* %out		store float %fp, float addrspace(1)* %out
ret void		ret void
}		}

llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr.ll

define amdgpu_kernel void @max_10_vgprs(i32 addrspace(1)* %p) #0 {
store volatile i32 %v9, i32 addrspace(1)* undef		store volatile i32 %v9, i32 addrspace(1)* undef
store volatile i32 %v10, i32 addrspace(1)* undef		store volatile i32 %v10, i32 addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}max_10_vgprs_used_9a:		; GCN-LABEL: {{^}}max_10_vgprs_used_9a:
; GFX908-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0		; GFX908-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
; GFX908-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1		; GFX908-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
; GFX908: v_accvgpr_write_b32 a9, v{{[0-9]}}		; GFX908-DAG: v_accvgpr_write_b32 a9, v{{[0-9]}}
; GFX908: buffer_store_dword v{{[0-9]}},		; GFX908: buffer_store_dword v{{[0-9]}},
; GFX908-NOT: buffer_		; GFX908-NOT: buffer_
; GFX908: v_accvgpr_read_b32 v{{[0-9]}}, a9		; GFX908: v_accvgpr_read_b32 v{{[0-9]}}, a9
; GFX908: buffer_load_dword v{{[0-9]}},		; GFX908: buffer_load_dword v{{[0-9]}},
; GFX908-NOT: buffer_		; GFX908-NOT: buffer_

; GFX900: couldn't allocate input reg for constraint 'a'		; GFX900: couldn't allocate input reg for constraint 'a'


llvm/test/CodeGen/AMDGPU/sub.i16.ll

define amdgpu_kernel void @v_test_sub_i16_zext_to_i64(i64 addrspace(1)* %out, i16 addrspace(1)* %in0, i16 addrspace(1)* %in1) #1 {
ret void		ret void
}		}

; FIXME: Need to handle non-uniform case for function below (load without gep).		; FIXME: Need to handle non-uniform case for function below (load without gep).
; GCN-LABEL: {{^}}v_test_sub_i16_sext_to_i32:		; GCN-LABEL: {{^}}v_test_sub_i16_sext_to_i32:
; VI: flat_load_ushort [[A:v[0-9]+]]		; VI: flat_load_ushort [[A:v[0-9]+]]
; VI: flat_load_ushort [[B:v[0-9]+]]		; VI: flat_load_ushort [[B:v[0-9]+]]
; VI: v_sub_u16_e32 [[ADD:v[0-9]+]], [[A]], [[B]]		; VI: v_sub_u16_e32 [[ADD:v[0-9]+]], [[A]], [[B]]
; VI-NEXT: v_bfe_i32 [[SEXT:v[0-9]+]], [[ADD]], 0, 16		; VI: v_bfe_i32 [[SEXT:v[0-9]+]], [[ADD]], 0, 16
; VI-NEXT: buffer_store_dword [[SEXT]]		; VI-NEXT: buffer_store_dword [[SEXT]]
define amdgpu_kernel void @v_test_sub_i16_sext_to_i32(i32 addrspace(1)* %out, i16 addrspace(1)* %in0, i16 addrspace(1)* %in1) #1 {		define amdgpu_kernel void @v_test_sub_i16_sext_to_i32(i32 addrspace(1)* %out, i16 addrspace(1)* %in0, i16 addrspace(1)* %in1) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep.out = getelementptr inbounds i32, i32 addrspace(1)* %out, i32 %tid		%gep.out = getelementptr inbounds i32, i32 addrspace(1)* %out, i32 %tid
%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid		%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid
%gep.in1 = getelementptr inbounds i16, i16 addrspace(1)* %in1, i32 %tid		%gep.in1 = getelementptr inbounds i16, i16 addrspace(1)* %in1, i32 %tid
%a = load i16, i16 addrspace(1)* %gep.in0		%a = load i16, i16 addrspace(1)* %gep.in0
%b = load i16, i16 addrspace(1)* %gep.in1		%b = load i16, i16 addrspace(1)* %gep.in1
%add = sub i16 %a, %b		%add = sub i16 %a, %b
%ext = sext i16 %add to i32		%ext = sext i16 %add to i32
store i32 %ext, i32 addrspace(1)* %out		store i32 %ext, i32 addrspace(1)* %out
ret void		ret void
}		}

; FIXME: Need to handle non-uniform case for function below (load without gep).		; FIXME: Need to handle non-uniform case for function below (load without gep).
; GCN-LABEL: {{^}}v_test_sub_i16_sext_to_i64:		; GCN-LABEL: {{^}}v_test_sub_i16_sext_to_i64:
; VI: flat_load_ushort [[A:v[0-9]+]]		; VI: flat_load_ushort [[A:v[0-9]+]]
; VI: flat_load_ushort [[B:v[0-9]+]]		; VI: flat_load_ushort [[B:v[0-9]+]]
; VI: v_sub_u16_e32 [[ADD:v[0-9]+]], [[A]], [[B]]		; VI: v_sub_u16_e32 [[ADD:v[0-9]+]], [[A]], [[B]]
; VI-NEXT: v_bfe_i32 v[[LO:[0-9]+]], [[ADD]], 0, 16		; VI-NEXT: v_bfe_i32 v[[LO:[0-9]+]], [[ADD]], 0, 16
; VI-NEXT: v_ashrrev_i32_e32 v[[HI:[0-9]+]], 31, v[[LO]]		; VI: v_ashrrev_i32_e32 v[[HI:[0-9]+]], 31, v[[LO]]
; VI-NEXT: buffer_store_dwordx2 v{{\[}}[[LO]]:[[HI]]{{\]}}		; VI-NEXT: buffer_store_dwordx2 v{{\[}}[[LO]]:[[HI]]{{\]}}
define amdgpu_kernel void @v_test_sub_i16_sext_to_i64(i64 addrspace(1)* %out, i16 addrspace(1)* %in0, i16 addrspace(1)* %in1) #1 {		define amdgpu_kernel void @v_test_sub_i16_sext_to_i64(i64 addrspace(1)* %out, i16 addrspace(1)* %in0, i16 addrspace(1)* %in1) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep.out = getelementptr inbounds i64, i64 addrspace(1)* %out, i32 %tid		%gep.out = getelementptr inbounds i64, i64 addrspace(1)* %out, i32 %tid
%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid		%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid
%gep.in1 = getelementptr inbounds i16, i16 addrspace(1)* %in1, i32 %tid		%gep.in1 = getelementptr inbounds i16, i16 addrspace(1)* %in1, i32 %tid
%a = load i16, i16 addrspace(1)* %gep.in0		%a = load i16, i16 addrspace(1)* %gep.in0
%b = load i16, i16 addrspace(1)* %gep.in1		%b = load i16, i16 addrspace(1)* %gep.in1

llvm/test/CodeGen/AMDGPU/uint_to_fp.ll

define amdgpu_kernel void @v_uint_to_fp_v4i32(<4 x float> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) #0 {
%value = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep		%value = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep
%result = uitofp <4 x i32> %value to <4 x float>		%result = uitofp <4 x i32> %value to <4 x float>
store <4 x float> %result, <4 x float> addrspace(1)* %out.gep		store <4 x float> %result, <4 x float> addrspace(1)* %out.gep
ret void		ret void
}		}

; FUNC-LABEL: {{^}}s_uint_to_fp_i1_to_f32:		; FUNC-LABEL: {{^}}s_uint_to_fp_i1_to_f32:
; SI: v_cmp_eq_u32_e64 [[CMP:s\[[0-9]+:[0-9]\]]],		; SI: v_cmp_eq_u32_e64 [[CMP:s\[[0-9]+:[0-9]\]]],
; SI-NEXT: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1.0, [[CMP]]		; SI: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1.0, [[CMP]]
; SI: buffer_store_dword [[RESULT]],		; SI: buffer_store_dword [[RESULT]],
; SI: s_endpgm		; SI: s_endpgm
define amdgpu_kernel void @s_uint_to_fp_i1_to_f32(float addrspace(1)* %out, i32 %in) #0 {		define amdgpu_kernel void @s_uint_to_fp_i1_to_f32(float addrspace(1)* %out, i32 %in) #0 {
%cmp = icmp eq i32 %in, 0		%cmp = icmp eq i32 %in, 0
%fp = uitofp i1 %cmp to float		%fp = uitofp i1 %cmp to float
store float %fp, float addrspace(1)* %out		store float %fp, float addrspace(1)* %out
ret void		ret void
}		}

llvm/test/CodeGen/AMDGPU/wave32.ll

	define amdgpu_kernel void @fdiv_f32(float addrspace(1)* %out, float %a, float %b) #0 {			define amdgpu_kernel void @fdiv_f32(float addrspace(1)* %out, float %a, float %b) #0 {
	entry:			entry:
	%fdiv = fdiv float %a, %b			%fdiv = fdiv float %a, %b
	store float %fdiv, float addrspace(1)* %out			store float %fdiv, float addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_br_cc_f16:			; GCN-LABEL: {{^}}test_br_cc_f16:
	; GFX1032: v_cmp_nlt_f16_e32 vcc_lo,			; GFX1032: v_cmp_nlt_f16_e32 vcc_lo,
	; GFX1032-NEXT: s_and_b32 vcc_lo, exec_lo, vcc_lo			; GFX1032: s_and_b32 vcc_lo, exec_lo, vcc_lo
	; GFX1064: v_cmp_nlt_f16_e32 vcc,			; GFX1064: v_cmp_nlt_f16_e32 vcc,
	; GFX1064-NEXT: s_and_b64 vcc, exec, vcc{{$}}			; GFX1064: s_and_b64 vcc, exec, vcc{{$}}
	; GCN-NEXT: s_cbranch_vccnz			; GCN-NEXT: s_cbranch_vccnz
	define amdgpu_kernel void @test_br_cc_f16(			define amdgpu_kernel void @test_br_cc_f16(
	half addrspace(1)* %r,			half addrspace(1)* %r,
	half addrspace(1)* %a,			half addrspace(1)* %a,
	half addrspace(1)* %b) {			half addrspace(1)* %b) {
	entry:			entry:
	%a.val = load half, half addrspace(1)* %a			%a.val = load half, half addrspace(1)* %a
	%b.val = load half, half addrspace(1)* %b			%b.val = load half, half addrspace(1)* %b

llvm/test/CodeGen/AMDGPU/zero_extend.ll

	; GCN-LABEL: {{^}}s_arg_zext_i1_to_i64:			; GCN-LABEL: {{^}}s_arg_zext_i1_to_i64:
	define amdgpu_kernel void @s_arg_zext_i1_to_i64(i64 addrspace(1)* %out, i1 zeroext %arg) #0 {			define amdgpu_kernel void @s_arg_zext_i1_to_i64(i64 addrspace(1)* %out, i1 zeroext %arg) #0 {
	%ext = zext i1 %arg to i64			%ext = zext i1 %arg to i64
	store i64 %ext, i64 addrspace(1)* %out, align 8			store i64 %ext, i64 addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}s_cmp_zext_i1_to_i64:			; GCN-LABEL: {{^}}s_cmp_zext_i1_to_i64:
	; GCN: s_mov_b32 s{{[0-9]+}}, 0			; GCN-DAG: s_mov_b32 s{{[0-9]+}}, 0
	; GCN: v_cmp_eq_u32			; GCN-DAG: v_cmp_eq_u32
	; GCN: v_cndmask_b32			; GCN: v_cndmask_b32
	define amdgpu_kernel void @s_cmp_zext_i1_to_i64(i64 addrspace(1)* %out, i32 %a, i32 %b) #0 {			define amdgpu_kernel void @s_cmp_zext_i1_to_i64(i64 addrspace(1)* %out, i32 %a, i32 %b) #0 {
	%cmp = icmp eq i32 %a, %b			%cmp = icmp eq i32 %a, %b
	%ext = zext i1 %cmp to i64			%ext = zext i1 %cmp to i64
	store i64 %ext, i64 addrspace(1)* %out, align 8			store i64 %ext, i64 addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	; FIXME: Why different commute?			; FIXME: Why different commute?
