This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] [llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)
ClosedPublic

Authored by artem.tamazov on Mar 2 2016, 12:43 PM.


Details

Reviewers

• tstellarAMD
arsenm

Commits

rGeb4d5a9b0b30: [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)
rL266205: [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and…

Summary

Added: Tests for implemented features.
TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.

Diff Detail

Repository: rL LLVM

Event Timeline

artem.tamazov updated this revision to Diff 49661. Mar 2 2016, 12:43 PM

artem.tamazov retitled this revision from to [AMDGPU] [llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA).

artem.tamazov updated this object.

artem.tamazov added reviewers: • tstellarAMD, arsenm.

artem.tamazov set the repository for this revision to rL LLVM.

artem.tamazov added a project: Restricted Project.

artem.tamazov added a subscriber: Restricted Project.

Herald added a subscriber: arsenm. Mar 2 2016, 12:43 PM

• tstellarAMD added inline comments. Mar 2 2016, 7:41 PM

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
391	What will these be used for?
lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
549	enum name should be capitalized: RegisterKind
test/CodeGen/AMDGPU/and.ll
259–260	Is adding -DAG here really necessary? Since both patterns define variables, I would expect them to be matched in the order they were written.

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

arsenm added inline comments. Mar 2 2016, 8:00 PM

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
549	It can also be enum RegisterKind with no typedef
567	Dead code
lib/Target/AMDGPU/SIRegisterInfo.td
68	You can use a loop here over the numbers and add to get the encoding value
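
The two suggestions for line 549 can be illustrated with a minimal, self-contained sketch. It assumes the plain `enum RegisterKind` form that arsenm proposes; the `RegClassID` constants are hypothetical stand-ins for the generated AMDGPU::*RegClassID values, and only widths 1 and 2 are covered.

```cpp
// Hypothetical stand-ins for the TableGen-generated AMDGPU::*RegClassID values.
enum RegClassID { VGPR_32, VReg_64, SGPR_32, SGPR_64, TTMP_32, TTMP_64 };

// Plain enum with a capitalized name and no typedef, as suggested in review.
enum RegisterKind { IS_VGPR, IS_SGPR, IS_TTMP };

// Maps a register kind and a width in dwords to a register class ID.
// Returns -1 when no class matches (only widths 1 and 2 are sketched here).
int getRegClass(RegisterKind Kind, unsigned RegWidth) {
  switch (Kind) {
  case IS_VGPR:
    return RegWidth == 1 ? VGPR_32 : RegWidth == 2 ? VReg_64 : -1;
  case IS_SGPR:
    return RegWidth == 1 ? SGPR_32 : RegWidth == 2 ? SGPR_64 : -1;
  case IS_TTMP:
    return RegWidth == 1 ? TTMP_32 : RegWidth == 2 ? TTMP_64 : -1;
  }
  return -1;
}
```

Switching over the enum rather than chaining `if`/`else if` lets the compiler warn when a new kind is added but not handled.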

Thanks for reviewing. I am going to fix the issues found and then get back to fixing/improving the CodeGen tests (to make them less dependent on instruction scheduling) until there are no regressions.

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
391	For implementing GPU trap/exception handlers. Writing trap handlers requires a register allocation scheme that is different from the one used for normal code. I am assuming that CodeGen should not use TBA/TMA/TTMPxx registers.
lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
567	Don't mind, I will update that on the next iteration or remove it prior to submitting. We need support for TTMP quads for writing trap handlers in assembly.
lib/Target/AMDGPU/SIRegisterInfo.td
68	Yes, but... Please let me keep this small fragment just for aesthetic reasons. At least something easily understandable in .td files )))
268–270	I'm going to update this on the next iteration or remove it prior to submitting.
test/CodeGen/AMDGPU/and.ll
259–260	Yes. For the verde target (but not for tonga), the next two instructions (s_mov_b32 and s_movk_i32) are emitted prior to these two buffer loads and so are missed. Adding -DAG to the loads fixes that.
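
The loop arsenm suggests for SIRegisterInfo.td line 68 would be written in TableGen; the same idea is sketched here in C++ purely for illustration. The base encoding 112 and the count of 12 registers come from the TTMP0..TTMP11 definitions in this patch; `makeTtmpTable` is a hypothetical helper, not part of LLVM.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hardware encoding of ttmp0 and the number of TTMP registers,
// taken from the SIRegisterInfo.td definitions in this patch.
constexpr unsigned TtmpEncodingBase = 112;
constexpr unsigned NumTtmpRegs = 12;

// Hypothetical helper: builds (name, encoding) pairs for ttmp0..ttmp11
// by adding the register number to the base encoding.
std::vector<std::pair<std::string, unsigned>> makeTtmpTable() {
  std::vector<std::pair<std::string, unsigned>> Table;
  for (unsigned I = 0; I < NumTtmpRegs; ++I)
    Table.emplace_back("ttmp" + std::to_string(I), TtmpEncodingBase + I);
  return Table;
}
```

The explicit list kept in the patch and a generated table like this one describe the same twelve registers; the difference is only in readability versus repetition.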

In D17825#367010, @arsenm wrote:

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

It's a pity... So I will continue with tests. Perhaps the problem is inevitable or hard to resolve (e.g. if it is related to some unspecified aspects of C++ and/or llvm containers etc).

Small fixes as per review etc.
All failing tests now handled (fixed or marked XFAIL).
Hopefully, the final version.

Herald added subscribers: qcolombet, MatzeB. Mar 16 2016, 1:30 PM

artem.tamazov updated this object. Mar 16 2016, 1:34 PM

I'm not really happy about all the XFAILs and test changes (I know these aren't your fault). Is it possible to support these in the compiler without adding the registers to the SReg_32 class or adding all the trap register classes?

In D17825#376811, @tstellarAMD wrote:

I'm not really happy about all the XFAILs and test changes (I know these aren't your fault).

Me too, especially taking into account how much effort it took to double-check all those failures ))

Is it possible to support these in the compiler without adding the registers to the SReg_32 class or adding all the trap register classes?

I am not an LLVM expert yet, but I guess it is possible, though it would require workaround-style coding (like code duplication and similar). TTMP registers can be used just like scalar registers (except that they can be written only in an exception context).

I believe that test changes will be required sooner or later. Perhaps we can minimize the changes - for example, switch off MI scheduler instead of making tests scheduling-tolerant. But that may lead to narrower testing coverage etc.

WRT XFAIL tests - there are only two cases. I think that ds_write2 case just reveals some different problem related to load-store-opt. Another case (si-triv-disjoint-mem-access) is related to the machinery I am not quite familiar with, e.g. @reorder_constant_load_local_store_constant_load. I am going to submit bugzillas for both so we will not overlook those in the future.

Ping

I don't want to merge this with all these test changes. Maybe we can wait until someone can make the scheduler less susceptible to register changes.

In D17825#381118, @tstellarAMD wrote:

I don't want to merge this with all these test changes. Maybe we can wait until someone can make the scheduler less susceptible to register changes.

All right. As far as I understand, the change is OK (except arguable test changes), and the problem lies in the scheduler.

Yes, we can wait, but not too long.
We need to provide trap handler support for the debugger team in reasonable time span.
Do you or Matt know who can update the scheduler and when?
If you have an old patch (which fixes that issue but was reverted for some reason), we can find time to make that patch acceptable for the llvm trunk.

NOTE: you do not need to perform the actual merge. As soon as the review is accepted, I will rebase the changes to the tip and update the diff.

IMPORTANT: The change contains fixes for the following issues in the tests:

test\CodeGen\AMDGPU\ctlz.ll
	v_ctlz_i32: Error at line 39. Line 38 is suspicious.
	v_ctlz_i8: Error at line 102.
	v_ctlz_i64: Error at line 147.
test\CodeGen\AMDGPU\ds_write2st64.ll
	simple_write2st64_two_val_max_offset_f32: Error at line 47.
test\CodeGen\AMDGPU\setcc-opt.ll
	zext_bool_icmp_ne_1: Missing s_endpgm after line 145.
test\CodeGen\AMDGPU\sra.ll
	s_ashr_63_i64: Error at line 234.
test\CodeGen\AMDGPU\udivrem.ll
	test_udivrem: Three (too many) v_subrev_i32_e32 insts. The actual ISA contains only two of those. I do not know how the current version of the test passes.

I recommend integration of those changes regardless of the rest.

You can try applying these 3 patches before your change and see if it helps to reduce the test changes:

http://reviews.llvm.org/D18451
http://reviews.llvm.org/D18452
http://reviews.llvm.org/D18453

In D17825#367010, @arsenm wrote:

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.

In D17825#382877, @MatzeB wrote:

In D17825#367010, @arsenm wrote:

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.

This current patch is a good example of this. Just take a look at any of the changed tests with and without this patch.

In D17825#382877, @MatzeB wrote:

In D17825#367010, @arsenm wrote:

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.

This was r252674, which was reverted. I think this helped with some similar random effects, but it was a while ago now.

In D17825#382877, @MatzeB wrote:

In D17825#367010, @arsenm wrote:

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.

Hi Matthias, I would appreciate it if you could share any news w.r.t. the scheduling issue. For example, do any of the patches mentioned by Tom and Matt solve the problem? Thanks!

In D17825#391213, @artem.tamazov wrote:

In D17825#382877, @MatzeB wrote:

In D17825#367010, @arsenm wrote:

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.

Hi Matthias, I would appreciate it if you could share any news w.r.t. the scheduling issue. For example, do any of the patches mentioned by Tom and Matt solve the problem? Thanks!

I've committed some scheduling changes to trunk. You might want to try rebasing this patch to see if any of those patches helped.

Thanks to Tom, scheduling changes do not appear anymore. No more unnecessary test changes in this patch.

LGTM.

This revision is now accepted and ready to land. Apr 7 2016, 11:23 AM

Closed by commit rL266205: [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and… (authored by artem.tamazov). Apr 13 2016, 9:24 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp	15 lines
lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp	56 lines
lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp	31 lines
lib/Target/AMDGPU/SIRegisterInfo.cpp	26 lines
lib/Target/AMDGPU/SIRegisterInfo.td	65 lines
test/CodeGen/AMDGPU/and.ll	8 lines
test/CodeGen/AMDGPU/atomic_cmp_swap_local.ll	18 lines
…	2 lines
…	16 lines
…	9 lines
…	12 lines

Diff 49661

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

[376 lines not shown]	for (const MachineInstr &MI : MBB) {
continue;		continue;

case AMDGPU::FLAT_SCR:		case AMDGPU::FLAT_SCR:
case AMDGPU::FLAT_SCR_LO:		case AMDGPU::FLAT_SCR_LO:
case AMDGPU::FLAT_SCR_HI:		case AMDGPU::FLAT_SCR_HI:
FlatUsed = true;		FlatUsed = true;
continue;		continue;

		case AMDGPU::TBA:
		case AMDGPU::TBA_LO:
		case AMDGPU::TBA_HI:
		case AMDGPU::TMA:
		case AMDGPU::TMA_LO:
		case AMDGPU::TMA_HI:
		llvm_unreachable("Trap Handler registers should not be used");
		tstellarAMD: What will these be used for?
		artem.tamazov: For implementing GPU trap/exception handlers. Writing trap handlers requires a register allocation scheme that is different from the one used for normal code. I am assuming that CodeGen should not use TBA/TMA/TTMPxx registers.
		continue;

default:		default:
break;		break;
}		}

if (AMDGPU::SReg_32RegClass.contains(reg)) {		if (AMDGPU::SReg_32RegClass.contains(reg)) {
		if (AMDGPU::TTMP_32RegClass.contains(reg)) {
		llvm_unreachable("Trap Handler registers should not be used");
		}
isSGPR = true;		isSGPR = true;
width = 1;		width = 1;
} else if (AMDGPU::VGPR_32RegClass.contains(reg)) {		} else if (AMDGPU::VGPR_32RegClass.contains(reg)) {
isSGPR = false;		isSGPR = false;
width = 1;		width = 1;
} else if (AMDGPU::SReg_64RegClass.contains(reg)) {		} else if (AMDGPU::SReg_64RegClass.contains(reg)) {
		if (AMDGPU::TTMP_64RegClass.contains(reg)) {
		llvm_unreachable("Trap Handler registers should not be used");
		}
isSGPR = true;		isSGPR = true;
width = 2;		width = 2;
} else if (AMDGPU::VReg_64RegClass.contains(reg)) {		} else if (AMDGPU::VReg_64RegClass.contains(reg)) {
isSGPR = false;		isSGPR = false;
width = 2;		width = 2;
} else if (AMDGPU::VReg_96RegClass.contains(reg)) {		} else if (AMDGPU::VReg_96RegClass.contains(reg)) {
isSGPR = false;		isSGPR = false;
width = 3;		width = 3;
[298 lines not shown]

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

[540 lines not shown]	struct OptionalOperand {
AMDGPUOperand::ImmTy Type;		AMDGPUOperand::ImmTy Type;
bool IsBit;		bool IsBit;
int64_t Default;		int64_t Default;
bool (*ConvertResult)(int64_t&);		bool (*ConvertResult)(int64_t&);
};		};

}		}

static int getRegClass(bool IsVgpr, unsigned RegWidth) {		typedef enum { IS_VGPR, IS_SGPR, IS_TTMP } registerKind;
		tstellarAMD: enum name should be capitalized: RegisterKind
		arsenm: It can also be enum RegisterKind with no typedef
if (IsVgpr) {
		static int getRegClass(registerKind Is, unsigned RegWidth) {
		if (Is == IS_VGPR) {
switch (RegWidth) {		switch (RegWidth) {
default: return -1;		default: return -1;
case 1: return AMDGPU::VGPR_32RegClassID;		case 1: return AMDGPU::VGPR_32RegClassID;
case 2: return AMDGPU::VReg_64RegClassID;		case 2: return AMDGPU::VReg_64RegClassID;
case 3: return AMDGPU::VReg_96RegClassID;		case 3: return AMDGPU::VReg_96RegClassID;
case 4: return AMDGPU::VReg_128RegClassID;		case 4: return AMDGPU::VReg_128RegClassID;
case 8: return AMDGPU::VReg_256RegClassID;		case 8: return AMDGPU::VReg_256RegClassID;
case 16: return AMDGPU::VReg_512RegClassID;		case 16: return AMDGPU::VReg_512RegClassID;
}		}
		} else if (Is == IS_TTMP) {
		switch (RegWidth) {
		default: return -1;
		case 1: return AMDGPU::TTMP_32RegClassID;
		case 2: return AMDGPU::TTMP_64RegClassID;
		// case 4: return AMDGPU::TTMP_128RegClassID;
		arsenm: Dead code
		artem.tamazov: Don't mind, I will update that on the next iteration or remove it prior to submitting. We need support for TTMP quads for writing trap handlers in assembly.
}		}
		} else if (Is == IS_SGPR) {
switch (RegWidth) {		switch (RegWidth) {
default: return -1;		default: return -1;
case 1: return AMDGPU::SGPR_32RegClassID;		case 1: return AMDGPU::SGPR_32RegClassID;
case 2: return AMDGPU::SGPR_64RegClassID;		case 2: return AMDGPU::SGPR_64RegClassID;
case 4: return AMDGPU::SReg_128RegClassID;		case 4: return AMDGPU::SReg_128RegClassID;
case 8: return AMDGPU::SReg_256RegClassID;		case 8: return AMDGPU::SReg_256RegClassID;
case 16: return AMDGPU::SReg_512RegClassID;		case 16: return AMDGPU::SReg_512RegClassID;
}		}
}		}
		return -1;
		}

static unsigned getRegForName(StringRef RegName) {		static unsigned getRegForName(StringRef RegName) {

return StringSwitch<unsigned>(RegName)		return StringSwitch<unsigned>(RegName)
.Case("exec", AMDGPU::EXEC)		.Case("exec", AMDGPU::EXEC)
.Case("vcc", AMDGPU::VCC)		.Case("vcc", AMDGPU::VCC)
.Case("flat_scratch", AMDGPU::FLAT_SCR)		.Case("flat_scratch", AMDGPU::FLAT_SCR)
.Case("m0", AMDGPU::M0)		.Case("m0", AMDGPU::M0)
.Case("scc", AMDGPU::SCC)		.Case("scc", AMDGPU::SCC)
.Case("flat_scratch_lo", AMDGPU::FLAT_SCR_LO)		.Case("flat_scratch_lo", AMDGPU::FLAT_SCR_LO)
.Case("flat_scratch_hi", AMDGPU::FLAT_SCR_HI)		.Case("flat_scratch_hi", AMDGPU::FLAT_SCR_HI)
.Case("vcc_lo", AMDGPU::VCC_LO)		.Case("vcc_lo", AMDGPU::VCC_LO)
.Case("vcc_hi", AMDGPU::VCC_HI)		.Case("vcc_hi", AMDGPU::VCC_HI)
.Case("exec_lo", AMDGPU::EXEC_LO)		.Case("exec_lo", AMDGPU::EXEC_LO)
.Case("exec_hi", AMDGPU::EXEC_HI)		.Case("exec_hi", AMDGPU::EXEC_HI)
		.Case("tma_lo", AMDGPU::TMA_LO)
		.Case("tma_hi", AMDGPU::TMA_HI)
		.Case("tba_lo", AMDGPU::TBA_LO)
		.Case("tba_hi", AMDGPU::TBA_HI)
.Default(0);		.Default(0);
}		}

bool AMDGPUAsmParser::ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc) {		bool AMDGPUAsmParser::ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc) {
const AsmToken Tok = Parser.getTok();		const AsmToken Tok = Parser.getTok();
StartLoc = Tok.getLoc();		StartLoc = Tok.getLoc();
EndLoc = Tok.getEndLoc();		EndLoc = Tok.getEndLoc();
const MCRegisterInfo *TRI = getContext().getRegisterInfo();		const MCRegisterInfo *TRI = getContext().getRegisterInfo();

StringRef RegName = Tok.getString();		StringRef RegName = Tok.getString();
RegNo = getRegForName(RegName);		RegNo = getRegForName(RegName);

if (RegNo) {		if (RegNo) {
Parser.Lex();		Parser.Lex();
return !subtargetHasRegister(*TRI, RegNo);		return !subtargetHasRegister(*TRI, RegNo);
}		}

// Match vgprs and sgprs		// Match vgprs, sgprs and ttmps
if (RegName[0] != 's' && RegName[0] != 'v')		if (RegName[0] != 's' && RegName[0] != 'v' && !RegName.startswith("ttmp"))
return true;		return true;

bool IsVgpr = RegName[0] == 'v';		const registerKind Is = RegName[0] == 'v' ? IS_VGPR : RegName[0] == 's' ? IS_SGPR : IS_TTMP;
unsigned RegWidth;		unsigned RegWidth;
unsigned RegIndexInClass;		unsigned RegIndexInClass;
if (RegName.size() > 1) {		if (RegName.size() > (Is == IS_TTMP ? strlen("ttmp") : 1) ) {
// We have a 32-bit register		// We have a single 32-bit register. Syntax: vXX
RegWidth = 1;		RegWidth = 1;
if (RegName.substr(1).getAsInteger(10, RegIndexInClass))		if (RegName.substr(Is == IS_TTMP ? strlen("ttmp") : 1).getAsInteger(10, RegIndexInClass))
return true;		return true;
Parser.Lex();		Parser.Lex();
} else {		} else {
// We have a register greater than 32-bits.		// We have a register greater than 32-bits (a range of single registers). Syntax: v[XX:YY]

int64_t RegLo, RegHi;		int64_t RegLo, RegHi;
Parser.Lex();		Parser.Lex();
if (getLexer().isNot(AsmToken::LBrac))		if (getLexer().isNot(AsmToken::LBrac))
return true;		return true;

Parser.Lex();		Parser.Lex();
if (getParser().parseAbsoluteExpression(RegLo))		if (getParser().parseAbsoluteExpression(RegLo))
return true;		return true;

if (getLexer().isNot(AsmToken::Colon))		if (getLexer().isNot(AsmToken::Colon))
return true;		return true;

Parser.Lex();		Parser.Lex();
if (getParser().parseAbsoluteExpression(RegHi))		if (getParser().parseAbsoluteExpression(RegHi))
return true;		return true;

if (getLexer().isNot(AsmToken::RBrac))		if (getLexer().isNot(AsmToken::RBrac))
return true;		return true;

Parser.Lex();		Parser.Lex();
RegWidth = (RegHi - RegLo) + 1;		RegWidth = (RegHi - RegLo) + 1;
if (IsVgpr) {		if (Is == IS_VGPR) {
// VGPR registers aren't aligned.		// VGPR registers aren't aligned.
RegIndexInClass = RegLo;		RegIndexInClass = RegLo;
} else {		} else {
// SGPR registers are aligned. Max alignment is 4 dwords.		// SGPR and TTMP registers must be are aligned. Max required alignment is 4 dwords.
unsigned Size = std::min(RegWidth, 4u);		unsigned Size = std::min(RegWidth, 4u);
if (RegLo % Size != 0)		if (RegLo % Size != 0)
return true;		return true;

RegIndexInClass = RegLo / Size;		RegIndexInClass = RegLo / Size;
}		}
}		}

int RCID = getRegClass(IsVgpr, RegWidth);		int RCID = getRegClass(Is, RegWidth);
if (RCID == -1)		if (RCID == -1)
return true;		return true;

const MCRegisterClass RC = TRI->getRegClass(RCID);		const MCRegisterClass RC = TRI->getRegClass(RCID);
if (RegIndexInClass >= RC.getNumRegs())		if (RegIndexInClass >= RC.getNumRegs())
return true;		return true;

RegNo = RC.getRegister(RegIndexInClass);		RegNo = RC.getRegister(RegIndexInClass);
[1,196 lines not shown]
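
The range branch of ParseRegister in the hunk above (the `v[XX:YY]` syntax) can be sketched in isolation as follows. `resolveRange` is a hypothetical helper, not a function from the patch, and the check for negative or reversed bounds is an extra assumption that the patch itself does not make.

```cpp
#include <algorithm>
#include <cstdint>

enum RegisterKind { IS_VGPR, IS_SGPR, IS_TTMP };

// Hypothetical helper: resolves a range such as ttmp[4:5] to a width in
// dwords and an index into the matching register class.
// Returns false when the range is rejected.
bool resolveRange(RegisterKind Kind, int64_t RegLo, int64_t RegHi,
                  unsigned &RegWidth, unsigned &RegIndexInClass) {
  // Extra sanity check that is not part of the patch.
  if (RegLo < 0 || RegHi < RegLo)
    return false;

  RegWidth = static_cast<unsigned>(RegHi - RegLo) + 1;
  if (Kind == IS_VGPR) {
    // VGPR ranges may start at any index.
    RegIndexInClass = static_cast<unsigned>(RegLo);
    return true;
  }

  // SGPR and TTMP ranges must be aligned; the maximum alignment is 4 dwords.
  const int64_t Size = std::min(RegWidth, 4u);
  if (RegLo % Size != 0)
    return false;

  RegIndexInClass = static_cast<unsigned>(RegLo / Size);
  return true;
}
```

Under these rules `ttmp[4:5]` resolves to width 2 and class index 2, while `s[1:2]` is rejected because a two-dword range must start at an even register.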

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp

[173 lines not shown]	case AMDGPU::FLAT_SCR:
O << "flat_scratch";		O << "flat_scratch";
return;		return;
case AMDGPU::VCC_LO:		case AMDGPU::VCC_LO:
O << "vcc_lo";		O << "vcc_lo";
return;		return;
case AMDGPU::VCC_HI:		case AMDGPU::VCC_HI:
O << "vcc_hi";		O << "vcc_hi";
return;		return;
		case AMDGPU::TBA_LO:
		O << "tba_lo";
		return;
		case AMDGPU::TBA_HI:
		O << "tba_hi";
		return;
		case AMDGPU::TMA_LO:
		O << "tma_lo";
		return;
		case AMDGPU::TMA_HI:
		O << "tma_hi";
		return;
case AMDGPU::EXEC_LO:		case AMDGPU::EXEC_LO:
O << "exec_lo";		O << "exec_lo";
return;		return;
case AMDGPU::EXEC_HI:		case AMDGPU::EXEC_HI:
O << "exec_hi";		O << "exec_hi";
return;		return;
case AMDGPU::FLAT_SCR_LO:		case AMDGPU::FLAT_SCR_LO:
O << "flat_scratch_lo";		O << "flat_scratch_lo";
[12 lines not shown]	if (MRI.getRegClass(AMDGPU::VGPR_32RegClassID).contains(reg)) {
Type = 'v';		Type = 'v';
NumRegs = 1;		NumRegs = 1;
} else if (MRI.getRegClass(AMDGPU::SGPR_32RegClassID).contains(reg)) {		} else if (MRI.getRegClass(AMDGPU::SGPR_32RegClassID).contains(reg)) {
Type = 's';		Type = 's';
NumRegs = 1;		NumRegs = 1;
} else if (MRI.getRegClass(AMDGPU::VReg_64RegClassID).contains(reg)) {		} else if (MRI.getRegClass(AMDGPU::VReg_64RegClassID).contains(reg)) {
Type = 'v';		Type = 'v';
NumRegs = 2;		NumRegs = 2;
} else if (MRI.getRegClass(AMDGPU::SReg_64RegClassID).contains(reg)) {		} else if (MRI.getRegClass(AMDGPU::SGPR_64RegClassID).contains(reg)) {
Type = 's';		Type = 's';
NumRegs = 2;		NumRegs = 2;
		} else if (MRI.getRegClass(AMDGPU::TTMP_64RegClassID).contains(reg)) {
		Type = 't';
		NumRegs = 2;
} else if (MRI.getRegClass(AMDGPU::VReg_128RegClassID).contains(reg)) {		} else if (MRI.getRegClass(AMDGPU::VReg_128RegClassID).contains(reg)) {
Type = 'v';		Type = 'v';
NumRegs = 4;		NumRegs = 4;
} else if (MRI.getRegClass(AMDGPU::SReg_128RegClassID).contains(reg)) {		} else if (MRI.getRegClass(AMDGPU::SReg_128RegClassID).contains(reg)) {
Type = 's';		Type = 's';
NumRegs = 4;		NumRegs = 4;
} else if (MRI.getRegClass(AMDGPU::VReg_96RegClassID).contains(reg)) {		} else if (MRI.getRegClass(AMDGPU::VReg_96RegClassID).contains(reg)) {
Type = 'v';		Type = 'v';
[13 lines not shown]	void AMDGPUInstPrinter::printRegOperand(unsigned reg, raw_ostream &O,
} else {		} else {
O << getRegisterName(reg);		O << getRegisterName(reg);
return;		return;
}		}

// The low 8 bits of the encoding value is the register index, for both VGPRs		// The low 8 bits of the encoding value is the register index, for both VGPRs
// and SGPRs.		// and SGPRs.
unsigned RegIdx = MRI.getEncodingValue(reg) & ((1 << 8) - 1);		unsigned RegIdx = MRI.getEncodingValue(reg) & ((1 << 8) - 1);
		if (Type == 't') // Trap temps start at offset 112. TODO: Get this from tablegen.
		RegIdx -= 112; // FIXME hack.
if (NumRegs == 1) {		if (NumRegs == 1) {
O << Type << RegIdx;		if (Type == 't') // FIXME hack
		O << "ttmp";
		else
		O << Type;
		O << RegIdx;
return;		return;
}		}

O << Type << '[' << RegIdx << ':' << (RegIdx + NumRegs - 1) << ']';		if (Type == 't') // FIXME hack
		O << "ttmp";
		else
		O << Type;
		O << '[' << RegIdx << ':' << (RegIdx + NumRegs - 1) << ']';
}		}

void AMDGPUInstPrinter::printVOPDst(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printVOPDst(const MCInst *MI, unsigned OpNo,
raw_ostream &O) {		raw_ostream &O) {
if (MII.get(MI->getOpcode()).TSFlags & SIInstrFlags::VOP3)		if (MII.get(MI->getOpcode()).TSFlags & SIInstrFlags::VOP3)
O << "_e64 ";		O << "_e64 ";
else		else
O << "_e32 ";		O << "_e32 ";
[417 lines not shown]
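
The name-printing logic that printRegOperand gains in the hunk above can be sketched without the LLVM types. `formatRegName` is a hypothetical helper; it takes the textual prefix directly instead of the single-character `Type` used in the patch, which is one way the "FIXME hack" branches could be avoided.

```cpp
#include <string>

// Trap temporaries start at hardware encoding 112 (from SIRegisterInfo.td).
constexpr unsigned TtmpEncodingBase = 112;

// Hypothetical helper: Prefix is "v", "s" or "ttmp"; Encoding is the hardware
// encoding of the first register; NumRegs is the width in dwords.
std::string formatRegName(const std::string &Prefix, unsigned Encoding,
                          unsigned NumRegs) {
  // The low 8 bits of the encoding value are the register index.
  unsigned RegIdx = Encoding & 0xFF;
  if (Prefix == "ttmp")
    RegIdx -= TtmpEncodingBase;

  if (NumRegs == 1)
    return Prefix + std::to_string(RegIdx);

  return Prefix + "[" + std::to_string(RegIdx) + ":" +
         std::to_string(RegIdx + NumRegs - 1) + "]";
}
```

With this sketch, encoding 116 with two registers prints as `ttmp[4:5]`, and encoding 123 with one register prints as `ttmp11`.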

lib/Target/AMDGPU/SIRegisterInfo.cpp

[84 lines not shown]	BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
BitVector Reserved(getNumRegs());		BitVector Reserved(getNumRegs());
Reserved.set(AMDGPU::INDIRECT_BASE_ADDR);		Reserved.set(AMDGPU::INDIRECT_BASE_ADDR);

// EXEC_LO and EXEC_HI could be allocated and used as regular register, but		// EXEC_LO and EXEC_HI could be allocated and used as regular register, but
// this seems likely to result in bugs, so I'm marking them as reserved.		// this seems likely to result in bugs, so I'm marking them as reserved.
reserveRegisterTuples(Reserved, AMDGPU::EXEC);		reserveRegisterTuples(Reserved, AMDGPU::EXEC);
reserveRegisterTuples(Reserved, AMDGPU::FLAT_SCR);		reserveRegisterTuples(Reserved, AMDGPU::FLAT_SCR);

		// Reserve Trap Handler registers - support is not implemented in Codegen.
		reserveRegisterTuples(Reserved, AMDGPU::TBA);
		reserveRegisterTuples(Reserved, AMDGPU::TMA);
		reserveRegisterTuples(Reserved, AMDGPU::TTMP0_TTMP1);
		reserveRegisterTuples(Reserved, AMDGPU::TTMP2_TTMP3);
		reserveRegisterTuples(Reserved, AMDGPU::TTMP4_TTMP5);
		reserveRegisterTuples(Reserved, AMDGPU::TTMP6_TTMP7);
		reserveRegisterTuples(Reserved, AMDGPU::TTMP8_TTMP9);
		reserveRegisterTuples(Reserved, AMDGPU::TTMP10_TTMP11);

// Reserve the last 2 registers so we will always have at least 2 more that		// Reserve the last 2 registers so we will always have at least 2 more that
// will physically contain VCC.		// will physically contain VCC.
reserveRegisterTuples(Reserved, AMDGPU::SGPR102_SGPR103);		reserveRegisterTuples(Reserved, AMDGPU::SGPR102_SGPR103);

const AMDGPUSubtarget &ST = MF.getSubtarget<AMDGPUSubtarget>();		const AMDGPUSubtarget &ST = MF.getSubtarget<AMDGPUSubtarget>();

if (ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {		if (ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
// SI/CI have 104 SGPRs. VI has 102. We need to shift down the reservation		// SI/CI have 104 SGPRs. VI has 102. We need to shift down the reservation
[467 lines not shown]	unsigned SIRegisterInfo::getPhysRegSubReg(unsigned Reg,
const TargetRegisterClass *SubRC,		const TargetRegisterClass *SubRC,
unsigned Channel) const {		unsigned Channel) const {

switch (Reg) {		switch (Reg) {
case AMDGPU::VCC:		case AMDGPU::VCC:
switch(Channel) {		switch(Channel) {
case 0: return AMDGPU::VCC_LO;		case 0: return AMDGPU::VCC_LO;
case 1: return AMDGPU::VCC_HI;		case 1: return AMDGPU::VCC_HI;
default: llvm_unreachable("Invalid SubIdx for VCC");		default: llvm_unreachable("Invalid SubIdx for VCC"); break;
		}

		case AMDGPU::TBA:
		switch(Channel) {
		case 0: return AMDGPU::TBA_LO;
		case 1: return AMDGPU::TBA_HI;
		default: llvm_unreachable("Invalid SubIdx for TBA"); break;
		}

		case AMDGPU::TMA:
		switch(Channel) {
		case 0: return AMDGPU::TMA_LO;
		case 1: return AMDGPU::TMA_HI;
		default: llvm_unreachable("Invalid SubIdx for TMA"); break;
}		}

case AMDGPU::FLAT_SCR:		case AMDGPU::FLAT_SCR:
switch (Channel) {		switch (Channel) {
case 0:		case 0:
return AMDGPU::FLAT_SCR_LO;		return AMDGPU::FLAT_SCR_LO;
case 1:		case 1:
return AMDGPU::FLAT_SCR_HI;		return AMDGPU::FLAT_SCR_HI;
[140 lines not shown]

lib/Target/AMDGPU/SIRegisterInfo.td

[38 lines not shown]	def EXEC : RegisterWithSubRegs<"EXEC", [EXEC_LO, EXEC_HI]>,
let Namespace = "AMDGPU";		let Namespace = "AMDGPU";
let SubRegIndices = [sub0, sub1];		let SubRegIndices = [sub0, sub1];
let HWEncoding = 126;		let HWEncoding = 126;
}		}

def SCC : SIReg<"scc", 253>;		def SCC : SIReg<"scc", 253>;
def M0 : SIReg <"m0", 124>;		def M0 : SIReg <"m0", 124>;

		// Trap handler registers
		def TBA_LO : SIReg<"tba_lo", 108>;
		def TBA_HI : SIReg<"tba_hi", 109>;

		def TBA : RegisterWithSubRegs<"tba", [TBA_LO, TBA_HI]>,
		DwarfRegAlias<TBA_LO> {
		let Namespace = "AMDGPU";
		let SubRegIndices = [sub0, sub1];
		let HWEncoding = 108;
		}

		def TMA_LO : SIReg<"tma_lo", 110>;
		def TMA_HI : SIReg<"tma_hi", 111>;

		def TMA : RegisterWithSubRegs<"tma", [TMA_LO, TMA_HI]>,
		DwarfRegAlias<TMA_LO> {
		let Namespace = "AMDGPU";
		let SubRegIndices = [sub0, sub1];
		let HWEncoding = 110;
		}

		def TTMP0 : SIReg <"ttmp0", 112>;
		arsenm: You can use a loop here over the numbers and add to get the encoding value
		artem.tamazov: Yes, but... Please let me keep this small fragment just for aesthetic reasons. At least something easily understandable in .td files )))
		def TTMP1 : SIReg <"ttmp1", 113>;
		def TTMP2 : SIReg <"ttmp2", 114>;
		def TTMP3 : SIReg <"ttmp3", 115>;
		def TTMP4 : SIReg <"ttmp4", 116>;
		def TTMP5 : SIReg <"ttmp5", 117>;
		def TTMP6 : SIReg <"ttmp6", 118>;
		def TTMP7 : SIReg <"ttmp7", 119>;
		def TTMP8 : SIReg <"ttmp8", 120>;
		def TTMP9 : SIReg <"ttmp9", 121>;
		def TTMP10 : SIReg <"ttmp10", 122>;
		def TTMP11 : SIReg <"ttmp11", 123>;

multiclass FLAT_SCR_LOHI_m <string n, bits<16> ci_e, bits<16> vi_e> {		multiclass FLAT_SCR_LOHI_m <string n, bits<16> ci_e, bits<16> vi_e> {
def _ci : SIReg<n, ci_e>;		def _ci : SIReg<n, ci_e>;
def _vi : SIReg<n, vi_e>;		def _vi : SIReg<n, vi_e>;
def "" : SIReg<"", 0>;		def "" : SIReg<"", 0>;
}		}

class FlatReg <Register lo, Register hi, bits<16> encoding> :		class FlatReg <Register lo, Register hi, bits<16> encoding> :
RegisterWithSubRegs<"flat_scratch", [lo, hi]>,		RegisterWithSubRegs<"flat_scratch", [lo, hi]>,
[75 lines not shown]	def SGPR_512 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7,
(add (decimate (shl SGPR_32, 9), 4)),		(add (decimate (shl SGPR_32, 9), 4)),
(add (decimate (shl SGPR_32, 10), 4)),		(add (decimate (shl SGPR_32, 10), 4)),
(add (decimate (shl SGPR_32, 11), 4)),		(add (decimate (shl SGPR_32, 11), 4)),
(add (decimate (shl SGPR_32, 12), 4)),		(add (decimate (shl SGPR_32, 12), 4)),
(add (decimate (shl SGPR_32, 13), 4)),		(add (decimate (shl SGPR_32, 13), 4)),
(add (decimate (shl SGPR_32, 14), 4)),		(add (decimate (shl SGPR_32, 14), 4)),
(add (decimate (shl SGPR_32, 15), 4))]>;		(add (decimate (shl SGPR_32, 15), 4))]>;

		// Trap handler TMP 32-bit registers
		def TTMP_32 : RegisterClass<"AMDGPU", [i32, f32], 32,
		(add (sequence "TTMP%u", 0, 11))> {
		let isAllocatable = 0;
		}

		// Trap handler TMP 64-bit registers
		def TTMP_64Regs : RegisterTuples<[sub0, sub1],
		[(add (decimate TTMP_32, 2)),
		(add (decimate (shl TTMP_32, 1), 2))]>;

		// Trap handler TMP 128-bit registers
		def TTMP_128Regs : RegisterTuples<[sub0, sub1, sub2, sub3],
		[(add (decimate TTMP_32, 4)),
		(add (decimate (shl TTMP_32, 1), 4)),
		(add (decimate (shl TTMP_32, 2), 4)),
		(add (decimate (shl TTMP_32, 3), 4))]>;

// VGPR 32-bit registers		// VGPR 32-bit registers
def VGPR_32 : RegisterClass<"AMDGPU", [i32, f32], 32,		def VGPR_32 : RegisterClass<"AMDGPU", [i32, f32], 32,
(add (sequence "VGPR%u", 0, 255))>;		(add (sequence "VGPR%u", 0, 255))>;

// VGPR 64-bit registers		// VGPR 64-bit registers
def VGPR_64 : RegisterTuples<[sub0, sub1],		def VGPR_64 : RegisterTuples<[sub0, sub1],
[(add (trunc VGPR_32, 255)),		[(add (trunc VGPR_32, 255)),
(add (shl VGPR_32, 1))]>;		(add (shl VGPR_32, 1))]>;
[48 lines not shown]

class RegImmMatcher<string name> : AsmOperandClass {		class RegImmMatcher<string name> : AsmOperandClass {
let Name = name;		let Name = name;
let RenderMethod = "addRegOrImmOperands";		let RenderMethod = "addRegOrImmOperands";
}		}

// Register class for all scalar registers (SGPRs + Special Registers)		// Register class for all scalar registers (SGPRs + Special Registers)
def SReg_32 : RegisterClass<"AMDGPU", [i32, f32], 32,		def SReg_32 : RegisterClass<"AMDGPU", [i32, f32], 32,
(add SGPR_32, M0, VCC_LO, VCC_HI, EXEC_LO, EXEC_HI, FLAT_SCR_LO, FLAT_SCR_HI)		(add SGPR_32, M0, VCC_LO, VCC_HI, EXEC_LO, EXEC_HI, FLAT_SCR_LO, FLAT_SCR_HI,
		TTMP_32, TMA_LO, TMA_HI, TBA_LO, TBA_HI)
>;		>;

def SGPR_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64], 32, (add SGPR_64Regs)>;		def SGPR_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64], 32, (add SGPR_64Regs)>;

		def TTMP_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64], 32, (add TTMP_64Regs)> {
		let isAllocatable = 0;
		}

def SReg_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64, i1], 32,		def SReg_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64, i1], 32,
(add SGPR_64, VCC, EXEC, FLAT_SCR)		(add SGPR_64, VCC, EXEC, FLAT_SCR, TTMP_64)
>;		>;

		//def TTMP_128 : RegisterClass<"AMDGPU", [v4i32, v16i8, v2i64], 32, (add TTMP_128Regs)>;
		//
		//def SReg_128 : RegisterClass<"AMDGPU", [v4i32, v16i8, v2i64], 32, (add SGPR_128, TTMP_128)> {
		artem.tamazov (Author): I'm going to update this in the next iteration or remove it prior to submit.

def SReg_128 : RegisterClass<"AMDGPU", [v4i32, v16i8, v2i64], 32, (add SGPR_128)> {		def SReg_128 : RegisterClass<"AMDGPU", [v4i32, v16i8, v2i64], 32, (add SGPR_128)> {
// Requires 2 s_mov_b64 to copy		// Requires 2 s_mov_b64 to copy
let CopyCost = 2;		let CopyCost = 2;
}		}

def SReg_256 : RegisterClass<"AMDGPU", [v8i32, v8f32], 32, (add SGPR_256)> {		def SReg_256 : RegisterClass<"AMDGPU", [v8i32, v8f32], 32, (add SGPR_256)> {
// Requires 4 s_mov_b64 to copy		// Requires 4 s_mov_b64 to copy
let CopyCost = 4;		let CopyCost = 4;
[... 114 lines not shown ...]
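The TTMP_64Regs and TTMP_128Regs definitions in SIRegisterInfo.td above build register tuples out of TableGen set operators. As a rough illustration of which tuples those expressions describe, here is a minimal Python sketch. The helper names (`sequence`, `shl`, `decimate`, `register_tuples`) are hypothetical stand-ins that only mimic the documented behaviour of the TableGen operators (`shl` drops the first N elements, `decimate` keeps every N-th element starting with the first, and `RegisterTuples` zips the per-sub-register lists); this is not LLVM code.

```python
# Hypothetical helpers that imitate TableGen set operators, for illustration only.

def sequence(fmt, first, last):
    """Imitates (sequence "TTMP%u", first, last): an inclusive list of names."""
    return [fmt.replace("%u", str(i)) for i in range(first, last + 1)]

def shl(regs, n):
    """Imitates (shl regs, n): drop the first n elements."""
    return regs[n:]

def decimate(regs, n):
    """Imitates (decimate regs, n): keep every n-th element, starting with the first."""
    return regs[::n]

def register_tuples(*columns):
    """Imitates RegisterTuples: zip one list per sub-register index into tuples."""
    return list(zip(*columns))

TTMP_32 = sequence("TTMP%u", 0, 11)

# Mirrors TTMP_64Regs: sub0 from even registers, sub1 from odd registers.
TTMP_64 = register_tuples(decimate(TTMP_32, 2),
                          decimate(shl(TTMP_32, 1), 2))

# Mirrors TTMP_128Regs: four columns, each shifted by k and decimated by 4.
TTMP_128 = register_tuples(*(decimate(shl(TTMP_32, k), 4) for k in range(4)))

if __name__ == "__main__":
    print(TTMP_64)
    print(TTMP_128)
```

Under these assumptions the 64-bit class holds six aligned pairs (TTMP0/TTMP1 through TTMP10/TTMP11) and the 128-bit class holds three aligned quads.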

test/CodeGen/AMDGPU/and.ll

[... 250 lines not shown ...]
define void @v_and_constant_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %aptr) {		define void @v_and_constant_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %aptr) {
%a = load i64, i64 addrspace(1)* %aptr, align 8		%a = load i64, i64 addrspace(1)* %aptr, align 8
%and = and i64 %a, 1231231234567		%and = and i64 %a, 1231231234567
store i64 %and, i64 addrspace(1)* %out, align 8		store i64 %and, i64 addrspace(1)* %out, align 8
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_and_multi_use_constant_i64:		; FUNC-LABEL: {{^}}v_and_multi_use_constant_i64:
; SI: buffer_load_dwordx2 v{{\[}}[[LO0:[0-9]+]]:[[HI0:[0-9]+]]{{\]}}		; SI-DAG: buffer_load_dwordx2 v{{\[}}[[LO0:[0-9]+]]:[[HI0:[0-9]+]]{{\]}}
; SI: buffer_load_dwordx2 v{{\[}}[[LO1:[0-9]+]]:[[HI1:[0-9]+]]{{\]}}		; SI-DAG: buffer_load_dwordx2 v{{\[}}[[LO1:[0-9]+]]:[[HI1:[0-9]+]]{{\]}}
		tstellarAMD: Is adding -DAG here really necessary? Since both patterns define variables, I would expect them to be matched in the order they were written.
		artem.tamazov (Author): Yes. For the verde target (but not for tonga), the next two instructions (s_mov_b32 and s_movk_i32) are emitted before these two buffer loads, so the checks miss them. Adding -DAG to the loads fixes that.
; SI-DAG: s_mov_b32 [[KLO:s[0-9]+]], 0xab19b207{{$}}		; SI-DAG: s_mov_b32 [[KLO:s[0-9]+]], 0xab19b207{{$}}
; SI-DAG: s_movk_i32 [[KHI:s[0-9]+]], 0x11e{{$}}		; SI-DAG: s_movk_i32 [[KHI:s[0-9]+]], 0x11e{{$}}
; SI-DAG: v_and_b32_e32 {{v[0-9]+}}, [[KLO]], v[[LO0]]		; SI-DAG: v_and_b32_e32 {{v[0-9]+}}, [[KLO]], v[[LO0]]
; SI-DAG: v_and_b32_e32 {{v[0-9]+}}, [[KHI]], v[[HI0]]		; SI-DAG: v_and_b32_e32 {{v[0-9]+}}, [[KHI]], v[[HI0]]
; SI-DAG: v_and_b32_e32 {{v[0-9]+}}, [[KLO]], v[[LO1]]		; SI-DAG: v_and_b32_e32 {{v[0-9]+}}, [[KLO]], v[[LO1]]
; SI-DAG: v_and_b32_e32 {{v[0-9]+}}, [[KHI]], v[[HI1]]		; SI-DAG: v_and_b32_e32 {{v[0-9]+}}, [[KHI]], v[[HI1]]
; SI: buffer_store_dwordx2		; SI: buffer_store_dwordx2
; SI: buffer_store_dwordx2		; SI: buffer_store_dwordx2
[... 209 lines not shown ...]
		define void @s_and_inline_imm_neg_4.0_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %aptr, i64 %a) {
ret void		ret void
}		}


; Test with the 64-bit integer bitpattern for a 32-bit float in the		; Test with the 64-bit integer bitpattern for a 32-bit float in the
; low 32-bits, which is not a valid 64-bit inline immmediate.		; low 32-bits, which is not a valid 64-bit inline immmediate.

; FUNC-LABEL: {{^}}s_and_inline_imm_f32_4.0_i64:		; FUNC-LABEL: {{^}}s_and_inline_imm_f32_4.0_i64:
; SI: s_load_dwordx2		; SI-DAG: s_load_dwordx2
; SI: s_load_dword s		; SI-DAG: s_load_dword s
; SI-NOT: and		; SI-NOT: and
; SI: s_and_b32 s[[K_HI:[0-9]+]], s{{[0-9]+}}, 4.0		; SI: s_and_b32 s[[K_HI:[0-9]+]], s{{[0-9]+}}, 4.0
; SI-NOT: and		; SI-NOT: and
; SI: buffer_store_dwordx2		; SI: buffer_store_dwordx2
define void @s_and_inline_imm_f32_4.0_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %aptr, i64 %a) {		define void @s_and_inline_imm_f32_4.0_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %aptr, i64 %a) {
%and = and i64 %a, 1082130432		%and = and i64 %a, 1082130432
store i64 %and, i64 addrspace(1)* %out, align 8		store i64 %and, i64 addrspace(1)* %out, align 8
ret void		ret void
[... 42 lines not shown ...]
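The inline exchange about -DAG in and.ll above comes down to where a group of -DAG checks is allowed to start searching: after the last plain check that matched. The following Python sketch models that behaviour in a deliberately simplified way. It assumes line-based substring matching instead of FileCheck's regex and variable handling, and the function name `run_checks` and the sample instruction lists are invented for illustration; they are not real llc output.

```python
def run_checks(lines, checks):
    """Simplified FileCheck model.

    checks is a list of (kind, pattern) with kind "ordered" or "dag".
    An ordered check must match after everything matched so far; checks
    in a DAG group may match in any order, on distinct lines, but only
    after the preceding ordered match.
    """
    start = 0      # first line index that the current directive may match
    used = set()   # lines consumed by the current DAG group
    for kind, pattern in checks:
        if kind == "ordered" and used:
            # A plain check closes the DAG group and continues after it.
            start = max(used) + 1
            used = set()
        hit = next((i for i in range(start, len(lines))
                    if i not in used and pattern in lines[i]), None)
        if hit is None:
            return False
        if kind == "ordered":
            start = hit + 1
        else:
            used.add(hit)
    return True

# Invented instruction orderings standing in for the two targets.
verde_like = ["s_mov_b32 s4, 0xab19b207", "s_movk_i32 s5, 0x11e",
              "buffer_load_dwordx2 v[0:1]", "buffer_load_dwordx2 v[2:3]"]
tonga_like = ["buffer_load_dwordx2 v[0:1]", "buffer_load_dwordx2 v[2:3]",
              "s_mov_b32 s4, 0xab19b207", "s_movk_i32 s5, 0x11e"]

before = [("ordered", "buffer_load_dwordx2"), ("ordered", "buffer_load_dwordx2"),
          ("dag", "s_mov_b32"), ("dag", "s_movk_i32")]
after = [("dag", "buffer_load_dwordx2"), ("dag", "buffer_load_dwordx2"),
         ("dag", "s_mov_b32"), ("dag", "s_movk_i32")]

if __name__ == "__main__":
    print(run_checks(verde_like, before), run_checks(verde_like, after))
    print(run_checks(tonga_like, before), run_checks(tonga_like, after))
```

In this model the original checks fail only on the ordering where the scalar moves precede the loads, which is the situation the author describes for verde; putting the loads into the same -DAG group accepts both orderings.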

test/CodeGen/AMDGPU/atomic_cmp_swap_local.ll

	; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=SICI -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=SICI -check-prefix=GCN -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -check-prefix=SICI -check-prefix=CIVI -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -check-prefix=SICI -check-prefix=CIVI -check-prefix=GCN -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=CIVI -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=CIVI -check-prefix=GCN -check-prefix=FUNC %s

	; FUNC-LABEL: {{^}}lds_atomic_cmpxchg_ret_i32_offset:			; FUNC-LABEL: {{^}}lds_atomic_cmpxchg_ret_i32_offset:
	; GCN: v_mov_b32_e32 [[VCMP:v[0-9]+]], 7			; GCN-DAG: v_mov_b32_e32 [[VCMP:v[0-9]+]], 7
	; SICI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb			; SICI-DAG: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
	; SICI: s_load_dword [[SWAP:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xc			; SICI-DAG: s_load_dword [[SWAP:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xc
	; VI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x2c			; VI-DAG: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x2c
	; VI: s_load_dword [[SWAP:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x30			; VI-DAG: s_load_dword [[SWAP:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x30
	; GCN-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]			; GCN-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]
	; GCN-DAG: v_mov_b32_e32 [[VSWAP:v[0-9]+]], [[SWAP]]			; GCN-DAG: v_mov_b32_e32 [[VSWAP:v[0-9]+]], [[SWAP]]
	; GCN: ds_cmpst_rtn_b32 [[RESULT:v[0-9]+]], [[VPTR]], [[VCMP]], [[VSWAP]] offset:16			; GCN: ds_cmpst_rtn_b32 [[RESULT:v[0-9]+]], [[VPTR]], [[VCMP]], [[VSWAP]] offset:16
	; GCN: s_endpgm			; GCN: s_endpgm
	define void @lds_atomic_cmpxchg_ret_i32_offset(i32 addrspace(1)* %out, i32 addrspace(3)* %ptr, i32 %swap) nounwind {			define void @lds_atomic_cmpxchg_ret_i32_offset(i32 addrspace(1)* %out, i32 addrspace(3)* %ptr, i32 %swap) nounwind {
	%gep = getelementptr i32, i32 addrspace(3)* %ptr, i32 4			%gep = getelementptr i32, i32 addrspace(3)* %ptr, i32 4
	%pair = cmpxchg i32 addrspace(3)* %gep, i32 7, i32 %swap seq_cst monotonic			%pair = cmpxchg i32 addrspace(3)* %gep, i32 7, i32 %swap seq_cst monotonic
	%result = extractvalue { i32, i1 } %pair, 0			%result = extractvalue { i32, i1 } %pair, 0
	store i32 %result, i32 addrspace(1)* %out, align 4			store i32 %result, i32 addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}lds_atomic_cmpxchg_ret_i64_offset:			; FUNC-LABEL: {{^}}lds_atomic_cmpxchg_ret_i64_offset:
	; GCN-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], 7			; GCN-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], 7
	; GCN-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], 0			; GCN-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], 0
	; SICI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb			; SICI-DAG: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
	; SICI: s_load_dwordx2 s{{\[}}[[LOSWAP:[0-9]+]]:[[HISWAP:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd			; SICI-DAG: s_load_dwordx2 s{{\[}}[[LOSWAP:[0-9]+]]:[[HISWAP:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd
	; VI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x2c			; VI-DAG: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x2c
	; VI: s_load_dwordx2 s{{\[}}[[LOSWAP:[0-9]+]]:[[HISWAP:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0x34			; VI-DAG: s_load_dwordx2 s{{\[}}[[LOSWAP:[0-9]+]]:[[HISWAP:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0x34
	; GCN-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]			; GCN-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]
	; GCN-DAG: v_mov_b32_e32 v[[LOSWAPV:[0-9]+]], s[[LOSWAP]]			; GCN-DAG: v_mov_b32_e32 v[[LOSWAPV:[0-9]+]], s[[LOSWAP]]
	; GCN-DAG: v_mov_b32_e32 v[[HISWAPV:[0-9]+]], s[[HISWAP]]			; GCN-DAG: v_mov_b32_e32 v[[HISWAPV:[0-9]+]], s[[HISWAP]]
	; GCN: ds_cmpst_rtn_b64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[VPTR]], v{{\[}}[[LOVCMP]]:[[HIVCMP]]{{\]}}, v{{\[}}[[LOSWAPV]]:[[HISWAPV]]{{\]}} offset:32			; GCN: ds_cmpst_rtn_b64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[VPTR]], v{{\[}}[[LOVCMP]]:[[HIVCMP]]{{\]}}, v{{\[}}[[LOSWAPV]]:[[HISWAPV]]{{\]}} offset:32
	; GCN: buffer_store_dwordx2 [[RESULT]],			; GCN: buffer_store_dwordx2 [[RESULT]],
	; GCN: s_endpgm			; GCN: s_endpgm
	define void @lds_atomic_cmpxchg_ret_i64_offset(i64 addrspace(1)* %out, i64 addrspace(3)* %ptr, i64 %swap) nounwind {			define void @lds_atomic_cmpxchg_ret_i64_offset(i64 addrspace(1)* %out, i64 addrspace(3)* %ptr, i64 %swap) nounwind {
	%gep = getelementptr i64, i64 addrspace(3)* %ptr, i32 4			%gep = getelementptr i64, i64 addrspace(3)* %ptr, i32 4
	[... 55 lines not shown ...]

test/CodeGen/AMDGPU/bswap.ll

	; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s

	declare i32 @llvm.bswap.i32(i32) nounwind readnone			declare i32 @llvm.bswap.i32(i32) nounwind readnone
	declare <2 x i32> @llvm.bswap.v2i32(<2 x i32>) nounwind readnone			declare <2 x i32> @llvm.bswap.v2i32(<2 x i32>) nounwind readnone
	declare <4 x i32> @llvm.bswap.v4i32(<4 x i32>) nounwind readnone			declare <4 x i32> @llvm.bswap.v4i32(<4 x i32>) nounwind readnone
	declare <8 x i32> @llvm.bswap.v8i32(<8 x i32>) nounwind readnone			declare <8 x i32> @llvm.bswap.v8i32(<8 x i32>) nounwind readnone
	declare i64 @llvm.bswap.i64(i64) nounwind readnone			declare i64 @llvm.bswap.i64(i64) nounwind readnone
	declare <2 x i64> @llvm.bswap.v2i64(<2 x i64>) nounwind readnone			declare <2 x i64> @llvm.bswap.v2i64(<2 x i64>) nounwind readnone
	declare <4 x i64> @llvm.bswap.v4i64(<4 x i64>) nounwind readnone			declare <4 x i64> @llvm.bswap.v4i64(<4 x i64>) nounwind readnone

	; FUNC-LABEL: @test_bswap_i32			; FUNC-LABEL: @test_bswap_i32
	; SI: buffer_load_dword [[VAL:v[0-9]+]]			; SI-DAG: buffer_load_dword [[VAL:v[0-9]+]]
	; SI-DAG: v_alignbit_b32 [[TMP0:v[0-9]+]], [[VAL]], [[VAL]], 8			; SI-DAG: v_alignbit_b32 [[TMP0:v[0-9]+]], [[VAL]], [[VAL]], 8
	; SI-DAG: v_alignbit_b32 [[TMP1:v[0-9]+]], [[VAL]], [[VAL]], 24			; SI-DAG: v_alignbit_b32 [[TMP1:v[0-9]+]], [[VAL]], [[VAL]], 24
	; SI-DAG: s_mov_b32 [[K:s[0-9]+]], 0xff00ff			; SI-DAG: s_mov_b32 [[K:s[0-9]+]], 0xff00ff
	; SI: v_bfi_b32 [[RESULT:v[0-9]+]], [[K]], [[TMP1]], [[TMP0]]			; SI: v_bfi_b32 [[RESULT:v[0-9]+]], [[K]], [[TMP1]], [[TMP0]]
	; SI: buffer_store_dword [[RESULT]]			; SI: buffer_store_dword [[RESULT]]
	; SI: s_endpgm			; SI: s_endpgm
	define void @test_bswap_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) nounwind {			define void @test_bswap_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) nounwind {
	%val = load i32, i32 addrspace(1)* %in, align 4			%val = load i32, i32 addrspace(1)* %in, align 4
	[... 94 lines not shown ...]

test/CodeGen/AMDGPU/ctlz.ll

[... 30 lines not shown ...]
		define void @s_ctlz_i32(i32 addrspace(1)* noalias %out, i32 %val) nounwind {
%ctlz = call i32 @llvm.ctlz.i32(i32 %val, i1 false) nounwind readnone		%ctlz = call i32 @llvm.ctlz.i32(i32 %val, i1 false) nounwind readnone
store i32 %ctlz, i32 addrspace(1)* %out, align 4		store i32 %ctlz, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctlz_i32:		; FUNC-LABEL: {{^}}v_ctlz_i32:
; SI: buffer_load_dword [[VAL:v[0-9]+]],		; SI: buffer_load_dword [[VAL:v[0-9]+]],
; SI-DAG: v_ffbh_u32_e32 [[CTLZ:v[0-9]+]], [[VAL]]		; SI-DAG: v_ffbh_u32_e32 [[CTLZ:v[0-9]+]], [[VAL]]
; SI-DAG: v_cmp_eq_i32_e32 vcc, 0, [[CTLZ]]		; FIXME v_ffbh_u32 does not look correct for v_ctlz_i32.
		; SI-DAG: v_cmp_eq_i32_e32 vcc, 0, [[VAL]]
; SI: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], [[CTLZ]], 32, vcc		; SI: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], [[CTLZ]], 32, vcc
; SI: buffer_store_dword [[RESULT]],		; SI: buffer_store_dword [[RESULT]],
; SI: s_endpgm		; SI: s_endpgm

; EG: FFBH_UINT		; EG: FFBH_UINT
; EG: CNDE_INT		; EG: CNDE_INT
define void @v_ctlz_i32(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %valptr) nounwind {		define void @v_ctlz_i32(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %valptr) nounwind {
%val = load i32, i32 addrspace(1)* %valptr, align 4		%val = load i32, i32 addrspace(1)* %valptr, align 4
[... 46 lines not shown ...]
		define void @v_ctlz_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %valptr) nounwind {
%ctlz = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %val, i1 false) nounwind readnone		%ctlz = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %val, i1 false) nounwind readnone
store <4 x i32> %ctlz, <4 x i32> addrspace(1)* %out, align 16		store <4 x i32> %ctlz, <4 x i32> addrspace(1)* %out, align 16
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctlz_i8:		; FUNC-LABEL: {{^}}v_ctlz_i8:
; SI: buffer_load_ubyte [[VAL:v[0-9]+]],		; SI: buffer_load_ubyte [[VAL:v[0-9]+]],
; SI-DAG: v_ffbh_u32_e32 [[FFBH:v[0-9]+]], [[VAL]]		; SI-DAG: v_ffbh_u32_e32 [[FFBH:v[0-9]+]], [[VAL]]
; SI-DAG: v_cmp_eq_i32_e32 vcc, 0, [[CTLZ]]		; SI-DAG: v_cmp_eq_i32_e32 vcc, 0, [[VAL]]
; SI-DAG: v_cndmask_b32_e64 [[CORRECTED_FFBH:v[0-9]+]], [[FFBH]], 32, vcc		; SI-DAG: v_cndmask_b32_e64 [[CORRECTED_FFBH:v[0-9]+]], [[FFBH]], 32, vcc
; SI: v_add_i32_e32 [[RESULT:v[0-9]+]], vcc, 0xffffffe8, [[CORRECTED_FFBH]]		; SI: v_add_i32_e32 [[RESULT:v[0-9]+]], vcc, 0xffffffe8, [[CORRECTED_FFBH]]
; SI: buffer_store_byte [[RESULT]],		; SI: buffer_store_byte [[RESULT]],
define void @v_ctlz_i8(i8 addrspace(1)* noalias %out, i8 addrspace(1)* noalias %valptr) nounwind {		define void @v_ctlz_i8(i8 addrspace(1)* noalias %out, i8 addrspace(1)* noalias %valptr) nounwind {
%val = load i8, i8 addrspace(1)* %valptr		%val = load i8, i8 addrspace(1)* %valptr
%ctlz = call i8 @llvm.ctlz.i8(i8 %val, i1 false) nounwind readnone		%ctlz = call i8 @llvm.ctlz.i8(i8 %val, i1 false) nounwind readnone
store i8 %ctlz, i8 addrspace(1)* %out		store i8 %ctlz, i8 addrspace(1)* %out
ret void		ret void
[... 21 lines not shown ...]
		define void @s_ctlz_i64_trunc(i32 addrspace(1)* noalias %out, i64 %val) nounwind {
%ctlz = call i64 @llvm.ctlz.i64(i64 %val, i1 false)		%ctlz = call i64 @llvm.ctlz.i64(i64 %val, i1 false)
%trunc = trunc i64 %ctlz to i32		%trunc = trunc i64 %ctlz to i32
store i32 %trunc, i32 addrspace(1)* %out		store i32 %trunc, i32 addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctlz_i64:		; FUNC-LABEL: {{^}}v_ctlz_i64:
; SI: {{buffer|flat}}_load_dwordx2 v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]{{\]}}		; SI: {{buffer|flat}}_load_dwordx2 v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]{{\]}}
; SI-DAG: v_cmp_eq_i32_e64 [[CMPHI:s\[[0-9]+:[0-9]+\]]], 0, v[[HI]]
; SI-DAG: v_ffbh_u32_e32 [[FFBH_LO:v[0-9]+]], v[[LO]]		; SI-DAG: v_ffbh_u32_e32 [[FFBH_LO:v[0-9]+]], v[[LO]]
; SI-DAG: v_add_i32_e32 [[ADD:v[0-9]+]], vcc, 32, [[FFBH_LO]]		; SI-DAG: v_add_i32_e32 [[ADD:v[0-9]+]], vcc, 32, [[FFBH_LO]]
; SI-DAG: v_ffbh_u32_e32 [[FFBH_HI:v[0-9]+]], v[[HI]]		; SI-DAG: v_ffbh_u32_e32 [[FFBH_HI:v[0-9]+]], v[[HI]]
; SI-DAG: v_cndmask_b32_e64 v[[CTLZ:[0-9]+]], [[FFBH_HI]], [[ADD]], [[CMPHI]]		; SI-DAG: v_cmp_eq_i32_e{{32|64}} [[CMPHI:s\[[0-9]+:[0-9]+\]|vcc]], 0, v[[HI]]
		; FIXME: Not checked: When CMPHI != vcc, src3 of the next instruction is not verified.
		; FIXME: Reason: regex can not contain variables.
		; FIXME: Alternatively, VI prefix can be used for tonga, but that would
		; FIXME: require duplication of almost all SI checks except this one or
		; FIXME: moving this test to separate .ll file. Both look like overkill.
		; SI-DAG: v_cndmask_b32_e{{64|32}} v[[CTLZ:[0-9]+]], [[FFBH_HI]], [[ADD]]
; SI-DAG: v_or_b32_e32 [[OR:v[0-9]+]], v[[LO]], v[[HI]]		; SI-DAG: v_or_b32_e32 [[OR:v[0-9]+]], v[[LO]], v[[HI]]
; SI-DAG: v_cmp_eq_i32_e32 vcc, 0, [[OR]]		; SI-DAG: v_cmp_eq_i32_e32 vcc, 0, [[OR]]
; SI-DAG: v_cndmask_b32_e64 v[[CLTZ_LO:[0-9]+]], v[[CTLZ:[0-9]+]], 64, vcc		; SI-DAG: v_cndmask_b32_e64 v[[CLTZ_LO:[0-9]+]], v[[CTLZ]], 64, vcc
; SI-DAG: v_mov_b32_e32 v[[CTLZ_HI:[0-9]+]], 0{{$}}		; SI-DAG: v_mov_b32_e32 v[[CTLZ_HI:[0-9]+]], 0{{$}}
; SI: {{buffer|flat}}_store_dwordx2 {{.*}}v{{\[}}[[CLTZ_LO]]:[[CTLZ_HI]]{{\]}}		; SI: {{buffer|flat}}_store_dwordx2 {{.*}}v{{\[}}[[CLTZ_LO]]:[[CTLZ_HI]]{{\]}}
define void @v_ctlz_i64(i64 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %in) nounwind {		define void @v_ctlz_i64(i64 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %in) nounwind {
%tid = call i32 @llvm.r600.read.tidig.x()		%tid = call i32 @llvm.r600.read.tidig.x()
%in.gep = getelementptr i64, i64 addrspace(1)* %in, i32 %tid		%in.gep = getelementptr i64, i64 addrspace(1)* %in, i32 %tid
%out.gep = getelementptr i64, i64 addrspace(1)* %out, i32 %tid		%out.gep = getelementptr i64, i64 addrspace(1)* %out, i32 %tid
%val = load i64, i64 addrspace(1)* %in.gep		%val = load i64, i64 addrspace(1)* %in.gep
%ctlz = call i64 @llvm.ctlz.i64(i64 %val, i1 false)		%ctlz = call i64 @llvm.ctlz.i64(i64 %val, i1 false)
[... 114 lines not shown ...]
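The updated v_ctlz_i64 check in ctlz.ll above captures the compare result with an alternation, so that CMPHI can be either an SGPR pair or vcc depending on the encoding. As a loose analogy only, the Python sketch below uses the `re` module to show what such a capture binds in each case. The pattern is a hand translation of the FileCheck line, and the sample assembly strings are invented, not taken from llc output.

```python
import re

# Hand-translated analogue of:
#   v_cmp_eq_i32_e{{32|64}} [[CMPHI:s\[[0-9]+:[0-9]+\]|vcc]], 0, v[[HI]]
# HI is captured here instead of being reused, to keep the sketch self-contained.
CMP = re.compile(
    r"v_cmp_eq_i32_e(?:32|64) "
    r"(?P<CMPHI>s\[[0-9]+:[0-9]+\]|vcc), 0, v(?P<HI>[0-9]+)"
)

def captured_condition(asm_line):
    """Return what CMPHI would bind to, or None if the line does not match."""
    m = CMP.search(asm_line)
    return m.group("CMPHI") if m else None

if __name__ == "__main__":
    # Invented sample lines for the two encodings.
    print(captured_condition("v_cmp_eq_i32_e64 s[0:1], 0, v1"))
    print(captured_condition("v_cmp_eq_i32_e32 vcc, 0, v1"))
```

Because the following v_cndmask check in the patch no longer mentions CMPHI, the captured value goes unverified when it is an SGPR pair, which is the limitation the FIXME comments record.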

test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

[... 26 lines not shown ...]
		define void @load_v2i8_to_v2f32(<2 x float> addrspace(1)* noalias %out, <2 x i8> addrspace(1)* noalias %in) nounwind {
%cvt = uitofp <2 x i8> %load to <2 x float>		%cvt = uitofp <2 x i8> %load to <2 x float>
store <2 x float> %cvt, <2 x float> addrspace(1)* %out, align 16		store <2 x float> %cvt, <2 x float> addrspace(1)* %out, align 16
ret void		ret void
}		}

; SI-LABEL: {{^}}load_v3i8_to_v3f32:		; SI-LABEL: {{^}}load_v3i8_to_v3f32:
; SI-NOT: bfe		; SI-NOT: bfe
; SI-NOT: v_cvt_f32_ubyte3_e32		; SI-NOT: v_cvt_f32_ubyte3_e32
; SI-DAG: v_cvt_f32_ubyte2_e32		; SI-DAG: v_cvt_f32_ubyte2_e32 v[[WORD2:[0-9]+]]
; SI-DAG: v_cvt_f32_ubyte1_e32		; SI-DAG: v_cvt_f32_ubyte1_e32 v[[WORD1:[0-9]+]]
; SI-DAG: v_cvt_f32_ubyte0_e32		; SI-DAG: v_cvt_f32_ubyte0_e32 v[[WORD0:[0-9]+]]
; SI: buffer_store_dwordx2 v{{\[}}[[LORESULT]]:[[HIRESULT]]{{\]}},		; SI: buffer_store_dword v[[WORD2]],
		; SI: buffer_store_dwordx2 v{{\[}}[[WORD0]]:[[WORD1]]{{\]}},
define void @load_v3i8_to_v3f32(<3 x float> addrspace(1)* noalias %out, <3 x i8> addrspace(1)* noalias %in) nounwind {		define void @load_v3i8_to_v3f32(<3 x float> addrspace(1)* noalias %out, <3 x i8> addrspace(1)* noalias %in) nounwind {
%load = load <3 x i8>, <3 x i8> addrspace(1)* %in, align 4		%load = load <3 x i8>, <3 x i8> addrspace(1)* %in, align 4
%cvt = uitofp <3 x i8> %load to <3 x float>		%cvt = uitofp <3 x i8> %load to <3 x float>
store <3 x float> %cvt, <3 x float> addrspace(1)* %out, align 16		store <3 x float> %cvt, <3 x float> addrspace(1)* %out, align 16
ret void		ret void
}		}

; SI-LABEL: {{^}}load_v4i8_to_v4f32:		; SI-LABEL: {{^}}load_v4i8_to_v4f32:
[... 144 lines not shown ...]

test/CodeGen/AMDGPU/madak.ll

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=GCN %s		; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=GCN %s
; XUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=GCN %s		; XUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=GCN %s

; FIXME: Enable VI		; FIXME: Enable VI

declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone		declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
declare float @llvm.fabs.f32(float) nounwind readnone		declare float @llvm.fabs.f32(float) nounwind readnone

; GCN-LABEL: {{^}}madak_f32:		; GCN-LABEL: {{^}}madak_f32:
; GCN: buffer_load_dword [[VA:v[0-9]+]]		; GCN-DAG: s_load_dwordx2 [[SA_LO:s\[[0-9]+:]]{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0xb
; GCN: buffer_load_dword [[VB:v[0-9]+]]		; GCN-DAG: s_load_dwordx2 [[SB_LO:s\[[0-9]+:]]{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0xd
		; GCN-DAG: buffer_load_dword [[VA:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, [[SA_LO]]{{[0-9]+\]}}
		; GCN-DAG: buffer_load_dword [[VB:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, [[SB_LO]]{{[0-9]+\]}}
; GCN: v_madak_f32_e32 {{v[0-9]+}}, [[VA]], [[VB]], 0x41200000		; GCN: v_madak_f32_e32 {{v[0-9]+}}, [[VA]], [[VB]], 0x41200000
define void @madak_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a, float addrspace(1)* noalias %in.b) nounwind {		define void @madak_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a, float addrspace(1)* noalias %in.b) nounwind {
%tid = tail call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone		%tid = tail call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
%in.a.gep = getelementptr float, float addrspace(1)* %in.a, i32 %tid		%in.a.gep = getelementptr float, float addrspace(1)* %in.a, i32 %tid
%in.b.gep = getelementptr float, float addrspace(1)* %in.b, i32 %tid		%in.b.gep = getelementptr float, float addrspace(1)* %in.b, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid

%a = load float, float addrspace(1)* %in.a.gep, align 4		%a = load float, float addrspace(1)* %in.a.gep, align 4
[... 56 lines not shown ...]
		define void @madak_m_inline_imm_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a) nounwind {
store float %madak, float addrspace(1)* %out.gep, align 4		store float %madak, float addrspace(1)* %out.gep, align 4
ret void		ret void
}		}

; Make sure nothing weird happens with a value that is also allowed as		; Make sure nothing weird happens with a value that is also allowed as
; an inline immediate.		; an inline immediate.

; GCN-LABEL: {{^}}madak_inline_imm_f32:		; GCN-LABEL: {{^}}madak_inline_imm_f32:
; GCN: buffer_load_dword [[VA:v[0-9]+]]		; GCN: s_load_dwordx2
; GCN: buffer_load_dword [[VB:v[0-9]+]]		; GCN: s_load_dwordx2
		; GCN-DAG: buffer_load_dword [[VA:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, s[4:7]
		; GCN-DAG: buffer_load_dword [[VB:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, s[8:11]
; GCN: v_mad_f32 {{v[0-9]+}}, [[VA]], [[VB]], 4.0		; GCN: v_mad_f32 {{v[0-9]+}}, [[VA]], [[VB]], 4.0
define void @madak_inline_imm_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a, float addrspace(1)* noalias %in.b) nounwind {		define void @madak_inline_imm_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a, float addrspace(1)* noalias %in.b) nounwind {
%tid = tail call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone		%tid = tail call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
%in.a.gep = getelementptr float, float addrspace(1)* %in.a, i32 %tid		%in.a.gep = getelementptr float, float addrspace(1)* %in.a, i32 %tid
%in.b.gep = getelementptr float, float addrspace(1)* %in.b, i32 %tid		%in.b.gep = getelementptr float, float addrspace(1)* %in.b, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid

%a = load float, float addrspace(1)* %in.a.gep, align 4		%a = load float, float addrspace(1)* %in.a.gep, align 4
[... 100 lines not shown ...]

Revision Contents

Diff 49661

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp

lib/Target/AMDGPU/SIRegisterInfo.cpp

lib/Target/AMDGPU/SIRegisterInfo.td

test/CodeGen/AMDGPU/and.ll

test/CodeGen/AMDGPU/atomic_cmp_swap_local.ll

test/CodeGen/AMDGPU/bswap.ll

test/CodeGen/AMDGPU/ctlz.ll

test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

test/CodeGen/AMDGPU/madak.ll
