This is an archive of the discontinued LLVM Phabricator instance.

Differential D59393

[NVPTX] generate correct MMA instruction mnemonics with PTX63+.
ClosedPublic

Authored by tra on Mar 14 2019, 3:17 PM.

Details

Reviewers

serge-sans-paille
timshen

Commits

rG8d825b38ed2c: [NVPTX] generate correct MMA instruction mnemonics with PTX63+.
rL359246: [NVPTX] generate correct MMA instruction mnemonics with PTX63+.

Summary

PTX 6.3 requires ".aligned" in the MMA instruction names.
To generate the correct name, we now pass the current
PTX version to each instruction as an extra constant operand,
and InstPrinter adjusts its output accordingly.
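
As a rough illustration of the naming rule in the summary, the following Python sketch builds a `wmma.load` mnemonic for a given PTX version. The function name `wmma_load_mnemonic` and its parameters are hypothetical; only the position of `.aligned` and the version threshold of 63 are taken from the patch.

```python
def wmma_load_mnemonic(ptx_version, frag="a", layout="row",
                       geom="m16n16k16", space="", elt_type="f16"):
    """Build a wmma.load mnemonic following the patched AsmString layout."""
    # ".aligned" is emitted only for PTX 6.3 and later (encoded as 63).
    aligned = ".aligned" if ptx_version >= 63 else ""
    return "wmma.load.%s.sync%s.%s.%s%s.%s" % (
        frag, aligned, layout, geom, space, elt_type)


print(wmma_load_mnemonic(61))
print(wmma_load_mnemonic(63, space=".shared"))
```

With PTX 6.1 the mnemonic stays in the old form, while PTX 6.3 inserts `.aligned` directly after `.sync`.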

Diff Detail

Repository: rL LLVM

Event Timeline

tra created this revision. Mar 14 2019, 3:17 PM

Herald added a reviewer: serge-sans-paille. Mar 14 2019, 3:17 PM

Herald added a project: Restricted Project.

Herald added subscribers: jdoerfert, bixia, hiraditya and 2 others.

tra added a parent revision: D59389: [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC. Mar 14 2019, 3:17 PM

Harbormaster completed remote builds in B29180: Diff 190741. Mar 14 2019, 3:19 PM

jlebar edited reviewers, added: timshen; removed: jlebar. Mar 14 2019, 4:17 PM

jlebar added a subscriber: jlebar.

timshen added inline comments. Mar 18 2019, 4:50 PM

llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
53 ↗	(On Diff #190741)	Rebase onto D59389?
llvm/test/CodeGen/NVPTX/wmma.py
244 ↗	(On Diff #190741)	Who is supposed to run this script? Can we check in the result of this script and make it part of the regression tests? Relatedly, for other backends we have a framework for it. See `llvm/utils/update_llc_test_checks.py`. The generated file looks like `llvm/test/CodeGen/PowerPC/atomics-regression.ll`. One of the advantages of checking in the generated file is that any succeeding behavioral changes are reflected in the patch.

timshen added inline comments. Mar 18 2019, 4:53 PM

llvm/test/CodeGen/NVPTX/wmma.py
244 ↗	(On Diff #190741)

> Who is supposed to run this script?

I guess I can answer this part - lit. Still, it'd be great to check in the generated .ll files with RUN lines in them.

Rebased on updated D59389

Harbormaster completed remote builds in B29319: Diff 191213. Mar 18 2019, 5:09 PM

tra marked an inline comment as done. Mar 18 2019, 5:15 PM

tra added inline comments.

llvm/test/CodeGen/NVPTX/wmma.py
244 ↗	(On Diff #190741)

The script is executed by lit, which then runs llc on the generated output and checks the resulting PTX.

I'm not convinced that committing generated .ll has much value -- it's in the ballpark of a megabyte of uninteresting boilerplate mostly consisting of enumerating 4- and 8-tuples of arguments and results. Upcoming changes will bring more supported types and will multiply the amount of generated ll without making it any more interesting for humans.

e.g just one function out of *a lot*:

declare {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @llvm.nvvm.wmma.m16n16k16.load.a.row.f16.p3i8(i8 addrspace(3)* %src );

; CHECK-LABEL: .func {{.*}}test_llvm_nvvm_wmma_m16n16k16_load_a_row_f16_p3i8(
define {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @test_llvm_nvvm_wmma_m16n16k16_load_a_row_f16_p3i8(i8 addrspace(3)* %src ) {
; CHECK: wmma.load.a.sync.aligned.row.m16n16k16.shared.f16
; CHECK: {{{%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+}}}
; CHECK: [%rd{{[0-9]+}}]
  %v0 = call {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @llvm.nvvm.wmma.m16n16k16.load.a.row.f16.p3i8(i8 addrspace(3)* %src );
  ret {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} %v0;
}

It's easy enough to generate or grab the one done by lit -- the name would be right there in the failing command.
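
To make the "generate the .ll on demand" workflow concrete, here is a minimal sketch of how a generator in the style of wmma.py can emit one test function together with its CHECK line. The template, the helper name `make_load_test`, and the fixed `p0i8` generic-pointer suffix are simplifying assumptions for illustration; the real script handles address spaces, strides, and operand checks that are omitted here.

```python
from string import Template

# Hypothetical, heavily reduced template; the real wmma.py templates also
# cover address spaces, strides and operand CHECK lines.
LOAD_TEMPLATE = Template("""\
declare ${ret_ty} @${intrinsic}(i8* %src);

; CHECK-LABEL: .func {{.*}}test_${name}(
define ${ret_ty} @test_${name}(i8* %src) {
; CHECK: ${instruction}
  %v0 = call ${ret_ty} @${intrinsic}(i8* %src);
  ret ${ret_ty} %v0;
}
""")


def make_load_test(geom, frag, layout, itype):
    """Emit IR plus CHECK lines for one generic-address-space wmma.load."""
    # Same element-type and element-count rules as make_wmma_slice_ty.
    elt_ty = "<2 x half>" if itype == "f16" else "float"
    num_elts = 4 if frag in "cd" and itype == "f16" else 8
    ret_ty = "{%s}" % ", ".join([elt_ty] * num_elts)
    # Assumption: "p0i8" names the generic address space overload.
    intrinsic = "llvm.nvvm.wmma.%s.load.%s.%s.%s.p0i8" % (
        geom, frag, layout, itype)
    instruction = "wmma.load.%s.sync.%s.%s.%s" % (frag, layout, geom, itype)
    return LOAD_TEMPLATE.substitute(
        ret_ty=ret_ty,
        intrinsic=intrinsic,
        name=intrinsic.replace(".", "_"),
        instruction=instruction)


print(make_load_test("m16n16k16", "a", "row", "f16"))
```

Because the IR and the expected mnemonic come from the same parameters, the generated file can be reproduced at any time instead of being committed.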

timshen accepted this revision. Mar 18 2019, 5:44 PM

timshen added inline comments.

llvm/test/CodeGen/NVPTX/wmma.py
244 ↗	(On Diff #190741)

> I'm not convinced that committing generated .ll has much value -- it's in the ballpark of a megabyte of uninteresting boilerplate mostly consisting of enumerating 4- and 8-tuples of arguments and results. Upcoming changes will bring more supported types and will multiply the amount of generated ll without making it any more interesting for humans.

test/CodeGen/X86 already does this:

...src/llvm-project/llvm % wc -c `grep 'autogenerated by utils/update_llc' -r test/CodeGen/X86 -l` | tail -1
37287954 total

It has 37MB of autogenerated .ll files, presumably for its massive intrinsics.

> e.g just one function out of a lot:

Compared to `test/CodeGen/X86/avx512vl-vec-masked-cmp.ll` it isn't that bad. ;)

However, I realized that `wmma.py` is somewhat different from `utils/update_llc_test_checks.py`.

What `utils/update_llc_test_checks.py` does:
- Run llc on the arbitrary input IR and get the asm output.
- Use regex replacement to turn the asm into CHECK lines. The regexes are different for different targets.
- Print out the .ll file with those CHECK lines.

What `wmma.py` does:
- Enumerate all possible combinations of wmma IR inputs.
- Generate the CHECK lines directly using the same wmma-specific knowledge that generates the IR.
- Print out the .ll file with the CHECK lines.

The key difference is that `update_llc_test_checks.py` won't be wmma-specific.

Another crucial difference is that wmma.py generates very generic check-lines like `[%rd{{[0-9]+}}]`, while `update_llc_test_checks.py` usually prints out the exact literal it extracts from the asm result, e.g. `%rd1`. As a result, wmma.py's output isn't as readable as I thought it would be (fewer literals), so I'm fine without checking in the wmma.py-generated files.

However, I encourage some of the NVPTX contributors (!) to add NVPTX support to `update_llc_test_checks.py`. With that, we could have supported wmma.py almost for free, along with all other kinds of PTX regression tests.
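
The difference between exact literals and generic check-lines can be illustrated with a small regex rewrite. This is only a sketch of the idea; it is not how either `update_llc_test_checks.py` or `wmma.py` is implemented, and the helper name `generalize_registers` is made up for this example.

```python
import re


def generalize_registers(asm_line):
    """Replace numbered PTX virtual registers with FileCheck regex blocks."""
    # "%rd1" becomes "%rd{{[0-9]+}}"; the register class prefix is kept.
    return re.sub(r"(%[a-z]+)[0-9]+", r"\1{{[0-9]+}}", asm_line)


print(generalize_registers("[%rd1]"))
print(generalize_registers("{%hh1, %hh2}"))
```

A check line in the exact form pins the register allocation, while the generalized form only pins the register class.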

This revision is now accepted and ready to land. Mar 18 2019, 5:44 PM

tra marked an inline comment as done. Mar 19 2019, 11:26 AM

tra added inline comments.

llvm/test/CodeGen/NVPTX/wmma.py
244 ↗	(On Diff #190741)	IIUIC, `update_llc_test_checks.py` effectively freezes the output generated by llc now so it can be checked for regressions later. The wmma.py use case is different, at least for me -- I use it as a way to create the reference output that llc can't generate yet and then use it to make sure my NVPTX back-end changes do the right thing.

That said, once the back-end functionality is implemented, it becomes just a 'compare to the reference' test and the task of generating CHECK lines can indeed be offloaded to `update_llc_test_checks.py`. I'll think of splitting these two use cases. Perhaps I should keep the script to aid with development, but, once it's done, generate reference .ll with implemented intrinsics and let `update_llc_test_checks.py` generate the checks for generated PTX.

tra added a child revision: D60015: [NVPTX] Added intrinsics/instructions for MMA ops on (sub-)integers. Apr 4 2019, 11:31 AM

Closed by commit rL359246: [NVPTX] generate correct MMA instruction mnemonics with PTX63+. (authored by tra). Apr 25 2019, 3:26 PM

This revision was automatically updated to reflect the committed changes.

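Revision Contents

Path	Size
llvm/trunk/lib/Target/NVPTX/InstPrinter/NVPTXInstPrinter.h	2 lines
llvm/trunk/lib/Target/NVPTX/InstPrinter/NVPTXInstPrinter.cpp	14 lines
llvm/trunk/lib/Target/NVPTX/NVPTXInstrInfo.td	4 lines
llvm/trunk/lib/Target/NVPTX/NVPTXIntrinsics.td	279 lines
llvm/trunk/test/CodeGen/NVPTX/wmma.py	17 lines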
Diff 196736

llvm/trunk/lib/Target/NVPTX/InstPrinter/NVPTXInstPrinter.h

Show All 34 Lines	public:

void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printCvtMode(const MCInst *MI, int OpNum, raw_ostream &O,		void printCvtMode(const MCInst *MI, int OpNum, raw_ostream &O,
const char *Modifier = nullptr);		const char *Modifier = nullptr);
void printCmpMode(const MCInst *MI, int OpNum, raw_ostream &O,		void printCmpMode(const MCInst *MI, int OpNum, raw_ostream &O,
const char *Modifier = nullptr);		const char *Modifier = nullptr);
void printLdStCode(const MCInst *MI, int OpNum,		void printLdStCode(const MCInst *MI, int OpNum,
raw_ostream &O, const char *Modifier = nullptr);		raw_ostream &O, const char *Modifier = nullptr);
		void printMmaCode(const MCInst *MI, int OpNum, raw_ostream &O,
		const char *Modifier = nullptr);
void printMemOperand(const MCInst *MI, int OpNum,		void printMemOperand(const MCInst *MI, int OpNum,
raw_ostream &O, const char *Modifier = nullptr);		raw_ostream &O, const char *Modifier = nullptr);
void printProtoIdent(const MCInst *MI, int OpNum,		void printProtoIdent(const MCInst *MI, int OpNum,
raw_ostream &O, const char *Modifier = nullptr);		raw_ostream &O, const char *Modifier = nullptr);
};		};

}		}

#endif		#endif

llvm/trunk/lib/Target/NVPTX/InstPrinter/NVPTXInstPrinter.cpp

Show First 20 Lines • Show All 263 Lines • ▼ Show 20 Lines	if (!strcmp(Modifier, "volatile")) {
else if (Imm == NVPTX::PTXLdStInstCode::V4)		else if (Imm == NVPTX::PTXLdStInstCode::V4)
O << ".v4";		O << ".v4";
} else		} else
llvm_unreachable("Unknown Modifier");		llvm_unreachable("Unknown Modifier");
} else		} else
llvm_unreachable("Empty Modifier");		llvm_unreachable("Empty Modifier");
}		}

		void NVPTXInstPrinter::printMmaCode(const MCInst *MI, int OpNum, raw_ostream &O,
		const char *Modifier) {
		const MCOperand &MO = MI->getOperand(OpNum);
		int Imm = (int)MO.getImm();
		if (Modifier == nullptr || strcmp(Modifier, "version") == 0) {
		O << Imm; // Just print out PTX version
		} else if (strcmp(Modifier, "aligned") == 0) {
		// PTX63 requires '.aligned' in the name of the instruction.
		if (Imm >= 63)
		O << ".aligned";
		} else
		llvm_unreachable("Unknown Modifier");
		}

void NVPTXInstPrinter::printMemOperand(const MCInst *MI, int OpNum,		void NVPTXInstPrinter::printMemOperand(const MCInst *MI, int OpNum,
raw_ostream &O, const char *Modifier) {		raw_ostream &O, const char *Modifier) {
printOperand(MI, OpNum, O);		printOperand(MI, OpNum, O);

if (Modifier && !strcmp(Modifier, "add")) {		if (Modifier && !strcmp(Modifier, "add")) {
O << ", ";		O << ", ";
printOperand(MI, OpNum + 1, O);		printOperand(MI, OpNum + 1, O);
} else {		} else {
Show All 16 Lines
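
The dispatch performed by `printMmaCode` above can be modeled in a few lines of Python. This is a behavioral sketch only: the function name `print_mma_code` is hypothetical, it returns the text instead of writing to a stream, and it raises `ValueError` where the C++ code calls `llvm_unreachable`.

```python
def print_mma_code(imm, modifier=None):
    """Return what printMmaCode would append for operand value `imm`."""
    if modifier is None or modifier == "version":
        # No modifier (or "version"): print the PTX version itself.
        return str(imm)
    if modifier == "aligned":
        # PTX 6.3+ requires ".aligned" in the instruction name.
        return ".aligned" if imm >= 63 else ""
    raise ValueError("Unknown Modifier: %r" % (modifier,))


print(repr(print_mma_code(61, "aligned")))
print(repr(print_mma_code(63, "aligned")))
```

In the TableGen patterns this corresponds to the `${ptx:aligned}` fragment of the AsmString, where `ptx` is the extra operand and `aligned` is the modifier.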

llvm/trunk/lib/Target/NVPTX/NVPTXInstrInfo.td

	Show First 20 Lines • Show All 1,542 Lines • ▼ Show 20 Lines
	def imemAny : Operand<iPTRAny> {			def imemAny : Operand<iPTRAny> {
	let PrintMethod = "printOperand";			let PrintMethod = "printOperand";
	}			}

	def LdStCode : Operand<i32> {			def LdStCode : Operand<i32> {
	let PrintMethod = "printLdStCode";			let PrintMethod = "printLdStCode";
	}			}

				def MmaCode : Operand<i32> {
				let PrintMethod = "printMmaCode";
				}

	def SDTWrapper : SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>, SDTCisPtrTy<0>]>;			def SDTWrapper : SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>, SDTCisPtrTy<0>]>;
	def Wrapper : SDNode<"NVPTXISD::Wrapper", SDTWrapper>;			def Wrapper : SDNode<"NVPTXISD::Wrapper", SDTWrapper>;

	// Load a memory address into a u32 or u64 register.			// Load a memory address into a u32 or u64 register.
	def MOV_ADDR : NVPTXInst<(outs Int32Regs:$dst), (ins imem:$a),			def MOV_ADDR : NVPTXInst<(outs Int32Regs:$dst), (ins imem:$a),
	"mov.u32 \t$dst, $a;",			"mov.u32 \t$dst, $a;",
	[(set Int32Regs:$dst, (Wrapper tglobaladdr:$a))]>;			[(set Int32Regs:$dst, (Wrapper tglobaladdr:$a))]>;
	def MOV_ADDR64 : NVPTXInst<(outs Int64Regs:$dst), (ins imem:$a),			def MOV_ADDR64 : NVPTXInst<(outs Int64Regs:$dst), (ins imem:$a),
	▲ Show 20 Lines • Show All 1,573 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/NVPTX/NVPTXIntrinsics.td

Show All 32 Lines	def AS_match {
code shared = [{		code shared = [{
return ChkMemSDNodeAddressSpace(N, llvm::ADDRESS_SPACE_SHARED);		return ChkMemSDNodeAddressSpace(N, llvm::ADDRESS_SPACE_SHARED);
}];		}];
code global = [{		code global = [{
return ChkMemSDNodeAddressSpace(N, llvm::ADDRESS_SPACE_GLOBAL);		return ChkMemSDNodeAddressSpace(N, llvm::ADDRESS_SPACE_GLOBAL);
}];		}];
}		}

		// A node that will be replaced with the current PTX version.
		class PTX {
		SDNodeXForm PTXVerXform = SDNodeXForm<imm, [{
		return getI32Imm(Subtarget->getPTXVersion(), SDLoc(N));
		}]>;
		// (i32 0) will be XForm'ed to the currently used PTX version.
		dag version = (PTXVerXform (i32 0));
		}
		def ptx : PTX;

		// Generates list of n sequential register names.
		// E.g. RegNames<3,"r">.ret -> ["r0", "r1", "r2" ]
		class RegSeq<int n, string prefix> {
		list<string> ret = !if(n, !listconcat(RegSeq<!add(n,-1), prefix>.ret,
		[prefix # !add(n, -1)]),
		[]);
		}

//-----------------------------------		//-----------------------------------
// Synchronization and shuffle functions		// Synchronization and shuffle functions
//-----------------------------------		//-----------------------------------
let isConvergent = 1 in {		let isConvergent = 1 in {
def INT_BARRIER0 : NVPTXInst<(outs), (ins),		def INT_BARRIER0 : NVPTXInst<(outs), (ins),
"bar.sync \t0;",		"bar.sync \t0;",
[(int_nvvm_barrier0)]>;		[(int_nvvm_barrier0)]>;
def INT_BARRIERN : NVPTXInst<(outs), (ins Int32Regs:$src1),		def INT_BARRIERN : NVPTXInst<(outs), (ins Int32Regs:$src1),
▲ Show 20 Lines • Show All 7,330 Lines • ▼ Show 20 Lines
def INT_PTX_SREG_PM3 : PTX_READ_SREG_R32<"pm3", int_nvvm_read_ptx_sreg_pm3>;		def INT_PTX_SREG_PM3 : PTX_READ_SREG_R32<"pm3", int_nvvm_read_ptx_sreg_pm3>;

// TODO: It would be nice to use PTX_READ_SREG here, but it doesn't		// TODO: It would be nice to use PTX_READ_SREG here, but it doesn't
// handle the constant.		// handle the constant.
def INT_PTX_SREG_WARPSIZE :		def INT_PTX_SREG_WARPSIZE :
NVPTXInst<(outs Int32Regs:$dst), (ins), "mov.u32 \t$dst, WARP_SZ;",		NVPTXInst<(outs Int32Regs:$dst), (ins), "mov.u32 \t$dst, WARP_SZ;",
[(set Int32Regs:$dst, (int_nvvm_read_ptx_sreg_warpsize))]>;		[(set Int32Regs:$dst, (int_nvvm_read_ptx_sreg_warpsize))]>;

class EmptyNVPTXInst : NVPTXInst<(outs), (ins), "?", []>;
// Generates list of n sequential register names.
class RegSeq<int n, string prefix> {
list<string> ret = !if(n, !listconcat(RegSeq<!add(n,-1), prefix>.ret,
[prefix # !add(n, -1)]),
[]);
}

// Helper class that represents a 'fragment' of an NVPTX *MMA instruction.		// Helper class that represents a 'fragment' of an NVPTX *MMA instruction.
// In addition to target-independent fields provided by WMMA_REGS, it adds		// In addition to target-independent fields provided by WMMA_REGS, it adds
// the fields commonly used to implement specific PTX instruction -- register		// the fields commonly used to implement specific PTX instruction -- register
// types and names, constraints, parts of assembly, etc.		// types and names, constraints, parts of assembly, etc.
class WMMA_REGINFO<string Geom, string Frag, string PtxEltType>		class WMMA_REGINFO<string Geom, string Frag, string PtxEltType>
: WMMA_REGS<Geom, Frag, PtxEltType> {		: WMMA_REGS<Geom, Frag, PtxEltType> {
// NVPTX register types used to carry fragment data.		// NVPTX register types used to carry fragment data.
NVPTXRegClass regclass = !cond(		NVPTXRegClass regclass = !cond(
!eq(PtxEltType, "f16") : Float16x2Regs,		!eq(PtxEltType, "f16") : Float16x2Regs,
!eq(PtxEltType, "f32") : Float32Regs);		!eq(PtxEltType, "f32") : Float32Regs);

// Instruction input/output arguments for the fragment.		// Instruction input/output arguments for the fragment.
list<NVPTXRegClass> ptx_regs = !foreach(tmp, regs, regclass);		list<NVPTXRegClass> ptx_regs = !foreach(tmp, regs, regclass);

// List of register names for the fragment -- ["ra0", "ra1",...]		// List of register names for the fragment -- ["ra0", "ra1",...]
list<string> reg_names = RegSeq<!size(ptx_regs), "r"#frag>.ret;		list<string> reg_names = RegSeq<!size(ptx_regs), "r"#frag>.ret;

// Generates "{{$r0, $r1,.... $rN-1}}" for use in asm string construction.		// Generates "{{$r0, $r1,.... $rN-1}}" for use in asm string construction.
string regstring = "{{$" # !head(reg_names)		string regstring = "{{$" # !head(reg_names)
# !foldl("", !tail(reg_names), a, b,		# !foldl("", !tail(reg_names), a, b,
!strconcat(a, ", $", b))		!strconcat(a, ", $", b))
# "}}";		# "}}";

// Predicates for particular fragment variant. Technically those are		// Predicates for particular fragment variant. Technically those are
// per-instruction predicates, but currently all fragments that can be used in		// per-instruction predicates, but currently all fragments that can be used in
Show All 13 Lines	!and(!or(!eq(Geom, "m8n32k16"),
!or(!eq(PtxEltType, "f16"),		!or(!eq(PtxEltType, "f16"),
!eq(PtxEltType, "f32"))) : [hasSM70, hasPTX61]);		!eq(PtxEltType, "f32"))) : [hasSM70, hasPTX61]);

// template DAGs for instruction inputs/output.		// template DAGs for instruction inputs/output.
dag Outs = !dag(outs, ptx_regs, reg_names);		dag Outs = !dag(outs, ptx_regs, reg_names);
dag Ins = !dag(ins, ptx_regs, reg_names);		dag Ins = !dag(ins, ptx_regs, reg_names);
}		}

class BuildPattern<dag Outs, PatFrag IntrMatcher, dag Ins> {		// Convert dag of arguments into a dag to match given intrinsic.
		class BuildPatternI<Intrinsic Intr, dag Ins> {
		// Build a dag pattern that matches the intrinsic call.
		dag ret = !foreach(tmp, Ins,
		!subst(imem, ADDRvar,
		!subst(MEMri64, ADDRri64,
		!subst(MEMri, ADDRri,
		!subst(ins, Intr, tmp)))));
		}

		// Same as above, but uses PatFrag instead of an Intrinsic.
		class BuildPatternPF<PatFrag Intr, dag Ins> {
// Build a dag pattern that matches the intrinsic call.		// Build a dag pattern that matches the intrinsic call.
// We want a dag that looks like this:		dag ret = !foreach(tmp, Ins,
// (set <output args>, (intrinsic <input arguments>)) where input and
// output arguments are named patterns that would match corresponding
// input/output arguments of the instruction.
//
// First we construct (set <output arguments>) from instruction's outs dag by
// replacing dag operator 'outs' with 'set'.
dag PatOuts = !foreach(tmp, Outs, !subst(outs, set, tmp));
// Similarly, construct (intrinsic <input arguments>) sub-dag from
// instruction's input arguments, only now we also need to replace operands
// with patterns that would match them and the operator 'ins' with the
// intrinsic.
dag PatArgs = !foreach(tmp, Ins,
!subst(imem, ADDRvar,		!subst(imem, ADDRvar,
!subst(MEMri64, ADDRri64,		!subst(MEMri64, ADDRri64,
!subst(MEMri, ADDRri,		!subst(MEMri, ADDRri,
!subst(ins, IntrMatcher, tmp)))));		!subst(ins, Intr, tmp)))));
// Finally, consatenate both parts together. !con() requires both dags to have		}
// the same operator, so we wrap PatArgs in a (set ...) dag.
dag ret = !con(PatOuts, (set PatArgs));		// Common WMMA-related fields used for building patterns for all MMA instructions.
		class WMMA_INSTR<string _Intr, list<dag> _Args>
		: NVPTXInst<(outs), (ins), "?", []> {
		Intrinsic Intr = !cast<Intrinsic>(_Intr);
		// Concatenate all arguments into a single dag.
		dag Args = !foldl((ins), _Args, a, b, !con(a,b));
		// Pre-build the pattern to match (intrinsic arg0, arg1, ...).
		dag IntrinsicPattern = BuildPatternI<!cast<Intrinsic>(Intr), Args>.ret;
}		}

//		//
// wmma.load.[a|b|c].sync.[row|col].m16n16k16[|.global|.shared].[f16|f32]		// wmma.load.[a|b|c].sync.[row|col].m16n16k16[|.global|.shared].[f16|f32]
//		//

class WMMA_LOAD_INTR_HELPER<WMMA_REGINFO Frag, string Layout, string Space,
bit WithStride>
: PatFrag <(ops),(ops)> {
// Intrinsic that matches this instruction.
Intrinsic Intr = !cast<Intrinsic>(WMMA_NAME_LDST<"load", Frag, Layout,
WithStride>.record);
let Operands = !if(WithStride, (ops node:$src, node:$ldm), (ops node:$src));
let Fragments = [!foreach(tmp, Operands, !subst(ops, Intr, tmp))];
let PredicateCode = !cond(!eq(Space, ".shared"): AS_match.shared,
!eq(Space, ".global"): AS_match.global,
1: AS_match.generic);
}

class WMMA_LOAD<WMMA_REGINFO Frag, string Layout, string Space, bit WithStride,		class WMMA_LOAD<WMMA_REGINFO Frag, string Layout, string Space, bit WithStride,
DAGOperand SrcOp>		DAGOperand SrcOp>
: EmptyNVPTXInst,		: WMMA_INSTR<WMMA_NAME_LDST<"load", Frag, Layout, WithStride>.record,
		[!con((ins SrcOp:$src),
		!if(WithStride, (ins Int32Regs:$ldm), (ins)))]>,
Requires<Frag.Predicates> {		Requires<Frag.Predicates> {
// Pattern that matches the intrinsic for this instruction variant.		// Load/store intrinsics are overloaded on pointer's address space.
PatFrag IntrMatcher = WMMA_LOAD_INTR_HELPER<Frag, Layout, Space, WithStride>;		// To match the right intrinsic, we need to build AS-constrained PatFrag.
dag Ins = !con((ins SrcOp:$src), !if(WithStride, (ins Int32Regs:$ldm), (ins)));		// Operands is a dag equivalent in shape to Args, but using (ops node:$name, .....).
		dag PFOperands = !if(WithStride, (ops node:$src, node:$ldm), (ops node:$src));
		// Build PatFrag that only matches particular address space.
		PatFrag IntrFrag = PatFrag<PFOperands,
		!foreach(tmp, PFOperands, !subst(ops, Intr, tmp)),
		!cond(!eq(Space, ".shared"): AS_match.shared,
		!eq(Space, ".global"): AS_match.global,
		1: AS_match.generic)>;
		// Build AS-constrained pattern.
		let IntrinsicPattern = BuildPatternPF<IntrFrag, Args>.ret;

let Pattern = [BuildPattern<Frag.Outs, IntrMatcher, Ins>.ret];
let OutOperandList = Frag.Outs;		let OutOperandList = Frag.Outs;
let InOperandList = Ins;		let InOperandList = !con(Args, (ins MmaCode:$ptx));
let AsmString = "wmma.load."		let AsmString = "wmma.load."
# Frag.frag		# Frag.frag
# ".sync"		# ".sync"
		# "${ptx:aligned}"
# "." # Layout		# "." # Layout
# "." # Frag.geom		# "." # Frag.geom
# Space		# Space
# "." # Frag.ptx_elt_type # " \t"		# "." # Frag.ptx_elt_type # " \t"
# Frag.regstring		# Frag.regstring
# ", [$src]"		# ", [$src]"
# !if(WithStride, ", $ldm", "")		# !if(WithStride, ", $ldm", "")
# ";";		# ";";
}		}

//		//
// wmma.store.d.sync.[row|col].m16n16k16[|.global|.shared].[f16|f32]		// wmma.store.d.sync.[row|col].m16n16k16[|.global|.shared].[f16|f32]
//		//
class WMMA_STORE_INTR_HELPER<WMMA_REGINFO Frag, string Layout, string Space,		class WMMA_STORE_D<WMMA_REGINFO Frag, string Layout, string Space,
bit WithStride>		bit WithStride, DAGOperand DstOp>
: PatFrag <(ops),(ops)> {		: WMMA_INSTR<WMMA_NAME_LDST<"store", Frag, Layout, WithStride>.record,
// Intrinsic that matches this instruction.		[!con((ins DstOp:$dst),
Intrinsic Intr = !cast<Intrinsic>(WMMA_NAME_LDST<"store", Frag, Layout,		Frag.Ins,
WithStride>.record);		!if(WithStride, (ins Int32Regs:$ldm), (ins)))]>,
let Operands = !con((ops node:$dst),		Requires<Frag.Predicates> {

		// Load/store intrinsics are overloaded on pointer's address space.
		// To match the right intrinsic, we need to build AS-constrained PatFrag.
		// Operands is a dag equivalent in shape to Args, but using (ops node:$name, .....).
		dag PFOperands = !con((ops node:$dst),
!dag(ops, !foreach(tmp, Frag.regs, node), Frag.reg_names),		!dag(ops, !foreach(tmp, Frag.regs, node), Frag.reg_names),
!if(WithStride, (ops node:$ldm), (ops)));		!if(WithStride, (ops node:$ldm), (ops)));
let Fragments = [!foreach(tmp, Operands, !subst(ops, Intr, tmp))];		// Build PatFrag that only matches particular address space.
let PredicateCode = !cond(!eq(Space, ".shared"): AS_match.shared,		PatFrag IntrFrag = PatFrag<PFOperands,
		!foreach(tmp, PFOperands, !subst(ops, Intr, tmp)),
		!cond(!eq(Space, ".shared"): AS_match.shared,
!eq(Space, ".global"): AS_match.global,		!eq(Space, ".global"): AS_match.global,
1: AS_match.generic);		1: AS_match.generic)>;
}		// Build AS-constrained pattern.
		let IntrinsicPattern = BuildPatternPF<IntrFrag, Args>.ret;

class WMMA_STORE<WMMA_REGINFO Frag, string Layout, string Space, bit WithStride,		let InOperandList = !con(Args, (ins MmaCode:$ptx));
DAGOperand DstOp>
: EmptyNVPTXInst,
Requires<Frag.Predicates> {
PatFrag IntrMatcher = WMMA_STORE_INTR_HELPER<Frag, Layout, Space, WithStride>;
dag Ins = !con((ins DstOp:$src),
Frag.Ins,
!if(WithStride, (ins Int32Regs:$ldm), (ins)));
let Pattern = [BuildPattern<(set), IntrMatcher, Ins>.ret];
let OutOperandList = (outs);		let OutOperandList = (outs);
let InOperandList = Ins;		let AsmString = "wmma.store.d.sync"
let AsmString = "wmma.store.d.sync."		# "${ptx:aligned}"
# Layout		# "." # Layout
# "." # Frag.geom		# "." # Frag.geom
# Space		# Space
# "." # Frag.ptx_elt_type		# "." # Frag.ptx_elt_type
# " \t[$src],"		# " \t[$dst],"
# Frag.regstring		# Frag.regstring
# !if(WithStride, ", $ldm", "")		# !if(WithStride, ", $ldm", "")
# ";";		# ";";
}		}

// Create all load/store variants		// Create all load/store variants
		defset list<WMMA_INSTR> MMA_LDSTs = {
foreach geom = ["m16n16k16", "m32n8k16", "m8n32k16" ] in {		foreach geom = ["m16n16k16", "m32n8k16", "m8n32k16" ] in {
foreach layout = ["row", "col"] in {		foreach layout = ["row", "col"] in {
foreach stride = [0, 1] in {		foreach stride = [0, 1] in {
foreach space = [".global", ".shared", ""] in {		foreach space = [".global", ".shared", ""] in {
foreach addr = [imem, Int32Regs, Int64Regs, MEMri, MEMri64] in {		foreach addr = [imem, Int32Regs, Int64Regs, MEMri, MEMri64] in {
foreach frag = [WMMA_REGINFO<geom, "a", "f16">,		foreach frag = [WMMA_REGINFO<geom, "a", "f16">,
WMMA_REGINFO<geom, "b", "f16">,		WMMA_REGINFO<geom, "b", "f16">,
WMMA_REGINFO<geom, "c", "f16">,		WMMA_REGINFO<geom, "c", "f16">,
WMMA_REGINFO<geom, "c", "f32">] in {		WMMA_REGINFO<geom, "c", "f32">] in {
def : WMMA_LOAD<frag, layout, space, stride, addr>;		def : WMMA_LOAD<frag, layout, space, stride, addr>;
}		}
foreach frag = [WMMA_REGINFO<geom, "d", "f16">,		foreach frag = [WMMA_REGINFO<geom, "d", "f16">,
WMMA_REGINFO<geom, "d", "f32">] in {		WMMA_REGINFO<geom, "d", "f32">] in {
def : WMMA_STORE<frag, layout, space, stride, addr>;		def : WMMA_STORE_D<frag, layout, space, stride, addr>;
}		}
} // addr		} // addr
} // space		} // space
} // stride		} // stride
} // layout		} // layout
} // geom		} // geom
		} // defset

// WMMA.MMA		// WMMA.MMA
class WMMA_MMA<WMMA_REGINFO FragA, WMMA_REGINFO FragB,		class WMMA_MMA<WMMA_REGINFO FragA, WMMA_REGINFO FragB,
WMMA_REGINFO FragC, WMMA_REGINFO FragD,		WMMA_REGINFO FragC, WMMA_REGINFO FragD,
string ALayout, string BLayout, int Satfinite>		string ALayout, string BLayout, int Satfinite>
: EmptyNVPTXInst,		: WMMA_INSTR<WMMA_NAME_MMA<ALayout, BLayout, FragC, FragD, Satfinite>.record,
		[FragA.Ins, FragB.Ins, FragC.Ins]>,
Requires<FragC.Predicates> {		Requires<FragC.Predicates> {
//Intrinsic Intr = int_nvvm_suld_1d_v4i32_zero;		let OutOperandList = FragD.Outs;
Intrinsic Intr = !cast<Intrinsic>(WMMA_NAME_MMA<ALayout, BLayout, FragC, FragD, Satfinite>.record);		let InOperandList = !con(Args, (ins MmaCode:$ptx));
dag Outs = FragD.Outs;		let AsmString = "wmma.mma.sync"
dag Ins = !con(FragA.Ins,		# "${ptx:aligned}"
FragB.Ins,		# "." # ALayout
FragC.Ins);

// Construct the pattern to match corresponding intrinsic call.
// mma does not load/store anything, so we don't need complex operand matching here.
dag PatOuts = !foreach(tmp, Outs, !subst(outs, set, tmp));
dag PatArgs = !foreach(tmp, Ins, !subst(ins, Intr, tmp));
let Pattern = [!con(PatOuts, (set PatArgs))];
let OutOperandList = Outs;
let InOperandList = Ins;
let AsmString = "wmma.mma.sync."
# ALayout
# "." # BLayout		# "." # BLayout
# "." # FragA.geom		# "." # FragA.geom
# "." # FragD.ptx_elt_type		# "." # FragD.ptx_elt_type
# "." # FragC.ptx_elt_type		# "." # FragC.ptx_elt_type
# !if(Satfinite, ".satfinite", "") # "\n\t\t"		# !if(Satfinite, ".satfinite", "") # "\n\t\t"
# FragD.regstring # ",\n\t\t"		# FragD.regstring # ",\n\t\t"
# FragA.regstring # ",\n\t\t"		# FragA.regstring # ",\n\t\t"
# FragB.regstring # ",\n\t\t"		# FragB.regstring # ",\n\t\t"
# FragC.regstring # ";";		# FragC.regstring # ";";
}		}

		defset list<WMMA_INSTR> MMAs = {
foreach geom = ["m16n16k16", "m32n8k16", "m8n32k16" ] in {		foreach geom = ["m16n16k16", "m32n8k16", "m8n32k16" ] in {
foreach layout_a = ["row", "col"] in {		foreach layout_a = ["row", "col"] in {
foreach layout_b = ["row", "col"] in {		foreach layout_b = ["row", "col"] in {
foreach frag_c = [WMMA_REGINFO<geom, "c", "f16">,		foreach frag_c = [WMMA_REGINFO<geom, "c", "f16">,
WMMA_REGINFO<geom, "c", "f32">] in {		WMMA_REGINFO<geom, "c", "f32">] in {
foreach frag_d = [WMMA_REGINFO<geom, "d", "f16">,		foreach frag_d = [WMMA_REGINFO<geom, "d", "f16">,
WMMA_REGINFO<geom, "d", "f32">] in {		WMMA_REGINFO<geom, "d", "f32">] in {
foreach satf = [0, 1] in {		foreach satf = [0, 1] in {
def : WMMA_MMA<WMMA_REGINFO<geom, "a", "f16">,		def : WMMA_MMA<WMMA_REGINFO<geom, "a", "f16">,
WMMA_REGINFO<geom, "b", "f16">,		WMMA_REGINFO<geom, "b", "f16">,
frag_c, frag_d, layout_a, layout_b, satf>;		frag_c, frag_d, layout_a, layout_b, satf>;
} // satf		} // satf
} // frag_d		} // frag_d
} // frag_c		} // frag_c
} // layout_b		} // layout_b
} // layout_a		} // layout_a
} // geom		} // geom
		} // defset

		// Constructing non-flat DAGs is still a pain. I can't !subst a dag node with a
		// dag, so the ptx.version must be appended after foreach replaces 'ins' with
		// the instruction record.
		class WMMA_PAT<WMMA_INSTR wi>
		: Pat<wi.IntrinsicPattern,
		!con(!foreach(tmp, wi.Args, !subst(ins, wi, tmp)),
		(wi ptx.version))>;

		// Build intrinsic->instruction patterns for all MMA instructions.
		foreach mma = !listconcat(MMAs, MMA_LDSTs) in
		def : WMMA_PAT<mma>;
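
For readers less familiar with TableGen bang operators, the recursive `RegSeq` class and the `regstring` field from the diff above can be modeled in Python. The function names `reg_seq` and `regstring` are illustrative stand-ins, and the sketch covers only the string construction, not the dag patterns.

```python
def reg_seq(n, prefix):
    """Model of RegSeq<n, prefix>.ret: [prefix0, ..., prefix(n-1)]."""
    if n == 0:
        return []
    # Recurse on n-1, then append the name with index n-1.
    return reg_seq(n - 1, prefix) + [prefix + str(n - 1)]


def regstring(reg_names):
    """Model of WMMA_REGINFO.regstring: "{{$r0, $r1, ...}}"."""
    out = "{{$" + reg_names[0]
    # Mirrors the !foldl over !tail(reg_names) that appends ", $name".
    for name in reg_names[1:]:
        out += ", $" + name
    return out + "}}"


print(reg_seq(3, "r"))
print(regstring(reg_seq(2, "ra")))
```

The doubled braces are kept because a single `{` has special meaning in an LLVM AsmString, so the operand list needs them escaped.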

llvm/trunk/test/CodeGen/NVPTX/wmma.py

# This test generates all variants of wmma intrinsics and verifies that LLVM		# This test generates all variants of wmma intrinsics and verifies that LLVM
# generates correct instructions for them.		# generates correct instructions for them.

# RUN: python %s > %t.ll		# RUN: python %s > %t.ll
# RUN: llc < %t.ll -march=nvptx64 -mcpu=sm_70 -mattr=+ptx61 | FileCheck %t.ll		# RUN: llc < %t.ll -march=nvptx64 -mcpu=sm_70 -mattr=+ptx61 | FileCheck %t.ll
		# RUN: python %s --ptx=63 > %t-ptx63.ll
		# RUN: llc < %t-ptx63.ll -march=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | FileCheck %t-ptx63.ll

from __future__ import print_function		from __future__ import print_function

		import argparse
from itertools import product		from itertools import product
from string import Template		from string import Template

def make_wmma_slice_ty(abcd, itype):		def make_wmma_slice_ty(abcd, itype):
elt_ty = "<2 x half>" if itype == "f16" else "float"		elt_ty = "<2 x half>" if itype == "f16" else "float"
num_elts = 4 if abcd in "cd" and itype == "f16" else 8;		num_elts = 4 if abcd in "cd" and itype == "f16" else 8;
return [elt_ty] * num_elts		return [elt_ty] * num_elts

▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
; CHECK: {${check_result}}		; CHECK: {${check_result}}
; CHECK: [%rd{{[0-9]+}}+128]${stride_pattern}		; CHECK: [%rd{{[0-9]+}}+128]${stride_pattern}
%src1 = getelementptr i8, i8 ${as}* %src, i32 128;		%src1 = getelementptr i8, i8 ${as}* %src, i32 128;
%v0 = call ${ret_ty} @${intrinsic}(i8 ${as}* %src1 ${extra_args});		%v0 = call ${ret_ty} @${intrinsic}(i8 ${as}* %src1 ${extra_args});
ret ${ret_ty} %v0;		ret ${ret_ty} %v0;
}		}
"""		"""
intrinsic_template = "llvm.nvvm.wmma.${geom}.load.${abc}.${layout}${stride}.${itype}.${pspace}"		intrinsic_template = "llvm.nvvm.wmma.${geom}.load.${abc}.${layout}${stride}.${itype}.${pspace}"
instruction_template = "wmma.load.${abc}.sync.${layout}.${geom}${space}.${itype}"		instruction_template = "wmma.load.${abc}.sync${aligned}.${layout}.${geom}${space}.${itype}"

for geom, abc, layout, space, stride, itype in product(		for geom, abc, layout, space, stride, itype in product(
known_geoms,		known_geoms,
"abc",		"abc",
["row","col"],		["row","col"],
["",".shared",".global"],		["",".shared",".global"],
["", ".stride"],		["", ".stride"],
["f16", "f32"]):		["f16", "f32"]):

params = {		params = {
"abc" : abc,		"abc" : abc,
		"aligned" : ".aligned" if ptx_version >= 63 else "",
"layout" : layout,		"layout" : layout,
"space" : space,		"space" : space,
"stride" : stride,		"stride" : stride,
"itype" : itype,		"itype" : itype,
"pspace" : get_pspace(space),		"pspace" : get_pspace(space),
"as" : "addrspace(%d)" % get_aspace(space),		"as" : "addrspace(%d)" % get_aspace(space),
"geom" : geom,		"geom" : geom,
}		}
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines
; CHECK: ${check_args}		; CHECK: ${check_args}
; CHECK: ${stride_pattern}		; CHECK: ${stride_pattern}
%src1 = getelementptr i8, i8 ${as}* %src, i32 128;		%src1 = getelementptr i8, i8 ${as}* %src, i32 128;
call void @${intrinsic}(i8 ${as}* %src1, ${args}${extra_args});		call void @${intrinsic}(i8 ${as}* %src1, ${args}${extra_args});
ret void		ret void
}		}
"""		"""
intrinsic_template = "llvm.nvvm.wmma.${geom}.store.${abc}.${layout}${stride}.${itype}.${pspace}"		intrinsic_template = "llvm.nvvm.wmma.${geom}.store.${abc}.${layout}${stride}.${itype}.${pspace}"
instruction_template = "wmma.store.${abc}.sync.${layout}.${geom}${space}.${itype}"		instruction_template = "wmma.store.${abc}.sync${aligned}.${layout}.${geom}${space}.${itype}"

for geom, abc, layout, space, stride, itype in product(		for geom, abc, layout, space, stride, itype in product(
known_geoms,		known_geoms,
"d",		"d",
["row","col"],		["row","col"],
["",".shared",".global"],		["",".shared",".global"],
["", ".stride"],		["", ".stride"],
["f16", "f32"]):		["f16", "f32"]):

params = {		params = {
"abc" : abc,		"abc" : abc,
		"aligned" : ".aligned" if ptx_version >= 63 else "",
"layout" : layout,		"layout" : layout,
"space" : space,		"space" : space,
"stride" : stride,		"stride" : stride,
"itype" : itype,		"itype" : itype,
"pspace" : get_pspace(space),		"pspace" : get_pspace(space),
"as" : "addrspace(%d)" % get_aspace(space),		"as" : "addrspace(%d)" % get_aspace(space),
"geom" : geom,		"geom" : geom,
}		}
Show All 28 Lines
; CHECK-NEXT: ${check_ab}		; CHECK-NEXT: ${check_ab}
; CHECK-NEXT: ${check_c}		; CHECK-NEXT: ${check_c}
%r = call ${ret_ty} @${intrinsic}(		%r = call ${ret_ty} @${intrinsic}(
${args});		${args});
ret ${ret_ty} %r;		ret ${ret_ty} %r;
}		}
"""		"""
intrinsic_template = "llvm.nvvm.wmma.${geom}.mma.${alayout}.${blayout}.${dtype}.${ctype}${satf}"		intrinsic_template = "llvm.nvvm.wmma.${geom}.mma.${alayout}.${blayout}.${dtype}.${ctype}${satf}"
instruction_template = "wmma.mma.sync.${alayout}.${blayout}.${geom}.${dtype}.${ctype}${satf}"		instruction_template = "wmma.mma.sync${aligned}.${alayout}.${blayout}.${geom}.${dtype}.${ctype}${satf}"

for geom, alayout, blayout, ctype, dtype, satf in product(		for geom, alayout, blayout, ctype, dtype, satf in product(
known_geoms,		known_geoms,
["row","col"],		["row","col"],
["row","col"],		["row","col"],
["f16", "f32"],		["f16", "f32"],
["f16", "f32"],		["f16", "f32"],
[".satfinite", ""]):		[".satfinite", ""]):

params = {		params = {
		"aligned" : ".aligned" if ptx_version >= 63 else "",
"alayout" : alayout,		"alayout" : alayout,
"blayout" : blayout,		"blayout" : blayout,
"ctype" : ctype,		"ctype" : ctype,
"dtype" : dtype,		"dtype" : dtype,
"satf" : satf,		"satf" : satf,
"geom" : geom,		"geom" : geom,
}		}

Show All 12 Lines	for geom, alayout, blayout, ctype, dtype, satf in product(
test_params["args"] = args		test_params["args"] = args
print(Template(mma_template).substitute(test_params))		print(Template(mma_template).substitute(test_params))

def main():		def main():
gen_wmma_load_tests()		gen_wmma_load_tests()
gen_wmma_store_tests()		gen_wmma_store_tests()
gen_wmma_mma_tests()		gen_wmma_mma_tests()

		parser = argparse.ArgumentParser()
		parser.add_argument('--ptx', type=int, default=60)
		args = parser.parse_args()
		ptx_version = args.ptx

main()		main()
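
The way the `--ptx` flag feeds into the expected mnemonics can be sketched in isolation as follows. The helper names `parse_ptx_version` and `mma_instruction` are hypothetical; the `--ptx` option with a default of 60 and the `instruction_template` string are taken from the diff above, and the argument list is passed explicitly so the snippet runs without a command line.

```python
import argparse
from string import Template

# Same instruction template as in the patched gen_wmma_mma_tests().
MMA_INSTRUCTION = Template(
    "wmma.mma.sync${aligned}.${alayout}.${blayout}"
    ".${geom}.${dtype}.${ctype}${satf}")


def parse_ptx_version(argv=None):
    """Parse --ptx the same way the patched script does (default 60)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ptx", type=int, default=60)
    return parser.parse_args(argv).ptx


def mma_instruction(ptx_version, **params):
    """Fill the template, adding ".aligned" only for PTX 6.3+."""
    params["aligned"] = ".aligned" if ptx_version >= 63 else ""
    return MMA_INSTRUCTION.substitute(params)


version = parse_ptx_version(["--ptx=63"])
print(mma_instruction(version, alayout="row", blayout="col",
                      geom="m16n16k16", dtype="f32", ctype="f16",
                      satf=".satfinite"))
```

Running the generator once per PTX version, as the two pairs of RUN lines do, yields two sets of CHECK lines that differ only in the `.aligned` qualifier.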