This is an archive of the discontinued LLVM Phabricator instance.

Differential D73465

Add gpu::LaunchOp::addKernelArgument.
AbandonedPublic

Authored by herhut on Jan 27 2020, 5:34 AM.

Details

Reviewers

ftynse
nicolasvasilache

Summary

Code motion into a gpu::LaunchOp region requires operands of moved
instructions to be threaded through operands of the gpu.launch to
ensure that the gpu.launch remains closed from above.

To enable this, gpu.launch operations are now created with extensible
operand storage. The overhead is expected to be low given that
gpu.launch is a relatively rare operation.
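
A toy model may help make the bookkeeping behind "closed from above" concrete. The sketch below is not MLIR code: `ToyValue`, `ToyLaunch`, and `makeToyLaunch` are hypothetical names introduced only for illustration, and the model covers nothing beyond the pairing of trailing operands with region arguments described in the summary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an SSA value (hypothetical; not mlir::Value).
struct ToyValue {
  std::string name;
  std::string type;
};

// Toy stand-in for gpu.launch (hypothetical). Six launch-configuration
// operands come first, followed by the data operands threaded into the body.
// The real op also has leading configuration block arguments; they are
// omitted here to keep the model small.
struct ToyLaunch {
  static constexpr std::size_t kNumConfigOperands = 6;

  std::vector<ToyValue> operands;   // config operands, then data operands
  std::vector<ToyValue> kernelArgs; // region-side arguments, one per data operand

  explicit ToyLaunch(std::vector<ToyValue> config)
      : operands(std::move(config)) {
    assert(operands.size() == kNumConfigOperands);
  }

  // Thread `outer` through the launch: append it as an operand and create
  // the matching region argument. Returns the index of the new argument.
  std::size_t addKernelArgument(const ToyValue &outer) {
    operands.push_back(outer);
    kernelArgs.push_back(
        {"%arg" + std::to_string(kernelArgs.size()), outer.type});
    return kernelArgs.size() - 1;
  }

  // Drop the `index`-th kernel argument together with its operand so that
  // both lists stay in sync.
  void eraseKernelArgument(std::size_t index) {
    assert(index < kernelArgs.size() && "kernel argument index overflow");
    kernelArgs.erase(kernelArgs.begin() + static_cast<std::ptrdiff_t>(index));
    operands.erase(operands.begin() +
                   static_cast<std::ptrdiff_t>(kNumConfigOperands + index));
  }
};

// Build a launch with six placeholder configuration operands.
ToyLaunch makeToyLaunch() {
  std::vector<ToyValue> config = {
      {"%gx", "index"}, {"%gy", "index"}, {"%gz", "index"},
      {"%bx", "index"}, {"%by", "index"}, {"%bz", "index"}};
  return ToyLaunch(std::move(config));
}
```

In this model, moving an operation that uses an outer value into the body would first call `addKernelArgument` and then rewrite the moved operation to use the returned region argument instead of the outer value.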

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	890 ms	libc++.std/language_support/cmp/cmp_partialord::Unknown Unit Message ("")
	850 ms	libc++.std/language_support/cmp/cmp_strongeq::Unknown Unit Message ("")
	870 ms	libc++.std/language_support/cmp/cmp_strongord::Unknown Unit Message ("")
	780 ms	libc++.std/language_support/cmp/cmp_weakeq::Unknown Unit Message ("")
	850 ms	libc++.std/language_support/cmp/cmp_weakord::Unknown Unit Message ("")

Event Timeline

herhut created this revision. Jan 27 2020, 5:34 AM

Herald added a reviewer: nicolasvasilache. Jan 27 2020, 5:34 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, liufengdb, lucyrfox and 9 others.

Unit tests: fail. 62155 tests passed, 5 failed and 811 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster failed remote builds in B44988: Diff 240539! Jan 27 2020, 5:53 AM

ftynse accepted this revision. Jan 27 2020, 6:01 AM

ftynse added inline comments.

mlir/include/mlir/Dialect/GPU/GPUOps.td
488	This says "gpu.launch" but the line above says "gpu.func". Let's use one everywhere and say the other is equivalent.

This revision is now accepted and ready to land. Jan 27 2020, 6:01 AM

herhut marked an inline comment as done. Jan 27 2020, 11:16 AM

herhut added inline comments.

mlir/include/mlir/Dialect/GPU/GPUOps.td
488	But they are not. When used inside of a launch, it cannot have operands. I could maybe state that gpu.launch is considered equivalent to a void function? I found it surprising that launch now has a return (as opposed to the terminator). That moves it closer to a function where it should feel more like a loop. WDYT about adding the terminator op back or is that too many operations?

ftynse added inline comments. Jan 27 2020, 1:36 PM

mlir/include/mlir/Dialect/GPU/GPUOps.td
488	Then it's even more confusing than I thought. I'm fine with having a separate terminator for `launch`.

rriddle added inline comments. Jan 27 2020, 1:42 PM

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
285	nit: You can also use `append` to add arguments: emitError().append(..., ..., ...).attachNote(...).append(..., ..., ...);
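
The chaining style suggested in this comment can be tried out on a small stand-in. The sketch below does not use MLIR's diagnostic classes: `ToyDiagnostic` and `describeBadTerminator` are hypothetical names, and the class only imitates the shape of a variadic `append` plus an `attachNote` that returns a nested diagnostic.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a diagnostic with notes (hypothetical; not an MLIR class).
class ToyDiagnostic {
public:
  // Append any number of streamable arguments to the message text.
  template <typename... Args>
  ToyDiagnostic &append(const Args &...args) {
    std::ostringstream os;
    // C++11-compatible pack expansion: stream each argument in order.
    int expand[] = {0, ((void)(os << args), 0)...};
    (void)expand;
    text_ += os.str();
    return *this;
  }

  // Streaming a single value is the same as appending it.
  template <typename T>
  ToyDiagnostic &operator<<(const T &value) {
    return append(value);
  }

  // Attach a note and return it so that the chain continues on the note.
  ToyDiagnostic &attachNote() {
    notes_.push_back(std::unique_ptr<ToyDiagnostic>(new ToyDiagnostic()));
    return *notes_.back();
  }

  // Render the error line followed by one line per attached note.
  std::string str() const {
    std::string out = "error: " + text_;
    for (const auto &note : notes_)
      out += "\nnote: " + note->text_;
    return out;
  }

private:
  std::string text_;
  std::vector<std::unique_ptr<ToyDiagnostic>> notes_;
};

// Build the terminator message from the diff using chained append calls.
std::string describeBadTerminator(const std::string &launchName) {
  ToyDiagnostic diag;
  diag.append("expected '", "gpu.terminator",
              "' or a terminator with successors")
      .attachNote()
      .append("in '", launchName, "' body region");
  return diag.str();
}
```

Calling `describeBadTerminator("gpu.launch")` yields the error text on the first line and the note on the second. Compared with mixing `<<` into the chain, every piece of the message stays inside a method call, so the result of `attachNote` never has to be streamed into separately.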

Code motion into a gpu::LaunchOp region requires operands of moved instructions to be threaded through operands of the gpu.launch to ensure that the gpu.launch remains closed from above.

Can we revisit lifting this requirement? This was something that was raised from the beginning of the development of this op.

In D73465#1843770, @mehdi_amini wrote:

Code motion into a gpu::LaunchOp region requires operands of moved instructions to be threaded through operands of the gpu.launch to ensure that the gpu.launch remains closed from above.

Can we revisit lifting this requirement? This was something that was raised from the beginning of the development of this op.

Should we just lift the requirement or completely forego the concept of passing arguments into the launch? Just lifting the requirement would be less of a breaking change but there is no good use of having the operands.

Split out gpu dialect cleanup parts.

herhut marked 2 inline comments as done. Jan 28 2020, 3:14 AM

herhut added inline comments.

mlir/include/mlir/Dialect/GPU/GPUOps.td
488	I will add it back in a new diff.
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
285	Thanks! I will do this in a new diff.

Harbormaster failed remote builds in B45108: Diff 240821! Jan 28 2020, 3:46 AM

In D73465#1844126, @herhut wrote:

In D73465#1843770, @mehdi_amini wrote:

Code motion into a gpu::LaunchOp region requires operands of moved instructions to be threaded through operands of the gpu.launch to ensure that the gpu.launch remains closed from above.

Can we revisit lifting this requirement? This was something that was raised from the beginning of the development of this op.

Should we just lift the requirement or completely forego the concept of passing arguments into the launch? Just lifting the requirement would be less of a breaking change but there is no good use of having the operands.

Right: I'm not sure what is the use for having the operands at all?

I went for lifting the requirement that gpu.launch needs to be closed from above. So we no longer have any arguments.

See https://reviews.llvm.org/D73769

Herald added a subscriber: Joonsoo. Jan 31 2020, 1:38 AM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/GPUOps.td	4 lines
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp	11 lines

Diff 240821

mlir/include/mlir/Dialect/GPU/GPUOps.td

[448 lines not shown] let extraClassDeclaration = [{

     /// Get the SSA values of the kernel arguments.
     iterator_range<Block::args_iterator> getKernelArguments();

     /// Erase the `index`-th kernel argument. Both the entry block argument and
     /// the operand will be dropped. The block argument must not have any uses.
     void eraseKernelArgument(unsigned index);

+    /// Add the given value as a kernel argument. Returns the corresponding newly
+    /// added BlockArgument.
+    BlockArgument addKernelArgument(Value argument);
+
     static StringRef getBlocksKeyword() { return "blocks"; }
     static StringRef getThreadsKeyword() { return "threads"; }
     static StringRef getArgsKeyword() { return "args"; }

     /// The number of launch configuration operands, placed at the leading
     /// positions of the operand list.
     static constexpr unsigned kNumConfigOperands = 6;

[11 lines not shown] def GPU_ReturnOp : GPU_Op<"return", [Terminator]>, Arguments<(ins)>,
     Results<(outs)> {
   let summary = "Terminator for GPU launch regions.";
   let description = [{
     A terminator operation for regions that appear in the body of `gpu.launch`
     operation. These regions are not expected to return any value so the
     terminator takes no operands.
   }];

   let parser = [{ return success(); }];
   let printer = [{ p << getOperationName(); }];
 }

 def GPU_YieldOp : GPU_Op<"yield", [Terminator]>,
     Arguments<(ins Variadic<AnyType>:$values)> {
   let summary = "GPU yield operation";
   let description = [{
     "gpu.yield" is a special terminator operation for blocks inside regions
[161 lines not shown]

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp

[195 lines not shown]

 void LaunchOp::build(Builder *builder, OperationState &result, Value gridSizeX,
                      Value gridSizeY, Value gridSizeZ, Value blockSizeX,
                      Value blockSizeY, Value blockSizeZ, ValueRange operands) {
   // Add grid and block sizes as op operands, followed by the data operands.
   result.addOperands(
       {gridSizeX, gridSizeY, gridSizeZ, blockSizeX, blockSizeY, blockSizeZ});
   result.addOperands(operands);
+  // We want to be able to add operands later, for instance due to code motion.
+  result.setOperandListToResizable();

   // Create a kernel body region with kNumConfigRegionAttributes + N arguments,
   // where the first kNumConfigRegionAttributes arguments have `index` type and
   // the rest have the same types as the data operands.
   Region *kernelRegion = result.addRegion();
   Block *body = new Block();
   body->addArguments(
       std::vector<Type>(kNumConfigRegionAttributes, builder->getIndexType()));
[63 lines not shown] for (Block &block : op.body()) {
     if (block.empty())
       continue;
     if (block.back().getNumSuccessors() != 0)
       continue;
     if (!isa<gpu::ReturnOp>(&block.back())) {
       return block.back()
                  .emitError("expected 'gpu.terminator' or a terminator with "
                             "successors")
                  .attachNote(op.getLoc())
              << "in '" << LaunchOp::getOperationName() << "' body region";
     }
   }

   return success();
 }

 // Pretty-print the kernel grid/block size assignment as
[152 lines not shown]
 void LaunchOp::eraseKernelArgument(unsigned index) {
   Block &entryBlock = body().front();
   assert(index < entryBlock.getNumArguments() - kNumConfigRegionAttributes &&
          "kernel argument index overflow");
   entryBlock.eraseArgument(kNumConfigRegionAttributes + index);
   getOperation()->eraseOperand(kNumConfigOperands + index);
 }

+BlockArgument LaunchOp::addKernelArgument(Value value) {
+  Block &entryBlock = body().front();
+  Operation *op = getOperation();
+  llvm::SmallVector<Value, 8> operands(op->getOperands());
+  operands.push_back(value);
+  op->setOperands(operands);
+  return entryBlock.addArgument(value.getType());
+}
+
 namespace {
 // Clone any known constants passed as operands to the kernel into its body.
 class PropagateConstantBounds : public OpRewritePattern<LaunchOp> {
   using OpRewritePattern<LaunchOp>::OpRewritePattern;

   PatternMatchResult matchAndRewrite(LaunchOp launchOp,
                                      PatternRewriter &rewriter) const override {
     rewriter.startRootUpdate(launchOp);
[405 lines not shown]