This is an archive of the discontinued LLVM Phabricator instance.

I have some tests ready as well but just wanted to check what kind of tests are preferred: tests running with mlir-cuda-runner with a check of the result, or an mlir-opt lowering with checks of the lowering?

In D75766#1910251, @clementval wrote:

I have some tests ready as well but just wanted to check what kind of tests are preferred: tests running with mlir-cuda-runner with a check of the result, or an mlir-opt lowering with checks of the lowering?

We can't assume that someone has a GPU to be able to run the compiler unit tests: so lit+FileCheck please.

Thanks Valentin, the change looks good, with one caveat: the all_reduce lowering in LowerGpuOpsToNVVMOps.cpp is on the chopping block.
Would you mind replicating your changes in mlir/lib/Dialect/GPU/Transforms/AllReduceLowering.cpp as well?
I hope I will get to the clean up next week, it's overdue but I've been busy with other stuff.

@csigg Sure, I was thinking about it. I'll do it and update the patch.
@mehdi_amini Thanks for the confirmation. I'll add some tests to this patch then.

clementval edited the summary of this revision. Mar 7 2020, 11:12 AM

Nice, thanks for adding this!

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
127	Instead of asserting here, could you extend the verifier to report errors instead? If this combination is illegal, the verifier should reject it.
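A minimal sketch of the check this comment asks for, written as plain C++ rather than against the MLIR verifier API. The function name `verifyAccumulation` and the error strings are hypothetical; the real check would live in `verifyAllReduce` and report through the op's diagnostics. The idea is that an illegal combination is reported as an error instead of tripping an assert during lowering.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the verifier check: returns an error message when
// the accumulation name is unknown or incompatible with the operand type, and
// std::nullopt when the combination is accepted.
std::optional<std::string> verifyAccumulation(std::string_view opName,
                                              bool isFloatingPoint) {
  const bool isBitwise = opName == "and" || opName == "or" || opName == "xor";
  const bool isArithmetic = opName == "add" || opName == "mul" ||
                            opName == "min" || opName == "max";
  if (!isBitwise && !isArithmetic)
    return "unknown accumulation '" + std::string(opName) + "'";
  // Bitwise accumulations are only meaningful for integer operands.
  if (isBitwise && isFloatingPoint)
    return "accumulation '" + std::string(opName) +
           "' is not compatible with floating-point types";
  return std::nullopt;
}
```

With a check like this in the verifier, the lowering could assume that `and`, `or` and `xor` only ever reach it with integer operands.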

In D75766#1910973, @mehdi_amini wrote:

In D75766#1910251, @clementval wrote:

I have some tests ready as well but just wanted to check what kind of tests are preferred: tests running with mlir-cuda-runner with a check of the result, or an mlir-opt lowering with checks of the lowering?

We can't assume that someone has a GPU to be able to run the compiler unit tests: so lit+FileCheck please.

It is still nice to have a test with the mlir-cuda-runner to check that it computes what we expect it to compute if you have already written them. But as Mehdi said, testing the generated IR is the mandatory one.

jdoerfert resigned from this revision. Mar 9 2020, 8:35 AM

Add some mlir-cuda-runner tests
Add some mlir-opt + FileCheck tests
Address other comments

Herald added a subscriber: aartbik. Mar 9 2020, 1:02 PM

@csigg @herhut I updated the patch with your suggestions.

bondhugula requested changes to this revision. Mar 10 2020, 5:27 AM

bondhugula added a subscriber: bondhugula.

bondhugula added inline comments.

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
156 ↗	(On Diff #249195)	Nit: "`" -> '`'
mlir/lib/ExecutionEngine/RunnerUtils.cpp
46 ↗	(On Diff #249195)	Nit: M->rank is int64_t.
46 ↗	(On Diff #249195)	Unrelated to this patch: an UnrankedMemRefType having a ->rank field is weird. Would have been better to name it UnknownRankMemRefType FWIW.
77 ↗	(On Diff #249195)	List initialization? `UnrankedMemRefType<int32_t> descriptor = {rank, ptr};`
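The list-initialization suggestion can be illustrated with a small self-contained sketch. The struct below is a hypothetical stand-in for the descriptor type: only the `rank` field of type `int64_t` is taken from the comments above, while the pointer field name and the helper `makeDescriptor` are assumptions for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the descriptor discussed in the review; the real
// definition in the MLIR runner utilities may differ.
template <typename T>
struct UnrankedMemRefType {
  int64_t rank;     // Rank of the underlying memref, known only at runtime.
  void *descriptor; // Opaque pointer to the ranked descriptor.
};

// Builds the descriptor with aggregate (list) initialization instead of
// assigning each field separately.
UnrankedMemRefType<int32_t> makeDescriptor(int64_t rank, void *ptr) {
  UnrankedMemRefType<int32_t> descriptor = {rank, ptr};
  return descriptor;
}
```

Because the struct is an aggregate, the braces initialize the fields in declaration order, so both values are set in a single statement.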

This revision now requires changes to proceed. Mar 10 2020, 5:27 AM

Address @bondhugula comments

clementval marked 3 inline comments as done. Mar 10 2020, 6:02 AM

Thanks for addressing the comments. Looks good to land.

@herhut Thanks for the review. Do you mind pushing it? I do not have access rights.

This revision was not accepted when it landed; it landed in state Needs Review. Mar 10 2020, 1:41 PM

Closed by commit rG2eff566b07da: [MLIR] Add `and`, `or`, `xor`, `min`, `max` too gpu.all_reduce and the nvvm… (authored by herhut).

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Mar 10 2020, 1:41 PM

Herald added a subscriber: llvm-commits.

@herhut Thanks for pushing the patch. I don't see myself in the attribution. Is this normal? I don't really mind, but I just wanted to double-check.

In D75766#1916663, @clementval wrote:

@herhut Thanks for pushing the patch. I don't see myself in the attribution. Is this normal? I don't really mind, but I just wanted to double-check.

Thanks for flagging this. No, this is not normal. It is me not checking that arc patch does the right thing. You should have remained as the author. I can revert and reland with corrected attribution.

In D75766#1916763, @herhut wrote:

Thanks for flagging this. No, this is not normal. It is me not checking that arc patch does the right thing. You should have remained as the author. I can revert and reland with corrected attribution.

Thanks a lot! I really appreciate it!

Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/GPUOps.td	16 lines
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp	38 lines

Diff 248805

mlir/include/mlir/Dialect/GPU/GPUOps.td

   let description = [{

     Example:

     ```gpu.yield %f0, %f1 : f32, f32
     ```
   }];
 }

-// These mirror the XLA ComparisonDirection enum.
+// add, mul mirror the XLA ComparisonDirection enum.
 def GPU_AllReduceOpAdd : StrEnumAttrCase<"add">;
+def GPU_AllReduceOpAnd : StrEnumAttrCase<"and">;
+def GPU_AllReduceOpMax : StrEnumAttrCase<"max">;
+def GPU_AllReduceOpMin : StrEnumAttrCase<"min">;
 def GPU_AllReduceOpMul : StrEnumAttrCase<"mul">;
+def GPU_AllReduceOpOr : StrEnumAttrCase<"or">;
+def GPU_AllReduceOpXor : StrEnumAttrCase<"xor">;

 def GPU_AllReduceOperationAttr : StrEnumAttr<"AllReduceOperationAttr",
     "built-in reduction operations supported by gpu.allreduce.",
     [
       GPU_AllReduceOpAdd,
+      GPU_AllReduceOpAnd,
+      GPU_AllReduceOpMax,
+      GPU_AllReduceOpMin,
       GPU_AllReduceOpMul,
+      GPU_AllReduceOpOr,
+      GPU_AllReduceOpXor
     ]>;

 def GPU_AllReduceOp : GPU_Op<"all_reduce",
     [SameOperandsAndResultType, IsolatedFromAbove]>,
     Arguments<(ins AnyType:$value,
                OptionalAttr<GPU_AllReduceOperationAttr>:$op)>,
     Results<(outs AnyType)> {
   let summary = "Reduce values among workgroup.";
   let description = [{
     The "all_reduce" op reduces the value of every work item across a local
     workgroup. The result is equal for all work items of a workgroup.

     For example, both
     ```
       %1 = "gpu.all_reduce"(%0) ({}) { op = "add" } : (f32) -> (f32)
       %2 = "gpu.all_reduce"(%0) ({
       ^bb(%lhs : f32, %rhs : f32):
         %sum = addf %lhs, %rhs : f32
         "gpu.yield"(%sum) : (f32) -> ()
       }) : (f32) -> (f32)
     ```
     compute the sum of each work item's %0 value. The first version specifies
     the accumulation as operation, whereas the second version specifies the
-    accumulation as code region. The accumulation operation must either be
-    `add` or `mul`.
+    accumulation as code region. The accumulation operation must be one of:
+    `add`, `and`, `max`, `min`, `mul`, `or`, `xor`.

     Either none or all work items of a workgroup need to execute this op
     in convergence.
   }];
   let regions = (region AnyRegion:$body);
   let verifier = [{ return ::verifyAllReduce(*this); }];
 }
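To make the semantics described in the op documentation concrete, here is a host-side reference sketch in plain C++. It is not the GPU lowering. The function name `referenceAllReduce` is hypothetical, and 32-bit integer work-item values are assumed so that all seven accumulation names apply.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <stdexcept>
#include <string_view>
#include <vector>

// Reduces one value per work item with the named accumulation and returns the
// single result that every work item of the workgroup would observe.
int32_t referenceAllReduce(const std::vector<int32_t> &values,
                           std::string_view op) {
  if (values.empty())
    throw std::invalid_argument("workgroup must contain at least one value");
  // Fold the remaining values into the first one with the given combiner.
  auto reduce = [&values](auto combine) {
    return std::accumulate(values.begin() + 1, values.end(), values.front(),
                           combine);
  };
  if (op == "add") return reduce([](int32_t a, int32_t b) { return a + b; });
  if (op == "and") return reduce([](int32_t a, int32_t b) { return a & b; });
  if (op == "max") return reduce([](int32_t a, int32_t b) { return std::max(a, b); });
  if (op == "min") return reduce([](int32_t a, int32_t b) { return std::min(a, b); });
  if (op == "mul") return reduce([](int32_t a, int32_t b) { return a * b; });
  if (op == "or")  return reduce([](int32_t a, int32_t b) { return a | b; });
  if (op == "xor") return reduce([](int32_t a, int32_t b) { return a ^ b; });
  throw std::invalid_argument("unsupported accumulation");
}
```

A reference of this kind is mainly useful for computing the expected values of a runner-based test by hand.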

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp

   AccumulatorFactory getFactory(StringRef opName, llvm::Type *type) const {
     if (opName == "add") {
       return isFloatingPoint ? getFactory<LLVM::FAddOp>()
                              : getFactory<LLVM::AddOp>();
     }
     if (opName == "mul") {
       return isFloatingPoint ? getFactory<LLVM::FMulOp>()
                              : getFactory<LLVM::MulOp>();
     }
+    if (opName == "and") {
+      assert(!isFloatingPoint &&
+             "<and> accumulator is not compatible with Floating Point type");
+      return getFactory<LLVM::AndOp>();
+    }
+    if (opName == "or") {
+      assert(!isFloatingPoint &&
+             "<or> accumulator is not compatible with Floating Point type");
+      return getFactory<LLVM::OrOp>();
+    }
+    if (opName == "xor") {
+      assert(!isFloatingPoint &&
+             "<xor> accumulator is not compatible with Floating Point type");
+      return getFactory<LLVM::XOrOp>();
+    }
+    if (opName == "max") {
+      return isFloatingPoint ? getCmpFactory<LLVM::FCmpOp, LLVM::FCmpPredicate>(
+                                   LLVM::FCmpPredicate::ugt)
+                             : getCmpFactory<LLVM::ICmpOp, LLVM::ICmpPredicate>(
+                                   LLVM::ICmpPredicate::ugt);
+    }
+    if (opName == "min") {
+      return isFloatingPoint ? getCmpFactory<LLVM::FCmpOp, LLVM::FCmpPredicate>(
+                                   LLVM::FCmpPredicate::ult)
+                             : getCmpFactory<LLVM::ICmpOp, LLVM::ICmpPredicate>(
+                                   LLVM::ICmpPredicate::ult);
+    }

     return AccumulatorFactory();
   }

   /// Returns an accumulator factory that creates an op of type T.
   template <typename T> AccumulatorFactory getFactory() const {
     return [](Location loc, Value lhs, Value rhs,
               ConversionPatternRewriter &rewriter) {
       return rewriter.create<T>(loc, lhs.getType(), lhs, rhs);
     };
   }

+  /// Returns an accumulator for comparison such as min, max. T is the type
+  /// of the compare op and P is the type of the predicate.
+  template <typename T, typename P>
+  AccumulatorFactory getCmpFactory(P predicate) const {
+    return [predicate](Location loc, Value lhs, Value rhs,
+                       ConversionPatternRewriter &rewriter) {
+      Value cmp = rewriter.create<T>(loc, predicate, lhs, rhs);
+      return rewriter.create<LLVM::SelectOp>(loc, cmp, lhs, rhs);
+    };
+  }
+
   /// Creates an all_reduce across the block.
   ///
   /// First reduce the elements within a warp. The first thread of each warp
   /// writes the intermediate result to shared memory. After synchronizing the
   /// block, the first warp reduces the values from shared memory. The result
   /// is broadcasted to all threads through shared memory.
   ///
   /// %warp_reduce = `createWarpReduce(%operand)`
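The compare-then-select pattern used by `getCmpFactory` can be sketched outside of MLIR as follows. This is a plain-C++ analogue under stated assumptions: the names `Accumulator`, `makeCmpAccumulator`, `makeMax` and `makeMin` are hypothetical, and values are modeled as signed `int` rather than IR values.

```cpp
#include <functional>

// Plain-C++ analogue of an accumulator: combines two values into one.
using Accumulator = std::function<int(int, int)>;

// Analogue of the compare factory: evaluate the predicate on (lhs, rhs), then
// select lhs when it holds and rhs otherwise.
template <typename Predicate>
Accumulator makeCmpAccumulator(Predicate predicate) {
  return [predicate](int lhs, int rhs) {
    const bool cmp = predicate(lhs, rhs);
    return cmp ? lhs : rhs;
  };
}

// "max" keeps lhs when lhs > rhs; "min" keeps lhs when lhs < rhs.
Accumulator makeMax() { return makeCmpAccumulator(std::greater<int>()); }
Accumulator makeMin() { return makeCmpAccumulator(std::less<int>()); }
```

Note that the patch passes the `ugt` and `ult` predicates, which for integer compares treat the operands as unsigned, so this signed sketch does not reproduce that detail.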

[MLIR] Add `and`, `or`, `xor`, `min`, `max` too gpu.all_reduce and the nvvm lowering (Closed, Public)