Add a vector op, warp_execute_on_lane_0, that will be used to do incremental vector distribution in order to target warp-level vector programming for architectures with a GPU-like SIMT programming model.
The idea behind the op is discussed further on Discourse:
https://discourse.llvm.org/t/vector-vector-distribution-large-vector-to-small-vector/1983/23
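For a concrete picture of the op, here is a minimal sketch of the intended usage (the textual syntax is still being bikeshedded in the review below, and "test.sequential_compute" is a hypothetical placeholder op):

```mlir
// Execute the region on lane 0 of a warp of 32 lanes. Each lane passes in a
// per-lane slice (vector<4xi32>); inside the region, lane 0 operates on the
// full warp-level vector (vector<128xi32> = 32 lanes x 4 elements), and the
// yielded warp-level result is distributed back to the lanes.
%r = vector.warp_execute_on_lane_0(%laneid)[32]
    args(%v : vector<4xi32>) -> (vector<4xi32>) {
^bb0(%arg : vector<128xi32>):
  %full = "test.sequential_compute"(%arg) : (vector<128xi32>) -> (vector<128xi32>)
  vector.yield %full : vector<128xi32>
}
```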
mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

- Line 2545: This summary doesn't add much information. What about "terminates and yields values from vector regions"?
- Line 2547: Can we improve the doc here? Something like scf.yield? https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/SCF/SCFOps.td#L695
- Line 2564: Also here. What about "Executes operations in the associated region on lane #0 of a GPU SIMT warp"?
- Line 2565: Make clear at the beginning that this op is for a GPU SIMT-like execution model?
- Line 2574: The description here is a bit hard for me to parse. Can we reword this to be clearer? What is the mental model for vector ops before/after vector.warp_execute_on_lane_0? (I assume a single lane in the SIMT warp.) What's the mental model for the op arguments/results? (Still the data range accessed by a single lane in the SIMT warp?) What's the mental model for the op region arguments/yields? (I think the full data range for all threads in the SIMT warp?) It would be nice if we could spell that out. (Maybe the current wording is trying to express that, but I find it hard to parse and grok.)
- Line 2591: Hmm, this is also hard for me to parse. You mean "for captured values _it's_ only available to lane 0"?
- Line 2598: Bikeshed: I'd think something like vector.warp_execute_on_lane_0 @ 32 [%laneid] reads better: we execute on lane 0 out of 32, and the current lane's index is given with an indexing operator.

mlir/lib/Dialect/Vector/IR/VectorOps.cpp

- Line 4716: Hmm, the terms here can be improved IMO. What about calling them "warp data vector" and "lane data vector", as in "... for lane/warp data vector operands ..."?
- Line 4762: Any chance to add some logic to verify captured values? Like only allowing capturing constants (or whatever is needed) for now. Better to be conservative at the beginning to avoid surprises; we can relax later. It's hard to go the reverse way.

mlir/test/Dialect/Vector/ops.mlir

- Line 774: Add tests for negative cases?
mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

- Line 2547: Good point, rewrote it following this example.
- Line 2565: Added quite a few details to the description; hopefully it is clearer now.
- Line 2574: I tried to add more details, but it is not easy to explain clearly, which is why I was hoping the lowering examples would make the semantics of the op easier to understand.
- Line 2591: Kind of; basically the op is meant to have the semantics of if (laneid == 0) { ... } (see the sketch after this list).
- Line 2598: Because it is not really meant to represent indexing, I'd prefer keeping the current representation. I would like it to look more like an if (laneid == 0) kind of op.
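To illustrate that if (laneid == 0) reading, here is a rough lowering sketch (the guarded form follows the author's description above; the ops that exchange data between lanes, e.g. through shared memory, are elided):

```mlir
// Conceptually, the region body is guarded so that only lane 0 executes it.
%c0 = arith.constant 0 : index
%is_lane0 = arith.cmpi eq, %laneid, %c0 : index
scf.if %is_lane0 {
  // The body runs sequentially on lane 0 with access to the full warp-level
  // vectors; transfers of operands/results to and from the other lanes are
  // elided here.
}
```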
mlir/lib/Dialect/Vector/IR/VectorOps.cpp

- Line 4716: I was thinking about it, but I think this gives the wrong impression. When using this op there isn't warp-level data and thread-level data; everything is already in SIMT mode, so all the data is thread-level. What the operands expose is that the data from all the lanes may be transferred to lane 0 so that everything can be computed on one thread.
- Line 4762: I'm not sure there is an advantage, as it would need to be relaxed in the next commit. This is needed to support any kind of basic transformation.
mlir/test/Dialect/Vector/ops.mlir

- Line 774: Oops, I forgot to upload those. Thanks for catching this! (One such case is sketched below.)
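For reference, a negative case along the lines requested above might look like the following (the shape mismatch is the point; the exact diagnostic text the verifier emits is not shown here):

```mlir
func.func @warp_wrong_yield_size(%laneid: index) {
  // Invalid: a yielded vector<64xi32> cannot distribute to vector<4xi32>
  // across 32 lanes (4 * 32 = 128, not 64), so the verifier should reject it.
  %0 = vector.warp_execute_on_lane_0(%laneid)[32] -> (vector<4xi32>) {
    %1 = "test.dummy_compute"() : () -> (vector<64xi32>)
    vector.yield %1 : vector<64xi32>
  }
  return
}
```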
I know this was already reviewed by Nicolas previously, so LGTM.
mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

- Line 2545: Nit: Terminates...
mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

- Line 2573: Interestingly, this is also true for more general SPMD: you could want to write any distributed program on vectors this way: if (MPI_Rank == 0) { ... }. Can we make the doc a little less GPU-specific?
- Line 2580: I would probably write all the doc in terms of threadId / numThreads and specialize to laneid / warpSize in the particular (important) case of GPUs.
- Line 2587: In the future, could the properties of the distribution be described by extra attributes (e.g. an affine map) and operands?
- Line 2598: Give an example of block-cyclic distribution here?
- Line 2608: As I mentioned in a comment in the past (https://github.com/google/iree-llvm-sandbox/pull/147), I think readability would still benefit from clear documentation of which pieces of code run in parallel and which run sequentially.
- Line 2626: Here, clearly spelling out distributed vs. sequential regions and arguments would be helpful. It feels a bit counter-intuitive to say that the op receives a vector<4xi32> as an operand and internally has access to a vector<128xi32> (see the sketch after this list).
- Line 2635: This example is probably the first time the reader really understands what is going on.
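To make that distributed/sequential distinction explicit, one way to annotate the types (the element-to-lane mapping is implicit in this version of the patch, so the ownership shown in the comments is only one plausible choice):

```mlir
// "Distributed" view outside the region: each of the 32 lanes holds a
// vector<4xi32>. "Sequential" view inside the region: lane 0 sees the full
// vector<128xi32>, with lane i conceptually owning elements [4*i, 4*i + 4).
%r = vector.warp_execute_on_lane_0(%laneid)[32]
    args(%per_lane : vector<4xi32>) -> (vector<4xi32>) {
^bb0(%full : vector<128xi32>):
  vector.yield %full : vector<128xi32>
}
```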
mlir/lib/Dialect/Vector/IR/VectorOps.cpp

- Line 4624: Hmm, parseRegionArgument? I would have thought a simple parseOperand of IndexType would be in order here?
- Line 4708: I am wondering whether we should go into such detail in the verification, or whether we should just take one type, generate the second type, and compare for equality?

mlir/test/Dialect/Vector/invalid.mlir

- Line 1534: If we go for the simple generate-and-compare approach, this would be just one extra test for all cases.
Thanks for the review @nicolasvasilache, and sorry for the delay as I was on vacation. I'll send another patch to address your comments.
mlir/lib/Dialect/Vector/IR/VectorOps.cpp

- Line 4708: I agree with that; however, because the distribution is currently implicit, we would first need to compute the implicit distribution in order to infer the distributed type, and computing it would require the same set of checks, so it wouldn't simplify the code right now. Once we make the distribution explicit, we can change this code to verify the distribution map and directly compute the inferred distributed type. (The implicit rule is sketched below.)
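For context, a sketch of the implicit relationship between the types on the two sides of the op boundary (the broadcast case for warp-uniform values matches the op as it exists upstream today; treat both cases as illustrative of why a single inference rule isn't enough):

```mlir
// Distribution case: with warp size 32, a sequential vector<128xf32> yielded
// from the region implies a distributed vector<4xf32> result (128 / 32 = 4).
%a = vector.warp_execute_on_lane_0(%laneid)[32] -> (vector<4xf32>) {
  %v = "test.def"() : () -> (vector<128xf32>)
  vector.yield %v : vector<128xf32>
}
// Broadcast case: a warp-uniform value may keep the same type on both sides,
// so "divide the dimension by the warp size" alone does not cover everything.
%b = vector.warp_execute_on_lane_0(%laneid)[32] -> (vector<1xf32>) {
  %v = "test.uniform_def"() : () -> (vector<1xf32>)
  vector.yield %v : vector<1xf32>
}
```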