This is an archive of the discontinued LLVM Phabricator instance.

Differential D111563

[mlir][Linalg] Enable vectorization of explicit broadcasts
ClosedPublic

Authored by dcaballe on Oct 11 2021, 10:41 AM.

Details

Reviewers

pifon2a
ThomasRaoux
nicolasvasilache
herhut
aartbik
bondhugula
rriddle

Commits

rG5c1d356c18c3: [mlir][Linalg] Enable vectorization of explicit broadcasts

Summary

This patch teaches the isProjectedPermutation and inverseAndBroadcastProjectedPermutation
utilities to deal with maps representing an explicit broadcast, e.g., (d0, d1) -> (d0, 0).
This extension is needed to enable vectorization of such explicit broadcasts in Linalg.
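
To make the extended check concrete, here is a minimal standalone sketch that does not depend on MLIR. It assumes a toy encoding in which a result is either an input dim position or the constant zero; the names `kZero` and `isProjectedPermutationModel` are hypothetical and exist only for illustration, and symbols are not modeled at all.

```cpp
#include <cassert>
#include <vector>

// Hypothetical encoding of one map result: a non-negative value is the
// position of an input dim (d0 -> 0, d1 -> 1, ...); kZero is the constant 0.
constexpr int kZero = -1;

// Returns true if `results` uses each input dim at most once and contains no
// other entries, except constant zeros when `allowZeroInResults` is set.
bool isProjectedPermutationModel(unsigned numInputs,
                                 const std::vector<int> &results,
                                 bool allowZeroInResults = false) {
  // More results than inputs implies duplicated dims or zeros that cannot be
  // mapped back to an input dim.
  if (results.size() > numInputs)
    return false;

  std::vector<bool> seen(numInputs, false);
  for (int r : results) {
    if (r == kZero) {
      if (!allowZeroInResults)
        return false;
      continue;
    }
    // Reject invalid positions and dims that appear more than once.
    if (r < 0 || static_cast<unsigned>(r) >= numInputs || seen[r])
      return false;
    seen[r] = true;
  }
  return true;
}
```

Under this encoding, `(d0, d1) -> (d0, 0)` is `{0, kZero}` with two inputs: it is rejected by default and accepted only when `allowZeroInResults` is true.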

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dcaballe created this revision. Oct 11 2021, 10:41 AM

Herald added a reviewer: rriddle. Oct 11 2021, 10:41 AM

Herald added subscribers: wenzhicui, wrengr, Chia-hungDuan and 18 others.

dcaballe requested review of this revision. Oct 11 2021, 10:41 AM

Herald added a project: Restricted Project. Oct 11 2021, 10:41 AM

Herald added subscribers: limo1996, stephenneuendorffer.

Harbormaster completed remote builds in B128154: Diff 378731. Oct 11 2021, 11:54 AM

ThomasRaoux added inline comments. Oct 11 2021, 12:21 PM

mlir/lib/IR/AffineMap.cpp
498–526	Does that mean `(d0, d1) -> (0, 0)` would be a projected permutation? That doesn't sound right. I think you need a different function to check for something like a projected permutation with broadcast, like a generalization of `isPermutationOfMinorIdentityWithBroadcasting`?

pifon2a added a comment.

Could you please verify if all tests still pass? For example, we might need to modify applyPermutationMap in AffineMap.h.

template <typename T>
SmallVector<T> applyPermutationMap(AffineMap map, llvm::ArrayRef<T> source) {
  assert(map.isProjectedPermutation());
  assert(map.getNumInputs() == source.size());
  SmallVector<T> result;
  result.reserve(map.getNumResults());
  for (unsigned i = 0, e = map.getNumResults(); i < e; ++i) {
    /* fix
    if (auto expr = map.getResult(i).dyn_cast<AffineConstantExpr>()) {
      result.push_back(0);
      continue;
    }
    */
    unsigned dim = map.getDimPosition(i);
    result.push_back(source[dim]);
  }
  return result;
}
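
The suggested fix can be tried outside MLIR with a small model. The sketch below assumes the same toy encoding of map results (dim position or constant zero) and uses hypothetical names (`kZero`, `applyPermutationMapModel`); it is an illustration of the idea, not the code that landed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical encoding of one map result: a non-negative value is the
// position of an input dim; kZero is the constant 0.
constexpr int kZero = -1;

// Applies the (projected) permutation described by `results` to `source`.
// A constant-zero result contributes a zero value instead of a source element.
template <typename T>
std::vector<T> applyPermutationMapModel(const std::vector<int> &results,
                                        const std::vector<T> &source) {
  std::vector<T> out;
  out.reserve(results.size());
  for (int r : results) {
    if (r == kZero) {
      out.push_back(T(0));
      continue;
    }
    // Every non-zero result must name a valid position in `source`.
    assert(r >= 0 && static_cast<std::size_t>(r) < source.size());
    out.push_back(source[r]);
  }
  return out;
}
```

For example, applying `(d0, d1) -> (d1, 0)` (encoded as `{1, kZero}`) to the source `{4, 8}` yields `{8, 0}`.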

nicolasvasilache added a comment.

I have something similar locally but I am not clear this is actually needed for now.
Would you mind holding off a bit until I clear up my stack and avoid conflicts?

Addressed Alex's feedback for now.

In D111563#3056030, @pifon2a wrote:

Could you please verify if all tests still pass? For example, we might need to modify applyPermutationMap in AffineMap.h.

Good catch! Yes, I had run all the lit tests and nothing failed. I fixed this, thanks!

In D111563#3056288, @nicolasvasilache wrote:

I have something similar locally but I am not clear this is actually needed for now.
Would you mind holding off a bit until I clear up my stack and avoid conflicts?

Sure, I'll wait for you. After looking at your code, I'm addressing the review comments since they will also apply to your code.

mlir/lib/IR/AffineMap.cpp
498–526	Good point! Note that this utility, not sure if intentionally, is already returning true for `(d0, d1) -> ()`. `(d0, d1) -> (0, 0)` would be the equivalent preserving the dims. I think it boils down to what the definition of "projected permutation" is. I asked @nicolasvasilache and it seems to be ok to have zeros in a projected permutation. We could add logic to skip `(d0, d1) -> ()` and `(d0, d1) -> (0, 0)` but they look like a valid corner case to me. WDYT?

ThomasRaoux added inline comments. Oct 11 2021, 7:20 PM

mlir/lib/IR/AffineMap.cpp
498–526	In my mind, a projected permutation is a permutation of the input dimensions in which some dimensions may be missing (the projection), so `(d0, d1) -> ()` sounds like a valid projected permutation, but having zeros instead of dimensions seems a bit odd. @nicolasvasilache is the reference here, so if he thinks it is correct to allow extra zeros, that is fine with me, although it seems less intuitive to me. It would be good to look at how this helper is used right now and check whether any call sites assume that all the results are dimensions.

Harbormaster completed remote builds in B128250: Diff 378855. Oct 11 2021, 7:24 PM

pifon2a accepted this revision. Oct 12 2021, 12:14 AM

This revision is now accepted and ready to land. Oct 12 2021, 12:14 AM

My stack is now flushed, feel free to land this @dcaballe and thanks for your patience!

Well technically, orthogonal projection of a vector v=(v0, .. vn) on the canonical axis e_k is the vector v=(v0, .. 0_k .. vn).
So far we have liberally jumped the gun a bit by also dropping those dimensions so that other invariants agree.

This LGTM; I would just make the behavior optional with a flag to avoid surprises (for now).
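
The projection remark above can be written out as a worked equation. This is a sketch under the assumption that the comment means projecting out the `e_k` component of `v`; the operator name `P_k` is notation introduced here and does not appear in the review.

```latex
\[
  P_k v \;=\; v - (v \cdot e_k)\, e_k
        \;=\; (v_0, \dots, v_{k-1}, 0, v_{k+1}, \dots, v_n),
  \qquad v = (v_0, \dots, v_n).
\]
```

Read this way, `(d0, d1) -> (d0, 0)` keeps a zero in the projected slot, while `(d0, d1) -> (d0)` additionally drops that slot, which appears to be the shortcut the comment refers to.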

Added a flag and enabled it only for vectorization.
This should be ready to go.

This revision was landed with ongoing or failed builds. Oct 12 2021, 2:10 PM

Closed by commit rG5c1d356c18c3: [mlir][Linalg] Enable vectorization of explicit broadcasts (authored by dcaballe).

This revision was automatically updated to reflect the committed changes.

dcaballe added a commit: rG5c1d356c18c3: [mlir][Linalg] Enable vectorization of explicit broadcasts.

Harbormaster completed remote builds in B128469: Diff 379170. Oct 12 2021, 2:16 PM

Revision Contents

Path                                                    Size
mlir/include/mlir/IR/AffineMap.h                        31 lines
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp    8 lines
mlir/lib/IR/AffineMap.cpp                               32 lines
mlir/test/Dialect/Linalg/vectorization.mlir             60 lines

Diff 379182

mlir/include/mlir/IR/AffineMap.h

[267 lines not shown]	public:
/// `(d0)[s0, s1, s2] -> (d0 + s1 + s2 + 1, d0 - s0 - s2 - 1)`		/// `(d0)[s0, s1, s2] -> (d0 + s1 + s2 + 1, d0 - s0 - s2 - 1)`
AffineMap compose(AffineMap map) const;		AffineMap compose(AffineMap map) const;

/// Applies composition by the dims of `this` to the integer `values` and		/// Applies composition by the dims of `this` to the integer `values` and
/// returns the resulting values. `this` must be symbol-less.		/// returns the resulting values. `this` must be symbol-less.
SmallVector<int64_t, 4> compose(ArrayRef<int64_t> values) const;		SmallVector<int64_t, 4> compose(ArrayRef<int64_t> values) const;

/// Returns true if the AffineMap represents a subset (i.e. a projection) of a		/// Returns true if the AffineMap represents a subset (i.e. a projection) of a
/// symbol-less permutation map.		/// symbol-less permutation map. `allowZeroInResults` allows projected
bool isProjectedPermutation() const;		/// permutation maps with constant zero result expressions.
		/// TODO: Remove `allowZeroInResults` when constant zero result expressions
		/// are broadly supported.
		bool isProjectedPermutation(bool allowZeroInResults = false) const;

/// Returns true if the AffineMap represents a symbol-less permutation map.		/// Returns true if the AffineMap represents a symbol-less permutation map.
bool isPermutation() const;		bool isPermutation() const;

/// Returns the map consisting of the `resultPos` subset.		/// Returns the map consisting of the `resultPos` subset.
AffineMap getSubMap(ArrayRef<unsigned> resultPos) const;		AffineMap getSubMap(ArrayRef<unsigned> resultPos) const;

/// Returns the map consisting of `length` expressions starting from `start`.		/// Returns the map consisting of `length` expressions starting from `start`.
[173 lines not shown]
/// affine_map<(d0, d1, d2, d3) -> (d2)>		/// affine_map<(d0, d1, d2, d3) -> (d2)>
/// ```		/// ```
///		///
/// returns:		/// returns:
///		///
/// ```mlir		/// ```mlir
/// affine_map<(d0) -> (0, 0, d0, 0)>		/// affine_map<(d0) -> (0, 0, d0, 0)>
/// ```		/// ```
		/// Example 4:
		///
		/// ```mlir
		/// affine_map<(d0, d1, d2) -> (d0, 0)>
		/// ```
		///
		/// returns:
		///
		/// ```mlir
		/// affine_map<(d0, d1) -> (d0, 0, 0)>
		/// ```
AffineMap inverseAndBroadcastProjectedPermuation(AffineMap map);		AffineMap inverseAndBroadcastProjectedPermuation(AffineMap map);

/// Concatenates a list of `maps` into a single AffineMap, stepping over		/// Concatenates a list of `maps` into a single AffineMap, stepping over
/// potentially empty maps. Assumes each of the underlying map has 0 symbols.		/// potentially empty maps. Assumes each of the underlying map has 0 symbols.
/// The resulting map has a number of dims equal to the max of `maps`' dims and		/// The resulting map has a number of dims equal to the max of `maps`' dims and
/// the concatenated results as its results.		/// the concatenated results as its results.
/// Returns an empty map if all input `maps` are empty.		/// Returns an empty map if all input `maps` are empty.
///		///
[38 lines not shown]

/// Apply a permutation from `map` to `source` and return the result.		/// Apply a permutation from `map` to `source` and return the result.
template <typename T>		template <typename T>
SmallVector<T> applyPermutationMap(AffineMap map, llvm::ArrayRef<T> source) {		SmallVector<T> applyPermutationMap(AffineMap map, llvm::ArrayRef<T> source) {
assert(map.isProjectedPermutation());		assert(map.isProjectedPermutation());
assert(map.getNumInputs() == source.size());		assert(map.getNumInputs() == source.size());
SmallVector<T> result;		SmallVector<T> result;
result.reserve(map.getNumResults());		result.reserve(map.getNumResults());
for (unsigned i = 0, e = map.getNumResults(); i < e; ++i) {		for (AffineExpr expr : map.getResults()) {
unsigned dim = map.getDimPosition(i);		if (auto dimExpr = expr.dyn_cast<AffineDimExpr>()) {
result.push_back(source[dim]);		result.push_back(source[dimExpr.getPosition()]);
		} else if (auto constExpr = expr.dyn_cast<AffineConstantExpr>()) {
		assert(constExpr.getValue() == 0 &&
		"Unexpected constant in projected permutation map");
		result.push_back(0);
		} else {
		llvm_unreachable("Unexpected result in projected permutation map");
		}
}		}
return result;		return result;
}		}

inline raw_ostream &operator<<(raw_ostream &os, AffineMap map) {		inline raw_ostream &operator<<(raw_ostream &os, AffineMap map) {
map.print(os);		map.print(os);
return os;		return os;
}		}
[26 lines not shown]

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

[71 lines not shown]
/// ins(%0 : tensor<2x3x4xf32>)		/// ins(%0 : tensor<2x3x4xf32>)
/// outs(%1 : tensor<5x6xf32>)		/// outs(%1 : tensor<5x6xf32>)
/// ```		/// ```
///		///
/// the iteration domain size of the linalg op is 3x5x4x6x2. The first affine		/// the iteration domain size of the linalg op is 3x5x4x6x2. The first affine
/// map is reindexed to `affine_map<(d0, d1, d2) -> (d2, d0, d1)>`, the second		/// map is reindexed to `affine_map<(d0, d1, d2) -> (d2, d0, d1)>`, the second
/// affine map is reindexed to `affine_map<(d0, d1) -> (d0, d1)>`.		/// affine map is reindexed to `affine_map<(d0, d1) -> (d0, d1)>`.
static AffineMap reindexIndexingMap(AffineMap map) {		static AffineMap reindexIndexingMap(AffineMap map) {
assert(map.isProjectedPermutation() && "expected projected permutation");		assert(map.isProjectedPermutation(/*allowZerosInResults=*/true) &&
		"expected projected permutation");
auto res = compressUnusedDims(map);		auto res = compressUnusedDims(map);
assert(res.getNumDims() == res.getNumResults() &&		assert(res.getNumDims() == res.getNumResults() &&
"expected reindexed map with same number of dims and results");		"expected reindexed map with same number of dims and results");
return res;		return res;
}		}

/// Helper data structure to represent the result of vectorization.		/// Helper data structure to represent the result of vectorization.
/// In certain specific cases, like terminators, we do not want to propagate/		/// In certain specific cases, like terminators, we do not want to propagate/
[499 lines not shown]	CustomVectorizationHook vectorizeContraction =
return VectorizationResult{VectorizationStatus::NewOp, contract};		return VectorizationResult{VectorizationStatus::NewOp, contract};
};		};
return vectorizeAsLinalgGeneric(b, linalgOp, newResults,		return vectorizeAsLinalgGeneric(b, linalgOp, newResults,
/*broadcastToMaximalCommonShape=*/false,		/*broadcastToMaximalCommonShape=*/false,
{vectorizeContraction});		{vectorizeContraction});
}		}

static bool allIndexingsAreProjectedPermutation(LinalgOp op) {		static bool allIndexingsAreProjectedPermutation(LinalgOp op) {
return llvm::all_of(op.getIndexingMaps(),		return llvm::all_of(op.getIndexingMaps(), [](AffineMap m) {
[](AffineMap m) { return m.isProjectedPermutation(); });		return m.isProjectedPermutation(/*allowZerosInResults=*/true);
		});
}		}

// TODO: probably need some extra checks for reduction followed by consumer		// TODO: probably need some extra checks for reduction followed by consumer
// ops that may not commute (e.g. linear reduction + non-linear instructions).		// ops that may not commute (e.g. linear reduction + non-linear instructions).
static LogicalResult reductionPreconditions(LinalgOp op) {		static LogicalResult reductionPreconditions(LinalgOp op) {
if (llvm::none_of(op.iterator_types(), isReductionIterator)) {		if (llvm::none_of(op.iterator_types(), isReductionIterator)) {
LDBG("reduction precondition failed: no reduction iterator");		LDBG("reduction precondition failed: no reduction iterator");
return failure();		return failure();
[776 lines not shown]

mlir/lib/IR/AffineMap.cpp

[489 lines not shown]	SmallVector<int64_t, 4> AffineMap::compose(ArrayRef<int64_t> values) const {
auto resMap = compose(AffineMap::get(0, 0, exprs, ctx));		auto resMap = compose(AffineMap::get(0, 0, exprs, ctx));
SmallVector<int64_t, 4> res;		SmallVector<int64_t, 4> res;
res.reserve(resMap.getNumResults());		res.reserve(resMap.getNumResults());
for (auto e : resMap.getResults())		for (auto e : resMap.getResults())
res.push_back(e.cast<AffineConstantExpr>().getValue());		res.push_back(e.cast<AffineConstantExpr>().getValue());
return res;		return res;
}		}

bool AffineMap::isProjectedPermutation() const {		bool AffineMap::isProjectedPermutation(bool allowZeroInResults) const {
if (getNumSymbols() > 0)		if (getNumSymbols() > 0)
return false;		return false;

		// Having more results than inputs means that results have duplicated dims or
		// zeros that can't be mapped to input dims.
		if (getNumResults() > getNumInputs())
		return false;

SmallVector<bool, 8> seen(getNumInputs(), false);		SmallVector<bool, 8> seen(getNumInputs(), false);
		// A projected permutation can have, at most, only one instance of each input
		// dimension in the result expressions. Zeros are allowed as long as the
		// number of result expressions is lower or equal than the number of input
		// expressions.
for (auto expr : getResults()) {		for (auto expr : getResults()) {
if (auto dim = expr.dyn_cast<AffineDimExpr>()) {		if (auto dim = expr.dyn_cast<AffineDimExpr>()) {
if (seen[dim.getPosition()])		if (seen[dim.getPosition()])
return false;		return false;
seen[dim.getPosition()] = true;		seen[dim.getPosition()] = true;
continue;		} else {
}		auto constExpr = expr.dyn_cast<AffineConstantExpr>();
		if (!allowZeroInResults || !constExpr || constExpr.getValue() != 0)
return false;		return false;
}		}
		}

		// Results are either dims or zeros and zeros can be mapped to input dims.
return true;		return true;
}		}

bool AffineMap::isPermutation() const {		bool AffineMap::isPermutation() const {
if (getNumDims() != getNumResults())		if (getNumDims() != getNumResults())
return false;		return false;
return isProjectedPermutation();		return isProjectedPermutation();
}		}

AffineMap AffineMap::getSubMap(ArrayRef<unsigned> resultPos) const {		AffineMap AffineMap::getSubMap(ArrayRef<unsigned> resultPos) const {
[170 lines not shown]	for (auto expr : exprs)
if (expr)		if (expr)
seenExprs.push_back(expr);		seenExprs.push_back(expr);
if (seenExprs.size() != map.getNumInputs())		if (seenExprs.size() != map.getNumInputs())
return AffineMap();		return AffineMap();
return AffineMap::get(map.getNumResults(), 0, seenExprs, map.getContext());		return AffineMap::get(map.getNumResults(), 0, seenExprs, map.getContext());
}		}

AffineMap mlir::inverseAndBroadcastProjectedPermuation(AffineMap map) {		AffineMap mlir::inverseAndBroadcastProjectedPermuation(AffineMap map) {
assert(map.isProjectedPermutation());		assert(map.isProjectedPermutation(/*allowZeroInResults=*/true));
MLIRContext *context = map.getContext();		MLIRContext *context = map.getContext();
AffineExpr zero = mlir::getAffineConstantExpr(0, context);		AffineExpr zero = mlir::getAffineConstantExpr(0, context);
// Start with all the results as 0.		// Start with all the results as 0.
SmallVector<AffineExpr, 4> exprs(map.getNumInputs(), zero);		SmallVector<AffineExpr, 4> exprs(map.getNumInputs(), zero);
for (unsigned i : llvm::seq(unsigned(0), map.getNumResults())) {		for (unsigned i : llvm::seq(unsigned(0), map.getNumResults())) {
// Reverse each dimension existing in the oringal map result.		// Skip zeros from input map. 'exprs' is already initialized to zero.
		if (auto constExpr = map.getResult(i).dyn_cast<AffineConstantExpr>()) {
		assert(constExpr.getValue() == 0 &&
		"Unexpected constant in projected permutation");
		(void)constExpr;
		continue;
		}

		// Reverse each dimension existing in the original map result.
exprs[map.getDimPosition(i)] = getAffineDimExpr(i, context);		exprs[map.getDimPosition(i)] = getAffineDimExpr(i, context);
}		}
return AffineMap::get(map.getNumResults(), /*symbolCount=*/0, exprs, context);		return AffineMap::get(map.getNumResults(), /*symbolCount=*/0, exprs, context);
}		}

AffineMap mlir::concatAffineMaps(ArrayRef<AffineMap> maps) {		AffineMap mlir::concatAffineMaps(ArrayRef<AffineMap> maps) {
unsigned numResults = 0, numDims = 0, numSymbols = 0;		unsigned numResults = 0, numDims = 0, numSymbols = 0;
for (auto m : maps)		for (auto m : maps)
[62 lines not shown]
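
The zero-skipping inversion in `inverseAndBroadcastProjectedPermuation` can be checked against the documented examples with a small standalone model. The sketch below assumes a toy encoding of map results (dim position or constant zero); `kZero` and `inverseAndBroadcastModel` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical encoding of one map result: a non-negative value is the
// position of an input dim; kZero is the constant 0.
constexpr int kZero = -1;

// Given the results of a projected permutation over `numInputs` dims, returns
// the results of the inverse map, which has results.size() dims and
// `numInputs` results. Input dims that the original map does not use, and
// constant-zero results in the original map, leave zeros in the output.
std::vector<int> inverseAndBroadcastModel(unsigned numInputs,
                                          const std::vector<int> &results) {
  // Start with all the results as 0.
  std::vector<int> inverse(numInputs, kZero);
  for (std::size_t i = 0; i < results.size(); ++i) {
    // Skip zeros from the input map; `inverse` is already initialized to zero.
    if (results[i] == kZero)
      continue;
    assert(results[i] >= 0 && static_cast<unsigned>(results[i]) < numInputs);
    // Reverse each dimension existing in the original map result.
    inverse[results[i]] = static_cast<int>(i);
  }
  return inverse;
}
```

With this encoding, Example 4 from the header comment, `(d0, d1, d2) -> (d0, 0)`, is `inverseAndBroadcastModel(3, {0, kZero})` and produces `{0, kZero, kZero}`, i.e. `(d0, d1) -> (d0, 0, 0)`.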

mlir/test/Dialect/Linalg/vectorization.mlir

[857 lines not shown]	^bb0(%in0: f32, %out0: f32): // no predecessors
%min = minf %in0, %out0 : f32		%min = minf %in0, %out0 : f32
linalg.yield %min : f32		linalg.yield %min : f32
} -> tensor<4xf32>		} -> tensor<4xf32>
return %red : tensor<4xf32>		return %red : tensor<4xf32>
}		}

// -----		// -----

		// CHECK-DAG: #[[$M5:.*]] = affine_map<(d0, d1) -> (d0, 0)>

		// CHECK-LABEL: func @explicit_broadcast(
		func @explicit_broadcast(%arg0: tensor<4x4xf32>, %arg1: tensor<4x1xf32>) -> tensor<4x4xf32> {
		// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true]} : tensor<4x4xf32>, vector<4x4xf32>
		// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true], permutation_map = #[[$M5]]} : tensor<4x1xf32>, vector<4x4xf32>
		// CHECK: subf {{.*}} : vector<4x4xf32>
		// CHECK: vector.transfer_write {{.*}} {in_bounds = [true, true]} : vector<4x4xf32>, tensor<4x4xf32>
		%c0 = constant 0.0 : f32
		%init = linalg.init_tensor [4, 4] : tensor<4x4xf32>
		%fill = linalg.fill(%c0, %init) : f32, tensor<4x4xf32> -> tensor<4x4xf32>
		%red = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
		affine_map<(d0, d1) -> (d0, 0)>,
		affine_map<(d0, d1) -> (d0, d1)>],
		iterator_types = ["parallel", "parallel"]}
		ins(%arg0, %arg1 : tensor<4x4xf32>, tensor<4x1xf32>)
		outs(%fill : tensor<4x4xf32>) {
		^bb0(%arg7: f32, %arg8: f32, %arg9: f32):
		%40 = subf %arg7, %arg8 : f32
		linalg.yield %40 : f32
		} -> tensor<4x4xf32>
		return %red : tensor<4x4xf32>
		}

		// -----

		// CHECK-DAG: #[[$M6:.*]] = affine_map<(d0, d1) -> (d0, 0)>
		// CHECK-DAG: #[[$M7:.*]] = affine_map<(d0) -> (d0, 0)>

		// CHECK-LABEL: func @fused_broadcast_red_2d
		func @fused_broadcast_red_2d(%arg0: tensor<4x4xf32>, %arg1: tensor<4x1xf32>) -> tensor<4xf32> {
		// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true]} : tensor<4x4xf32>, vector<4x4xf32>
		// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true], permutation_map = #[[$M6]]} : tensor<4x1xf32>, vector<4x4xf32>
		// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true], permutation_map = #[[$M7]]} : tensor<4xf32>, vector<4x4xf32>
		// CHECK: subf {{.*}} : vector<4x4xf32>
		// CHECK: math.exp {{.*}} : vector<4x4xf32>
		// CHECK: addf {{.*}} : vector<4x4xf32>
		// CHECK: vector.multi_reduction #vector.kind<add>, {{.*}} : vector<4x4xf32> to vector<4xf32>
		// CHECK: vector.transfer_write {{.*}} {in_bounds = [true]} : vector<4xf32>, tensor<4xf32>
		%c0 = constant 0.0 : f32
		%init = linalg.init_tensor [4] : tensor<4xf32>
		%fill = linalg.fill(%c0, %init) : f32, tensor<4xf32> -> tensor<4xf32>
		%red = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
		affine_map<(d0, d1) -> (d0, 0)>,
		affine_map<(d0, d1) -> (d0)>],
		iterator_types = ["parallel", "reduction"]}
		ins(%arg0, %arg1 : tensor<4x4xf32>, tensor<4x1xf32>)
		outs(%fill : tensor<4xf32>) {
		^bb0(%arg7: f32, %arg8: f32, %arg9: f32):
		%40 = subf %arg7, %arg8 : f32
		%41 = math.exp %40 : f32
		%42 = addf %41, %arg9 : f32
		linalg.yield %42 : f32
		} -> tensor<4xf32>
		return %red : tensor<4xf32>
		}

		// -----

// CHECK-LABEL: func @reduce_1d(		// CHECK-LABEL: func @reduce_1d(
// CHECK-SAME: %[[A:.*]]: tensor<32xf32>		// CHECK-SAME: %[[A:.*]]: tensor<32xf32>
func @reduce_1d(%arg0: tensor<32xf32>) -> tensor<f32> {		func @reduce_1d(%arg0: tensor<32xf32>) -> tensor<f32> {
// CHECK-DAG: %[[F0_v1:.*]] = constant dense<0.000000e+00> : vector<1xf32>		// CHECK-DAG: %[[F0_v1:.*]] = constant dense<0.000000e+00> : vector<1xf32>
// CHECK-DAG: %[[F0_v32:.*]] = constant dense<0.000000e+00> : vector<32xf32>		// CHECK-DAG: %[[F0_v32:.*]] = constant dense<0.000000e+00> : vector<32xf32>
// CHECK-DAG: %[[C0:.*]] = constant 0 : index		// CHECK-DAG: %[[C0:.*]] = constant 0 : index
%f0 = constant 0.000000e+00 : f32		%f0 = constant 0.000000e+00 : f32

[20 lines not shown]	%2 = linalg.generic {
outs(%1 : tensor<f32>) {		outs(%1 : tensor<f32>) {
^bb0(%a: f32, %b: f32): // no predecessors		^bb0(%a: f32, %b: f32): // no predecessors
%3 = addf %a, %b : f32		%3 = addf %a, %b : f32
linalg.yield %3 : f32		linalg.yield %3 : f32
} -> tensor<f32>		} -> tensor<f32>

return %2 : tensor<f32>		return %2 : tensor<f32>
}		}
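
For readers unfamiliar with the indexing map `(d0, d1) -> (d0, 0)` used in the `@explicit_broadcast` test above, the following scalar sketch shows the computation that test is understood to describe: the 4x1 operand is read at column 0 for every `d1`. The type aliases and the function name `explicitBroadcastSub` are hypothetical and not part of the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Mat4x4 = std::array<std::array<float, 4>, 4>;
using Mat4x1 = std::array<std::array<float, 1>, 4>;

// Scalar reference for a generic op with indexing maps
//   (d0, d1) -> (d0, d1)   for `a` and the output
//   (d0, d1) -> (d0, 0)    for `b` (the explicit broadcast)
// and a body that subtracts the second input from the first.
Mat4x4 explicitBroadcastSub(const Mat4x4 &a, const Mat4x1 &b) {
  Mat4x4 out{};
  for (std::size_t d0 = 0; d0 < 4; ++d0)
    for (std::size_t d1 = 0; d1 < 4; ++d1)
      out[d0][d1] = a[d0][d1] - b[d0][0]; // `b` ignores d1: always column 0.
  return out;
}
```

The vectorized form in the test expresses the same access pattern through the `permutation_map` on the second `vector.transfer_read`.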