[mlir][sparse] Use incremental topological sort to compute loop order from iteration graph
Needs Review · Public

Authored by streichgeorg on May 7 2023, 3:23 AM.

Details

Summary

Refactor the iteration graph code a bit so that the loop order can be computed incrementally and constraints that would create cycles can be skipped. This makes it possible to apply constraints in a more fine-grained way. The current implementation increases the asymptotic complexity of the procedure, so it may need some optimization. Some other improvements I have in mind are the following (see the sketch after this list):

  • Add loop-type constraints (filter < parallel < reduction) via edges as well, instead of enforcing them implicitly.
  • Relax constraints for dense tensors so that we only require the innermost dimension to be last. As far as I can tell, there is no need to have a specific ordering between dimensions otherwise.
  • Add the constraints for a single dense tensor as a group (i.e., only add them if all of them can be satisfied).
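As a concrete sketch of the incremental idea, consider the following simplified C++ (hypothetical names such as IterationGraph and tryAddConstraint; this is not the actual revision's code). A constraint is offered as an edge and skipped when it would close a cycle; offering higher-priority constraints first means only lower-priority ones get dropped on conflict. The per-insertion DFS reachability check is the source of the increased asymptotic complexity mentioned above.

```cpp
// Minimal sketch of the incremental approach (hypothetical names, not the
// actual sparse-compiler API). Each constraint "loop a outside loop b" is
// an edge a -> b; an edge that would close a cycle is skipped. The DFS
// reachability check makes each insertion O(V + E).
#include <queue>
#include <vector>

class IterationGraph {
public:
  explicit IterationGraph(unsigned numLoops)
      : succ(numLoops), inDegree(numLoops, 0) {}

  // Try to add the constraint "loop a must be placed outside loop b".
  // Returns false and leaves the graph unchanged if a path b -> ... -> a
  // already exists, i.e., the new edge would create a cycle.
  bool tryAddConstraint(unsigned a, unsigned b) {
    if (reachable(b, a))
      return false;
    succ[a].push_back(b);
    ++inDegree[b];
    return true;
  }

  // Kahn's algorithm. Since cycle-closing edges were never added, the
  // graph is a DAG and the returned order always contains every loop.
  std::vector<unsigned> topologicalSort() const {
    std::vector<unsigned> degree = inDegree;
    std::vector<unsigned> order;
    std::queue<unsigned> ready;
    for (unsigned l = 0; l < degree.size(); ++l)
      if (degree[l] == 0)
        ready.push(l);
    while (!ready.empty()) {
      unsigned l = ready.front();
      ready.pop();
      order.push_back(l);
      for (unsigned s : succ[l])
        if (--degree[s] == 0)
          ready.push(s);
    }
    return order;
  }

private:
  // DFS: is `to` reachable from `from` via existing constraint edges?
  bool reachable(unsigned from, unsigned to) const {
    std::vector<bool> seen(succ.size(), false);
    std::vector<unsigned> stack{from};
    while (!stack.empty()) {
      unsigned cur = stack.back();
      stack.pop_back();
      if (cur == to)
        return true;
      if (seen[cur])
        continue;
      seen[cur] = true;
      for (unsigned s : succ[cur])
        stack.push_back(s);
    }
    return false;
  }

  std::vector<std::vector<unsigned>> succ; // edges: loop -> loops inside it
  std::vector<unsigned> inDegree;          // incoming-edge counts for Kahn
};
```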

At the moment, one test fails when running check-mlir. This is because some of the undef constraints can now be applied for conv2d, which moves the sparse loop outside.

Update:
I changed the conv2d check to match the code that is now generated.

Diff Detail

Event Timeline

streichgeorg created this revision. · May 7 2023, 3:23 AM
Herald added a project: Restricted Project.
streichgeorg edited the summary of this revision. · May 7 2023, 7:50 AM
streichgeorg edited the summary of this revision. · May 7 2023, 7:53 AM
streichgeorg published this revision for review. · May 9 2023, 11:05 PM

Squash commits

Thanks for contributing to the sparse compiler effort!

At the moment when running check-mlir one test fails. This is because some of the undef constraints can now be applied for conv2d, and the new loop order is an actual improvement (because it moves sparse loops outside).

Can you please update the CHECK test too in that case (or did you? I see a test updated, but also this comment, hence the question). Also, did you measure any actual improvements? The sparse-out is intuitive, but it does not always really result in performance gains (it is still O(nnz * n) vs. O(n * nnz)). Having some measurements to back up the claim would help.

aartbik retitled this revision from Use incremental topological sort to compute loop order from iteration graph to [mlir][sparse] Use incremental topological sort to compute loop order from iteration graph. · May 13 2023, 7:41 PM

Thank you for your reply!

I did update the check but forgot to update my comment accordingly. With the updated check, all tests in check-mlir are passing.

As for the improvement, I was only considering how many constraints are satisfied by each loop ordering, but did not look at any performance measurements. I will try to benchmark the two versions and report my findings.

streichgeorg edited the summary of this revision. · May 17 2023, 11:54 AM
Peiming added a subscriber: Peiming. · Edited · May 22 2023, 10:41 AM

Relax constraints for dense tensors so that we only require the innermost dimension to be last. As far as I can tell, there is no need to have a specific ordering between dimensions otherwise.

This might not be true. If the cache line is large enough to hold multiple rows, it is better to visit those rows in order too. So you might still want to ensure the ordering for all dimensions when that is satisfiable; what you propose could serve as a fallback order when it is not.
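To make the cache-line argument concrete, here is a minimal, hypothetical kernel (not compiler output; the sizes and the 64-byte line are assumptions). Both loop nests keep the innermost dimension k last, but only the first also preserves the outer dimension order, so each cache line holding two short K-rows is consumed in a single pass:

```cpp
// Hypothetical illustration (not generated code). With K = 4 doubles, one
// 64-byte cache line holds two K-rows (assuming aligned rows). Both nests
// keep k innermost, but the (j, i, k) order touches the two halves of such
// a line at widely separated times, while (i, j, k) uses them back to back.
#include <cstddef>

constexpr size_t I = 1024, J = 1024, K = 4;

double sumInOrder(const double *a) {   // full dimension order preserved
  double s = 0;
  for (size_t i = 0; i < I; ++i)
    for (size_t j = 0; j < J; ++j)
      for (size_t k = 0; k < K; ++k)
        s += a[(i * J + j) * K + k];
  return s;
}

double sumPermuted(const double *a) {  // only the innermost dim pinned last
  double s = 0;
  for (size_t j = 0; j < J; ++j)
    for (size_t i = 0; i < I; ++i)
      for (size_t k = 0; k < K; ++k)
        s += a[(i * J + j) * K + k];
  return s;
}
```

In the permuted order, the two halves of each line are touched roughly I inner iterations apart, so the line is typically evicted and refetched in between.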

Also, FYI: the filter loop might be removed sometime in the future, because we have a better implementation to replace it.

Quick update: I managed to get the Python benchmarking script working and added a convolution kernel as well as more matrix-multiplication configurations. As far as I can tell, there is no performance gain from the new loop order this change introduces (at least for the sizes and sparsity factors I tried).

Furthermore, I also tried leaving out the 'undef' constraints entirely, which seems to improve performance on the convolution kernel a bit but causes a regression for matrix multiplication. As a next step, I will try to figure out why that is and come up with a configuration that is fast for both.

This might not be true. If the cache line is large enough to hold multiple rows, it is better to visit those rows in order too. So you might still want to ensure the ordering for all dimensions when that is satisfiable; what you propose could serve as a fallback order when it is not.

True, I did not consider the case where dimensions are small.

We really appreciate your contributions, but we are a bit reluctant to accept this revision. It seems to over-engineer something that is currently not a bottleneck for us at all, and swapping in new code always carries the risk of breaking something unexpected.

aartbik resigned from this revision. · Aug 16 2023, 2:48 PM