This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Cleaning up names in {Merger,LoopEmitter,CodegenEnv}.{h,cpp}
ClosedPublic

Authored by wrengr on Mar 9 2023, 5:01 PM.

Details

Summary

This change does a bunch of renaming to clear up confusion in these files. In particular, this change:

  • Renames variables and methods to clarify the "dim"/"lvl" distinction, and changes them to use the Dimension/Level types as appropriate.
  • Introduces new typedefs
    • ExprId, LatPointId, LatSetId: to clarify the interning design of the Merger.
    • LoopId, LoopOrd: to clarify the distinction between arbitrary names for loop-variables, vs numeric identifiers based on the actual order of loop generation.
    • TensorId
    • (Future CLs will change these from typedefs to structs/classes, so that the typechecker can help avoid mixups.)
  • Updates documentation to match the new terminology
  • Adds additional assertions
  • Adds const to local variables along the way

Diff Detail

Event Timeline

wrengr created this revision. Mar 9 2023, 5:01 PM
Herald added a project: Restricted Project. Mar 9 2023, 5:01 PM
wrengr requested review of this revision. Mar 9 2023, 5:01 PM
wrengr added a comment. Mar 9 2023, 5:42 PM

This version of the patch is based off an old upstream revision, hence all the comments about having backported the changes from D145532. I am currently in the process of rebasing the patch to incorporate D145532 properly, and will upload the new version once it finishes compiling locally (to be sure I didn't introduce any rebasing bugs).

wrengr updated this revision to Diff 504002. Mar 9 2023, 6:35 PM

Rebasing.

This rebase causes Integration/Dialect/SparseTensor/CPU/reshape_dot.mlir to fail, whereas it was working before. The immediate cause is a failed assertion about dstLvl in LoopEmitter::getCollapseReassociation. I'm still re-debugging that, but am uploading this version for reviewers to get started on.

Peiming added inline comments. Mar 10 2023, 9:24 AM
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
227

Yeah, probably we should.

229

Initialize*s* (finally my turn ;-))

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
313

I won't spend too much effort on this because

  1. This is a temporary workaround.
  2. In most cases, the reassoc is empty, and each level is queried just once.
wrengr added inline comments. Mar 10 2023, 11:40 AM
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
229

Fwiw, the style Aart has been pushing is: (1) for documentation on functions/methods, use the indicative ("-s"); but (2) for commentary on statements, use the imperative ("-{}")

;)

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
313

Even if it's queried just once, still makes sense to me to compute it when we store it. (The exception would be if it's expensive to compute (which it isn't really), or if a bunch of them are stored but never queried.)

If it's just a temporary workaround, what's the longer-term solution? Cuz the workaround is pretty invasive— conceptually speaking (since it means we need to be sure to distinguish/clarify srcLvl-vs-dstLvl in a lot of places); not operationally speaking. Fwiw, if by the workaround-ness you mean our goal to split off the "view" stuff from the "real tensor" stuff, then I'm not entirely sure if that'd help the conceptual problems here. Even if/when we have separate MLIR-types for views vs tensors, we'll still need to handle the combinations together (e.g., if someone does a (tensor,view)-matmul then the shared dimension-axis will need to iterate over the corresponding source-tensor levels and then do the appropriate filtering to combine them).

In any case, the main goal for this CL is just to rename things (and for this method, to move it to the appropriate section rather than having it interleaved with the data members).

wrengr marked an inline comment as done. Mar 10 2023, 11:44 AM

This rebase causes Integration/Dialect/SparseTensor/CPU/reshape_dot.mlir to fail, whereas it was working before. The immediate cause is a failed assertion about dstLvl in LoopEmitter::getCollapseReassociation. I'm still re-debugging that, but am uploading this version for reviewers to get started on.

Hmm, everything passed on the buildbot. I wonder how my local repo differs...

wrengr updated this revision to Diff 504237. Mar 10 2023, 11:45 AM

Resolving todo re reserving room for LoopEmitter::loopSeqStack

Peiming added inline comments. Mar 10 2023, 12:22 PM
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
509

You should use dstLvl here

Peiming added inline comments. Mar 10 2023, 12:24 PM
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
313

The ultimate solution is to have a sparse tensor view (or whatever you prefer to call it) and a set of operations on top of them.

wrengr updated this revision to Diff 504258. Mar 10 2023, 12:34 PM
wrengr marked an inline comment as done.

Addressing comment to fix bug in LoopEmitter::enterLoopOverTensorAtLvl

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
509

Thanks :)

wrengr updated this revision to Diff 504307. Mar 10 2023, 5:01 PM

Rebased over D141532

wrengr updated this revision to Diff 504316. Mar 10 2023, 5:58 PM

Resolving a todo in Merger.cpp

wrengr updated this revision to Diff 504319. Mar 10 2023, 6:19 PM

Improving documentation for Merger::setHasSparseOut

are we really going to follow up on all the TODO's and notes?
is it really worth putting them *all* in our codebase?

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
33

You are adding *a lot* of comments with TODOs that read more like notes to self on how to improve the code. It actually makes the code a bit harder to read, since each block of comments looks like detailed documentation, but really is not.

220

I really like the n, e, b short convention, but now it becomes very spelled out and breaks the lines.

497–505

This was documented at some point, but perhaps the convention was lost.
The vector was used for anything with an a priori fixed size, the others for stuff that had variable length (but not too much).

are we really going to follow up on all the TODO's and notes?
is it really worth putting them *all* in our codebase?

Some of the todos are really questions to the team (e.g., the std::vector vs SmallVector discrepancy, explicating how LoopId/LoopOrd interact with AffineDimExpr/LinalgIndexOp, and cleaning up the "idx"/"ldx" names). And the fixmes are things I'd love to resolve, but I couldn't figure out what they were supposed to be from context, so those are also for the folks more familiar with those particular parts of the codebase.

But yes, I was planning to follow up on the others (making Kind a nested enum class, encoding arity into the Kind, making the aliases into structs, using AOS for LoopEmitter). I can remove them if you'd prefer.

wrengr updated this revision to Diff 504788. Mar 13 2023, 12:02 PM
wrengr marked an inline comment as done.

Updating comment/todo re std::vector vs llvm::SmallVector

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
220

I don't quite follow what you mean here

497–505

Aha. I'll convert the todo to a comment explaining that

Though I do worry about the fact that operator[] has different semantics for the two types; since the type is not evident at the callsites, it's not immediately apparent whether we need to assert against OOB or not.

wrengr updated this revision to Diff 504794. Mar 13 2023, 12:22 PM

Doing a few more renames:

  • Merger
    • conjLatPoint -> conjLat
    • takeConj -> conjSet
    • takeDisj -> disjSet
    • takeCombi -> combiSet
  • CodegenEnv
    • getTensorExp -> getExprId

The Merger methods are renamed to follow the "lat"/"set" naming convention used by the other methods. The CodegenEnv method is renamed because the return type is ExprId rather than TensorExp.

wrengr updated this revision to Diff 504795. Mar 13 2023, 12:23 PM

git-clang-format

wrengr updated this revision to Diff 504821. Mar 13 2023, 1:13 PM

Marking Merger::buildExp const

aartbik accepted this revision. Mar 14 2023, 10:30 AM
aartbik added inline comments.
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
220

Oh, I meant that using n, e, b as parameter names was a convention in this file and gave rise to very short lines for the methods, like the constructor and such.

But in general, spelling things out, like you do, is better, so this was more me thinking out loud ;-)

This revision is now accepted and ready to land. Mar 14 2023, 10:30 AM
Peiming added inline comments. Mar 14 2023, 10:34 AM
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
43

I think unsigned is good enough here if you interpret it as an index into the loop-stack.

wrengr updated this revision to Diff 505198. Mar 14 2023, 11:42 AM
wrengr marked 4 inline comments as done.

Changing LoopOrd to use unsigned

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
220

Ah, gotcha. I was mostly just trying to stick with the preexisting style re single- vs triple-letter names in all these files (just introducing n for LoopOrd to keep it distinct from i for LoopId); in part to minimize the diff, and in part to avoid dying on any hills.

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
43

The previous code seemed a bit inconsistent about whether to use size_t vs unsigned, so I just picked the one that seemed a bit more common. Easy enough to switch.

This revision was landed with ongoing or failed builds. Mar 14 2023, 11:51 AM
This revision was automatically updated to reflect the committed changes.