This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Implement custom reduction
Closed, Public

Authored by jim22k on Jul 26 2022, 7:49 AM.

Details

Summary
  • Adds logic within linalg.generic lowering to handle
    sparse_tensor.reduce. The identity argument is used to
    initialize the loop for lexicographic insertion, or as the
    starting value during access-pattern expansion.

  • Adds default identity values for standard reduction operators (see the sketch below).
  • Tests verify correct behavior.
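
For intuition: the identity of a reduction is the value that leaves any operand unchanged when folded in. A minimal sketch of such a mapping (illustrative only; the enum, names, and operator set here are assumptions, not the patch's actual code):

#include <limits>

enum class RedKind { Sum, Product, Min, Max };

// Neutral element for each standard reduction: folding it in any number
// of times leaves the result unchanged.
double reductionIdentity(RedKind k) {
  switch (k) {
  case RedKind::Sum:     return 0.0;                                   // x + 0 == x
  case RedKind::Product: return 1.0;                                   // x * 1 == x
  case RedKind::Min:     return std::numeric_limits<double>::max();    // min(x, id) == x
  case RedKind::Max:     return std::numeric_limits<double>::lowest(); // max(x, id) == x
  }
  return 0.0; // unreachable: all kinds handled above
}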

Diff Detail

Event Timeline

jim22k created this revision. Jul 26 2022, 7:49 AM
Herald added a project: Restricted Project. Jul 26 2022, 7:49 AM
jim22k requested review of this revision. Jul 26 2022, 7:49 AM

@aartbik This is mostly the same code as before the split. I moved getReductionIdentity out of Merger.cpp and it's now fully contained in codegen. I also broke up genInsertionLoad into two functions -- one to handle reductions and one to handle non-reduction cases. This eliminates the duplicate if (codegen.redKind == kNoReduc) checks.

I'm still not sure what all needs to happen for vectorized reductions.

Sorry for the delay getting back to you on this; I was OOO for two weeks. I had a look, and the amount of change in the merger seems about right. The changes in sparsification, however, feel too elaborate. I will hack around a bit with this tomorrow and let you know if there are simpler solutions, and also get back to you on the vectorization question. In the meantime, the changes to sparse_out_reduction.mlir seem self-contained, so that could be a quick first revision, once you have addressed the memsan issue.

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
816

Note that reductions should be fully "scalarized", so I never expect tensor loads to have to deal with reductions

mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp
1041

why is this commented out? do we need some verification on the value of the third parameter?

1043

This probably should become getRegion() when you rebase with some recent changes in MLIR core

mlir/test/Dialect/SparseTensor/roundtrip.mlir
292 ↗(On Diff #447687)

change seems unrelated to this revision

mlir/test/Dialect/SparseTensor/sparse_kernels.mlir
58 ↗(On Diff #447687)

why do we have changes in the original reductions?

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_out_reduction.mlir
1 ↗(On Diff #447687)

The changes in this file merely seem to increase coverage of existing reductions, but they could be done in their own revision, since they are unrelated to this change.

15 ↗(On Diff #447687)

note that this is a DCSC if you want to use the standard naming (you can then rename the one above to DCSR to be consistent)

78 ↗(On Diff #447687)

maybe redprod1 and redprod2, or otherwise redprodDCSR and redprodDCSC?

165 ↗(On Diff #447687)

you will need to release smr and smc as well!

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom.mlir
51

although I of course applaud composing the new feature with other features in our tests, I would expect, for simplicity, that you also have one integration test that just introduces the new sparse_tensor.reduce, without also using the binary or unary ops

mlir/unittests/Dialect/SparseTensor/MergerTest.cpp
264

Should this not be in the binary section?!

I think the confusion is because some other tags are misplaced here ;-)

aartbik added inline comments. Aug 9 2022, 5:16 PM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
1053

I think you will only need to change the code slightly in this file at *two* places.

(1) At this point (start of a scalarized reduction), when you are in a custom reduction, don't call genTensorLoad, but initialize the "load" Value with the identity value provided by the sparse_tensor.reduce third parameter.

(2) At genInsertionLoad, we need to deal with regular insertions and with access-pattern expansion for custom reductions. In the former case, we load the identity rather than zero, and are done. In the latter case, we cannot simply load the zero from the expanded access pattern, but need to do

if (!expFilled[i])
  value = identity;
else
  value = expValues[i];

You have some of that logic already, but it happens during a regular tensor load. I think you will just need to do it during genInsertionLoad.

Will that provide the semantics you expect?
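
As a scalar model of that insertion-load semantics (illustrative only; expFilled and expValues stand in for the expanded access pattern's "filled" bitmap and value buffer):

#include <cstddef>
#include <vector>

// Positions never written in the expanded row must read as the reduction
// identity, not as zero, while a custom reduction is active.
double expandedLoad(const std::vector<bool> &expFilled,
                    const std::vector<double> &expValues,
                    std::size_t i, double identity) {
  return expFilled[i] ? expValues[i] : identity;
}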

mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp
1042

redop -> redOp

1228

This seems small enough to just do:

case kReduce: {
  ReduceOp redOp = cast<ReduceOp>(tensorExps[e].op);
  return insertYieldOp(rewriter, loc, redOp.getRegion(), {v0, v1});
}
aartbik added inline comments. Aug 10 2022, 1:40 PM
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
122

Comment needs to be updated for kReduce

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
393

not needed, see below

423

not needed, see below

429

Note that you do not have to worry about vectorization, since the "block" of code in the semi-ring operation cannot be vectorized anyway. You will have to make sure, however, that vectorization is disabled in the presence of a custom reduction.

756

I would simplify this into

return builder.create<arith::SelectOp>(loc, isFilled, valAtIndex, identity);

816

Let me refine my comment: I would never expect this to happen in a pure tensor load, but only in an insertion load. So although I would have done it inside that method, this makes sense, except that I would test just for the custom reduction, not all the others, and proceed with the code you had just for the custom reduction.

You will have to check whether we are inside a custom reduction more widely, since redKind is not necessarily set (when we are in an insertion loop). Having better detection here, for example by setting a customRed value during the recursion, will avoid the huge block of code later! All the logic will be here.

Also, this avoids the changes to existing reductions you had.

988

this block of code is not needed when the custom reduction is properly detected during the insertion load; see my code suggestion above

1054

So this branch would become the following, where getCustomRedId fetches the third argument from a custom reduction operation:

if (atStart) {
  Kind kind = merger.exp(last).kind;
  Value load = kind == Kind::kReduce
      ? getCustomRedId(merger.exp(last).op)
      : genTensorLoad(merger, codegen, builder, op, exp);
  codegen.redKind = getReduction(kind);
  codegen.redExp = exp;
  updateReduc(merger, codegen, load, "START");
} else {
  ....
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom.mlir
24

unrelated to this change, but I actually noticed that when we use the custom reduction, we don't seem to verify the iterator types as carefully as is otherwise the case. Did you notice that too? If so, we probably need to fix that in the verifier (hopefully we don't need to go into linalg for that).

152

This is actually WAI (working as intended), sort of. When we do a reduction within the innermost loop, we always come out with a value, even if the loop is never taken. This behavior also happens for regular reductions, and is similar to what the TACO compiler generates (at least at the moment). Agreed, we could improve this, but that will take some more work beyond this new operation.
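
A scalar model of the behavior being described (hypothetical names, not code from the patch):

// The reduction variable is initialized before the loop, so even a
// zero-trip loop writes a value back; shown here for a product reduction.
double reduceRow(const double *vals, int n, double identity) {
  double red = identity; // defined even when n == 0
  for (int i = 0; i < n; i++)
    red *= vals[i];
  return red; // stored unconditionally after the loop
}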

174

Note that recent changes unified this with the bufferization dialect ops:

bufferization.dealloc_tensor

@aartbik Thank you for taking a look at my code and giving me feedback.

I do have a question about the approach you are suggesting. If I check for a custom reduction and only use the identity in that case, it will leave the current behavior of standard reductions unchanged, including the behavior that a reduction with mulf always returns 0.0. Is that the direction you want to go? My PR tried to address that, but I'm okay leaving it alone and focusing only on custom reductions if you feel that is better.

mlir/test/Dialect/SparseTensor/sparse_kernels.mlir
58 ↗(On Diff #447687)

This PR doesn't only implement custom reductions. It also adds getReductionIdentity, which applies to standard reductions like mulf. Without those changes, a reduction using mulf will always return 0.0.

If you want to leave the existing behavior of standard reductions unchanged, that is fine with me. It will then require a custom reduction to use mulf with a specified identity of 1.0.

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_out_reduction.mlir
1 ↗(On Diff #447687)

The tests I added here are meant to add coverage of existing (non-custom) reductions, specifically mulf, which I consider to be in a broken state. That is why I included them here.

Per my note above, I'm okay if we want to leave existing behavior alone. In that case, I would remove these tests.

I propose that we leave the current behavior unchanged (although we should fix both issues eventually, i.e. mul-reduction initialized with 1, as well as a skipped innermost loop yielding no value) and focus on adding the new op only.
After that we can address the two issues pointed out.

Sounds good?

jim22k updated this revision to Diff 452440. Aug 13 2022, 11:23 AM
jim22k marked 20 inline comments as done.
  • Updates based on feedback

@aartbik This is definitely simpler than what I had in Sparsification.cpp. Thanks for pointing me in the right direction.

Almost there! I believe we can make it even simpler with some recursive set/unset.
But then good to go

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
818

since the if returns, no else is needed; also, indent the second return to the left

983

I believe that this function can be simpler too. Simply remove the "last" and all the codegen code you added.
Then, as the else branch at L985, use

if (merger.exp(exp).kind == Kind::kReduce)
  codegen.redCustom = getCustomRedId(merger.exp(exp).op);

and then, right before the exit at L1291, do

if (merger.exp(exp).kind == Kind::kReduce)
  codegen.redCustom = Value();

You can also make redCustom an unsigned and use a -1/exp value instead.
This way, the recursive nature will take care of handling the tensor load in the code above, without special handling.

Of course, this assumes custom reductions are never nested, something that makes sense, but perhaps we should verify that.
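
A minimal model of that recursive set/unset pattern (names invented; the real code tracks an MLIR Value or expression index rather than an int):

#include <cassert>

struct CodegenState {
  int redCustom = -1; // -1 means "not inside a custom reduction"
};

void visitExp(CodegenState &codegen, int exp, bool isCustomReduce) {
  if (isCustomReduce) {
    assert(codegen.redCustom == -1 && "custom reductions must not nest");
    codegen.redCustom = exp; // set on entry to the custom reduction...
  }
  // ... recursively visit operands here; any insertion load below can
  // test codegen.redCustom to substitute the identity value ...
  if (isCustomReduce)
    codegen.redCustom = -1; // ...and unset right before exit
}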

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom.mlir
214

make this more precise; it is not about the Inf per se, but about not having full "fill-in" due to the reduction loop

jim22k updated this revision to Diff 453108. Aug 16 2022, 1:15 PM
jim22k marked 2 inline comments as done.

Further refinement based on feedback

jim22k marked an inline comment as done. Aug 16 2022, 1:17 PM

I hope this is what you had in mind. I didn't see how to make redCustom unsigned, as it needs to store a Value.

Oh, I just meant to store the exp index, and then later extract the value using that; but this is fine too

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
986

we can actually assert here to avoid the nesting problem:

assert(!codegen.redCustom);
codegen.redCustom = ....

aartbik accepted this revision. Aug 16 2022, 4:22 PM

Assuming you address my last few nits (can't help myself), this is good to ship!
Thanks for your patience during this review.

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
984

if you move this below the next few ifs that return, you can do this without the else, and without braces:

if ...
  return
if (merger.exp(exp).kind == Kind::kReduce) {
  codegen.redCustom = ...

This revision is now accepted and ready to land. Aug 16 2022, 4:22 PM
jim22k updated this revision to Diff 453173. Aug 16 2022, 5:26 PM
jim22k marked 2 inline comments as done.
Final updates
This revision was automatically updated to reflect the committed changes.