
[mlir][sparse] add parallelization strategies to sparse compiler
ClosedPublic

Authored by aartbik on Nov 23 2020, 10:01 AM.

Details

Summary

This CL adds the ability to request different parallelization strategies
for the generated code. Every "parallel" loop is a candidate, and is converted
to a parallel op if it is an actual for-loop (not a while-loop) and the strategy
allows dense/sparse outer/inner parallelization.
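
The decision described above can be sketched as a small predicate (a Python illustration with hypothetical flag names, not the actual SparseParallelizationStrategy interface in Transforms.h):

```python
# Sketch of the parallelization decision for a candidate loop.
# The flag names (par_dense_outer, etc.) are illustrative assumptions,
# not the real strategy enum.

def is_parallelizable(is_for_loop, is_sparse, is_outer,
                      par_dense_outer, par_sparse_outer,
                      par_dense_inner, par_sparse_inner):
    # Only actual for-loops (not while-loops) are candidates.
    if not is_for_loop:
        return False
    # The strategy must allow this dense/sparse x outer/inner combination.
    if is_outer:
        return par_sparse_outer if is_sparse else par_dense_outer
    return par_sparse_inner if is_sparse else par_dense_inner
```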

This will connect directly with the work of @ezhulenev on parallel loops.

Still TBD: vectorization strategy

Diff Detail

Event Timeline

aartbik created this revision.Nov 23 2020, 10:01 AM
aartbik requested review of this revision.Nov 23 2020, 10:01 AM
aartbik edited the summary of this revision. (Show Details)Nov 23 2020, 11:15 AM

For ease of reviewing, I split out the invariant generalization in its own review: https://reviews.llvm.org/D91985

penpornk added inline comments.Nov 24 2020, 1:21 PM
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
798

Typo: s/stategy/strategy/

800

Maybe explain the types of loop first just so it's clear for people who are not familiar with TACO?

  • For-loops that iterate over a single tensor. (Can be called a dense or sparse loop, based on the compression of that tensor dimension.)
  • For-loops that co-iterate over tensors (at most one tensor can have sparse storage for that dimension).
  • While-loops that co-iterate over tensors (more than one tensor has sparse storage for that dimension; parallelization is not supported).
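
The three loop forms above can be sketched in Python (illustrative CSR-style `pointers`/`indices`/`values` arrays; not the actual generated MLIR):

```python
# Sketches of the three loop forms, assuming a CSR-like compressed format
# with pointers/indices/values arrays (illustrative names only).

def dense_for(n, a, x):
    # For-loop over a single dense dimension: trivially parallelizable.
    for i in range(n):
        x[i] = 2.0 * a[i]

def sparse_for(pointers, indices, values, x):
    # For-loop over a single compressed (sparse) dimension: iterates
    # stored positions, but is still a plain for-loop, so parallelizable.
    for p in range(pointers[0], pointers[1]):
        x[indices[p]] = 2.0 * values[p]

def sparse_while(a_idx, a_val, b_idx, b_val):
    # While-loop co-iterating two sparse operands (e.g. elementwise a*b):
    # advances two position pointers, so it is not a parallelization candidate.
    out = {}
    pa, pb = 0, 0
    while pa < len(a_idx) and pb < len(b_idx):
        ia, ib = a_idx[pa], b_idx[pb]
        if ia == ib:
            out[ia] = a_val[pa] * b_val[pb]
        pa += ia <= ib
        pb += ib <= ia
    return out
```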
801

Nit: Comma after e.g.

802

Nit: Missing a close parenthesis.

mlir/lib/Dialect/Linalg/Transforms/Sparsification.cpp
399

Typo: s/it it/it is/

584–615

Typo: s/loop loop/loop/

622–626

Please bear with my noob questions:
It looks like highs and sizes store the inferred dimension sizes. What exactly is lo (and pidxs and loops)? If it's the starting point of a dimension, isn't it 0? Why are we updating pidxs and loops through parOp.getInductionVars()?

Also, why do we need highs to be per tensor for sparse but sizes can be just per dimension? (If multiple tensors co-iterate on a dimension, the size should match anyway.)

It'd be great if you could help add more explanations about them (or examples) to struct CodeGen. (Or I can do it later once I'm more familiar with the code.)

mlir/test/Dialect/Linalg/sparse_2d.mlir
1133 ↗(On Diff #307112)

Does alloca fill the allocated memory with 0s? If not, X has not been initialized to 0 before the loop, and the loop only sets its values where A is nonzero.

mlir/test/Dialect/Linalg/sparse_parallel.mlir
149

If in the SCALE case (e.g., trait_ss) the inner loop can be parallelized, then I think in this case it could be parallelized too, because the amount of work solely depends on A.

aartbik marked 8 inline comments as done.Nov 24 2020, 4:11 PM
aartbik added inline comments.
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
800

Added some explanation at the top level.

mlir/lib/Dialect/Linalg/Transforms/Sparsification.cpp
622–626

Yes, I can add some more comments to the code, since it has become less self-explanatory than when I started this. In a nutshell: yes, some of these are fixed beforehand (like sizes), while others are updated as we go (so that later loops iterate from where the previous loop left the induction).
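
The "iterate from where the previous loop left the induction" pattern can be sketched as follows (an illustrative sparse vector addition in Python; not the actual CodeGen bookkeeping):

```python
# Sketch of "loops after loops": the cleanup loops resume from wherever
# the preceding while-loop left its position pointers (illustrative
# sparse vector addition a + b over index/value arrays).

def sparse_add(a_idx, a_val, b_idx, b_val):
    out = {}
    pa, pb = 0, 0
    # Co-iteration: a while-loop runs while both operands have entries left.
    while pa < len(a_idx) and pb < len(b_idx):
        ia, ib = a_idx[pa], b_idx[pb]
        if ia == ib:
            out[ia] = a_val[pa] + b_val[pb]
            pa, pb = pa + 1, pb + 1
        elif ia < ib:
            out[ia] = a_val[pa]
            pa += 1
        else:
            out[ib] = b_val[pb]
            pb += 1
    # The cleanup for-loops start at the final pa/pb, not at 0: this is
    # the sense in which later loops pick up the previous induction value.
    for p in range(pa, len(a_idx)):
        out[a_idx[p]] = a_val[p]
    for p in range(pb, len(b_idx)):
        out[b_idx[p]] = b_val[p]
    return out
```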

mlir/test/Dialect/Linalg/sparse_2d.mlir
1133 ↗(On Diff #307112)

None of these buffers are properly initialized. This is the temporary "local bufferization" solution we talked about earlier. Leaving these stubs here allows me to quickly hand-convert this into something working. But eventually this will be replaced by proper initialization code (reading from a file, for example).

mlir/test/Dialect/Linalg/sparse_parallel.mlir
149

Well, in this case the inner loop is a "reduction". But you are right, this could be parallelized too with a parallel tree reduction. In fact, the "scf.parallel" operation supports the "scf.reduce" we may start using in the future.
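
The tree-reduction idea can be sketched generically (a Python illustration; not the semantics of scf.reduce):

```python
# Sketch of a parallel tree reduction: pairwise combines in O(log n)
# rounds; in a parallel lowering, every pair within a round could be
# combined concurrently. (Generic illustration only.)

def tree_reduce(xs, combine):
    vals = list(xs)
    while len(vals) > 1:
        # Each round halves the number of partial results.
        nxt = [combine(vals[i], vals[i + 1])
               for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])  # carry the odd element to the next round
        vals = nxt
    return vals[0] if vals else None
```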

aartbik updated this revision to Diff 307474.Nov 24 2020, 4:12 PM
aartbik marked 3 inline comments as done.

rebased + addressed comments

penpornk accepted this revision.Nov 24 2020, 4:41 PM
penpornk added inline comments.
mlir/lib/Dialect/Linalg/Transforms/Sparsification.cpp
622–626

Thank you very much!

mlir/test/Dialect/Linalg/sparse_parallel.mlir
149

Oh, right. I forgot that it is doing a reduction, so it's out of scope for now. Thank you for the clarification! :)

This revision is now accepted and ready to land.Nov 24 2020, 4:41 PM

Thanks for your review, Penporn! Keep those useful comments coming!