This is an archive of the discontinued LLVM Phabricator instance.

Differential D159217

[mlir][VectorOps] Add lowering for vector.shape_cast of scalable vectors
ClosedPublic

Authored by benmxwl-arm on Aug 30 2023, 10:48 AM.

Details

Reviewers

c-rhodes
awarzynski
aartbik
nicolasvasilache
dcaballe

Commits

rG8dffb71cbada: [mlir][VectorOps] Add lowering for vector.shape_cast of scalable vectors

Summary

This adds a lowering similar to the general shape_cast lowering, but
instead moves elements a (scalable) subvector at a time via
vector.scalable.extract/insert. It is restricted to the case where both
the source and result vector types have a single trailing scalable
dimension (due to limitations of the insert/extract ops).

The current lowerings are now disabled for scalable vectors, as they
produce incorrect results at runtime (due to assuming a fixed number
of elements).

Examples of casts that now work:

// Flattening:
%v = vector.shape_cast %arg0 : vector<4x[8]xi8> to vector<[32]xi8>

// Un-flattening:
%v = vector.shape_cast %arg0 : vector<[8]xi32> to vector<2x1x[4]xi32>
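
To illustrate why a lowering written for a fixed element count does not carry over, here is a small standalone model of the flattening cast at runtime. It is only a sketch: it assumes that a scalable insert position given in "min" units corresponds to a runtime element offset of position times vscale, and the function name `flattenBySubvector` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models flattening a (rows x [minRowSize]) vector into a 1-D scalable vector
// for a given runtime vscale. Each source row holds minRowSize * vscale
// elements and is moved as one subvector.
std::vector<int> flattenBySubvector(const std::vector<std::vector<int>> &rows,
                                    int64_t minRowSize, int64_t vscale) {
  const int64_t rowSize = minRowSize * vscale; // runtime length of one row
  std::vector<int> result(rows.size() * rowSize, 0);
  for (std::size_t r = 0; r < rows.size(); ++r) {
    assert(static_cast<int64_t>(rows[r].size()) == rowSize);
    // Insert position in "min" units (the value written in the IR).
    const int64_t minPos = static_cast<int64_t>(r) * minRowSize;
    // Assumed runtime element offset: the min position scaled by vscale.
    const int64_t offset = minPos * vscale;
    for (int64_t e = 0; e < rowSize; ++e)
      result[offset + e] = rows[r][e];
  }
  return result;
}
```

The number of elements moved per row depends on vscale, so code that hard-codes the element count seen in the type (the vscale == 1 case) would only cover part of each row on wider hardware.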

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

benmxwl-arm created this revision. · Aug 30 2023, 10:48 AM

Herald added a reviewer: aartbik. · Aug 30 2023, 10:48 AM

Herald added a project: Restricted Project.

Herald added subscribers: bviyer, Moerafaat, zero9178 and 24 others.

benmxwl-arm requested review of this revision. · Aug 30 2023, 10:48 AM

Herald added a reviewer: nicolasvasilache. · Aug 30 2023, 10:48 AM

Herald added a reviewer: dcaballe.

Herald added a project: Restricted Project.

Herald added subscribers: alextsao1999, stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B255837: Diff 554780. · Aug 30 2023, 12:14 PM

Mostly makes sense - at least based on the tests. I've made a few suggestions and left a few comments. In general, it would help if you documented ScalableShapeCastOpRewritePattern a bit more, including adding comments in matchAndRewrite. Hopefully my hints will be helpful.

Thanks for working on this!

mlir/lib/Dialect/Vector/Transforms/LowerVectorShapeCast.cpp
305–348	Doxygen + short code example, pls
323–324	This comment is a bit unclear to me. How about: `%unflat = vector.shape_cast %arg0 : vector<[8]xi32> to vector<2x[1]x4xi32>`? Not that I suggest that we start generating code like this, but this pattern could in principle support this, right? It does not because it relies on `vector.scalable.{insert|extract}`, but that could be relaxed in the future, right?
342–343	Why "Min" rather than "Trailing"?
345–346	This is the length of the vector to be extracted?
348	Why "min"? Isn't this the total number of elements? Also, this is ignoring "scalability", but I guess that that's fine?
366	Rather than "extractedSubVector", could you call it "inputSubVector" (we can guess that it's been "extracted", but it's not obvious _where_ it was extracted from).
408	[nit] We know that this is testing a vector, so "VectorType" can be dropped from the name.
mlir/test/Dialect/Vector/vector-shape-cast-lowering-scalable-vectors.mlir
2	It would be good to add a "negative" test (i.e. "scalable" vectors where the scalable dim is not the trailing dim).
7	[nit] Your function names are quite long. However, we know that this file is all about "shape_cast", so you could safely drop "shape_cast" from names. [nit] "3d_scalable_to_1d" suggests that the input is 3d scalable and the output is 1d

Add a bunch more tests (including a few negative tests)
Attempt to name tests a little better :)
Clear up naming and add a few explanatory comments
Avoid generating duplicate extracts from the source vector (when the subvector size < trailing source dim)

benmxwl-arm marked 9 inline comments as done. · Aug 31 2023, 5:17 AM

benmxwl-arm added inline comments.

mlir/lib/Dialect/Vector/Transforms/LowerVectorShapeCast.cpp
323–324	I don't think this lowering (or one like this) could work for that cast. The problems there are:
This lowering is part of vector-to-llvm and there's no legal (or soon to be legal) representation of `vector<2x[1]x4xi32>` in LLVM.
For `vector<[8]xi32> to vector<2x[1]x4xi32>` you'd need to extract/insert fixed vectors (`vector<4xi32>`) 2*vscale times, so you'd need a (runtime) loop.
There are (currently) no ops that could handle the required inserts/extracts.
342–343	It's min as in the minimum vector size (when vscale is 1)

Harbormaster completed remote builds in B255984: Diff 554988. · Aug 31 2023, 5:58 AM

I've been wanting to rewrite the (non-scalable) shape cast lowering for a while now using IndexingUtils and avoiding custom one-off incIdx like things.

Would you see an opportunity to make use of the unroll iterator introduced here: https://reviews.llvm.org/D150000 ?

It seems to me that you could use a zip with the above iterator and you have 2 cases:

1. if shape_cast source.scalable_size == shape_cast dest.scalable_size then just vector.extract from iterator_source.drop_back and vector.insert into iterator_dest.drop_back
2. otherwise same as 1 + an extra:
   a. vector.scalable.extract of iterator_source.take_back() if shape_cast source.scalable_size > shape_cast dest.scalable_size; else
   b. vector.scalable.insert of iterator_dest.take_back() if shape_cast source.scalable_size < shape_cast dest.scalable_size;

Bonus points for also doing the non-scalable with a similar algorithm and avoiding the custom for loop and the progressive decomposition when we can really just do a zip (but of course not necessary since your PR is orthogonal).

Would that make sense ?
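
For the first case (equal trailing scalable sizes), the zip idea amounts to walking the leading, non-scalable indices of the source and the result in lockstep. The sketch below is only an illustration in plain C++: it does not use the iterator from D150000, and the names `delinearize` and `zipLeadingIndices` are hypothetical helpers introduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Index = std::vector<int64_t>;

// Converts a linear offset into a row-major multi-dimensional index.
Index delinearize(int64_t linear, const std::vector<int64_t> &shape) {
  Index idx(shape.size(), 0);
  for (int64_t d = static_cast<int64_t>(shape.size()) - 1; d >= 0; --d) {
    idx[d] = linear % shape[d];
    linear /= shape[d];
  }
  return idx;
}

// Pairs the n-th leading index of the source with the n-th leading index of
// the result. Each pair stands for one "extract row from source, insert row
// into result" step.
std::vector<std::pair<Index, Index>>
zipLeadingIndices(const std::vector<int64_t> &srcLeading,
                  const std::vector<int64_t> &resLeading) {
  int64_t srcCount = 1, resCount = 1;
  for (int64_t s : srcLeading)
    srcCount *= s;
  for (int64_t s : resLeading)
    resCount *= s;
  assert(srcCount == resCount && "both sides must hold the same number of rows");
  std::vector<std::pair<Index, Index>> pairs;
  for (int64_t i = 0; i < srcCount; ++i)
    pairs.emplace_back(delinearize(i, srcLeading), delinearize(i, resLeading));
  return pairs;
}
```

For leading shapes `{2, 3}` and `{3, 2}` (as in a cast from `vector<2x3x[4]xf32>` to `vector<3x2x[4]xf32>`), the third pair is source `[0, 2]` and result `[1, 0]`.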

In D159217#4631532, @nicolasvasilache wrote:

I've been wanting to rewrite the (non-scalable) shape cast lowering for a while now using IndexingUtils and avoiding custom one-off incIdx like things.

Would you see an opportunity to make use of the unroll iterator introduced here: https://reviews.llvm.org/D150000 ?

It seems to me that you could use a zip with the above iterator and you have 2 cases:

1. if shape_cast source.scalable_size == shape_cast dest.scalable_size then just vector.extract from iterator_source.drop_back and vector.insert into iterator_dest.drop_back
2. otherwise same as 1 + an extra:
   a. vector.scalable.extract of iterator_source.take_back() if shape_cast source.scalable_size > shape_cast dest.scalable_size; else
   b. vector.scalable.insert of iterator_dest.take_back() if shape_cast source.scalable_size < shape_cast dest.scalable_size;

Bonus points for also doing the non-scalable with a similar algorithm and avoiding the custom for loop and the progressive decomposition when we can really just do a zip (but of course not necessary since your PR is orthogonal).

Would that make sense ?

+1 to using IndexingUtils if that's possible. However, given that https://reviews.llvm.org/D150000 hasn't landed yet, would it be OK to merge this one as is and try to refactor shortly after? @nicolasvasilache ? That would unblock a few things for us.

@benmxwl-arm Approving as is - LGTM! But please wait for Nicolas to confirm before landing (or try to refactor if D150000 lands).

mlir/lib/Dialect/Vector/Transforms/LowerVectorShapeCast.cpp
368	[nit]
383
398	[nit]
414

This revision is now accepted and ready to land. · Sep 6 2023, 12:50 AM

I gave https://reviews.llvm.org/D150000 a quick try (and found that the patch currently has a few simple merge conflicts).
A bigger issue is the new TileOffsetRangeIterator fails to compile if passed to llvm::zip(). It seems like it's missing an implementation of std::iterator_traits.

no type named 'iterator_category' in 'std::iterator_traits<mlir::detail::TileOffsetRangeIterator<long>>'

I tried adding a basic implementation, but that leads to more issues cropping up like missing a difference_type (then the same issues for llvm::detail::zip_common<llvm::detail::zip_shortest<mlir::detail::TileOffsetRangeIterator... )
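
For context on the error above: generic code usually queries an iterator through `std::iterator_traits`, which by default looks for five member types on the iterator class. The following is a minimal sketch of an iterator that provides them. `OffsetIterator` and `sumRange` are hypothetical stand-ins, not the class from D150000, and whether this alone would be enough for `llvm::zip()` has not been verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>

// Hypothetical iterator that yields offsets current, current + step, ...
class OffsetIterator {
public:
  // The five member types that std::iterator_traits picks up.
  using iterator_category = std::forward_iterator_tag;
  using value_type = int64_t;
  using difference_type = std::ptrdiff_t;
  using pointer = const int64_t *;
  using reference = const int64_t &;

  OffsetIterator() = default;
  OffsetIterator(int64_t start, int64_t step) : current(start), step(step) {}

  reference operator*() const { return current; }
  OffsetIterator &operator++() {
    current += step;
    return *this;
  }
  OffsetIterator operator++(int) {
    OffsetIterator tmp = *this;
    ++*this;
    return tmp;
  }
  bool operator==(const OffsetIterator &other) const {
    return current == other.current;
  }
  bool operator!=(const OffsetIterator &other) const {
    return !(*this == other);
  }

private:
  int64_t current = 0;
  int64_t step = 1;
};

static_assert(
    std::is_same<std::iterator_traits<OffsetIterator>::iterator_category,
                 std::forward_iterator_tag>::value,
    "iterator_traits can see the category");

// Sums the values in [first, last) using only the generic iterator interface.
template <typename It>
typename std::iterator_traits<It>::value_type sumRange(It first, It last) {
  typename std::iterator_traits<It>::value_type sum = 0;
  for (; first != last; ++first)
    sum += *first;
  return sum;
}
```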

In D159217#4639477, @benmxwl-arm wrote:

I gave https://reviews.llvm.org/D150000 a quick try (and found that the patch currently has a few simple merge conflicts).
A bigger issue is the new TileOffsetRangeIterator fails to compile if passed to llvm::zip(). It seems like it's missing an implementation of std::iterator_traits.

no type named 'iterator_category' in 'std::iterator_traits<mlir::detail::TileOffsetRangeIterator<long>>'

I tried adding a basic implementation, but that leads to more issues cropping up like missing a difference_type (then the same issues for llvm::detail::zip_common<llvm::detail::zip_shortest<mlir::detail::TileOffsetRangeIterator... )

Ok, sorry to have sent you on a wild goose chase, I'll try to land https://reviews.llvm.org/D150000 myself soon as I do not see recent activity there.
Let's proceed with this for now and add a few TODOs.

nicolasvasilache accepted this revision. · Sep 7 2023, 1:03 AM

This revision was landed with ongoing or failed builds. · Sep 7 2023, 9:00 AM

Closed by commit rG8dffb71cbada: [mlir][VectorOps] Add lowering for vector.shape_cast of scalable vectors (authored by benmxwl-arm).

This revision was automatically updated to reflect the committed changes.

benmxwl-arm added a commit: rG8dffb71cbada: [mlir][VectorOps] Add lowering for vector.shape_cast of scalable vectors.

Thanks for the notes above regarding D150000. I'll try to fix those today and land it.

Revision Contents

Path	Size
mlir/lib/Dialect/Vector/Transforms/LowerVectorShapeCast.cpp	181 lines
mlir/test/Dialect/Vector/vector-shape-cast-lowering-scalable-vectors.mlir	214 lines

Diff 556167

mlir/lib/Dialect/Vector/Transforms/LowerVectorShapeCast.cpp

(48 preceding lines not shown)

  class ShapeCastOp2DDownCastRewritePattern
      : public OpRewritePattern<vector::ShapeCastOp> {
  public:
    using OpRewritePattern::OpRewritePattern;
    LogicalResult matchAndRewrite(vector::ShapeCastOp op,
                                  PatternRewriter &rewriter) const override {
      auto sourceVectorType = op.getSourceVectorType();
      auto resultVectorType = op.getResultVectorType();
+     if (sourceVectorType.isScalable() || resultVectorType.isScalable())
+       return failure();
      if (sourceVectorType.getRank() != 2 || resultVectorType.getRank() != 1)
        return failure();
      auto loc = op.getLoc();
      Value desc = rewriter.create<arith::ConstantOp>(
          loc, resultVectorType, rewriter.getZeroAttr(resultVectorType));
      unsigned mostMinorVectorSize = sourceVectorType.getShape()[1];
      for (int64_t i = 0, e = sourceVectorType.getShape().front(); i != e; ++i) {

(17 lines not shown)

  class ShapeCastOp2DUpCastRewritePattern
      : public OpRewritePattern<vector::ShapeCastOp> {
  public:
    using OpRewritePattern::OpRewritePattern;
    LogicalResult matchAndRewrite(vector::ShapeCastOp op,
                                  PatternRewriter &rewriter) const override {
      auto sourceVectorType = op.getSourceVectorType();
      auto resultVectorType = op.getResultVectorType();
+     if (sourceVectorType.isScalable() || resultVectorType.isScalable())
+       return failure();
      if (sourceVectorType.getRank() != 1 || resultVectorType.getRank() != 2)
        return failure();
      auto loc = op.getLoc();
      Value desc = rewriter.create<arith::ConstantOp>(
          loc, resultVectorType, rewriter.getZeroAttr(resultVectorType));
      unsigned mostMinorVectorSize = resultVectorType.getShape()[1];
      for (int64_t i = 0, e = resultVectorType.getShape().front(); i != e; ++i) {
        Value vec = rewriter.create<vector::ExtractStridedSliceOp>(
            loc, op.getSource(), /*offsets=*/i * mostMinorVectorSize,
            /*sizes=*/mostMinorVectorSize,
            /*strides=*/1);
        desc = rewriter.create<vector::InsertOp>(loc, vec, desc, i);
      }
      rewriter.replaceOp(op, desc);
      return success();
    }
  };

+ static void incIdx(llvm::MutableArrayRef<int64_t> idx, VectorType tp,
+                    int dimIdx, int initialStep = 1) {
+   int step = initialStep;
+   for (int d = dimIdx; d >= 0; d--) {
+     idx[d] += step;
+     if (idx[d] >= tp.getDimSize(d)) {
+       idx[d] = 0;
+       step = 1;
+     } else {
+       break;
+     }
+   }
+ }
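
To see how the `initialStep` parameter behaves, here is a standalone variant of the same loop. As an assumption made purely so that it compiles without MLIR, the shape is passed as a plain `std::vector<int64_t>` instead of a `VectorType`; the logic otherwise mirrors the function in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Standalone variant of incIdx: advances idx[dimIdx] by initialStep and
// carries into the more major dimensions (with a step of 1) on overflow.
void incIdx(std::vector<int64_t> &idx, const std::vector<int64_t> &shape,
            int dimIdx, int initialStep = 1) {
  assert(idx.size() == shape.size());
  int step = initialStep;
  for (int d = dimIdx; d >= 0; d--) {
    idx[d] += step;
    if (idx[d] >= shape[d]) {
      idx[d] = 0; // wrapped: reset this dimension and carry into the next one
      step = 1;
    } else {
      break;
    }
  }
}
```

With shape `{2, 1, 4}` and a trailing step of 4, the index `{0, 0, 0}` advances to `{1, 0, 0}`, which corresponds to moving from the first to the second `[4]`-sized row. With shape `{8}` and the same step, `{0}` advances to `{4}`.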

  // We typically should not lower general shape cast operations into data
  // movement instructions, since the assumption is that these casts are
  // optimized away during progressive lowering. For completeness, however,
  // we fall back to a reference implementation that moves all elements
  // into the right place if we get here.
  class ShapeCastOpRewritePattern : public OpRewritePattern<vector::ShapeCastOp> {
  public:
    using OpRewritePattern::OpRewritePattern;
    LogicalResult matchAndRewrite(vector::ShapeCastOp op,
                                  PatternRewriter &rewriter) const override {
      Location loc = op.getLoc();
      auto sourceVectorType = op.getSourceVectorType();
      auto resultVectorType = op.getResultVectorType();
+     if (sourceVectorType.isScalable() || resultVectorType.isScalable())
+       return failure();
      // Special case 2D / 1D lowerings with better implementations.
      // TODO: make is ND / 1D to allow generic ND -> 1D -> MD.
      int64_t srcRank = sourceVectorType.getRank();
      int64_t resRank = resultVectorType.getRank();
      if ((srcRank == 2 && resRank == 1) || (srcRank == 1 && resRank == 2))
        return failure();
      // Generic ShapeCast lowering path goes all the way down to unrolled scalar

(38 lines not shown; enclosing scope: for (int64_t i = 0; i < numElts; i++) {)

        } else {
          result =
              rewriter.create<vector::InsertOp>(loc, extract, result, resIdx);
        }
      }
      rewriter.replaceOp(op, result);
      return success();
    }
+ };

/// A shape_cast lowering for scalable vectors with a single trailing scalable

/// dimension. This is similar to the general shape_cast lowering but makes use

/// of vector.scalable.insert and vector.scalable.extract to move elements a

/// subvector at a time.

///

/// E.g.:

/// ```

/// // Flatten scalable vector

/// %0 = vector.shape_cast %arg0 : vector<2x1x[4]xi32> to vector<[8]xi32>

/// ```

/// is rewritten to:

/// ```

/// // Flatten scalable vector

/// %c = arith.constant dense<0> : vector<[8]xi32>

/// %0 = vector.extract %arg0[0, 0] : vector<2x1x[4]xi32>

/// %1 = vector.scalable.insert %0, %c[0] : vector<[4]xi32> into vector<[8]xi32>

/// %2 = vector.extract %arg0[1, 0] : vector<2x1x[4]xi32>

/// %3 = vector.scalable.insert %2, %1[4] : vector<[4]xi32> into vector<[8]xi32>

/// ```

/// or:

/// ```

/// // Un-flatten scalable vector

/// %0 = vector.shape_cast %arg0 : vector<[8]xi32> to vector<2x1x[4]xi32>

/// ```

/// is rewritten to:

/// ```

/// // Un-flatten scalable vector

/// %c = arith.constant dense<0> : vector<2x1x[4]xi32>

/// %0 = vector.scalable.extract %arg0[0] : vector<[4]xi32> from vector<[8]xi32>

/// %1 = vector.insert %0, %c [0, 0] : vector<[4]xi32> into vector<2x1x[4]xi32>

/// %2 = vector.scalable.extract %arg0[4] : vector<[4]xi32> from vector<[8]xi32>

/// %3 = vector.insert %2, %1 [1, 0] : vector<[4]xi32> into vector<2x1x[4]xi32>

/// ```

class ScalableShapeCastOpRewritePattern

: public OpRewritePattern<vector::ShapeCastOp> {

public:

using OpRewritePattern::OpRewritePattern;

LogicalResult matchAndRewrite(vector::ShapeCastOp op,

PatternRewriter &rewriter) const override {

Location loc = op.getLoc();

auto sourceVectorType = op.getSourceVectorType();

auto resultVectorType = op.getResultVectorType();

auto srcRank = sourceVectorType.getRank();

auto resRank = resultVectorType.getRank();

// This can only lower shape_casts where both the source and result types

// have a single trailing scalable dimension. This is because there are no

// legal representation of other scalable types in LLVM (and likely won't be

// soon). There are also (currently) no operations that can index or extract

// from >= 2D scalable vectors or scalable vectors of fixed vectors.

if (!isTrailingDimScalable(sourceVectorType) ||

!isTrailingDimScalable(resultVectorType)) {

return failure();

}

// The sizes of the trailing dimension of the source and result vectors, the

// size of subvector to move, and the number of elements in the vectors.

// These are "min" sizes as they are the size when vscale == 1.

auto minSourceTrailingSize = sourceVectorType.getShape().back();

auto minResultTrailingSize = resultVectorType.getShape().back();

auto minExtractionSize =

std::min(minSourceTrailingSize, minResultTrailingSize);

int64_t minNumElts = 1;

for (auto size : sourceVectorType.getShape())

minNumElts *= size;

// The subvector type to move from the source to the result. Note that this

// is a scalable vector. This rewrite will generate code in terms of the

// "min" size (vscale == 1 case), that scales to any vscale.

auto extractionVectorType = VectorType::get(

{minExtractionSize}, sourceVectorType.getElementType(), {true});

Value result = rewriter.create<arith::ConstantOp>(

loc, resultVectorType, rewriter.getZeroAttr(resultVectorType));

SmallVector<int64_t> srcIdx(srcRank);

SmallVector<int64_t> resIdx(resRank);

// TODO: Try rewriting this with StaticTileOffsetRange (from IndexingUtils)

// once D150000 lands.

Value currentResultScalableVector;

Value currentSourceScalableVector;

for (int64_t i = 0; i < minNumElts; i += minExtractionSize) {

// 1. Extract a scalable subvector from the source vector.

if (!currentSourceScalableVector) {

if (srcRank != 1) {

currentSourceScalableVector = rewriter.create<vector::ExtractOp>(

loc, op.getSource(), llvm::ArrayRef(srcIdx).drop_back());

} else {

currentSourceScalableVector = op.getSource();

}

}

Value sourceSubVector = currentSourceScalableVector;

if (minExtractionSize < minSourceTrailingSize) {

sourceSubVector = rewriter.create<vector::ScalableExtractOp>(

loc, extractionVectorType, sourceSubVector, srcIdx.back());

}

// 2. Insert the scalable subvector into the result vector.

if (!currentResultScalableVector) {

if (minExtractionSize == minResultTrailingSize) {

currentResultScalableVector = sourceSubVector;

} else if (resRank != 1) {

currentResultScalableVector = rewriter.create<vector::ExtractOp>(

loc, result, llvm::ArrayRef(resIdx).drop_back());

} else {

currentResultScalableVector = result;

}

}

if (minExtractionSize < minResultTrailingSize) {

currentResultScalableVector = rewriter.create<vector::ScalableInsertOp>(

loc, sourceSubVector, currentResultScalableVector, resIdx.back());

}

- private:
-   static void incIdx(SmallVector<int64_t> &idx, VectorType tp, int64_t r) {
-     assert(0 <= r && r < tp.getRank());
-     if (++idx[r] == tp.getDimSize(r)) {
-       idx[r] = 0;
-       incIdx(idx, tp, r - 1);
-     }
+       // 3. Update the source and result scalable vectors if needed.
+       if (resIdx.back() + minExtractionSize >= minResultTrailingSize &&
+           currentResultScalableVector != result) {
+         // Finished row of result. Insert complete scalable vector into result
+         // (n-D) vector.
+         result = rewriter.create<vector::InsertOp>(
+             loc, currentResultScalableVector, result,
+             llvm::ArrayRef(resIdx).drop_back());
+         currentResultScalableVector = {};
+       }

if (srcIdx.back() + minExtractionSize >= minSourceTrailingSize) {

// Finished row of source.

currentSourceScalableVector = {};

}

// 4. Increment the insert/extract indices, stepping by minExtractionSize

// for the trailing dimensions.

incIdx(srcIdx, sourceVectorType, srcRank - 1, minExtractionSize);

incIdx(resIdx, resultVectorType, resRank - 1, minExtractionSize);

}

rewriter.replaceOp(op, result);

return success();

}

static bool isTrailingDimScalable(VectorType type) {

return type.getRank() >= 1 && type.getScalableDims().back() &&

!llvm::is_contained(type.getScalableDims().drop_back(), true);

    }
  };
  } // namespace
  void mlir::vector::populateVectorShapeCastLoweringPatterns(
      RewritePatternSet &patterns, PatternBenefit benefit) {
    patterns.add<ShapeCastOp2DDownCastRewritePattern,
-                ShapeCastOp2DUpCastRewritePattern, ShapeCastOpRewritePattern>(
-       patterns.getContext(), benefit);
+                ShapeCastOp2DUpCastRewritePattern, ShapeCastOpRewritePattern,
+                ScalableShapeCastOpRewritePattern>(patterns.getContext(),
+                                                   benefit);
  }

awarzynskiUnsubmitted

Done

return success();

}

- static bool isVectorTypeWithtrailingScalableDim(VectorType type) {

+ static bool isTrailingDimScalable(VectorType type) {

return type.getRank() >= 1 && type.getScalableDims().back() &&

[nit] We know that this is testing a vector, so "VectorType" can be dropped from the name.

awarzynski: [nit] We know that this is testing a vector, so "VectorType" can be dropped from the name.

awarzynskiUnsubmitted

Done

Rather than "extractedSubVector", could you call it "inputSubVector" (we can guess that it's been "extracted", but it's not obvious _where_ it was extracted from).

awarzynski: Rather than "extractedSubVector", could you call it "inputSubVector" (we can guess that it's…

awarzynskiUnsubmitted

Not Done

for (int64_t i = 0; i < minNumElts; i += minExtractionSize) {

- // Extract a scalable subvector from the source vector.

+ // 1. Extract a scalable subvector from the source vector.

if (!currentSourceScalableVector) {

[nit]

awarzynski: [nit]

awarzynskiUnsubmitted

Not Done

loc, extractionVectorType, sourceSubVector, srcIdx.back());

}

- // Insert the scalable subvector into the result vector.

+ // 2. Insert the scalable subvector into the result vector.

if (!currentResultScalableVector) {

awarzynski:

awarzynskiUnsubmitted

Not Done

currentSourceScalableVector = {};

}

- // Increment the insert/extract indices, stepping by minExtractionSize for

+ // 4. Increment the insert/extract indices, stepping by minExtractionSize for

// the trailing dimensions.

awarzynski:

awarzynskiUnsubmitted

Not Done

loc, sourceSubVector, currentResultScalableVector, resIdx.back());

}

+ // 3. Update the source and result scalable vectors if needed.

if (resIdx.back() + minExtractionSize >= minResultTrailingSize &&

[nit]

awarzynski: [nit]

mlir/test/Dialect/Vector/vector-shape-cast-lowering-scalable-vectors.mlir

This file was added.

				// RUN: mlir-opt %s --test-transform-dialect-interpreter | FileCheck %s

				/// This tests that shape casts of scalable vectors (with one trailing scalable dim)
				/// can be correctly lowered to vector.scalable.insert/extract.

				// CHECK-LABEL: i32_3d_to_1d_last_dim_scalable
				// CHECK-SAME: %[[arg0:.*]]: vector<2x1x[4]xi32>
				func.func @i32_3d_to_1d_last_dim_scalable(%arg0: vector<2x1x[4]xi32>) -> vector<[8]xi32>
				{
				// CHECK-NEXT: %[[cst:.*]] = arith.constant dense<0> : vector<[8]xi32>
				// CHECK-NEXT: %[[subvec0:.*]] = vector.extract %[[arg0]][0, 0] : vector<2x1x[4]xi32>
				// CHECK-NEXT: %[[res0:.*]] = vector.scalable.insert %[[subvec0]], %[[cst]][0] : vector<[4]xi32> into vector<[8]xi32>
				// CHECK-NEXT: %[[subvec1:.*]] = vector.extract %[[arg0]][1, 0] : vector<2x1x[4]xi32>
				// CHECK-NEXT: %[[res1:.*]] = vector.scalable.insert %[[subvec1]], %[[res0]][4] : vector<[4]xi32> into vector<[8]xi32>
				%flat = vector.shape_cast %arg0 : vector<2x1x[4]xi32> to vector<[8]xi32>
				// CHECK-NEXT: return %[[res1]] : vector<[8]xi32>
				return %flat : vector<[8]xi32>
				}

				// -----

				// CHECK-LABEL: i32_1d_to_3d_last_dim_scalable
				// CHECK-SAME: %[[arg0:.*]]: vector<[8]xi32>
				func.func @i32_1d_to_3d_last_dim_scalable(%arg0: vector<[8]xi32>) -> vector<2x1x[4]xi32> {
				// CHECK-NEXT: %[[cst:.*]] = arith.constant dense<0> : vector<2x1x[4]xi32>
				// CHECK-NEXT: %[[subvec0:.*]] = vector.scalable.extract %[[arg0]][0] : vector<[4]xi32> from vector<[8]xi32>
				// CHECK-NEXT: %[[res0:.*]] = vector.insert %[[subvec0]], %[[cst]] [0, 0] : vector<[4]xi32> into vector<2x1x[4]xi32>
				// CHECK-NEXT: %[[subvec1:.*]] = vector.scalable.extract %[[arg0]][4] : vector<[4]xi32> from vector<[8]xi32>
				// CHECK-NEXT: %[[res1:.*]] = vector.insert %[[subvec1]], %[[res0]] [1, 0] : vector<[4]xi32> into vector<2x1x[4]xi32>
				%unflat = vector.shape_cast %arg0 : vector<[8]xi32> to vector<2x1x[4]xi32>
				// CHECK-NEXT: return %[[res1]] : vector<2x1x[4]xi32>
				return %unflat : vector<2x1x[4]xi32>
				}

				// -----

				// CHECK-LABEL: i8_2d_to_1d_last_dim_scalable
				// CHECK-SAME: %[[arg0:.*]]: vector<4x[8]xi8>
				func.func @i8_2d_to_1d_last_dim_scalable(%arg0: vector<4x[8]xi8>) -> vector<[32]xi8> {
				// CHECK-NEXT: %[[cst:.*]] = arith.constant dense<0> : vector<[32]xi8>
				// CHECK-NEXT: %[[subvec0:.*]] = vector.extract %[[arg0]][0] : vector<4x[8]xi8>
				// CHECK-NEXT: %[[res0:.*]] = vector.scalable.insert %[[subvec0]], %[[cst]][0] : vector<[8]xi8> into vector<[32]xi8>
				// CHECK-NEXT: %[[subvec1:.*]] = vector.extract %[[arg0]][1] : vector<4x[8]xi8>
				// CHECK-NEXT: %[[res1:.*]] = vector.scalable.insert %[[subvec1]], %[[res0]][8] : vector<[8]xi8> into vector<[32]xi8>
				// CHECK-NEXT: %[[subvec2:.*]] = vector.extract %[[arg0]][2] : vector<4x[8]xi8>
				// CHECK-NEXT: %[[res2:.*]] = vector.scalable.insert %[[subvec2]], %[[res1]][16] : vector<[8]xi8> into vector<[32]xi8>
				// CHECK-NEXT: %[[subvec3:.*]] = vector.extract %[[arg0]][3] : vector<4x[8]xi8>
				// CHECK-NEXT: %[[res3:.*]] = vector.scalable.insert %[[subvec3]], %[[res2]][24] : vector<[8]xi8> into vector<[32]xi8>
				%flat = vector.shape_cast %arg0 : vector<4x[8]xi8> to vector<[32]xi8>
				// CHECK-NEXT: return %[[res3]] : vector<[32]xi8>
				return %flat : vector<[32]xi8>
				}

				// -----

				// CHECK-LABEL: i8_1d_to_2d_last_dim_scalable
				// CHECK-SAME: %[[arg0:.*]]: vector<[32]xi8>
				func.func @i8_1d_to_2d_last_dim_scalable(%arg0: vector<[32]xi8>) -> vector<4x[8]xi8> {
				// CHECK-NEXT: %[[cst:.*]] = arith.constant dense<0> : vector<4x[8]xi8>
				// CHECK-NEXT: %[[subvec0:.*]] = vector.scalable.extract %[[arg0]][0] : vector<[8]xi8> from vector<[32]xi8>
				// CHECK-NEXT: %[[res0:.*]] = vector.insert %[[subvec0]], %[[cst]] [0] : vector<[8]xi8> into vector<4x[8]xi8>
				// CHECK-NEXT: %[[subvec1:.*]] = vector.scalable.extract %[[arg0]][8] : vector<[8]xi8> from vector<[32]xi8>
				// CHECK-NEXT: %[[res1:.*]] = vector.insert %[[subvec1]], %[[res0]] [1] : vector<[8]xi8> into vector<4x[8]xi8>
				// CHECK-NEXT: %[[subvec2:.*]] = vector.scalable.extract %[[arg0]][16] : vector<[8]xi8> from vector<[32]xi8>
				// CHECK-NEXT: %[[res2:.*]] = vector.insert %[[subvec2]], %[[res1]] [2] : vector<[8]xi8> into vector<4x[8]xi8>
				// CHECK-NEXT: %[[subvec3:.*]] = vector.scalable.extract %[[arg0]][24] : vector<[8]xi8> from vector<[32]xi8>
				// CHECK-NEXT: %[[res3:.*]] = vector.insert %[[subvec3]], %[[res2]] [3] : vector<[8]xi8> into vector<4x[8]xi8>
				%unflat = vector.shape_cast %arg0 : vector<[32]xi8> to vector<4x[8]xi8>
				// CHECK-NEXT: return %[[res3]] : vector<4x[8]xi8>
				return %unflat : vector<4x[8]xi8>
				}

				// -----

				// CHECK-LABEL: f32_permute_leading_non_scalable_dims
				// CHECK-SAME: %[[arg0:.*]]: vector<2x3x[4]xf32>
				func.func @f32_permute_leading_non_scalable_dims(%arg0: vector<2x3x[4]xf32>) -> vector<3x2x[4]xf32> {
				// CHECK-NEXT: %[[cst:.*]] = arith.constant dense<0.000000e+00> : vector<3x2x[4]xf32>
				// CHECK-NEXT: %[[subvec0:.*]] = vector.extract %[[arg0]][0, 0] : vector<2x3x[4]xf32>
				// CHECK-NEXT: %[[res0:.*]] = vector.insert %[[subvec0]], %[[cst]] [0, 0] : vector<[4]xf32> into vector<3x2x[4]xf32>
				// CHECK-NEXT: %[[subvec1:.*]] = vector.extract %[[arg0]][0, 1] : vector<2x3x[4]xf32>
				// CHECK-NEXT: %[[res1:.*]] = vector.insert %[[subvec1]], %[[res0]] [0, 1] : vector<[4]xf32> into vector<3x2x[4]xf32>
				// CHECK-NEXT: %[[subvec2:.*]] = vector.extract %[[arg0]][0, 2] : vector<2x3x[4]xf32>
				// CHECK-NEXT: %[[res2:.*]] = vector.insert %[[subvec2]], %[[res1]] [1, 0] : vector<[4]xf32> into vector<3x2x[4]xf32>
				// CHECK-NEXT: %[[subvec3:.*]] = vector.extract %[[arg0]][1, 0] : vector<2x3x[4]xf32>
				// CHECK-NEXT: %[[res3:.*]] = vector.insert %[[subvec3]], %[[res2]] [1, 1] : vector<[4]xf32> into vector<3x2x[4]xf32>
				// CHECK-NEXT: %[[subvec4:.*]] = vector.extract %[[arg0]][1, 1] : vector<2x3x[4]xf32>
				// CHECK-NEXT: %[[res4:.*]] = vector.insert %[[subvec4]], %[[res3]] [2, 0] : vector<[4]xf32> into vector<3x2x[4]xf32>
				// CHECK-NEXT: %[[subvec5:.*]] = vector.extract %[[arg0]][1, 2] : vector<2x3x[4]xf32>
				// CHECK-NEXT: %[[res5:.*]] = vector.insert %[[subvec5]], %[[res4]] [2, 1] : vector<[4]xf32> into vector<3x2x[4]xf32>
				%res = vector.shape_cast %arg0: vector<2x3x[4]xf32> to vector<3x2x[4]xf32>
				// CHECK-NEXT: return %[[res5]] : vector<3x2x[4]xf32>
				return %res : vector<3x2x[4]xf32>
				}

				// -----

				// CHECK-LABEL: f64_flatten_leading_non_scalable_dims
				// CHECK-SAME: %[[arg0:.*]]: vector<2x2x[2]xf64>
				func.func @f64_flatten_leading_non_scalable_dims(%arg0: vector<2x2x[2]xf64>) -> vector<4x[2]xf64>
				{
				// CHECK-NEXT: %[[cst:.*]] = arith.constant dense<0.000000e+00> : vector<4x[2]xf64>
				// CHECK-NEXT: %[[subvec0:.*]] = vector.extract %[[arg0]][0, 0] : vector<2x2x[2]xf64>
				// CHECK-NEXT: %[[res0:.*]] = vector.insert %[[subvec0]], %[[cst]] [0] : vector<[2]xf64> into vector<4x[2]xf64>
				// CHECK-NEXT: %[[subvec1:.*]] = vector.extract %[[arg0]][0, 1] : vector<2x2x[2]xf64>
				// CHECK-NEXT: %[[res1:.*]] = vector.insert %[[subvec1]], %[[res0]] [1] : vector<[2]xf64> into vector<4x[2]xf64>
				// CHECK-NEXT: %[[subvec2:.*]] = vector.extract %[[arg0]][1, 0] : vector<2x2x[2]xf64>
				// CHECK-NEXT: %[[res2:.*]] = vector.insert %[[subvec2]], %[[res1]] [2] : vector<[2]xf64> into vector<4x[2]xf64>
				// CHECK-NEXT: %[[subvec3:.*]] = vector.extract %[[arg0]][1, 1] : vector<2x2x[2]xf64>
				// CHECK-NEXT: %[[res3:.*]] = vector.insert %[[subvec3]], %[[res2]] [3] : vector<[2]xf64> into vector<4x[2]xf64>
				%res = vector.shape_cast %arg0: vector<2x2x[2]xf64> to vector<4x[2]xf64>
				// CHECK-NEXT: return %[[res3]] : vector<4x[2]xf64>
				return %res : vector<4x[2]xf64>
				}

				// -----

				// CHECK-LABEL: f32_reduce_trailing_scalable_dim
				// CHECK-SAME: %[[arg0:.*]]: vector<3x[4]xf32>
				func.func @f32_reduce_trailing_scalable_dim(%arg0: vector<3x[4]xf32>) -> vector<6x[2]xf32>
				{
				// CHECK-NEXT: %[[cst:.*]] = arith.constant dense<0.000000e+00> : vector<6x[2]xf32>
				// CHECK-NEXT: %[[srcvec0:.*]] = vector.extract %[[arg0]][0] : vector<3x[4]xf32>
				// CHECK-NEXT: %[[subvec0:.*]] = vector.scalable.extract %[[srcvec0]][0] : vector<[2]xf32> from vector<[4]xf32>
				// CHECK-NEXT: %[[res0:.*]] = vector.insert %[[subvec0]], %[[cst]] [0] : vector<[2]xf32> into vector<6x[2]xf32>
				// CHECK-NEXT: %[[subvec1:.*]] = vector.scalable.extract %[[srcvec0]][2] : vector<[2]xf32> from vector<[4]xf32>
				// CHECK-NEXT: %[[res1:.*]] = vector.insert %[[subvec1]], %[[res0]] [1] : vector<[2]xf32> into vector<6x[2]xf32>
				// CHECK-NEXT: %[[srcvec1:.*]] = vector.extract %[[arg0]][1] : vector<3x[4]xf32>
				// CHECK-NEXT: %[[subvec2:.*]] = vector.scalable.extract %[[srcvec1]][0] : vector<[2]xf32> from vector<[4]xf32>
				// CHECK-NEXT: %[[res2:.*]] = vector.insert %[[subvec2]], %[[res1]] [2] : vector<[2]xf32> into vector<6x[2]xf32>
				// CHECK-NEXT: %[[subvec3:.*]] = vector.scalable.extract %[[srcvec1]][2] : vector<[2]xf32> from vector<[4]xf32>
				// CHECK-NEXT: %[[res3:.*]] = vector.insert %[[subvec3]], %[[res2]] [3] : vector<[2]xf32> into vector<6x[2]xf32>
				// CHECK-NEXT: %[[srcvec2:.*]] = vector.extract %[[arg0]][2] : vector<3x[4]xf32>
				// CHECK-NEXT: %[[subvec4:.*]] = vector.scalable.extract %[[srcvec2]][0] : vector<[2]xf32> from vector<[4]xf32>
				// CHECK-NEXT: %[[res4:.*]] = vector.insert %[[subvec4]], %[[res3]] [4] : vector<[2]xf32> into vector<6x[2]xf32>
				// CHECK-NEXT: %[[subvec5:.*]] = vector.scalable.extract %[[srcvec2]][2] : vector<[2]xf32> from vector<[4]xf32>
				// CHECK-NEXT: %[[res5:.*]] = vector.insert %[[subvec5]], %[[res4]] [5] : vector<[2]xf32> into vector<6x[2]xf32>
				%res = vector.shape_cast %arg0: vector<3x[4]xf32> to vector<6x[2]xf32>
				// CHECK-NEXT: return %[[res5]] : vector<6x[2]xf32>
				return %res: vector<6x[2]xf32>
				}

				// -----

				// CHECK-LABEL: f32_increase_trailing_scalable_dim
				// CHECK-SAME: %[[arg0:.*]]: vector<4x[2]xf32>
				func.func @f32_increase_trailing_scalable_dim(%arg0: vector<4x[2]xf32>) -> vector<2x[4]xf32>
				{
				// CHECK-NEXT: %[[cst:.*]] = arith.constant dense<0.000000e+00> : vector<2x[4]xf32>
				// CHECK-NEXT: %[[subvec0:.*]] = vector.extract %[[arg0]][0] : vector<4x[2]xf32>
				// CHECK-NEXT: %[[resvec0:.*]] = vector.extract %[[cst]][0] : vector<2x[4]xf32>
				// CHECK-NEXT: %[[resvec1:.*]] = vector.scalable.insert %[[subvec0]], %[[resvec0]][0] : vector<[2]xf32> into vector<[4]xf32>
				// CHECK-NEXT: %[[subvec1:.*]] = vector.extract %[[arg0]][1] : vector<4x[2]xf32>
				// CHECK-NEXT: %[[resvec2:.*]] = vector.scalable.insert %[[subvec1]], %[[resvec1]][2] : vector<[2]xf32> into vector<[4]xf32>
				// CHECK-NEXT: %[[res0:.*]] = vector.insert %[[resvec2]], %[[cst]] [0] : vector<[4]xf32> into vector<2x[4]xf32>
				// CHECK-NEXT: %[[subvec3:.*]] = vector.extract %[[arg0]][2] : vector<4x[2]xf32>
				// CHECK-NEXT: %[[resvec3:.*]] = vector.extract %[[cst]][1] : vector<2x[4]xf32>
				// CHECK-NEXT: %[[resvec4:.*]] = vector.scalable.insert %[[subvec3]], %[[resvec3]][0] : vector<[2]xf32> into vector<[4]xf32>
				// CHECK-NEXT: %[[subvec4:.*]] = vector.extract %[[arg0]][3] : vector<4x[2]xf32>
				// CHECK-NEXT: %[[resvec5:.*]] = vector.scalable.insert %[[subvec4]], %[[resvec4]][2] : vector<[2]xf32> into vector<[4]xf32>
				// CHECK-NEXT: %[[res1:.*]] = vector.insert %[[resvec5]], %[[res0]] [1] : vector<[4]xf32> into vector<2x[4]xf32>
				%res = vector.shape_cast %arg0: vector<4x[2]xf32> to vector<2x[4]xf32>
				// CHECK-NEXT: return %[[res1]] : vector<2x[4]xf32>
				return %res: vector<2x[4]xf32>
				}

				// -----

				/// The following shape_casts are not supported because the types cannot be
				/// represented in LLVM (and likely won't be supported soon), and there are
				/// currently no ops that could do the required extracts/inserts.

				// -----

				// CHECK-LABEL: cannot_cast_to_non_trailing_scalable_dim
				// CHECK-SAME: %[[arg0:.*]]: vector<[4]xf32>
				func.func @cannot_cast_to_non_trailing_scalable_dim(%arg0: vector<[4]xf32>) -> vector<[2]x2xf32> {
				// CHECK-NEXT: %[[res:.*]] = vector.shape_cast %[[arg0]] : vector<[4]xf32> to vector<[2]x2xf32>
				%res = vector.shape_cast %arg0 : vector<[4]xf32> to vector<[2]x2xf32>
				// CHECK-NEXT: return %[[res]] : vector<[2]x2xf32>
				return %res: vector<[2]x2xf32>
				}

				// -----

				// CHECK-LABEL: cannot_shape_cast_from_non_trailing_scalable_dim
				// CHECK-SAME: %[[arg0:.*]]: vector<[2]x2xf32>
				func.func @cannot_shape_cast_from_non_trailing_scalable_dim(%arg0: vector<[2]x2xf32>) -> vector<[4]xf32> {
				// CHECK-NEXT: %[[res:.*]] = vector.shape_cast %[[arg0]] : vector<[2]x2xf32> to vector<[4]xf32>
				%res = vector.shape_cast %arg0 : vector<[2]x2xf32> to vector<[4]xf32>
				// CHECK-NEXT: return %[[res]] : vector<[4]xf32>
				return %res: vector<[4]xf32>
				}

				// -----

				// CHECK-LABEL: cannot_shape_cast_more_than_one_scalable_dim
				// CHECK-SAME: %[[arg0:.*]]: vector<[4]x[4]xf32>
				func.func @cannot_shape_cast_more_than_one_scalable_dim(%arg0: vector<[4]x[4]xf32>) -> vector<2x[2]x[4]xf32> {
				// CHECK-NEXT: %[[res:.*]] = vector.shape_cast %[[arg0]] : vector<[4]x[4]xf32> to vector<2x[2]x[4]xf32>
				%res = vector.shape_cast %arg0 : vector<[4]x[4]xf32> to vector<2x[2]x[4]xf32>
				// CHECK-NEXT: return %[[res]] : vector<2x[2]x[4]xf32>
				return %res: vector<2x[2]x[4]xf32>
				}

				transform.sequence failures(propagate) {
				^bb1(%module_op: !transform.any_op):
				%f = transform.structured.match ops{["func.func"]} in %module_op
				: (!transform.any_op) -> !transform.any_op

				transform.apply_patterns to %f {
				transform.apply_patterns.vector.lower_shape_cast
				} : !transform.any_op
				}
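The index pattern in the CHECK lines above can be reproduced with a small model. The following is a minimal Python sketch, not the code from LowerVectorShapeCast.cpp: the helper names (`is_trailing_dim_scalable`, `delinearize`, `subvector_moves`) are hypothetical, shapes are given as the minimum (vscale = 1) sizes, and it assumes the larger trailing dimension is a multiple of the smaller one. It walks the elements one subvector at a time and reports where each subvector is read from and written to.

```python
from math import prod


def is_trailing_dim_scalable(scalable_dims):
    """True when the last dimension is scalable and no other dimension is."""
    return (
        len(scalable_dims) >= 1
        and bool(scalable_dims[-1])
        and not any(scalable_dims[:-1])
    )


def delinearize(linear, shape):
    """Convert a row-major linear index into a multi-dimensional index."""
    idx = []
    for size in reversed(shape):
        idx.append(linear % size)
        linear //= size
    return tuple(reversed(idx))


def subvector_moves(src_shape, res_shape):
    """List (src_leading_idx, src_offset, res_leading_idx, res_offset) per step.

    Shapes hold the minimum sizes; the trailing entry is the scalable
    dimension. Each step moves min(trailing sizes) elements (times vscale).
    """
    assert prod(src_shape) == prod(res_shape), "element counts must match"
    step = min(src_shape[-1], res_shape[-1])
    # Assumption of this sketch: subvectors never straddle a row boundary.
    assert max(src_shape[-1], res_shape[-1]) % step == 0
    moves = []
    for i in range(0, prod(src_shape), step):
        src_row, src_off = divmod(i, src_shape[-1])
        res_row, res_off = divmod(i, res_shape[-1])
        moves.append(
            (
                delinearize(src_row, src_shape[:-1]),
                src_off,
                delinearize(res_row, res_shape[:-1]),
                res_off,
            )
        )
    return moves


if __name__ == "__main__":
    # Flattening vector<4x[8]xi8> to vector<[32]xi8>.
    for move in subvector_moves((4, 8), (32,)):
        print(move)
```

For the flattening case, the source rows 0 to 3 map to result offsets 0, 8, 16 and 24, which matches the `vector.scalable.insert` positions in the first test. The three `cannot_*` cases correspond to `is_trailing_dim_scalable` returning false for at least one of the two types.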

Revision Contents

Diff 556167

mlir/lib/Dialect/Vector/Transforms/LowerVectorShapeCast.cpp

mlir/test/Dialect/Vector/vector-shape-cast-lowering-scalable-vectors.mlir
