When allowed, use 32-bit indices rather than 64-bit indices in the
SIMD computation of masks. This runs up to 2x faster on a number of
AVX2 microbenchmarks and up to 4x faster on a number of AVX512
microbenchmarks.
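For context, a rough sketch of the mask computation in question. The operand names are invented and the syntax uses today's arith/vector dialect spelling rather than whatever the patch emits; treat it as an illustration of the idea, not the generated IR:

```mlir
// Mask for an 8-lane vector with runtime bound %b:
// lane i is active iff i < %b, i.e. [0, 1, ..., 7] < splat(%b).

// 64-bit index variant: 8 x i64 needs two AVX2 registers (4 x i64 each),
// so the compare is split in half.
%steps64 = arith.constant dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi64>
%bound64 = vector.splat %b64 : vector<8xi64>
%mask64  = arith.cmpi slt, %steps64, %bound64 : vector<8xi64>

// 32-bit index variant: 8 x i32 fits a single AVX2 register,
// halving the number of vector ops for the same mask.
%steps32 = arith.constant dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>
%bound32 = vector.splat %b32 : vector<8xi32>
%mask32  = arith.cmpi slt, %steps32, %bound32 : vector<8xi32>
```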
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
Looks good. Would it make sense to automatically enable this if the incoming memref is known to have fewer than 2^32 elements?
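A minimal sketch of what that automatic check could look like, assuming the decision is made per incoming MemRefType; the helper name and placement are mine, not from the diff:

```cpp
#include <cstdint>

#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Hypothetical helper: a memref with a fully static shape bounds every
// index by its element count, so 32-bit indices are provably safe when
// that count stays below 2^32.
static bool indexFitsIn32Bits(MemRefType type) {
  if (!type.hasStaticShape())
    return false; // dynamic sizes: the bound cannot be proven statically
  return type.getNumElements() < (int64_t(1) << 32);
}
```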
Inline comment on mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp, line 127:
clang-tidy rightfully complains about the name not being camelBack :)
Comment Actions
I was pondering this. Would a 32-bit index space suffice in general?
For now I made it an option we can experiment with, keeping this automatic optimization in mind for later.
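For reference, a pass option like this would typically be exercised through mlir-opt; the flag spelling below is my assumption and may not match what the patch finally lands:

```
mlir-opt -convert-vector-to-llvm="enable-index-optimizations" input.mlir
```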
Inline comment on mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp, line 127:
you are right, I am mixing my styleguides
Comment Actions
Thanks for tracking down and fixing the perf bug, Aart!
Maybe this also influences how aggressively we need to split in the codegen strategy to reach peak performance.
It will also be interesting to see the effects on mobile (cc @asaadaldien ).