This is an archive of the discontinued LLVM Phabricator instance.

[mlir][spirv] Convert tensor.extract for very small tensors
ClosedPublic

Authored by antiagainst on Mar 5 2021, 8:23 AM.

Details

Summary

Normally tensors will be stored in buffers before converting to SPIR-V,
given that is how a large amount of data is typically sent to the GPU.
However, the SPIR-V conversion also supports handling tensors directly.
This is for cases where the tensor only contains a small number of
elements and it makes sense to directly inline them as a small data
array in the shader. To handle this, the conversion may internally
create new local variables. SPIR-V consumers in GPU drivers may or may
not optimize these away, so this has implications for register pressure.
Therefore, a threshold is used to control when the patterns should kick in.
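
As a rough illustration (the op spellings and types below are only schematic, not exact dialect syntax), the conversion turns an extract from a small constant tensor into a function-local array variable plus an access-chain load:

```mlir
// Before: extracting from a small constant tensor.
%cst  = constant dense<[1.0, 2.0, 3.0, 4.0]> : tensor<4xf32>
%elem = tensor.extract %cst[%i] : tensor<4xf32>

// After (schematically): the elements are inlined into a local variable
// in the Function storage class, then indexed and loaded.
%var  = spv.Variable : !spv.ptr<!spv.array<4 x f32, stride=4>, Function>
// ... materialize the constant elements and store them into %var ...
// ... convert the index-typed %i into an i32 value %idx ...
%ptr  = spv.AccessChain %var[%idx] : !spv.ptr<!spv.array<4 x f32, stride=4>, Function>, i32
%elem = spv.Load "Function" %ptr : f32
```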

Diff Detail

Event Timeline

antiagainst created this revision.Mar 5 2021, 8:23 AM
antiagainst requested review of this revision.Mar 5 2021, 8:23 AM

The code change looks fine to me, however I wonder if we really want to support directly translating tensors to SPIR-V in the long term. This doesn't look well aligned with the similar lowering to LLVM, and the same code generation could be done by lowering the tensor earlier. You are more familiar with this code than I am, so maybe I'm missing some reasons why it is better to handle it this way, but I would assume getting rid of all tensors before converting to SPIR-V would be a better long-term solution.
Also, in terms of performance, I'm not sure this is the most efficient way to handle this case. I think having the constants in a uniform buffer would be more efficient on most GPUs than creating a private memory array and indexing into it (unless the indices end up being constant after optimization and the values can be stored in registers). In general GPUs have optimizations for uniform buffers, while private arrays are duplicated for each work item.

Yup, certainly. This is not a generally applicable pattern for converting all tensors of all sizes. It's actually very restricted: it is for constant tensors with a very small number of elements (like < 8 elements or something) where we want to just inline the data array inside the shader. (Sort of like having a data table whose values you query using indices.) This is also what I'm trying to make very clear with the comments and the special byteCountThreshold control. Creating a dedicated buffer for such a small number of elements may not really be worth it.

Use private storage

(nit: your revision description has a typo "Suppot")

antiagainst retitled this revision from [spirv] Suppot vector.extract conversion to [mlir][spirv] Convert tensor.extract for very small tensors.Mar 5 2021, 10:57 AM

Ah yeah. :D Fixed, thanks!

Switch back to Function storage class for now

ThomasRaoux accepted this revision.Mar 5 2021, 11:04 AM

I suspect the driver will end up storing it in memory, as most GPUs cannot index into the register file. This would end up using even more memory, as each work item would allocate its own copy unless the driver optimizes it, and it would prevent using the potential constant cache. Some drivers may optimize it into a bunch of select ops, but I wonder if we want to rely on the driver for that.

I don't mean to block progress and I think the code is fine. It would be good in the future to figure out whether we really want to allow keeping tensors all the way to SPIR-V conversion.

This revision is now accepted and ready to land.Mar 5 2021, 11:04 AM

+1. True. In general I would do whatever we can at the SPIR-V level to reduce the reliance on the driver compiler. Actually we should at least be using Private storage here, given this is not specific to the work item. However, right now SPIR-V global variables with the Private storage class do not support initializers well. I need to fix that; planning to do it later. In the long run I think we can inject patterns to convert to vector first before SPIR-V to make it more structured. But we still need to convert to !spv.array, given there is no native vector support for sizes > 4.
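
As a rough sketch of that direction (illustrative syntax only, and it assumes the missing Private-storage initializer support mentioned above), the constant table would become a module-level array shared by all invocations instead of a per-function variable:

```mlir
// Hypothetical end state: a read-only lookup table at module scope.
// NOTE: needs Private-storage initializer support, which is not there yet.
// 8 elements, so !spv.array rather than a vector (vectors max out at 4).
spv.GlobalVariable @lookup_table : !spv.ptr<!spv.array<8 x f32, stride=4>, Private>

// Inside the entry point: take the address, index, and load.
%addr = spv.mlir.addressof @lookup_table : !spv.ptr<!spv.array<8 x f32, stride=4>, Private>
%ptr  = spv.AccessChain %addr[%idx] : !spv.ptr<!spv.array<8 x f32, stride=4>, Private>, i32
%val  = spv.Load "Private" %ptr : f32
```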

Sounds like a good plan to me. My suggestion would be to lower very small arrays to a bunch of cmp/select ops, and to put larger arrays in a uniform buffer. I think using private memory for constant data is almost always going to be suboptimal. Private is still per work item as far as I understand. I think it only makes sense to use it when we need to index with non-constant data, and in my experience it will have significant cost as it goes to memory.
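
To make that concrete (hypothetical output, op spellings approximate), a three-entry table lookup lowered to a cmp/select chain would touch no memory at all:

```mlir
// Lookup of element %idx in the table [1.0, 2.0, 3.0] via selects.
%c1  = spv.Constant 1 : i32
%c2  = spv.Constant 2 : i32
%e0  = spv.Constant 1.0 : f32
%e1  = spv.Constant 2.0 : f32
%e2  = spv.Constant 3.0 : f32
%is1 = spv.IEqual %idx, %c1 : i32
%is2 = spv.IEqual %idx, %c2 : i32
%t0  = spv.Select %is1, %e1, %e0 : i1, f32
%val = spv.Select %is2, %e2, %t0 : i1, f32
```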

Yup. Sorry I didn't complete the sentence; it ended up being confusing I think: "... not specific to the work item [in order to give the driver compiler more direct hints that this is a global static const array so that it can handle it better]." We should also attach the NonWritable decoration to it. NonWritable's semantics were explicitly relaxed to support this kind of constant data array lookup. (We have similar needs on the graphics side too. It has been a few years since this was addressed, so GPU drivers should be somewhat reliable at optimizing it. ;-P) It's just that NonWritable cannot be attached to Private/Function storage objects before SPIR-V 1.4. (Vulkan 1.1 natively only supports SPIR-V 1.3. There is an extension, VK_KHR_spirv_1_4, that brings SPIR-V 1.4 to Vulkan 1.1 though, and it is getting adoption on Android.) Looks like we can throw in two patterns, one with the NonWritable decoration (and higher priority) and the other without, and let the target environment filter which one is suitable to use. (But again, we may need to fix the issue mentioned here first. Quite a few missing things. ;-P) Generating cmp/select instruction chains is certainly another interesting direction and might be better!

Ha, good point. I think NonWritable should work; it would be equivalent to the inline constant buffer of DX and the global constants of OCL. I'm pretty sure the driver just converts it into a constant buffer though :) Anyway, this is definitely good enough for now.