
[mlir][memref] WIP - Reorganize conversions involving memref in the presence of backend-specific type conversions.
Needs Review · Public

Authored by nicolasvasilache on Aug 2 2023, 7:27 AM.

Details

Summary

This revision is a work in progress that tries to disentangle the notion of backend-specific type conversion.
In particular, it demonstrates that a notional sink pass (or generalized pattern application) helps
avoid the footguns that inevitably arise from pass-based thinking, which requires keeping all passes
in sync with the same type-conversion decision.
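
To make the footgun concrete, here is a minimal C++ sketch (not part of this revision, using the upstream populate* helpers as they existed around this time): the backend-specific decision, here the index bitwidth, is made once in a single LLVMTypeConverter, and every pattern set is populated from that same converter. That shared-converter invariant is what a sink pass or generalized pattern application would preserve across dialects.

```cpp
#include "mlir/Conversion/FuncToLLVM/ConvertFuncToLLVM.h"
#include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
#include "mlir/Conversion/LLVMCommon/LoweringOptions.h"
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/Conversion/MemRefToLLVM/MemRefToLLVM.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Transforms/DialectConversion.h"
#include <utility>

using namespace mlir;

// Sketch only: the backend-specific decision (the index bitwidth here) is
// made exactly once, and every pattern set is populated from the same
// converter, so no pass-local copy of the decision can drift out of sync.
static LogicalResult lowerWithSingleTypeConverter(ModuleOp module) {
  MLIRContext *ctx = module.getContext();

  LowerToLLVMOptions options(ctx);
  options.overrideIndexBitwidth(32); // e.g. a GPU backend choice

  LLVMTypeConverter converter(ctx, options);

  RewritePatternSet patterns(ctx);
  populateFinalizeMemRefToLLVMConversionPatterns(converter, patterns);
  populateFuncToLLVMConversionPatterns(converter, patterns);

  LLVMConversionTarget target(*ctx);
  return applyPartialConversion(module, target, std::move(patterns));
}
```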

Diff Detail

Event Timeline

Herald added a reviewer: ftynse.
Herald added a reviewer: dcaballe.
Herald added a project: Restricted Project.
nicolasvasilache requested review of this revision. Aug 2 2023, 7:27 AM
mehdi_amini added inline comments. Aug 2 2023, 8:11 PM
mlir/include/mlir/Conversion/LLVMCommon/Pattern.h
51

Should this API then not be used for addressing, since it is decoupled from the memory space?

Won't this lead to unrealized_conversion_cast ops because of the different index sizes between arithmetic and indexing?

60

It's a bit odd that memref indexing would differ from this API, which is more generic and concerned with address spaces.

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
58

This is just going to do isSharedMemoryAddressSpace(type.getMemorySpace()); does this indirection pull its weight here? Also, should it be defined inline?
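
For reference, the indirection being questioned amounts to a one-line wrapper along these lines; the names are taken from the comment above and are not necessarily the ones in the patch:

```cpp
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Assumed to exist in the patch: a predicate on the memory-space attribute.
bool isSharedMemoryAddressSpace(Attribute memorySpace);

// The wrapper under discussion; whether it pulls its weight, and whether it
// should be defined inline, is exactly the question raised above.
inline bool hasSharedMemoryAddressSpace(MemRefType type) {
  return isSharedMemoryAddressSpace(type.getMemorySpace());
}
```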

guraypp added inline comments. Aug 2 2023, 11:54 PM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
62

We could add hasConstantMemoryAddressSpace or hasLocalMemoryAddressSpace.

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
211

We should use 32-bit indexing for the local and constant memory spaces as well. Here is the list of address spaces (I don't remember what 2 is):

generic = 0
global = 1
shared = 3
constant = 4
local = 5

As far as I can see, MLIR does set the shared address space, but does not set the local or constant address spaces.
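
For illustration, a minimal sketch (not from the patch) of how 32-bit indexing could be selected for the small memory spaces; the helper name is hypothetical and the numbering follows the list above:

```cpp
// Hypothetical helper: pick the index bitwidth from the NVVM address space.
static unsigned chooseIndexBitwidth(unsigned addressSpace) {
  switch (addressSpace) {
  case 3: // shared
  case 4: // constant
  case 5: // local
    // Small, on-chip spaces: 32-bit index arithmetic is sufficient.
    return 32;
  default:
    // generic (0) and global (1): keep the full 64-bit index.
    return 64;
  }
}
```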

Does all this complexity really belong in the LLVM conversion pass?
Maybe add a separate pass that decomposes memref index calculations into explicit ops (possibly inserting appropriate trunc casts depending on the memory space), so that the result is then straightforwardly converted to LLVM.
As a bonus, you would be able to run additional canonicalizations/transformations between this new pass and the LLVM conversion.
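
For illustration, a skeleton of what such a pre-lowering pass could look like; the pass name, command-line flag, and pattern classes are hypothetical, and the actual index-decomposition patterns are left as placeholders:

```cpp
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Support/TypeID.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
#include <utility>

using namespace mlir;

namespace {
// Skeleton of the suggested pre-lowering pass: expand memref index
// computations into explicit arith ops (inserting trunc casts where the
// memory space allows a narrower index), so that the later LLVM conversion
// only has to emit trivial pointer arithmetic.
struct DecomposeMemRefIndexingPass
    : public PassWrapper<DecomposeMemRefIndexingPass,
                         OperationPass<func::FuncOp>> {
  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(DecomposeMemRefIndexingPass)

  StringRef getArgument() const final { return "decompose-memref-indexing"; }

  void runOnOperation() override {
    RewritePatternSet patterns(&getContext());
    // Placeholder: the real patterns would rewrite memref.load/store (and
    // friends) into explicit offset computations here.
    // patterns.add<ExpandLoadIndexing, ExpandStoreIndexing>(&getContext());
    if (failed(applyPatternsAndFoldGreedily(getOperation(),
                                            std::move(patterns))))
      signalPassFailure();
  }
};
} // namespace
```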