This is an archive of the discontinued LLVM Phabricator instance.

Differential D140644

[mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu::AddressSpaceAttr)
ClosedPublic

Authored by christopherbate on Dec 23 2022, 4:44 PM.

Details

Reviewers

antiagainst
bondhugula
ThomasRaoux
nicolasvasilache
herhut
ftynse
dcaballe

Commits

rG6ca1a09f03e8: [mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu…

Summary

This is a purely mechanical change that introduces an enum attribute in the GPU
dialect to represent the various memref memory spaces as opposed to the
hard-coded integer attributes that are currently used.

The following steps were taken to make the transition across the codebase:

1. Introduce a pass "gpu-lower-memory-space-attributes":

The pass updates all memref types whose memory space attribute is a
gpu::AddressSpaceAttr. These attributes are changed to IntegerAttrs using a
mapping that is given by the caller. This pass is based on the
"map-memref-spirv-storage-class" pass, and the common functions can probably
be refactored into a set of utilities under the MemRef dialect.

2. Update the verifiers of GPU/NVGPU dialect operations.

If a verifier currently checks the address space of an operand using, e.g.,
getWorkgroupAddressSpace, then it can continue to do so. However, the
checks are changed to fail only if the memory space is either missing or a
gpu::AddressSpaceAttr with the wrong value. Otherwise, the verifier assumes
the address space is correct because it was specifically lowered to something
other than a gpu::AddressSpaceAttr.

3. Update existing gpu-to-llvm conversion infrastructure.

In the existing gpu-to-X passes, we add a full conversion equivalent to
gpu-lower-memory-space-attributes just before doing the conversion to the
LLVMDialect. This is done because currently both gpu-to-llvm passes
(rocdl, nvvm) run gpu-to-gpu rewrites within the pass, which introduce
AddressSpaceAttr memory space annotations. Therefore, I inserted the
memory space conversion between the gpu-to-gpu rewrites and the LLVM
conversion.
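
The lowering idea in the summary can be modeled outside of MLIR in a few lines. The following is a minimal standalone sketch, not the code from this patch: `ToyMemRefType`, `lowerMemorySpace`, and `nvvmLikeMapping` are hypothetical names, and the integers returned by the mapping are illustrative assumptions. It only shows how a symbolic address space is replaced by a caller-supplied integer while missing or already-numeric memory spaces are left alone.

```cpp
#include <cassert>
#include <functional>
#include <variant>

// Stand-in for gpu::AddressSpace (values follow GPUBase.td in this diff).
enum class AddressSpace : unsigned { Global = 1, Workgroup = 2, Private = 3 };

// A memory space is absent, a plain integer, or a symbolic GPU address space.
using MemorySpace = std::variant<std::monostate, unsigned, AddressSpace>;

// Hypothetical toy type: only the memory space matters for this sketch.
struct ToyMemRefType {
  MemorySpace memorySpace;
};

// Same shape as the MemorySpaceMapping alias declared in Passes.h.
using MemorySpaceMapping = std::function<unsigned(AddressSpace)>;

// Replace a symbolic address space with the integer chosen by `mapping`;
// leave missing or already-numeric memory spaces untouched.
ToyMemRefType lowerMemorySpace(ToyMemRefType type,
                               const MemorySpaceMapping &mapping) {
  assert(mapping && "mapping must be callable");
  if (const auto *space = std::get_if<AddressSpace>(&type.memorySpace))
    type.memorySpace = mapping(*space);
  return type;
}

// Illustrative target mapping (assumed values for an NVVM-like target).
unsigned nvvmLikeMapping(AddressSpace space) {
  switch (space) {
  case AddressSpace::Global:
    return 1;
  case AddressSpace::Workgroup:
    return 3;
  case AddressSpace::Private:
    return 0;
  }
  return 0; // Not reached for valid enumerators.
}
```

Passing the mapping as a callable keeps the lowering target-agnostic: each backend can supply its own numbers without the shared code knowing about them.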

For more context see the below discourse discussion:
https://discourse.llvm.org/t/gpu-workgroup-shared-memory-address-space-is-hard-coded/

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

christopherbate created this revision. Dec 23 2022, 4:44 PM

Herald added a reviewer: antiagainst. Dec 23 2022, 4:44 PM

Herald added a reviewer: bondhugula.

Herald added a reviewer: ThomasRaoux.

Herald added a project: Restricted Project.

Herald added subscribers: Moerafaat, zero9178, bzcheeseman and 26 others.

christopherbate requested review of this revision. Dec 23 2022, 4:44 PM

Herald added a reviewer: nicolasvasilache. Dec 23 2022, 4:44 PM

Herald added a reviewer: herhut.

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache, jholewinski.

The code for the pass that is introduced here is being exercised in the gpu-to-llvm conversions, but I still need to add tests for the pass in isolation.

Harbormaster completed remote builds in B204826: Diff 485172. Dec 23 2022, 5:06 PM

The approach looks good overall. Please address a bunch of cosmetic changes.

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
76–77	Any particular reason to start from 3 here? I'd rather use the number that wouldn't work transparently for CUDA so we shake off related issues early. I understand reserving 0 for the default space though.
76–77	Also, please add the non-capitalized version of the string as the third argument to `I32EnumAttrCase`, we don't want capitalized names in the assembly syntax.
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
184	We'd better not ignore pattern application failures. Even if the patterns _currently_ cannot fail, they may eventually be changed, and we will be puzzled as to why the pass succeeds while doing nothing.
mlir/lib/Dialect/GPU/Transforms/LowerMemorySpaceAttributes.cpp
33–34	Copy-pasta? This file is not related to SPIR-V conversion.
43	Ditto.
51	Same here and probably below.
99–108	I see that some of this code is carried over, but we should modernize it. There is no reason to privilege builtin function types in this conversion. Use the `SubElementTypeInterface` to access nested types and convert them. This will support, e.g., memref of memref, which is currently ignored.
122	Nit: no need to prefix `SmallVector` with `llvm::`. And also no need to specify the explicit number of stack elements. Here and below.
183	Nit: please add a newline.
mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
42	Can we factor out the magic `3` into a named constant, something like `NVGPUDialect::kSharedMemoryAddressSpace`.
272	Nit: let's not hardcode 3 here either, use the named constant instead.
mlir/test/Dialect/GPU/invalid.mlir
353	Please drop trailing spaces here and below.
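
The request on line 184 of LowerGpuOpsToNVVMOps.cpp (do not ignore pattern application failures) is a general pattern: check the result of the rewrite driver and fail the pass if it reports failure. The sketch below illustrates this without MLIR. `Rewrite`, `applyRewrites`, and `runPass` are hypothetical stand-ins, not MLIR APIs.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for one rewrite step; returns false on failure.
using Rewrite = std::function<bool()>;

// Hypothetical driver: applies every rewrite and reports whether all succeeded.
bool applyRewrites(const std::vector<Rewrite> &rewrites) {
  for (const Rewrite &rewrite : rewrites) {
    assert(rewrite && "empty rewrite callback");
    if (!rewrite())
      return false;
  }
  return true;
}

enum class PassResult { Success, Failure };

// Propagate the driver's result instead of discarding it, so a rewrite that
// starts failing later is reported rather than silently ignored.
PassResult runPass(const std::vector<Rewrite> &rewrites) {
  if (!applyRewrites(rewrites))
    return PassResult::Failure;
  return PassResult::Success;
}
```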

This revision now requires changes to proceed. Dec 28 2022, 12:44 AM

bondhugula added inline comments. Dec 28 2022, 9:38 AM

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
188–190	Nit: These three lines can go below to the relevant case statement.
mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
42	+1 to `kSharedMemoryAddressSpace`.
mlir/lib/Dialect/NVGPU/Transforms/OptimizeSharedMemory.cpp
261	`allocOp.getType()`.
mlir/test/Dialect/GPU/invalid.mlir
364	Drop trailing whitespace. `git diff --check HEAD~`

antiagainst added inline comments. Dec 29 2022, 3:43 PM

mlir/include/mlir/Dialect/GPU/Transforms/Passes.h
67	Populates type conversion rules for ...
mlir/lib/Dialect/GPU/Transforms/LowerMemorySpaceAttributes.cpp
99–108	+1! It would be nice to update the SPIR-V side too. :)

I've got a few notes

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
76	Because AMD wants it, please include `global : 1`
80
mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
42	+1 and do it for ROCDL too

Address all comments

Herald added a reviewer: dcaballe. Jan 5 2023, 9:00 AM

Herald added a subscriber: awarzynski.

christopherbate added inline comments. Jan 5 2023, 9:00 AM

mlir/lib/Dialect/GPU/Transforms/LowerMemorySpaceAttributes.cpp
99–108	I updated it to use `SubElementTypeInterface`, thanks for the pointer! That appears to handle all cases in this function.
99–108	For the SPIRV dialect code, I'm not very familiar with it, so I didn't modify it as it's not absolutely required for this patch.
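
To make the "memref of memref" point concrete, here is a minimal standalone sketch of converting nested types recursively instead of special-casing one kind of container. It is an analogy only: `ToyType`, `toNumeric`, and `lowerNested` are hypothetical, and the real patch relies on `SubElementTypeInterface` rather than a hand-written tree. The integers used are the default values of the pass options in Passes.td (private 5, workgroup 3, global 1).

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class AddressSpace : unsigned { Global = 1, Workgroup = 2, Private = 3 };

// Hypothetical toy type: a node with an optional symbolic address space, an
// optional numeric memory space, and nested sub-element types.
struct ToyType {
  std::optional<AddressSpace> symbolicSpace;
  std::optional<unsigned> numericSpace;
  std::vector<ToyType> subElements;
};

// Default option values of the pass, used here as an example mapping.
unsigned toNumeric(AddressSpace space) {
  switch (space) {
  case AddressSpace::Global:
    return 1;
  case AddressSpace::Workgroup:
    return 3;
  case AddressSpace::Private:
    return 5;
  }
  assert(false && "unknown address space");
  return 0;
}

// Lower the symbolic address space of this node and of every nested node, so
// that inner types are not skipped.
ToyType lowerNested(const ToyType &type) {
  ToyType result;
  result.numericSpace = type.numericSpace;
  if (type.symbolicSpace)
    result.numericSpace = toNumeric(*type.symbolicSpace);
  for (const ToyType &sub : type.subElements)
    result.subElements.push_back(lowerNested(sub));
  return result;
}
```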

Thanks!

mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
141	Nit: can these be turned into named constants, similarly to NVVM?

This revision is now accepted and ready to land. Jan 5 2023, 9:07 AM

christopherbate added inline comments. Jan 5 2023, 10:15 AM

mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
141	Yeah I missed that one, will fix before landing.

Harbormaster completed remote builds in B205931: Diff 486600. Jan 5 2023, 10:45 AM

Add some tests for the standalone "gpu-lower-memory-space-attributes" pass.

In gpu.func verifier, don't emit an error if block args for workgroup/private address space don't contain an attribute at all. This could be purposeful.
Only emit an error if the memref type contains the wrong #gpu.address_space attribute.
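
The verifier rule described in this update can be summarized as a small decision function. This is a minimal sketch under stated assumptions, not the verifier code from the patch: `MemorySpace` and `isAcceptableMemorySpace` are hypothetical names, and memory spaces are modeled with `std::variant` instead of MLIR attributes.

```cpp
#include <cassert>
#include <variant>

enum class AddressSpace : unsigned { Global = 1, Workgroup = 2, Private = 3 };

// A memory space is absent, a plain integer, or a symbolic GPU address space.
using MemorySpace = std::variant<std::monostate, unsigned, AddressSpace>;

// Returns true if a verifier following the rule above would accept `actual`
// for an attribution that is expected to live in `expected`.
bool isAcceptableMemorySpace(const MemorySpace &actual, AddressSpace expected) {
  // Only a symbolic GPU address space can be checked. A missing attribute or
  // an already-lowered integer is assumed to be intentional.
  if (const auto *space = std::get_if<AddressSpace>(&actual))
    return *space == expected;
  return true;
}

// Usage example covering the cases named in the update.
inline void verifierRuleExample() {
  assert(isAcceptableMemorySpace(MemorySpace{}, AddressSpace::Workgroup));
  assert(!isAcceptableMemorySpace(MemorySpace{AddressSpace::Private},
                                  AddressSpace::Workgroup));
}
```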

Fix trailing whitespace.

Harbormaster completed remote builds in B206835: Diff 487858. Jan 10 2023, 11:37 AM

Closed by commit rG6ca1a09f03e8: [mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu… (authored by christopherbate). Jan 13 2023, 10:00 AM

This revision was automatically updated to reflect the committed changes.

christopherbate added a commit: rG6ca1a09f03e8: [mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu….

This appears to have broken the Windows buildbot. https://lab.llvm.org/buildbot/#/builders/13/builds/30769/steps/6/logs/stdio

C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Conversion\GPUToNVVM\LowerGpuOpsToNVVMOps.cpp(209) : error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Conversion\GPUToNVVM\LowerGpuOpsToNVVMOps.cpp(209) : warning C4715: '<lambda_7426c7b91625e1020521dab1f464b370>::operator()': not all control paths return a value

C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Conversion\GPUToROCDL\LowerGpuOpsToROCDLOps.cpp(149) : error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Conversion\GPUToROCDL\LowerGpuOpsToROCDLOps.cpp(149) : warning C4715: '<lambda_fea91019496d34fd405e9fcb5563e11d>::operator()': not all control paths return a value

In D140644#4052291, @NathanielMcVicar wrote:

This appears to have broken the Windows buildbot. https://lab.llvm.org/buildbot/#/builders/13/builds/30769/steps/6/logs/stdio

C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Conversion\GPUToNVVM\LowerGpuOpsToNVVMOps.cpp(209) : error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Conversion\GPUToNVVM\LowerGpuOpsToNVVMOps.cpp(209) : warning C4715: '<lambda_7426c7b91625e1020521dab1f464b370>::operator()': not all control paths return a value

C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Conversion\GPUToROCDL\LowerGpuOpsToROCDLOps.cpp(149) : error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Conversion\GPUToROCDL\LowerGpuOpsToROCDLOps.cpp(149) : warning C4715: '<lambda_fea91019496d34fd405e9fcb5563e11d>::operator()': not all control paths return a value

Thanks, will fix now.
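
For context, MSVC reports C4715 when it cannot prove that every path through a function returns a value. A typical trigger is a lambda whose body is a `switch` over an enum with a `return` in every case and nothing after the `switch`. The follow-up fix is not shown on this page, so the code below is only a sketch of one common remedy, with hypothetical names and illustrative values. LLVM code usually ends such a function with `llvm_unreachable`; the standalone version uses `std::abort` instead.

```cpp
#include <cassert>
#include <cstdlib>

enum class AddressSpace : unsigned { Global = 1, Workgroup = 2, Private = 3 };

// Hypothetical mapping lambda with illustrative integer values.
inline auto makeAddressSpaceMapping() {
  return [](AddressSpace space) -> unsigned {
    switch (space) {
    case AddressSpace::Global:
      return 1;
    case AddressSpace::Workgroup:
      return 3;
    case AddressSpace::Private:
      return 5;
    }
    // Without a terminating statement here, the compiler sees a path that
    // falls off the end of the lambda and emits the warning.
    assert(false && "unknown address space enum value");
    std::abort();
  };
}
```

Leaving out a `default:` label keeps the compiler's "unhandled enumerator" diagnostics useful when a new case is added to the enum.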

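Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td	40 lines
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h	19 lines
mlir/include/mlir/Dialect/GPU/Transforms/Passes.td	19 lines
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td	7 lines
mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td	17 lines
mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h	7 lines
mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp	2 lines
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp	59 lines
mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp	42 lines
mlir/lib/Conversion/MemRefToSPIRV/MapMemRefStorageClassPass.cpp	4 lines
mlir/lib/Dialect/GPU/CMakeLists.txt	1 line
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp	16 lines
mlir/lib/Dialect/GPU/Transforms/AllReduceLowering.cpp	6 lines
mlir/lib/Dialect/GPU/Transforms/LowerMemorySpaceAttributes.cpp	182 lines
mlir/lib/Dialect/GPU/Transforms/MemoryPromotion.cpp	8 lines
mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp	33 lines
mlir/lib/Dialect/NVGPU/Transforms/OptimizeSharedMemory.cpp	9 lines
mlir/test/Conversion/GPUCommon/memory-attrbution.mlir	24 lines
mlir/test/Dialect/GPU/all-reduce-max.mlir	8 lines
mlir/test/Dialect/GPU/all-reduce.mlir	8 lines
mlir/test/Dialect/GPU/invalid.mlir	30 lines
mlir/test/Dialect/GPU/promotion.mlir	14 lines
mlir/test/Dialect/NVGPU/invalid.mlir	4 lines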

Diff 486600

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td

  //===-- GPUBase.td - GPU dialect definitions ---------------*- tablegen -*-===//
  //
  // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  // See https://llvm.org/LICENSE.txt for license information.
  // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  //
  //===----------------------------------------------------------------------===//
  //
  // Defines the GPU dialect
  //
  //===----------------------------------------------------------------------===//

  #ifndef GPU_BASE
  #define GPU_BASE

  include "mlir/IR/AttrTypeBase.td"
+ include "mlir/IR/EnumAttr.td"
  include "mlir/IR/OpBase.td"

  //===----------------------------------------------------------------------===//
  // GPU Dialect.
  //===----------------------------------------------------------------------===//

  def GPU_Dialect : Dialect {
    let name = "gpu";
[16 lines not shown] let extraClassDeclaration = [{
      /// Returns the number of workgroup (thread, block) dimensions supported in
      /// the GPU dialect.
      // TODO: consider generalizing this.
      static unsigned getNumWorkgroupDimensions() { return 3; }

      /// Returns the numeric value used to identify the workgroup memory address
      /// space.
-     static unsigned getWorkgroupAddressSpace() { return 3; }
+     static AddressSpace getWorkgroupAddressSpace() { return AddressSpace::Workgroup; }

      /// Returns the numeric value used to identify the private memory address
      /// space.
-     static unsigned getPrivateAddressSpace() { return 5; }
+     static AddressSpace getPrivateAddressSpace() { return AddressSpace::Private; }
    }];

    let dependentDialects = ["arith::ArithDialect"];
    let useDefaultAttributePrinterParser = 1;
    let useDefaultTypePrinterParser = 1;
  }

+ //===----------------------------------------------------------------------===//
+ // GPU Enums.
+ //===----------------------------------------------------------------------===//
+
+ class GPU_I32Enum<string name, string description, list<I32EnumAttrCase> cases>
+     : I32EnumAttr<name, description, cases> {
+   let genSpecializedAttr = 0;
+   let cppNamespace = "::mlir::gpu";
+ }
+ class GPU_I32EnumAttr<string mnemonic, GPU_I32Enum enumInfo> :
+     EnumAttr<GPU_Dialect, enumInfo, mnemonic> {
+   let assemblyFormat = "`<` $value `>`";
+ }
+
+ def GPU_AddressSpaceGlobal : I32EnumAttrCase<"Global", 1, "global">;

    krzysz00 (inline comment, marked done): Because AMD wants it, please include `global : 1`. Suggested edit:
            let assemblyFormat = "`<` $value `>`";
          }
        + def GPU_AddressSpaceGlobal :
        +   I32EnumAttrCase<"global", 1>;
          def GPU_AddressSpaceWorkgroup : I32EnumAttrCase<"Workgroup", 3>;
          def GPU_AddressSpacePrivate : I32EnumAttrCase<"Private", 4>;

+ def GPU_AddressSpaceWorkgroup : I32EnumAttrCase<"Workgroup", 2, "workgroup">;

    ftynse (inline comment, marked done): Any particular reason to start from 3 here? I'd rather use the number that wouldn't work transparently for CUDA so we shake off related issues early. I understand reserving 0 for the default space though.

    ftynse (inline comment, marked done): Also, please add the non-capitalized version of the string as the third argument to I32EnumAttrCase, we don't want capitalized names in the assembly syntax.

+ def GPU_AddressSpacePrivate : I32EnumAttrCase<"Private", 3, "private">;
+
+ def GPU_AddressSpaceEnum : GPU_I32Enum<
+   "AddressSpace", "GPU address space", [

    krzysz00 (inline comment, marked done): Suggested edit:
            "AddressSpace", "GPU address space", [
        +     GPU_AddressSpaceGlobal,
              GPU_AddressSpaceWorkgroup,
              GPU_AddressSpacePrivate

+     GPU_AddressSpaceGlobal,
+     GPU_AddressSpaceWorkgroup,
+     GPU_AddressSpacePrivate
+   ]>;
+
+ def GPU_AddressSpaceAttr :
+   GPU_I32EnumAttr<"address_space", GPU_AddressSpaceEnum>;
+
+ //===----------------------------------------------------------------------===//
+ // GPU Types.
+ //===----------------------------------------------------------------------===//
+
  def GPU_AsyncToken : DialectType<
    GPU_Dialect, CPred<"$_self.isa<::mlir::gpu::AsyncTokenType>()">, "async token type">,
    BuildableType<"mlir::gpu::AsyncTokenType::get($_builder.getContext())">;

  // Predicat to check if type is gpu::MMAMatrixType.
  def IsMMAMatrixTypePred : CPred<"$_self.isa<::mlir::gpu::MMAMatrixType>()">;

  def GPU_MMAMatrix : DialectType<
    GPU_Dialect, IsMMAMatrixTypePred, "MMAMatrix type">;

  // Memref type acceptable to gpu.subgroup_mma_{load|store}_matrix ops.
  def GPU_MMAMemRef : MemRefOf<[F16, F32, VectorOfRankAndType<[1], [F16, F32]>]>;

  class MMAMatrixOf<list<Type> allowedTypes> :
    ContainerType<AnyTypeOf<allowedTypes>, IsMMAMatrixTypePred,
    "$_self.cast<::mlir::gpu::MMAMatrixType>().getElementType()",
    "gpu.mma_matrix", "::mlir::gpu::MMAMatrixType">;

+ //===----------------------------------------------------------------------===//
+ // GPU Interfaces.
+ //===----------------------------------------------------------------------===//
+
  def GPU_AsyncOpInterface : OpInterface<"AsyncOpInterface"> {
    let description = [{
      Interface for GPU operations that execute asynchronously on the device.

      GPU operations implementing this interface take a list of dependencies
      as `gpu.async.token` arguments and optionally return a `gpu.async.token`.

      The op doesn't start executing until all depent ops producing the async
[45 lines not shown]
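
For readers less familiar with TableGen enums: each case such as `I32EnumAttrCase<"Workgroup", 2, "workgroup">` ties together a C++ enumerator name, its integer value, and the lowercase keyword used in the assembly syntax (for example `#gpu.address_space<workgroup>`). The hand-written C++ below is only a rough model of that relationship. The function names `stringifyToy` and `symbolizeToy` are hypothetical and are not the generated API.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Enumerator names and values as listed in GPUBase.td above.
enum class AddressSpace : unsigned { Global = 1, Workgroup = 2, Private = 3 };

// Enumerator to assembly keyword (third argument of I32EnumAttrCase).
constexpr std::string_view stringifyToy(AddressSpace space) {
  switch (space) {
  case AddressSpace::Global:
    return "global";
  case AddressSpace::Workgroup:
    return "workgroup";
  case AddressSpace::Private:
    return "private";
  }
  return "";
}

// Assembly keyword to enumerator; unknown keywords yield std::nullopt.
constexpr std::optional<AddressSpace> symbolizeToy(std::string_view keyword) {
  if (keyword == "global")
    return AddressSpace::Global;
  if (keyword == "workgroup")
    return AddressSpace::Workgroup;
  if (keyword == "private")
    return AddressSpace::Private;
  return std::nullopt;
}

// Usage example: keywords are lowercase, so the capitalized name is rejected.
inline void enumKeywordExample() {
  assert(stringifyToy(AddressSpace::Workgroup) == "workgroup");
  assert(symbolizeToy("private") == AddressSpace::Private);
  assert(!symbolizeToy("Workgroup").has_value());
}
```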

mlir/include/mlir/Dialect/GPU/Transforms/Passes.h

[17 lines not shown]

  namespace llvm {
  class TargetMachine;
  class LLVMContext;
  class Module;
  } // namespace llvm

  namespace mlir {
+ class TypeConverter;
+ class ConversionTarget;
  namespace func {
  class FuncOp;
  } // namespace func

  #define GEN_PASS_DECL
  #include "mlir/Dialect/GPU/Transforms/Passes.h.inc"

  /// Pass that moves ops which are likely an index computation into gpu.launch
[19 lines not shown]
  void populateGpuAllReducePatterns(RewritePatternSet &patterns);

  /// Collect all patterns to rewrite ops within the GPU dialect.
  inline void populateGpuRewritePatterns(RewritePatternSet &patterns) {
    populateGpuAllReducePatterns(patterns);
  }

  namespace gpu {
+ /// A function that maps a MemorySpace enum to a target-specific integer value.
+ using MemorySpaceMapping =
+     std::function<unsigned(gpu::AddressSpace gpuAddressSpace)>;
+
+ /// Populates type conversion rules for lowering memory space attributes to

    antiagainst (inline comment, marked done): Populates type conversion rules for ...

+ /// numeric values.
+ void populateMemorySpaceAttributeTypeConversions(
+     TypeConverter &typeConverter, const MemorySpaceMapping &mapping);
+
+ /// Populates patterns to lower memory space attributes to numeric values.
+ void populateMemorySpaceLoweringPatterns(TypeConverter &typeConverter,
+                                          RewritePatternSet &patterns);
+
+ /// Populates legality rules for lowering memory space attriutes to numeric
+ /// values.
+ void populateLowerMemorySpaceOpLegality(ConversionTarget &target);
+
  /// Returns the default annotation name for GPU binary blobs.
  std::string getDefaultGpuBinaryAnnotation();

  /// Base pass class to serialize kernel functions through LLVM into
  /// user-specified IR and add the resulting blob as module attribute.
  class SerializeToBlobPass : public OperationPass<gpu::GPUModuleOp> {
  public:
    SerializeToBlobPass(TypeID passID);
[74 lines not shown]

mlir/include/mlir/Dialect/GPU/Transforms/Passes.td

[31 lines not shown]
  def GpuMapParallelLoopsPass
      : Pass<"gpu-map-parallel-loops", "mlir::func::FuncOp"> {
    let summary = "Greedily maps loops to GPU hardware dimensions.";
    let constructor = "mlir::createGpuMapParallelLoopsPass()";
    let description = "Greedily maps loops to GPU hardware dimensions.";
    let dependentDialects = ["mlir::gpu::GPUDialect"];
  }

+ def GPULowerMemorySpaceAttributesPass
+     : Pass<"gpu-lower-memory-space-attributes"> {
+   let summary = "Assign numeric values to memref memory space symbolic placeholders";
+   let description = [{
+     Updates all memref types that have a memory space attribute
+     that is a `gpu::AddressSpaceAttr`. These attributes are
+     changed to `IntegerAttr`'s using a mapping that is given in the
+     options.
+   }];
+   let options = [
+     Option<"privateAddrSpace", "private", "unsigned", "5",
+            "private address space numeric value">,
+     Option<"workgroupAddrSpace", "workgroup", "unsigned", "3",
+            "workgroup address space numeric value">,
+     Option<"globalAddrSpace", "global", "unsigned", "1",
+            "global address space numeric value">
+   ];
+ }
+
  #endif // MLIR_DIALECT_GPU_PASSES

mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td

[24 lines not shown] def ROCDL_Dialect : Dialect {
    let cppNamespace = "::mlir::ROCDL";
    let dependentDialects = ["LLVM::LLVMDialect"];
    let hasOperationAttrVerify = 1;

    let extraClassDeclaration = [{
      /// Get the name of the attribute used to annotate external kernel
      /// functions.
      static StringRef getKernelFuncAttrName() { return "rocdl.kernel"; }
+
+     /// The address space value that represents global memory.
+     static constexpr unsigned kGlobalMemoryAddressSpace = 1;
+     /// The address space value that represents shared memory.
+     static constexpr unsigned kSharedMemoryAddressSpace = 3;
+     /// The address space value that represents private memory.
+     static constexpr unsigned kPrivateMemoryAddressSpace = 5;
    }];
  }

  //===----------------------------------------------------------------------===//
  // ROCDL op definitions
  //===----------------------------------------------------------------------===//

  class ROCDL_Op<string mnemonic, list<Trait> traits = []> :
[211 lines not shown]

mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td

[30 lines not shown] let description = [{
      The `NVGPU` dialect provides a bridge between higher-level target-agnostic
      dialects (GPU and Vector) and the lower-level target-specific dialect
      (LLVM IR based NVVM dialect) for NVIDIA GPUs. This allow representing PTX
      specific operations while using MLIR high level dialects such as Memref
      and Vector for memory and target-specific register operands, respectively.
    }];

    let useDefaultTypePrinterParser = 1;
+
+   let extraClassDeclaration = [{
+     /// Return true if the given MemRefType has an integer address
+     /// space that matches the NVVM shared memory address space or
+     /// is a gpu::AddressSpaceAttr attribute with value 'workgroup`.
+     static bool hasSharedMemoryAddressSpace(MemRefType type);
+
+     /// Defines the MemRef memory space attribute numeric value that indicates
+     /// a memref is located in global memory. This should correspond to the
+     /// value used in NVVM.
+     static constexpr unsigned kGlobaldMemoryAddressSpace = 1;
+
+     /// Defines the MemRef memory space attribute numeric value that indicates
+     /// a memref is located in shared memory. This should correspond to the
+     /// value used in NVVM.
+     static constexpr unsigned kSharedMemoryAddressSpace = 3;
+   }];
  }

  //===----------------------------------------------------------------------===//
  // NVGPU Type Definitions
  //===----------------------------------------------------------------------===//

  class NVGPU_Type<string name, string typeMnemonic,
      list<Trait> traits = []> : TypeDef<NVGPU_Dialect, name, traits> {
[287 lines not shown]
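
The documentation comment on `hasSharedMemoryAddressSpace` describes a two-way check: accept either the numeric NVVM shared memory space or the symbolic `workgroup` address space. Its implementation is not part of this excerpt, so the following is only a minimal standalone sketch of what that comment describes. `MemorySpace` and `hasSharedMemoryAddressSpaceToy` are hypothetical names, and returning false for a memref without any memory space is an assumption.

```cpp
#include <cassert>
#include <variant>

enum class AddressSpace : unsigned { Global = 1, Workgroup = 2, Private = 3 };

// A memory space is absent, a plain integer, or a symbolic GPU address space.
using MemorySpace = std::variant<std::monostate, unsigned, AddressSpace>;

// Mirrors the value of NVGPUDialect::kSharedMemoryAddressSpace shown above.
constexpr unsigned kSharedMemoryAddressSpace = 3;

bool hasSharedMemoryAddressSpaceToy(const MemorySpace &space) {
  // Already-lowered form: compare against the numeric NVVM value.
  if (const auto *numeric = std::get_if<unsigned>(&space))
    return *numeric == kSharedMemoryAddressSpace;
  // Symbolic form: only the workgroup address space counts as shared memory.
  if (const auto *symbolic = std::get_if<AddressSpace>(&space))
    return *symbolic == AddressSpace::Workgroup;
  // Assumption: a memref without any memory space is not in shared memory.
  return false;
}

// Usage example: both spellings of shared memory are accepted.
inline void sharedMemoryExample() {
  assert(hasSharedMemoryAddressSpaceToy(MemorySpace{3u}));
  assert(hasSharedMemoryAddressSpaceToy(MemorySpace{AddressSpace::Workgroup}));
}
```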

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h

[10 lines not shown]
  #include "mlir/Conversion/LLVMCommon/Pattern.h"
  #include "mlir/Dialect/GPU/IR/GPUDialect.h"
  #include "mlir/Dialect/LLVMIR/LLVMDialect.h"

  namespace mlir {

  struct GPUFuncOpLowering : ConvertOpToLLVMPattern<gpu::GPUFuncOp> {
    GPUFuncOpLowering(LLVMTypeConverter &converter, unsigned allocaAddrSpace,
-                     StringAttr kernelAttributeName)
+                     unsigned workgroupAddrSpace, StringAttr kernelAttributeName)
        : ConvertOpToLLVMPattern<gpu::GPUFuncOp>(converter),
          allocaAddrSpace(allocaAddrSpace),
+         workgroupAddrSpace(workgroupAddrSpace),
          kernelAttributeName(kernelAttributeName) {}

    LogicalResult
    matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
                    ConversionPatternRewriter &rewriter) const override;

  private:
-   /// The address spcae to use for `alloca`s in private memory.
+   /// The address space to use for `alloca`s in private memory.
    unsigned allocaAddrSpace;
+   /// The address space to use declaring workgroup memory.
+   unsigned workgroupAddrSpace;

    /// The attribute name to use instead of `gpu.kernel`.
    StringAttr kernelAttributeName;
  };

  /// The lowering of gpu.printf to a call to HIP hostcalls
  ///
  /// Simplifies llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp, as we don't have
[66 lines not shown]

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp

[32 lines not shown] for (const auto &en : llvm::enumerate(gpuFuncOp.getWorkgroupAttributions())) {
      auto elementType =
          typeConverter->convertType(type.getElementType()).template cast<Type>();
      auto arrayType = LLVM::LLVMArrayType::get(elementType, numElements);
      std::string name = std::string(
          llvm::formatv("__wg_{0}_{1}", gpuFuncOp.getName(), en.index()));
      auto globalOp = rewriter.create<LLVM::GlobalOp>(
          gpuFuncOp.getLoc(), arrayType, /*isConstant=*/false,
          LLVM::Linkage::Internal, name, /*value=*/Attribute(),
-         /*alignment=*/0, gpu::GPUDialect::getWorkgroupAddressSpace());
+         /*alignment=*/0, workgroupAddrSpace);
      workgroupBuffers.push_back(globalOp);
    }

    // Rewrite the original GPU function to an LLVM function.
    auto convertedType = typeConverter->convertType(gpuFuncOp.getFunctionType());
    if (!convertedType)
      return failure();
    auto funcType =
[353 lines not shown]

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp

Show First 20 Lines • Show All 169 Lines • ▼ Show 20 Lines	void runOnOperation() override {

// Customize the bitwidth used for the device side index computations.		// Customize the bitwidth used for the device side index computations.
LowerToLLVMOptions options(		LowerToLLVMOptions options(
m.getContext(),		m.getContext(),
DataLayout(cast<DataLayoutOpInterface>(m.getOperation())));		DataLayout(cast<DataLayoutOpInterface>(m.getOperation())));
if (indexBitwidth != kDeriveIndexBitwidthFromDataLayout)		if (indexBitwidth != kDeriveIndexBitwidthFromDataLayout)
options.overrideIndexBitwidth(indexBitwidth);		options.overrideIndexBitwidth(indexBitwidth);

// MemRef conversion for GPU to NVVM lowering. The GPU dialect uses memory		// Apply in-dialect lowering. In-dialect lowering will replace
// space 5 for private memory attributions, but NVVM represents private		// ops which need to be lowered further, which is not supported by a
// memory allocations as local `alloca`s in the default address space. This		// single conversion pass.
// converter drops the private memory space to support the use case above.		{
LLVMTypeConverter converter(m.getContext(), options);		RewritePatternSet patterns(m.getContext());
converter.addConversion([&](MemRefType type) -> std::optional<Type> {		populateGpuRewritePatterns(patterns);
if (type.getMemorySpaceAsInt() !=		if (failed(applyPatternsAndFoldGreedily(m, std::move(patterns))))
		ftynse (Done): We'd better not ignore pattern application failures. Even if the patterns _currently_ cannot fail, they may eventually be changed, and we will be puzzled as to why the pass succeeds while doing nothing.
gpu::GPUDialect::getPrivateAddressSpace())		return signalPassFailure();
return std::nullopt;		}
return converter.convertType(MemRefType::Builder(type).setMemorySpace(
IntegerAttr::get(IntegerType::get(m.getContext(), 64), 0)));		// MemRef conversion for GPU to NVVM lowering.
		{
		RewritePatternSet patterns(m.getContext());
		bondhugula (Done): Nit: these three lines can move down to the relevant case statement.
		TypeConverter typeConverter;
		typeConverter.addConversion([](Type t) { return t; });
		// NVVM uses alloca in the default address space to represent private
		// memory allocations, so drop private annotations. NVVM uses address
		// space 3 for shared memory. NVVM uses the default address space to
		// represent global memory.
		gpu::populateMemorySpaceAttributeTypeConversions(
		typeConverter, [](gpu::AddressSpace space) -> unsigned {
		switch (space) {
		case gpu::AddressSpace::Global:
		return static_cast<unsigned>(
		NVVM::NVVMMemorySpace::kGlobalMemorySpace);
		case gpu::AddressSpace::Workgroup:
		return static_cast<unsigned>(
		NVVM::NVVMMemorySpace::kSharedMemorySpace);
		case gpu::AddressSpace::Private:
		return 0;
		}
});		});
		gpu::populateMemorySpaceLoweringPatterns(typeConverter, patterns);
		ConversionTarget target(getContext());
		gpu::populateLowerMemorySpaceOpLegality(target);
		if (failed(applyFullConversion(m, target, std::move(patterns))))
		return signalPassFailure();
		}

		LLVMTypeConverter converter(m.getContext(), options);
// Lowering for MMAMatrixType.		// Lowering for MMAMatrixType.
converter.addConversion([&](gpu::MMAMatrixType type) -> Type {		converter.addConversion([&](gpu::MMAMatrixType type) -> Type {
return convertMMAToLLVMType(type);		return convertMMAToLLVMType(type);
});		});
RewritePatternSet patterns(m.getContext());
RewritePatternSet llvmPatterns(m.getContext());		RewritePatternSet llvmPatterns(m.getContext());

// Apply in-dialect lowering first. In-dialect lowering will replace ops
// which need to be lowered further, which is not supported by a single
// conversion pass.
populateGpuRewritePatterns(patterns);
(void)applyPatternsAndFoldGreedily(m, std::move(patterns));

arith::populateArithToLLVMConversionPatterns(converter, llvmPatterns);		arith::populateArithToLLVMConversionPatterns(converter, llvmPatterns);
cf::populateControlFlowToLLVMConversionPatterns(converter, llvmPatterns);		cf::populateControlFlowToLLVMConversionPatterns(converter, llvmPatterns);
populateFuncToLLVMConversionPatterns(converter, llvmPatterns);		populateFuncToLLVMConversionPatterns(converter, llvmPatterns);
populateMemRefToLLVMConversionPatterns(converter, llvmPatterns);		populateMemRefToLLVMConversionPatterns(converter, llvmPatterns);
populateGpuToNVVMConversionPatterns(converter, llvmPatterns);		populateGpuToNVVMConversionPatterns(converter, llvmPatterns);
populateGpuWMMAToNVVMConversionPatterns(converter, llvmPatterns);		populateGpuWMMAToNVVMConversionPatterns(converter, llvmPatterns);
LLVMConversionTarget target(getContext());		LLVMConversionTarget target(getContext());
configureGpuToNVVMConversionLegality(target);		configureGpuToNVVMConversionLegality(target);
[40 lines not shown]	patterns
GPULaneIdOpToNVVM, GPUShuffleOpLowering, GPUReturnOpLowering>(		GPULaneIdOpToNVVM, GPUShuffleOpLowering, GPUReturnOpLowering>(
converter);		converter);

// Explicitly drop memory space when lowering private memory		// Explicitly drop memory space when lowering private memory
// attributions since NVVM models it as `alloca`s in the default		// attributions since NVVM models it as `alloca`s in the default
// memory space and does not support `alloca`s with addrspace(5).		// memory space and does not support `alloca`s with addrspace(5).
patterns.add<GPUFuncOpLowering>(		patterns.add<GPUFuncOpLowering>(
converter, /*allocaAddrSpace=*/0,		converter, /*allocaAddrSpace=*/0,
		/*workgroupAddrSpace=*/
		static_cast<unsigned>(NVVM::NVVMMemorySpace::kSharedMemorySpace),
StringAttr::get(&converter.getContext(),		StringAttr::get(&converter.getContext(),
NVVM::NVVMDialect::getKernelFuncAttrName()));		NVVM::NVVMDialect::getKernelFuncAttrName()));

populateOpPatterns<math::AbsFOp>(converter, patterns, "__nv_fabsf",		populateOpPatterns<math::AbsFOp>(converter, patterns, "__nv_fabsf",
"__nv_fabs");		"__nv_fabs");
populateOpPatterns<math::AtanOp>(converter, patterns, "__nv_atanf",		populateOpPatterns<math::AtanOp>(converter, patterns, "__nv_atanf",
"__nv_atan");		"__nv_atan");
populateOpPatterns<math::Atan2Op>(converter, patterns, "__nv_atan2f",		populateOpPatterns<math::Atan2Op>(converter, patterns, "__nv_atan2f",
[33 lines not shown]

mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp

[113 lines not shown]	if (useBarePtrCallConv) {
if (canUseBarePointers.wasInterrupted()) {		if (canUseBarePointers.wasInterrupted()) {
emitError(UnknownLoc::get(ctx),		emitError(UnknownLoc::get(ctx),
"bare pointer calling convention requires all memrefs to "		"bare pointer calling convention requires all memrefs to "
"have static shape and use the identity map");		"have static shape and use the identity map");
return signalPassFailure();		return signalPassFailure();
}		}
}		}

LLVMTypeConverter converter(ctx, options);		// Apply in-dialect lowering. In-dialect lowering will replace
		// ops which need to be lowered further, which is not supported by a
		// single conversion pass.
		{
RewritePatternSet patterns(ctx);		RewritePatternSet patterns(ctx);
RewritePatternSet llvmPatterns(ctx);

populateGpuRewritePatterns(patterns);		populateGpuRewritePatterns(patterns);
(void)applyPatternsAndFoldGreedily(m, std::move(patterns));		(void)applyPatternsAndFoldGreedily(m, std::move(patterns));
		}

		// Apply memory space lowering. The target uses 3 for workgroup memory and 5
		// for private memory.
		{
		RewritePatternSet patterns(ctx);
		TypeConverter typeConverter;
		typeConverter.addConversion([](Type t) { return t; });
		gpu::populateMemorySpaceAttributeTypeConversions(
		typeConverter, [](gpu::AddressSpace space) {
		switch (space) {
		case gpu::AddressSpace::Global:
		return 1;
		ftynse (Not Done): Nit: can these be turned into named constants, similarly to NVVM?
		christopherbate (Author, Done): Yeah, I missed that one; will fix before landing.
		case gpu::AddressSpace::Workgroup:
		return 3;
		case gpu::AddressSpace::Private:
		return 5;
		}
		});
		ConversionTarget target(getContext());
		gpu::populateLowerMemorySpaceOpLegality(target);
		gpu::populateMemorySpaceLoweringPatterns(typeConverter, patterns);
		if (failed(applyFullConversion(m, target, std::move(patterns))))
		return signalPassFailure();
		}

		LLVMTypeConverter converter(ctx, options);
		RewritePatternSet llvmPatterns(ctx);

mlir::arith::populateArithToLLVMConversionPatterns(converter, llvmPatterns);		mlir::arith::populateArithToLLVMConversionPatterns(converter, llvmPatterns);
populateAMDGPUToROCDLConversionPatterns(converter, llvmPatterns,		populateAMDGPUToROCDLConversionPatterns(converter, llvmPatterns,
*maybeChipset);		*maybeChipset);
populateVectorToLLVMConversionPatterns(converter, llvmPatterns);		populateVectorToLLVMConversionPatterns(converter, llvmPatterns);
cf::populateControlFlowToLLVMConversionPatterns(converter, llvmPatterns);		cf::populateControlFlowToLLVMConversionPatterns(converter, llvmPatterns);
populateFuncToLLVMConversionPatterns(converter, llvmPatterns);		populateFuncToLLVMConversionPatterns(converter, llvmPatterns);
populateMemRefToLLVMConversionPatterns(converter, llvmPatterns);		populateMemRefToLLVMConversionPatterns(converter, llvmPatterns);
[40 lines not shown]	patterns
GPUIndexIntrinsicOpLowering<gpu::BlockDimOp, ROCDL::BlockDimXOp,		GPUIndexIntrinsicOpLowering<gpu::BlockDimOp, ROCDL::BlockDimXOp,
ROCDL::BlockDimYOp, ROCDL::BlockDimZOp>,		ROCDL::BlockDimYOp, ROCDL::BlockDimZOp>,
GPUIndexIntrinsicOpLowering<gpu::BlockIdOp, ROCDL::BlockIdXOp,		GPUIndexIntrinsicOpLowering<gpu::BlockIdOp, ROCDL::BlockIdXOp,
ROCDL::BlockIdYOp, ROCDL::BlockIdZOp>,		ROCDL::BlockIdYOp, ROCDL::BlockIdZOp>,
GPUIndexIntrinsicOpLowering<gpu::GridDimOp, ROCDL::GridDimXOp,		GPUIndexIntrinsicOpLowering<gpu::GridDimOp, ROCDL::GridDimXOp,
ROCDL::GridDimYOp, ROCDL::GridDimZOp>,		ROCDL::GridDimYOp, ROCDL::GridDimZOp>,
GPUReturnOpLowering>(converter);		GPUReturnOpLowering>(converter);
patterns.add<GPUFuncOpLowering>(		patterns.add<GPUFuncOpLowering>(
converter, /*allocaAddrSpace=*/5,		converter,
		/*allocaAddrSpace=*/ROCDL::ROCDLDialect::kPrivateMemoryAddressSpace,
		/*workgroupAddrSpace=*/ROCDL::ROCDLDialect::kSharedMemoryAddressSpace,
StringAttr::get(&converter.getContext(),		StringAttr::get(&converter.getContext(),
ROCDL::ROCDLDialect::getKernelFuncAttrName()));		ROCDL::ROCDLDialect::getKernelFuncAttrName()));
if (Runtime::HIP == runtime) {		if (Runtime::HIP == runtime) {
patterns.add<GPUPrintfOpToHIPLowering>(converter);		patterns.add<GPUPrintfOpToHIPLowering>(converter);
} else if (Runtime::OpenCL == runtime) {		} else if (Runtime::OpenCL == runtime) {
// Use address space = 4 to match the OpenCL definition of printf()		// Use address space = 4 to match the OpenCL definition of printf()
patterns.add<GPUPrintfOpToLLVMCallLowering>(converter, /*addressSpace=*/4);		patterns.add<GPUPrintfOpToLLVMCallLowering>(converter, /*addressSpace=*/4);
}		}
[47 lines not shown]
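
The review thread above asks for the literal values returned by the ROCDL mapping lambda to become named constants. The following is a minimal standalone sketch of that idea, not the code that landed. It does not depend on MLIR: `AddressSpace` is a stand-in for `gpu::AddressSpace`, the `rocdl_sketch` namespace and `mapToRocdl` are hypothetical, and `kGlobalMemoryAddressSpace` is an assumed name (only the shared and private names appear in the diff). The values 1, 3, and 5 are the ones the lambda returns.

```cpp
#include <cassert>

// Stand-in for gpu::AddressSpace (the real enum is generated by TableGen).
enum class AddressSpace { Global, Workgroup, Private };

// Hypothetical home for the constants. The shared/private names mirror the
// ROCDLDialect constants used in the GPUFuncOpLowering call above; the
// global name is an assumption made for this sketch.
namespace rocdl_sketch {
constexpr unsigned kGlobalMemoryAddressSpace = 1;
constexpr unsigned kSharedMemoryAddressSpace = 3;
constexpr unsigned kPrivateMemoryAddressSpace = 5;
} // namespace rocdl_sketch

// Same shape as the lambda passed to
// populateMemorySpaceAttributeTypeConversions, with named constants.
constexpr unsigned mapToRocdl(AddressSpace space) {
  switch (space) {
  case AddressSpace::Global:
    return rocdl_sketch::kGlobalMemoryAddressSpace;
  case AddressSpace::Workgroup:
    return rocdl_sketch::kSharedMemoryAddressSpace;
  case AddressSpace::Private:
    return rocdl_sketch::kPrivateMemoryAddressSpace;
  }
  return 0; // Not reached for valid enumerators.
}

// Compile-time check that the named constants keep the original values.
static_assert(mapToRocdl(AddressSpace::Workgroup) == 3, "workgroup maps to 3");
static_assert(mapToRocdl(AddressSpace::Private) == 5, "private maps to 5");
```

Keeping the mapping `constexpr` lets the expected values be pinned with `static_assert`, so a later change to a constant would be caught at compile time.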

mlir/lib/Conversion/MemRefToSPIRV/MapMemRefStorageClassPass.cpp

[216 lines not shown]	if (auto typeAttr = attr.dyn_cast<TypeAttr>())
return isLegalType(typeAttr.getValue());		return isLegalType(typeAttr.getValue());
return true;		return true;
}		}

/// Returns true if the given `op` is considered as legal for SPIR-V conversion.		/// Returns true if the given `op` is considered as legal for SPIR-V conversion.
static bool isLegalOp(Operation *op) {		static bool isLegalOp(Operation *op) {
if (auto funcOp = dyn_cast<FunctionOpInterface>(op)) {		if (auto funcOp = dyn_cast<FunctionOpInterface>(op)) {
return llvm::all_of(funcOp.getArgumentTypes(), isLegalType) &&		return llvm::all_of(funcOp.getArgumentTypes(), isLegalType) &&
llvm::all_of(funcOp.getResultTypes(), isLegalType);		llvm::all_of(funcOp.getResultTypes(), isLegalType) &&
		llvm::all_of(funcOp.getFunctionBody().getArgumentTypes(),
		isLegalType);
}		}

auto attrs = llvm::map_range(op->getAttrs(), [](const NamedAttribute &attr) {		auto attrs = llvm::map_range(op->getAttrs(), [](const NamedAttribute &attr) {
return attr.getValue();		return attr.getValue();
});		});

return llvm::all_of(op->getOperandTypes(), isLegalType) &&		return llvm::all_of(op->getOperandTypes(), isLegalType) &&
llvm::all_of(op->getResultTypes(), isLegalType) &&		llvm::all_of(op->getResultTypes(), isLegalType) &&
[131 lines not shown]

mlir/lib/Dialect/GPU/CMakeLists.txt

[46 lines not shown]	add_mlir_dialect_library(MLIRGPUTransforms
Transforms/AllReduceLowering.cpp		Transforms/AllReduceLowering.cpp
Transforms/AsyncRegionRewriter.cpp		Transforms/AsyncRegionRewriter.cpp
Transforms/KernelOutlining.cpp		Transforms/KernelOutlining.cpp
Transforms/MemoryPromotion.cpp		Transforms/MemoryPromotion.cpp
Transforms/ParallelLoopMapper.cpp		Transforms/ParallelLoopMapper.cpp
Transforms/SerializeToBlob.cpp		Transforms/SerializeToBlob.cpp
Transforms/SerializeToCubin.cpp		Transforms/SerializeToCubin.cpp
Transforms/SerializeToHsaco.cpp		Transforms/SerializeToHsaco.cpp
		Transforms/LowerMemorySpaceAttributes.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/GPU		${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/GPU

LINK_COMPONENTS		LINK_COMPONENTS
Core		Core
MC		MC
${NVPTX_LIBS}		${NVPTX_LIBS}
[83 lines not shown]

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp

[1,008 lines not shown]	LogicalResult GPUFuncOp::verifyType() {
if (isKernel() && getFunctionType().getNumResults() != 0)		if (isKernel() && getFunctionType().getNumResults() != 0)
return emitOpError() << "expected void return type for kernel function";		return emitOpError() << "expected void return type for kernel function";

return success();		return success();
}		}

static LogicalResult verifyAttributions(Operation *op,		static LogicalResult verifyAttributions(Operation *op,
ArrayRef<BlockArgument> attributions,		ArrayRef<BlockArgument> attributions,
unsigned memorySpace) {		gpu::AddressSpace memorySpace) {
for (Value v : attributions) {		for (Value v : attributions) {
auto type = v.getType().dyn_cast<MemRefType>();		auto type = v.getType().dyn_cast<MemRefType>();
if (!type)		if (!type)
return op->emitOpError() << "expected memref type in attribution";		return op->emitOpError() << "expected memref type in attribution";

if (type.getMemorySpaceAsInt() != memorySpace) {		if (!type.getMemorySpace())
		return op->emitOpError() << "expected memory space in attribution";

		// We can only verify the address space if it hasn't already been lowered
		// from the AddressSpaceAttr to a target-specific numeric value.
		auto addressSpace = type.getMemorySpace().dyn_cast<gpu::AddressSpaceAttr>();
		if (!addressSpace)
		continue;
		if (addressSpace.getValue() != memorySpace)
return op->emitOpError()		return op->emitOpError()
<< "expected memory space " << memorySpace << " in attribution";		<< "expected memory space " << stringifyAddressSpace(memorySpace)
}		<< " in attribution";
}		}
return success();		return success();
}		}
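
The new `verifyAttributions` logic checks the address space only while it is still symbolic and skips attributions whose memory space has already been lowered to a number. A standalone sketch of that control flow follows, assuming a `std::variant` as a stand-in for the MLIR attribute and a hypothetical `stringify` helper in place of `stringifyAddressSpace`; the returned strings follow the diagnostics in the diff, and an empty string stands for `success()`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Stand-in for gpu::AddressSpace.
enum class AddressSpace { Global, Workgroup, Private };

// Models a memref memory space attribute: symbolic before lowering,
// a target-specific integer afterwards.
using MemorySpace = std::variant<AddressSpace, unsigned>;

// Hypothetical replacement for stringifyAddressSpace.
inline std::string stringify(AddressSpace space) {
  switch (space) {
  case AddressSpace::Global:
    return "global";
  case AddressSpace::Workgroup:
    return "workgroup";
  case AddressSpace::Private:
    return "private";
  }
  return "unknown";
}

// Returns an empty string on success, otherwise the diagnostic text.
inline std::string
verifyAttributions(const std::vector<std::optional<MemorySpace>> &attributions,
                   AddressSpace expected) {
  for (const std::optional<MemorySpace> &space : attributions) {
    if (!space)
      return "expected memory space in attribution";
    // Already lowered to a numeric value: nothing left to verify.
    const AddressSpace *symbolic = std::get_if<AddressSpace>(&*space);
    if (!symbolic)
      continue;
    if (*symbolic != expected)
      return "expected memory space " + stringify(expected) +
             " in attribution";
  }
  return "";
}
```

The consequence of this design is that a lowered attribution with the wrong numeric space is accepted, since the verifier has no target-specific mapping to compare against.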

/// Verifies the body of the function.		/// Verifies the body of the function.
LogicalResult GPUFuncOp::verifyBody() {		LogicalResult GPUFuncOp::verifyBody() {
if (empty())		if (empty())
return emitOpError() << "expected body with at least one block";		return emitOpError() << "expected body with at least one block";
[401 lines not shown]

mlir/lib/Dialect/GPU/Transforms/AllReduceLowering.cpp

[152 lines not shown]	private:
Value getDimOp(gpu::Dimension dimension) {		Value getDimOp(gpu::Dimension dimension) {
Value dim = create<T>(indexType, dimension);		Value dim = create<T>(indexType, dimension);
return create<arith::IndexCastOp>(int32Type, dim);		return create<arith::IndexCastOp>(int32Type, dim);
}		}

/// Adds type to funcOp's workgroup attributions.		/// Adds type to funcOp's workgroup attributions.
Value createWorkgroupBuffer() {		Value createWorkgroupBuffer() {
// TODO: Pick a proper location for the attribution.		// TODO: Pick a proper location for the attribution.
int workgroupMemoryAddressSpace =		auto workgroupMemoryAddressSpace = gpu::AddressSpaceAttr::get(
gpu::GPUDialect::getWorkgroupAddressSpace();		funcOp->getContext(), gpu::GPUDialect::getWorkgroupAddressSpace());
auto bufferType = MemRefType::get({kSubgroupSize}, valueType, AffineMap{},		auto bufferType = MemRefType::get({kSubgroupSize}, valueType, AffineMap{},
workgroupMemoryAddressSpace);		workgroupMemoryAddressSpace);
return funcOp.addWorkgroupAttribution(bufferType, rewriter.getUnknownLoc());		return funcOp.addWorkgroupAttribution(bufferType, rewriter.getUnknownLoc());
}		}

/// Returns an accumulator factory using either the op attribute or the body		/// Returns an accumulator factory using either the op attribute or the body
/// region.		/// region.
AccumulatorFactory getFactory() {		AccumulatorFactory getFactory() {
[228 lines not shown]	LogicalResult matchAndRewrite(Operation *op,
auto callback = [&](gpu::AllReduceOp reduceOp) -> WalkResult {		auto callback = [&](gpu::AllReduceOp reduceOp) -> WalkResult {
if (!reduceOp.getUniform())		if (!reduceOp.getUniform())
return WalkResult::interrupt();		return WalkResult::interrupt();

reduceOps.emplace_back(reduceOp);		reduceOps.emplace_back(reduceOp);
return WalkResult::advance();		return WalkResult::advance();
};		};

if (funcOp.walk(callback).wasInterrupted())		if (funcOp.walk(callback).wasInterrupted() || reduceOps.empty())
return rewriter.notifyMatchFailure(		return rewriter.notifyMatchFailure(
op, "Non uniform reductions are not supported yet.");		op, "Non uniform reductions are not supported yet.");

for (gpu::AllReduceOp reduceOp : reduceOps)		for (gpu::AllReduceOp reduceOp : reduceOps)
GpuAllReduceRewriter(funcOp, reduceOp, rewriter).rewrite();		GpuAllReduceRewriter(funcOp, reduceOp, rewriter).rewrite();

return success();		return success();
}		}
};		};
} // namespace		} // namespace

void mlir::populateGpuAllReducePatterns(RewritePatternSet &patterns) {		void mlir::populateGpuAllReducePatterns(RewritePatternSet &patterns) {
patterns.add<GpuAllReduceConversion>(patterns.getContext());		patterns.add<GpuAllReduceConversion>(patterns.getContext());
}		}

mlir/lib/Dialect/GPU/Transforms/LowerMemorySpaceAttributes.cpp

This file was added.

				//===- LowerMemorySpaceAttributes.cpp ------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// Implementation of a pass that rewrites the IR so that uses of
				/// `gpu::AddressSpaceAttr` in memref memory space annotations are replaced
				/// with caller-specified numeric values.
				///
				//===----------------------------------------------------------------------===//
				#include "mlir/Dialect/GPU/IR/GPUDialect.h"
				#include "mlir/Dialect/GPU/Transforms/Passes.h"
				#include "mlir/IR/PatternMatch.h"
				#include "mlir/Transforms/DialectConversion.h"
				#include "llvm/Support/Debug.h"

				namespace mlir {
				#define GEN_PASS_DEF_GPULOWERMEMORYSPACEATTRIBUTESPASS
				#include "mlir/Dialect/GPU/Transforms/Passes.h.inc"
				} // namespace mlir

				using namespace mlir;
				using namespace mlir::gpu;

				//===----------------------------------------------------------------------===//
				// Conversion Target
				//===----------------------------------------------------------------------===//

				/// Returns true if the given `type` is considered as legal during memory space
				/// attribute lowering.
				static bool isLegalType(Type type) {
				ftynse (Done): Copy-pasta? This file is not related to SPIR-V conversion.
				if (auto memRefType = type.dyn_cast<BaseMemRefType>()) {
				return !memRefType.getMemorySpace()
				.isa_and_nonnull<gpu::AddressSpaceAttr>();
				}
				return true;
				}

				/// Returns true if the given `attr` is considered legal during memory space
				/// attribute lowering.
				ftynse (Done): Ditto.
				static bool isLegalAttr(Attribute attr) {
				if (auto typeAttr = attr.dyn_cast<TypeAttr>())
				return isLegalType(typeAttr.getValue());
				return true;
				}

				/// Returns true if the given `op` is legal during memory space attribute
				/// lowering.
				ftynse (Done): Same here and probably below.
				static bool isLegalOp(Operation *op) {
				if (auto funcOp = dyn_cast<FunctionOpInterface>(op)) {
				return llvm::all_of(funcOp.getArgumentTypes(), isLegalType) &&
				llvm::all_of(funcOp.getResultTypes(), isLegalType) &&
				llvm::all_of(funcOp.getFunctionBody().getArgumentTypes(),
				isLegalType);
				}

				auto attrs = llvm::map_range(op->getAttrs(), [](const NamedAttribute &attr) {
				return attr.getValue();
				});

				return llvm::all_of(op->getOperandTypes(), isLegalType) &&
				llvm::all_of(op->getResultTypes(), isLegalType) &&
				llvm::all_of(attrs, isLegalAttr);
				}

				void gpu::populateLowerMemorySpaceOpLegality(ConversionTarget &target) {
				target.markUnknownOpDynamicallyLegal(isLegalOp);
				}

				//===----------------------------------------------------------------------===//
				// Type Converter
				//===----------------------------------------------------------------------===//

				IntegerAttr wrapNumericMemorySpace(MLIRContext *ctx, unsigned space) {
				return IntegerAttr::get(IntegerType::get(ctx, 64), space);
				}

				void mlir::gpu::populateMemorySpaceAttributeTypeConversions(
				TypeConverter &typeConverter, const MemorySpaceMapping &mapping) {
				typeConverter.addConversion([mapping](Type type) -> Optional<Type> {
				auto subElementType = type.dyn_cast_or_null<SubElementTypeInterface>();
				if (!subElementType)
				return type;
				Type newType = subElementType.replaceSubElements(
				[mapping](Attribute attr) -> std::optional<Attribute> {
				auto memorySpaceAttr = attr.dyn_cast_or_null<gpu::AddressSpaceAttr>();
				if (!memorySpaceAttr)
				return std::nullopt;
				auto newValue = wrapNumericMemorySpace(
				attr.getContext(), mapping(memorySpaceAttr.getValue()));
				return newValue;
				});
				return newType;
				});
				}
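
The type conversion above replaces every nested `gpu::AddressSpaceAttr` through the sub-element interface, which is what lets it reach cases such as a memref of memref. A standalone sketch of that recursive replacement is shown below. It is only a model: `ToyType` and `lowerMemorySpaces` are hypothetical, the real code delegates the traversal to `replaceSubElements`, and the attribute is modeled as a `std::variant`.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <variant>

// Stand-in for gpu::AddressSpace.
enum class AddressSpace { Global, Workgroup, Private };

// Symbolic before lowering, numeric afterwards.
using MemorySpace = std::variant<AddressSpace, unsigned>;

// Same role as the caller-supplied mapping in the diff.
using MemorySpaceMapping = std::function<unsigned(AddressSpace)>;

// Toy type: a scalar (no element) or a memref-like type whose element
// may itself carry a memory space.
struct ToyType {
  std::optional<MemorySpace> memorySpace;
  std::shared_ptr<const ToyType> element; // null for scalars
};

// Rewrites every symbolic memory space, at any nesting depth, to the
// numeric value chosen by `mapping`. Numeric spaces are left untouched.
inline ToyType lowerMemorySpaces(const ToyType &type,
                                 const MemorySpaceMapping &mapping) {
  ToyType result = type;
  if (type.memorySpace) {
    if (const AddressSpace *symbolic =
            std::get_if<AddressSpace>(&*type.memorySpace)) {
      result.memorySpace = MemorySpace{mapping(*symbolic)};
    }
  }
  if (type.element) {
    result.element = std::make_shared<const ToyType>(
        lowerMemorySpaces(*type.element, mapping));
  }
  return result;
}
```

Applied to a workgroup memref whose element is a private memref, with the ROCDL values from this patch, both levels come out numeric (3 and 5).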

				namespace {

				/// Converts any op that has operands/results/attributes with numeric MemRef
				/// memory spaces.
				struct LowerMemRefAddressSpacePattern final : public ConversionPattern {
				LowerMemRefAddressSpacePattern(MLIRContext *context, TypeConverter &converter)
				: ConversionPattern(converter, MatchAnyOpTypeTag(), 1, context) {}

				LogicalResult
				ftynse (Done): I see that some of this code is carried over, but we should modernize it. There is no reason to privilege builtin function types in this conversion. Use the `SubElementTypeInterface` to access nested types and convert them. This will support, e.g., memref of memref, which is currently ignored.
				antiagainst (Not Done): +1! It would be nice to update the SPIR-V side too. :)
				christopherbate (Author, Done): I updated it to `SubElementTypeInterface`, thanks for the pointer! That appears to handle all cases in this function.
				christopherbate (Author, Done): For the SPIR-V dialect code, I'm not very familiar with it, so I didn't modify it, as it's not absolutely required for this patch.
				matchAndRewrite(Operation *op, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const override {
				SmallVector<NamedAttribute> newAttrs;
				newAttrs.reserve(op->getAttrs().size());
				for (auto attr : op->getAttrs()) {
				if (auto typeAttr = attr.getValue().dyn_cast<TypeAttr>()) {
				auto newAttr = getTypeConverter()->convertType(typeAttr.getValue());
				newAttrs.emplace_back(attr.getName(), TypeAttr::get(newAttr));
				} else {
				newAttrs.push_back(attr);
				}
				}

				SmallVector<Type> newResults;
				ftynse (Done): Nit: no need to prefix `SmallVector` with `llvm::`. And also no need to specify the explicit number of stack elements. Here and below.
				(void)getTypeConverter()->convertTypes(op->getResultTypes(), newResults);

				OperationState state(op->getLoc(), op->getName().getStringRef(), operands,
				newResults, newAttrs, op->getSuccessors());

				for (Region &region : op->getRegions()) {
				Region *newRegion = state.addRegion();
				rewriter.inlineRegionBefore(region, *newRegion, newRegion->begin());
				TypeConverter::SignatureConversion result(newRegion->getNumArguments());
				(void)getTypeConverter()->convertSignatureArgs(
				newRegion->getArgumentTypes(), result);
				rewriter.applySignatureConversion(newRegion, result);
				}

				Operation *newOp = rewriter.create(state);
				rewriter.replaceOp(op, newOp->getResults());
				return success();
				}
				};
				} // namespace

				void mlir::gpu::populateMemorySpaceLoweringPatterns(
				TypeConverter &typeConverter, RewritePatternSet &patterns) {
				patterns.add<LowerMemRefAddressSpacePattern>(patterns.getContext(),
				typeConverter);
				}

				namespace {
				class LowerMemorySpaceAttributesPass
				: public mlir::impl::GPULowerMemorySpaceAttributesPassBase<
				LowerMemorySpaceAttributesPass> {
				public:
				using Base::Base;
				void runOnOperation() override {
				MLIRContext *context = &getContext();
				Operation *op = getOperation();

				ConversionTarget target(getContext());
				populateLowerMemorySpaceOpLegality(target);

				TypeConverter typeConverter;
				typeConverter.addConversion([](Type t) { return t; });
				populateMemorySpaceAttributeTypeConversions(
				typeConverter, [this](AddressSpace space) -> unsigned {
				switch (space) {
				case AddressSpace::Global:
				return globalAddrSpace;
				case AddressSpace::Workgroup:
				return workgroupAddrSpace;
				case AddressSpace::Private:
				return privateAddrSpace;
				}
				});
				RewritePatternSet patterns(context);
				populateMemorySpaceLoweringPatterns(typeConverter, patterns);
				if (failed(applyFullConversion(op, target, std::move(patterns))))
				return signalPassFailure();
				}
				};
				} // namespace
				ftynse (Done): Nit: please add a newline.

mlir/lib/Dialect/GPU/Transforms/MemoryPromotion.cpp

	[141 lines not shown]
	/// Promotes a function argument to workgroup memory in the given function. The			/// Promotes a function argument to workgroup memory in the given function. The
	/// copies will be inserted in the beginning and in the end of the function.			/// copies will be inserted in the beginning and in the end of the function.
	void mlir::promoteToWorkgroupMemory(GPUFuncOp op, unsigned arg) {			void mlir::promoteToWorkgroupMemory(GPUFuncOp op, unsigned arg) {
	Value value = op.getArgument(arg);			Value value = op.getArgument(arg);
	auto type = value.getType().dyn_cast<MemRefType>();			auto type = value.getType().dyn_cast<MemRefType>();
	assert(type && type.hasStaticShape() && "can only promote memrefs");			assert(type && type.hasStaticShape() && "can only promote memrefs");

	// Get the type of the buffer in the workgroup memory.			// Get the type of the buffer in the workgroup memory.
	int workgroupMemoryAddressSpace = gpu::GPUDialect::getWorkgroupAddressSpace();			auto workgroupMemoryAddressSpace = gpu::AddressSpaceAttr::get(
	auto bufferType = MemRefType::get(type.getShape(), type.getElementType(), {},			op->getContext(), gpu::AddressSpace::Workgroup);
	workgroupMemoryAddressSpace);			auto bufferType = MemRefType::get(type.getShape(), type.getElementType(),
				MemRefLayoutAttrInterface{},
				Attribute(workgroupMemoryAddressSpace));
	Value attribution = op.addWorkgroupAttribution(bufferType, value.getLoc());			Value attribution = op.addWorkgroupAttribution(bufferType, value.getLoc());

	// Replace the uses first since only the original uses are currently present.			// Replace the uses first since only the original uses are currently present.
	// Then insert the copies.			// Then insert the copies.
	value.replaceAllUsesWith(attribution);			value.replaceAllUsesWith(attribution);
	insertCopies(op.getBody(), op.getLoc(), value, attribution);			insertCopies(op.getBody(), op.getLoc(), value, attribution);
	}			}

mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp

[28 lines not shown]
#include "mlir/Dialect/NVGPU/IR/NVGPUTypes.cpp.inc"		#include "mlir/Dialect/NVGPU/IR/NVGPUTypes.cpp.inc"
>();		>();
addOperations<		addOperations<
#define GET_OP_LIST		#define GET_OP_LIST
#include "mlir/Dialect/NVGPU/IR/NVGPU.cpp.inc"		#include "mlir/Dialect/NVGPU/IR/NVGPU.cpp.inc"
>();		>();
}		}

		bool nvgpu::NVGPUDialect::hasSharedMemoryAddressSpace(MemRefType type) {
		Attribute memorySpace = type.getMemorySpace();
		if (!memorySpace)
		return false;
		if (auto intAttr = memorySpace.dyn_cast<IntegerAttr>())
		return intAttr.getInt() == NVGPUDialect::kSharedMemoryAddressSpace;
		ftynse (Done): Can we factor out the magic `3` into a named constant, something like `NVGPUDialect::kSharedMemoryAddressSpace`?
		bondhugula (Done): +1 to `kSharedMemoryAddressSpace`.
		krzysz00 (Done): +1 and do it for ROCDL too.
		if (auto gpuAttr = memorySpace.dyn_cast<gpu::AddressSpaceAttr>())
		return gpuAttr.getValue() == gpu::AddressSpace::Workgroup;
		return false;
		}
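
`hasSharedMemoryAddressSpace` accepts two spellings of shared memory: the numeric attribute with value `kSharedMemoryAddressSpace` and the symbolic `gpu::AddressSpace::Workgroup`. A standalone sketch of that predicate follows, assuming a `std::variant` as a stand-in for the MLIR attribute and the value 3 that the review comments call the "magic 3"; it is an illustration, not the dialect's implementation.

```cpp
#include <cassert>
#include <optional>
#include <variant>

// Stand-in for gpu::AddressSpace.
enum class AddressSpace { Global, Workgroup, Private };

// Symbolic (enum) or already-lowered (integer) memory space.
using MemorySpace = std::variant<AddressSpace, unsigned>;

// Value taken from the review discussion of the "magic 3".
constexpr unsigned kSharedMemoryAddressSpace = 3;

// True if the memory space denotes shared (workgroup) memory in either
// representation; a missing memory space is never shared.
inline bool hasSharedMemoryAddressSpace(const std::optional<MemorySpace> &space) {
  if (!space)
    return false;
  if (const unsigned *numeric = std::get_if<unsigned>(&*space))
    return *numeric == kSharedMemoryAddressSpace;
  return std::get<AddressSpace>(*space) == AddressSpace::Workgroup;
}
```

Accepting both forms lets the `DeviceAsyncCopyOp` verifier run either before or after memory space lowering.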

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// NVGPU_DeviceAsyncCopyOp		// NVGPU_DeviceAsyncCopyOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// Return true if the last dimension of the MemRefType has unit stride. Also		/// Return true if the last dimension of the MemRefType has unit stride. Also
/// return true for memrefs with no strides.		/// return true for memrefs with no strides.
static bool isLastMemrefDimUnitStride(MemRefType type) {		static bool isLastMemrefDimUnitStride(MemRefType type) {
int64_t offset;		int64_t offset;
SmallVector<int64_t> strides;		SmallVector<int64_t> strides;
if (failed(getStridesAndOffset(type, strides, offset))) {		if (failed(getStridesAndOffset(type, strides, offset))) {
return false;		return false;
}		}
return strides.back() == 1;		return strides.back() == 1;
}		}

LogicalResult DeviceAsyncCopyOp::verify() {		LogicalResult DeviceAsyncCopyOp::verify() {
auto srcMemref = getSrc().getType().cast<MemRefType>();		auto srcMemref = getSrc().getType().cast<MemRefType>();
auto dstMemref = getDst().getType().cast<MemRefType>();		auto dstMemref = getDst().getType().cast<MemRefType>();
unsigned workgroupAddressSpace = gpu::GPUDialect::getWorkgroupAddressSpace();
if (!isLastMemrefDimUnitStride(srcMemref))		if (!isLastMemrefDimUnitStride(srcMemref))
return emitError("source memref most minor dim must have unit stride");		return emitError("source memref most minor dim must have unit stride");
if (!isLastMemrefDimUnitStride(dstMemref))		if (!isLastMemrefDimUnitStride(dstMemref))
return emitError("destination memref most minor dim must have unit stride");		return emitError("destination memref most minor dim must have unit stride");
if (dstMemref.getMemorySpaceAsInt() != workgroupAddressSpace)		if (!NVGPUDialect::hasSharedMemoryAddressSpace(dstMemref))
return emitError("destination memref must have memory space ")		return emitError()
<< workgroupAddressSpace;		<< "destination memref must have a memory space attribute of "
		"IntegerAttr("
		<< NVGPUDialect::kSharedMemoryAddressSpace
		<< ") or gpu::AddressSpaceAttr(Workgroup)";
if (dstMemref.getElementType() != srcMemref.getElementType())		if (dstMemref.getElementType() != srcMemref.getElementType())
return emitError("source and destination must have the same element type");		return emitError("source and destination must have the same element type");
if (size_t(srcMemref.getRank()) != getSrcIndices().size())		if (size_t(srcMemref.getRank()) != getSrcIndices().size())
return emitOpError() << "expected " << srcMemref.getRank()		return emitOpError() << "expected " << srcMemref.getRank()
<< " source indices, got " << getSrcIndices().size();		<< " source indices, got " << getSrcIndices().size();
if (size_t(dstMemref.getRank()) != getDstIndices().size())		if (size_t(dstMemref.getRank()) != getDstIndices().size())
return emitOpError() << "expected " << dstMemref.getRank()		return emitOpError() << "expected " << dstMemref.getRank()
<< " destination indices, got "		<< " destination indices, got "
[172 lines elided]	LogicalResult LdMatrixOp::verify() {
int64_t numElementsPer32b = 32 / elementBitWidth;		int64_t numElementsPer32b = 32 / elementBitWidth;

// number of 8-by-8 tiles		// number of 8-by-8 tiles
int64_t numTiles = getNumTiles();		int64_t numTiles = getNumTiles();

// transpose elements in vector registers at 16b granularity when true		// transpose elements in vector registers at 16b granularity when true
bool isTranspose = getTranspose();		bool isTranspose = getTranspose();

// address space id for shared memory
unsigned smemAddressSpace = gpu::GPUDialect::getWorkgroupAddressSpace();

//		//
// verification		// verification
//		//

if (!(srcMemref.getMemorySpaceAsInt() == smemAddressSpace))		if (!NVGPUDialect::hasSharedMemoryAddressSpace(srcMemref))
return emitError()		return emitError()
<< "expected nvgpu.ldmatrix srcMemref must have memory space "		<< "expected nvgpu.ldmatrix srcMemref must have a memory space "
<< smemAddressSpace;		"attribute of IntegerAttr("
		ftynse: Nit: let's not hardcode 3 here either, use the named constant instead.
		<< NVGPUDialect::kSharedMemoryAddressSpace
		<< ") or gpu::AddressSpaceAttr(Workgroup)";
if (elementBitWidth > 32)		if (elementBitWidth > 32)
return emitError() << "nvgpu.ldmatrix works for 32b or lower";		return emitError() << "nvgpu.ldmatrix works for 32b or lower";
if (isTranspose && !(elementBitWidth == 16))		if (isTranspose && !(elementBitWidth == 16))
return emitError()		return emitError()
<< "nvgpu.ldmatrix transpose works only at 16b granularity";		<< "nvgpu.ldmatrix transpose works only at 16b granularity";
if (!(resShape[1] == numElementsPer32b))		if (!(resShape[1] == numElementsPer32b))
return emitError() << "expected vector register shape[1] = "		return emitError() << "expected vector register shape[1] = "
<< numElementsPer32b;		<< numElementsPer32b;
[18 lines elided]
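
Note on the helper at the top of this hunk: `hasSharedMemoryAddressSpace` accepts a memref whose memory space is either the integer constant or the `gpu::AddressSpaceAttr` workgroup value, and rejects a memref with no memory space. A minimal standalone sketch of that decision logic follows. `MemorySpace`, the local `AddressSpace` enum, and the use of `std::variant` are hypothetical stand-ins for `mlir::Attribute`, not the MLIR API, and the value 3 is taken from the review comments above.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical stand-ins for the MLIR attribute kinds; not the real API.
enum class AddressSpace { Global, Workgroup, Private };

// monostate models "no memory space attribute", int64_t models IntegerAttr,
// AddressSpace models gpu::AddressSpaceAttr.
using MemorySpace = std::variant<std::monostate, std::int64_t, AddressSpace>;

// Mirrors the named constant requested in review (the former magic 3).
constexpr std::int64_t kSharedMemoryAddressSpace = 3;

// Returns true for either accepted spelling of shared (workgroup) memory.
bool hasSharedMemoryAddressSpace(const MemorySpace &space) {
  if (std::holds_alternative<std::monostate>(space))
    return false;
  if (const auto *intAttr = std::get_if<std::int64_t>(&space))
    return *intAttr == kSharedMemoryAddressSpace;
  if (const auto *gpuAttr = std::get_if<AddressSpace>(&space))
    return *gpuAttr == AddressSpace::Workgroup;
  return false;
}
```

Accepting both forms lets the verifiers in this file work before and after the integer-lowering pass has run.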

mlir/lib/Dialect/NVGPU/Transforms/OptimizeSharedMemory.cpp

[175 lines elided]	getShmReadAndWriteOps(Operation *parentOp, Value shmMemRef,

return success();		return success();
}		}

mlir::LogicalResult		mlir::LogicalResult
mlir::nvgpu::optimizeSharedMemoryReadsAndWrites(Operation *parentOp,		mlir::nvgpu::optimizeSharedMemoryReadsAndWrites(Operation *parentOp,
Value memrefValue) {		Value memrefValue) {
auto memRefType = memrefValue.getType().dyn_cast<MemRefType>();		auto memRefType = memrefValue.getType().dyn_cast<MemRefType>();
if (!memRefType || memRefType.getMemorySpaceAsInt() !=		if (!memRefType || !NVGPUDialect::hasSharedMemoryAddressSpace(memRefType))
gpu::GPUDialect::getWorkgroupAddressSpace())
return failure();		return failure();

// Abort if the given value has any sub-views; we do not do any alias		// Abort if the given value has any sub-views; we do not do any alias
// analysis.		// analysis.
bool hasSubView = false;		bool hasSubView = false;
parentOp->walk([&](memref::SubViewOp subView) { hasSubView = true; });		parentOp->walk([&](memref::SubViewOp subView) { hasSubView = true; });
if (hasSubView)		if (hasSubView)
return failure();		return failure();
[59 lines elided]	class OptimizeSharedMemoryPass
: public nvgpu::impl::OptimizeSharedMemoryBase<OptimizeSharedMemoryPass> {		: public nvgpu::impl::OptimizeSharedMemoryBase<OptimizeSharedMemoryPass> {
public:		public:
OptimizeSharedMemoryPass() = default;		OptimizeSharedMemoryPass() = default;

void runOnOperation() override {		void runOnOperation() override {
Operation *op = getOperation();		Operation *op = getOperation();
SmallVector<memref::AllocOp> shmAllocOps;		SmallVector<memref::AllocOp> shmAllocOps;
op->walk([&](memref::AllocOp allocOp) {		op->walk([&](memref::AllocOp allocOp) {
if (allocOp.getMemref()		if (!NVGPUDialect::hasSharedMemoryAddressSpace(allocOp.getType()))
.getType()
.cast<MemRefType>()
.getMemorySpaceAsInt() !=
gpu::GPUDialect::getWorkgroupAddressSpace())
return;		return;
		bondhugula: `allocOp.getType()`.
shmAllocOps.push_back(allocOp);		shmAllocOps.push_back(allocOp);
});		});
for (auto allocOp : shmAllocOps) {		for (auto allocOp : shmAllocOps) {
if (failed(optimizeSharedMemoryReadsAndWrites(getOperation(),		if (failed(optimizeSharedMemoryReadsAndWrites(getOperation(),
allocOp.getMemref())))		allocOp.getMemref())))
return;		return;
}		}
}		}
};		};
} // namespace		} // namespace

std::unique_ptr<Pass> mlir::nvgpu::createOptimizeSharedMemoryPass() {		std::unique_ptr<Pass> mlir::nvgpu::createOptimizeSharedMemoryPass() {
return std::make_unique<OptimizeSharedMemoryPass>();		return std::make_unique<OptimizeSharedMemoryPass>();
}		}

mlir/test/Conversion/GPUCommon/memory-attrbution.mlir

// RUN: mlir-opt -allow-unregistered-dialect --convert-gpu-to-nvvm --split-input-file %s | FileCheck --check-prefix=NVVM %s		// RUN: mlir-opt -allow-unregistered-dialect --convert-gpu-to-nvvm --split-input-file %s | FileCheck --check-prefix=NVVM %s
// RUN: mlir-opt -allow-unregistered-dialect --convert-gpu-to-rocdl --split-input-file %s | FileCheck --check-prefix=ROCDL %s		// RUN: mlir-opt -allow-unregistered-dialect --convert-gpu-to-rocdl --split-input-file %s | FileCheck --check-prefix=ROCDL %s

gpu.module @kernel {		gpu.module @kernel {
// NVVM-LABEL: llvm.func @private		// NVVM-LABEL: llvm.func @private
gpu.func @private(%arg0: f32) private(%arg1: memref<4xf32, 5>) {		gpu.func @private(%arg0: f32) private(%arg1: memref<4xf32, #gpu.address_space<private>>) {
// Allocate private memory inside the function.		// Allocate private memory inside the function.
// NVVM: %[[size:.*]] = llvm.mlir.constant(4 : i64) : i64		// NVVM: %[[size:.*]] = llvm.mlir.constant(4 : i64) : i64
// NVVM: %[[raw:.*]] = llvm.alloca %[[size]] x f32 : (i64) -> !llvm.ptr<f32>		// NVVM: %[[raw:.*]] = llvm.alloca %[[size]] x f32 : (i64) -> !llvm.ptr<f32>

// ROCDL: %[[size:.*]] = llvm.mlir.constant(4 : i64) : i64		// ROCDL: %[[size:.*]] = llvm.mlir.constant(4 : i64) : i64
// ROCDL: %[[raw:.*]] = llvm.alloca %[[size]] x f32 : (i64) -> !llvm.ptr<f32, 5>		// ROCDL: %[[raw:.*]] = llvm.alloca %[[size]] x f32 : (i64) -> !llvm.ptr<f32, 5>

// Populate the memref descriptor.		// Populate the memref descriptor.
[22 lines elided]	gpu.func @private(%arg0: f32) private(%arg1: memref<4xf32, #gpu.address_space<private>>) {
// NVVM: llvm.extractvalue %[[descr6:.*]]		// NVVM: llvm.extractvalue %[[descr6:.*]]
// NVVM: llvm.getelementptr		// NVVM: llvm.getelementptr
// NVVM: llvm.store		// NVVM: llvm.store

// ROCDL: llvm.extractvalue %[[descr6:.*]]		// ROCDL: llvm.extractvalue %[[descr6:.*]]
// ROCDL: llvm.getelementptr		// ROCDL: llvm.getelementptr
// ROCDL: llvm.store		// ROCDL: llvm.store
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
memref.store %arg0, %arg1[%c0] : memref<4xf32, 5>		memref.store %arg0, %arg1[%c0] : memref<4xf32, #gpu.address_space<private>>

"terminator"() : () -> ()		"terminator"() : () -> ()
}		}
}		}

// -----		// -----

gpu.module @kernel {		gpu.module @kernel {
// Workgroup buffers are allocated as globals.		// Workgroup buffers are allocated as globals.
// NVVM: llvm.mlir.global internal @[[$buffer:.*]]()		// NVVM: llvm.mlir.global internal @[[$buffer:.*]]()
// NVVM-SAME: addr_space = 3		// NVVM-SAME: addr_space = 3
// NVVM-SAME: !llvm.array<4 x f32>		// NVVM-SAME: !llvm.array<4 x f32>

// ROCDL: llvm.mlir.global internal @[[$buffer:.*]]()		// ROCDL: llvm.mlir.global internal @[[$buffer:.*]]()
// ROCDL-SAME: addr_space = 3		// ROCDL-SAME: addr_space = 3
// ROCDL-SAME: !llvm.array<4 x f32>		// ROCDL-SAME: !llvm.array<4 x f32>

// NVVM-LABEL: llvm.func @workgroup		// NVVM-LABEL: llvm.func @workgroup
// NVVM-SAME: {		// NVVM-SAME: {

// ROCDL-LABEL: llvm.func @workgroup		// ROCDL-LABEL: llvm.func @workgroup
// ROCDL-SAME: {		// ROCDL-SAME: {
gpu.func @workgroup(%arg0: f32) workgroup(%arg1: memref<4xf32, 3>) {		gpu.func @workgroup(%arg0: f32) workgroup(%arg1: memref<4xf32, #gpu.address_space<workgroup>>) {
// Get the address of the first element in the global array.		// Get the address of the first element in the global array.
// NVVM: %[[addr:.*]] = llvm.mlir.addressof @[[$buffer]] : !llvm.ptr<array<4 x f32>, 3>		// NVVM: %[[addr:.*]] = llvm.mlir.addressof @[[$buffer]] : !llvm.ptr<array<4 x f32>, 3>
// NVVM: %[[raw:.*]] = llvm.getelementptr %[[addr]][0, 0]		// NVVM: %[[raw:.*]] = llvm.getelementptr %[[addr]][0, 0]
// NVVM-SAME: !llvm.ptr<f32, 3>		// NVVM-SAME: !llvm.ptr<f32, 3>

// ROCDL: %[[addr:.*]] = llvm.mlir.addressof @[[$buffer]] : !llvm.ptr<array<4 x f32>, 3>		// ROCDL: %[[addr:.*]] = llvm.mlir.addressof @[[$buffer]] : !llvm.ptr<array<4 x f32>, 3>
// ROCDL: %[[raw:.*]] = llvm.getelementptr %[[addr]][0, 0]		// ROCDL: %[[raw:.*]] = llvm.getelementptr %[[addr]][0, 0]
// ROCDL-SAME: !llvm.ptr<f32, 3>		// ROCDL-SAME: !llvm.ptr<f32, 3>
[24 lines elided]	gpu.func @workgroup(%arg0: f32) workgroup(%arg1: memref<4xf32, #gpu.address_space<workgroup>>) {
// NVVM: llvm.extractvalue %[[descr6:.*]]		// NVVM: llvm.extractvalue %[[descr6:.*]]
// NVVM: llvm.getelementptr		// NVVM: llvm.getelementptr
// NVVM: llvm.store		// NVVM: llvm.store

// ROCDL: llvm.extractvalue %[[descr6:.*]]		// ROCDL: llvm.extractvalue %[[descr6:.*]]
// ROCDL: llvm.getelementptr		// ROCDL: llvm.getelementptr
// ROCDL: llvm.store		// ROCDL: llvm.store
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
memref.store %arg0, %arg1[%c0] : memref<4xf32, 3>		memref.store %arg0, %arg1[%c0] : memref<4xf32, #gpu.address_space<workgroup>>

"terminator"() : () -> ()		"terminator"() : () -> ()
}		}
}		}

// -----		// -----

gpu.module @kernel {		gpu.module @kernel {
// Check that the total size was computed correctly.		// Check that the total size was computed correctly.
// NVVM: llvm.mlir.global internal @[[$buffer:.*]]()		// NVVM: llvm.mlir.global internal @[[$buffer:.*]]()
// NVVM-SAME: addr_space = 3		// NVVM-SAME: addr_space = 3
// NVVM-SAME: !llvm.array<48 x f32>		// NVVM-SAME: !llvm.array<48 x f32>

// ROCDL: llvm.mlir.global internal @[[$buffer:.*]]()		// ROCDL: llvm.mlir.global internal @[[$buffer:.*]]()
// ROCDL-SAME: addr_space = 3		// ROCDL-SAME: addr_space = 3
// ROCDL-SAME: !llvm.array<48 x f32>		// ROCDL-SAME: !llvm.array<48 x f32>

// NVVM-LABEL: llvm.func @workgroup3d		// NVVM-LABEL: llvm.func @workgroup3d
// ROCDL-LABEL: llvm.func @workgroup3d		// ROCDL-LABEL: llvm.func @workgroup3d
gpu.func @workgroup3d(%arg0: f32) workgroup(%arg1: memref<4x2x6xf32, 3>) {		gpu.func @workgroup3d(%arg0: f32) workgroup(%arg1: memref<4x2x6xf32, #gpu.address_space<workgroup>>) {
// Get the address of the first element in the global array.		// Get the address of the first element in the global array.
// NVVM: %[[addr:.*]] = llvm.mlir.addressof @[[$buffer]] : !llvm.ptr<array<48 x f32>, 3>		// NVVM: %[[addr:.*]] = llvm.mlir.addressof @[[$buffer]] : !llvm.ptr<array<48 x f32>, 3>
// NVVM: %[[raw:.*]] = llvm.getelementptr %[[addr]][0, 0]		// NVVM: %[[raw:.*]] = llvm.getelementptr %[[addr]][0, 0]
// NVVM-SAME: !llvm.ptr<f32, 3>		// NVVM-SAME: !llvm.ptr<f32, 3>

// ROCDL: %[[addr:.*]] = llvm.mlir.addressof @[[$buffer]] : !llvm.ptr<array<48 x f32>, 3>		// ROCDL: %[[addr:.*]] = llvm.mlir.addressof @[[$buffer]] : !llvm.ptr<array<48 x f32>, 3>
// ROCDL: %[[raw:.*]] = llvm.getelementptr %[[addr]][0, 0]		// ROCDL: %[[raw:.*]] = llvm.getelementptr %[[addr]][0, 0]
// ROCDL-SAME: !llvm.ptr<f32, 3>		// ROCDL-SAME: !llvm.ptr<f32, 3>
[31 lines elided]	gpu.func @workgroup3d(%arg0: f32) workgroup(%arg1: memref<4x2x6xf32, #gpu.address_space<workgroup>>) {
// ROCDL: %[[c6:.*]] = llvm.mlir.constant(6 : index) : i64		// ROCDL: %[[c6:.*]] = llvm.mlir.constant(6 : index) : i64
// ROCDL: %[[descr8:.*]] = llvm.insertvalue %[[c6]], %[[descr7]][4, 1]		// ROCDL: %[[descr8:.*]] = llvm.insertvalue %[[c6]], %[[descr7]][4, 1]
// ROCDL: %[[c6:.*]] = llvm.mlir.constant(6 : index) : i64		// ROCDL: %[[c6:.*]] = llvm.mlir.constant(6 : index) : i64
// ROCDL: %[[descr9:.*]] = llvm.insertvalue %[[c6]], %[[descr8]][3, 2]		// ROCDL: %[[descr9:.*]] = llvm.insertvalue %[[c6]], %[[descr8]][3, 2]
// ROCDL: %[[c1:.*]] = llvm.mlir.constant(1 : index) : i64		// ROCDL: %[[c1:.*]] = llvm.mlir.constant(1 : index) : i64
// ROCDL: %[[descr10:.*]] = llvm.insertvalue %[[c1]], %[[descr9]][4, 2]		// ROCDL: %[[descr10:.*]] = llvm.insertvalue %[[c1]], %[[descr9]][4, 2]

%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
memref.store %arg0, %arg1[%c0,%c0,%c0] : memref<4x2x6xf32, 3>		memref.store %arg0, %arg1[%c0,%c0,%c0] : memref<4x2x6xf32, #gpu.address_space<workgroup>>
"terminator"() : () -> ()		"terminator"() : () -> ()
}		}
}		}

// -----		// -----

gpu.module @kernel {		gpu.module @kernel {
// Check that several buffers are defined.		// Check that several buffers are defined.
// NVVM: llvm.mlir.global internal @[[$buffer1:.*]]()		// NVVM: llvm.mlir.global internal @[[$buffer1:.*]]()
// NVVM-SAME: !llvm.array<1 x f32>		// NVVM-SAME: !llvm.array<1 x f32>
// NVVM: llvm.mlir.global internal @[[$buffer2:.*]]()		// NVVM: llvm.mlir.global internal @[[$buffer2:.*]]()
// NVVM-SAME: !llvm.array<2 x f32>		// NVVM-SAME: !llvm.array<2 x f32>

// ROCDL: llvm.mlir.global internal @[[$buffer1:.*]]()		// ROCDL: llvm.mlir.global internal @[[$buffer1:.*]]()
// ROCDL-SAME: !llvm.array<1 x f32>		// ROCDL-SAME: !llvm.array<1 x f32>
// ROCDL: llvm.mlir.global internal @[[$buffer2:.*]]()		// ROCDL: llvm.mlir.global internal @[[$buffer2:.*]]()
// ROCDL-SAME: !llvm.array<2 x f32>		// ROCDL-SAME: !llvm.array<2 x f32>

// NVVM-LABEL: llvm.func @multiple		// NVVM-LABEL: llvm.func @multiple
// ROCDL-LABEL: llvm.func @multiple		// ROCDL-LABEL: llvm.func @multiple
gpu.func @multiple(%arg0: f32)		gpu.func @multiple(%arg0: f32)
workgroup(%arg1: memref<1xf32, 3>, %arg2: memref<2xf32, 3>)		workgroup(%arg1: memref<1xf32, #gpu.address_space<workgroup>>, %arg2: memref<2xf32, #gpu.address_space<workgroup>>)
private(%arg3: memref<3xf32, 5>, %arg4: memref<4xf32, 5>) {		private(%arg3: memref<3xf32, #gpu.address_space<private>>, %arg4: memref<4xf32, #gpu.address_space<private>>) {

// Workgroup buffers.		// Workgroup buffers.
// NVVM: llvm.mlir.addressof @[[$buffer1]]		// NVVM: llvm.mlir.addressof @[[$buffer1]]
// NVVM: llvm.mlir.addressof @[[$buffer2]]		// NVVM: llvm.mlir.addressof @[[$buffer2]]

// ROCDL: llvm.mlir.addressof @[[$buffer1]]		// ROCDL: llvm.mlir.addressof @[[$buffer1]]
// ROCDL: llvm.mlir.addressof @[[$buffer2]]		// ROCDL: llvm.mlir.addressof @[[$buffer2]]

// Private buffers.		// Private buffers.
// NVVM: %[[c3:.*]] = llvm.mlir.constant(3 : i64)		// NVVM: %[[c3:.*]] = llvm.mlir.constant(3 : i64)
// NVVM: llvm.alloca %[[c3]] x f32 : (i64) -> !llvm.ptr<f32>		// NVVM: llvm.alloca %[[c3]] x f32 : (i64) -> !llvm.ptr<f32>
// NVVM: %[[c4:.*]] = llvm.mlir.constant(4 : i64)		// NVVM: %[[c4:.*]] = llvm.mlir.constant(4 : i64)
// NVVM: llvm.alloca %[[c4]] x f32 : (i64) -> !llvm.ptr<f32>		// NVVM: llvm.alloca %[[c4]] x f32 : (i64) -> !llvm.ptr<f32>

// ROCDL: %[[c3:.*]] = llvm.mlir.constant(3 : i64)		// ROCDL: %[[c3:.*]] = llvm.mlir.constant(3 : i64)
// ROCDL: llvm.alloca %[[c3]] x f32 : (i64) -> !llvm.ptr<f32, 5>		// ROCDL: llvm.alloca %[[c3]] x f32 : (i64) -> !llvm.ptr<f32, 5>
// ROCDL: %[[c4:.*]] = llvm.mlir.constant(4 : i64)		// ROCDL: %[[c4:.*]] = llvm.mlir.constant(4 : i64)
// ROCDL: llvm.alloca %[[c4]] x f32 : (i64) -> !llvm.ptr<f32, 5>		// ROCDL: llvm.alloca %[[c4]] x f32 : (i64) -> !llvm.ptr<f32, 5>

%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
memref.store %arg0, %arg1[%c0] : memref<1xf32, 3>		memref.store %arg0, %arg1[%c0] : memref<1xf32, #gpu.address_space<workgroup>>
memref.store %arg0, %arg2[%c0] : memref<2xf32, 3>		memref.store %arg0, %arg2[%c0] : memref<2xf32, #gpu.address_space<workgroup>>
memref.store %arg0, %arg3[%c0] : memref<3xf32, 5>		memref.store %arg0, %arg3[%c0] : memref<3xf32, #gpu.address_space<private>>
memref.store %arg0, %arg4[%c0] : memref<4xf32, 5>		memref.store %arg0, %arg4[%c0] : memref<4xf32, #gpu.address_space<private>>
"terminator"() : () -> ()		"terminator"() : () -> ()
}		}
}		}
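
The CHECK lines in this file imply a different integer mapping per target: workgroup buffers land in address space 3 for both NVVM and ROCDL, while private allocas carry address space 5 on ROCDL and no explicit address space on NVVM. A rough sketch of such target-specific mappings follows, assuming the unannotated NVVM pointer corresponds to address space 0. The function names and the two-value enum are hypothetical and not taken from the patch.

```cpp
// Hypothetical model of the two GPU address spaces exercised by this test.
enum class AddressSpace { Workgroup, Private };

// NVVM: workgroup -> 3; private allocas show no address space in the CHECK
// lines, which is modeled here as the default address space 0 (assumption).
unsigned mapAddressSpaceForNVVM(AddressSpace space) {
  switch (space) {
  case AddressSpace::Workgroup:
    return 3;
  case AddressSpace::Private:
    return 0;
  }
  return 0;
}

// ROCDL: workgroup -> 3, private -> 5, as seen in the !llvm.ptr<f32, 5> checks.
unsigned mapAddressSpaceForROCDL(AddressSpace space) {
  switch (space) {
  case AddressSpace::Workgroup:
    return 3;
  case AddressSpace::Private:
    return 5;
  }
  return 0;
}
```

Keeping the mapping in a per-target function is what allows the test input to use the symbolic `#gpu.address_space<...>` form for both RUN lines.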

mlir/test/Dialect/GPU/all-reduce-max.mlir

// RUN: mlir-opt -test-gpu-rewrite %s | FileCheck %s		// RUN: mlir-opt -test-gpu-rewrite %s | FileCheck %s

// NOTE: Assertions have been autogenerated by utils/generate-test-checks.py		// NOTE: Assertions have been autogenerated by utils/generate-test-checks.py
// CHECK: gpu.module @kernels {		// CHECK: gpu.module @kernels {
gpu.module @kernels {		gpu.module @kernels {

// CHECK-LABEL: gpu.func @kernel(		// CHECK-LABEL: gpu.func @kernel(
// CHECK-SAME: [[VAL_0:%.*]]: f32) workgroup([[VAL_1:%.*]] : memref<32xf32, 3>) kernel {		// CHECK-SAME: [[VAL_0:%.*]]: f32) workgroup([[VAL_1:%.*]] : memref<32xf32, #gpu.address_space<workgroup>>) kernel {
gpu.func @kernel(%arg0 : f32) kernel {		gpu.func @kernel(%arg0 : f32) kernel {
// CHECK-DAG: [[VAL_2:%.*]] = arith.constant 31 : i32		// CHECK-DAG: [[VAL_2:%.*]] = arith.constant 31 : i32
// CHECK-DAG: [[VAL_3:%.*]] = arith.constant 0 : i32		// CHECK-DAG: [[VAL_3:%.*]] = arith.constant 0 : i32
// CHECK-DAG: [[VAL_4:%.*]] = arith.constant 0 : index		// CHECK-DAG: [[VAL_4:%.*]] = arith.constant 0 : index
// CHECK-DAG: [[VAL_5:%.*]] = arith.constant 32 : i32		// CHECK-DAG: [[VAL_5:%.*]] = arith.constant 32 : i32
// CHECK-DAG: [[VAL_6:%.*]] = arith.constant 1 : i32		// CHECK-DAG: [[VAL_6:%.*]] = arith.constant 1 : i32
// CHECK-DAG: [[VAL_7:%.*]] = arith.constant 2 : i32		// CHECK-DAG: [[VAL_7:%.*]] = arith.constant 2 : i32
// CHECK-DAG: [[VAL_8:%.*]] = arith.constant 4 : i32		// CHECK-DAG: [[VAL_8:%.*]] = arith.constant 4 : i32
[87 lines elided]	gpu.func @kernel(%arg0 : f32) kernel {
// CHECK: [[VAL_77:%.*]] = arith.cmpf ugt, [[VAL_74]], [[VAL_75]] : f32		// CHECK: [[VAL_77:%.*]] = arith.cmpf ugt, [[VAL_74]], [[VAL_75]] : f32
// CHECK: [[VAL_78:%.*]] = arith.select [[VAL_77]], [[VAL_74]], [[VAL_75]] : f32		// CHECK: [[VAL_78:%.*]] = arith.select [[VAL_77]], [[VAL_74]], [[VAL_75]] : f32
// CHECK: cf.br ^bb18([[VAL_78]] : f32)		// CHECK: cf.br ^bb18([[VAL_78]] : f32)
// CHECK: ^bb18([[VAL_79:%.*]]: f32):		// CHECK: ^bb18([[VAL_79:%.*]]: f32):
// CHECK: cf.cond_br [[VAL_30]], ^bb19, ^bb20		// CHECK: cf.cond_br [[VAL_30]], ^bb19, ^bb20
// CHECK: ^bb19:		// CHECK: ^bb19:
// CHECK: [[VAL_80:%.*]] = arith.divsi [[VAL_27]], [[VAL_5]] : i32		// CHECK: [[VAL_80:%.*]] = arith.divsi [[VAL_27]], [[VAL_5]] : i32
// CHECK: [[VAL_81:%.*]] = arith.index_cast [[VAL_80]] : i32 to index		// CHECK: [[VAL_81:%.*]] = arith.index_cast [[VAL_80]] : i32 to index
// CHECK: store [[VAL_79]], [[VAL_1]]{{\[}}[[VAL_81]]] : memref<32xf32, 3>		// CHECK: store [[VAL_79]], [[VAL_1]]{{\[}}[[VAL_81]]] : memref<32xf32, #gpu.address_space<workgroup>>
// CHECK: cf.br ^bb21		// CHECK: cf.br ^bb21
// CHECK: ^bb20:		// CHECK: ^bb20:
// CHECK: cf.br ^bb21		// CHECK: cf.br ^bb21
// CHECK: ^bb21:		// CHECK: ^bb21:
// CHECK: gpu.barrier		// CHECK: gpu.barrier
// CHECK: [[VAL_82:%.*]] = arith.addi [[VAL_28]], [[VAL_2]] : i32		// CHECK: [[VAL_82:%.*]] = arith.addi [[VAL_28]], [[VAL_2]] : i32
// CHECK: [[VAL_83:%.*]] = arith.divsi [[VAL_82]], [[VAL_5]] : i32		// CHECK: [[VAL_83:%.*]] = arith.divsi [[VAL_82]], [[VAL_5]] : i32
// CHECK: [[VAL_84:%.*]] = arith.cmpi slt, [[VAL_27]], [[VAL_83]] : i32		// CHECK: [[VAL_84:%.*]] = arith.cmpi slt, [[VAL_27]], [[VAL_83]] : i32
// CHECK: cf.cond_br [[VAL_84]], ^bb22, ^bb41		// CHECK: cf.cond_br [[VAL_84]], ^bb22, ^bb41
// CHECK: ^bb22:		// CHECK: ^bb22:
// CHECK: [[VAL_85:%.*]] = arith.index_cast [[VAL_27]] : i32 to index		// CHECK: [[VAL_85:%.*]] = arith.index_cast [[VAL_27]] : i32 to index
// CHECK: [[VAL_86:%.*]] = memref.load [[VAL_1]]{{\[}}[[VAL_85]]] : memref<32xf32, 3>		// CHECK: [[VAL_86:%.*]] = memref.load [[VAL_1]]{{\[}}[[VAL_85]]] : memref<32xf32, #gpu.address_space<workgroup>>
// CHECK: [[VAL_87:%.*]] = arith.cmpi slt, [[VAL_83]], [[VAL_5]] : i32		// CHECK: [[VAL_87:%.*]] = arith.cmpi slt, [[VAL_83]], [[VAL_5]] : i32
// CHECK: cf.cond_br [[VAL_87]], ^bb23, ^bb39		// CHECK: cf.cond_br [[VAL_87]], ^bb23, ^bb39
// CHECK: ^bb23:		// CHECK: ^bb23:
// CHECK: [[VAL_88:%.*]], [[VAL_89:%.*]] = gpu.shuffle xor [[VAL_86]], [[VAL_6]], [[VAL_83]] : f32		// CHECK: [[VAL_88:%.*]], [[VAL_89:%.*]] = gpu.shuffle xor [[VAL_86]], [[VAL_6]], [[VAL_83]] : f32
// CHECK: cf.cond_br [[VAL_89]], ^bb24, ^bb25		// CHECK: cf.cond_br [[VAL_89]], ^bb24, ^bb25
// CHECK: ^bb24:		// CHECK: ^bb24:
// CHECK: [[VAL_90:%.*]] = arith.cmpf ugt, [[VAL_86]], [[VAL_88]] : f32		// CHECK: [[VAL_90:%.*]] = arith.cmpf ugt, [[VAL_86]], [[VAL_88]] : f32
// CHECK: [[VAL_91:%.*]] = arith.select [[VAL_90]], [[VAL_86]], [[VAL_88]] : f32		// CHECK: [[VAL_91:%.*]] = arith.select [[VAL_90]], [[VAL_86]], [[VAL_88]] : f32
[51 lines elided]	gpu.func @kernel(%arg0 : f32) kernel {
// CHECK: [[VAL_125:%.*]], [[VAL_126:%.*]] = gpu.shuffle xor [[VAL_124]], [[VAL_9]], [[VAL_5]] : f32		// CHECK: [[VAL_125:%.*]], [[VAL_126:%.*]] = gpu.shuffle xor [[VAL_124]], [[VAL_9]], [[VAL_5]] : f32
// CHECK: [[VAL_127:%.*]] = arith.cmpf ugt, [[VAL_124]], [[VAL_125]] : f32		// CHECK: [[VAL_127:%.*]] = arith.cmpf ugt, [[VAL_124]], [[VAL_125]] : f32
// CHECK: [[VAL_128:%.*]] = arith.select [[VAL_127]], [[VAL_124]], [[VAL_125]] : f32		// CHECK: [[VAL_128:%.*]] = arith.select [[VAL_127]], [[VAL_124]], [[VAL_125]] : f32
// CHECK: [[VAL_129:%.*]], [[VAL_130:%.*]] = gpu.shuffle xor [[VAL_128]], [[VAL_10]], [[VAL_5]] : f32		// CHECK: [[VAL_129:%.*]], [[VAL_130:%.*]] = gpu.shuffle xor [[VAL_128]], [[VAL_10]], [[VAL_5]] : f32
// CHECK: [[VAL_131:%.*]] = arith.cmpf ugt, [[VAL_128]], [[VAL_129]] : f32		// CHECK: [[VAL_131:%.*]] = arith.cmpf ugt, [[VAL_128]], [[VAL_129]] : f32
// CHECK: [[VAL_132:%.*]] = arith.select [[VAL_131]], [[VAL_128]], [[VAL_129]] : f32		// CHECK: [[VAL_132:%.*]] = arith.select [[VAL_131]], [[VAL_128]], [[VAL_129]] : f32
// CHECK: cf.br ^bb40([[VAL_132]] : f32)		// CHECK: cf.br ^bb40([[VAL_132]] : f32)
// CHECK: ^bb40([[VAL_133:%.*]]: f32):		// CHECK: ^bb40([[VAL_133:%.*]]: f32):
// CHECK: store [[VAL_133]], [[VAL_1]]{{\[}}[[VAL_4]]] : memref<32xf32, 3>		// CHECK: store [[VAL_133]], [[VAL_1]]{{\[}}[[VAL_4]]] : memref<32xf32, #gpu.address_space<workgroup>>
// CHECK: cf.br ^bb42		// CHECK: cf.br ^bb42
// CHECK: ^bb41:		// CHECK: ^bb41:
// CHECK: cf.br ^bb42		// CHECK: cf.br ^bb42
// CHECK: ^bb42:		// CHECK: ^bb42:
// CHECK: gpu.barrier		// CHECK: gpu.barrier
%sum = gpu.all_reduce max %arg0 uniform {} : (f32) -> (f32)		%sum = gpu.all_reduce max %arg0 uniform {} : (f32) -> (f32)
gpu.return		gpu.return
}		}

}		}

mlir/test/Dialect/GPU/all-reduce.mlir

// RUN: mlir-opt -test-gpu-rewrite %s | FileCheck %s		// RUN: mlir-opt -test-gpu-rewrite %s | FileCheck %s

// NOTE: Assertions have been autogenerated by utils/generate-test-checks.py		// NOTE: Assertions have been autogenerated by utils/generate-test-checks.py
// CHECK: gpu.module @kernels {		// CHECK: gpu.module @kernels {
gpu.module @kernels {		gpu.module @kernels {

// CHECK-LABEL: gpu.func @kernel(		// CHECK-LABEL: gpu.func @kernel(
// CHECK-SAME: [[VAL_0:%.*]]: f32) workgroup([[VAL_1:%.*]] : memref<32xf32, 3>) kernel {		// CHECK-SAME: [[VAL_0:%.*]]: f32) workgroup([[VAL_1:%.*]] : memref<32xf32, #gpu.address_space<workgroup>>) kernel {
gpu.func @kernel(%arg0 : f32) kernel {		gpu.func @kernel(%arg0 : f32) kernel {
// CHECK-DAG: [[VAL_2:%.*]] = arith.constant 31 : i32		// CHECK-DAG: [[VAL_2:%.*]] = arith.constant 31 : i32
// CHECK-DAG: [[VAL_3:%.*]] = arith.constant 0 : i32		// CHECK-DAG: [[VAL_3:%.*]] = arith.constant 0 : i32
// CHECK-DAG: [[VAL_4:%.*]] = arith.constant 0 : index		// CHECK-DAG: [[VAL_4:%.*]] = arith.constant 0 : index
// CHECK-DAG: [[VAL_5:%.*]] = arith.constant 32 : i32		// CHECK-DAG: [[VAL_5:%.*]] = arith.constant 32 : i32
// CHECK-DAG: [[VAL_6:%.*]] = arith.constant 1 : i32		// CHECK-DAG: [[VAL_6:%.*]] = arith.constant 1 : i32
// CHECK-DAG: [[VAL_7:%.*]] = arith.constant 2 : i32		// CHECK-DAG: [[VAL_7:%.*]] = arith.constant 2 : i32
// CHECK-DAG: [[VAL_8:%.*]] = arith.constant 4 : i32		// CHECK-DAG: [[VAL_8:%.*]] = arith.constant 4 : i32
[77 lines elided]	gpu.func @kernel(%arg0 : f32) kernel {
// CHECK: [[VAL_66:%.*]], [[VAL_67:%.*]] = gpu.shuffle xor [[VAL_65]], [[VAL_10]], [[VAL_5]] : f32		// CHECK: [[VAL_66:%.*]], [[VAL_67:%.*]] = gpu.shuffle xor [[VAL_65]], [[VAL_10]], [[VAL_5]] : f32
// CHECK: [[VAL_68:%.*]] = arith.addf [[VAL_65]], [[VAL_66]] : f32		// CHECK: [[VAL_68:%.*]] = arith.addf [[VAL_65]], [[VAL_66]] : f32
// CHECK: cf.br ^bb18([[VAL_68]] : f32)		// CHECK: cf.br ^bb18([[VAL_68]] : f32)
// CHECK: ^bb18([[VAL_69:%.*]]: f32):		// CHECK: ^bb18([[VAL_69:%.*]]: f32):
// CHECK: cf.cond_br [[VAL_30]], ^bb19, ^bb20		// CHECK: cf.cond_br [[VAL_30]], ^bb19, ^bb20
// CHECK: ^bb19:		// CHECK: ^bb19:
// CHECK: [[VAL_70:%.*]] = arith.divsi [[VAL_27]], [[VAL_5]] : i32		// CHECK: [[VAL_70:%.*]] = arith.divsi [[VAL_27]], [[VAL_5]] : i32
// CHECK: [[VAL_71:%.*]] = arith.index_cast [[VAL_70]] : i32 to index		// CHECK: [[VAL_71:%.*]] = arith.index_cast [[VAL_70]] : i32 to index
// CHECK: store [[VAL_69]], [[VAL_1]]{{\[}}[[VAL_71]]] : memref<32xf32, 3>		// CHECK: store [[VAL_69]], [[VAL_1]]{{\[}}[[VAL_71]]] : memref<32xf32, #gpu.address_space<workgroup>>
// CHECK: cf.br ^bb21		// CHECK: cf.br ^bb21
// CHECK: ^bb20:		// CHECK: ^bb20:
// CHECK: cf.br ^bb21		// CHECK: cf.br ^bb21
// CHECK: ^bb21:		// CHECK: ^bb21:
// CHECK: gpu.barrier		// CHECK: gpu.barrier
// CHECK: [[VAL_72:%.*]] = arith.addi [[VAL_28]], [[VAL_2]] : i32		// CHECK: [[VAL_72:%.*]] = arith.addi [[VAL_28]], [[VAL_2]] : i32
// CHECK: [[VAL_73:%.*]] = arith.divsi [[VAL_72]], [[VAL_5]] : i32		// CHECK: [[VAL_73:%.*]] = arith.divsi [[VAL_72]], [[VAL_5]] : i32
// CHECK: [[VAL_74:%.*]] = arith.cmpi slt, [[VAL_27]], [[VAL_73]] : i32		// CHECK: [[VAL_74:%.*]] = arith.cmpi slt, [[VAL_27]], [[VAL_73]] : i32
// CHECK: cf.cond_br [[VAL_74]], ^bb22, ^bb41		// CHECK: cf.cond_br [[VAL_74]], ^bb22, ^bb41
// CHECK: ^bb22:		// CHECK: ^bb22:
// CHECK: [[VAL_75:%.*]] = arith.index_cast [[VAL_27]] : i32 to index		// CHECK: [[VAL_75:%.*]] = arith.index_cast [[VAL_27]] : i32 to index
// CHECK: [[VAL_76:%.*]] = memref.load [[VAL_1]]{{\[}}[[VAL_75]]] : memref<32xf32, 3>		// CHECK: [[VAL_76:%.*]] = memref.load [[VAL_1]]{{\[}}[[VAL_75]]] : memref<32xf32, #gpu.address_space<workgroup>>
// CHECK: [[VAL_77:%.*]] = arith.cmpi slt, [[VAL_73]], [[VAL_5]] : i32		// CHECK: [[VAL_77:%.*]] = arith.cmpi slt, [[VAL_73]], [[VAL_5]] : i32
// CHECK: cf.cond_br [[VAL_77]], ^bb23, ^bb39		// CHECK: cf.cond_br [[VAL_77]], ^bb23, ^bb39
// CHECK: ^bb23:		// CHECK: ^bb23:
// CHECK: [[VAL_78:%.*]], [[VAL_79:%.*]] = gpu.shuffle xor [[VAL_76]], [[VAL_6]], [[VAL_73]] : f32		// CHECK: [[VAL_78:%.*]], [[VAL_79:%.*]] = gpu.shuffle xor [[VAL_76]], [[VAL_6]], [[VAL_73]] : f32
// CHECK: cf.cond_br [[VAL_79]], ^bb24, ^bb25		// CHECK: cf.cond_br [[VAL_79]], ^bb24, ^bb25
// CHECK: ^bb24:		// CHECK: ^bb24:
// CHECK: [[VAL_80:%.*]] = arith.addf [[VAL_76]], [[VAL_78]] : f32		// CHECK: [[VAL_80:%.*]] = arith.addf [[VAL_76]], [[VAL_78]] : f32
// CHECK: cf.br ^bb26([[VAL_80]] : f32)		// CHECK: cf.br ^bb26([[VAL_80]] : f32)
[41 lines elided]	gpu.func @kernel(%arg0 : f32) kernel {
// CHECK: [[VAL_104:%.*]], [[VAL_105:%.*]] = gpu.shuffle xor [[VAL_103]], [[VAL_8]], [[VAL_5]] : f32		// CHECK: [[VAL_104:%.*]], [[VAL_105:%.*]] = gpu.shuffle xor [[VAL_103]], [[VAL_8]], [[VAL_5]] : f32
// CHECK: [[VAL_106:%.*]] = arith.addf [[VAL_103]], [[VAL_104]] : f32		// CHECK: [[VAL_106:%.*]] = arith.addf [[VAL_103]], [[VAL_104]] : f32
// CHECK: [[VAL_107:%.*]], [[VAL_108:%.*]] = gpu.shuffle xor [[VAL_106]], [[VAL_9]], [[VAL_5]] : f32		// CHECK: [[VAL_107:%.*]], [[VAL_108:%.*]] = gpu.shuffle xor [[VAL_106]], [[VAL_9]], [[VAL_5]] : f32
// CHECK: [[VAL_109:%.*]] = arith.addf [[VAL_106]], [[VAL_107]] : f32		// CHECK: [[VAL_109:%.*]] = arith.addf [[VAL_106]], [[VAL_107]] : f32
// CHECK: [[VAL_110:%.*]], [[VAL_111:%.*]] = gpu.shuffle xor [[VAL_109]], [[VAL_10]], [[VAL_5]] : f32		// CHECK: [[VAL_110:%.*]], [[VAL_111:%.*]] = gpu.shuffle xor [[VAL_109]], [[VAL_10]], [[VAL_5]] : f32
// CHECK: [[VAL_112:%.*]] = arith.addf [[VAL_109]], [[VAL_110]] : f32		// CHECK: [[VAL_112:%.*]] = arith.addf [[VAL_109]], [[VAL_110]] : f32
// CHECK: cf.br ^bb40([[VAL_112]] : f32)		// CHECK: cf.br ^bb40([[VAL_112]] : f32)
// CHECK: ^bb40([[VAL_113:%.*]]: f32):		// CHECK: ^bb40([[VAL_113:%.*]]: f32):
// CHECK: store [[VAL_113]], [[VAL_1]]{{\[}}[[VAL_4]]] : memref<32xf32, 3>		// CHECK: store [[VAL_113]], [[VAL_1]]{{\[}}[[VAL_4]]] : memref<32xf32, #gpu.address_space<workgroup>>
// CHECK: cf.br ^bb42		// CHECK: cf.br ^bb42
// CHECK: ^bb41:		// CHECK: ^bb41:
// CHECK: cf.br ^bb42		// CHECK: cf.br ^bb42
// CHECK: ^bb42:		// CHECK: ^bb42:
// CHECK: gpu.barrier		// CHECK: gpu.barrier
%sum = gpu.all_reduce add %arg0 uniform {} : (f32) -> (f32)		%sum = gpu.all_reduce add %arg0 uniform {} : (f32) -> (f32)
gpu.return		gpu.return
}		}

}		}

mlir/test/Dialect/GPU/invalid.mlir

[344 lines elided]	gpu.module @gpu_funcs {
}) {sym_name="kernel_1", function_type=f32} : () -> ()		}) {sym_name="kernel_1", function_type=f32} : () -> ()
}		}
}		}

// -----		// -----

module {		module {
gpu.module @gpu_funcs {		gpu.module @gpu_funcs {
// expected-error @+1 {{expected memref type in attribution}}		// expected-error @below {{'gpu.func' op expected memref type in attribution}}
		ftynse: Please drop trailing spaces here and below.
gpu.func @kernel() workgroup(%0: i32) {		gpu.func @kernel() workgroup(%0: i32) {
gpu.return		gpu.return
}		}
}		}
}		}

// -----		// -----

module {		module {
gpu.module @gpu_funcs {		gpu.module @gpu_funcs {
// expected-error @+1 {{expected memory space 3 in attribution}}		// expected-error @below {{'gpu.func' op expected memory space in attribution}}
		bondhugula: Drop trailing whitespace. `git diff --check HEAD~`
gpu.func @kernel() workgroup(%0: memref<4xf32>) {		gpu.func @kernel() workgroup(%0: memref<4xf32>) {
gpu.return		gpu.return
}		}
}		}
}		}

// -----		// -----

module {		module {
gpu.module @gpu_funcs {		gpu.module @gpu_funcs {
// expected-error @+1 {{expected memory space 5 in attribution}}		// expected-error @below {{'gpu.func' op expected memory space workgroup in attribution}}
		gpu.func @kernel() workgroup(%0: memref<4xf32, #gpu.address_space<private>>) {
		gpu.return
		}
		}
		}

		// -----

		module {
		gpu.module @gpu_funcs {
		// expected-error @below {{'gpu.func' op expected memory space in attribution}}
gpu.func @kernel() private(%0: memref<4xf32>) {		gpu.func @kernel() private(%0: memref<4xf32>) {
gpu.return		gpu.return
}		}
}		}
}		}

// -----		// -----

module {		module {
gpu.module @gpu_funcs {		gpu.module @gpu_funcs {
// expected-error @+1 {{expected memory space 5 in attribution}}		// expected-error @below {{'gpu.func' op expected memory space in attribution}}
gpu.func @kernel() private(%0: memref<4xf32>) {		gpu.func @kernel() private(%0: memref<4xf32>) {
gpu.return		gpu.return
}		}
}		}
}		}

// -----		// -----

module {		module {
gpu.module @gpu_funcs {		gpu.module @gpu_funcs {
		// expected-error @below {{'gpu.func' op expected memory space private in attribution}}
		gpu.func @kernel() private(%0: memref<4xf32, #gpu.address_space<workgroup>>) {
		gpu.return
		}
		}
		}

		// -----

		module {
		gpu.module @gpu_funcs {
// expected-note @+1 {{return type declared here}}		// expected-note @+1 {{return type declared here}}
gpu.func @kernel() {		gpu.func @kernel() {
%0 = arith.constant 0 : index		%0 = arith.constant 0 : index
// expected-error @+1 {{'gpu.return' op expected 0 result operands}}		// expected-error @+1 {{'gpu.return' op expected 0 result operands}}
gpu.return %0 : index		gpu.return %0 : index
}		}
}		}
}		}

mlir/test/Dialect/GPU/promotion.mlir

// RUN: mlir-opt -allow-unregistered-dialect -pass-pipeline='builtin.module(gpu.module(gpu.func(test-gpu-memory-promotion)))' -split-input-file %s | FileCheck %s		// RUN: mlir-opt -allow-unregistered-dialect -pass-pipeline='builtin.module(gpu.module(gpu.func(test-gpu-memory-promotion)))' -split-input-file %s | FileCheck %s

gpu.module @foo {		gpu.module @foo {

// Verify that the attribution was indeed introduced		// Verify that the attribution was indeed introduced
// CHECK-LABEL: @memref3d		// CHECK-LABEL: @memref3d
// CHECK-SAME: (%[[arg:.*]]: memref<5x4xf32>		// CHECK-SAME: (%[[arg:.*]]: memref<5x4xf32>
// CHECK-SAME: workgroup(%[[promoted:.*]] : memref<5x4xf32, 3>)		// CHECK-SAME: workgroup(%[[promoted:.*]] : memref<5x4xf32, #gpu.address_space<workgroup>>)
gpu.func @memref3d(%arg0: memref<5x4xf32> {gpu.test_promote_workgroup}) kernel {		gpu.func @memref3d(%arg0: memref<5x4xf32> {gpu.test_promote_workgroup}) kernel {
// Verify that loop bounds are emitted, the order does not matter.		// Verify that loop bounds are emitted, the order does not matter.
// CHECK-DAG: %[[c1:.*]] = arith.constant 1		// CHECK-DAG: %[[c1:.*]] = arith.constant 1
// CHECK-DAG: %[[c4:.*]] = arith.constant 4		// CHECK-DAG: %[[c4:.*]] = arith.constant 4
// CHECK-DAG: %[[c5:.*]] = arith.constant 5		// CHECK-DAG: %[[c5:.*]] = arith.constant 5
// CHECK-DAG: %[[tx:.*]] = gpu.thread_id x		// CHECK-DAG: %[[tx:.*]] = gpu.thread_id x
// CHECK-DAG: %[[ty:.*]] = gpu.thread_id y		// CHECK-DAG: %[[ty:.*]] = gpu.thread_id y
// CHECK-DAG: %[[tz:.*]] = gpu.thread_id z		// CHECK-DAG: %[[tz:.*]] = gpu.thread_id z
// CHECK-DAG: %[[bdx:.*]] = gpu.block_dim x		// CHECK-DAG: %[[bdx:.*]] = gpu.block_dim x
// CHECK-DAG: %[[bdy:.*]] = gpu.block_dim y		// CHECK-DAG: %[[bdy:.*]] = gpu.block_dim y
// CHECK-DAG: %[[bdz:.*]] = gpu.block_dim z		// CHECK-DAG: %[[bdz:.*]] = gpu.block_dim z

// Verify that loops for the copy are emitted. We only check the number of		// Verify that loops for the copy are emitted. We only check the number of
// loops here since their bounds are produced by mapLoopToProcessorIds,		// loops here since their bounds are produced by mapLoopToProcessorIds,
// tested separately.		// tested separately.
// CHECK: scf.for %[[i0:.*]] =		// CHECK: scf.for %[[i0:.*]] =
// CHECK: scf.for %[[i1:.*]] =		// CHECK: scf.for %[[i1:.*]] =
// CHECK: scf.for %[[i2:.*]] =		// CHECK: scf.for %[[i2:.*]] =

// Verify that the copy is emitted and uses only the last two loops.		// Verify that the copy is emitted and uses only the last two loops.
// CHECK: %[[v:.*]] = memref.load %[[arg]][%[[i1]], %[[i2]]]		// CHECK: %[[v:.*]] = memref.load %[[arg]][%[[i1]], %[[i2]]]
// CHECK: store %[[v]], %[[promoted]][%[[i1]], %[[i2]]]		// CHECK: store %[[v]], %[[promoted]][%[[i1]], %[[i2]]]

// Verify that the use has been rewritten.		// Verify that the use has been rewritten.
// CHECK: "use"(%[[promoted]]) : (memref<5x4xf32, 3>)		// CHECK: "use"(%[[promoted]]) : (memref<5x4xf32, #gpu.address_space<workgroup>>)
"use"(%arg0) : (memref<5x4xf32>) -> ()		"use"(%arg0) : (memref<5x4xf32>) -> ()


// Verify that loops for the copy are emitted. We only check the number of		// Verify that loops for the copy are emitted. We only check the number of
// loops here since their bounds are produced by mapLoopToProcessorIds,		// loops here since their bounds are produced by mapLoopToProcessorIds,
// tested separately.		// tested separately.
// CHECK: scf.for %[[i0:.*]] =		// CHECK: scf.for %[[i0:.*]] =
// CHECK: scf.for %[[i1:.*]] =		// CHECK: scf.for %[[i1:.*]] =
// CHECK: scf.for %[[i2:.*]] =		// CHECK: scf.for %[[i2:.*]] =

// Verify that the copy is emitted and uses only the last two loops.		// Verify that the copy is emitted and uses only the last two loops.
// CHECK: %[[v:.*]] = memref.load %[[promoted]][%[[i1]], %[[i2]]]		// CHECK: %[[v:.*]] = memref.load %[[promoted]][%[[i1]], %[[i2]]]
// CHECK: store %[[v]], %[[arg]][%[[i1]], %[[i2]]]		// CHECK: store %[[v]], %[[arg]][%[[i1]], %[[i2]]]
gpu.return		gpu.return
}		}
}		}

// -----		// -----

gpu.module @foo {		gpu.module @foo {

// Verify that the attribution was indeed introduced		// Verify that the attribution was indeed introduced
// CHECK-LABEL: @memref5d		// CHECK-LABEL: @memref5d
// CHECK-SAME: (%[[arg:.*]]: memref<8x7x6x5x4xf32>		// CHECK-SAME: (%[[arg:.*]]: memref<8x7x6x5x4xf32>
// CHECK-SAME: workgroup(%[[promoted:.*]] : memref<8x7x6x5x4xf32, 3>)		// CHECK-SAME: workgroup(%[[promoted:.*]] : memref<8x7x6x5x4xf32, #gpu.address_space<workgroup>>)
gpu.func @memref5d(%arg0: memref<8x7x6x5x4xf32> {gpu.test_promote_workgroup}) kernel {		gpu.func @memref5d(%arg0: memref<8x7x6x5x4xf32> {gpu.test_promote_workgroup}) kernel {
// Verify that loop bounds are emitted, the order does not matter.		// Verify that loop bounds are emitted, the order does not matter.
// CHECK-DAG: %[[c0:.*]] = arith.constant 0		// CHECK-DAG: %[[c0:.*]] = arith.constant 0
// CHECK-DAG: %[[c1:.*]] = arith.constant 1		// CHECK-DAG: %[[c1:.*]] = arith.constant 1
// CHECK-DAG: %[[c4:.*]] = arith.constant 4		// CHECK-DAG: %[[c4:.*]] = arith.constant 4
// CHECK-DAG: %[[c5:.*]] = arith.constant 5		// CHECK-DAG: %[[c5:.*]] = arith.constant 5
// CHECK-DAG: %[[c6:.*]] = arith.constant 6		// CHECK-DAG: %[[c6:.*]] = arith.constant 6
// CHECK-DAG: %[[c7:.*]] = arith.constant 7		// CHECK-DAG: %[[c7:.*]] = arith.constant 7
// CHECK: scf.for %[[i3:.*]] =		// CHECK: scf.for %[[i3:.*]] =
// CHECK: scf.for %[[i4:.*]] =		// CHECK: scf.for %[[i4:.*]] =

// Verify that the copy is emitted.		// Verify that the copy is emitted.
// CHECK: %[[v:.*]] = memref.load %[[arg]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]		// CHECK: %[[v:.*]] = memref.load %[[arg]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]
// CHECK: store %[[v]], %[[promoted]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]		// CHECK: store %[[v]], %[[promoted]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]

// Verify that the use has been rewritten.		// Verify that the use has been rewritten.
// CHECK: "use"(%[[promoted]]) : (memref<8x7x6x5x4xf32, 3>)		// CHECK: "use"(%[[promoted]]) : (memref<8x7x6x5x4xf32, #gpu.address_space<workgroup>>)
"use"(%arg0) : (memref<8x7x6x5x4xf32>) -> ()		"use"(%arg0) : (memref<8x7x6x5x4xf32>) -> ()

// Verify that loop loops for the copy are emitted.		// Verify that loop loops for the copy are emitted.
// CHECK: scf.for %[[i0:.*]] =		// CHECK: scf.for %[[i0:.*]] =
// CHECK: scf.for %[[i1:.*]] =		// CHECK: scf.for %[[i1:.*]] =
// CHECK: scf.for %[[i2:.*]] =		// CHECK: scf.for %[[i2:.*]] =
// CHECK: scf.for %[[i3:.*]] =		// CHECK: scf.for %[[i3:.*]] =
// CHECK: scf.for %[[i4:.*]] =		// CHECK: scf.for %[[i4:.*]] =

// Verify that the copy is emitted.		// Verify that the copy is emitted.
// CHECK: %[[v:.*]] = memref.load %[[promoted]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]		// CHECK: %[[v:.*]] = memref.load %[[promoted]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]
// CHECK: store %[[v]], %[[arg]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]		// CHECK: store %[[v]], %[[arg]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]
gpu.return		gpu.return
}		}
}		}

// -----		// -----

gpu.module @foo {		gpu.module @foo {

// Check that attribution insertion works fine.		// Check that attribution insertion works fine.
// CHECK-LABEL: @insert		// CHECK-LABEL: @insert
// CHECK-SAME: (%{{.*}}: memref<4xf32>		// CHECK-SAME: (%{{.*}}: memref<4xf32>
// CHECK-SAME: workgroup(%{{.*}}: memref<1x1xf64, 3>		// CHECK-SAME: workgroup(%{{.*}}: memref<1x1xf64, #gpu.address_space<workgroup>>
// CHECK-SAME: %[[wg2:.*]] : memref<4xf32, 3>)		// CHECK-SAME: %[[wg2:.*]] : memref<4xf32, #gpu.address_space<workgroup>>)
// CHECK-SAME: private(%{{.*}}: memref<1x1xi64, 5>)		// CHECK-SAME: private(%{{.*}}: memref<1x1xi64, 5>)
gpu.func @insert(%arg0: memref<4xf32> {gpu.test_promote_workgroup})		gpu.func @insert(%arg0: memref<4xf32> {gpu.test_promote_workgroup})
workgroup(%arg1: memref<1x1xf64, 3>)		workgroup(%arg1: memref<1x1xf64, #gpu.address_space<workgroup>>)
private(%arg2: memref<1x1xi64, 5>)		private(%arg2: memref<1x1xi64, 5>)
kernel {		kernel {
// CHECK: "use"(%[[wg2]])		// CHECK: "use"(%[[wg2]])
"use"(%arg0) : (memref<4xf32>) -> ()		"use"(%arg0) : (memref<4xf32>) -> ()
gpu.return		gpu.return
}		}
}		}

mlir/test/Dialect/NVGPU/invalid.mlir

	// RUN: mlir-opt -split-input-file -verify-diagnostics %s			// RUN: mlir-opt -split-input-file -verify-diagnostics %s

	func.func @ldmatrix_address_space_f16_x4(%arg0: memref<128x128xf16, 2>) -> vector<4x1xf16> {			func.func @ldmatrix_address_space_f16_x4(%arg0: memref<128x128xf16, 2>) -> vector<4x1xf16> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	// expected-error @+1 {{expected nvgpu.ldmatrix srcMemref must have memory space 3}}			// expected-error @below {{expected nvgpu.ldmatrix srcMemref must have a memory space attribute of IntegerAttr(3) or gpu::AddressSpaceAttr(Workgroup)}}
	%a = nvgpu.ldmatrix %arg0[%c0, %c0] {transpose = false, numTiles = 4 : i32} : memref<128x128xf16, 2> -> vector<4x1xf16>			%a = nvgpu.ldmatrix %arg0[%c0, %c0] {transpose = false, numTiles = 4 : i32} : memref<128x128xf16, 2> -> vector<4x1xf16>
	return %a : vector<4x1xf16>			return %a : vector<4x1xf16>
	}			}
	// -----			// -----

	func.func @ldmatrix_num_elements_f16_x4(%arg0: memref<128x128xf16, 3>) -> vector<4x1xf16> {			func.func @ldmatrix_num_elements_f16_x4(%arg0: memref<128x128xf16, 3>) -> vector<4x1xf16> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	// expected-error @+1 {{expected vector register shape[1] = 2}}			// expected-error @+1 {{expected vector register shape[1] = 2}}
	func.func @m16n8k32_int32_datatype(%arg0: vector<4x4xi32>, %arg1: vector<2x4xi8>, %arg2: vector<2x2xi32>) -> vector<2x2xi32> {			func.func @m16n8k32_int32_datatype(%arg0: vector<4x4xi32>, %arg1: vector<2x4xi8>, %arg2: vector<2x2xi32>) -> vector<2x2xi32> {
	// expected-error @+1 {{op failed to verify that matrixA and matrixB have same element type}}			// expected-error @+1 {{op failed to verify that matrixA and matrixB have same element type}}
	%d = nvgpu.mma.sync (%arg0, %arg1, %arg2) {mmaShape = [16, 8, 32]} : (vector<4x4xi32>, vector<2x4xi8>, vector<2x2xi32>) -> vector<2x2xi32>			%d = nvgpu.mma.sync (%arg0, %arg1, %arg2) {mmaShape = [16, 8, 32]} : (vector<4x4xi32>, vector<2x4xi8>, vector<2x2xi32>) -> vector<2x2xi32>
	return %d : vector<2x2xi32>			return %d : vector<2x2xi32>
	}			}
	// -----			// -----

	func.func @async_cp_memory_space(%dst : memref<16xf32>, %src : memref<16xf32>, %i : index) -> () {			func.func @async_cp_memory_space(%dst : memref<16xf32>, %src : memref<16xf32>, %i : index) -> () {
	// expected-error @+1 {{destination memref must have memory space 3}}			// expected-error @below {{destination memref must have a memory space attribute of IntegerAttr(3) or gpu::AddressSpaceAttr(Workgroup)}}
	nvgpu.device_async_copy %src[%i], %dst[%i], 16 : memref<16xf32> to memref<16xf32>			nvgpu.device_async_copy %src[%i], %dst[%i], 16 : memref<16xf32> to memref<16xf32>
	return			return
	}			}

	// -----			// -----

	func.func @async_cp_memref_type(%dst : memref<16xi32, 3>, %src : memref<16xf32>, %i : index) -> () {			func.func @async_cp_memref_type(%dst : memref<16xi32, 3>, %src : memref<16xf32>, %i : index) -> () {
	// expected-error @+1 {{source and destination must have the same element type}}			// expected-error @+1 {{source and destination must have the same element type}}

Diff 486600
