This is an archive of the discontinued LLVM Phabricator instance.

mlir/include/mlir/Conversion/Passes.td
300	LLVM -> ROCDL
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
33	is this class really used?
68	`int1False` doesn't seem meaningful here. According to AMD GCN ISA manual, what you want to represent here should be `glc` and `slc` bits?
91	could you specify the limitations of this patch?
95	could `AffineMap::isMinorIdentity(xferOp.permutation_map())` be used here?
109	from tests provided, it seems `vecWidth` could only be 2 or 4?
112	nit: add "." at the end of the sentence.
121	I guess the only meaningful address spaces for this conversion pattern are 0 (generic) and 1 (global) on AMD GPU. Could additional checks be placed here, and also tests to cover memrefs residing on address space 1?
126	DWORD?
188	is this line necessary?
mlir/test/mlir-rocm-runner/vector-transferops.mlir
12	this end-to-end test only checks loading a `vector<2xf32>`, is it possible to add one which checks `vector<4xf32>` as well?
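
For context on the line 95 question above: a minor identity map keeps only the trailing (minor-most) input dimensions, in order, for example (d0, d1, d2) -> (d1, d2). Below is a minimal standalone C++ sketch of that property. `ToyDimMap` and the free function `isMinorIdentity` are hypothetical stand-ins written for illustration only; they are not `mlir::AffineMap` or its API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an affine map whose results are all plain dimensions:
// results[i] == j means result i is input dimension d_j.
// Hypothetical illustration type, not mlir::AffineMap.
struct ToyDimMap {
  std::size_t numInputs;
  std::vector<std::size_t> results;
};

// Returns true if the map selects the last results.size() input dimensions
// in their original order, e.g. (d0, d1, d2) -> (d1, d2).
bool isMinorIdentity(const ToyDimMap &map) {
  const std::size_t numResults = map.results.size();
  if (numResults > map.numInputs)
    return false;
  const std::size_t firstMinorDim = map.numInputs - numResults;
  for (std::size_t i = 0; i < numResults; ++i)
    if (map.results[i] != firstMinorDim + i)
      return false;
  return true;
}
```

With this model, `ToyDimMap{3, {1, 2}}` qualifies, while `ToyDimMap{3, {2, 1}}` (a transposition) and `ToyDimMap{2, {0}}` (not the minor-most dimension) do not.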

whchung added inline comments.Jun 5 2020, 1:38 PM

mlir/test/Conversion/VectorToROCDL/vector-to-rocdl.mlir
35	could you add tests for `vector.transfer_write`?

jerryyin added a reviewer: whchung.Jun 5 2020, 1:42 PM

jerryyin marked 3 inline comments as done.Jun 5 2020, 1:45 PM

whchung added inline comments.Jun 5 2020, 1:49 PM

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
45	`int32Zero` should also have a meaningful name. Is it `vindex`?
185	is it really necessary to populate std->llvm conversion patterns?

Harbormaster completed remote builds in B59314: Diff 268920.Jun 5 2020, 2:03 PM

Harbormaster completed remote builds in B59315: Diff 268921.

jerryyin marked 3 inline comments as done.Jun 5 2020, 3:39 PM

jerryyin added inline comments.

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
33	Some intricate details here. I only understood this after comparing the ODS operand adaptor with the op itself. Here is an example based on the tablegen-ed .inc files. On one hand, in the ODS operand adaptor, `getODSOperands()` is implemented with `odsOperands.begin() + offset`. This routes to the "actual" operands with regard to the lowering process, for example a value with `llvm.i64` type. The lowered instruction will be `llvm.add(%0, %1):(!llvm.i64, !llvm.i64) -> !llvm.i64`. On the other hand, in the member function provided by the op itself, `getODSOperands()` is implemented with `op.getOperation()->operand_begin() + offset`. This routes to the operands from the MLIR perspective, for example a value with MLIR's `index` type. The lowered instruction will be `llvm.add(%0, %1):(!llvm.i64, index) -> !llvm.i64`, yielding an error complaining that the types do not match. In this specific case, if I don't take the ODS version of the operand, then `getDataPtr` either complains that the types do not match (as in the example above), or complains that the operand is not a valid memref (because it has already been lowered to LLVM).
185	It is for the convenience of unit tests. `createConvertVectorToROCDLPass()` will only be invoked in unit tests, and at the very minimum I need to use `FuncOp` and `ReturnOp`. In the actual test I also used `AddOp`. My impression is that the `ToLLVMConversion` passes all depend on the `StdToLLVM` pass in one way or another to get the (memref) type lowering done correctly.
188	Yes, it has to be there. This allows the MLIR type to be lowered to an LLVM type, which, from the perspective of ROCDL, is a valid form.

jerryyin marked 4 inline comments as done.Jun 5 2020, 3:46 PM

jerryyin added inline comments.

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
45	Will update.
121	Sure.
mlir/test/Conversion/VectorToROCDL/vector-to-rocdl.mlir
35	Will duplicate the above three cases.
mlir/test/mlir-rocm-runner/vector-transferops.mlir
12	The vector width differences are handled mainly in the ROCDL lowering path. But yes, the unit test should not make assumptions based on the existing implementation. Will do.

aartbik added inline comments.Jun 5 2020, 3:53 PM

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
65	This looks like a copy-and-paste of Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp. However, part of the comments were not included. Please fix.
68	At first glance, I thought this was also due to copy and paste, since ConvertToLLVMPattern is really a base class for converting to LLVM IR, but I noticed other GPU passes use it too. Still, it is perhaps good to document somewhere that this is really a lowering phase into a mix of ROCDL::xx and LLVM::yy dialects?

jerryyin marked 8 inline comments as done and an inline comment as not done.Jun 8 2020, 8:55 AM

jerryyin added inline comments.

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
65	Yes that's where I started from. Comments updated.
68	Hmm, you are right that lowering to the GPU dialect eventually lowers to the `LLVM dialect`, because both the `NVVM dialect` (CUDA) and the `ROCDL dialect` (ROCm) are supersets of the `LLVM dialect`. This is implied by the directory structure, where all of them are placed in the same directory as the `LLVM dialect`. I don't have a clear idea of the best place to put this documentation. It is probably not a good idea to put it in one conversion pass, but rather in some common place for all the ROCDL + NVVM + LLVMAVX512 dialects, which doesn't exist yet. Either way, I'd tend to leave it as background knowledge for now.
91	Will do on top of the pass.
109	That's right. A few lines above there is a comment documenting the x1 issues.
121	I dug around and found no example of a global `memref`. If a global memref isn't a thing, then we don't need to consider address space 1. Let me know if I'm wrong.
126	Renamed to `constConfigAttr` to align with rest of naming convention.
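
The line 126 exchange appears to concern the constant used for the buffer configuration. The patch comment in VectorToROCDL.cpp describes a `<4 x i32>` dwordConfig whose first two elements hold the address of dataPtr, whose third element is -1, and whose fourth element is 0x27000. The sketch below packs such a value in plain C++. The function name `makeDwordConfig` is hypothetical, and placing the low half of the address first is an assumption that the comment does not state.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs a 64-bit address into four 32-bit words following the layout in the
// patch comment. ASSUMPTION: the low 32 bits of the address come first.
std::array<std::uint32_t, 4> makeDwordConfig(std::uint64_t dataPtr) {
  std::array<std::uint32_t, 4> config = {{
      static_cast<std::uint32_t>(dataPtr & 0xFFFFFFFFu), // address, low half
      static_cast<std::uint32_t>(dataPtr >> 32),         // address, high half
      0xFFFFFFFFu, // 3rd element: -1 as an unsigned 32-bit value
      0x27000u     // 4th element: constant taken from the patch comment
  }};
  return config;
}
```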

Address review feedback

Harbormaster completed remote builds in B59500: Diff 269260.Jun 8 2020, 10:30 AM

Could you please elaborate on your plans for this revision going forward?
@ThomasRaoux, @mravishankar and I are looking into directly using the vector dialect to target GPUs, and this looks like it is going in a similar direction.

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
33	The adaptor gives you the illusion of having a TransferReadOp with all its accessors on the `operands`, which as you saw have already been lowered. This hides the underlying implementation details of which of the `operands` is what, in the cases where the op would not have been valid with the lowered operands (i.e. it does not typecheck).
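
A rough standalone illustration of the adaptor idea discussed above, as a sketch only: the op and the adaptor expose the same named accessors, but the op reads its original operands while the adaptor reads the already lowered operand list handed to the conversion pattern. All `Toy*` types and the type strings are hypothetical and are not MLIR classes or generated code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model; none of these types are MLIR classes.
struct ToyValue {
  std::string type; // e.g. "index" before lowering, "!llvm.i64" after
};

// Operand layout shared by the op and its adaptor: operand 0 is the memref,
// the remaining operands are the indices. Both accessors assume that at
// least the memref operand is present.
struct ToyTransferReadOp {
  std::vector<ToyValue> operands; // original, not yet lowered operands
  const ToyValue &memref() const { return operands[0]; }
  std::vector<ToyValue> indices() const {
    return std::vector<ToyValue>(operands.begin() + 1, operands.end());
  }
};

// Same named accessors, but reading from the list of already lowered
// operands. The adaptor only borrows the list, so the list must outlive it.
class ToyTransferReadAdaptor {
public:
  explicit ToyTransferReadAdaptor(const std::vector<ToyValue> &lowered)
      : lowered(lowered) {}
  const ToyValue &memref() const { return lowered[0]; }
  std::vector<ToyValue> indices() const {
    return std::vector<ToyValue>(lowered.begin() + 1, lowered.end());
  }

private:
  const std::vector<ToyValue> &lowered;
};
```

In this model, asking the op for `indices()` yields values still typed `index`, whereas asking the adaptor yields the lowered values, which is the distinction the two comments on line 33 describe.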

whchung added inline comments.Jun 8 2020, 11:45 AM

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
121	At the very least, a check to make sure the address space of the memref is 0 or 1 should be made, because vector transfers for memrefs in address space 3 or 5 really shouldn't go through this conversion process yet.
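
The guards discussed in this thread (vector rank, non-empty indices, masked transfers only, vector width 2 or 4, address space 0 or 1) can be summarized in a small standalone sketch. `ToyTransferInfo` and the function names are hypothetical and only model the conditions mentioned in the review and the diff; this is not the code that landed.

```cpp
#include <cassert>

// Hypothetical summary of the properties the pattern inspects; the field
// names are illustrative and do not come from the patch.
struct ToyTransferInfo {
  unsigned vectorRank;
  unsigned numIndices;
  unsigned vectorWidth;
  unsigned memorySpace;
  bool masked;
};

// Address spaces 0 (generic) and 1 (global) are the ones named in the review.
constexpr bool isSupportedAddressSpace(unsigned memorySpace) {
  return memorySpace == 0 || memorySpace == 1;
}

// The x1 width is excluded; the tests in the patch cover widths 2 and 4.
constexpr bool isSupportedVectorWidth(unsigned width) {
  return width == 2 || width == 4;
}

// True only when every precondition holds; otherwise the op is left to the
// vector->llvm conversion.
constexpr bool shouldLowerToMubuf(const ToyTransferInfo &info) {
  return info.vectorRank <= 1 && info.numIndices != 0 && info.masked &&
         isSupportedVectorWidth(info.vectorWidth) &&
         isSupportedAddressSpace(info.memorySpace);
}
```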

In D81276#2080453, @nicolasvasilache wrote:

Could you please elaborate on your plans for this revision going forward?
@ThomasRaoux, @mravishankar and I are looking into directly using the vector dialect to target GPUs, and this looks like it is going in a similar direction.

Thanks for helping explain the adaptor change.

After this CL we might want to generalize the vector transfer op lowering to >1-D cases. There seems to be some work in the scf->llvm conversion pass that does just that. I will have to follow the development a little more closely, but to my understanding that is not a priority for now. This CL is more of a hot fix than the start of new features. The background for the CL is that we realized masked load/store (in the vector->llvm pass) results in very poor IR and ISA on the AMD GPU target. Hence this pass to fix it.

@whchung Please feel free to add anything further. Thanks.

Adding addresspace check

jerryyin marked 13 inline comments as done.Jun 8 2020, 1:17 PM

jerryyin marked an inline comment as done.

whchung added inline comments.Jun 8 2020, 1:37 PM

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
125	nit: move the check right after `memRefType` is created, and remove curly braces. Curly braces in some checks above could also be removed.

Adjust instruction sequences, remove redundant curly braces

jerryyin marked an inline comment as done.Jun 8 2020, 1:49 PM

LGTM. @aartbik / @nicolasvasilache would you mind giving this patch another round of review?

This revision is now accepted and ready to land.Jun 8 2020, 2:12 PM

Harbormaster completed remote builds in B59534: Diff 269332.Jun 8 2020, 2:25 PM

Harbormaster failed remote builds in B59544: Diff 269345!Jun 8 2020, 2:59 PM

mehdi_amini added inline comments.Jun 8 2020, 8:23 PM

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
116	Please fix the clang-tidy warning here.

Fix clang-tidy warning by removing unnecessary local variable

jerryyin marked 2 inline comments as done.Jun 9 2020, 6:55 AM

jerryyin added inline comments.

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
116	Thanks for your reminder. Done.

jerryyin marked an inline comment as done.Jun 9 2020, 6:56 AM

Harbormaster failed remote builds in B59637: Diff 269532!Jun 9 2020, 7:07 AM

Rebase and remove blank line at EOF

Harbormaster completed remote builds in B59641: Diff 269546.Jun 9 2020, 8:46 AM

In D81276#2080952, @whchung wrote:

LGTM. @aartbik / @nicolasvasilache would you mind giving this patch another round of review?

I have nothing further. I would wait for Nicolas' feedback, since he was interested in the direction that would yield the least overlap in efforts.

rriddle added inline comments.Jun 9 2020, 3:17 PM

mlir/include/mlir/Conversion/VectorToROCDL/VectorToROCDL.h
12	nit: I don't think this include is necessary.
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
33	nit: Static functions go in the top-level namespace, only use anonymous for things like classes. https://llvm.org/docs/CodingStandards.html#anonymous-namespaces
40	This and the above seem unnecessary. Can you just use either OpTy::OperandAdaptor or OperandAdaptor<OpTy>?
43	Missing static on each of these functions.
63	nit: Remove the above newline.
66	nit: ///

jerryyin marked 8 inline comments as done.Jun 10 2020, 7:32 AM

jerryyin added inline comments.

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp
33	Thanks for pointing out the coding standards page. It provides good justification for the use of namespaces and the `static` keyword.
40	Thanks for pointing this out. I just realized that this is the recommended way to do it in the MLIR documentation.

Resolve review feedback

Harbormaster failed remote builds in B59802: Diff 269843!Jun 10 2020, 8:12 AM

I'm pretty sure the build failure has nothing to do with this CL. I will wait till tomorrow to push it upstream if there's no further feedback to it. Thanks.

Harbormaster completed remote builds in B59802: Diff 269843.Jun 10 2020, 3:02 PM

Pushed to trunk revision: 2abad3433f9f48cb0a103726a9af1ad79603d23d

Revision Contents

Path	Size
mlir/include/mlir/Conversion/Passes.td	10 lines
mlir/include/mlir/Conversion/VectorToROCDL/VectorToROCDL.h	28 lines
mlir/include/mlir/InitAllPasses.h	1 line
mlir/lib/Conversion/CMakeLists.txt	1 line
mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt	1 line
mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp	2 lines
mlir/lib/Conversion/VectorToROCDL/CMakeLists.txt	19 lines
mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp	199 lines
mlir/test/Conversion/VectorToROCDL/vector-to-rocdl.mlir	38 lines
mlir/test/mlir-rocm-runner/vector-transferops.mlir	51 lines

Diff 268921

mlir/include/mlir/Conversion/Passes.td

	//===----------------------------------------------------------------------===//

	def ConvertVectorToLLVM : Pass<"convert-vector-to-llvm", "ModuleOp"> {
	let summary = "Lower the operations from the vector dialect into the LLVM "
	"dialect";
	let constructor = "mlir::createConvertVectorToLLVMPass()";
	}

				//===----------------------------------------------------------------------===//
				// VectorToROCDL
				//===----------------------------------------------------------------------===//

				def ConvertVectorToROCDL : Pass<"convert-vector-to-rocdl", "ModuleOp"> {
				let summary = "Lower the operations from the vector dialect into the ROCDL "
				"dialect";
				let constructor = "mlir::createConvertVectorToROCDLPass()";
				}

	#endif // MLIR_CONVERSION_PASSES

mlir/include/mlir/Conversion/VectorToROCDL/VectorToROCDL.h

This file was added.

				//===- VectorToROCDL.h - Convert Vector to ROCDL dialect ---- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				#ifndef MLIR_CONVERSION_VECTORTOROCDL_VECTORTOROCDL_H_
				#define MLIR_CONVERSION_VECTORTOROCDL_VECTORTOROCDL_H_

				#include "mlir/Transforms/DialectConversion.h"

				namespace mlir {
				class LLVMTypeConverter;
				class OwningRewritePatternList;
				class ModuleOp;
				template <typename OpT>
				class OperationPass;

				/// Collect a set of patterns to convert from the GPU dialect to ROCDL.
				void populateVectorToROCDLConversionPatterns(
				LLVMTypeConverter &converter, OwningRewritePatternList &patterns);

				/// Create a pass to convert vector operations to the ROCDL dialect.
				std::unique_ptr<OperationPass<ModuleOp>> createConvertVectorToROCDLPass();
				jerryyin (inline comment): Will update comments

				} // namespace mlir
				#endif // MLIR_CONVERSION_VECTORTOROCDL_VECTORTOROCDL_H_

mlir/include/mlir/InitAllPasses.h

	#include "mlir/Conversion/LinalgToSPIRV/LinalgToSPIRVPass.h"
	#include "mlir/Conversion/LinalgToStandard/LinalgToStandard.h"
	#include "mlir/Conversion/SCFToGPU/SCFToGPUPass.h"
	#include "mlir/Conversion/SCFToStandard/SCFToStandard.h"
	#include "mlir/Conversion/ShapeToStandard/ShapeToStandard.h"
	#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h"
	#include "mlir/Conversion/StandardToSPIRV/ConvertStandardToSPIRVPass.h"
	#include "mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h"
				#include "mlir/Conversion/VectorToROCDL/VectorToROCDL.h"
	#include "mlir/Conversion/VectorToSCF/VectorToSCF.h"
	#include "mlir/Dialect/Affine/Passes.h"
	#include "mlir/Dialect/GPU/Passes.h"
	#include "mlir/Dialect/LLVMIR/Transforms/LegalizeForExport.h"
	#include "mlir/Dialect/Linalg/Passes.h"
	#include "mlir/Dialect/Quant/Passes.h"
	#include "mlir/Dialect/SCF/Passes.h"
	#include "mlir/Dialect/SPIRV/Passes.h"

mlir/lib/Conversion/CMakeLists.txt

	add_subdirectory(AffineToStandard)
	add_subdirectory(AVX512ToLLVM)
	add_subdirectory(GPUCommon)
	add_subdirectory(GPUToNVVM)
	add_subdirectory(GPUToROCDL)
	add_subdirectory(GPUToSPIRV)
	add_subdirectory(GPUToVulkan)
	add_subdirectory(LinalgToLLVM)
	add_subdirectory(LinalgToSPIRV)
	add_subdirectory(LinalgToStandard)
	add_subdirectory(SCFToGPU)
	add_subdirectory(SCFToStandard)
	add_subdirectory(ShapeToStandard)
	add_subdirectory(StandardToLLVM)
	add_subdirectory(StandardToSPIRV)
				add_subdirectory(VectorToROCDL)
	add_subdirectory(VectorToLLVM)
	add_subdirectory(VectorToSCF)

mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt

add_mlir_conversion_library(MLIRGPUtoROCDLTransforms
MLIRGPUToROCDLIncGen

LINK_LIBS PUBLIC
MLIRGPU
MLIRLLVMIR
MLIRROCDLIR
MLIRPass
MLIRStandardToLLVM
		MLIRVectorToROCDL
)

mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp

// GPU operations.
//
//===----------------------------------------------------------------------===//

#include "mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h"

#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h"
#include "mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h"
		#include "mlir/Conversion/VectorToROCDL/VectorToROCDL.h"
#include "mlir/Dialect/GPU/GPUDialect.h"
#include "mlir/Dialect/GPU/Passes.h"
#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
#include "mlir/Dialect/Vector/VectorOps.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/DialectConversion.h"
#include "llvm/Support/FormatVariadic.h"

void runOnOperation() override {

OwningRewritePatternList patterns;

populateGpuRewritePatterns(m.getContext(), patterns);
applyPatternsAndFoldGreedily(m, patterns);
patterns.clear();

populateVectorToLLVMConversionPatterns(converter, patterns);
		populateVectorToROCDLConversionPatterns(converter, patterns);
populateStdToLLVMConversionPatterns(converter, patterns);
populateGpuToROCDLConversionPatterns(converter, patterns);
LLVMConversionTarget target(getContext());
target.addIllegalDialect<gpu::GPUDialect>();
target.addIllegalOp<LLVM::CosOp, LLVM::ExpOp, LLVM::FAbsOp, LLVM::FCeilOp,
LLVM::LogOp, LLVM::Log10Op, LLVM::Log2Op>();
target.addIllegalOp<FuncOp>();
target.addLegalDialect<ROCDL::ROCDLDialect>();

mlir/lib/Conversion/VectorToROCDL/CMakeLists.txt

This file was added.

				add_mlir_conversion_library(MLIRVectorToROCDL
				VectorToROCDL.cpp

				ADDITIONAL_HEADER_DIRS
				${MLIR_MAIN_INCLUDE_DIR}/mlir/Conversion/VectorToROCDL

				DEPENDS
				MLIRConversionPassIncGen
				intrinsics_gen

				LINK_COMPONENTS
				Core

				LINK_LIBS PUBLIC
				MLIRROCDLIR
				MLIRStandardToLLVM
				MLIRVector
				MLIRTransforms
				)

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp

This file was added.

				//===- VectorToROCDL.cpp - Vector to ROCDL lowering passes ------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements a pass to generate ROCDLIR operations for higher-level
				// Vector operations.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Conversion/VectorToROCDL/VectorToROCDL.h"

				#include "../PassDetail.h"
				#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVM.h"
				#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h"
				#include "mlir/Dialect/GPU/GPUDialect.h"
				#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
				#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
				#include "mlir/Dialect/StandardOps/IR/Ops.h"
				#include "mlir/Dialect/Vector/VectorOps.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Transforms/DialectConversion.h"

				using namespace mlir;
				using namespace mlir::vector;

				namespace {

				static TransferReadOpOperandAdaptor
				getTransferOpAdapter(TransferReadOp xferOp, ArrayRef<Value> operands) {
				return TransferReadOpOperandAdaptor(operands);
				}

				static TransferWriteOpOperandAdaptor
				getTransferOpAdapter(TransferWriteOp xferOp, ArrayRef<Value> operands) {
				return TransferWriteOpOperandAdaptor(operands);
				}

				LogicalResult replaceTransferOpWithMubuf(
				ConversionPatternRewriter &rewriter, ArrayRef<Value> operands,
				LLVMTypeConverter &typeConverter, Location loc, TransferReadOp xferOp,
				LLVM::LLVMType &vecTy, Value &dwordConfig, Value &int32Zero,
				Value &offsetSizeInBytes, Value &glc, Value &slc) {
				rewriter.replaceOpWithNewOp<ROCDL::MubufLoadOp>(
				xferOp, vecTy, dwordConfig, int32Zero, offsetSizeInBytes, glc, slc);
				return success();
				}

				LogicalResult replaceTransferOpWithMubuf(
				ConversionPatternRewriter &rewriter, ArrayRef<Value> operands,
				LLVMTypeConverter &typeConverter, Location loc, TransferWriteOp xferOp,
				LLVM::LLVMType &vecTy, Value &dwordConfig, Value &int32Zero,
				Value &offsetSizeInBytes, Value &glc, Value &slc) {
				auto adaptor = TransferWriteOpOperandAdaptor(operands);
				rewriter.replaceOpWithNewOp<ROCDL::MubufStoreOp>(xferOp, adaptor.vector(),
				dwordConfig, int32Zero,
				offsetSizeInBytes, glc, slc);

				return success();
				}

				// Conversion pattern that converts a 1-D vector transfer read/write op in a
				// sequence of:
				template <typename ConcreteOp>
				class VectorTransferConversion : public ConvertToLLVMPattern {
				public:
				explicit VectorTransferConversion(MLIRContext *context,
				LLVMTypeConverter &typeConv)
				: ConvertToLLVMPattern(ConcreteOp::getOperationName(), context,
				typeConv) {}

				LogicalResult
				matchAndRewrite(Operation *op, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const override {
				auto xferOp = cast<ConcreteOp>(op);
				auto adaptor = getTransferOpAdapter(xferOp, operands);

				if (xferOp.getVectorType().getRank() > 1 ||
				llvm::size(xferOp.indices()) == 0)
				return failure();

				if (xferOp.permutation_map() !=
				AffineMap::getMinorIdentityMap(xferOp.permutation_map().getNumInputs(),
				xferOp.getVectorType().getRank(),
				op->getContext()))
				return failure();

				// Have it handled in vector->llvm conversion pass.
				if (!xferOp.isMaskedDim(0))
				return failure();

				auto toLLVMTy = [&](Type t) { return typeConverter.convertType(t); };
				LLVM::LLVMType vecTy =
				toLLVMTy(xferOp.getVectorType()).template cast<LLVM::LLVMType>();
				unsigned vecWidth = vecTy.getVectorNumElements();
				Location loc = op->getLoc();

				// The backend's result vector scalarization has trouble scalarizing a
				// <1 x ty> result, so exclude the x1 width from the lowering.
				if (vecWidth != 2 && vecWidth != 4)
				return failure();

				// Obtain dataPtr and elementType from the memref.
				MemRefType memRefType = xferOp.getMemRefType();
				auto elementType = memRefType.getElementType();
				auto convertedPtrType = typeConverter.convertType(elementType)
				.template cast<LLVM::LLVMType>()
				.getPointerTo(0);
				// Note that the dataPtr starts at the offset address specified by
				// indices, so no need to calculate offset size in bytes again in
				// the MUBUF instruction.
				Value dataPtr = getDataPtr(loc, memRefType, adaptor.memref(),
				adaptor.indices(), rewriter, getModule());
				mehdi_amini: Please fix the clang-tidy warning here.
				jerryyin (author): Thanks for your reminder. Done.

				if (memRefType.getMemorySpace() != 0)
				dataPtr = rewriter.create<LLVM::AddrSpaceCastOp>(loc, convertedPtrType,
				dataPtr);

				whchung: I guess the only meaningful address spaces for this conversion pattern are 0 (generic) and 1 (global) on AMD GPU. Could additional checks be placed here, and also tests to cover memrefs residing on address space 1?
				jerryyin (author): Sure.
				jerryyin (author): I dug around and found no example of a global `memref`. If global memrefs aren't a thing, then we don't need to consider address space 1. Let me know if I'm wrong.
				whchung: At the very least, a check to make sure the address space of memrefs is 0 or 1 should be made, because vector transfers for memrefs on address space 3 or 5 really shouldn't go through this conversion process yet.
				// 1. Create and fill a <4 x i32> dwordConfig with:
				// 1st two elements holding the address of dataPtr.
				// 3rd element: -1.
				// 4th element: 0x27000.
				whchung: nit: move the check right after `memRefType` is created, and remove curly braces. Curly braces in some checks above could also be removed.
				SmallVector<int32_t, 4> indices{0, 0, -1, 0x27000};
				whchung: DWORD?
				jerryyin (author): Renamed to `constConfigAttr` to align with rest of naming convention.
				Type i32Ty = rewriter.getIntegerType(32);
				VectorType i32Vecx4 = VectorType::get(4, i32Ty);
				Value constConfig = rewriter.create<LLVM::ConstantOp>(
				loc, toLLVMTy(i32Vecx4),
				DenseElementsAttr::get(i32Vecx4, ArrayRef<int32_t>(indices)));

				// Treat the first two elements of <4 x i32> as an i64, and save the
				// dataPtr to it.
				Type i64Ty = rewriter.getIntegerType(64);
				Value i64x2Ty = rewriter.create<LLVM::BitcastOp>(
				loc,
				LLVM::LLVMType::getVectorTy(
				toLLVMTy(i64Ty).template cast<LLVM::LLVMType>(), 2),
				constConfig);
				Value dataPtrAsI64 = rewriter.create<LLVM::PtrToIntOp>(
				loc, toLLVMTy(i64Ty).template cast<LLVM::LLVMType>(), dataPtr);
				Value zero = createIndexConstant(rewriter, loc, 0);
				Value dwordConfig = rewriter.create<LLVM::InsertElementOp>(
				loc,
				LLVM::LLVMType::getVectorTy(
				toLLVMTy(i64Ty).template cast<LLVM::LLVMType>(), 2),
				i64x2Ty, dataPtrAsI64, zero);
				dwordConfig =
				rewriter.create<LLVM::BitcastOp>(loc, toLLVMTy(i32Vecx4), dwordConfig);

				// 2. Rewrite op as a buffer read or write.
				Value int1False = rewriter.create<LLVM::ConstantOp>(
				loc, toLLVMTy(rewriter.getIntegerType(1)),
				rewriter.getIntegerAttr(rewriter.getIntegerType(1), 0));
				Value int32Zero = rewriter.create<LLVM::ConstantOp>(
				loc, toLLVMTy(i32Ty),
				rewriter.getIntegerAttr(rewriter.getIntegerType(32), 0));
				return replaceTransferOpWithMubuf(rewriter, operands, typeConverter, loc,
				xferOp, vecTy, dwordConfig, int32Zero,
				int32Zero, int1False, int1False);
				}
				};
				} // end anonymous namespace

				void mlir::populateVectorToROCDLConversionPatterns(
				LLVMTypeConverter &converter, OwningRewritePatternList &patterns) {
				MLIRContext *ctx = converter.getDialect()->getContext();
				patterns.insert<VectorTransferConversion<TransferReadOp>,
				VectorTransferConversion<TransferWriteOp>>(ctx, converter);
				}

				namespace {
				struct LowerVectorToROCDLPass
				: public ConvertVectorToROCDLBase<LowerVectorToROCDLPass> {
				void runOnOperation() override;
				};
				} // namespace

				void LowerVectorToROCDLPass::runOnOperation() {
				LLVMTypeConverter converter(&getContext());
				OwningRewritePatternList patterns;

				populateVectorToROCDLConversionPatterns(converter, patterns);
				populateStdToLLVMConversionPatterns(converter, patterns);
				whchung: is it really necessary to populate std->llvm conversion patterns?
				jerryyin (author): It is for the convenience of unit tests. `createConvertVectorToROCDLPass()` will only be invoked in unit tests, and at the very minimum I need to use `FuncOp` and `ReturnOp`. In the actual test I also used `AddOp`. My impression is that the `ToLLVMConversion` passes all depend on the `StdToLLVM` pass in one way or another to get the (memref) type lowering done correctly.

				LLVMConversionTarget target(getContext());
				target.addLegalDialect<ROCDL::ROCDLDialect>();
				whchung: is this line necessary?
				jerryyin (author): Yes, it has to be there. This allows the MLIR type to be lowered to an LLVM type, which, from the perspective of ROCDL, is not an invalid form.

				if (failed(applyPartialConversion(getOperation(), target, patterns,
				&converter))) {
				signalPassFailure();
				}
				}

				std::unique_ptr<OperationPass<ModuleOp>>
				mlir::createConvertVectorToROCDLPass() {
				return std::make_unique<LowerVectorToROCDLPass>();
				}
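
The descriptor setup and the guards discussed in the review above can be illustrated outside of MLIR. The following is a minimal standalone sketch, not the patch's code: the helper names `isSupportedVectorWidth`, `isSupportedAddressSpace`, and `makeDwordConfig` are hypothetical, and the sketch assumes the little-endian lane order in which the low half of the 64-bit address lands in the first dword. The constants `-1` and `0x27000` are taken from the patch as-is.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: mirrors the patch's width guard, which only
// lowers x2 and x4 vectors.
constexpr bool isSupportedVectorWidth(unsigned width) {
  return width == 2 || width == 4;
}

// Hypothetical helper: the guard requested in review, accepting only
// address space 0 (generic) and 1 (global).
constexpr bool isSupportedAddressSpace(unsigned space) {
  return space == 0 || space == 1;
}

// The decimal value checked in vector-to-rocdl.mlir is the same constant.
static_assert(0x27000 == 159744, "hex and decimal forms must agree");

// Hypothetical helper: models bitcasting <4 x i32> to <2 x i64>, inserting
// the address at element 0, and bitcasting back to <4 x i32>.
// Assumes little-endian lane order (low 32 bits first).
std::array<int32_t, 4> makeDwordConfig(uint64_t address) {
  std::array<int32_t, 4> config{0, 0, -1, 0x27000};
  config[0] = static_cast<int32_t>(static_cast<uint32_t>(address));
  config[1] = static_cast<int32_t>(static_cast<uint32_t>(address >> 32));
  return config;
}
```

Calling `makeDwordConfig` with a 64-bit address leaves the third and fourth elements at their constant values and overwrites only the first two, which is the shape the `transfer_read_dwordConfig` test looks for.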

mlir/test/Conversion/VectorToROCDL/vector-to-rocdl.mlir

This file was added.

				// RUN: mlir-opt %s -convert-vector-to-rocdl | FileCheck %s

				gpu.module @test_module{
				func @transfer_readx2(%A : memref<?xf32>, %base: index) -> vector<2xf32> {
				%f0 = constant 0.0: f32
				%f = vector.transfer_read %A[%base], %f0
				{permutation_map = affine_map<(d0) -> (d0)>} :
				memref<?xf32>, vector<2xf32>
				return %f: vector<2xf32>
				}
				// CHECK-LABEL: @transfer_readx2
				// CHECK: rocdl.buffer.load {{.*}} !llvm<"<2 x float>">

				jerryyin (author): Will update to x2 test case.
				func @transfer_readx4(%A : memref<?xf32>, %base: index) -> vector<4xf32> {
				%f0 = constant 0.0: f32
				%f = vector.transfer_read %A[%base], %f0
				{permutation_map = affine_map<(d0) -> (d0)>} :
				memref<?xf32>, vector<4xf32>
				return %f: vector<4xf32>
				}
				// CHECK-LABEL: @transfer_readx4
				// CHECK: rocdl.buffer.load {{.*}} !llvm<"<4 x float>">

				func @transfer_read_dwordConfig(%A : memref<?xf32>, %base: index) -> vector<4xf32> {
				%f0 = constant 0.0: f32
				%f = vector.transfer_read %A[%base], %f0
				{permutation_map = affine_map<(d0) -> (d0)>} :
				memref<?xf32>, vector<4xf32>
				return %f: vector<4xf32>
				}
				// CHECK-LABEL: @transfer_read_dwordConfig
				// CHECK: %[[gep:.*]] = llvm.getelementptr {{.*}}
				// CHECK: [0, 0, -1, 159744]
				// CHECK: %[[i64:.*]] = llvm.ptrtoint %[[gep]]
				// CHECK: llvm.insertelement %[[i64]]
				whchung: could you add tests for `vector.transfer_write`?
				jerryyin (author): Will duplicate the above three cases.

				}

mlir/test/mlir-rocm-runner/vector-transferops.mlir

This file was added.

				// RUN: mlir-rocm-runner %s --shared-libs=%rocm_wrapper_library_dir/librocm-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s

				func @vectransfer(%arg0 : memref<?xf32>, %arg1 : memref<?xf32>) {
				%cst = constant 1 : index
				gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %cst, %grid_y = %cst, %grid_z = %cst)
				threads(%tx, %ty, %tz) in (%block_x = %cst, %block_y = %cst, %block_z = %cst) {
				%f0 = constant 0.0: f32
				%base = constant 0 : index
				%f = vector.transfer_read %arg0[%base], %f0
				{permutation_map = affine_map<(d0) -> (d0)>} :
				memref<?xf32>, vector<2xf32>

				whchung: this end-to-end test only checks loading a `vector<2xf32>`, is it possible to add one which checks `vector<4xf32>` as well?
				jerryyin (author): The vector width differences are handled mainly through the ROCDL lowering path. But yes, the unit test should not make assumptions based on the existing implementation. Will do.
				%c = addf %f, %f : vector<2xf32>

				%base1 = constant 1 : index
				vector.transfer_write %c, %arg1[%base1]
				{permutation_map = affine_map<(d0) -> (d0)>} :
				vector<2xf32>, memref<?xf32>

				gpu.terminator
				}
				return
				}

				// CHECK: [1.23, 2.46, 2.46, 1.23]
				func @main() {
				%cf1 = constant 1.0 : f32

				%arg0 = alloc() : memref<4xf32>
				%arg1 = alloc() : memref<4xf32>

				%22 = memref_cast %arg0 : memref<4xf32> to memref<?xf32>
				%23 = memref_cast %arg1 : memref<4xf32> to memref<?xf32>

				%cast0 = memref_cast %22 : memref<?xf32> to memref<*xf32>
				%cast1 = memref_cast %23 : memref<?xf32> to memref<*xf32>

				call @mgpuMemHostRegisterFloat(%cast0) : (memref<*xf32>) -> ()
				call @mgpuMemHostRegisterFloat(%cast1) : (memref<*xf32>) -> ()

				%24 = call @mgpuMemGetDeviceMemRef1dFloat(%22) : (memref<?xf32>) -> (memref<?xf32>)
				%26 = call @mgpuMemGetDeviceMemRef1dFloat(%23) : (memref<?xf32>) -> (memref<?xf32>)

				call @vectransfer(%24, %26) : (memref<?xf32>, memref<?xf32>) -> ()
				call @print_memref_f32(%cast1) : (memref<*xf32>) -> ()
				return
				}

				func @mgpuMemHostRegisterFloat(%ptr : memref<*xf32>)
				func @mgpuMemGetDeviceMemRef1dFloat(%ptr : memref<?xf32>) -> (memref<?xf32>)
				func @print_memref_f32(%ptr : memref<*xf32>)

[mlir][rocdl] Adding vector to ROCDL dialect lowering
Closed, Public

Revision Contents

Diff 268921

mlir/include/mlir/Conversion/Passes.td

mlir/include/mlir/Conversion/VectorToROCDL/VectorToROCDL.h

mlir/include/mlir/InitAllPasses.h

mlir/lib/Conversion/CMakeLists.txt

mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt

mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp

mlir/lib/Conversion/VectorToROCDL/CMakeLists.txt

mlir/lib/Conversion/VectorToROCDL/VectorToROCDL.cpp

mlir/test/Conversion/VectorToROCDL/vector-to-rocdl.mlir

mlir/test/mlir-rocm-runner/vector-transferops.mlir
