This is an archive of the discontinued LLVM Phabricator instance.

[mlir][AMDGPU] Add emulation pass for atomics on AMDGPU targets
ClosedPublic

Authored by krzysz00 on Apr 19 2023, 9:12 AM.


Details

Reviewers

ThomasRaoux
nicolasvasilache
herhut
nirvedhmeshram

Commits

rGcc4703745ffa: [mlir][AMDGPU] Add emulation pass for atomics on AMDGPU targets

Summary

Not all AMDGPU targets support all atomic operations. For example,
there are no atomic floating-point adds on the gfx10 series. Add a
pass to emulate these operations using a compare-and-swap loop, by
analogy to the generic atomicrmw rewrite in MemrefToLLVM.
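
As a rough illustration of the compare-and-swap technique described above, here is a minimal host-side C++ sketch. It is not the code the pass generates: it assumes `std::atomic<std::uint32_t>` as a stand-in for the buffer cell, and the names `floatToBits`, `bitsToFloat`, and `emulatedAtomicFadd` are hypothetical. The comparison is done on bit patterns, mirroring the bitcasts the pass inserts before its integer compare.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern (hypothetical helper).
static std::uint32_t floatToBits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Reinterpret a 32-bit pattern as a float (hypothetical helper).
static float bitsToFloat(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Emulate an atomic float add with a CAS loop and return the value that
// was stored before the add took effect.
float emulatedAtomicFadd(std::atomic<std::uint32_t> &cell, float operand) {
  std::uint32_t expected = cell.load(); // initial load, done once
  for (;;) {
    std::uint32_t desired = floatToBits(bitsToFloat(expected) + operand);
    // The exchange compares bit patterns. On failure it stores the value
    // it observed into `expected`, so the next iteration retries with it.
    if (cell.compare_exchange_strong(expected, desired))
      return bitsToFloat(expected);
  }
}
```

Comparing bits rather than floating-point values matters because a NaN never compares equal to itself, so a floating-point comparison could keep the loop from terminating.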

The pass has a general name because we may later add a memref-to-amdgpu
conversion that translates constructs like atomicrmw fmax (which
doesn't generally exist in LLVM) to the relevant intrinsics, which may
themselves require emulation.

Since the AMDGPU dialect now has a pass that operates on it, the
dialect's directory structure is reorganized to match other similarly
complex dialects.

If emulation is desired, run this pass before amdgpu-to-rocdl.

This commit also adds f64 support to atomic_fmax.

Depends on D148722

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

krzysz00 created this revision. Apr 19 2023, 9:12 AM

Herald added a reviewer: ThomasRaoux. Apr 19 2023, 9:12 AM

Herald added a project: Restricted Project.

Herald added subscribers: bviyer, Moerafaat, zero9178 and 30 others.

krzysz00 requested review of this revision. Apr 19 2023, 9:12 AM

Herald added a reviewer: nicolasvasilache. Apr 19 2023, 9:12 AM

Herald added a reviewer: herhut.

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache, wdng.

Harbormaster completed remote builds in B226624: Diff 514985. Apr 19 2023, 9:34 AM

nirvedhmeshram accepted this revision. May 3 2023, 10:37 AM

This revision is now accepted and ready to land. May 3 2023, 10:37 AM

Rebase, fix merge conflict

This revision was landed with ongoing or failed builds. May 3 2023, 2:18 PM

Closed by commit rGcc4703745ffa: [mlir][AMDGPU] Add emulation pass for atomics on AMDGPU targets (authored by krzysz00).

This revision was automatically updated to reflect the committed changes.

krzysz00 added a commit: rGcc4703745ffa: [mlir][AMDGPU] Add emulation pass for atomics on AMDGPU targets.

Harbormaster completed remote builds in B229817: Diff 519252. May 3 2023, 3:25 PM

Revision Contents

Path

Size

mlir/

include/

mlir/

Conversion/

AMDGPUToROCDL/

AMDGPUToROCDL.h

2 lines

Chipset.h

Dialect/

AMDGPU/

AMDGPU.td

AMDGPUDialect.h

CMakeLists.txt

14 lines

	IR/

2 lines

14 lines

Transforms/

6 lines

33 lines

33 lines

	Dialect/	AMDGPU/	Utils/
		Conversion/	AMDGPUToROCDL/

Chipset.h

4 lines

InitAllDialects.h

2 lines

InitAllPasses.h

2 lines

lib/

Conversion/

AMDGPUToROCDL/

AMDGPUToROCDL.cpp

2 lines

CMakeLists.txt

2 lines

Chipset.cpp

Dialect/

AMDGPU/

CMakeLists.txt

2 lines

IR/

AMDGPUDialect.cpp

14 lines

Transforms/

CMakeLists.txt

19 lines

EmulateAtomics.cpp

189 lines

Utils/

CMakeLists.txt

10 lines

	Dialect/	AMDGPU/	Utils/
		Conversion/	AMDGPUToROCDL/

Chipset.cpp

2 lines

test/

Dialect/

AMDGPU/

amdgpu-emulate-atomics.mlir

52 lines

Diff 519253

mlir/include/mlir/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.h

	//===- AMDGPUToROCDL.h - Convert AMDGPU to ROCDL dialect --- C++ --===//			//===- AMDGPUToROCDL.h - Convert AMDGPU to ROCDL dialect --- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	#ifndef MLIR_CONVERSION_AMDGPUTOROCDL_AMDGPUTOROCDL_H_			#ifndef MLIR_CONVERSION_AMDGPUTOROCDL_AMDGPUTOROCDL_H_
	#define MLIR_CONVERSION_AMDGPUTOROCDL_AMDGPUTOROCDL_H_			#define MLIR_CONVERSION_AMDGPUTOROCDL_AMDGPUTOROCDL_H_

	#include "mlir/Conversion/AMDGPUToROCDL/Chipset.h"			#include "mlir/Dialect/AMDGPU/Utils/Chipset.h"
	#include <memory>			#include <memory>
	#include <string>			#include <string>

	namespace mlir {			namespace mlir {

	class LLVMTypeConverter;			class LLVMTypeConverter;
	class RewritePatternSet;			class RewritePatternSet;
	class Pass;			class Pass;
	Show All 13 Lines

mlir/include/mlir/Conversion/AMDGPUToROCDL/Chipset.h

This file was moved to mlir/include/mlir/Dialect/AMDGPU/Utils/Chipset.h.

mlir/include/mlir/Dialect/AMDGPU/AMDGPU.td

This file was moved to mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td.

mlir/include/mlir/Dialect/AMDGPU/AMDGPUDialect.h

This file was moved to mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h.

mlir/include/mlir/Dialect/AMDGPU/CMakeLists.txt

This file was copied to mlir/include/mlir/Dialect/AMDGPU/IR/CMakeLists.txt.

	add_mlir_dialect(AMDGPU amdgpu)			add_subdirectory(IR)
	add_mlir_doc(AMDGPU AMDGPU Dialects/ -gen-dialect-doc)			add_subdirectory(Transforms)

	set(LLVM_TARGET_DEFINITIONS AMDGPU.td)
	mlir_tablegen(AMDGPUEnums.h.inc -gen-enum-decls)
	mlir_tablegen(AMDGPUEnums.cpp.inc -gen-enum-defs)
	add_public_tablegen_target(MLIRAMDGPUEnumsGen)

	set(LLVM_TARGET_DEFINITIONS AMDGPU.td)
	mlir_tablegen(AMDGPUAttributes.h.inc -gen-attrdef-decls -attrdefs-dialect=amdgpu)
	mlir_tablegen(AMDGPUAttributes.cpp.inc -gen-attrdef-defs -attrdefs-dialect=amdgpu)
	add_public_tablegen_target(MLIRAMDGPUAttributesIncGen)

mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td

This file was moved from mlir/include/mlir/Dialect/AMDGPU/AMDGPU.td.

Show First 20 Lines • Show All 215 Lines • ▼ Show 20 Lines	def AMDGPU_RawBufferAtomicFaddOp :
let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
let hasVerifier = 1;		let hasVerifier = 1;
}		}

// Raw buffer atomic floating point max		// Raw buffer atomic floating point max
def AMDGPU_RawBufferAtomicFmaxOp :		def AMDGPU_RawBufferAtomicFmaxOp :
AMDGPU_Op<"raw_buffer_atomic_fmax", [AllElementTypesMatch<["value", "memref"]>,		AMDGPU_Op<"raw_buffer_atomic_fmax", [AllElementTypesMatch<["value", "memref"]>,
AttrSizedOperandSegments]>,		AttrSizedOperandSegments]>,
Arguments<(ins F32:$value,		Arguments<(ins AnyTypeOf<[F32, F64]>:$value,
Arg<AnyMemRef, "buffer to operate on", [MemRead, MemWrite]>:$memref,		Arg<AnyMemRef, "buffer to operate on", [MemRead, MemWrite]>:$memref,
Variadic<I32>:$indices,		Variadic<I32>:$indices,
DefaultValuedAttr<BoolAttr, "true">:$boundsCheck,		DefaultValuedAttr<BoolAttr, "true">:$boundsCheck,
OptionalAttr<I32Attr>:$indexOffset,		OptionalAttr<I32Attr>:$indexOffset,
Optional<I32>:$sgprOffset)> {		Optional<I32>:$sgprOffset)> {

let summary = "Raw Buffer Floating-point Atomic Max (non-GFX9)";		let summary = "Raw Buffer Floating-point Atomic Max (non-GFX9)";
let description = [{		let description = [{
▲ Show 20 Lines • Show All 208 Lines • Show Last 20 Lines

mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h

This file was moved from mlir/include/mlir/Dialect/AMDGPU/AMDGPUDialect.h.

	//===- AMDGPUDialect.h - MLIR Dialect for AMDGPU ---------- C++ --===//			//===- AMDGPUDialect.h - MLIR Dialect for AMDGPU ---------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file declares a dialect for MLIR wrappers around AMDGPU-specific			// This file declares a dialect for MLIR wrappers around AMDGPU-specific
	// intrinsics and for other AMD GPU-specific functionality.			// intrinsics and for other AMD GPU-specific functionality.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef MLIR_DIALECT_AMDGPU_AMDGPUDIALECT_H_			#ifndef MLIR_DIALECT_AMDGPU_IR_AMDGPUDIALECT_H_
	#define MLIR_DIALECT_AMDGPU_AMDGPUDIALECT_H_			#define MLIR_DIALECT_AMDGPU_IR_AMDGPUDIALECT_H_

	#include "mlir/IR/BuiltinTypes.h"			#include "mlir/IR/BuiltinTypes.h"
	#include "mlir/IR/Dialect.h"			#include "mlir/IR/Dialect.h"
	#include "mlir/IR/OpDefinition.h"			#include "mlir/IR/OpDefinition.h"
	#include "mlir/Interfaces/SideEffectInterfaces.h"			#include "mlir/Interfaces/SideEffectInterfaces.h"

	#include "mlir/Dialect/AMDGPU/AMDGPUDialect.h.inc"			#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h.inc"

	#include "mlir/Dialect/AMDGPU/AMDGPUEnums.h.inc"			#include "mlir/Dialect/AMDGPU/IR/AMDGPUEnums.h.inc"

	#define GET_ATTRDEF_CLASSES			#define GET_ATTRDEF_CLASSES
	#include "mlir/Dialect/AMDGPU/AMDGPUAttributes.h.inc"			#include "mlir/Dialect/AMDGPU/IR/AMDGPUAttributes.h.inc"

	#define GET_OP_CLASSES			#define GET_OP_CLASSES
	#include "mlir/Dialect/AMDGPU/AMDGPU.h.inc"			#include "mlir/Dialect/AMDGPU/IR/AMDGPU.h.inc"

	#endif // MLIR_DIALECT_AMDGPU_AMDGPUDIALECT_H_			#endif // MLIR_DIALECT_AMDGPU_IR_AMDGPUDIALECT_H_

mlir/include/mlir/Dialect/AMDGPU/IR/CMakeLists.txt

This file was copied from mlir/include/mlir/Dialect/AMDGPU/CMakeLists.txt.

The contents of this file were not changed.

mlir/include/mlir/Dialect/AMDGPU/Transforms/CMakeLists.txt

This file was added.

				set(LLVM_TARGET_DEFINITIONS Passes.td)
				mlir_tablegen(Passes.h.inc -gen-pass-decls -name AMDGPU)
				add_public_tablegen_target(MLIRAMDGPUTransformsIncGen)
				add_dependencies(mlir-headers MLIRAMDGPUTransformsIncGen)

				add_mlir_doc(Passes AMDGPUPasses ./ -gen-pass-doc)

mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.h

This file was added.

				//===-- Passes.h - AMDGPU transformation pass declarations --- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file declares the transformation passes for the AMDGPU Dialect in MLIR.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_DIALECT_AMDGPU_TRANSFORMS_PASSES_H_
				#define MLIR_DIALECT_AMDGPU_TRANSFORMS_PASSES_H_

				#include "mlir/Dialect/AMDGPU/Utils/Chipset.h"
				#include "mlir/Pass/Pass.h"

				namespace mlir {
				class ConversionTarget;
				namespace amdgpu {

				#define GEN_PASS_DECL_AMDGPUEMULATEATOMICSPASS
				#define GEN_PASS_REGISTRATION
				#include "mlir/Dialect/AMDGPU/Transforms/Passes.h.inc"

				void populateAmdgpuEmulateAtomicsPatterns(ConversionTarget &target,
				RewritePatternSet &patterns,
				Chipset chipset);
				} // namespace amdgpu
				} // namespace mlir

				#endif // MLIR_DIALECT_AMDGPU_TRANSFORMS_PASSES_H_

mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.td

This file was added.

				//===-- Passes.td - AMDGPU pass declarations ----- tablegen --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file declares the passes for the AMDGPU Dialect in MLIR.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_DIALECT_AMDGPU_TRANSFORMS_PASSES_TD_
				#define MLIR_DIALECT_AMDGPU_TRANSFORMS_PASSES_TD_

				include "mlir/Pass/PassBase.td"

				def AmdgpuEmulateAtomicsPass : Pass<"amdgpu-emulate-atomics"> {
				let summary = "Emulate atomic operations on chipsets that do not support them";
				let description = [{
				This pass rewrites any AMDGPU-specific atomic operation that is not supported
				on the given `chipset` into a compare-and-swap loop.
				}];
				let dependentDialects = [
				"cf::ControlFlowDialect",
				"arith::ArithDialect",
				];
				let options = [Option<"chipset", "chipset", "std::string",
				/*default=*/"\"gfx000\"",
				"Chipset that these operations will run on">];
				}

				#endif // MLIR_DIALECT_AMDGPU_TRANSFORMS_PASSES_TD_

mlir/include/mlir/Dialect/AMDGPU/Utils/Chipset.h

This file was moved from mlir/include/mlir/Conversion/AMDGPUToROCDL/Chipset.h.

	//===- Chipset.h - AMDGPU Chipset version struct ----------- C++ --===//			//===- Chipset.h - AMDGPU Chipset version struct ----------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	#ifndef MLIR_CONVERSION_AMDGPUTOROCDL_CHIPSET_H_			#ifndef MLIR_DIALECT_AMDGPU_UTILS_CHIPSET_H_
	#define MLIR_CONVERSION_AMDGPUTOROCDL_CHIPSET_H_			#define MLIR_DIALECT_AMDGPU_UTILS_CHIPSET_H_

	#include "mlir/Support/LogicalResult.h"			#include "mlir/Support/LogicalResult.h"

	namespace mlir {			namespace mlir {
	namespace amdgpu {			namespace amdgpu {
	struct Chipset {			struct Chipset {
	Chipset() = default;			Chipset() = default;
	Chipset(unsigned majorVersion, unsigned minorVersion)			Chipset(unsigned majorVersion, unsigned minorVersion)
	Show All 10 Lines

mlir/include/mlir/InitAllDialects.h

	//===- InitAllDialects.h - MLIR Dialects Registration ------------ C++ --===//			//===- InitAllDialects.h - MLIR Dialects Registration ------------ C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file defines a helper to trigger the registration of all dialects and			// This file defines a helper to trigger the registration of all dialects and
	// passes to the system.			// passes to the system.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef MLIR_INITALLDIALECTS_H_			#ifndef MLIR_INITALLDIALECTS_H_
	#define MLIR_INITALLDIALECTS_H_			#define MLIR_INITALLDIALECTS_H_

	#include "mlir/Dialect/AMDGPU/AMDGPUDialect.h"			#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h"
	#include "mlir/Dialect/AMX/AMXDialect.h"			#include "mlir/Dialect/AMX/AMXDialect.h"
	#include "mlir/Dialect/Affine/IR/AffineOps.h"			#include "mlir/Dialect/Affine/IR/AffineOps.h"
	#include "mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h"			#include "mlir/Dialect/Affine/IR/ValueBoundsOpInterfaceImpl.h"
	#include "mlir/Dialect/Affine/TransformOps/AffineTransformOps.h"			#include "mlir/Dialect/Affine/TransformOps/AffineTransformOps.h"
	#include "mlir/Dialect/Arith/IR/Arith.h"			#include "mlir/Dialect/Arith/IR/Arith.h"
	#include "mlir/Dialect/Arith/IR/ValueBoundsOpInterfaceImpl.h"			#include "mlir/Dialect/Arith/IR/ValueBoundsOpInterfaceImpl.h"
	#include "mlir/Dialect/Arith/Transforms/BufferizableOpInterfaceImpl.h"			#include "mlir/Dialect/Arith/Transforms/BufferizableOpInterfaceImpl.h"
	#include "mlir/Dialect/ArmNeon/ArmNeonDialect.h"			#include "mlir/Dialect/ArmNeon/ArmNeonDialect.h"
	▲ Show 20 Lines • Show All 146 Lines • Show Last 20 Lines

mlir/include/mlir/InitAllPasses.h

Show All 9 Lines
// passes to the system.		// passes to the system.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef MLIR_INITALLPASSES_H_		#ifndef MLIR_INITALLPASSES_H_
#define MLIR_INITALLPASSES_H_		#define MLIR_INITALLPASSES_H_

#include "mlir/Conversion/Passes.h"		#include "mlir/Conversion/Passes.h"
		#include "mlir/Dialect/AMDGPU/Transforms/Passes.h"
#include "mlir/Dialect/Affine/Passes.h"		#include "mlir/Dialect/Affine/Passes.h"
#include "mlir/Dialect/Arith/Transforms/Passes.h"		#include "mlir/Dialect/Arith/Transforms/Passes.h"
#include "mlir/Dialect/Async/Passes.h"		#include "mlir/Dialect/Async/Passes.h"
#include "mlir/Dialect/Bufferization/Transforms/Passes.h"		#include "mlir/Dialect/Bufferization/Transforms/Passes.h"
#include "mlir/Dialect/Func/Transforms/Passes.h"		#include "mlir/Dialect/Func/Transforms/Passes.h"
#include "mlir/Dialect/GPU/Transforms/Passes.h"		#include "mlir/Dialect/GPU/Transforms/Passes.h"
#include "mlir/Dialect/LLVMIR/Transforms/Passes.h"		#include "mlir/Dialect/LLVMIR/Transforms/Passes.h"
#include "mlir/Dialect/Linalg/Passes.h"		#include "mlir/Dialect/Linalg/Passes.h"
Show All 25 Lines	inline void registerAllPasses() {
// General passes		// General passes
registerTransformsPasses();		registerTransformsPasses();

// Conversion passes		// Conversion passes
registerConversionPasses();		registerConversionPasses();

// Dialect passes		// Dialect passes
affine::registerAffinePasses();		affine::registerAffinePasses();
		amdgpu::registerAMDGPUPasses();
registerAsyncPasses();		registerAsyncPasses();
arith::registerArithPasses();		arith::registerArithPasses();
bufferization::registerBufferizationPasses();		bufferization::registerBufferizationPasses();
func::registerFuncPasses();		func::registerFuncPasses();
registerGPUPasses();		registerGPUPasses();
registerGpuSerializeToCubinPass();		registerGpuSerializeToCubinPass();
registerGpuSerializeToHsacoPass();		registerGpuSerializeToHsacoPass();
registerLinalgPasses();		registerLinalgPasses();
Show All 19 Lines

mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp

	//===- AMDGPUToROCDL.cpp - AMDGPU to ROCDL dialect conversion -------===//			//===- AMDGPUToROCDL.cpp - AMDGPU to ROCDL dialect conversion -------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "mlir/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.h"			#include "mlir/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.h"

	#include "mlir/Conversion/LLVMCommon/ConversionTarget.h"			#include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
	#include "mlir/Conversion/LLVMCommon/Pattern.h"			#include "mlir/Conversion/LLVMCommon/Pattern.h"
	#include "mlir/Dialect/AMDGPU/AMDGPUDialect.h"			#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h"
	#include "mlir/Dialect/LLVMIR/LLVMDialect.h"			#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
	#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"			#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
	#include "mlir/Pass/Pass.h"			#include "mlir/Pass/Pass.h"

	#include "llvm/ADT/STLExtras.h"			#include "llvm/ADT/STLExtras.h"
	#include <optional>			#include <optional>

	namespace mlir {			namespace mlir {
	▲ Show 20 Lines • Show All 544 Lines • Show Last 20 Lines

mlir/lib/Conversion/AMDGPUToROCDL/CMakeLists.txt

	add_mlir_conversion_library(MLIRAMDGPUToROCDL			add_mlir_conversion_library(MLIRAMDGPUToROCDL
	AMDGPUToROCDL.cpp			AMDGPUToROCDL.cpp
	Chipset.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Conversion/AMDGPUToROCDL			${MLIR_MAIN_INCLUDE_DIR}/mlir/Conversion/AMDGPUToROCDL

	DEPENDS			DEPENDS
	MLIRConversionPassIncGen			MLIRConversionPassIncGen

	LINK_COMPONENTS			LINK_COMPONENTS
	Core			Core

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
	MLIRLLVMCommonConversion			MLIRLLVMCommonConversion
	MLIRLLVMDialect			MLIRLLVMDialect
	MLIRROCDLDialect			MLIRROCDLDialect
	MLIRAMDGPUDialect			MLIRAMDGPUDialect
				MLIRAMDGPUUtils
	MLIRPass			MLIRPass
	MLIRTransforms			MLIRTransforms
	)			)

mlir/lib/Conversion/AMDGPUToROCDL/Chipset.cpp

This file was moved to mlir/lib/Dialect/AMDGPU/Utils/Chipset.cpp.

mlir/lib/Dialect/AMDGPU/CMakeLists.txt

	add_subdirectory(IR)			add_subdirectory(IR)
				add_subdirectory(Transforms)
				add_subdirectory(Utils)

mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp

//===- AMDGPUDialect.cpp - MLIR AMDGPU dialect implementation --------===//		//===- AMDGPUDialect.cpp - MLIR AMDGPU dialect implementation --------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the AMDGPU dialect and its operations.		// This file implements the AMDGPU dialect and its operations.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Dialect/AMDGPU/AMDGPUDialect.h"		#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h"

#include "mlir/Dialect/Arith/IR/Arith.h"		#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/GPU/IR/GPUDialect.h"		#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/IR/Builders.h"		#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinTypes.h"		#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Diagnostics.h"		#include "mlir/IR/Diagnostics.h"
#include "mlir/IR/DialectImplementation.h"		#include "mlir/IR/DialectImplementation.h"
#include "mlir/IR/Matchers.h"		#include "mlir/IR/Matchers.h"
#include "mlir/IR/OpImplementation.h"		#include "mlir/IR/OpImplementation.h"
#include "mlir/IR/PatternMatch.h"		#include "mlir/IR/PatternMatch.h"
#include "mlir/IR/TypeUtilities.h"		#include "mlir/IR/TypeUtilities.h"
#include "llvm/ADT/TypeSwitch.h"		#include "llvm/ADT/TypeSwitch.h"

#include <limits>		#include <limits>
#include <optional>		#include <optional>

using namespace mlir;		using namespace mlir;
using namespace mlir::amdgpu;		using namespace mlir::amdgpu;

#include "mlir/Dialect/AMDGPU/AMDGPUDialect.cpp.inc"		#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.cpp.inc"

void AMDGPUDialect::initialize() {		void AMDGPUDialect::initialize() {
addOperations<		addOperations<
#define GET_OP_LIST		#define GET_OP_LIST
#include "mlir/Dialect/AMDGPU/AMDGPU.cpp.inc"		#include "mlir/Dialect/AMDGPU/IR/AMDGPU.cpp.inc"
>();		>();
addAttributes<		addAttributes<
#define GET_ATTRDEF_LIST		#define GET_ATTRDEF_LIST
#include "mlir/Dialect/AMDGPU/AMDGPUAttributes.cpp.inc"		#include "mlir/Dialect/AMDGPU/IR/AMDGPUAttributes.cpp.inc"
>();		>();
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// RawBuffer*Op		// RawBuffer*Op
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
template <typename T>		template <typename T>
static LogicalResult verifyRawBufferOp(T &op) {		static LogicalResult verifyRawBufferOp(T &op) {
▲ Show 20 Lines • Show All 226 Lines • ▼ Show 20 Lines	LogicalResult MFMAOp::verify() {

if ((getNegateA() || getNegateB() || getNegateC()) && !destElem.isF64())		if ((getNegateA() || getNegateB() || getNegateC()) && !destElem.isF64())
return emitOpError(		return emitOpError(
"negation flags only available for double-precision operations");		"negation flags only available for double-precision operations");

return success();		return success();
}		}

#include "mlir/Dialect/AMDGPU/AMDGPUEnums.cpp.inc"		#include "mlir/Dialect/AMDGPU/IR/AMDGPUEnums.cpp.inc"

#define GET_ATTRDEF_CLASSES		#define GET_ATTRDEF_CLASSES
#include "mlir/Dialect/AMDGPU/AMDGPUAttributes.cpp.inc"		#include "mlir/Dialect/AMDGPU/IR/AMDGPUAttributes.cpp.inc"

#define GET_OP_CLASSES		#define GET_OP_CLASSES
#include "mlir/Dialect/AMDGPU/AMDGPU.cpp.inc"		#include "mlir/Dialect/AMDGPU/IR/AMDGPU.cpp.inc"

mlir/lib/Dialect/AMDGPU/Transforms/CMakeLists.txt

This file was added.

				add_mlir_dialect_library(MLIRAMDGPUTransforms
				EmulateAtomics.cpp

				ADDITIONAL_HEADER_DIRS
				${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/AMDGPU/Transforms

				DEPENDS
				MLIRAMDGPUTransformsIncGen

				LINK_LIBS PUBLIC
				MLIRAMDGPUDialect
				MLIRAMDGPUUtils
				MLIRArithDialect
				MLIRControlFlowDialect
				MLIRIR
				MLIRPass
				MLIRTransforms
				MLIRTransformUtils
				)

mlir/lib/Dialect/AMDGPU/Transforms/EmulateAtomics.cpp

This file was added.

				//===- EmulateAtomics.cpp - Emulate unsupported AMDGPU atomics ------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/AMDGPU/Transforms/Passes.h"

				#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h"
				#include "mlir/Dialect/Arith/IR/Arith.h"
				#include "mlir/Dialect/ControlFlow/IR/ControlFlow.h"
				#include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.h"
				#include "mlir/IR/BuiltinAttributes.h"
				#include "mlir/Transforms/DialectConversion.h"
				#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

				namespace mlir::amdgpu {
				#define GEN_PASS_DEF_AMDGPUEMULATEATOMICSPASS
				#include "mlir/Dialect/AMDGPU/Transforms/Passes.h.inc"
				} // namespace mlir::amdgpu

				using namespace mlir;
				using namespace mlir::amdgpu;

				namespace {
				struct AmdgpuEmulateAtomicsPass
				: public amdgpu::impl::AmdgpuEmulateAtomicsPassBase<
				AmdgpuEmulateAtomicsPass> {
				using AmdgpuEmulateAtomicsPassBase<
				AmdgpuEmulateAtomicsPass>::AmdgpuEmulateAtomicsPassBase;
				void runOnOperation() override;
				};

				template <typename AtomicOp, typename ArithOp>
				struct RawBufferAtomicByCasPattern : public OpConversionPattern<AtomicOp> {
				using OpConversionPattern<AtomicOp>::OpConversionPattern;
				using Adaptor = typename AtomicOp::Adaptor;

				LogicalResult
				matchAndRewrite(AtomicOp atomicOp, Adaptor adaptor,
				ConversionPatternRewriter &rewriter) const override;
				};
				} // namespace

				namespace {
				enum class DataArgAction : unsigned char {
				Duplicate,
				Drop,
				};
				} // namespace

				// Fix up the fact that, when we're migrating from a general buffer atomic
				// to a load or to a CAS, the number of operands, and thus the number of
				// entries needed in operand_segment_sizes, needs to change. We use this method
				// because we'd like to preserve unknown attributes on the atomic instead of
				// discarding them.
				static void patchOperandSegmentSizes(ArrayRef<NamedAttribute> attrs,
				SmallVectorImpl<NamedAttribute> &newAttrs,
				DataArgAction action) {
				newAttrs.reserve(attrs.size());
				for (NamedAttribute attr : attrs) {
				if (attr.getName().getValue() != "operand_segment_sizes") {
				newAttrs.push_back(attr);
				continue;
				}
				auto segmentAttr = attr.getValue().cast<DenseI32ArrayAttr>();
				MLIRContext *context = segmentAttr.getContext();
				DenseI32ArrayAttr newSegments;
				switch (action) {
				case DataArgAction::Drop:
				newSegments = DenseI32ArrayAttr::get(
				context, segmentAttr.asArrayRef().drop_front());
				break;
				case DataArgAction::Duplicate: {
				SmallVector<int32_t> newVals;
				ArrayRef<int32_t> oldVals = segmentAttr.asArrayRef();
				newVals.push_back(oldVals[0]);
				newVals.append(oldVals.begin(), oldVals.end());
				newSegments = DenseI32ArrayAttr::get(context, newVals);
				break;
				}
				}
				newAttrs.push_back(NamedAttribute(attr.getName(), newSegments));
				}
				}
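
To make the two `DataArgAction` cases concrete, the following standalone sketch applies the same transformation to a plain `std::vector`. It is only an approximation under stated assumptions: `patchSegmentSizes` is a hypothetical name, and the example segment layout `{1, 1, 2, 0}` (value, memref, two indices, no sgprOffset) is inferred from the operand list of the raw buffer atomic ops rather than taken from a test.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class DataArgAction : unsigned char { Duplicate, Drop };

// Rewrite operand segment sizes when the leading data operand is either
// removed (atomic -> load) or doubled (atomic -> cmpswap with src and cmp).
std::vector<std::int32_t>
patchSegmentSizes(const std::vector<std::int32_t> &oldSizes,
                  DataArgAction action) {
  assert(!oldSizes.empty() && "expected a leading data-operand segment");
  std::vector<std::int32_t> newSizes;
  switch (action) {
  case DataArgAction::Drop:
    // Skip the first entry: the load has no data operand.
    newSizes.assign(oldSizes.begin() + 1, oldSizes.end());
    break;
  case DataArgAction::Duplicate:
    // Repeat the first entry, then keep every original entry.
    newSizes.push_back(oldSizes.front());
    newSizes.insert(newSizes.end(), oldSizes.begin(), oldSizes.end());
    break;
  }
  return newSizes;
}
```

With the assumed layout, `Drop` yields `{1, 2, 0}` and `Duplicate` yields `{1, 1, 1, 2, 0}`.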

				template <typename AtomicOp, typename ArithOp>
				LogicalResult RawBufferAtomicByCasPattern<AtomicOp, ArithOp>::matchAndRewrite(
				AtomicOp atomicOp, Adaptor adaptor,
				ConversionPatternRewriter &rewriter) const {
				Location loc = atomicOp.getLoc();

				ArrayRef<NamedAttribute> origAttrs = atomicOp->getAttrs();
				ValueRange operands = adaptor.getOperands();
				  Value data = operands.take_front()[0];
				  ValueRange invariantArgs = operands.drop_front();
				  Type dataType = data.getType();

				  SmallVector<NamedAttribute> loadAttrs;
				  patchOperandSegmentSizes(origAttrs, loadAttrs, DataArgAction::Drop);
				  Value initialLoad =
				      rewriter.create<RawBufferLoadOp>(loc, dataType, invariantArgs, loadAttrs);
				  Block *currentBlock = rewriter.getInsertionBlock();
				  Block *afterAtomic =
				      rewriter.splitBlock(currentBlock, rewriter.getInsertionPoint());
				  Block *loopBlock = rewriter.createBlock(afterAtomic, {dataType}, {loc});

				  rewriter.setInsertionPointToEnd(currentBlock);
				  rewriter.create<cf::BranchOp>(loc, loopBlock, initialLoad);

				  rewriter.setInsertionPointToEnd(loopBlock);
				  Value prevLoad = loopBlock->getArgument(0);
				  Value operated = rewriter.create<ArithOp>(loc, data, prevLoad);

				  SmallVector<NamedAttribute> cmpswapAttrs;
				  patchOperandSegmentSizes(origAttrs, cmpswapAttrs, DataArgAction::Duplicate);
				  SmallVector<Value> cmpswapArgs = {operated, prevLoad};
				  cmpswapArgs.append(invariantArgs.begin(), invariantArgs.end());
				  Value atomicRes = rewriter.create<RawBufferAtomicCmpswapOp>(
				      loc, dataType, cmpswapArgs, cmpswapAttrs);

				  // We care about exact bitwise equality here, so do some bitcasts.
				  // These will fold away during lowering to the ROCDL dialect, where
				  // an int->float bitcast is introduced to account for the fact that cmpswap
				  // only takes integer arguments.

				  Value prevLoadForCompare = prevLoad;
				  Value atomicResForCompare = atomicRes;
				  if (auto floatDataTy = dataType.dyn_cast<FloatType>()) {
				    Type equivInt = rewriter.getIntegerType(floatDataTy.getWidth());
				    prevLoadForCompare =
				        rewriter.create<arith::BitcastOp>(loc, equivInt, prevLoad);
				    atomicResForCompare =
				        rewriter.create<arith::BitcastOp>(loc, equivInt, atomicRes);
				  }
				  Value canLeave = rewriter.create<arith::CmpIOp>(
				      loc, arith::CmpIPredicate::eq, atomicResForCompare, prevLoadForCompare);
				  rewriter.create<cf::CondBranchOp>(loc, canLeave, afterAtomic, ValueRange{},
				                                    loopBlock, atomicRes);
				  rewriter.replaceOp(atomicOp, {});
				  return success();
				}

				void mlir::amdgpu::populateAmdgpuEmulateAtomicsPatterns(
				    ConversionTarget &target, RewritePatternSet &patterns, Chipset chipset) {
				  // gfx10 has no atomic adds.
				  if (chipset.majorVersion == 10 || chipset.majorVersion < 9 ||
				      (chipset.majorVersion == 9 && chipset.minorVersion < 0x08)) {
				    target.addIllegalOp<RawBufferAtomicFaddOp>();
				  }
				  // gfx9 has no to very limited support for floating-point min and max.
				  if (chipset.majorVersion == 9) {
				    if (chipset.minorVersion >= 0x0a) {
				      // gfx90a supports f64 max (and min, but we don't have a min wrapper right
				      // now) but all other types need to be emulated.
				      target.addDynamicallyLegalOp<RawBufferAtomicFmaxOp>(
				          [](RawBufferAtomicFmaxOp op) -> bool {
				            return op.getValue().getType().isF64();
				          });
				    } else {
				      target.addIllegalOp<RawBufferAtomicFmaxOp>();
				    }
				  }
				  patterns
				      .add<RawBufferAtomicByCasPattern<RawBufferAtomicFaddOp, arith::AddFOp>,
				           RawBufferAtomicByCasPattern<RawBufferAtomicFmaxOp, arith::MaxFOp>>(
				          patterns.getContext());
				}

				void AmdgpuEmulateAtomicsPass::runOnOperation() {
				  Operation *op = getOperation();
				  FailureOr<Chipset> maybeChipset = Chipset::parse(chipset);
				  if (failed(maybeChipset)) {
				    emitError(op->getLoc(), "Invalid chipset name: " + chipset);
				    return signalPassFailure();
				  }

				  MLIRContext &ctx = getContext();
				  ConversionTarget target(ctx);
				  RewritePatternSet patterns(&ctx);
				  target.markUnknownOpDynamicallyLegal(
				      [](Operation *op) -> bool { return true; });

				  populateAmdgpuEmulateAtomicsPatterns(target, patterns, *maybeChipset);
				  if (failed(applyPartialConversion(op, target, std::move(patterns))))
				    return signalPassFailure();
				}
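
The compare-and-swap loop that RawBufferAtomicByCasPattern emits can be mirrored on the host with `std::atomic`. The following is a minimal sketch, not code from this revision: `atomicUpdateByCas`, `floatToBits`, and `bitsToFloat` are hypothetical names, and a 32-bit integer cell stands in for the buffer element. It shows why the loop exit test is done on integer bit patterns. A floating-point comparison would never report success when the stored value is NaN, and would treat -0.0 and +0.0 as equal.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical helpers (not part of the patch): reinterpret the bits of a
// float as a 32-bit integer and back, like the arith.bitcast ops in the pass.
static std::uint32_t floatToBits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

static float bitsToFloat(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Emulate an atomic floating-point read-modify-write using only an integer
// compare-and-swap. Returns the value that was stored before the update.
template <typename BinaryOp>
float atomicUpdateByCas(std::atomic<std::uint32_t> &cell, float data,
                        BinaryOp op) {
  // Initial load, analogous to the raw_buffer_load before the loop block.
  std::uint32_t prevBits = cell.load();
  for (;;) {
    float prev = bitsToFloat(prevBits);
    float operated = op(data, prev); // same operand order as the pass
    // Succeeds only if the cell still holds exactly prevBits. On failure,
    // prevBits is refreshed with the observed value and the loop retries,
    // like branching back to the loop block with the cmpswap result.
    if (cell.compare_exchange_strong(prevBits, floatToBits(operated)))
      return prev;
  }
}
```

Calling `atomicUpdateByCas(cell, 2.0f, [](float a, float b) { return a + b; })` would play the role of the emulated fadd; passing a max function instead would correspond to the emulated fmax.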

mlir/lib/Dialect/AMDGPU/Utils/CMakeLists.txt

This file was added.

				add_mlir_dialect_library(MLIRAMDGPUUtils
				  Chipset.cpp

				  ADDITIONAL_HEADER_DIRS
				  ${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/AMDGPU/Utils

				  LINK_LIBS PUBLIC
				  MLIRAMDGPUDialect
				  MLIRSupport
				  )

mlir/lib/Dialect/AMDGPU/Utils/Chipset.cpp

This file was moved from mlir/lib/Conversion/AMDGPUToROCDL/Chipset.cpp.

				//===- Chipset.cpp - AMDGPU Chipset version struct parsing -----------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				-#include "mlir/Conversion/AMDGPUToROCDL/Chipset.h"
				+#include "mlir/Dialect/AMDGPU/Utils/Chipset.h"
				#include "mlir/Support/LLVM.h"
				#include "llvm/ADT/StringRef.h"

				using namespace mlir;
				using namespace mlir::amdgpu;

				FailureOr<Chipset> Chipset::parse(StringRef name) {
				  if (!name.startswith("gfx"))
				[11 lines not shown]

mlir/test/Dialect/AMDGPU/amdgpu-emulate-atomics.mlir

This file was added.

				// RUN: mlir-opt -split-input-file -amdgpu-emulate-atomics=chipset=gfx90a %s | FileCheck %s --check-prefixes=CHECK,GFX9
				// RUN: mlir-opt -split-input-file -amdgpu-emulate-atomics=chipset=gfx1030 %s | FileCheck %s --check-prefixes=CHECK,GFX10

				// -----

				func.func @atomic_fmax(%val: f32, %buffer: memref<?xf32>, %idx: i32) {
				  // CHECK: func @atomic_fmax
				  // CHECK-SAME: ([[val:%.+]]: f32, [[buffer:%.+]]: memref<?xf32>, [[idx:%.+]]: i32)
				  // CHECK: gpu.printf "Begin\0A"
				  // GFX10: amdgpu.raw_buffer_atomic_fmax {foo, indexOffset = 4 : i32} [[val]] -> [[buffer]][[[idx]]]
				  // GFX9: [[ld:%.+]] = amdgpu.raw_buffer_load {foo, indexOffset = 4 : i32} [[buffer]][[[idx]]]
				  // GFX9: cf.br [[loop:\^.+]]([[ld]] : f32)
				  // GFX9: [[loop]]([[arg:%.+]]: f32):
				  // GFX9: [[operated:%.+]] = arith.maxf [[val]], [[arg]]
				  // GFX9: [[atomicRes:%.+]] = amdgpu.raw_buffer_atomic_cmpswap {foo, indexOffset = 4 : i32} [[operated]], [[arg]] -> [[buffer]][[[idx]]]
				  // GFX9: [[argCast:%.+]] = arith.bitcast [[arg]] : f32 to i32
				  // GFX9: [[resCast:%.+]] = arith.bitcast [[atomicRes]] : f32 to i32
				  // GFX9: [[test:%.+]] = arith.cmpi eq, [[resCast]], [[argCast]]
				  // GFX9: cf.cond_br [[test]], [[post:\^.+]], [[loop]]([[atomicRes]] : f32)
				  // GFX9: [[post]]:
				  // CHECK-NEXT: gpu.printf "End\0A"
				  gpu.printf "Begin\n"
				  amdgpu.raw_buffer_atomic_fmax {foo, indexOffset = 4 : i32} %val -> %buffer[%idx] : f32 -> memref<?xf32>, i32
				  gpu.printf "End\n"
				  func.return
				}

				// -----

				func.func @atomic_fmax_f64(%val: f64, %buffer: memref<?xf64>, %idx: i32) {
				  // CHECK: func @atomic_fmax_f64
				  // CHECK-SAME: ([[val:%.+]]: f64, [[buffer:%.+]]: memref<?xf64>, [[idx:%.+]]: i32)
				  // CHECK: gpu.printf "Begin\0A"
				  // GFX9: amdgpu.raw_buffer_atomic_fmax [[val]] -> [[buffer]][[[idx]]]
				  // GFX10: amdgpu.raw_buffer_atomic_fmax [[val]] -> [[buffer]][[[idx]]]
				  // CHECK-NEXT: gpu.printf "End\0A"
				  gpu.printf "Begin\n"
				  amdgpu.raw_buffer_atomic_fmax %val -> %buffer[%idx] : f64 -> memref<?xf64>, i32
				  gpu.printf "End\n"
				  func.return
				}

				// -----

				func.func @atomic_fadd(%val: f32, %buffer: memref<?xf32>, %idx: i32) {
				  // CHECK: func @atomic_fadd
				  // GFX9: amdgpu.raw_buffer_atomic_fadd
				  // GFX10: amdgpu.raw_buffer_load
				  // GFX10: amdgpu.raw_buffer_atomic_cmpswap
				  amdgpu.raw_buffer_atomic_fadd %val -> %buffer[%idx] : f32 -> memref<?xf32>, i32
				  func.return
				}
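
The differing GFX9 and GFX10 expectations in this test follow from the chipset checks in populateAmdgpuEmulateAtomicsPatterns. As a rough standalone sketch, those checks can be restated as plain predicates. `ChipsetVersion`, `ElemType`, `faddNeedsEmulation`, and `fmaxNeedsEmulation` are hypothetical names introduced here for illustration. The mapping of gfx90a to major 9, minor 0x0a is taken from the comment in the pass, while the minor value 0x30 used for gfx1030 is an assumption that does not affect the result.

```cpp
// Hypothetical stand-in for the Chipset struct used by the pass.
struct ChipsetVersion {
  unsigned majorVersion;
  unsigned minorVersion;
};

// Hypothetical element-type tag; the pass inspects the MLIR type instead.
enum class ElemType { F32, F64 };

// Mirrors the condition under which RawBufferAtomicFaddOp is marked illegal.
bool faddNeedsEmulation(ChipsetVersion c) {
  return c.majorVersion == 10 || c.majorVersion < 9 ||
         (c.majorVersion == 9 && c.minorVersion < 0x08);
}

// Mirrors the legality rules for RawBufferAtomicFmaxOp: only gfx9 is
// restricted, and from minor version 0x0a onward f64 is left as-is.
bool fmaxNeedsEmulation(ChipsetVersion c, ElemType type) {
  if (c.majorVersion != 9)
    return false;
  if (c.minorVersion >= 0x0a)
    return type != ElemType::F64;
  return true;
}
```

Under these predicates, `fmaxNeedsEmulation({9, 0x0a}, ElemType::F32)` is true while the f64 variant is false, matching @atomic_fmax and @atomic_fmax_f64; `faddNeedsEmulation({10, 0x30})` is true, matching the GFX10 lines of @atomic_fadd.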

Revision Contents

Diff 519253

mlir/include/mlir/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.h

mlir/include/mlir/Conversion/AMDGPUToROCDL/Chipset.h

mlir/include/mlir/Dialect/AMDGPU/AMDGPU.td

mlir/include/mlir/Dialect/AMDGPU/AMDGPUDialect.h

mlir/include/mlir/Dialect/AMDGPU/CMakeLists.txt

mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td

mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h

mlir/include/mlir/Dialect/AMDGPU/IR/CMakeLists.txt

mlir/include/mlir/Dialect/AMDGPU/Transforms/CMakeLists.txt

mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.h

mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.td

mlir/include/mlir/Dialect/AMDGPU/Utils/Chipset.h

mlir/include/mlir/InitAllDialects.h

mlir/include/mlir/InitAllPasses.h

mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp

mlir/lib/Conversion/AMDGPUToROCDL/CMakeLists.txt

mlir/lib/Conversion/AMDGPUToROCDL/Chipset.cpp

mlir/lib/Dialect/AMDGPU/CMakeLists.txt

mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp

mlir/lib/Dialect/AMDGPU/Transforms/CMakeLists.txt

mlir/lib/Dialect/AMDGPU/Transforms/EmulateAtomics.cpp

mlir/lib/Dialect/AMDGPU/Utils/CMakeLists.txt

mlir/lib/Dialect/AMDGPU/Utils/Chipset.cpp

mlir/test/Dialect/AMDGPU/amdgpu-emulate-atomics.mlir
