Lowering arith.maxf/minf to __nv_fmax/__nv_fmin in GPUToNVVM Conversion.
Also remove a Linalg pass declaration that was never cleaned up.
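For reference, a minimal sketch of the shape such a lowering registration could take in LowerGpuOpsToNVVMOps.cpp, mirroring how the other libdevice math calls are wired up in that file. The populateOpPatterns helper and the libdevice names __nv_fmaxf/__nv_fmax and __nv_fminf/__nv_fmin are assumptions based on the surrounding code, not the exact contents of this patch:

```cpp
// Hypothetical sketch inside the pattern-population function of
// LowerGpuOpsToNVVMOps.cpp, following the existing registrations for ops such
// as math.exp. populateOpPatterns<OpTy>(converter, patterns, f32Func, f64Func)
// is assumed to add an OpToFuncCallLowering<OpTy> that rewrites the op into a
// call to the named libdevice function for f32 and f64 respectively.
populateOpPatterns<arith::MaxFOp>(converter, patterns, "__nv_fmaxf",
                                  "__nv_fmax");
populateOpPatterns<arith::MinFOp>(converter, patterns, "__nv_fminf",
                                  "__nv_fmin");
```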
Details
- Reviewers: ThomasRaoux, herhut, nicolasvasilache
Event Timeline
mlir/include/mlir/Dialect/Linalg/Passes.h:34
Is this on purpose?
mlir/include/mlir/Dialect/Linalg/Passes.h:34
Never mind, I see your comment; this declaration is dead.
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp:257
The semantics of arith::MaxFOp and __nv_fmaxf differ for NaN values:
- arith::MaxFOp: returns the maximum of the two arguments, treating -0.0 as less than +0.0. If one of the arguments is NaN, the result is also NaN.
- __nv_fmaxf: if one argument is NaN and the other is a legitimate numeric value, the numeric value is chosen.
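To make the mismatch concrete, here is a small standalone C++ illustration (not MLIR code). std::fmax follows the same IEEE-754 maxNum convention as the libdevice fmax functions, while the NaN-propagating helper below mimics the documented arith.maxf behavior; the helper is hypothetical and written only for this example, and the -0.0/+0.0 ordering is omitted for brevity:

```cpp
#include <cmath>
#include <cstdio>
#include <limits>

// NaN-propagating maximum, mimicking the documented arith.maxf NaN behavior
// (hypothetical helper for illustration; -0.0/+0.0 ordering not handled).
static float maxfLike(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  return a > b ? a : b;
}

int main() {
  float nan = std::numeric_limits<float>::quiet_NaN();
  // IEEE maxNum semantics: the non-NaN operand is returned.
  std::printf("fmax(NaN, 1.0)       = %f\n", std::fmax(nan, 1.0f)); // 1.000000
  // arith.maxf semantics: NaN propagates.
  std::printf("arith.maxf(NaN, 1.0) = %f\n", maxfLike(nan, 1.0f)); // nan
  return 0;
}
```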
mlir/include/mlir/Dialect/Linalg/Passes.h:34
This pass seems to have been removed in https://reviews.llvm.org/D124145.
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp:257
This seems to be a problem. By the way, do other math functions have the same problem, e.g. math.sin(NaN) vs. __nv_sin(NaN)?
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp:257
Yes, I don't think we can have this lowering with the current semantics of arith::MaxFOp. As far as I know, this problem only affects fmax/fmin, since the semantics of TF and other ML frameworks tend to differ from what the hardware natively supports. For other arithmetic operations the semantics are more obvious.
mlir/include/mlir/Dialect/Linalg/Passes.h:34
Thanks!
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp:257
Thanks, should I close this differential?
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp:257
Yes, unfortunately I don't think there is another solution right now.