This is an archive of the discontinued LLVM Phabricator instance.

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp, line 263:
RemFOp already lowers to the LLVM frem instruction (https://llvm.org/docs/LangRef.html#frem-instruction). I tried it out and it works fine through the NVPTX backend, so I don't think we need this. What problem is it trying to solve?

This revision now requires changes to proceed. Dec 27 2022, 1:38 AM

The fix above corrects an edge case in which the result of the remainder operation is NaN when the second argument is inf.
This was detected in TensorFlow (https://github.com/tensorflow/tensorflow/issues/58369). (Sorry for the delay, I had to fix a problem with my bug repro :-))

Requesting review again with this additional information.

In D140684#4019204, @bchetioui wrote:

> The fix above corrects an edge case in which the result of the remainder operation is NaN when the second argument is inf.
> This was detected in TensorFlow (https://github.com/tensorflow/tensorflow/issues/58369). (Sorry for the delay, I had to fix a problem with my bug repro :-))
>
> Requesting review again with this additional information.

Thanks for explaining. I don't think this is the right fix: it would mean that the other existing lowering of arith.remf is incorrect and would cause a miscompile if applied before this pattern.
Looking at it a bit more, the LLVM frem instruction should already return the result you want for frem 1.f, inf : f32. The instruction is defined at https://llvm.org/docs/LangRef.html#frem-instruction, which says its semantics should match libm fmod, and fmod(1.f, inf) returns 1.f.
The bug seems to be in the NVPTX backend: https://github.com/llvm/llvm-project/blob/2784b243e38f7d526a40838a854554b53ebeb41e/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td#L1156
The sequence emitted there won't do the right thing for inf; it should be fixed there.
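
To illustrate the inf case, here is a minimal host-side sketch. It assumes the backend expands frem as x - trunc(x / y) * y; that expansion, and the names naive_frem and guarded_frem, are assumptions made for illustration and are not taken from the LLVM sources or from the eventual fix.

```cpp
#include <cmath>
#include <limits>

// Assumed model of the expansion: x - trunc(x / y) * y.
// For finite x and y = inf: x / inf == 0, trunc(0) == 0, 0 * inf == NaN,
// so the whole expression becomes NaN instead of x.
float naive_frem(float x, float y) {
  return x - std::trunc(x / y) * y;
}

// Hypothetical guarded variant: when x is finite and y is infinite,
// return x unchanged, which is what libm fmod does for that input.
float guarded_frem(float x, float y) {
  if (std::isfinite(x) && std::isinf(y))
    return x;
  return naive_frem(x, y);
}
```

Calling naive_frem(1.0f, inf) yields NaN under default IEEE floating-point semantics, while std::fmod(1.0f, inf) and guarded_frem(1.0f, inf) both return 1.0f, which is the behaviour the LangRef describes for frem.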

Thanks for pointing this out, I agree that this would be a better fix.
I will work on a fix there.

The proposed fix is at https://reviews.llvm.org/D140846. Thanks again for pointing me there!

Abandoned in favour of https://reviews.llvm.org/D140846.

Revision Contents

Path: mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
Size: 7 lines

Diff 485356

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp

(215 preceding lines not shown)

 } // namespace

 void mlir::configureGpuToNVVMConversionLegality(ConversionTarget &target) {
   target.addIllegalOp<func::FuncOp>();
   target.addLegalDialect<::mlir::LLVM::LLVMDialect>();
   target.addLegalDialect<::mlir::NVVM::NVVMDialect>();
   target.addIllegalDialect<gpu::GPUDialect>();
   target.addIllegalOp<LLVM::CosOp, LLVM::ExpOp, LLVM::Exp2Op, LLVM::FAbsOp,
-                      LLVM::FCeilOp, LLVM::FFloorOp, LLVM::LogOp, LLVM::Log10Op,
-                      LLVM::Log2Op, LLVM::PowOp, LLVM::SinOp, LLVM::SqrtOp>();
+                      LLVM::FCeilOp, LLVM::FFloorOp, LLVM::FRemOp, LLVM::LogOp,
+                      LLVM::Log10Op, LLVM::Log2Op, LLVM::PowOp, LLVM::SinOp,
+                      LLVM::SqrtOp>();

   // TODO: Remove once we support replacing non-root ops.
   target.addLegalOp<gpu::YieldOp, gpu::GPUModuleOp, gpu::ModuleEndOp>();
 }

 template <typename OpTy>
 static void populateOpPatterns(LLVMTypeConverter &converter,
                                RewritePatternSet &patterns, StringRef f32Func,

(20 lines not shown; enclosing function: mlir::populateGpuToNVVMConversionPatterns)

   // Explicitly drop memory space when lowering private memory
   // attributions since NVVM models it as `alloca`s in the default
   // memory space and does not support `alloca`s with addrspace(5).
   patterns.add<GPUFuncOpLowering>(
       converter, /*allocaAddrSpace=*/0,
       StringAttr::get(&converter.getContext(),
                       NVVM::NVVMDialect::getKernelFuncAttrName()));

+  populateOpPatterns<arith::RemFOp>(converter, patterns, "__nv_remainderf",
+                                    "__nv_remainder");
   populateOpPatterns<math::AbsFOp>(converter, patterns, "__nv_fabsf",
                                    "__nv_fabs");
   populateOpPatterns<math::AtanOp>(converter, patterns, "__nv_atanf",
                                    "__nv_atan");
   populateOpPatterns<math::Atan2Op>(converter, patterns, "__nv_atan2f",
                                     "__nv_atan2");
   populateOpPatterns<math::CeilOp>(converter, patterns, "__nv_ceilf",
                                    "__nv_ceil");

Inline comment on line 263, the added populateOpPatterns<arith::RemFOp> call (ThomasRaoux): RemFOp already lowers to llvm instruction frem (https://llvm.org/docs/LangRef.html#frem-instruction). Trying it out this works fine through the NVPTX backend. Therefore I don't think we need that. What problem is it trying to solve?

(30 following lines not shown)


[mlir][gpu] Lower arith::RemFOp to __nv_remainder on NVIDIA GPU.
Abandoned · Public
