This is an archive of the discontinued LLVM Phabricator instance.

Differential D73713

Fixed non-deterministic GPU intrinsic lowering.
Abandoned, Public

Authored by dfki-jugr on Jan 30 2020, 7:46 AM.

Details

Reviewers

rriddle
mehdi_amini
mravishankar
herhut
ftynse
nicolasvasilache

Summary

The current implementation of the intrinsic lowering phases seemed to be
non-deterministic across platforms. Multiple patterns with the same
priority were matched differently on different platforms (Windows/
Linux) (see https://reviews.llvm.org/D73471). This patch circumvents
this issue by using an adjusted pattern-rewriter benefit to avoid
clashes.
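
To illustrate the idea in the summary, here is a minimal standalone sketch. It is not MLIR code: `ToyPattern` and `pickFirst` are hypothetical names, and the selection rule (highest benefit wins, the earliest entry wins ties) is an assumption made only for illustration, not a description of the actual rewrite driver.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a rewrite pattern: a name plus a numeric benefit.
struct ToyPattern {
  std::string name;
  unsigned benefit;
};

// Returns the name of the pattern that would be tried first under the assumed
// rule: the highest benefit wins, and among equal benefits the earliest entry
// in the list wins. The list must not be empty.
std::string pickFirst(const std::vector<ToyPattern> &patterns) {
  assert(!patterns.empty() && "need at least one pattern");
  const ToyPattern *best = &patterns.front();
  for (const ToyPattern &p : patterns)
    if (p.benefit > best->benefit)
      best = &p;
  return best->name;
}
```

With two entries of benefit 1, the result of `pickFirst` follows the list order. Once one entry is given benefit 2, it is picked regardless of where it sits in the list, which is the effect this patch relies on.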

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	650 ms	libc++.std/containers/sequences/array/array_creation::Unknown Unit Message ("")

Event Timeline

dfki-jugr created this revision. Jan 30 2020, 7:46 AM

Herald added a reviewer: nicolasvasilache. Jan 30 2020, 7:46 AM

Herald added a reviewer: rriddle.

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, liufengdb, herhut and 13 others.

Unit tests: fail. 62343 tests passed, 1 failed and 839 were skipped.

failed: libc++.std/containers/sequences/array/array_creation/to_array.fail.cpp

clang-tidy: pass.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster failed remote builds in B45348: Diff 241456! Jan 30 2020, 8:13 AM

This doesn't look like an ideal solution. The pattern list is supposed to be deterministic. Can you provide more details here? (ideally with repro instructions)

rriddle requested changes to this revision. Jan 30 2020, 9:40 AM

This revision now requires changes to proceed. Jan 30 2020, 9:40 AM

stella.stamenova added a subscriber: stella.stamenova. Jan 30 2020, 10:41 AM

This seems like hiding a bug, not fixing it; I'd really like us not to go down this road.

Same here, pattern benefits have important use cases. Twiddling them around to make a test pass is not an option.

Herald added a subscriber: Joonsoo. Jan 31 2020, 1:54 PM

Let's use dynamicallyIllegalOp instead to disallow the CPU intrinsic explicitly. That would also make the non-determinism go away by not relying on pattern order anymore. And it also captures the actual issue that the intrinsics call is indeed illegal.
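
The suggestion above can be illustrated with a small standalone sketch, assuming a simplified driver that rejects any lowering whose result is marked illegal. `ToyLowering`, `firstLegalLowering`, and the op names `cpu_intrinsic` and `device_call` are hypothetical placeholders, not MLIR APIs or real operation names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a lowering: its name and the op it would produce.
struct ToyLowering {
  std::string name;
  std::string producedOp;
};

// Tries the lowerings in list order and returns the name of the first one
// whose produced op is not in the illegal set. Returns an empty string if
// every candidate produces an illegal op.
std::string firstLegalLowering(const std::vector<ToyLowering> &lowerings,
                               const std::set<std::string> &illegalOps) {
  for (const ToyLowering &l : lowerings)
    if (illegalOps.count(l.producedOp) == 0)
      return l.name;
  return std::string();
}
```

If `cpu_intrinsic` is in the illegal set, a lowering that produces `device_call` is selected whether it is listed first or last. In this toy model the outcome no longer depends on the order of the candidates, which is the property the comment is after.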

In D73713#1856579, @herhut wrote:

Let's use dynamicallyIllegalOp instead to disallow the CPU intrinsic explicitly. That would also make the non-determinism go away by not relying on pattern order anymore. And it also captures the actual issue that the intrinsics call is indeed illegal.

IIUC: This does not address the underlying issue with the framework though, it seems to still be in the category of "hiding the bug".

In D73713#1858103, @mehdi_amini wrote:

In D73713#1856579, @herhut wrote:

Let's use dynamicallyIllegalOp instead to disallow the CPU intrinsic explicitly. That would also make the non-determinism go away by not relying on pattern order anymore. And it also captures the actual issue that the intrinsics call is indeed illegal.

IIUC: This does not address the underlying issue with the framework though, it seems to still be in the category of "hiding the bug".

No and no. It will not fix the underlying non-determinism issue, so that will have to be addressed. Independently, we should not rely on pattern insertion order for correctness either. So we should exclude the rewriting to LLVM intrinsics that are not available.

So let's get this correct and also fix the underlying issue.

If this is going to take more than a day or two to fix at this point, I think you should at least XFAIL the tests that are failing on Windows because any runs that include MLIR right now are red and will stay so until your fix lands. You can un-XFAIL them once you verify that they pass with the fix.

@mehdi_amini I fully get your point and I totally agree that we should not hide underlying bugs. However, one can argue that as soon as two patterns have the same benefit, their ordering is undefined. In that case, the only possible solution is to change the pattern benefit.
@herhut We follow your argument that we should unbreak the build on Windows asap. As mentioned above, this does not solve the general non-determinism issue. However, even if the ordering were deterministic, we should not rely on intrinsic magic that might be easily broken by another commit in the future.
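
Whether equal benefits really leave the order undefined depends on how ties are broken. As a standalone sketch (not taken from MLIR; `RankedPattern` and `orderByBenefit` are hypothetical names), a stable sort by decreasing benefit keeps equal-benefit entries in their insertion order, so the result is reproducible for a fixed insertion sequence. `std::sort` gives no such guarantee for equal elements. Nothing here claims that the MLIR driver sorts this way.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a rewrite pattern: a name plus a numeric benefit.
struct RankedPattern {
  std::string name;
  unsigned benefit;
};

// Orders patterns by decreasing benefit and returns their names.
// std::stable_sort guarantees that entries with equal benefit keep their
// relative input order.
std::vector<std::string> orderByBenefit(std::vector<RankedPattern> patterns) {
  std::stable_sort(patterns.begin(), patterns.end(),
                   [](const RankedPattern &a, const RankedPattern &b) {
                     return a.benefit > b.benefit;
                   });
  std::vector<std::string> names;
  for (const RankedPattern &p : patterns)
    names.push_back(p.name);
  return names;
}
```

For the input `a:1, b:2, c:1, d:2` the resulting order is `b, d, a, c`: higher benefits come first, and each tie keeps its insertion order.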

Herald added a reviewer: mravishankar. Feb 6 2020, 3:23 AM

@rriddle I've tried to come up with a minimal example including a new dialect. However, the issue does not occur in simple cases on our testing machines. The easiest way to dive into this issue would be to move line 728 in LowerGpuOpsToNVVMOps.cpp to 750:

// move this line
// populateWithGenerated(converter.getDialect()->getContext(), &patterns);
patterns
    .insert<GPUIndexIntrinsicOpLowering<gpu::ThreadIdOp, NVVM::ThreadIdXOp,
                                        NVVM::ThreadIdYOp, NVVM::ThreadIdZOp>,
            GPUIndexIntrinsicOpLowering<gpu::BlockDimOp, NVVM::BlockDimXOp,
                                        NVVM::BlockDimYOp, NVVM::BlockDimZOp>,
            GPUIndexIntrinsicOpLowering<gpu::BlockIdOp, NVVM::BlockIdXOp,
                                        NVVM::BlockIdYOp, NVVM::BlockIdZOp>,
            GPUIndexIntrinsicOpLowering<gpu::GridDimOp, NVVM::GridDimXOp,
                                        NVVM::GridDimYOp, NVVM::GridDimZOp>,
            GPUAllReduceOpLowering, GPUShuffleOpLowering, GPUFuncOpLowering,
            GPUReturnOpLowering>(converter);
patterns.insert<OpToFuncCallLowering<AbsFOp>>(converter, "__nv_fabsf",
                                             "__nv_fabs");
patterns.insert<OpToFuncCallLowering<CeilFOp>>(converter, "__nv_ceilf",
                                             "__nv_ceil");
patterns.insert<OpToFuncCallLowering<CosOp>>(converter, "__nv_cosf",
                                             "__nv_cos");
patterns.insert<OpToFuncCallLowering<ExpOp>>(converter, "__nv_expf",
                                             "__nv_exp");
patterns.insert<OpToFuncCallLowering<TanhOp>>(converter, "__nv_tanhf",
                                              "__nv_tanh");
// to here
populateWithGenerated(converter.getDialect()->getContext(), &patterns);

nicolasvasilache resigned from this revision. Oct 2 2020, 1:26 AM

Herald added a reviewer: herhut. Oct 2 2020, 1:26 AM

Herald added a reviewer: ftynse.

Herald added a project: Restricted Project.

Herald added subscribers: tatianashp, msifontes, jurahul and 5 others.

dfki-jugr abandoned this revision. Nov 13 2020, 1:31 AM

Herald added a subscriber: rdzhabarov. Nov 13 2020, 1:31 AM

Revision Contents

	Path	Size
	mlir/include/mlir/Conversion/StandardToLLVM/ConvertStandardToLLVM.h	6 lines
	mlir/include/mlir/IR/PatternMatch.h	3 lines
	mlir/lib/Conversion/GPUCommon/OpToFuncCallLowering.h	5 lines
	mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp	5 lines
	mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp	5 lines
	mlir/lib/IR/PatternMatch.cpp	6 lines

Diff 241456

mlir/include/mlir/Conversion/StandardToLLVM/ConvertStandardToLLVM.h

 public:
   /// Builds IR setting ranked memref descriptor ptr
   void setMemRefDescPtr(OpBuilder &builder, Location loc, Value value);
 };
 /// Base class for operation conversions targeting the LLVM IR dialect. Provides
 /// conversion patterns with an access to the containing LLVMLowering for the
 /// purpose of type conversions.
 class LLVMOpLowering : public ConversionPattern {
 public:
+  /// Returns the default benefit value of all LLVM op lowering patterns.
+  static PatternBenefit getDefaultBenefit() { return PatternBenefit(1); }
+
   LLVMOpLowering(StringRef rootOpName, MLIRContext *context,
-                 LLVMTypeConverter &lowering, PatternBenefit benefit = 1);
+                 LLVMTypeConverter &lowering,
+                 PatternBenefit benefit = getDefaultBenefit());

 protected:
   // Back-reference to the lowering class, used to call type and function
   // conversions accounting for potential extensions.
   LLVMTypeConverter &lowering;
 };

 } // namespace mlir

 #endif // MLIR_CONVERSION_STANDARDTOLLVM_CONVERTSTANDARDTOLLVM_H

mlir/include/mlir/IR/PatternMatch.h

@@ public:

   static PatternBenefit impossibleToMatch() { return PatternBenefit(); }
   bool isImpossibleToMatch() const { return *this == impossibleToMatch(); }

   /// If the corresponding pattern can match, return its benefit. If the
   // corresponding pattern isImpossibleToMatch() then this aborts.
   unsigned short getBenefit() const;

+  /// Increases the current benefit by one (if possible).
+  PatternBenefit increase() const;
+
   bool operator==(const PatternBenefit &rhs) const {
     return representation == rhs.representation;
   }
   bool operator!=(const PatternBenefit &rhs) const { return !(*this == rhs); }
   bool operator<(const PatternBenefit &rhs) const {
     return representation < rhs.representation;
   }

mlir/lib/Conversion/GPUCommon/OpToFuncCallLowering.h

 /// %exp_f32 = std.exp %arg_f32 : f32
 ///
 /// will be transformed into
 /// llvm.call @__nv_expf(%arg_f32) : (!llvm.float) -> !llvm.float
 template <typename SourceOp>
 struct OpToFuncCallLowering : public LLVMOpLowering {
 public:
   explicit OpToFuncCallLowering(LLVMTypeConverter &lowering_, StringRef f32Func,
-                                StringRef f64Func)
+                                StringRef f64Func,
+                                PatternBenefit benefit = getDefaultBenefit())
       : LLVMOpLowering(SourceOp::getOperationName(),
-                       lowering_.getDialect()->getContext(), lowering_),
+                       lowering_.getDialect()->getContext(), lowering_, benefit),
         f32Func(f32Func), f64Func(f64Func) {}

   PatternMatchResult
   matchAndRewrite(Operation *op, ArrayRef<Value> operands,
                   ConversionPatternRewriter &rewriter) const override {
     using LLVM::LLVMFuncOp;
     using LLVM::LLVMType;

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp

@@ void mlir::populateGpuToNVVMConversionPatterns(
   patterns.insert<OpToFuncCallLowering<AbsFOp>>(converter, "__nv_fabsf",
                                                 "__nv_fabs");
   patterns.insert<OpToFuncCallLowering<CeilFOp>>(converter, "__nv_ceilf",
                                                  "__nv_ceil");
   patterns.insert<OpToFuncCallLowering<CosOp>>(converter, "__nv_cosf",
                                                "__nv_cos");
   patterns.insert<OpToFuncCallLowering<ExpOp>>(converter, "__nv_expf",
                                                "__nv_exp");
-  patterns.insert<OpToFuncCallLowering<TanhOp>>(converter, "__nv_tanhf",
-                                                "__nv_tanh");
+  patterns.insert<OpToFuncCallLowering<TanhOp>>(
+      converter, "__nv_tanhf", "__nv_tanh",
+      LLVMOpLowering::getDefaultBenefit().increase());
 }

 std::unique_ptr<OpPassBase<gpu::GPUModuleOp>>
 mlir::createLowerGpuOpsToNVVMOpsPass() {
   return std::make_unique<LowerGpuOpsToNVVMOpsPass>();
 }

 static PassRegistration<LowerGpuOpsToNVVMOpsPass>
     pass("convert-gpu-to-nvvm", "Generate NVVM operations for gpu operations");

mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp

@@ void runOnOperation() override {
     patterns.insert<OpToFuncCallLowering<AbsFOp>>(converter, "__ocml_fabs_f32",
                                                   "__ocml_fabs_f64");
     patterns.insert<OpToFuncCallLowering<CeilFOp>>(converter, "__ocml_ceil_f32",
                                                    "__ocml_ceil_f64");
     patterns.insert<OpToFuncCallLowering<CosOp>>(converter, "__ocml_cos_f32",
                                                  "__ocml_cos_f64");
     patterns.insert<OpToFuncCallLowering<ExpOp>>(converter, "__ocml_exp_f32",
                                                  "__ocml_exp_f64");
-    patterns.insert<OpToFuncCallLowering<TanhOp>>(converter, "__ocml_tanh_f32",
-                                                  "__ocml_tanh_f64");
+    patterns.insert<OpToFuncCallLowering<TanhOp>>(
+        converter, "__ocml_tanh_f32", "__ocml_tanh_f64",
+        LLVMOpLowering::getDefaultBenefit().increase());

     ConversionTarget target(getContext());
     target.addLegalDialect<LLVM::LLVMDialect, ROCDL::ROCDLDialect>();
     target.addIllegalOp<LLVM::FAbsOp, LLVM::FCeilOp, LLVM::CosOp,
                         LLVM::ExpOp>();
     target.addDynamicallyLegalOp<FuncOp>(
         [&](FuncOp op) { return converter.isSignatureLegal(op.getType()); });
     if (failed(applyPartialConversion(m, target, patterns, &converter)))

mlir/lib/IR/PatternMatch.cpp

 }

 unsigned short PatternBenefit::getBenefit() const {
   assert(representation != ImpossibleToMatchSentinel &&
          "Pattern doesn't match");
   return representation;
 }

+PatternBenefit PatternBenefit::increase() const {
+  if (representation + 1 < ImpossibleToMatchSentinel)
+    return PatternBenefit(representation + 1);
+  return *this;
+}
+
 //===----------------------------------------------------------------------===//
 // Pattern implementation
 //===----------------------------------------------------------------------===//

 Pattern::Pattern(StringRef rootName, PatternBenefit benefit,
                  MLIRContext *context)
     : rootKind(OperationName(rootName, context)), benefit(benefit) {}
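
The saturating behaviour of `increase()` in the hunk above can be tried out in isolation with the following sketch. `ToyBenefit` is a hypothetical stand-in, and the sketch assumes that the benefit is stored in an `unsigned short` whose largest value is reserved as the "impossible to match" sentinel; that assumption is not confirmed by the lines shown in this diff.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for a pattern benefit. Assumption: the largest
// unsigned short value is reserved as the "impossible to match" sentinel.
class ToyBenefit {
public:
  static constexpr unsigned short kImpossible =
      std::numeric_limits<unsigned short>::max();

  explicit ToyBenefit(unsigned short value) : representation(value) {}

  // Returns a benefit that is one higher, unless that would reach the
  // sentinel, in which case the current benefit is returned unchanged.
  // The addition is done in int after integer promotion, so it cannot wrap.
  ToyBenefit increase() const {
    if (representation + 1 < kImpossible)
      return ToyBenefit(static_cast<unsigned short>(representation + 1));
    return *this;
  }

  unsigned short get() const { return representation; }

private:
  unsigned short representation;
};
```

Under these assumptions, `ToyBenefit(1).increase().get()` yields 2, while a benefit that is already one below the sentinel is left unchanged, so repeated calls can never turn a matchable benefit into the sentinel value.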
