This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][GPU] Disallow llvm tanh intrinsics when lowering to NVVM/ROCm.
ClosedPublic

Authored by herhut on Feb 11 2020, 1:26 AM.

Details

Summary

The lowering to NVVM and ROCm handles tanh operations differently by
mapping them to NVVM/ROCm-specific intrinsics. This conflicts with
the lowering to LLVM, which uses the default LLVM intrinsic. This change
declares the LLVM intrinsics to be illegal, hence disallowing the
corresponding rewrite.
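
As a rough illustration, the conflict can be resolved on the ConversionTarget: the GPU-to-NVVM lowering marks the generic LLVM intrinsic ops illegal so that only the NVVM-specific patterns can apply. This is a minimal sketch, not the literal patch; the helper name, header paths, and exact op list are assumptions.

```cpp
// Minimal sketch, assuming a helper called while setting up the GPU-to-NVVM
// conversion. The helper name, header paths, and op list are illustrative;
// the committed patch may differ.
#include "mlir/Dialect/GPU/GPUDialect.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/LLVMIR/NVVMDialect.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

static void configureGpuToNVVMLegality(ConversionTarget &target) {
  target.addLegalDialect<LLVM::LLVMDialect, NVVM::NVVMDialect>();
  target.addIllegalDialect<gpu::GPUDialect>();
  // Declaring the generic LLVM intrinsic ops illegal makes the conversion
  // driver roll back any Std->LLVM pattern that produces them, so the
  // NVVM/ROCm-specific lowerings (e.g. for tanh) win instead.
  target.addIllegalOp<LLVM::ExpOp, LLVM::FAbsOp, LLVM::CosOp, LLVM::Log2Op>();
}
```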

Diff Detail

Event Timeline

herhut created this revision.Feb 11 2020, 1:26 AM
herhut updated this revision to Diff 243766.Feb 11 2020, 1:33 AM

Explicit lambda instead of std::bind.

PTAL.

Writing this in code for discussion. Is this the way to resolve conflicts, or do we want to hand-pick patterns from the LLVM conversion? The latter seems brittle as well, as the number of patterns keeps growing and we would need to track it. By disallowing, we can do this whenever we add a new intrinsic.

I am somewhat torn between the two myself, so I wrote it down to see what it looks like. This would unbreak the Windows builds.

This would work for me. At scale, we may need to analyze the effect of such an approach on compile time. The converter essentially has twice the number of patterns to search through, and there are potential rollbacks of rewrites that generated illegal operations.

In general, I think we want to expose individual LLVM patterns anyway, because several people have hit the selection problem repeatedly. I haven't had time to do that yet.

For intrinsics specifically, I have a vague idea of some fallback mechanism, as in: the default Std->LLVM-intrinsic patterns are enabled unless there is a different pattern.

mlir/lib/Conversion/GPUCommon/OpToFuncCallLowering.h, line 103

Nit: SmallVector is re-exported into the mlir namespace; drop the llvm::.
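
Applied to an arbitrary declaration, the nit looks as follows; the variable name and element type are stand-ins rather than the actual code in OpToFuncCallLowering.h.

```cpp
// Before (explicitly qualified):
//   llvm::SmallVector<Value, 4> castedOperands;
// After: SmallVector is re-exported into the mlir namespace, so the
// llvm:: qualifier can be dropped inside MLIR code.
SmallVector<Value, 4> castedOperands;
```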

Harbormaster completed remote builds in B46188: Diff 243766.
herhut updated this revision to Diff 243772.Feb 11 2020, 2:14 AM

Drop llvm.

herhut marked an inline comment as done.Feb 11 2020, 2:20 AM

This would work for me. At scale, we may need to analyze the effect of such an approach on compile time. The converter essentially has twice the number of patterns to search through, and there are potential rollbacks of rewrites that generated illegal operations.

In general, I think we want to expose individual LLVM patterns anyway, because several people have hit the selection problem repeatedly. I haven't had time to do that yet.

The granularity of exposing them is not clear to me. I also thought about an intrinsics/non-intrinsics split, but even the definition of what intrinsic-based lowering means is vague: LLVM::ExpOp is an operation but is ultimately lowered to some intrinsic. One way to make this cleaner would be to expose all intrinsics as LLVM dialect operations in MLIR and only lower them to intrinsics when going to LLVM proper. Then at least we can filter on operations and not intrinsic names.
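
A hedged sketch of the two filtering granularities being contrasted here; the chosen op, the ".intr." naming assumption, and the target variable are illustrative and not taken from the patch.

```cpp
// Filtering on operations: possible once an intrinsic is modelled as a
// first-class LLVM dialect op, as LLVM::ExpOp already is.
target.addIllegalOp<LLVM::ExpOp>();

// Filtering on intrinsic names: the brittle alternative. The ".intr."
// substring is an assumption about how intrinsic wrapper ops are named,
// not a stable contract.
target.addDynamicallyLegalDialect<LLVM::LLVMDialect>([](Operation *op) {
  return !op->getName().getStringRef().contains(".intr.");
});
```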

For intrinsics specifically, I have a vague idea of some fallback mechanism, as in: the default Std->LLVM-intrinsic patterns are enabled unless there is a different pattern.

It would be great if one could define priorities of patterns so that certain patterns shadow others. This is different from benefit in that it would have stronger semantics (disallowing patterns rather than preferring them). One idea would be to group patterns into formal groups and then allow specifying a partial order on these groups, which would determine the application order of patterns. That would solve our case, where NVVM patterns are meant to shadow LLVM patterns.
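
For contrast, this is roughly what the existing benefit mechanism looks like; the class name and SourceTanhOp are placeholders for the source dialect's tanh op at this revision, and the body is elided. A higher benefit only makes the driver try this pattern first, which is why it cannot express the hard shadowing described above.

```cpp
// Minimal sketch; SourceTanhOp is a stand-in, not a real op class.
struct TanhToNVVMPattern : public OpRewritePattern<SourceTanhOp> {
  TanhToNVVMPattern(MLIRContext *ctx)
      : OpRewritePattern<SourceTanhOp>(ctx, /*benefit=*/2) {}

  LogicalResult matchAndRewrite(SourceTanhOp op,
                                PatternRewriter &rewriter) const override {
    // Would rewrite the op into a call to the device math library here
    // (e.g. __nv_tanhf / __nv_tanh); elided in this sketch.
    return failure();
  }
};
```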

For now, should we land this?

ftynse accepted this revision.Feb 11 2020, 2:39 AM

The granularity of exposing them is not clear to me. I also thought about an intrinsics/non-intrinsics split, but even the definition of what intrinsic-based lowering means is vague: LLVM::ExpOp is an operation but is ultimately lowered to some intrinsic.

The notion of intrinsic does not exist in MLIR proper; everything is an operation. I replicated it in the LLVM dialect to better match LLVM IR and to keep the lowering common.

One way to make this cleaner would be to expose all intrinsics as LLVM dialect operations in MLIR and only lower them to intrinsics when going to LLVM proper. Then at least we can filter on operations and not intrinsic names.

We should absolutely do that.

For intrinsics specifically, I have a vague idea of some fallback mechanism, as in: the default Std->LLVM-intrinsic patterns are enabled unless there is a different pattern.

It would be great if one could define priorities of patterns so that certain patterns shadow others. This is different from benefit in that it would have stronger semantics (disallowing patterns rather than preferring them). One idea would be to group patterns into formal groups and then allow specifying a partial order on these groups, which would determine the application order of patterns. That would solve our case, where NVVM patterns are meant to shadow LLVM patterns.

For now, should we land this?

Yes, if it fixes the build.

This revision is now accepted and ready to land.Feb 11 2020, 2:39 AM
This revision was automatically updated to reflect the committed changes.