This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add amdgpu-unify-metadata pass
ClosedPublic

Authored by rampitec on Oct 7 2016, 2:50 PM.

Details

Reviewers

tony-tye
tstellarAMD
vpykhtin
arsenm

Commits

rG50ea93a2bda4: [AMDGPU] Add amdgpu-unify-metadata pass
rL289092: [AMDGPU] Add amdgpu-unify-metadata pass

Summary

Linking several modules creates multiple metadata values for records such as opencl.ocl.version, llvm.ident and similar. For some of them, notably opencl.ocl.version, this creates a semantic problem because we cannot tell which version of OpenCL the composite module conforms to.

Moreover, such repetitions of identical values often create a huge list of unneeded metadata, which grows the bitcode size both in memory and on disk. It can grow by up to several MB when linked against our OpenCL library. Lastly, such long lists make dumped IR harder to read.

The pass unifies metadata after linking.

Ideally we would like to run this as the last step during linking, but no such interface is available to a target. Therefore it is run as the first step in the optimizer, via AMDGPUTargetMachine::addEarlyAsPossiblePasses(). There is a drawback: passes added this way go to a function pass manager, so while the pass should be a ModulePass by nature, it is written as a function pass that works on the function's parent module. Note that the second and further invocations for other functions do nothing because the metadata is already unified. In the future we may consider converting it back to a ModulePass.
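The "function pass that works on its parent module" arrangement can be sketched without LLVM. The following is only a toy model: ToyModule, ToyFunction and unifyModule are hypothetical stand-ins, not types or functions from the patch. It shows why the second and later invocations report no change.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Module: it only models the operand
// list of one named metadata record such as !llvm.ident.
struct ToyModule {
  std::vector<std::string> Ident;
};

// Hypothetical stand-in for llvm::Function: it only knows its parent.
struct ToyFunction {
  ToyModule *Parent;
};

// Module-level work: keep only the first operand.
// Returns true if the module was changed.
bool unifyModule(ToyModule &M) {
  if (M.Ident.size() <= 1)
    return false; // already unified, nothing to do
  M.Ident.resize(1);
  return true;
}

// Function-level entry point that forwards to the parent module,
// mirroring the shape of runOnFunction() in the patch.
bool runOnFunction(ToyFunction &F) { return unifyModule(*F.Parent); }
```

Calling runOnFunction() on two functions that share a module changes the module on the first call only; the second call finds a single operand and returns false.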

For the OpenCL version we could use two modes, selected by the last argument of unifyVersionMD(): pick the largest or pick the first one. In general the largest may be more correct, but in practice the first is the right one with our library. The library is built as OpenCL 2.0, although calls to functions which are not 2.0 specific do not use 2.0-specific features. The first value in the list is always the user's kernel module version, so we can rely on that first value. We may reconsider this in the future, but that would also require us to split the library into 1.2 and 2.0 portions.
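The two selection modes can be illustrated on plain (major, minor) pairs. This is a minimal sketch rather than the pass itself: Version, encode and pickVersion are names assumed for illustration, and only the encoding (major * 100 + minor * 10) is taken from the patch.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Version = std::pair<unsigned, unsigned>; // {major, minor}

// Same encoding as in the patch: major * 100 + minor * 10.
unsigned encode(const Version &V) { return V.first * 100 + V.second * 10; }

// Returns the version to keep. Precondition: Versions is non-empty.
// PickFirst == true  -> the first entry (the kernel module's version).
// PickFirst == false -> the largest entry by encoded value.
Version pickVersion(const std::vector<Version> &Versions, bool PickFirst) {
  assert(!Versions.empty());
  if (PickFirst)
    return Versions.front();
  Version Best = Versions.front();
  for (const Version &V : Versions)
    if (encode(V) > encode(Best))
      Best = V;
  return Best;
}
```

For a kernel built as 1.2 and linked against a 2.0 library, the list is {1,2}, {2,0}, ...; the pick-first mode keeps {1,2} while the pick-largest mode would keep {2,0}.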

Diff Detail

Repository: rL LLVM

Event Timeline

rampitec updated this revision to Diff 73987. Oct 7 2016, 2:50 PM

rampitec retitled this revision from to [AMDGPU] Add amdgpu-unify-metadata pass.

rampitec updated this object.

rampitec added reviewers: tstellarAMD, vpykhtin.

rampitec set the repository for this revision to rL LLVM.

rampitec added a project: Restricted Project.

rampitec added a subscriber: llvm-commits.

Herald edited edge metadata. Oct 7 2016, 2:50 PM

Herald added subscribers: tony-tye, yaxunl, mgorny and 5 others.

I think Matt had a patch to do this in common code.

rampitec added a reviewer: arsenm. Oct 10 2016, 12:24 PM

This should be handled by the generic linker, this isn't an AMDGPU specific issue. I had this https://reviews.llvm.org/D20582 (but there are more comments not visible in phabricator). An RFC should be posted for whether all named metadata should behave like a set

In D25381#567873, @arsenm wrote:

This should be handled by the generic linker, this isn't an AMDGPU specific issue. I had this https://reviews.llvm.org/D20582 (but there are more comments not visible in phabricator). An RFC should be posted for whether all named metadata should behave like a set

I see. D20582 only covers llvm.ident. The same happens to opencl.ocl.version, spir.version and others. Notably opencl and spir version handling are specific to the target stack. Like in our case we cannot use maximum version, but have to take version from the kernel module. Some other implementation probably would want different behavior. So in my opinion D20582 is a good move, but does not replace this change.

In D25381#568446, @rampitec wrote:

In D25381#567873, @arsenm wrote:

This should be handled by the generic linker, this isn't an AMDGPU specific issue. I had this https://reviews.llvm.org/D20582 (but there are more comments not visible in phabricator). An RFC should be posted for whether all named metadata should behave like a set

I see. D20582 only covers llvm.ident. The same happens to opencl.ocl.version, spir.version and others. Notably opencl and spir version handling are specific to the target stack. Like in our case we cannot use maximum version, but have to take version from the kernel module. Some other implementation probably would want different behavior. So in my opinion D20582 is a good move, but does not replace this change.

Take a look at the proposed solution here: https://marc.info/?l=llvm-commits&m=147370670326170&w=2 This is what Matt was referring to when he mentioned doing a RFC on the list.

In D25381#568507, @tstellarAMD wrote:

Take a look at the proposed solution here: https://marc.info/?l=llvm-commits&m=147370670326170&w=2 This is what Matt was referring to when he mentioned doing a RFC on the list.

That's an interesting idea, but a long haul. Even then I do not see how it would resolve the problem of choosing the right version from the proposed vector of values.
I mean the patch here is not ideal either, but it resolves the problem for now. I guess when an RFC is created and implemented we would still need something like this to make a target choice.

For the OpenCL version metadata, is there anything in the backend that uses it?

In D25381#569897, @tstellarAMD wrote:

For the OpenCL version metadata, is there anything in the backend that uses it?

One thing is that OpenCL 2.0 supports non-uniform work-group sizes, and they are non-uniform by default. A query to get_local_size and similar becomes a surprisingly long expansion, and you can no longer constant fold a known work-group size. Under OCL 1.2 you can optimize it freely.
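To make the cost difference concrete, here is a simplified one-dimensional model of the two cases. It is an assumption-laden sketch and not the actual AMDGPU library expansion: localSizeUniform and localSizeNonUniform are hypothetical names, and the global offset and extra dimensions are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Uniform work-groups: every group has exactly the enqueued size, so a
// known work-group size folds to a constant.
std::size_t localSizeUniform(std::size_t EnqueuedLocalSize) {
  return EnqueuedLocalSize;
}

// Non-uniform work-groups: the last group in a dimension may be smaller,
// so the result depends on the global size and the group id.
// Precondition: GroupId * EnqueuedLocalSize < GlobalSize.
std::size_t localSizeNonUniform(std::size_t GlobalSize,
                                std::size_t EnqueuedLocalSize,
                                std::size_t GroupId) {
  std::size_t Remaining = GlobalSize - GroupId * EnqueuedLocalSize;
  return std::min(EnqueuedLocalSize, Remaining);
}
```

With a global size of 100 and an enqueued local size of 64, group 0 sees 64 work-items and group 1 sees the remaining 36, which is why the value can no longer be treated as a compile-time constant.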

rampitec added a reviewer: tony-tye. Oct 31 2016, 12:07 PM

In D25381#569897, @tstellarAMD wrote:

For the OpenCL version metadata, is there anything in the backend that uses it?

The runtime metadata relies on it to know the OpenCL version.

It seems like merging the OpenCL metadata would have to be done in an AMDGPU backend pass, since it uses rules specific to our target. However, I think de-duplicating generic metadata like llvm.ident should be handled in a target independent way.

lib/Target/AMDGPU/AMDGPUUnifyMetadata.cpp
Line 105 (on Diff #73987): Would it be better to use Twine here?

In D25381#588102, @tstellarAMD wrote:

It seems like merging the OpenCl metadata would have to be done in an AMDGPU backend pass, since it uses rules specific to our target. However, I think de-duplicating generic metadata like .ident should be handled in a target independent way.

I agree. When D20582 is submitted we can remove one line from this change.

lib/Target/AMDGPU/AMDGPUUnifyMetadata.cpp
Line 105 (on Diff #73987): I have to modify the string on each iteration (potentially). That is not how Twine works; it is not designed to be modified after creation. In particular it does not have operator=(), only constructors.

vpykhtin added inline comments. Dec 2 2016, 4:49 AM

test/CodeGen/AMDGPU/unify-metadata.ll
Line 20 (on Diff #73987): How about adding a test for deduplicating extension strings?

Changed source to conform to SPIR specification - opencl.* named metadata all contain a single metadata reference, which is itself a list of values.
Added opencl.used.extensions test.
Changed loops to C++11 range syntax.

rampitec marked 3 inline comments as done. Dec 2 2016, 1:36 PM

LGTM.

This revision is now accepted and ready to land. Dec 8 2016, 4:27 AM

Closed by commit rL289092: [AMDGPU] Add amdgpu-unify-metadata pass (authored by rampitec). Dec 8 2016, 11:56 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/AMDGPU/AMDGPU.h	4 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.h	1 line
llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	6 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUUnifyMetadata.cpp	147 lines
llvm/trunk/lib/Target/AMDGPU/CMakeLists.txt	1 line
llvm/trunk/test/CodeGen/AMDGPU/unify-metadata.ll	26 lines

Diff 80801

llvm/trunk/lib/Target/AMDGPU/AMDGPU.h

[84 lines not shown]

  Pass *createAMDGPUStructurizeCFGPass();
  FunctionPass *createAMDGPUISelDag(TargetMachine &TM,
                                    CodeGenOpt::Level OptLevel);
  ModulePass *createAMDGPUAlwaysInlinePass();
  ModulePass *createAMDGPUOpenCLImageTypeLoweringPass();
  FunctionPass *createAMDGPUAnnotateUniformValues();

+ FunctionPass* createAMDGPUUnifyMetadataPass();
+ void initializeAMDGPUUnifyMetadataPass(PassRegistry&);
+ extern char &AMDGPUUnifyMetadataID;

  void initializeSIFixControlFlowLiveIntervalsPass(PassRegistry&);
  extern char &SIFixControlFlowLiveIntervalsID;

  void initializeAMDGPUAnnotateUniformValuesPass(PassRegistry&);
  extern char &AMDGPUAnnotateUniformValuesPassID;

  void initializeAMDGPUCodeGenPreparePass(PassRegistry&);
  extern char &AMDGPUCodeGenPrepareID;

[71 lines not shown]

llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.h

[44 lines not shown; context: public:]

    const AMDGPUIntrinsicInfo *getIntrinsicInfo() const override {
      return &IntrinsicInfo;
    }
    TargetIRAnalysis getTargetIRAnalysis() override;

    TargetLoweringObjectFile *getObjFileLowering() const override {
      return TLOF.get();
    }
+   void addEarlyAsPossiblePasses(PassManagerBase &PM) override;
  };

  //===----------------------------------------------------------------------===//
  // R600 Target Machine (R600 -> Cayman)
  //===----------------------------------------------------------------------===//

  class R600TargetMachine final : public AMDGPUTargetMachine {
  private:

[35 lines not shown]

llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[28 lines not shown]

  #include "llvm/CodeGen/Passes.h"
  #include "llvm/CodeGen/TargetPassConfig.h"
  #include "llvm/Support/TargetRegistry.h"
  #include "llvm/Transforms/IPO.h"
  #include "llvm/Transforms/IPO/AlwaysInliner.h"
  #include "llvm/Transforms/Scalar.h"
  #include "llvm/Transforms/Scalar/GVN.h"
  #include "llvm/Transforms/Vectorize.h"
+ #include "llvm/IR/LegacyPassManager.h"

  using namespace llvm;

  static cl::opt<bool> EnableR600StructurizeCFG(
    "r600-ir-structurize",
    cl::desc("Use StructurizeCFG IR pass"),
    cl::init(true));

[35 lines not shown; context: extern "C" void LLVMInitializeAMDGPUTarget() {]

    initializeSIFoldOperandsPass(*PR);
    initializeSIShrinkInstructionsPass(*PR);
    initializeSIFixControlFlowLiveIntervalsPass(*PR);
    initializeSILoadStoreOptimizerPass(*PR);
    initializeAMDGPUAnnotateKernelFeaturesPass(*PR);
    initializeAMDGPUAnnotateUniformValuesPass(*PR);
    initializeAMDGPUPromoteAllocaPass(*PR);
    initializeAMDGPUCodeGenPreparePass(*PR);
+   initializeAMDGPUUnifyMetadataPass(*PR);
    initializeSIAnnotateControlFlowPass(*PR);
    initializeSIInsertWaitsPass(*PR);
    initializeSIWholeQuadModePass(*PR);
    initializeSILowerControlFlowPass(*PR);
    initializeSIInsertSkipsPass(*PR);
    initializeSIDebuggerInsertNopsPass(*PR);
    initializeSIOptimizeExecMaskingPass(*PR);
  }

[88 lines not shown]

  StringRef AMDGPUTargetMachine::getFeatureString(const Function &F) const {
    Attribute FSAttr = F.getFnAttribute("target-features");

    return FSAttr.hasAttribute(Attribute::None) ?
      getTargetFeatureString() :
      FSAttr.getValueAsString();
  }

+ void AMDGPUTargetMachine::addEarlyAsPossiblePasses(PassManagerBase &PM) {
+   PM.add(llvm::createAMDGPUUnifyMetadataPass());
+ }

  //===----------------------------------------------------------------------===//
  // R600 Target Machine (R600 -> Cayman)
  //===----------------------------------------------------------------------===//

  R600TargetMachine::R600TargetMachine(const Target &T, const Triple &TT,
                                       StringRef CPU, StringRef FS,
                                       TargetOptions Options,
                                       Optional<Reloc::Model> RM,

[431 lines not shown]

llvm/trunk/lib/Target/AMDGPU/AMDGPUUnifyMetadata.cpp

				//===-- AMDGPUUnifyMetadata.cpp - Unify OpenCL metadata -------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// \file
				// \brief This pass that unifies multiple OpenCL metadata due to linking.
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPU.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/Module.h"
				#include "llvm/Pass.h"

				using namespace llvm;

				namespace {
				namespace kOCLMD {
				const char SpirVer[] = "opencl.spir.version";
				const char OCLVer[] = "opencl.ocl.version";
				const char UsedExt[] = "opencl.used.extensions";
				const char UsedOptCoreFeat[] = "opencl.used.optional.core.features";
				const char CompilerOptions[] = "opencl.compiler.options";
				const char LLVMIdent[] = "llvm.ident";
				}

				/// \brief Unify multiple OpenCL metadata due to linking.
				class AMDGPUUnifyMetadata : public FunctionPass {
				public:
				static char ID;
				explicit AMDGPUUnifyMetadata() : FunctionPass(ID) {};

				private:
				// This should really be a module pass but we have to run it as early
				// as possible, so given function passes are executed first and
				// TargetMachine::addEarlyAsPossiblePasses() expects only function passes
				// it has to be a function pass.
				virtual bool runOnModule(Module &M);

				// \todo: Convert to a module pass.
				virtual bool runOnFunction(Function &F);

				/// \brief Unify version metadata.
				/// \return true if changes are made.
				/// Assume the named metadata has operands each of which is a pair of
				/// integer constant, e.g.
				/// !Name = {!n1, !n2}
				/// !n1 = {i32 1, i32 2}
				/// !n2 = {i32 2, i32 0}
				/// Keep the largest version as the sole operand if PickFirst is false.
				/// Otherwise pick it from the first value, representing kernel module.
				bool unifyVersionMD(Module &M, StringRef Name, bool PickFirst) {
				auto NamedMD = M.getNamedMetadata(Name);
				if (!NamedMD || NamedMD->getNumOperands() <= 1)
				return false;
				MDNode *MaxMD = nullptr;
				auto MaxVer = 0U;
				for (const auto &VersionMD : NamedMD->operands()) {
				assert(VersionMD->getNumOperands() == 2);
				auto CMajor = mdconst::extract<ConstantInt>(VersionMD->getOperand(0));
				auto VersionMajor = CMajor->getZExtValue();
				auto CMinor = mdconst::extract<ConstantInt>(VersionMD->getOperand(1));
				auto VersionMinor = CMinor->getZExtValue();
				auto Ver = (VersionMajor * 100) + (VersionMinor * 10);
				if (Ver > MaxVer) {
				MaxVer = Ver;
				MaxMD = VersionMD;
				}
				if (PickFirst)
				break;
				}
				NamedMD->eraseFromParent();
				NamedMD = M.getOrInsertNamedMetadata(Name);
				NamedMD->addOperand(MaxMD);
				return true;
				}

				/// \brief Unify version metadata.
				/// \return true if changes are made.
				/// Assume the named metadata has operands each of which is a list e.g.
				/// !Name = {!n1, !n2}
				/// !n1 = !{!"cl_khr_fp16", {!"cl_khr_fp64"}}
				/// !n2 = !{!"cl_khr_image"}
				/// Combine it into a single list with unique operands.
				bool unifyExtensionMD(Module &M, StringRef Name) {
				auto NamedMD = M.getNamedMetadata(Name);
				if (!NamedMD || NamedMD->getNumOperands() == 1)
				return false;

				SmallVector<Metadata *, 4> All;
				for (const auto &MD : NamedMD->operands())
				for (const auto &Op : MD->operands())
				if (std::find(All.begin(), All.end(), Op.get()) == All.end())
				All.push_back(Op.get());

				NamedMD->eraseFromParent();
				NamedMD = M.getOrInsertNamedMetadata(Name);
				NamedMD->addOperand(MDNode::get(M.getContext(), All));
				return true;
				}
				};

				} // end anonymous namespace

				char AMDGPUUnifyMetadata::ID = 0;

				char &llvm::AMDGPUUnifyMetadataID = AMDGPUUnifyMetadata::ID;

				INITIALIZE_PASS(AMDGPUUnifyMetadata, "amdgpu-unify-metadata",
				"Unify multiple OpenCL metadata due to linking",
				false, false)

				FunctionPass* llvm::createAMDGPUUnifyMetadataPass() {
				return new AMDGPUUnifyMetadata();
				}

				bool AMDGPUUnifyMetadata::runOnModule(Module &M) {
				const char* Vers[] = {
				kOCLMD::SpirVer,
				kOCLMD::OCLVer
				};
				const char* Exts[] = {
				kOCLMD::UsedExt,
				kOCLMD::UsedOptCoreFeat,
				kOCLMD::CompilerOptions,
				kOCLMD::LLVMIdent
				};

				bool Changed = false;

				for (auto &I : Vers)
				Changed |= unifyVersionMD(M, I, true);

				for (auto &I : Exts)
				Changed |= unifyExtensionMD(M, I);

				return Changed;
				}

				bool AMDGPUUnifyMetadata::runOnFunction(Function &F) {
				return runOnModule(*F.getParent());
				}

llvm/trunk/lib/Target/AMDGPU/CMakeLists.txt

[35 lines not shown; context: add_llvm_target(AMDGPUCodeGen]

    AMDGPUAsmPrinter.cpp
    AMDGPUCodeGenPrepare.cpp
    AMDGPUFrameLowering.cpp
    AMDGPUTargetObjectFile.cpp
    AMDGPUIntrinsicInfo.cpp
    AMDGPUISelDAGToDAG.cpp
    AMDGPUMCInstLower.cpp
    AMDGPUMachineFunction.cpp
+   AMDGPUUnifyMetadata.cpp
    AMDGPUOpenCLImageTypeLoweringPass.cpp
    AMDGPUSubtarget.cpp
    AMDGPUTargetMachine.cpp
    AMDGPUTargetTransformInfo.cpp
    AMDGPUISelLowering.cpp
    AMDGPUInstrInfo.cpp
    AMDGPUPromoteAlloca.cpp
    AMDGPURegisterInfo.cpp

[43 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/unify-metadata.ll

				; RUN: opt -mtriple=amdgcn--amdhsa -amdgpu-unify-metadata -S < %s | FileCheck -check-prefix=ALL %s

				; This test checks that we have a single metadata value after linking several
				; modules for records such as opencl.ocl.version, llvm.ident and similar.

				; ALL-DAG: !opencl.ocl.version = !{![[OCL_VER:[0-9]+]]}
				; ALL-DAG: !llvm.ident = !{![[LLVM_IDENT:[0-9]+]]}
				; ALL-DAG: !opencl.used.extensions = !{![[USED_EXT:[0-9]+]]}
				; ALL-DAG: ![[OCL_VER]] = !{i32 1, i32 2}
				; ALL-DAG: ![[LLVM_IDENT]] = !{!"clang version 4.0 "}
				; ALL-DAG: ![[USED_EXT]] = !{!"cl_images", !"cl_khr_fp16", !"cl_doubles"}

				define void @test() {
				ret void
				}

				!opencl.ocl.version = !{!1, !0, !0, !0}
				!llvm.ident = !{!2, !2, !2, !2}
				!opencl.used.extensions = !{!3, !3, !4, !5}

				!0 = !{i32 2, i32 0}
				!1 = !{i32 1, i32 2}
				!2 = !{!"clang version 4.0 "}
				!3 = !{!"cl_images", !"cl_khr_fp16"}
				!4 = !{!"cl_images", !"cl_doubles"}
				!5 = !{}
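
The expected !opencl.used.extensions result in the test above follows from an order-preserving de-duplication across all operand lists. The sketch below models that on plain strings; it is not the pass itself (the pass compares Metadata pointers, whereas this toy version compares string contents), and List and unifyLists are names assumed for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using List = std::vector<std::string>;

// Flatten all operand lists into one list, keeping the first occurrence
// of each entry and preserving the original order.
List unifyLists(const std::vector<List> &Operands) {
  List All;
  for (const List &L : Operands)
    for (const std::string &S : L)
      if (std::find(All.begin(), All.end(), S) == All.end())
        All.push_back(S);
  return All;
}
```

Feeding it the test's four operand lists (!3, !3, !4, !5) yields "cl_images", "cl_khr_fp16", "cl_doubles", matching the USED_EXT check line. The linear std::find keeps the sketch close to the patch; for the short lists involved that cost is negligible.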