This is an archive of the discontinued LLVM Phabricator instance.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
598 ↗	(On Diff #111514)	"llvm::" prefix is not needed.
616 ↗	(On Diff #111514)	You should not use getOrInsertFunction directly, but use AMDGPULibCalls::getFunction(). Since you are not using it you also fail to check we are in pre-link.
645 ↗	(On Diff #111514)	You should use AMDGPULibFunc and the parser/mangler used throughout this source file.

This revision now requires changes to proceed. Aug 17 2017, 10:26 AM

rampitec added a reviewer: vpykhtin. Aug 17 2017, 10:33 AM

yaxunl marked 3 inline comments as done. Aug 20 2017, 2:38 PM

yaxunl added inline comments.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
598 ↗	(On Diff #111514)	will remove
616 ↗	(On Diff #111514)	Whether we are in pre-link can be checked by whether the function is a declaration. `__read_pipe_` and `__write_pipe_` are unmangled functions. AMDGPULibFunc::getFunction relies on argument information encoded in mangled function names, which does not work for unmangled functions.
645 ↗	(On Diff #111514)	will refactor this.

refactor AMDGPULibFunc to handle unmangled lib functions.

add check for declaration to avoid post-link transformation of pipe functions.

You still fail to use getFunction and thus also fail to check prelink.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
67 ↗	(On Diff #111912)	It is better to hide these details from the client. AMDGPULibFunc should take care of all of it, and FuncInfo should be a sufficient structure to describe a function. The simplifier should not care about the details.
616 ↗	(On Diff #111514)	You are still using M->getOrInsertFunction() directly.
lib/Target/AMDGPU/AMDGPULibFunc.cpp
987 ↗	(On Diff #111912)	I believe we can do without the brute-force search loop.
lib/Target/AMDGPU/AMDGPULibFunc.h
240 ↗	(On Diff #111912)	There is already EI_READ_PIPE above. Is it mangled? Then why are these unmangled?
366 ↗	(On Diff #111912)	StringRef is designed to be passed by value, here and below.
test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗	(On Diff #111912)	Run tests through opt -instnamer.
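The pass-by-value remark about StringRef can be illustrated with a small sketch. It uses std::string_view (C++17) as a stand-in for llvm::StringRef, on the assumption that both are cheap non-owning views; the helper names hasPipePrefix and consumePrefix are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: a read-only view is just a pointer plus a length,
// so it is taken by value rather than by const reference.
bool hasPipePrefix(std::string_view Name) {
  constexpr std::string_view Prefix = "__read_pipe_";
  return Name.substr(0, Prefix.size()) == Prefix;
}

// Hypothetical helper: a non-const reference remains appropriate when the
// callee has to hand the shortened view back to the caller.
bool consumePrefix(std::string_view &Name, std::string_view Prefix) {
  if (Name.substr(0, Prefix.size()) != Prefix)
    return false;
  Name.remove_prefix(Prefix.size());
  return true;
}
```

Only the read-only parameter changes from a const reference to a value; an in/out parameter that is deliberately modified keeps its reference.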

This revision now requires changes to proceed. Aug 22 2017, 9:31 AM

Clang does not emit mangled functions for pipe builtin functions. Instead, it emits unmangled functions, e.g. __read_pipe_*. These functions are not overloaded and will be implemented by the device library.

The original implementation of AMDGPULibFunc assumes all library functions have mangled names. It extracts function argument information from the mangled function name and uses it to implement getOrInsertFunction.

For unmangled library functions, the member functions dealing with mangling or function parameter info are either irrelevant or useless. They are needed for mangled functions only because mangling requires such information.

Therefore I subclass AMDGPULibFunc as AMDGPUMangledLibFunc and AMDGPUUnmangledLibFunc and move everything needed for mangling to AMDGPUMangledLibFunc.

For AMDGPUUnmangledLibFunc, since there is no argument type information from mangling, AMDGPULibFunc::getOrInsertFunction cannot be implemented. The original implementation has no special trick for identifying whether it is prelink or not; it just checks whether the function is a declaration. I see no point in implementing a special getOrInsertFunction for AMDGPUUnmangledLibFunc.

In D36831#848827, @yaxunl wrote:

Clang does not emit mangled functions for pipe builtin functions. Instead, it emits unmangled functions, e.g. __read_pipe_*. These functions are not overloaded and will be implemented by the device library.

The original implementation of AMDGPULibFunc assumes all library functions have mangled names. It extracts function argument information from the mangled function name and uses it to implement getOrInsertFunction.

For unmangled library functions, the member functions dealing with mangling or function parameter info are either irrelevant or useless. They are needed for mangled functions only because mangling requires such information.

Therefore I subclass AMDGPULibFunc as AMDGPUMangledLibFunc and AMDGPUUnmangledLibFunc and move everything needed for mangling to AMDGPUMangledLibFunc.

For AMDGPUUnmangledLibFunc, since there is no argument type information from mangling, AMDGPULibFunc::getOrInsertFunction cannot be implemented. The original implementation has no special trick for identifying whether it is prelink or not; it just checks whether the function is a declaration. I see no point in implementing a special getOrInsertFunction for AMDGPUUnmangledLibFunc.

What about EI_READ_PIPE? Is it mangled? Shall it be removed?

Whatever you subclass, all the changes shall be contained within AMDGPULibFunc and not in its clients. If some of the arguments in FuncInfo are unused in this case, that is fine. You can also check the former implementation from HSAIL, which also handled EDG-style mangling (unmangled, in the terms you are using here).

All in all, it is only legal to do this on prelink, and that is exactly what getFunction() does.

yaxunl marked 8 inline comments as done. Aug 22 2017, 10:11 AM

yaxunl added inline comments.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
67 ↗	(On Diff #111912)	Handling mangled lib functions requires function argument information, which is not available for unmangled lib functions. Transforming unmangled lib functions does not require function argument information. There is no point in implementing many of the mangled lib function member functions for them.
lib/Target/AMDGPU/AMDGPULibFunc.cpp
987 ↗	(On Diff #111912)	will add a map.
lib/Target/AMDGPU/AMDGPULibFunc.h
240 ↗	(On Diff #111912)	clang does not emit mangled read_pipe functions. Will remove EI_READ_PIPE.
366 ↗	(On Diff #111912)	This comes from the original implementation. Will fix.
test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗	(On Diff #111912)	will fix.
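The map mentioned for the brute-force search comment could look roughly like the sketch below. It is an illustration only: the enum is a hypothetical subset that reuses the EI_* names from the diff, and lookupUnmangled is an invented helper, not the function that landed in AMDGPULibFunc.cpp.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical subset of the function ids; the EI_* names follow the diff.
enum EFuncId {
  EI_NONE,
  EI_READ_PIPE_2,
  EI_READ_PIPE_4,
  EI_WRITE_PIPE_2,
  EI_WRITE_PIPE_4
};

// Hypothetical lookup: the name -> id table is built once, so each query is
// a single hash lookup instead of a linear scan over all known names.
EFuncId lookupUnmangled(const std::string &Name) {
  static const std::unordered_map<std::string, EFuncId> Table = {
      {"__read_pipe_2", EI_READ_PIPE_2},
      {"__read_pipe_4", EI_READ_PIPE_4},
      {"__write_pipe_2", EI_WRITE_PIPE_2},
      {"__write_pipe_4", EI_WRITE_PIPE_4},
  };
  auto It = Table.find(Name);
  return It == Table.end() ? EI_NONE : It->second;
}
```

An unknown name maps to EI_NONE, which lets the caller fall through to its existing handling.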

In D36831#848830, @rampitec wrote:

What about EI_READ_PIPE? Is it mangled? Shall it be removed?

Whatever you subclass, all the changes shall be contained within AMDGPULibFunc and not in its clients. If some of the arguments in FuncInfo are unused in this case, that is fine. You can also check the former implementation from HSAIL, which also handled EDG-style mangling (unmangled, in the terms you are using here).

All in all, it is only legal to do this on prelink, and that is exactly what getFunction() does.

For pipe functions, clang currently does not do any mangling at all. It is not like EDG-style mangling.

As I said, getFunction() requires function argument information, which is not available for unmangled functions. Implementing getFunction() would require providing argument information for unmangled functions, which is time-consuming and also useless.

In D36831#848885, @yaxunl wrote:

As I said, getFunction() requires function argument information, which is not available for unmangled functions. Implementing getFunction() would require providing argument information for unmangled functions, which is time-consuming and also useless.

You can just omit filling it. You have FuncId and that is all you need in this case.

In D36831#848889, @rampitec wrote:

You can just omit filling it. You have FuncId and that is all you need in this case.

I need to get a function with different (transformed) function type.

Run it without -amdgpu-prelink. It will fail to link. It will also fail to build the library.

In D36831#848905, @rampitec wrote:

Run it without -amdgpu-prelink. It will fail to link. It will also fail to build the library.

The device library has not implemented these functions yet. I think that's why it fails to link.

I will investigate why it fails to build the library.

In D36831#849723, @yaxunl wrote:

The device library has not implemented these functions yet. I think that's why it fails to link.

I will investigate why it fails to build the library.

It fails because you do not use getFunction, effectively skipping the prelink check.

In D36831#849725, @rampitec wrote:

It fails because you do not use getFunction, effectively skipping the prelink check.

For mangled lib functions, my patch does not change how they are handled. They still go through getFunction.

For unmangled lib functions, I only transform them if they are declarations. In the post-link pass, they are already linked and are not declarations, therefore they stay unchanged.

I wonder why there would be a link failure.

In D36831#850296, @yaxunl wrote:

For mangled lib functions, my patch does not change how they are handled. They still go through getFunction.

For unmangled lib functions, I only transform them if they are declarations. In the post-link pass, they are already linked and are not declarations, therefore they stay unchanged.

I wonder why there would be a link failure.

The library build works before link, but you do not check that prelink transformations are allowed.

In D36831#850356, @rampitec wrote:

The library build works before link, but you do not check that prelink transformations are allowed.

The library contains definitions of the unmangled functions; since they are not declarations, the pass will not change them.

b-sumner added a subscriber: b-sumner. Aug 31 2017, 12:31 PM

Splitting AMDGPULibFunc into two classes looks like huge overkill. How about modifying AMDGPULibFunc::parse so it could accept unmangled names and just return an enum id for the function (using some fast lookup approach)? Type info for such functions can be left unpopulated and is supposed to be handled by the client (as in fold_read_write_pipe).

In D36831#858723, @vpykhtin wrote:

Splitting AMDGPULibFunc into two classes looks like huge overkill. How about modifying AMDGPULibFunc::parse so it could accept unmangled names and just return an enum id for the function (using some fast lookup approach)? Type info for such functions can be left unpopulated and is supposed to be handled by the client (as in fold_read_write_pipe).

Unmangled lib functions differ from mangled ones in how function names and type information are handled. I have found a way to reuse the interface (mangle, getOrInsertFunction, getFunction, and getFunctionType). However, to achieve that we really need different classes for mangled and unmangled lib functions that take advantage of some virtual functions. Having different classes for unmangled and mangled lib functions is also a cleaner design, where all the name mangling code is kept where it belongs.
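A heavily reduced sketch of that design, under stated assumptions: the class names LibFuncImpl, MangledLibFunc, UnmangledLibFunc and LibFunc are hypothetical stand-ins, and the mangling covers only a name plus a pre-encoded argument string rather than the real parameter tables.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface shared by both kinds of library function.
class LibFuncImpl {
public:
  virtual ~LibFuncImpl() = default;
  virtual bool isMangled() const = 0;
  // Name as it should appear in the module.
  virtual std::string mangle() const = 0;
};

// Mangled functions carry argument information and encode it in the name.
class MangledLibFunc : public LibFuncImpl {
  std::string Base;    // e.g. "sin"
  std::string ArgCode; // pre-encoded argument types, e.g. "f" for float
public:
  MangledLibFunc(std::string B, std::string A)
      : Base(std::move(B)), ArgCode(std::move(A)) {}
  bool isMangled() const override { return true; }
  std::string mangle() const override {
    // Itanium style: _Z <length> <name> <encoded argument types>.
    return "_Z" + std::to_string(Base.size()) + Base + ArgCode;
  }
};

// Unmangled functions keep only their name; there is nothing to encode.
class UnmangledLibFunc : public LibFuncImpl {
  std::string Name;
public:
  explicit UnmangledLibFunc(std::string N) : Name(std::move(N)) {}
  bool isMangled() const override { return false; }
  std::string mangle() const override { return Name; }
};

// Wrapper that clients use; it forwards to whichever implementation it owns.
class LibFunc {
  std::unique_ptr<LibFuncImpl> Impl;
public:
  explicit LibFunc(std::unique_ptr<LibFuncImpl> I) : Impl(std::move(I)) {}
  bool isMangled() const { return Impl->isMangled(); }
  std::string mangle() const { return Impl->mangle(); }
};
```

With this shape the client calls mangle() on the wrapper in both cases and never inspects which subclass sits behind it.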

Ok, unmangled part looks different indeed. If the issue with pre-link checking is solved this patch is ok with me.

rampitec added inline comments. Sep 1 2017, 9:55 AM

lib/Target/AMDGPU/AMDGPULibCalls.cpp
616 ↗	(On Diff #111514)	I see you are now checking for the declaration vs definition, but it is still only legal on pre-link and that is not checked.
lib/Target/AMDGPU/AMDGPULibFunc.cpp
987 ↗	(On Diff #111912)	Not done.
lib/Target/AMDGPU/AMDGPULibFunc.h
366 ↗	(On Diff #111912)	Not done.
test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗	(On Diff #111912)	Not done.

Revised by Stas' comments. Use AMDGPULibFunc::getOrInsertFunction to create function.

Minor change of test.

vpykhtin added inline comments. Sep 4 2017, 3:15 AM

lib/Target/AMDGPU/AMDGPULibFunc.h
366 ↗	(On Diff #111912)	Initially (non-const) StringRef& was introduced intentionally to have mangledName with stripped name on return.

yaxunl marked an inline comment as done. Sep 4 2017, 5:40 AM

yaxunl added inline comments.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
616 ↗	(On Diff #111514)	Now use AMDGPULibFunc::getOrInsertFunction().
lib/Target/AMDGPU/AMDGPULibFunc.h
366 ↗	(On Diff #111912)	Only const StringRef& is changed to StringRef. non-const StringRef& is not changed.

rampitec added inline comments. Sep 4 2017, 10:30 AM

lib/Target/AMDGPU/AMDGPULibCalls.cpp
636 ↗	(On Diff #113673)	The point of using it is to check its result and bail if null. Also no modifications shall be done before this check.
67 ↗	(On Diff #111912)	The whole point of moving the logic that distinguishes mangled from unmangled into the parser is to avoid massive changes in the client like this and to spare the client from differentiating on every other line.
test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗	(On Diff #111912)	I still see it.

yaxunl marked 2 inline comments as done. Sep 4 2017, 2:55 PM

yaxunl added inline comments.

test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗	(On Diff #111912)	I've added -instnamer to the RUN line. Do you mean I should run the original .ll through opt -instnamer so that the .ll contains named instructions?

rampitec added inline comments. Sep 4 2017, 4:30 PM

test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗	(On Diff #111912)	Yes, there shall be no numbered variables in the test as it has to be easily editable.

Revised by Stas' comments.

rampitec added inline comments. Sep 5 2017, 2:00 PM

test/CodeGen/AMDGPU/simplify-libcalls.ll
1 ↗	(On Diff #113906)	-instnamer is not needed here.

Remove -instnamer from RUN line.

rampitec added inline comments. Sep 5 2017, 2:09 PM

lib/Target/AMDGPU/AMDGPULibCalls.cpp
612 ↗	(On Diff #113906)	Please move the actual IR change below "return false" statement.
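The ordering requested here (resolve the callee first, return false on failure, and only then touch the IR) can be sketched with toy types. ToyModule and foldCall are hypothetical stand-ins, not LLVM classes; the sketch only shows that a failed lookup leaves nothing modified.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a module; not an LLVM type.
struct ToyModule {
  std::vector<std::string> Known;   // functions that can be resolved
  std::vector<std::string> Emitted; // IR changes made so far

  // Returns the resolved function name, or nullptr if it is unavailable.
  const std::string *find(const std::string &Name) const {
    for (const std::string &K : Known)
      if (K == Name)
        return &K;
    return nullptr;
  }
};

// Returns true only if the call was rewritten.
bool foldCall(ToyModule &M, const std::string &Target) {
  // Step 1: do everything that can fail. Nothing is modified yet.
  const std::string *F = M.find(Target);
  if (!F)
    return false; // bail out with the module untouched

  // Step 2: only now create the cast and the replacement call.
  M.Emitted.push_back("bitcast");
  M.Emitted.push_back("call " + *F);
  return true;
}
```

Had the cast been created before the lookup, a failed lookup would leave a dead instruction behind while the function still reported no change.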

Revised by Stas' comments.

Thanks!

This revision is now accepted and ready to land. Sep 5 2017, 2:32 PM

Closed by commit rL312598: [AMDGPU] Transform __read_pipe_* and __write_pipe_* (authored by yaxunl). Sep 5 2017, 5:31 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/AMDGPU/AMDGPULibCalls.cpp	92 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPULibFunc.h	167 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPULibFunc.cpp	192 lines
llvm/trunk/test/CodeGen/AMDGPU/simplify-libcalls.ll	129 lines

Diff 113939

llvm/trunk/lib/Target/AMDGPU/AMDGPULibCalls.cpp

Show First 20 Lines • Show All 125 Lines • ▼ Show 20 Lines	private:
bool fold_log10(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_log10(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);

// sqrt		// sqrt
bool fold_sqrt(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_sqrt(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);

// sin/cos		// sin/cos
bool fold_sincos(CallInst * CI, IRBuilder<> &B, AliasAnalysis * AA);		bool fold_sincos(CallInst * CI, IRBuilder<> &B, AliasAnalysis * AA);

		// __read_pipe/__write_pipe
		bool fold_read_write_pipe(CallInst *CI, IRBuilder<> &B, FuncInfo &FInfo);

// Get insertion point at entry.		// Get insertion point at entry.
BasicBlock::iterator getEntryIns(CallInst * UI);		BasicBlock::iterator getEntryIns(CallInst * UI);
// Insert an Alloc instruction.		// Insert an Alloc instruction.
AllocaInst* insertAlloca(CallInst * UI, IRBuilder<> &B, const char *prefix);		AllocaInst* insertAlloca(CallInst * UI, IRBuilder<> &B, const char *prefix);
// Get a scalar native builtin signle argument FP function		// Get a scalar native builtin signle argument FP function
Constant* getNativeFunction(Module* M, const FuncInfo &FInfo);		Constant* getNativeFunction(Module* M, const FuncInfo &FInfo);

protected:		protected:
▲ Show 20 Lines • Show All 311 Lines • ▼ Show 20 Lines	static TableRef getOptTable(AMDGPULibFunc::EFuncId id) {
case AMDGPULibFunc::EI_TANPI: return TableRef(tbl_tanpi);		case AMDGPULibFunc::EI_TANPI: return TableRef(tbl_tanpi);
case AMDGPULibFunc::EI_TGAMMA: return TableRef(tbl_tgamma);		case AMDGPULibFunc::EI_TGAMMA: return TableRef(tbl_tgamma);
default:;		default:;
}		}
return TableRef();		return TableRef();
}		}

static inline int getVecSize(const AMDGPULibFunc& FInfo) {		static inline int getVecSize(const AMDGPULibFunc& FInfo) {
return FInfo.Leads[0].VectorSize;		return FInfo.getLeads()[0].VectorSize;
}		}

static inline AMDGPULibFunc::EType getArgType(const AMDGPULibFunc& FInfo) {		static inline AMDGPULibFunc::EType getArgType(const AMDGPULibFunc& FInfo) {
return (AMDGPULibFunc::EType)FInfo.Leads[0].ArgType;		return (AMDGPULibFunc::EType)FInfo.getLeads()[0].ArgType;
}		}

Constant *AMDGPULibCalls::getFunction(Module *M, const FuncInfo& fInfo) {		Constant *AMDGPULibCalls::getFunction(Module *M, const FuncInfo& fInfo) {
// If we are doing PreLinkOpt, the function is external. So it is safe to		// If we are doing PreLinkOpt, the function is external. So it is safe to
// use getOrInsertFunction() at this stage.		// use getOrInsertFunction() at this stage.

return EnablePreLink ? AMDGPULibFunc::getOrInsertFunction(M, fInfo)		return EnablePreLink ? AMDGPULibFunc::getOrInsertFunction(M, fInfo)
: AMDGPULibFunc::getFunction(M, fInfo);		: AMDGPULibFunc::getFunction(M, fInfo);
Show All 28 Lines	bool AMDGPULibCalls::sincosUseNative(CallInst *aCI, const FuncInfo &FInfo) {
bool native_sin = useNativeFunc("sin");		bool native_sin = useNativeFunc("sin");
bool native_cos = useNativeFunc("cos");		bool native_cos = useNativeFunc("cos");

if (native_sin && native_cos) {		if (native_sin && native_cos) {
Module *M = aCI->getModule();		Module *M = aCI->getModule();
Value *opr0 = aCI->getArgOperand(0);		Value *opr0 = aCI->getArgOperand(0);

AMDGPULibFunc nf;		AMDGPULibFunc nf;
nf.Leads[0].ArgType = FInfo.Leads[0].ArgType;		nf.getLeads()[0].ArgType = FInfo.getLeads()[0].ArgType;
nf.Leads[0].VectorSize = FInfo.Leads[0].VectorSize;		nf.getLeads()[0].VectorSize = FInfo.getLeads()[0].VectorSize;

nf.setPrefix(AMDGPULibFunc::NATIVE);		nf.setPrefix(AMDGPULibFunc::NATIVE);
nf.setId(AMDGPULibFunc::EI_SIN);		nf.setId(AMDGPULibFunc::EI_SIN);
Constant *sinExpr = getFunction(M, nf);		Constant *sinExpr = getFunction(M, nf);

nf.setPrefix(AMDGPULibFunc::NATIVE);		nf.setPrefix(AMDGPULibFunc::NATIVE);
nf.setId(AMDGPULibFunc::EI_COS);		nf.setId(AMDGPULibFunc::EI_COS);
Constant *cosExpr = getFunction(M, nf);		Constant *cosExpr = getFunction(M, nf);
Show All 12 Lines	bool AMDGPULibCalls::sincosUseNative(CallInst *aCI, const FuncInfo &FInfo) {
return false;		return false;
}		}

bool AMDGPULibCalls::useNative(CallInst *aCI) {		bool AMDGPULibCalls::useNative(CallInst *aCI) {
CI = aCI;		CI = aCI;
Function *Callee = aCI->getCalledFunction();		Function *Callee = aCI->getCalledFunction();

FuncInfo FInfo;		FuncInfo FInfo;
if (!parseFunctionName(Callee->getName(), &FInfo) ||		if (!parseFunctionName(Callee->getName(), &FInfo) || !FInfo.isMangled() ||
FInfo.getPrefix() != AMDGPULibFunc::NOPFX ||		FInfo.getPrefix() != AMDGPULibFunc::NOPFX ||
getArgType(FInfo) == AMDGPULibFunc::F64 ||		getArgType(FInfo) == AMDGPULibFunc::F64 || !HasNative(FInfo.getId()) ||
!HasNative(FInfo.getId()) ||
!(AllNative || useNativeFunc(FInfo.getName())) ) {		!(AllNative || useNativeFunc(FInfo.getName()))) {
return false;		return false;
}		}

if (FInfo.getId() == AMDGPULibFunc::EI_SINCOS)		if (FInfo.getId() == AMDGPULibFunc::EI_SINCOS)
return sincosUseNative(aCI, FInfo);		return sincosUseNative(aCI, FInfo);

FInfo.setPrefix(AMDGPULibFunc::NATIVE);		FInfo.setPrefix(AMDGPULibFunc::NATIVE);
Constant *F = getFunction(aCI->getModule(), FInfo);		Constant *F = getFunction(aCI->getModule(), FInfo);
if (!F)		if (!F)
return false;		return false;

aCI->setCalledFunction(F);		aCI->setCalledFunction(F);
DEBUG_WITH_TYPE("usenative", dbgs() << "<useNative> replace " << *aCI		DEBUG_WITH_TYPE("usenative", dbgs() << "<useNative> replace " << *aCI
<< " with native version");		<< " with native version");
return true;		return true;
}		}

		// Clang emits call of __read_pipe_2 or __read_pipe_4 for OpenCL read_pipe
		// builtin, with appended type size and alignment arguments, where 2 or 4
		// indicates the original number of arguments. The library has optimized version
		// of __read_pipe_2/__read_pipe_4 when the type size and alignment has the same
		// power of 2 value. This function transforms __read_pipe_2 to __read_pipe_2_N
		// for such cases where N is the size in bytes of the type (N = 1, 2, 4, 8, ...,
		// 128). The same for __read_pipe_4, write_pipe_2, and write_pipe_4.
		bool AMDGPULibCalls::fold_read_write_pipe(CallInst *CI, IRBuilder<> &B,
		FuncInfo &FInfo) {
		auto *Callee = CI->getCalledFunction();
		if (!Callee->isDeclaration())
		return false;

		assert(Callee->hasName() && "Invalid read_pipe/write_pipe function");
		auto *M = Callee->getParent();
		auto &Ctx = M->getContext();
		std::string Name = Callee->getName();
		auto NumArg = CI->getNumArgOperands();
		if (NumArg != 4 && NumArg != 6)
		return false;
		auto *PacketSize = CI->getArgOperand(NumArg - 2);
		auto *PacketAlign = CI->getArgOperand(NumArg - 1);
		if (!isa<ConstantInt>(PacketSize) || !isa<ConstantInt>(PacketAlign))
		return false;
		unsigned Size = cast<ConstantInt>(PacketSize)->getZExtValue();
		unsigned Align = cast<ConstantInt>(PacketAlign)->getZExtValue();
		if (Size != Align || !isPowerOf2_32(Size))
		return false;

		Type *PtrElemTy;
		if (Size <= 8)
		PtrElemTy = Type::getIntNTy(Ctx, Size * 8);
		else
		PtrElemTy = VectorType::get(Type::getInt64Ty(Ctx), Size / 8);
		unsigned PtrArgLoc = CI->getNumArgOperands() - 3;
		auto PtrArg = CI->getArgOperand(PtrArgLoc);
		unsigned PtrArgAS = PtrArg->getType()->getPointerAddressSpace();
		auto *PtrTy = llvm::PointerType::get(PtrElemTy, PtrArgAS);

		SmallVector<llvm::Type *, 6> ArgTys;
		for (unsigned I = 0; I != PtrArgLoc; ++I)
		ArgTys.push_back(CI->getArgOperand(I)->getType());
		ArgTys.push_back(PtrTy);

		Name = Name + "_" + std::to_string(Size);
		auto *FTy = FunctionType::get(Callee->getReturnType(),
		ArrayRef<Type *>(ArgTys), false);
		AMDGPULibFunc NewLibFunc(Name, FTy);
		auto *F = AMDGPULibFunc::getOrInsertFunction(M, NewLibFunc);
		if (!F)
		return false;

		auto *BCast = B.CreatePointerCast(PtrArg, PtrTy);
		SmallVector<Value *, 6> Args;
		for (unsigned I = 0; I != PtrArgLoc; ++I)
		Args.push_back(CI->getArgOperand(I));
		Args.push_back(BCast);

		auto *NCI = B.CreateCall(F, Args);
		NCI->setAttributes(CI->getAttributes());
		CI->replaceAllUsesWith(NCI);
		CI->dropAllReferences();
		CI->eraseFromParent();

		return true;
		}
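The naming decision made by fold_read_write_pipe can be looked at in isolation. The sketch below is an assumption-laden reduction: specializedPipeName is a hypothetical helper that mirrors only the Size/Align test and the suffixing shown in the diff, with no LLVM types, no pointer casts, and no upper bound on the size (the diff's code does not check one either).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the specialized callee name when the packet
// size equals its alignment and is a power of two; otherwise returns the
// original name unchanged, meaning no transformation applies.
std::string specializedPipeName(const std::string &Name, uint64_t Size,
                                uint64_t Align) {
  bool IsPow2 = Size != 0 && (Size & (Size - 1)) == 0;
  if (Size != Align || !IsPow2)
    return Name;
  return Name + "_" + std::to_string(Size);
}
```

A 4-byte packet with 4-byte alignment read through __read_pipe_2 therefore targets __read_pipe_2_4, while a mismatched size and alignment keeps the generic entry point.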

// This function returns false if no change; return true otherwise.		// This function returns false if no change; return true otherwise.
bool AMDGPULibCalls::fold(CallInst *CI, AliasAnalysis *AA) {		bool AMDGPULibCalls::fold(CallInst *CI, AliasAnalysis *AA) {
this->CI = CI;		this->CI = CI;
Function *Callee = CI->getCalledFunction();		Function *Callee = CI->getCalledFunction();

// Ignore indirect calls.		// Ignore indirect calls.
if (Callee == 0) return false;		if (Callee == 0) return false;

▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines	bool AMDGPULibCalls::fold(CallInst *CI, AliasAnalysis *AA) {
case AMDGPULibFunc::EI_COS:		case AMDGPULibFunc::EI_COS:
case AMDGPULibFunc::EI_SIN:		case AMDGPULibFunc::EI_SIN:
if ((getArgType(FInfo) == AMDGPULibFunc::F32 ||		if ((getArgType(FInfo) == AMDGPULibFunc::F32 ||
getArgType(FInfo) == AMDGPULibFunc::F64)		getArgType(FInfo) == AMDGPULibFunc::F64)
&& (FInfo.getPrefix() == AMDGPULibFunc::NOPFX))		&& (FInfo.getPrefix() == AMDGPULibFunc::NOPFX))
return fold_sincos(CI, B, AA);		return fold_sincos(CI, B, AA);

break;		break;
		case AMDGPULibFunc::EI_READ_PIPE_2:
		case AMDGPULibFunc::EI_READ_PIPE_4:
		case AMDGPULibFunc::EI_WRITE_PIPE_2:
		case AMDGPULibFunc::EI_WRITE_PIPE_4:
		return fold_read_write_pipe(CI, B, FInfo);

default:		default:
break;		break;
}		}

return false;		return false;
}		}

▲ Show 20 Lines • Show All 607 Lines • ▼ Show 20 Lines	bool AMDGPULibCalls::fold_sincos(CallInst *CI, IRBuilder<> &B,

if (!UI) return false;		if (!UI) return false;

// Merge the sin and cos.		// Merge the sin and cos.

// for OpenCL 2.0 we have only generic implementation of sincos		// for OpenCL 2.0 we have only generic implementation of sincos
// function.		// function.
AMDGPULibFunc nf(AMDGPULibFunc::EI_SINCOS, fInfo);		AMDGPULibFunc nf(AMDGPULibFunc::EI_SINCOS, fInfo);
nf.Leads[0].PtrKind = AMDGPULibFunc::GENERIC;		nf.getLeads()[0].PtrKind = AMDGPULibFunc::GENERIC;
Function *Fsincos = dyn_cast_or_null<Function>(getFunction(M, nf));		Function *Fsincos = dyn_cast_or_null<Function>(getFunction(M, nf));
if (!Fsincos) return false;		if (!Fsincos) return false;

BasicBlock::iterator ItOld = B.GetInsertPoint();		BasicBlock::iterator ItOld = B.GetInsertPoint();
AllocaInst *Alloc = insertAlloca(UI, B, "__sincos_");		AllocaInst *Alloc = insertAlloca(UI, B, "__sincos_");
B.SetInsertPoint(UI);		B.SetInsertPoint(UI);

Value *P = Alloc;		Value *P = Alloc;
▲ Show 20 Lines • Show All 399 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/AMDGPULibFunc.h

Show All 12 Lines
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"

namespace llvm {		namespace llvm {

class FunctionType;		class FunctionType;
class Function;		class Function;
class Module;		class Module;

class AMDGPULibFunc {		class AMDGPULibFuncBase {
public:		public:
enum EFuncId {		enum EFuncId {
EI_NONE,		EI_NONE,

// IMPORTANT: enums below should go in ascending by 1 value order		// IMPORTANT: enums below should go in ascending by 1 value order
// because they are used as indexes in the mangling rules table.		// because they are used as indexes in the mangling rules table.
// don't use explicit value assignment.		// don't use explicit value assignment.
		//
		// There are two types of library functions: those with mangled
		// name and those with unmangled name. The enums for the library
		// functions with mangled name are defined before enums for the
		// library functions with unmangled name. The enum for the last
		// library function with mangled name is EI_LAST_MANGLED.
		//
		// Library functions with mangled name.
EI_ABS,		EI_ABS,
EI_ABS_DIFF,		EI_ABS_DIFF,
EI_ACOS,		EI_ACOS,
EI_ACOSH,		EI_ACOSH,
EI_ACOSPI,		EI_ACOSPI,
EI_ADD_SAT,		EI_ADD_SAT,
EI_ALL,		EI_ALL,
EI_ANY,		EI_ANY,
▲ Show 20 Lines • Show All 102 Lines • ▼ Show 20 Lines	enum EFuncId {
EI_NEXTAFTER,		EI_NEXTAFTER,
EI_NORMALIZE,		EI_NORMALIZE,
EI_POPCOUNT,		EI_POPCOUNT,
EI_POW,		EI_POW,
EI_POWN,		EI_POWN,
EI_POWR,		EI_POWR,
EI_PREFETCH,		EI_PREFETCH,
EI_RADIANS,		EI_RADIANS,
EI_READ_PIPE,
EI_RECIP,		EI_RECIP,
EI_REMAINDER,		EI_REMAINDER,
EI_REMQUO,		EI_REMQUO,
EI_RESERVE_READ_PIPE,		EI_RESERVE_READ_PIPE,
EI_RESERVE_WRITE_PIPE,		EI_RESERVE_WRITE_PIPE,
EI_RHADD,		EI_RHADD,
EI_RINT,		EI_RINT,
EI_ROOTN,		EI_ROOTN,
▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines	enum EFuncId {
EI_WORK_GROUP_SCAN_EXCLUSIVE_MAX,		EI_WORK_GROUP_SCAN_EXCLUSIVE_MAX,
EI_WORK_GROUP_SCAN_EXCLUSIVE_MIN,		EI_WORK_GROUP_SCAN_EXCLUSIVE_MIN,
EI_WORK_GROUP_SCAN_INCLUSIVE_ADD,		EI_WORK_GROUP_SCAN_INCLUSIVE_ADD,
EI_WORK_GROUP_SCAN_INCLUSIVE_MAX,		EI_WORK_GROUP_SCAN_INCLUSIVE_MAX,
EI_WORK_GROUP_SCAN_INCLUSIVE_MIN,		EI_WORK_GROUP_SCAN_INCLUSIVE_MIN,
EI_WRITE_IMAGEF,		EI_WRITE_IMAGEF,
EI_WRITE_IMAGEI,		EI_WRITE_IMAGEI,
EI_WRITE_IMAGEUI,		EI_WRITE_IMAGEUI,
EI_WRITE_PIPE,
EI_NCOS,		EI_NCOS,
EI_NEXP2,		EI_NEXP2,
EI_NFMA,		EI_NFMA,
EI_NLOG2,		EI_NLOG2,
EI_NRCP,		EI_NRCP,
EI_NRSQRT,		EI_NRSQRT,
EI_NSIN,		EI_NSIN,
EI_NSQRT,		EI_NSQRT,
EI_FTZ,		EI_FTZ,
EI_FLDEXP,		EI_FLDEXP,
EI_CLASS,		EI_CLASS,
EI_RCBRT,		EI_RCBRT,
		EI_LAST_MANGLED =
		EI_RCBRT, /* The last library function with mangled name */

		// Library functions with unmangled name.
		EI_READ_PIPE_2,
		EI_READ_PIPE_4,
		EI_WRITE_PIPE_2,
		EI_WRITE_PIPE_4,

EX_INTRINSICS_COUNT		EX_INTRINSICS_COUNT
};		};

enum ENamePrefix {		enum ENamePrefix {
NOPFX,		NOPFX,
NATIVE,		NATIVE,
HALF		HALF
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	void reset() {
VectorSize = 1;		VectorSize = 1;
PtrKind = 0;		PtrKind = 0;
}		}
Param() { reset(); }		Param() { reset(); }

template <typename Stream>		template <typename Stream>
void mangleItanium(Stream& os);		void mangleItanium(Stream& os);
};		};
		static bool isMangled(EFuncId Id) {
		return static_cast<unsigned>(Id) <= static_cast<unsigned>(EI_LAST_MANGLED);
		}
		};

		class AMDGPULibFuncImpl : public AMDGPULibFuncBase {
public:		public:
static bool parse(StringRef mangledName, AMDGPULibFunc &iInfo);		AMDGPULibFuncImpl() {}
		virtual ~AMDGPULibFuncImpl() {}
AMDGPULibFunc();
AMDGPULibFunc(EFuncId id, const AMDGPULibFunc& copyFrom);

ENamePrefix getPrefix() const { return FKind; }		/// Get unmangled name for mangled library function and name for unmangled
		/// library function.
		virtual std::string getName() const = 0;
		virtual unsigned getNumArgs() const = 0;
EFuncId getId() const { return FuncId; }		EFuncId getId() const { return FuncId; }
		ENamePrefix getPrefix() const { return FKind; }

std::string getName() const;		bool isMangled() const { return AMDGPULibFuncBase::isMangled(FuncId); }
unsigned getNumArgs() const;

FunctionType* getFunctionType(Module& M) const;		void setId(EFuncId id) { FuncId = id; }
		virtual bool parseFuncName(StringRef &mangledName) = 0;

std::string mangle() const;		/// \return The mangled function name for mangled library functions
		/// and unmangled function name for unmangled library functions.
		virtual std::string mangle() const = 0;

		void setName(StringRef N) { Name = N; }
void setPrefix(ENamePrefix pfx) { FKind = pfx; }		void setPrefix(ENamePrefix pfx) { FKind = pfx; }
void setId(EFuncId id) { FuncId = id; }

		virtual FunctionType *getFunctionType(Module &M) const = 0;

		protected:
		EFuncId FuncId;
		std::string Name;
		ENamePrefix FKind;
		};

		/// Wrapper class for AMDGPULIbFuncImpl
		class AMDGPULibFunc : public AMDGPULibFuncBase {
		public:
		explicit AMDGPULibFunc() : Impl(std::unique_ptr<AMDGPULibFuncImpl>()) {}
		AMDGPULibFunc(const AMDGPULibFunc &F);
		/// Clone a mangled library func with the Id \p Id and argument info from \p
		/// CopyFrom.
		explicit AMDGPULibFunc(EFuncId Id, const AMDGPULibFunc &CopyFrom);
		/// Construct an unmangled library function on the fly.
		explicit AMDGPULibFunc(StringRef FName, FunctionType *FT);

		AMDGPULibFunc &operator=(const AMDGPULibFunc &F);

		/// Get unmangled name for mangled library function and name for unmangled
		/// library function.
		std::string getName() const { return Impl->getName(); }
		unsigned getNumArgs() const { return Impl->getNumArgs(); }
		EFuncId getId() const { return Impl->getId(); }
		ENamePrefix getPrefix() const { return Impl->getPrefix(); }
		/// Get leading parameters for mangled lib functions.
		Param *getLeads();
		const Param *getLeads() const;

		bool isMangled() const { return Impl->isMangled(); }
		void setId(EFuncId Id) { Impl->setId(Id); }
		bool parseFuncName(StringRef &MangledName) {
		return Impl->parseFuncName(MangledName);
		}

		/// \return The mangled function name for mangled library functions
		/// and unmangled function name for unmangled library functions.
		std::string mangle() const { return Impl->mangle(); }

		void setName(StringRef N) { Impl->setName(N); }
		void setPrefix(ENamePrefix PFX) { Impl->setPrefix(PFX); }

		FunctionType *getFunctionType(Module &M) const {
		return Impl->getFunctionType(M);
		}
static Function* getFunction(llvm::Module *M, const AMDGPULibFunc& fInfo);		static Function *getFunction(llvm::Module *M, const AMDGPULibFunc &fInfo);

static Function* getOrInsertFunction(llvm::Module *M,		static Function *getOrInsertFunction(llvm::Module *M,
const AMDGPULibFunc& fInfo);		const AMDGPULibFunc &fInfo);
		static bool parse(StringRef MangledName, AMDGPULibFunc &Ptr);

static StringRef getUnmangledName(const StringRef& mangledName);		private:
		/// Initialize as a mangled library function.
		void initMangled();
		std::unique_ptr<AMDGPULibFuncImpl> Impl;
		};

		class AMDGPUMangledLibFunc : public AMDGPULibFuncImpl {
		public:
Param Leads[2];		Param Leads[2];

private:		explicit AMDGPUMangledLibFunc();
EFuncId FuncId;		explicit AMDGPUMangledLibFunc(EFuncId id,
ENamePrefix FKind;		const AMDGPUMangledLibFunc &copyFrom);
std::string Name;
		std::string getName() const override;
		unsigned getNumArgs() const override;
		FunctionType *getFunctionType(Module &M) const override;
		static StringRef getUnmangledName(StringRef MangledName);

		bool parseFuncName(StringRef &mangledName) override;

		// Methods for support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const AMDGPULibFuncImpl *F) { return F->isMangled(); }

void reset();		std::string mangle() const override;

		private:
std::string mangleNameItanium() const;		std::string mangleNameItanium() const;
bool parseItanuimName(StringRef& mangledName);

std::string mangleName(const StringRef& name) const;		std::string mangleName(StringRef Name) const;
bool parseName(const StringRef& mangledName);		bool parseUnmangledName(StringRef MangledName);

template <typename Stream>		template <typename Stream> void writeName(Stream &OS) const;
void writeName(Stream& OS) const;
};		};

		class AMDGPUUnmangledLibFunc : public AMDGPULibFuncImpl {
		FunctionType *FuncTy;

		public:
		explicit AMDGPUUnmangledLibFunc();
		explicit AMDGPUUnmangledLibFunc(StringRef FName, FunctionType *FT) {
		Name = FName;
		FuncTy = FT;
		}
		std::string getName() const override { return Name; }
		unsigned getNumArgs() const override;
		FunctionType *getFunctionType(Module &M) const override { return FuncTy; }

		bool parseFuncName(StringRef &Name) override;

		// Methods for support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const AMDGPULibFuncImpl *F) { return !F->isMangled(); }

		std::string mangle() const override { return Name; }

		void setFunctionType(FunctionType *FT) { FuncTy = FT; }
		};
}		}
#endif // _AMDGPU_LIBFUNC_H_		#endif // _AMDGPU_LIBFUNC_H_
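
The header above splits the old AMDGPULibFunc into a wrapper that owns a polymorphic implementation, with one subclass for mangled names and one for unmangled names. A minimal standalone sketch of that shape follows. All names here (LibFunc, LibFuncImpl, MangledImpl, UnmangledImpl, clone) are hypothetical and are not the patch's API. Two deliberate simplifications: the sketch copies through a virtual clone() where the patch uses dyn_cast, and it accepts any non-"_Z" name as unmangled where the patch consults a table of known names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the implementation base class.
class LibFuncImpl {
public:
  virtual ~LibFuncImpl() = default;
  virtual bool isMangled() const = 0;
  virtual std::string getName() const = 0;
  virtual std::unique_ptr<LibFuncImpl> clone() const = 0;
};

// Mangled function: stores only the base name; parameter info is omitted.
class MangledImpl : public LibFuncImpl {
  std::string Base;
public:
  explicit MangledImpl(std::string B) : Base(std::move(B)) {}
  bool isMangled() const override { return true; }
  std::string getName() const override { return Base; }
  std::unique_ptr<LibFuncImpl> clone() const override {
    return std::unique_ptr<LibFuncImpl>(new MangledImpl(*this));
  }
};

// Unmangled function: the symbol name is the name.
class UnmangledImpl : public LibFuncImpl {
  std::string Name;
public:
  explicit UnmangledImpl(std::string N) : Name(std::move(N)) {}
  bool isMangled() const override { return false; }
  std::string getName() const override { return Name; }
  std::unique_ptr<LibFuncImpl> clone() const override {
    return std::unique_ptr<LibFuncImpl>(new UnmangledImpl(*this));
  }
};

// Wrapper with value semantics: copying deep-copies the implementation.
class LibFunc {
  std::unique_ptr<LibFuncImpl> Impl;

  static std::unique_ptr<LibFuncImpl>
  cloneOrNull(const std::unique_ptr<LibFuncImpl> &P) {
    if (!P)
      return nullptr;
    return P->clone();
  }

public:
  LibFunc() = default;
  LibFunc(const LibFunc &F) : Impl(cloneOrNull(F.Impl)) {}
  LibFunc &operator=(const LibFunc &F) {
    if (this != &F)
      Impl = cloneOrNull(F.Impl);
    return *this;
  }

  bool isMangled() const { assert(Impl && "empty LibFunc"); return Impl->isMangled(); }
  std::string getName() const { assert(Impl && "empty LibFunc"); return Impl->getName(); }

  // Pick the implementation from the "_Z" prefix; leave F empty on failure.
  static bool parse(const std::string &Name, LibFunc &F) {
    F.Impl.reset();
    if (Name.empty())
      return false;
    if (Name.compare(0, 2, "_Z") == 0) {
      // Read the decimal length prefix, then take that many characters.
      std::size_t Pos = 2, Len = 0;
      while (Pos < Name.size() &&
             std::isdigit(static_cast<unsigned char>(Name[Pos])))
        Len = Len * 10 + static_cast<std::size_t>(Name[Pos++] - '0');
      if (Len == 0 || Pos + Len > Name.size())
        return false;
      F.Impl.reset(new MangledImpl(Name.substr(Pos, Len)));
    } else {
      F.Impl.reset(new UnmangledImpl(Name));
    }
    return true;
  }
};
```

In this sketch, parsing "_Z3powff" yields a mangled function named "pow", while "__read_pipe_2" yields an unmangled function that keeps its full name. Copies of either stay independent of the original.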

llvm/trunk/lib/Target/AMDGPU/AMDGPULibFunc.cpp

... 59 lines hidden ...	struct ManglingRule {
unsigned char Param[5];		unsigned char Param[5];

int maxLeadIndex() const { return (std::max)(Lead[0], Lead[1]); }		int maxLeadIndex() const { return (std::max)(Lead[0], Lead[1]); }
int getNumLeads() const { return (Lead[0] ? 1 : 0) + (Lead[1] ? 1 : 0); }		int getNumLeads() const { return (Lead[0] ? 1 : 0) + (Lead[1] ? 1 : 0); }

unsigned getNumArgs() const;		unsigned getNumArgs() const;
};		};

		// Information about library functions with unmangled names.
		class UnmangledFuncInfo {
		StringRef const Name;
		unsigned NumArgs;

		// Table for all lib functions with unmangled names.
		static const UnmangledFuncInfo Table[];

		// Number of entries in Table.
		static const unsigned TableSize;

		// Map function name to index.
		class NameMap : public StringMap<unsigned> {
		public:
		NameMap() {
		for (unsigned I = 0; I != TableSize; ++I)
		(*this)[Table[I].Name] = I;
		}
		};
		friend class NameMap;
		static NameMap Map;

		public:
		using ID = AMDGPULibFunc::EFuncId;
		UnmangledFuncInfo() = default;
		UnmangledFuncInfo(StringRef _Name, unsigned _NumArgs)
		: Name(_Name), NumArgs(_NumArgs) {}
		// Get index to Table by function name.
		static bool lookup(StringRef Name, ID &Id);
		static unsigned toIndex(ID Id) {
		assert(static_cast<unsigned>(Id) >
		static_cast<unsigned>(AMDGPULibFunc::EI_LAST_MANGLED) &&
		"Invalid unmangled library function");
		return static_cast<unsigned>(Id) - 1 -
		static_cast<unsigned>(AMDGPULibFunc::EI_LAST_MANGLED);
		}
		static ID toFuncId(unsigned Index) {
		assert(Index < TableSize && "Invalid unmangled library function");
		return static_cast<ID>(
		Index + 1 + static_cast<unsigned>(AMDGPULibFunc::EI_LAST_MANGLED));
		}
		static unsigned getNumArgs(ID Id) { return Table[toIndex(Id)].NumArgs; }
		static StringRef getName(ID Id) { return Table[toIndex(Id)].Name; }
		};

unsigned ManglingRule::getNumArgs() const {		unsigned ManglingRule::getNumArgs() const {
unsigned I=0;		unsigned I=0;
while (I < (sizeof Param/sizeof Param[0]) && Param[I]) ++I;		while (I < (sizeof Param/sizeof Param[0]) && Param[I]) ++I;
return I;		return I;
}		}

// This table describes function formal argument type rules. The order of rules		// This table describes function formal argument type rules. The order of rules
// corresponds to the EFuncId enum at AMDGPULibFunc.h		// corresponds to the EFuncId enum at AMDGPULibFunc.h
... 134 lines hidden ...
{ "nextafter" , {1}, {E_ANY,E_COPY}},		{ "nextafter" , {1}, {E_ANY,E_COPY}},
{ "normalize" , {1}, {E_ANY}},		{ "normalize" , {1}, {E_ANY}},
{ "popcount" , {1}, {E_ANY}},		{ "popcount" , {1}, {E_ANY}},
{ "pow" , {1}, {E_ANY,E_COPY}},		{ "pow" , {1}, {E_ANY,E_COPY}},
{ "pown" , {1}, {E_ANY,E_SETBASE_I32}},		{ "pown" , {1}, {E_ANY,E_SETBASE_I32}},
{ "powr" , {1}, {E_ANY,E_COPY}},		{ "powr" , {1}, {E_ANY,E_COPY}},
{ "prefetch" , {1}, {E_CONSTPTR_ANY,EX_SIZET}},		{ "prefetch" , {1}, {E_CONSTPTR_ANY,EX_SIZET}},
{ "radians" , {1}, {E_ANY}},		{ "radians" , {1}, {E_ANY}},
{ "read_pipe" , {4}, {E_COPY,EX_RESERVEDID,EX_UINT,E_ANY}},
{ "recip" , {1}, {E_ANY}},		{ "recip" , {1}, {E_ANY}},
{ "remainder" , {1}, {E_ANY,E_COPY}},		{ "remainder" , {1}, {E_ANY,E_COPY}},
{ "remquo" , {1,3}, {E_ANY,E_COPY,E_ANY}},		{ "remquo" , {1,3}, {E_ANY,E_COPY,E_ANY}},
{ "reserve_read_pipe" , {1}, {E_ANY,EX_UINT}},		{ "reserve_read_pipe" , {1}, {E_ANY,EX_UINT}},
{ "reserve_write_pipe" , {1}, {E_ANY,EX_UINT}},		{ "reserve_write_pipe" , {1}, {E_ANY,EX_UINT}},
{ "rhadd" , {1}, {E_ANY,E_COPY}},		{ "rhadd" , {1}, {E_ANY,E_COPY}},
{ "rint" , {1}, {E_ANY}},		{ "rint" , {1}, {E_ANY}},
{ "rootn" , {1}, {E_ANY,E_SETBASE_I32}},		{ "rootn" , {1}, {E_ANY,E_SETBASE_I32}},
... 51 lines hidden ...
{ "work_group_scan_exclusive_max" , {1}, {E_ANY}},		{ "work_group_scan_exclusive_max" , {1}, {E_ANY}},
{ "work_group_scan_exclusive_min" , {1}, {E_ANY}},		{ "work_group_scan_exclusive_min" , {1}, {E_ANY}},
{ "work_group_scan_inclusive_add" , {1}, {E_ANY}},		{ "work_group_scan_inclusive_add" , {1}, {E_ANY}},
{ "work_group_scan_inclusive_max" , {1}, {E_ANY}},		{ "work_group_scan_inclusive_max" , {1}, {E_ANY}},
{ "work_group_scan_inclusive_min" , {1}, {E_ANY}},		{ "work_group_scan_inclusive_min" , {1}, {E_ANY}},
{ "write_imagef" , {1}, {E_ANY,E_IMAGECOORDS,EX_FLOAT4}},		{ "write_imagef" , {1}, {E_ANY,E_IMAGECOORDS,EX_FLOAT4}},
{ "write_imagei" , {1}, {E_ANY,E_IMAGECOORDS,EX_INTV4}},		{ "write_imagei" , {1}, {E_ANY,E_IMAGECOORDS,EX_INTV4}},
{ "write_imageui" , {1}, {E_ANY,E_IMAGECOORDS,EX_UINTV4}},		{ "write_imageui" , {1}, {E_ANY,E_IMAGECOORDS,EX_UINTV4}},
{ "write_pipe" , {4}, {E_COPY,EX_RESERVEDID,EX_UINT,E_ANY}},
{ "ncos" , {1}, {E_ANY} },		{ "ncos" , {1}, {E_ANY} },
{ "nexp2" , {1}, {E_ANY} },		{ "nexp2" , {1}, {E_ANY} },
{ "nfma" , {1}, {E_ANY, E_COPY, E_COPY} },		{ "nfma" , {1}, {E_ANY, E_COPY, E_COPY} },
{ "nlog2" , {1}, {E_ANY} },		{ "nlog2" , {1}, {E_ANY} },
{ "nrcp" , {1}, {E_ANY} },		{ "nrcp" , {1}, {E_ANY} },
{ "nrsqrt" , {1}, {E_ANY} },		{ "nrsqrt" , {1}, {E_ANY} },
{ "nsin" , {1}, {E_ANY} },		{ "nsin" , {1}, {E_ANY} },
{ "nsqrt" , {1}, {E_ANY} },		{ "nsqrt" , {1}, {E_ANY} },
{ "ftz" , {1}, {E_ANY} },		{ "ftz" , {1}, {E_ANY} },
{ "fldexp" , {1}, {E_ANY, EX_UINT} },		{ "fldexp" , {1}, {E_ANY, EX_UINT} },
{ "class" , {1}, {E_ANY, EX_UINT} },		{ "class" , {1}, {E_ANY, EX_UINT} },
{ "rcbrt" , {1}, {E_ANY} },		{ "rcbrt" , {1}, {E_ANY} },
};		};

		// Library functions with unmangled name.
		const UnmangledFuncInfo UnmangledFuncInfo::Table[] = {
		{"__read_pipe_2", 4},
		{"__read_pipe_4", 6},
		{"__write_pipe_2", 4},
		{"__write_pipe_4", 6},
		};

		const unsigned UnmangledFuncInfo::TableSize =
		sizeof(UnmangledFuncInfo::Table) / sizeof(UnmangledFuncInfo::Table[0]);

		UnmangledFuncInfo::NameMap UnmangledFuncInfo::Map;

static const struct ManglingRulesMap : public StringMap<int> {		static const struct ManglingRulesMap : public StringMap<int> {
ManglingRulesMap()		ManglingRulesMap()
: StringMap<int>(sizeof(manglingRules)/sizeof(manglingRules[0])) {		: StringMap<int>(sizeof(manglingRules)/sizeof(manglingRules[0])) {
int Id = 0;		int Id = 0;
for (auto Rule : manglingRules)		for (auto Rule : manglingRules)
insert({ Rule.Name, Id++ });		insert({ Rule.Name, Id++ });
}		}
} manglingRulesMap;		} manglingRulesMap;
... 147 lines hidden ...	if (Len <= 0 || static_cast<size_t>(Len) > mangledName.size())
return StringRef();		return StringRef();
StringRef Res = mangledName.substr(0, Len);		StringRef Res = mangledName.substr(0, Len);
drop_front(mangledName, Len);		drop_front(mangledName, Len);
return Res;		return Res;
}		}

} // end anonymous namespace		} // end anonymous namespace

AMDGPULibFunc::AMDGPULibFunc() {		AMDGPUMangledLibFunc::AMDGPUMangledLibFunc() {
reset();
}

AMDGPULibFunc::AMDGPULibFunc(EFuncId id, const AMDGPULibFunc& copyFrom)
: FuncId(id) {
FKind = copyFrom.FKind;
Leads[0] = copyFrom.Leads[0];
Leads[1] = copyFrom.Leads[1];
}

void AMDGPULibFunc::reset() {
FuncId = EI_NONE;		FuncId = EI_NONE;
FKind = NOPFX;		FKind = NOPFX;
Leads[0].reset();		Leads[0].reset();
Leads[1].reset();		Leads[1].reset();
Name.clear();		Name.clear();
}		}

		AMDGPUUnmangledLibFunc::AMDGPUUnmangledLibFunc() {
		FuncId = EI_NONE;
		FuncTy = nullptr;
		}

		AMDGPUMangledLibFunc::AMDGPUMangledLibFunc(
		EFuncId id, const AMDGPUMangledLibFunc &copyFrom) {
		FuncId = id;
		FKind = copyFrom.FKind;
		Leads[0] = copyFrom.Leads[0];
		Leads[1] = copyFrom.Leads[1];
		}

///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////
// Demangling		// Demangling

static int parseVecSize(StringRef& mangledName) {		static int parseVecSize(StringRef& mangledName) {
size_t const Len = eatNumber(mangledName);		size_t const Len = eatNumber(mangledName);
switch (Len) {		switch (Len) {
case 2: case 3: case 4: case 8: case 16:		case 2: case 3: case 4: case 8: case 16:
return Len;		return Len;
... 12 lines hidden ...	AMDGPULibFunc::ENamePrefix Pfx =
.Default(AMDGPULibFunc::NOPFX);		.Default(AMDGPULibFunc::NOPFX);

if (Pfx != AMDGPULibFunc::NOPFX)		if (Pfx != AMDGPULibFunc::NOPFX)
mangledName = P.second;		mangledName = P.second;

return Pfx;		return Pfx;
}		}

bool AMDGPULibFunc::parseName(const StringRef& fullName) {		bool AMDGPUMangledLibFunc::parseUnmangledName(StringRef FullName) {
FuncId = static_cast<EFuncId>(manglingRulesMap.lookup(fullName));		FuncId = static_cast<EFuncId>(manglingRulesMap.lookup(FullName));
return FuncId != EI_NONE;		return FuncId != EI_NONE;
}		}

///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////
// Itanium Demangling		// Itanium Demangling

namespace {		namespace {
struct ItaniumParamParser {		struct ItaniumParamParser {
... 75 lines hidden ...	if (::isDigit(TC)) {
}		}
}		}
if (res.ArgType == 0) return false;		if (res.ArgType == 0) return false;
Prev.VectorSize = res.VectorSize;		Prev.VectorSize = res.VectorSize;
Prev.ArgType = res.ArgType;		Prev.ArgType = res.ArgType;
return true;		return true;
}		}

bool AMDGPULibFunc::parseItanuimName(StringRef& mangledName) {		bool AMDGPUMangledLibFunc::parseFuncName(StringRef &mangledName) {
StringRef Name = eatLengthPrefixedName(mangledName);		StringRef Name = eatLengthPrefixedName(mangledName);
FKind = parseNamePrefix(Name);		FKind = parseNamePrefix(Name);
if (!parseName(Name)) return false;		if (!parseUnmangledName(Name))
		return false;

const ManglingRule& Rule = manglingRules[FuncId];		const ManglingRule& Rule = manglingRules[FuncId];
ItaniumParamParser Parser;		ItaniumParamParser Parser;
for (int I=0; I < Rule.maxLeadIndex(); ++I) {		for (int I=0; I < Rule.maxLeadIndex(); ++I) {
Param P;		Param P;
if (!Parser.parseItaniumParam(mangledName, P))		if (!Parser.parseItaniumParam(mangledName, P))
return false;		return false;

if ((I + 1) == Rule.Lead[0]) Leads[0] = P;		if ((I + 1) == Rule.Lead[0]) Leads[0] = P;
if ((I + 1) == Rule.Lead[1]) Leads[1] = P;		if ((I + 1) == Rule.Lead[1]) Leads[1] = P;
}		}
return true;		return true;
}		}

bool AMDGPULibFunc::parse(StringRef mangledName, AMDGPULibFunc& iInfo) {		bool AMDGPUUnmangledLibFunc::parseFuncName(StringRef &Name) {
iInfo.reset();		if (!UnmangledFuncInfo::lookup(Name, FuncId))
if (mangledName.empty())
return false;		return false;
		setName(Name);
		return true;
		}

if (eatTerm(mangledName, "_Z")) {		bool AMDGPULibFunc::parse(StringRef FuncName, AMDGPULibFunc &F) {
return iInfo.parseItanuimName(mangledName);		if (FuncName.empty()) {
		F.Impl = std::unique_ptr<AMDGPULibFuncImpl>();
		return false;
}		}

		if (eatTerm(FuncName, "_Z"))
		F.Impl = make_unique<AMDGPUMangledLibFunc>();
		else
		F.Impl = make_unique<AMDGPUUnmangledLibFunc>();
		if (F.Impl->parseFuncName(FuncName))
		return true;

		F.Impl = std::unique_ptr<AMDGPULibFuncImpl>();
return false;		return false;
}		}

StringRef AMDGPULibFunc::getUnmangledName(const StringRef& mangledName) {		StringRef AMDGPUMangledLibFunc::getUnmangledName(StringRef mangledName) {
StringRef S = mangledName;		StringRef S = mangledName;
if (eatTerm(S, "_Z"))		if (eatTerm(S, "_Z"))
return eatLengthPrefixedName(S);		return eatLengthPrefixedName(S);
return StringRef();		return StringRef();
}		}


///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////
// Mangling		// Mangling

template <typename Stream>		template <typename Stream>
void AMDGPULibFunc::writeName(Stream& OS) const {		void AMDGPUMangledLibFunc::writeName(Stream &OS) const {
const char *Pfx = "";		const char *Pfx = "";
switch (FKind) {		switch (FKind) {
case NATIVE: Pfx = "native_"; break;		case NATIVE: Pfx = "native_"; break;
case HALF: Pfx = "half_"; break;		case HALF: Pfx = "half_"; break;
default: break;		default: break;
}		}
if (!Name.empty()) {		if (!Name.empty()) {
OS << Pfx << Name;		OS << Pfx << Name;
} else if (FuncId != EI_NONE) {		} else if (FuncId != EI_NONE) {
OS << Pfx;		OS << Pfx;
const StringRef& S = manglingRules[FuncId].Name;		const StringRef& S = manglingRules[FuncId].Name;
OS.write(S.data(), S.size());		OS.write(S.data(), S.size());
}		}
}		}

std::string AMDGPULibFunc::mangle() const {		std::string AMDGPUMangledLibFunc::mangle() const { return mangleNameItanium(); }
return mangleNameItanium();
}

///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////
// Itanium Mangling		// Itanium Mangling

static const char *getItaniumTypeName(AMDGPULibFunc::EType T) {		static const char *getItaniumTypeName(AMDGPULibFunc::EType T) {
switch (T) {		switch (T) {
case AMDGPULibFunc::U8: return "h";		case AMDGPULibFunc::U8: return "h";
case AMDGPULibFunc::U16: return "t";		case AMDGPULibFunc::U16: return "t";
... 111 lines hidden ...	void operator()(Stream& os, AMDGPULibFunc::Param p) {
os << getItaniumTypeName((AMDGPULibFunc::EType)p.ArgType);		os << getItaniumTypeName((AMDGPULibFunc::EType)p.ArgType);

exit:		exit:
if (Ptr.ArgType) Str.push_back(Ptr);		if (Ptr.ArgType) Str.push_back(Ptr);
}		}
};		};
} // namespace		} // namespace

std::string AMDGPULibFunc::mangleNameItanium() const {		std::string AMDGPUMangledLibFunc::mangleNameItanium() const {
SmallString<128> Buf;		SmallString<128> Buf;
raw_svector_ostream S(Buf);		raw_svector_ostream S(Buf);
SmallString<128> NameBuf;		SmallString<128> NameBuf;
raw_svector_ostream Name(NameBuf);		raw_svector_ostream Name(NameBuf);
writeName(Name);		writeName(Name);
const StringRef& NameStr = Name.str();		const StringRef& NameStr = Name.str();
S << "_Z" << static_cast<int>(NameStr.size()) << NameStr;		S << "_Z" << static_cast<int>(NameStr.size()) << NameStr;

... 45 lines hidden ...	if (P.VectorSize > 1)
T = VectorType::get(T, P.VectorSize);		T = VectorType::get(T, P.VectorSize);
if (P.PtrKind != AMDGPULibFunc::BYVALUE)		if (P.PtrKind != AMDGPULibFunc::BYVALUE)
T = useAddrSpace ? T->getPointerTo((P.PtrKind & AMDGPULibFunc::ADDR_SPACE)		T = useAddrSpace ? T->getPointerTo((P.PtrKind & AMDGPULibFunc::ADDR_SPACE)
- 1)		- 1)
: T->getPointerTo();		: T->getPointerTo();
return T;		return T;
}		}

FunctionType* AMDGPULibFunc::getFunctionType(Module& M) const {		FunctionType *AMDGPUMangledLibFunc::getFunctionType(Module &M) const {
LLVMContext& C = M.getContext();		LLVMContext& C = M.getContext();
std::vector<Type*> Args;		std::vector<Type*> Args;
ParamIterator I(Leads, manglingRules[FuncId]);		ParamIterator I(Leads, manglingRules[FuncId]);
Param P;		Param P;
while ((P=I.getNextParam()).ArgType != 0)		while ((P=I.getNextParam()).ArgType != 0)
Args.push_back(getIntrinsicParamType(C, P, true));		Args.push_back(getIntrinsicParamType(C, P, true));

return FunctionType::get(		return FunctionType::get(
getIntrinsicParamType(C, getRetType(FuncId, Leads), true),		getIntrinsicParamType(C, getRetType(FuncId, Leads), true),
Args, false);		Args, false);
}		}

unsigned AMDGPULibFunc::getNumArgs() const {		unsigned AMDGPUMangledLibFunc::getNumArgs() const {
return manglingRules[FuncId].getNumArgs();		return manglingRules[FuncId].getNumArgs();
}		}

std::string AMDGPULibFunc::getName() const {		unsigned AMDGPUUnmangledLibFunc::getNumArgs() const {
		return UnmangledFuncInfo::getNumArgs(FuncId);
		}

		std::string AMDGPUMangledLibFunc::getName() const {
SmallString<128> Buf;		SmallString<128> Buf;
raw_svector_ostream OS(Buf);		raw_svector_ostream OS(Buf);
writeName(OS);		writeName(OS);
return OS.str();		return OS.str();
}		}

Function *AMDGPULibFunc::getFunction(Module *M, const AMDGPULibFunc& fInfo) {		Function *AMDGPULibFunc::getFunction(Module *M, const AMDGPULibFunc &fInfo) {
std::string FuncName = fInfo.mangle();		std::string FuncName = fInfo.mangle();
Function *F = dyn_cast_or_null<Function>(		Function *F = dyn_cast_or_null<Function>(
M->getValueSymbolTable().lookup(FuncName));		M->getValueSymbolTable().lookup(FuncName));

// check formal with actual types conformance		// check formal with actual types conformance
if (F && !F->isDeclaration()		if (F && !F->isDeclaration()
&& !F->isVarArg()		&& !F->isVarArg()
&& F->arg_size() == fInfo.getNumArgs()) {		&& F->arg_size() == fInfo.getNumArgs()) {
return F;		return F;
}		}
return nullptr;		return nullptr;
}		}

Function *AMDGPULibFunc::getOrInsertFunction(Module *M,		Function *AMDGPULibFunc::getOrInsertFunction(Module *M,
const AMDGPULibFunc& fInfo) {		const AMDGPULibFunc &fInfo) {
std::string const FuncName = fInfo.mangle();		std::string const FuncName = fInfo.mangle();
Function *F = dyn_cast_or_null<Function>(		Function *F = dyn_cast_or_null<Function>(
M->getValueSymbolTable().lookup(FuncName));		M->getValueSymbolTable().lookup(FuncName));

// check formal with actual types conformance		// check formal with actual types conformance
if (F && !F->isDeclaration()		if (F && !F->isDeclaration()
&& !F->isVarArg()		&& !F->isVarArg()
&& F->arg_size() == fInfo.getNumArgs()) {		&& F->arg_size() == fInfo.getNumArgs()) {
... 23 lines hidden ...	if (hasPtr) {
LLVMContext &Ctx = M->getContext();		LLVMContext &Ctx = M->getContext();
Attr.addAttribute(Ctx, AttributeList::FunctionIndex, Attribute::ReadOnly);		Attr.addAttribute(Ctx, AttributeList::FunctionIndex, Attribute::ReadOnly);
Attr.addAttribute(Ctx, AttributeList::FunctionIndex, Attribute::NoUnwind);		Attr.addAttribute(Ctx, AttributeList::FunctionIndex, Attribute::NoUnwind);
C = M->getOrInsertFunction(FuncName, FuncTy, Attr);		C = M->getOrInsertFunction(FuncName, FuncTy, Attr);
}		}

return cast<Function>(C);		return cast<Function>(C);
}		}

		bool UnmangledFuncInfo::lookup(StringRef Name, ID &Id) {
		auto Loc = Map.find(Name);
		if (Loc != Map.end()) {
		Id = toFuncId(Loc->second);
		return true;
		}
		Id = AMDGPULibFunc::EI_NONE;
		return false;
		}

		AMDGPULibFunc::AMDGPULibFunc(const AMDGPULibFunc &F) {
		if (auto *MF = dyn_cast<AMDGPUMangledLibFunc>(F.Impl.get()))
		Impl.reset(new AMDGPUMangledLibFunc(*MF));
		else if (auto *UMF = dyn_cast<AMDGPUUnmangledLibFunc>(F.Impl.get()))
		Impl.reset(new AMDGPUUnmangledLibFunc(*UMF));
		else
		Impl = std::unique_ptr<AMDGPULibFuncImpl>();
		}

		AMDGPULibFunc &AMDGPULibFunc::operator=(const AMDGPULibFunc &F) {
		if (this == &F)
		return *this;
		new (this) AMDGPULibFunc(F);
		return *this;
		}

		AMDGPULibFunc::AMDGPULibFunc(EFuncId Id, const AMDGPULibFunc &CopyFrom) {
		assert(AMDGPULibFuncBase::isMangled(Id) && CopyFrom.isMangled() &&
		"not supported");
		Impl.reset(new AMDGPUMangledLibFunc(
		Id, *cast<AMDGPUMangledLibFunc>(CopyFrom.Impl.get())));
		}

		AMDGPULibFunc::AMDGPULibFunc(StringRef Name, FunctionType *FT) {
		Impl.reset(new AMDGPUUnmangledLibFunc(Name, FT));
		}

		void AMDGPULibFunc::initMangled() { Impl.reset(new AMDGPUMangledLibFunc()); }

		AMDGPULibFunc::Param *AMDGPULibFunc::getLeads() {
		if (!Impl)
		initMangled();
		return cast<AMDGPUMangledLibFunc>(Impl.get())->Leads;
		}

		const AMDGPULibFunc::Param *AMDGPULibFunc::getLeads() const {
		return cast<const AMDGPUMangledLibFunc>(Impl.get())->Leads;
		}
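
UnmangledFuncInfo in the file above places the unmangled ids directly after EI_LAST_MANGLED in the enum, so an id converts to a table index by subtracting that offset. The sketch below reproduces that arithmetic with a shortened, hypothetical enum (FuncId and the ID_* values are illustrative only). It uses a linear search where the patch uses a StringMap.

```cpp
#include <cassert>
#include <string>

// Hypothetical, shortened id space: mangled ids first, unmangled ids after.
enum FuncId : unsigned {
  ID_NONE,
  ID_POW,
  ID_RCBRT,
  ID_LAST_MANGLED = ID_RCBRT,
  ID_READ_PIPE_2,
  ID_READ_PIPE_4,
  ID_WRITE_PIPE_2,
  ID_WRITE_PIPE_4
};

struct UnmangledInfo {
  const char *Name;
  unsigned NumArgs;
};

// Same order as the unmangled ids in the enum.
static const UnmangledInfo UnmangledTable[] = {
    {"__read_pipe_2", 4},
    {"__read_pipe_4", 6},
    {"__write_pipe_2", 4},
    {"__write_pipe_4", 6},
};

static const unsigned UnmangledTableSize =
    sizeof(UnmangledTable) / sizeof(UnmangledTable[0]);

// As in the patch, every id up to the last mangled one counts as mangled.
inline bool isMangledId(FuncId Id) {
  return static_cast<unsigned>(Id) <= static_cast<unsigned>(ID_LAST_MANGLED);
}

// Id -> table index: subtract the number of ids that precede the table.
inline unsigned toIndex(FuncId Id) {
  assert(!isMangledId(Id) && "Invalid unmangled library function");
  return static_cast<unsigned>(Id) - 1 -
         static_cast<unsigned>(ID_LAST_MANGLED);
}

// Table index -> id: the inverse of toIndex.
inline FuncId toFuncId(unsigned Index) {
  assert(Index < UnmangledTableSize && "Invalid unmangled library function");
  return static_cast<FuncId>(Index + 1 +
                             static_cast<unsigned>(ID_LAST_MANGLED));
}

// Name -> id; sets Id to ID_NONE and returns false when the name is unknown.
inline bool lookupUnmangled(const std::string &Name, FuncId &Id) {
  for (unsigned I = 0; I != UnmangledTableSize; ++I) {
    if (Name == UnmangledTable[I].Name) {
      Id = toFuncId(I);
      return true;
    }
  }
  Id = ID_NONE;
  return false;
}
```

The scheme only holds while the table order matches the enum order, which is the same constraint the patch places on manglingRules.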

llvm/trunk/test/CodeGen/AMDGPU/simplify-libcalls.ll

... 293 lines hidden ...	entry:
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pow_c		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pow_c
; GCN: %__powx2 = fmul fast float %tmp, %tmp		; GCN: %__powx2 = fmul fast float %tmp, %tmp
; GCN: %__powx21 = fmul fast float %__powx2, %__powx2		; GCN: %__powx21 = fmul fast float %__powx2, %__powx2
; GCN: %__powx22 = fmul fast float %__powx2, %tmp		; GCN: %__powx22 = fmul fast float %__powx2, %tmp
; GCN: %0 = fmul fast float %__powx21, %__powx21		; GCN: %[[r0:.*]] = fmul fast float %__powx21, %__powx21
; GCN: %__powprod3 = fmul fast float %0, %__powx22		; GCN: %__powprod3 = fmul fast float %[[r0]], %__powx22
define amdgpu_kernel void @test_pow_c(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_pow_c(float addrspace(1)* nocapture %a) {
entry:		entry:
%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1		%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1
%tmp = load float, float addrspace(1)* %arrayidx, align 4		%tmp = load float, float addrspace(1)* %arrayidx, align 4
%call = tail call fast float @_Z3powff(float %tmp, float 1.100000e+01)		%call = tail call fast float @_Z3powff(float %tmp, float 1.100000e+01)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}define amdgpu_kernel void @test_powr_c		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_powr_c
; GCN: %__powx2 = fmul fast float %tmp, %tmp		; GCN: %__powx2 = fmul fast float %tmp, %tmp
; GCN: %__powx21 = fmul fast float %__powx2, %__powx2		; GCN: %__powx21 = fmul fast float %__powx2, %__powx2
; GCN: %__powx22 = fmul fast float %__powx2, %tmp		; GCN: %__powx22 = fmul fast float %__powx2, %tmp
; GCN: %0 = fmul fast float %__powx21, %__powx21		; GCN: %[[r0:.*]] = fmul fast float %__powx21, %__powx21
; GCN: %__powprod3 = fmul fast float %0, %__powx22		; GCN: %__powprod3 = fmul fast float %[[r0]], %__powx22
define amdgpu_kernel void @test_powr_c(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_powr_c(float addrspace(1)* nocapture %a) {
entry:		entry:
%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1		%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1
%tmp = load float, float addrspace(1)* %arrayidx, align 4		%tmp = load float, float addrspace(1)* %arrayidx, align 4
%call = tail call fast float @_Z4powrff(float %tmp, float 1.100000e+01)		%call = tail call fast float @_Z4powrff(float %tmp, float 1.100000e+01)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

declare float @_Z4powrff(float, float)		declare float @_Z4powrff(float, float)

; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pown_c		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pown_c
; GCN: %__powx2 = fmul fast float %tmp, %tmp		; GCN: %__powx2 = fmul fast float %tmp, %tmp
; GCN: %__powx21 = fmul fast float %__powx2, %__powx2		; GCN: %__powx21 = fmul fast float %__powx2, %__powx2
; GCN: %__powx22 = fmul fast float %__powx2, %tmp		; GCN: %__powx22 = fmul fast float %__powx2, %tmp
; GCN: %0 = fmul fast float %__powx21, %__powx21		; GCN: %[[r0:.*]] = fmul fast float %__powx21, %__powx21
; GCN: %__powprod3 = fmul fast float %0, %__powx22		; GCN: %__powprod3 = fmul fast float %[[r0]], %__powx22
define amdgpu_kernel void @test_pown_c(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_pown_c(float addrspace(1)* nocapture %a) {
entry:		entry:
%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1		%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1
%tmp = load float, float addrspace(1)* %arrayidx, align 4		%tmp = load float, float addrspace(1)* %arrayidx, align 4
%call = tail call fast float @_Z4pownfi(float %tmp, i32 11)		%call = tail call fast float @_Z4pownfi(float %tmp, i32 11)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

declare float @_Z4pownfi(float, i32)		declare float @_Z4pownfi(float, i32)

; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pow		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pow
; GCN-POSTLINK: tail call fast float @_Z3powff(float %tmp, float 1.013000e+03)		; GCN-POSTLINK: tail call fast float @_Z3powff(float %tmp, float 1.013000e+03)
; GCN-PRELINK: %__fabs = tail call fast float @_Z4fabsf(float %tmp)		; GCN-PRELINK: %__fabs = tail call fast float @_Z4fabsf(float %tmp)
; GCN-PRELINK: %__log2 = tail call fast float @_Z4log2f(float %__fabs)		; GCN-PRELINK: %__log2 = tail call fast float @_Z4log2f(float %__fabs)
; GCN-PRELINK: %__ylogx = fmul fast float %__log2, 1.013000e+03		; GCN-PRELINK: %__ylogx = fmul fast float %__log2, 1.013000e+03
; GCN-PRELINK: %__exp2 = tail call fast float @_Z4exp2f(float %__ylogx)		; GCN-PRELINK: %__exp2 = tail call fast float @_Z4exp2f(float %__ylogx)
; GCN-PRELINK: %0 = bitcast float %tmp to i32		; GCN-PRELINK: %[[r0:.*]] = bitcast float %tmp to i32
; GCN-PRELINK: %__pow_sign = and i32 %0, -2147483648		; GCN-PRELINK: %__pow_sign = and i32 %[[r0]], -2147483648
; GCN-PRELINK: %1 = bitcast float %__exp2 to i32		; GCN-PRELINK: %[[r1:.*]] = bitcast float %__exp2 to i32
; GCN-PRELINK: %2 = or i32 %__pow_sign, %1		; GCN-PRELINK: %[[r2:.*]] = or i32 %__pow_sign, %[[r1]]
; GCN-PRELINK: %3 = bitcast float addrspace(1)* %a to i32 addrspace(1)*		; GCN-PRELINK: %[[r3:.*]] = bitcast float addrspace(1)* %a to i32 addrspace(1)*
; GCN-PRELINK: store i32 %2, i32 addrspace(1)* %3, align 4		; GCN-PRELINK: store i32 %[[r2]], i32 addrspace(1)* %[[r3]], align 4
define amdgpu_kernel void @test_pow(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_pow(float addrspace(1)* nocapture %a) {
entry:		entry:
%tmp = load float, float addrspace(1)* %a, align 4		%tmp = load float, float addrspace(1)* %a, align 4
%call = tail call fast float @_Z3powff(float %tmp, float 1.013000e+03)		%call = tail call fast float @_Z3powff(float %tmp, float 1.013000e+03)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

... 21 lines hidden ...
; GCN-POSTLINK: tail call fast float @_Z4pownfi(float %tmp, i32 %conv)		; GCN-POSTLINK: tail call fast float @_Z4pownfi(float %tmp, i32 %conv)
; GCN-PRELINK: %conv = fptosi float %tmp1 to i32		; GCN-PRELINK: %conv = fptosi float %tmp1 to i32
; GCN-PRELINK: %__fabs = tail call fast float @_Z4fabsf(float %tmp)		; GCN-PRELINK: %__fabs = tail call fast float @_Z4fabsf(float %tmp)
; GCN-PRELINK: %__log2 = tail call fast float @_Z4log2f(float %__fabs)		; GCN-PRELINK: %__log2 = tail call fast float @_Z4log2f(float %__fabs)
; GCN-PRELINK: %pownI2F = sitofp i32 %conv to float		; GCN-PRELINK: %pownI2F = sitofp i32 %conv to float
; GCN-PRELINK: %__ylogx = fmul fast float %__log2, %pownI2F		; GCN-PRELINK: %__ylogx = fmul fast float %__log2, %pownI2F
; GCN-PRELINK: %__exp2 = tail call fast float @_Z4exp2f(float %__ylogx)		; GCN-PRELINK: %__exp2 = tail call fast float @_Z4exp2f(float %__ylogx)
; GCN-PRELINK: %__yeven = shl i32 %conv, 31		; GCN-PRELINK: %__yeven = shl i32 %conv, 31
; GCN-PRELINK: %0 = bitcast float %tmp to i32		; GCN-PRELINK: %[[r0:.*]] = bitcast float %tmp to i32
; GCN-PRELINK: %__pow_sign = and i32 %__yeven, %0		; GCN-PRELINK: %__pow_sign = and i32 %__yeven, %[[r0]]
; GCN-PRELINK: %1 = bitcast float %__exp2 to i32		; GCN-PRELINK: %[[r1:.*]] = bitcast float %__exp2 to i32
; GCN-PRELINK: %2 = or i32 %__pow_sign, %1		; GCN-PRELINK: %[[r2:.*]] = or i32 %__pow_sign, %[[r1]]
; GCN-PRELINK: %3 = bitcast float addrspace(1)* %a to i32 addrspace(1)*		; GCN-PRELINK: %[[r3:.*]] = bitcast float addrspace(1)* %a to i32 addrspace(1)*
; GCN-PRELINK: store i32 %2, i32 addrspace(1)* %3, align 4		; GCN-PRELINK: store i32 %[[r2]], i32 addrspace(1)* %[[r3]], align 4
define amdgpu_kernel void @test_pown(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_pown(float addrspace(1)* nocapture %a) {
entry:		entry:
%tmp = load float, float addrspace(1)* %a, align 4		%tmp = load float, float addrspace(1)* %a, align 4
%arrayidx1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1		%arrayidx1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1
%tmp1 = load float, float addrspace(1)* %arrayidx1, align 4		%tmp1 = load float, float addrspace(1)* %arrayidx1, align 4
%conv = fptosi float %tmp1 to i32		%conv = fptosi float %tmp1 to i32
%call = tail call fast float @_Z4pownfi(float %tmp, i32 %conv)		%call = tail call fast float @_Z4pownfi(float %tmp, i32 %conv)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
[277 lines not shown]
%arrayidx1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1		%arrayidx1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1
%tmp1 = addrspacecast float addrspace(1)* %arrayidx1 to float addrspace(4)*		%tmp1 = addrspacecast float addrspace(1)* %arrayidx1 to float addrspace(4)*
%call = tail call fast float @_Z6sincosfPU3AS4f(float %tmp, float addrspace(4)* %tmp1)		%call = tail call fast float @_Z6sincosfPU3AS4f(float %tmp, float addrspace(4)* %tmp1)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

declare float @_Z6sincosfPU3AS4f(float, float addrspace(4)*)		declare float @_Z6sincosfPU3AS4f(float, float addrspace(4)*)
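
For readers following the GCN-PRELINK checks on test_pown above: they describe pown(x, y) being expanded to exp2(y * log2(|x|)), with the sign bit of x OR'ed back in when y is odd. The following is only a scalar C++ sketch of that arithmetic as read off the check lines; the function and helper names are hypothetical and this is not the code in AMDGPULibCalls.cpp.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: reinterpret a float as its 32-bit pattern and back
// (the "bitcast float ... to i32" steps in the check lines).
static std::uint32_t bitsOf(float F) {
  std::uint32_t U;
  std::memcpy(&U, &F, sizeof U);
  return U;
}

static float floatOf(std::uint32_t U) {
  float F;
  std::memcpy(&F, &U, sizeof F);
  return F;
}

// Mirrors the instruction sequence named in the checks:
// __fabs, __log2, pownI2F, __ylogx, __exp2, __yeven, __pow_sign.
float pownExpanded(float X, std::int32_t Y) {
  float Log2 = std::log2(std::fabs(X));
  float YLogX = Log2 * static_cast<float>(Y);
  float Exp2 = std::exp2(YLogX);
  // Shifting y left by 31 moves its lowest bit into the sign position,
  // so the mask is non-zero only when y is odd.
  std::uint32_t YOdd = static_cast<std::uint32_t>(Y) << 31;
  std::uint32_t PowSign = YOdd & bitsOf(X);
  return floatOf(PowSign | bitsOf(Exp2));
}
```

The sketch ignores fast-math flags and makes no claim about special inputs such as x = 0; it only shows why the checks contain a shl by 31 followed by and/or on the bit patterns.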

		%opencl.pipe_t = type opaque
		%opencl.reserve_id_t = type opaque

		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_read_pipe(%opencl.pipe_t addrspace(1)* %p, i32 addrspace(1)* %ptr)
		; GCN-PRELINK: call i32 @__read_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.*}}, i32 addrspace(4)* %{{.*}}) #[[NOUNWIND:[0-9]+]]
		; GCN-PRELINK: call i32 @__read_pipe_4_4(%opencl.pipe_t addrspace(1)* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 2, i32 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		define amdgpu_kernel void @test_read_pipe(%opencl.pipe_t addrspace(1)* %p, i32 addrspace(1)* %ptr) local_unnamed_addr {
		entry:
		%tmp = bitcast i32 addrspace(1)* %ptr to i8 addrspace(1)*
		%tmp1 = addrspacecast i8 addrspace(1)* %tmp to i8 addrspace(4)*
		%tmp2 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p, i8 addrspace(4)* %tmp1, i32 4, i32 4) #0
		%tmp3 = tail call %opencl.reserve_id_t* @__reserve_read_pipe(%opencl.pipe_t addrspace(1)* %p, i32 2, i32 4, i32 4)
		%tmp4 = tail call i32 @__read_pipe_4(%opencl.pipe_t addrspace(1)* %p, %opencl.reserve_id_t* %tmp3, i32 2, i8 addrspace(4)* %tmp1, i32 4, i32 4) #0
		tail call void @__commit_read_pipe(%opencl.pipe_t addrspace(1)* %p, %opencl.reserve_id_t* %tmp3, i32 4, i32 4)
		ret void
		}

		declare i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)*, i8 addrspace(4)*, i32, i32)

		declare %opencl.reserve_id_t* @__reserve_read_pipe(%opencl.pipe_t addrspace(1)*, i32, i32, i32)

		declare i32 @__read_pipe_4(%opencl.pipe_t addrspace(1)*, %opencl.reserve_id_t*, i32, i8 addrspace(4)*, i32, i32)

		declare void @__commit_read_pipe(%opencl.pipe_t addrspace(1)*, %opencl.reserve_id_t*, i32, i32)

		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_write_pipe(%opencl.pipe_t addrspace(1)* %p, i32 addrspace(1)* %ptr)
		; GCN-PRELINK: call i32 @__write_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.*}}, i32 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__write_pipe_4_4(%opencl.pipe_t addrspace(1)* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 2, i32 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		define amdgpu_kernel void @test_write_pipe(%opencl.pipe_t addrspace(1)* %p, i32 addrspace(1)* %ptr) local_unnamed_addr {
		entry:
		%tmp = bitcast i32 addrspace(1)* %ptr to i8 addrspace(1)*
		%tmp1 = addrspacecast i8 addrspace(1)* %tmp to i8 addrspace(4)*
		%tmp2 = tail call i32 @__write_pipe_2(%opencl.pipe_t addrspace(1)* %p, i8 addrspace(4)* %tmp1, i32 4, i32 4) #0
		%tmp3 = tail call %opencl.reserve_id_t* @__reserve_write_pipe(%opencl.pipe_t addrspace(1)* %p, i32 2, i32 4, i32 4) #0
		%tmp4 = tail call i32 @__write_pipe_4(%opencl.pipe_t addrspace(1)* %p, %opencl.reserve_id_t* %tmp3, i32 2, i8 addrspace(4)* %tmp1, i32 4, i32 4) #0
		tail call void @__commit_write_pipe(%opencl.pipe_t addrspace(1)* %p, %opencl.reserve_id_t* %tmp3, i32 4, i32 4) #0
		ret void
		}

		declare i32 @__write_pipe_2(%opencl.pipe_t addrspace(1)*, i8 addrspace(4)*, i32, i32) local_unnamed_addr

		declare %opencl.reserve_id_t* @__reserve_write_pipe(%opencl.pipe_t addrspace(1)*, i32, i32, i32) local_unnamed_addr

		declare i32 @__write_pipe_4(%opencl.pipe_t addrspace(1)*, %opencl.reserve_id_t*, i32, i8 addrspace(4)*, i32, i32) local_unnamed_addr

		declare void @__commit_write_pipe(%opencl.pipe_t addrspace(1)*, %opencl.reserve_id_t*, i32, i32) local_unnamed_addr

		%struct.S = type { [100 x i32] }

		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pipe_size
		; GCN-PRELINK: call i32 @__read_pipe_2_1(%opencl.pipe_t addrspace(1)* %{{.*}} i8 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_2(%opencl.pipe_t addrspace(1)* %{{.*}} i16 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.*}} i32 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_8(%opencl.pipe_t addrspace(1)* %{{.*}} i64 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_16(%opencl.pipe_t addrspace(1)* %{{.*}}, <2 x i64> addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_32(%opencl.pipe_t addrspace(1)* %{{.*}}, <4 x i64> addrspace(4)* %{{.*}} #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_64(%opencl.pipe_t addrspace(1)* %{{.*}}, <8 x i64> addrspace(4)* %{{.*}} #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_128(%opencl.pipe_t addrspace(1)* %{{.*}}, <16 x i64> addrspace(4)* %{{.*}} #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %{{.*}}, i8 addrspace(4)* %{{.*}} i32 400, i32 4) #[[NOUNWIND]]
		define amdgpu_kernel void @test_pipe_size(%opencl.pipe_t addrspace(1)* %p1, i8 addrspace(1)* %ptr1, %opencl.pipe_t addrspace(1)* %p2, i16 addrspace(1)* %ptr2, %opencl.pipe_t addrspace(1)* %p4, i32 addrspace(1)* %ptr4, %opencl.pipe_t addrspace(1)* %p8, i64 addrspace(1)* %ptr8, %opencl.pipe_t addrspace(1)* %p16, <2 x i64> addrspace(1)* %ptr16, %opencl.pipe_t addrspace(1)* %p32, <4 x i64> addrspace(1)* %ptr32, %opencl.pipe_t addrspace(1)* %p64, <8 x i64> addrspace(1)* %ptr64, %opencl.pipe_t addrspace(1)* %p128, <16 x i64> addrspace(1)* %ptr128, %opencl.pipe_t addrspace(1)* %pu, %struct.S addrspace(1)* %ptru) local_unnamed_addr #0 {
		entry:
		%tmp = addrspacecast i8 addrspace(1)* %ptr1 to i8 addrspace(4)*
		%tmp1 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p1, i8 addrspace(4)* %tmp, i32 1, i32 1) #0
		%tmp2 = bitcast i16 addrspace(1)* %ptr2 to i8 addrspace(1)*
		%tmp3 = addrspacecast i8 addrspace(1)* %tmp2 to i8 addrspace(4)*
		%tmp4 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p2, i8 addrspace(4)* %tmp3, i32 2, i32 2) #0
		%tmp5 = bitcast i32 addrspace(1)* %ptr4 to i8 addrspace(1)*
		%tmp6 = addrspacecast i8 addrspace(1)* %tmp5 to i8 addrspace(4)*
		%tmp7 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p4, i8 addrspace(4)* %tmp6, i32 4, i32 4) #0
		%tmp8 = bitcast i64 addrspace(1)* %ptr8 to i8 addrspace(1)*
		%tmp9 = addrspacecast i8 addrspace(1)* %tmp8 to i8 addrspace(4)*
		%tmp10 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p8, i8 addrspace(4)* %tmp9, i32 8, i32 8) #0
		%tmp11 = bitcast <2 x i64> addrspace(1)* %ptr16 to i8 addrspace(1)*
		%tmp12 = addrspacecast i8 addrspace(1)* %tmp11 to i8 addrspace(4)*
		%tmp13 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p16, i8 addrspace(4)* %tmp12, i32 16, i32 16) #0
		%tmp14 = bitcast <4 x i64> addrspace(1)* %ptr32 to i8 addrspace(1)*
		%tmp15 = addrspacecast i8 addrspace(1)* %tmp14 to i8 addrspace(4)*
		%tmp16 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p32, i8 addrspace(4)* %tmp15, i32 32, i32 32) #0
		%tmp17 = bitcast <8 x i64> addrspace(1)* %ptr64 to i8 addrspace(1)*
		%tmp18 = addrspacecast i8 addrspace(1)* %tmp17 to i8 addrspace(4)*
		%tmp19 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p64, i8 addrspace(4)* %tmp18, i32 64, i32 64) #0
		%tmp20 = bitcast <16 x i64> addrspace(1)* %ptr128 to i8 addrspace(1)*
		%tmp21 = addrspacecast i8 addrspace(1)* %tmp20 to i8 addrspace(4)*
		%tmp22 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p128, i8 addrspace(4)* %tmp21, i32 128, i32 128) #0
		%tmp23 = bitcast %struct.S addrspace(1)* %ptru to i8 addrspace(1)*
		%tmp24 = addrspacecast i8 addrspace(1)* %tmp23 to i8 addrspace(4)*
		%tmp25 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %pu, i8 addrspace(4)* %tmp24, i32 400, i32 4) #0
		ret void
		}

		; GCN-PRELINK: attributes #[[NOUNWIND]] = { nounwind }
		attributes #0 = { nounwind }
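
The pipe tests above suggest a simple selection rule: a call to __read_pipe_2/__read_pipe_4 (or the __write_pipe_ counterparts) whose trailing size and alignment arguments are equal powers of two is redirected to a size-suffixed callee, the two trailing arguments are dropped, and the i8 pointer is retyped; the 400-byte struct with alignment 4 is left alone. The C++ below is a minimal, LLVM-free sketch of that rule as inferred from the check lines only. The function names are hypothetical, and the upper bound of 128 is an assumption: it is simply the largest size the test covers.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: the name of the specialized callee, or an empty
// string when the call should be left untouched.
std::string specializedPipeCallee(const std::string &Name,
                                  std::uint32_t Size, std::uint32_t Align) {
  bool IsPow2 = Size != 0 && (Size & (Size - 1)) == 0;
  // Assumption: 128 is the largest size exercised by the test, so this
  // sketch stops there. The pass itself may use a different limit.
  if (Size != Align || !IsPow2 || Size > 128)
    return std::string();
  return Name + "_" + std::to_string(Size);
}

// Hypothetical helper: the pointee type the checks expect for a given
// packet size (iN up to 8 bytes, a vector of i64 beyond that).
// Only meaningful for sizes accepted by specializedPipeCallee.
std::string pipePacketType(std::uint32_t Size) {
  if (Size <= 8)
    return "i" + std::to_string(Size * 8);
  return "<" + std::to_string(Size / 8) + " x i64>";
}
```

For example, under these assumptions a 4-byte, 4-aligned packet maps to __read_pipe_2_4 with an i32 pointee, matching the first GCN-PRELINK line of test_read_pipe, while size 400 with alignment 4 yields no rewrite.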

[AMDGPU] Transform __read_pipe_* and __write_pipe_*
Closed, Public

Revision Contents

Diff 113939

llvm/trunk/lib/Target/AMDGPU/AMDGPULibCalls.cpp

llvm/trunk/lib/Target/AMDGPU/AMDGPULibFunc.h

llvm/trunk/lib/Target/AMDGPU/AMDGPULibFunc.cpp

llvm/trunk/test/CodeGen/AMDGPU/simplify-libcalls.ll
