This is an archive of the discontinued LLVM Phabricator instance.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
612	"llvm::" prefix is not needed.
630	You should not use getOrInsertFunction directly, but use AMDGPULibCalls::getFunction(). Since you are not using it you also fail to check we are in pre-link.
671	You should use AMDGPULibFunc and the parser/mangler used throughout this source.

This revision now requires changes to proceed. Aug 17 2017, 10:26 AM

rampitec added a reviewer: vpykhtin. Aug 17 2017, 10:33 AM

yaxunl marked 3 inline comments as done. Aug 20 2017, 2:38 PM

yaxunl added inline comments.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
612	will remove
630	whether in pre-link can be checked by whether the function is declaration. `__read_pipe_` and `__write_pipe_` are unmangled functions. AMDGPULibFunc::getFunction relies on argument information encoded in mangled function names, which does not work for unmangled functions.
671	will refactor this.

refactor AMDGPULibFunc to handle unmangled lib functions.

add check for declaration to avoid post-link transformation of pipe functions.

You still fail to use getFunction and thus also fail to check prelink.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
67	It is better to hide these details from the client. AMDGPULibFunc should take care of all of it, and FuncInfo shall be a sufficient structure to describe a function. The simplifier should not care about the details.
630	You are still using M->getOrInsertFunction() directly.
lib/Target/AMDGPU/AMDGPULibFunc.cpp
1002	I believe we can go without bruteforce search loop.
lib/Target/AMDGPU/AMDGPULibFunc.h
238	There is already EI_READ_PIPE above. Is it mangled? Then why these are unmangled?
351–361	StringRef is designed to be passed by value, here and below.
test/CodeGen/AMDGPU/simplify-libcalls.ll
704	Run tests through opt -instnamer.

This revision now requires changes to proceed. Aug 22 2017, 9:31 AM

Clang does not emit mangled functions for pipe builtin functions. Instead, it emits unmangled functions, e.g. __read_pipe_*. These functions are not overloaded and will be implemented by the device library.

The original implementation of AMDGPULibFunc assumes all library functions have mangled names. It extracts function argument information from the mangled function name and uses it to implement getOrInsertFunction.

For unmangled library functions, the member functions that deal with mangling or function parameter info are either irrelevant or useless. They are needed only for mangled functions, because mangling requires that information.

Therefore I subclass AMDGPULibFunc as AMDGPUMangledLibFunc and AMDGPUUnmangledLibFunc and move all the stuff needed for mangling to AMDGPUMangledLibFunc.

For AMDGPUUnmangledLibFunc, since there is no argument type information from mangling, AMDGPULibFunc::getOrInsertFunction cannot be implemented. The original implementation has no special trick for identifying whether it is prelink or not; it just checks whether the function is a declaration. I see no point in implementing a special getOrInsertFunction for AMDGPUUnmangledLibFunc.
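Editorial sketch: the split proposed in this comment can be illustrated without any LLVM dependency. Everything below is an assumption made for illustration: the class names LibFunc, MangledLibFunc and UnmangledLibFunc and the helper parseLibFunc are hypothetical stand-ins, not the classes from the patch, and the toy parser only understands a `_Z<len><name><args>` shape.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical stand-in for the common base class discussed in the review.
class LibFunc {
public:
  virtual ~LibFunc() = default;
  // Symbol name as it would appear in the module.
  virtual std::string symbolName() const = 0;
  // True when argument types can be recovered from the name itself.
  virtual bool hasArgInfoFromName() const = 0;
};

// Mangled case: the name encodes the base name plus an argument signature.
class MangledLibFunc : public LibFunc {
  std::string Base;   // e.g. "sin"
  std::string ArgSig; // e.g. "f"
public:
  MangledLibFunc(std::string B, std::string A)
      : Base(std::move(B)), ArgSig(std::move(A)) {}
  std::string symbolName() const override {
    return "_Z" + std::to_string(Base.size()) + Base + ArgSig;
  }
  bool hasArgInfoFromName() const override { return true; }
};

// Unmangled case: the name is used verbatim and carries no type information.
class UnmangledLibFunc : public LibFunc {
  std::string Name;
public:
  explicit UnmangledLibFunc(std::string N) : Name(std::move(N)) {}
  std::string symbolName() const override { return Name; }
  bool hasArgInfoFromName() const override { return false; }
};

// Toy parser (hypothetical): picks the subclass from the spelling of the name.
inline std::unique_ptr<LibFunc> parseLibFunc(std::string_view Name) {
  auto StartsWith = [&](std::string_view P) {
    return Name.substr(0, P.size()) == P;
  };
  if (StartsWith("__read_pipe_") || StartsWith("__write_pipe_"))
    return std::make_unique<UnmangledLibFunc>(std::string(Name));
  if (!StartsWith("_Z"))
    return nullptr;
  std::size_t Pos = 2, Len = 0;
  while (Pos < Name.size() && Name[Pos] >= '0' && Name[Pos] <= '9') {
    Len = Len * 10 + static_cast<std::size_t>(Name[Pos] - '0');
    ++Pos;
    if (Len > Name.size())
      return nullptr; // length prefix cannot fit in the name
  }
  if (Len == 0 || Len > Name.size() - Pos)
    return nullptr;
  return std::make_unique<MangledLibFunc>(std::string(Name.substr(Pos, Len)),
                                          std::string(Name.substr(Pos + Len)));
}
```

A client holding a `std::unique_ptr<LibFunc>` can call `symbolName()` without knowing which subclass it has, which is the kind of encapsulation the reviewers ask for later in the thread.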

In D36831#848827, @yaxunl wrote:

Clang does not emit mangled functions for pipe builtin functions. Instead, it emits unmangled functions, e.g. __read_pipe_*. These functions are not overloaded and will be implemented by the device library.

The original implementation of AMDGPULibFunc assumes all library functions have mangled names. It extracts function argument information from the mangled function name and uses it to implement getOrInsertFunction.

For unmangled library functions, the member functions that deal with mangling or function parameter info are either irrelevant or useless. They are needed only for mangled functions, because mangling requires that information.

Therefore I subclass AMDGPULibFunc as AMDGPUMangledLibFunc and AMDGPUUnmangledLibFunc and move all the stuff needed for mangling to AMDGPUMangledLibFunc.

For AMDGPUUnmangledLibFunc, since there is no argument type information from mangling, AMDGPULibFunc::getOrInsertFunction cannot be implemented. The original implementation has no special trick for identifying whether it is prelink or not; it just checks whether the function is a declaration. I see no point in implementing a special getOrInsertFunction for AMDGPUUnmangledLibFunc.

What about EI_READ_PIPE? Is it mangled? Shall it be removed?

Whatever you subclass, all the changes shall be contained within AMDGPULibFunc and not in its clients. If some of the arguments in FuncInfo are unused in this case, that is fine. You can also check the former implementation from HSAIL, which also handled EDG-style mangling (unmangled in the terms you are using here).

All in all, it is only legal to do this on prelink, and that is exactly what getFunction() does.

yaxunl marked 8 inline comments as done. Aug 22 2017, 10:11 AM

yaxunl added inline comments.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
67	The handling of mangled lib functions requires function argument information, which is not available for unmangled lib functions. The transformation of unmangled lib functions does not require function argument information. There is no point in implementing many of the member functions of mangled lib functions.
lib/Target/AMDGPU/AMDGPULibFunc.cpp
1002	will add a map.
lib/Target/AMDGPU/AMDGPULibFunc.h
238	clang does not emit mangled read_pipe functions. Will remove EI_READ_PIPE.
351–361	This comes from the original implementation. Will fix.
test/CodeGen/AMDGPU/simplify-libcalls.ll
704	will fix.

In D36831#848830, @rampitec wrote:

What about EI_READ_PIPE? Is it mangled? Shall it be removed?

Whatever you subclass, all the changes shall be contained within AMDGPULibFunc and not in its clients. If some of the arguments in FuncInfo are unused in this case, that is fine. You can also check the former implementation from HSAIL, which also handled EDG-style mangling (unmangled in the terms you are using here).

All in all, it is only legal to do this on prelink, and that is exactly what getFunction() does.

For pipe functions, clang currently does not do any mangling at all. It is not like the EDG style mangling.

As I said, getFunction() requires function argument information, which is not available for unmangled functions. To implement getFunction() requires providing argument information for unmangled functions, which is time consuming and also useless.

In D36831#848885, @yaxunl wrote:

As I said, getFunction() requires function argument information, which is not available for unmangled functions. To implement getFunction() requires providing argument information for unmangled functions, which is time consuming and also useless.

You can just omit filling it. You have FuncId and that is all you need in this case.

In D36831#848889, @rampitec wrote:

You can just omit filling it. You have FuncId and that is all you need in this case.

I need to get a function with different (transformed) function type.

Run it without -amdgpu-prelink. It will fail to link. It will also fail to build library.

In D36831#848905, @rampitec wrote:

Run it without -amdgpu-prelink. It will fail to link. It will also fail to build library.

The device library has not implemented these functions yet. I think that's why it fails to link.

I will investigate why it fails to build library.

In D36831#849723, @yaxunl wrote:

The device library has not implemented these functions yet. I think that's why it fails to link.

I will investigate why it fails to build library.

It fails because you do not use getFunction, effectively skipping the prelink check.

In D36831#849725, @rampitec wrote:

It fails because you do not use getFunction, effectively skipping the prelink check.

For mangled lib functions, my patch does not change how they are handled. They still go through getFunction.

For unmangled lib functions, I only transform them if they are declarations. In post-linking pass, they are already linked and are not declarations, therefore they stay unchanged.

I am wondering why there will be link failure.

In D36831#850296, @yaxunl wrote:

For mangled lib functions, my patch does not change how they are handled. They still go through getFunction.

For unmangled lib functions, I only transform them if they are declarations. In post-linking pass, they are already linked and are not declarations, therefore they stay unchanged.

I am wondering why there will be link failure.

The library build runs before link, but you do not check that prelink transformations are allowed.

In D36831#850356, @rampitec wrote:

The library build runs before link, but you do not check that prelink transformations are allowed.

The library contains definitions of the unmangled functions. Since they are not declarations, the pass will not change them.

b-sumner added a subscriber: b-sumner. Aug 31 2017, 12:31 PM

Splitting AMDGPULibFunc into two classes looks like huge overkill. How about modifying AMDGPULibFunc::parse so it could accept unmangled names and just return an enum id for the function (using some fast lookup approach)? Type info for such functions can be left unpopulated and is supposed to be handled by the client (as in fold_read_write_pipe).
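Editorial sketch: the "fast lookup" suggested here, and the map promised earlier in reply to the comment about the brute-force search loop, could look roughly like the following. The enum FuncId, its enumerators and the function lookupUnmangled are hypothetical names chosen for this illustration; the patch's actual identifiers are not shown in this part of the thread.

```cpp
#include <optional>
#include <string_view>
#include <unordered_map>

// Hypothetical ids for the unmangled pipe builtins.
enum class FuncId { ReadPipe2, ReadPipe4, WritePipe2, WritePipe4 };

// Maps an unmangled name to its id with a single hash lookup instead of a
// linear scan. Keys are string literals, so the views stay valid for the
// lifetime of the program.
inline std::optional<FuncId> lookupUnmangled(std::string_view Name) {
  static const std::unordered_map<std::string_view, FuncId> Table = {
      {"__read_pipe_2", FuncId::ReadPipe2},
      {"__read_pipe_4", FuncId::ReadPipe4},
      {"__write_pipe_2", FuncId::WritePipe2},
      {"__write_pipe_4", FuncId::WritePipe4},
  };
  auto It = Table.find(Name);
  if (It == Table.end())
    return std::nullopt; // not a known unmangled library function
  return It->second;
}
```

Returning `std::optional` keeps "not a library function" distinct from any valid id, so a caller can bail out early without a sentinel enumerator.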

In D36831#858723, @vpykhtin wrote:

Splitting AMDGPULibFunc into two classes looks like huge overkill. How about modifying AMDGPULibFunc::parse so it could accept unmangled names and just return an enum id for the function (using some fast lookup approach)? Type info for such functions can be left unpopulated and is supposed to be handled by the client (as in fold_read_write_pipe).

The unmangled lib function differs from the mangled one in how function names and type information are handled. I have found a way to reuse the interface: mangle, getOrInsertFunction, getFunction, and getFunctionType. However, to achieve that we really need different classes for mangled and unmangled lib functions, taking advantage of some virtual functions. Having different classes for unmangled and mangled lib functions also gives a cleaner design, where all the name-mangling stuff is kept where it belongs.

Ok, unmangled part looks different indeed. If the issue with pre-link checking is solved this patch is ok with me.

rampitec added inline comments. Sep 1 2017, 9:55 AM

lib/Target/AMDGPU/AMDGPULibCalls.cpp
630	I see you are now checking for the declaration vs definition, but it is still only legal on pre-link and that is not checked.
lib/Target/AMDGPU/AMDGPULibFunc.cpp
1002	Not done.
lib/Target/AMDGPU/AMDGPULibFunc.h
351–361	Not done.
test/CodeGen/AMDGPU/simplify-libcalls.ll
704	Not done.

Revised by Stas' comments. Use AMDGPULibFunc::getOrInsertFunction to create function.

Minor change of test.

vpykhtin added inline comments. Sep 4 2017, 3:15 AM

lib/Target/AMDGPU/AMDGPULibFunc.h
351–361	Initially, the (non-const) StringRef& was introduced intentionally, so that mangledName holds the stripped name on return.

yaxunl marked an inline comment as done. Sep 4 2017, 5:40 AM

yaxunl added inline comments.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
630	Now use AMDGPULibFunc::getOrInsertFunction().
lib/Target/AMDGPU/AMDGPULibFunc.h
351–361	Only const StringRef& is changed to StringRef. non-const StringRef& is not changed.
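Editorial sketch: the distinction drawn in these two comments (pass a read-only view by value, keep a non-const reference when the callee is meant to strip part of the name) can be shown with std::string_view, used here as a stand-in for llvm::StringRef so the example builds without LLVM. The function names isPipeBuiltin and consumePrefix are hypothetical.

```cpp
#include <string_view>

// Read-only use: the view is two words wide, so it is taken by value.
inline bool isPipeBuiltin(std::string_view Name) {
  constexpr std::string_view Read = "__read_pipe_";
  constexpr std::string_view Write = "__write_pipe_";
  return Name.substr(0, Read.size()) == Read ||
         Name.substr(0, Write.size()) == Write;
}

// In/out use: the caller's view is advanced past the prefix on success and
// left untouched on failure, which is why it is a non-const reference.
inline bool consumePrefix(std::string_view &Name, std::string_view Prefix) {
  if (Name.substr(0, Prefix.size()) != Prefix)
    return false;
  Name.remove_prefix(Prefix.size());
  return true;
}
```

After `consumePrefix(S, "_Z")` succeeds on a view of `"_Z3sinf"`, the caller's `S` refers to `"3sinf"`, which is the "stripped name on return" behaviour described above.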

rampitec added inline comments. Sep 4 2017, 10:30 AM

lib/Target/AMDGPU/AMDGPULibCalls.cpp
67	The whole point of moving the logic that distinguishes between mangled and unmangled into the parser is to have no massive changes in the client like this, and to let the client not bother differentiating on every other line.
636	The point of using it is to check its result and bail if null. Also no modifications shall be done before this check.
test/CodeGen/AMDGPU/simplify-libcalls.ll
704	I still see it.
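Editorial sketch: the comment on line 636 above asks for two things, namely that the lookup result is checked for null and that nothing is modified before that check. The toy model below shows that ordering together with the prelink gating the thread keeps returning to. Module, Function, Call, getFunction and retarget are simplified, hypothetical stand-ins and not the LLVM types or the patch's code.

```cpp
#include <map>
#include <string>

// Simplified, hypothetical stand-ins for the IR objects involved.
struct Function { std::string Name; };
struct Call { const Function *Callee; };
struct Module { std::map<std::string, Function> Funcs; };

// In prelink mode a missing function may be declared on demand; otherwise
// only an already existing function is returned, or nullptr if there is none.
inline const Function *getFunction(Module &M, const std::string &Name,
                                   bool PreLink) {
  if (PreLink)
    return &M.Funcs.try_emplace(Name, Function{Name}).first->second;
  auto It = M.Funcs.find(Name);
  return It == M.Funcs.end() ? nullptr : &It->second;
}

// Look up first, bail out on null, and only then change the call.
inline bool retarget(Module &M, Call &C, const std::string &NewName,
                     bool PreLink) {
  const Function *F = getFunction(M, NewName, PreLink);
  if (!F)
    return false; // nothing has been modified yet
  C.Callee = F;
  return true;
}
```

With `PreLink` false and no existing target, `retarget` returns false and the call still points at its original callee; with `PreLink` true the target is declared and the call is updated.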

yaxunl marked 2 inline comments as done. Sep 4 2017, 2:55 PM

yaxunl added inline comments.

test/CodeGen/AMDGPU/simplify-libcalls.ll
704	I've added -instnamer to the RUN line. Do you mean I should run the original .ll through opt -instnamer so that the .ll contains named instructions?

rampitec added inline comments. Sep 4 2017, 4:30 PM

test/CodeGen/AMDGPU/simplify-libcalls.ll
704	Yes, there shall be no numbered variables in the test as it has to be easily editable.

Revised by Stas' comments.

rampitec added inline comments. Sep 5 2017, 2:00 PM

test/CodeGen/AMDGPU/simplify-libcalls.ll
93–1	-instnamer is not needed here.

Remove -instnamer from RUN line.

rampitec added inline comments. Sep 5 2017, 2:09 PM

lib/Target/AMDGPU/AMDGPULibCalls.cpp
166	Please move the actual IR change below "return false" statement.

Revised by Stas' comments.

Thanks!

This revision is now accepted and ready to land. Sep 5 2017, 2:32 PM

Closed by commit rL312598: [AMDGPU] Transform __read_pipe_* and __write_pipe_* (authored by yaxunl). Sep 5 2017, 5:31 PM

This revision was automatically updated to reflect the committed changes.
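Editorial sketch: the rule the committed transformation applies, as stated in the comment above fold_read_write_pipe in the diff below, is that a call is redirected to a size-specific variant only when the packet size and alignment are the same power of two. The helper below restates just that naming rule in standalone form; specializedPipeName and isPowerOf2 are hypothetical names, and the sketch mirrors the checks visible in the diff (which show no explicit upper bound on the size).

```cpp
#include <cstdint>
#include <optional>
#include <string>

inline bool isPowerOf2(std::uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Returns the specialized callee name, e.g. "__read_pipe_2_4" for a 4-byte
// packet with 4-byte alignment, or std::nullopt when the rule does not apply.
inline std::optional<std::string>
specializedPipeName(const std::string &Name, std::uint64_t Size,
                    std::uint64_t Align) {
  if (Size != Align || !isPowerOf2(Size))
    return std::nullopt;
  return Name + "_" + std::to_string(Size);
}
```

In the actual pass the size and alignment come from the last two call arguments and must be constant integers; this sketch takes them as plain numbers.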

Revision Contents

Path	Size
lib/Target/AMDGPU/AMDGPULibCalls.cpp	273 lines
lib/Target/AMDGPU/AMDGPULibFunc.h	106 lines
lib/Target/AMDGPU/AMDGPULibFunc.cpp	152 lines
test/CodeGen/AMDGPU/simplify-libcalls.ll	135 lines

Diff 113672

lib/Target/AMDGPU/AMDGPULibCalls.cpp

[58 lines not shown]
#define MATH_RLOG2_E 0.6931471805599453094172321214581765680755001343602552		#define MATH_RLOG2_E 0.6931471805599453094172321214581765680755001343602552

namespace llvm {		namespace llvm {

class AMDGPULibCalls {		class AMDGPULibCalls {
private:		private:

typedef llvm::AMDGPULibFunc FuncInfo;		typedef llvm::AMDGPULibFunc FuncInfo;
		typedef llvm::AMDGPUMangledLibFunc MangledFuncInfo;
		typedef llvm::AMDGPUUnmangledLibFunc UnmangledFuncInfo;

// -fuse-native.		// -fuse-native.
bool AllNative = false;		bool AllNative = false;

bool useNativeFunc(const StringRef F) const;		bool useNativeFunc(const StringRef F) const;

// Return a pointer (pointer expr) to the function if function defintion with		// Return a pointer (pointer expr) to the function if function defintion with
// "FuncName" exists. It may create a new function prototype in pre-link mode.		// "FuncName" exists. It may create a new function prototype in pre-link mode.
Constant *getFunction(Module *M, const FuncInfo& fInfo);		Constant *getFunction(Module *M, const MangledFuncInfo &fInfo);

// Replace a normal function with its native version.		// Replace a normal function with its native version.
bool replaceWithNative(CallInst *CI, const FuncInfo &FInfo);		bool replaceWithNative(CallInst *CI, const MangledFuncInfo &FInfo);

bool parseFunctionName(const StringRef& FMangledName,		std::unique_ptr<FuncInfo> parseFunctionName(const StringRef &Name);
FuncInfo *FInfo=nullptr /*out*/);

bool TDOFold(CallInst *CI, const FuncInfo &FInfo);		bool TDOFold(CallInst *CI, const MangledFuncInfo &FInfo);

/* Specialized optimizations */		/* Specialized optimizations */

// recip (half or native)		// recip (half or native)
bool fold_recip(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_recip(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// divide (half or native)		// divide (half or native)
bool fold_divide(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_divide(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// pow/powr/pown		// pow/powr/pown
bool fold_pow(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_pow(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// rootn		// rootn
bool fold_rootn(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_rootn(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// fma/mad		// fma/mad
bool fold_fma_mad(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_fma_mad(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// -fuse-native for sincos		// -fuse-native for sincos
bool sincosUseNative(CallInst *aCI, const FuncInfo &FInfo);		bool sincosUseNative(CallInst *aCI, const MangledFuncInfo &FInfo);

// evaluate calls if calls' arguments are constants.		// evaluate calls if calls' arguments are constants.
bool evaluateScalarMathFunc(FuncInfo &FInfo, double& Res0,		bool evaluateScalarMathFunc(MangledFuncInfo &FInfo, double &Res0,
double& Res1, Constant *copr0, Constant *copr1, Constant *copr2);		double &Res1, Constant *copr0, Constant *copr1,
bool evaluateCall(CallInst *aCI, FuncInfo &FInfo);		Constant *copr2);
		bool evaluateCall(CallInst *aCI, MangledFuncInfo &FInfo);

// exp		// exp
bool fold_exp(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_exp(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// exp2		// exp2
bool fold_exp2(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_exp2(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// exp10		// exp10
bool fold_exp10(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_exp10(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// log		// log
bool fold_log(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_log(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// log2		// log2
bool fold_log2(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_log2(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// log10		// log10
bool fold_log10(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_log10(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// sqrt		// sqrt
bool fold_sqrt(CallInst *CI, IRBuilder<> &B, const FuncInfo &FInfo);		bool fold_sqrt(CallInst *CI, IRBuilder<> &B, const MangledFuncInfo &FInfo);

// sin/cos		// sin/cos
bool fold_sincos(CallInst * CI, IRBuilder<> &B, AliasAnalysis * AA);		bool fold_sincos(CallInst * CI, IRBuilder<> &B, AliasAnalysis * AA);

		// __read_pipe/__write_pipe
		bool fold_read_write_pipe(CallInst *CI, IRBuilder<> &B,
		UnmangledFuncInfo &FInfo);

// Get insertion point at entry.		// Get insertion point at entry.
BasicBlock::iterator getEntryIns(CallInst * UI);		BasicBlock::iterator getEntryIns(CallInst * UI);
// Insert an Alloc instruction.		// Insert an Alloc instruction.
AllocaInst* insertAlloca(CallInst * UI, IRBuilder<> &B, const char *prefix);		AllocaInst* insertAlloca(CallInst * UI, IRBuilder<> &B, const char *prefix);
// Get a scalar native builtin signle argument FP function		// Get a scalar native builtin signle argument FP function
Constant* getNativeFunction(Module* M, const FuncInfo &FInfo);		Constant *getNativeFunction(Module *M, const MangledFuncInfo &FInfo);
		// Fold library function with mangled name.
		bool foldMangledFunction(CallInst *CI, MangledFuncInfo &Info, IRBuilder<> &B,
		AliasAnalysis *AA = nullptr);
		// Fold library function with unmangled name.
		bool foldUnmangledFunction(CallInst *CI, UnmangledFuncInfo &Info,
		IRBuilder<> &B, AliasAnalysis *AA = nullptr);

protected:		protected:
CallInst *CI;		CallInst *CI;

bool isUnsafeMath(const CallInst *CI) const;		bool isUnsafeMath(const CallInst *CI) const;

void replaceCall(Value *With) {		void replaceCall(Value *With) {
CI->replaceAllUsesWith(With);		CI->replaceAllUsesWith(With);
CI->eraseFromParent();		CI->eraseFromParent();
}		}

public:		public:
bool fold(CallInst *CI, AliasAnalysis *AA = nullptr);		bool fold(CallInst *CI, AliasAnalysis *AA = nullptr);

void initNativeFuncs();		void initNativeFuncs();

// Replace a normal math function call with that native version		// Replace a normal math function call with that native version
bool useNative(CallInst *CI);		bool useNative(CallInst *CI);
};		};

} // end llvm namespace		} // end llvm namespace

namespace {		namespace {
[289 lines not shown; context: static TableRef getOptTable(AMDGPULibFunc::EFuncId id) {]
case AMDGPULibFunc::EI_TANH: return TableRef(tbl_tanh);		case AMDGPULibFunc::EI_TANH: return TableRef(tbl_tanh);
case AMDGPULibFunc::EI_TANPI: return TableRef(tbl_tanpi);		case AMDGPULibFunc::EI_TANPI: return TableRef(tbl_tanpi);
case AMDGPULibFunc::EI_TGAMMA: return TableRef(tbl_tgamma);		case AMDGPULibFunc::EI_TGAMMA: return TableRef(tbl_tgamma);
default:;		default:;
}		}
return TableRef();		return TableRef();
}		}

static inline int getVecSize(const AMDGPULibFunc& FInfo) {		static inline int getVecSize(const AMDGPUMangledLibFunc &FInfo) {
return FInfo.Leads[0].VectorSize;		return FInfo.Leads[0].VectorSize;
}		}

static inline AMDGPULibFunc::EType getArgType(const AMDGPULibFunc& FInfo) {		static inline AMDGPULibFunc::EType
		getArgType(const AMDGPUMangledLibFunc &FInfo) {
return (AMDGPULibFunc::EType)FInfo.Leads[0].ArgType;		return (AMDGPULibFunc::EType)FInfo.Leads[0].ArgType;
}		}

Constant *AMDGPULibCalls::getFunction(Module *M, const FuncInfo& fInfo) {		Constant *AMDGPULibCalls::getFunction(Module *M, const MangledFuncInfo &fInfo) {
// If we are doing PreLinkOpt, the function is external. So it is safe to		// If we are doing PreLinkOpt, the function is external. So it is safe to
// use getOrInsertFunction() at this stage.		// use getOrInsertFunction() at this stage.

return EnablePreLink ? AMDGPULibFunc::getOrInsertFunction(M, fInfo)		return EnablePreLink ? AMDGPUMangledLibFunc::getOrInsertFunction(M, fInfo)
: AMDGPULibFunc::getFunction(M, fInfo);		: AMDGPUMangledLibFunc::getFunction(M, fInfo);
}		}

bool AMDGPULibCalls::parseFunctionName(const StringRef& FMangledName,		std::unique_ptr<AMDGPULibCalls::FuncInfo>
FuncInfo *FInfo) {		AMDGPULibCalls::parseFunctionName(const StringRef &Name) {
return AMDGPULibFunc::parse(FMangledName, *FInfo);		return AMDGPULibFunc::parse(Name);
}		}

bool AMDGPULibCalls::isUnsafeMath(const CallInst *CI) const {		bool AMDGPULibCalls::isUnsafeMath(const CallInst *CI) const {
if (auto Op = dyn_cast<FPMathOperator>(CI))		if (auto Op = dyn_cast<FPMathOperator>(CI))
if (Op->hasUnsafeAlgebra())		if (Op->hasUnsafeAlgebra())
return true;		return true;
const Function *F = CI->getParent()->getParent();		const Function *F = CI->getParent()->getParent();
Attribute Attr = F->getFnAttribute("unsafe-fp-math");		Attribute Attr = F->getFnAttribute("unsafe-fp-math");
return Attr.getValueAsString() == "true";		return Attr.getValueAsString() == "true";
}		}

bool AMDGPULibCalls::useNativeFunc(const StringRef F) const {		bool AMDGPULibCalls::useNativeFunc(const StringRef F) const {
return AllNative ||		return AllNative ||
std::find(UseNative.begin(), UseNative.end(), F) != UseNative.end();		std::find(UseNative.begin(), UseNative.end(), F) != UseNative.end();
}		}

void AMDGPULibCalls::initNativeFuncs() {		void AMDGPULibCalls::initNativeFuncs() {
AllNative = useNativeFunc("all") ||		AllNative = useNativeFunc("all") ||
(UseNative.getNumOccurrences() && UseNative.size() == 1 &&		(UseNative.getNumOccurrences() && UseNative.size() == 1 &&
UseNative.begin()->empty());		UseNative.begin()->empty());
}		}

bool AMDGPULibCalls::sincosUseNative(CallInst *aCI, const FuncInfo &FInfo) {		bool AMDGPULibCalls::sincosUseNative(CallInst *aCI,
		const MangledFuncInfo &FInfo) {
bool native_sin = useNativeFunc("sin");		bool native_sin = useNativeFunc("sin");
bool native_cos = useNativeFunc("cos");		bool native_cos = useNativeFunc("cos");

if (native_sin && native_cos) {		if (native_sin && native_cos) {
Module *M = aCI->getModule();		Module *M = aCI->getModule();
Value *opr0 = aCI->getArgOperand(0);		Value *opr0 = aCI->getArgOperand(0);

AMDGPULibFunc nf;		AMDGPUMangledLibFunc nf;
nf.Leads[0].ArgType = FInfo.Leads[0].ArgType;		nf.Leads[0].ArgType = FInfo.Leads[0].ArgType;
nf.Leads[0].VectorSize = FInfo.Leads[0].VectorSize;		nf.Leads[0].VectorSize = FInfo.Leads[0].VectorSize;

nf.setPrefix(AMDGPULibFunc::NATIVE);		nf.setPrefix(AMDGPULibFunc::NATIVE);
nf.setId(AMDGPULibFunc::EI_SIN);		nf.setId(AMDGPULibFunc::EI_SIN);
Constant *sinExpr = getFunction(M, nf);		Constant *sinExpr = getFunction(M, nf);

nf.setPrefix(AMDGPULibFunc::NATIVE);		nf.setPrefix(AMDGPULibFunc::NATIVE);
[13 lines not shown; context: bool AMDGPULibCalls::sincosUseNative(CallInst *aCI,]
}		}
return false;		return false;
}		}

bool AMDGPULibCalls::useNative(CallInst *aCI) {		bool AMDGPULibCalls::useNative(CallInst *aCI) {
CI = aCI;		CI = aCI;
Function *Callee = aCI->getCalledFunction();		Function *Callee = aCI->getCalledFunction();

FuncInfo FInfo;		auto PInfo = parseFunctionName(Callee->getName());
if (!parseFunctionName(Callee->getName(), &FInfo) ||		auto *FInfo = dyn_cast_or_null<MangledFuncInfo>(PInfo.get());
FInfo.getPrefix() != AMDGPULibFunc::NOPFX ||
getArgType(FInfo) == AMDGPULibFunc::F64 ||		if (!FInfo)
!HasNative(FInfo.getId()) ||		return false;
!(AllNative || useNativeFunc(FInfo.getName())) ) {
		if (FInfo->getPrefix() != AMDGPULibFunc::NOPFX ||
		getArgType(*FInfo) == AMDGPULibFunc::F64 || !HasNative(FInfo->getId()) ||
		!(AllNative || useNativeFunc(FInfo->getUnmangledName()))) {
return false;		return false;
}		}

if (FInfo.getId() == AMDGPULibFunc::EI_SINCOS)		if (FInfo->getId() == AMDGPULibFunc::EI_SINCOS)
return sincosUseNative(aCI, FInfo);		return sincosUseNative(aCI, *FInfo);

FInfo.setPrefix(AMDGPULibFunc::NATIVE);		FInfo->setPrefix(AMDGPULibFunc::NATIVE);
Constant *F = getFunction(aCI->getModule(), FInfo);		Constant *F = getFunction(aCI->getModule(), *FInfo);
if (!F)		if (!F)
return false;		return false;

aCI->setCalledFunction(F);		aCI->setCalledFunction(F);
DEBUG_WITH_TYPE("usenative", dbgs() << "<useNative> replace " << *aCI		DEBUG_WITH_TYPE("usenative", dbgs() << "<useNative> replace " << *aCI
<< " with native version");		<< " with native version");
return true;		return true;
}		}

		// Clang emits call of __read_pipe_2 or __read_pipe_4 for OpenCL read_pipe
		// builtin, with appended type size and alignment arguments, where 2 or 4
		// indicates the original number of arguments. The library has optimized version
		// of __read_pipe_2/__read_pipe_4 when the type size and alignment has the same
		// power of 2 value. This function transforms __read_pipe_2 to __read_pipe_2_N
		// for such cases where N is the size in bytes of the type (N = 1, 2, 4, 8, ...,
		// 128). The same for __read_pipe_4, write_pipe_2, and write_pipe_4.
		bool AMDGPULibCalls::fold_read_write_pipe(CallInst *CI, IRBuilder<> &B,
		UnmangledFuncInfo &FInfo) {
		auto *Callee = CI->getCalledFunction();
		if (!Callee->isDeclaration())
		return false;

		assert(Callee->hasName() && "Invalid read_pipe/write_pipe function");
		auto *M = Callee->getParent();
		auto &Ctx = M->getContext();
		std::string Name = Callee->getName();
		auto NumArg = CI->getNumArgOperands();
		if (NumArg != 4 && NumArg != 6)
		return false;
		auto *PacketSize = CI->getArgOperand(NumArg - 2);
		auto *PacketAlign = CI->getArgOperand(NumArg - 1);
		if (!isa<ConstantInt>(PacketSize) || !isa<ConstantInt>(PacketAlign))
		return false;
		unsigned Size = cast<ConstantInt>(PacketSize)->getZExtValue();
		unsigned Align = cast<ConstantInt>(PacketAlign)->getZExtValue();
		if (Size != Align || !isPowerOf2_32(Size))
		return false;

		Type *PtrElemTy;
		if (Size <= 8)
		PtrElemTy = Type::getIntNTy(Ctx, Size * 8);
		else
		PtrElemTy = VectorType::get(Type::getInt64Ty(Ctx), Size / 8);
		unsigned PtrArgLoc = CI->getNumArgOperands() - 3;
		auto PtrArg = CI->getArgOperand(PtrArgLoc);
		unsigned PtrArgAS = PtrArg->getType()->getPointerAddressSpace();
		auto *PtrTy = llvm::PointerType::get(PtrElemTy, PtrArgAS);

		SmallVector<llvm::Type *, 6> ArgTys;
		for (unsigned I = 0; I != PtrArgLoc; ++I)
		ArgTys.push_back(CI->getArgOperand(I)->getType());
		ArgTys.push_back(PtrTy);

		Name = Name + "_" + std::to_string(Size);

		auto *FTy = FunctionType::get(Callee->getReturnType(),
		ArrayRef<Type *>(ArgTys), false);
		auto *BCast = B.CreatePointerCast(PtrArg, PtrTy);

		SmallVector<Value *, 6> Args;
		for (unsigned I = 0; I != PtrArgLoc; ++I)
		rampitec (Done): You should not use getOrInsertFunction directly, but use AMDGPULibCalls::getFunction(). Since you are not using it you also fail to check we are in pre-link.
		yaxunl (author, Done): whether in pre-link can be checked by whether the function is declaration. `__read_pipe_*` and `__write_pipe_*` are unmangled functions. AMDGPULibFunc::getFunction relies on argument information encoded in mangled function names, which does not work for unmangled functions.
		rampitec (Done): You are still using M->getOrInsertFunction() directly.
		rampitec (Done): I see you are now checking for the declaration vs definition, but it is still only legal on pre-link and that is not checked.
		yaxunl (author, Not Done): Now use AMDGPULibFunc::getOrInsertFunction().
		Args.push_back(CI->getArgOperand(I));
		Args.push_back(BCast);

		FInfo.setName(Name);
		FInfo.setFunctionType(FTy);
		auto *F = AMDGPULibFunc::getOrInsertFunction(M, FInfo);
		rampitec (Not Done): The point of using it is to check its result and bail if null. Also no modifications shall be done before this check.
		auto *NCI = B.CreateCall(F, Args);
		NCI->setAttributes(CI->getAttributes());
		CI->replaceAllUsesWith(NCI);
		CI->dropAllReferences();
		CI->eraseFromParent();

		return true;
		}
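The comment above fold_read_write_pipe describes when a call is redirected to a size-specialized library function. The following is a minimal standalone sketch of that decision, written without LLVM types so it can be compiled on its own. The helper names `specializedPipeName` and `packetPointeeType` are hypothetical and do not appear in the patch; the type strings are only a textual stand-in for the `Type::getIntNTy` / `VectorType::get` calls. Like the patch as shown, the sketch does not enforce the upper bound of 128 mentioned in the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not part of the patch): returns the specialized callee
// name, or std::nullopt when the call should be left untouched.
std::optional<std::string> specializedPipeName(const std::string &Name,
                                               std::uint64_t Size,
                                               std::uint64_t Align) {
  // Power-of-two test, playing the role of llvm::isPowerOf2_32 in the patch.
  bool IsPow2 = Size != 0 && (Size & (Size - 1)) == 0;
  if (Size != Align || !IsPow2)
    return std::nullopt;
  // Mirrors: Name = Name + "_" + std::to_string(Size);
  return Name + "_" + std::to_string(Size);
}

// Hypothetical helper: textual form of the pointee type the patch selects,
// an iN integer for packets of up to 8 bytes and a vector of i64 beyond that.
std::string packetPointeeType(std::uint64_t Size) {
  if (Size <= 8)
    return "i" + std::to_string(Size * 8);
  return "<" + std::to_string(Size / 8) + " x i64>";
}
```

Under these assumptions, a `__read_pipe_2` call with packet size 4 and alignment 4 maps to `__read_pipe_2_4` with an `i32` pointee, while a call with size 4 and alignment 8 is left alone.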

// This function returns false if no change; return true otherwise.		// This function returns false if no change; return true otherwise.
bool AMDGPULibCalls::fold(CallInst *CI, AliasAnalysis *AA) {		bool AMDGPULibCalls::fold(CallInst *CI, AliasAnalysis *AA) {
this->CI = CI;		this->CI = CI;
Function *Callee = CI->getCalledFunction();		Function *Callee = CI->getCalledFunction();

// Ignore indirect calls.		// Ignore indirect calls.
if (Callee == 0) return false;		if (Callee == 0) return false;

FuncInfo FInfo;		auto PFInfo = parseFunctionName(Callee->getName());
if (!parseFunctionName(Callee->getName(), &FInfo))		if (!PFInfo)
return false;		return false;

		auto &FInfo = *PFInfo;
// Further check the number of arguments to see if they match.		// Further check the number of arguments to see if they match.
if (CI->getNumArgOperands() != FInfo.getNumArgs())		if (CI->getNumArgOperands() != FInfo.getNumArgs())
return false;		return false;

BasicBlock *BB = CI->getParent();		BasicBlock *BB = CI->getParent();
LLVMContext &Context = CI->getParent()->getContext();		LLVMContext &Context = CI->getParent()->getContext();
IRBuilder<> B(Context);		IRBuilder<> B(Context);

// Set the builder to the instruction after the call.		// Set the builder to the instruction after the call.
B.SetInsertPoint(BB, CI->getIterator());		B.SetInsertPoint(BB, CI->getIterator());

// Copy fast flags from the original call.		// Copy fast flags from the original call.
if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(CI))		if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(CI))
		rampitec (Done): You should use AMDGPULibFunc and parser/mangler used across the whole this source.
		yaxunl (author, Done): will refactor this.
B.setFastMathFlags(FPOp->getFastMathFlags());		B.setFastMathFlags(FPOp->getFastMathFlags());

		if (auto *Mangled = dyn_cast<MangledFuncInfo>(&FInfo))
		return foldMangledFunction(CI, *Mangled, B, AA);

		auto *Unmangled = cast<UnmangledFuncInfo>(&FInfo);
		return foldUnmangledFunction(CI, *Unmangled, B, AA);
		}

		bool AMDGPULibCalls::foldMangledFunction(CallInst *CI, MangledFuncInfo &FInfo,
		IRBuilder<> &B, AliasAnalysis *AA) {
if (TDOFold(CI, FInfo))		if (TDOFold(CI, FInfo))
return true;		return true;

// Under unsafe-math, evaluate calls if possible.		// Under unsafe-math, evaluate calls if possible.
// According to Brian Sumner, we can do this for all f32 function calls		// According to Brian Sumner, we can do this for all f32 function calls
// using host's double function calls.		// using host's double function calls.
if (isUnsafeMath(CI) && evaluateCall(CI, FInfo))		if (isUnsafeMath(CI) && evaluateCall(CI, FInfo))
return true;		return true;
[34 lines not shown; context: bool AMDGPULibCalls::foldMangledFunction(CallInst *CI, MangledFuncInfo &FInfo,]
case AMDGPULibFunc::EI_COS:		case AMDGPULibFunc::EI_COS:
case AMDGPULibFunc::EI_SIN:		case AMDGPULibFunc::EI_SIN:
if ((getArgType(FInfo) == AMDGPULibFunc::F32 ||		if ((getArgType(FInfo) == AMDGPULibFunc::F32 ||
getArgType(FInfo) == AMDGPULibFunc::F64)		getArgType(FInfo) == AMDGPULibFunc::F64)
&& (FInfo.getPrefix() == AMDGPULibFunc::NOPFX))		&& (FInfo.getPrefix() == AMDGPULibFunc::NOPFX))
return fold_sincos(CI, B, AA);		return fold_sincos(CI, B, AA);

break;		break;
		default:
		break;
		}

		return false;
		}

		bool AMDGPULibCalls::foldUnmangledFunction(CallInst *CI,
		UnmangledFuncInfo &FInfo,
		IRBuilder<> &B, AliasAnalysis *AA) {
		switch (FInfo.getId()) {
		case AMDGPULibFunc::EI_READ_PIPE_2:
		case AMDGPULibFunc::EI_READ_PIPE_4:
		case AMDGPULibFunc::EI_WRITE_PIPE_2:
		case AMDGPULibFunc::EI_WRITE_PIPE_4:
		return fold_read_write_pipe(CI, B, FInfo);

default:		default:
break;		break;
}		}

return false;		return false;
}		}

bool AMDGPULibCalls::TDOFold(CallInst *CI, const FuncInfo &FInfo) {		bool AMDGPULibCalls::TDOFold(CallInst *CI, const MangledFuncInfo &FInfo) {
// Table-Driven optimization		// Table-Driven optimization
const TableRef tr = getOptTable(FInfo.getId());		const TableRef tr = getOptTable(FInfo.getId());
if (tr.size==0)		if (tr.size==0)
return false;		return false;

int const sz = (int)tr.size;		int const sz = (int)tr.size;
const TableEntry * const ftbl = tr.table;		const TableEntry * const ftbl = tr.table;
Value *opr0 = CI->getArgOperand(0);		Value *opr0 = CI->getArgOperand(0);
[49 lines not shown; context: if (ConstantFP *CF = dyn_cast<ConstantFP>(opr0)) {]
}		}
}		}
}		}
}		}

return false;		return false;
}		}

bool AMDGPULibCalls::replaceWithNative(CallInst *CI, const FuncInfo &FInfo) {		bool AMDGPULibCalls::replaceWithNative(CallInst *CI,
		const MangledFuncInfo &FInfo) {
Module *M = CI->getModule();		Module *M = CI->getModule();
if (getArgType(FInfo) != AMDGPULibFunc::F32 ||		if (getArgType(FInfo) != AMDGPULibFunc::F32 ||
FInfo.getPrefix() != AMDGPULibFunc::NOPFX ||		FInfo.getPrefix() != AMDGPULibFunc::NOPFX ||
!HasNative(FInfo.getId()))		!HasNative(FInfo.getId()))
return false;		return false;

AMDGPULibFunc nf = FInfo;		AMDGPUMangledLibFunc nf = FInfo;
nf.setPrefix(AMDGPULibFunc::NATIVE);		nf.setPrefix(AMDGPULibFunc::NATIVE);
if (Constant *FPExpr = getFunction(M, nf)) {		if (Constant *FPExpr = getFunction(M, nf)) {
DEBUG(dbgs() << "AMDIC: " << *CI << " ---> ");		DEBUG(dbgs() << "AMDIC: " << *CI << " ---> ");

CI->setCalledFunction(FPExpr);		CI->setCalledFunction(FPExpr);

DEBUG(dbgs() << *CI << '\n');		DEBUG(dbgs() << *CI << '\n');

return true;		return true;
}		}
return false;		return false;
}		}

// [native_]half_recip(c) ==> 1.0/c		// [native_]half_recip(c) ==> 1.0/c
bool AMDGPULibCalls::fold_recip(CallInst *CI, IRBuilder<> &B,		bool AMDGPULibCalls::fold_recip(CallInst *CI, IRBuilder<> &B,
const FuncInfo &FInfo) {		const MangledFuncInfo &FInfo) {
Value *opr0 = CI->getArgOperand(0);		Value *opr0 = CI->getArgOperand(0);
if (ConstantFP *CF = dyn_cast<ConstantFP>(opr0)) {		if (ConstantFP *CF = dyn_cast<ConstantFP>(opr0)) {
// Just create a normal div. Later, InstCombine will be able		// Just create a normal div. Later, InstCombine will be able
// to compute the divide into a constant (avoid check float infinity		// to compute the divide into a constant (avoid check float infinity
// or subnormal at this point).		// or subnormal at this point).
Value *nval = B.CreateFDiv(ConstantFP::get(CF->getType(), 1.0),		Value *nval = B.CreateFDiv(ConstantFP::get(CF->getType(), 1.0),
opr0,		opr0,
"recip2div");		"recip2div");
DEBUG(errs() << "AMDIC: " << *CI		DEBUG(errs() << "AMDIC: " << *CI
<< " ---> " << *nval << "\n");		<< " ---> " << *nval << "\n");
replaceCall(nval);		replaceCall(nval);
return true;		return true;
}		}
return false;		return false;
}		}

// [native_]half_divide(x, c) ==> x/c		// [native_]half_divide(x, c) ==> x/c
bool AMDGPULibCalls::fold_divide(CallInst *CI, IRBuilder<> &B,		bool AMDGPULibCalls::fold_divide(CallInst *CI, IRBuilder<> &B,
const FuncInfo &FInfo) {		const MangledFuncInfo &FInfo) {
Value *opr0 = CI->getArgOperand(0);		Value *opr0 = CI->getArgOperand(0);
Value *opr1 = CI->getArgOperand(1);		Value *opr1 = CI->getArgOperand(1);
ConstantFP *CF0 = dyn_cast<ConstantFP>(opr0);		ConstantFP *CF0 = dyn_cast<ConstantFP>(opr0);
ConstantFP *CF1 = dyn_cast<ConstantFP>(opr1);		ConstantFP *CF1 = dyn_cast<ConstantFP>(opr1);

if ((CF0 && CF1) || // both are constants		if ((CF0 && CF1) || // both are constants
(CF1 && (getArgType(FInfo) == AMDGPULibFunc::F32)))		(CF1 && (getArgType(FInfo) == AMDGPULibFunc::F32)))
// CF1 is constant && f32 divide		// CF1 is constant && f32 divide
[13 lines not shown; context: #if _XOPEN_SOURCE >= 600 || _ISOC99_SOURCE || _POSIX_C_SOURCE >= 200112L]
return ::log2(V);		return ::log2(V);
#else		#else
return log(V) / 0.693147180559945309417;		return log(V) / 0.693147180559945309417;
#endif		#endif
}		}
}		}

bool AMDGPULibCalls::fold_pow(CallInst *CI, IRBuilder<> &B,		bool AMDGPULibCalls::fold_pow(CallInst *CI, IRBuilder<> &B,
const FuncInfo &FInfo) {		const MangledFuncInfo &FInfo) {
assert((FInfo.getId() == AMDGPULibFunc::EI_POW ||		assert((FInfo.getId() == AMDGPULibFunc::EI_POW ||
FInfo.getId() == AMDGPULibFunc::EI_POWR ||		FInfo.getId() == AMDGPULibFunc::EI_POWR ||
FInfo.getId() == AMDGPULibFunc::EI_POWN) &&		FInfo.getId() == AMDGPULibFunc::EI_POWN) &&
"fold_pow: encounter a wrong function call");		"fold_pow: encounter a wrong function call");

Value *opr0, *opr1;		Value *opr0, *opr1;
ConstantFP *CF;		ConstantFP *CF;
ConstantInt *CINT;		ConstantInt *CINT;
[62 lines not shown; context: if ((CF && CF->isExactlyValue(-1.0)) || (CINT && ci_opr1 == -1)) {]
replaceCall(nval);		replaceCall(nval);
return true;		return true;
}		}

Module *M = CI->getModule();		Module *M = CI->getModule();
if (CF && (CF->isExactlyValue(0.5) || CF->isExactlyValue(-0.5))) {		if (CF && (CF->isExactlyValue(0.5) || CF->isExactlyValue(-0.5))) {
// pow[r](x, [-]0.5) = sqrt(x)		// pow[r](x, [-]0.5) = sqrt(x)
bool issqrt = CF->isExactlyValue(0.5);		bool issqrt = CF->isExactlyValue(0.5);
if (Constant *FPExpr = getFunction(M,		if (Constant *FPExpr = getFunction(
AMDGPULibFunc(issqrt ? AMDGPULibFunc::EI_SQRT		M, AMDGPUMangledLibFunc(issqrt ? AMDGPULibFunc::EI_SQRT
: AMDGPULibFunc::EI_RSQRT, FInfo))) {		: AMDGPULibFunc::EI_RSQRT,
		FInfo))) {
DEBUG(errs() << "AMDIC: " << *CI << " ---> "		DEBUG(errs() << "AMDIC: " << *CI << " ---> "
<< FInfo.getName().c_str() << "(" << *opr0 << ")\n");		<< FInfo.getUnmangledName().c_str() << "(" << *opr0
		<< ")\n");
Value *nval = CreateCallEx(B,FPExpr, opr0, issqrt ? "__pow2sqrt"		Value *nval = CreateCallEx(B,FPExpr, opr0, issqrt ? "__pow2sqrt"
: "__pow2rsqrt");		: "__pow2rsqrt");
replaceCall(nval);		replaceCall(nval);
return true;		return true;
}		}
}		}

if (!isUnsafeMath(CI))		if (!isUnsafeMath(CI))
[47 lines not shown; context: if (abs_opr1 <= 12) {]
DEBUG(errs() << "AMDIC: " << *CI << " ---> "		DEBUG(errs() << "AMDIC: " << *CI << " ---> "
<< ((ci_opr1 < 0) ? "1/prod(" : "prod(") << *opr0 << ")\n");		<< ((ci_opr1 < 0) ? "1/prod(" : "prod(") << *opr0 << ")\n");
replaceCall(nval);		replaceCall(nval);
return true;		return true;
}		}

// powr ---> exp2(y * log2(x))		// powr ---> exp2(y * log2(x))
// pown/pow ---> powr(fabs(x), y) | (x & ((int)y << 31))		// pown/pow ---> powr(fabs(x), y) | (x & ((int)y << 31))
Constant *ExpExpr = getFunction(M, AMDGPULibFunc(AMDGPULibFunc::EI_EXP2,		Constant *ExpExpr =
FInfo));		getFunction(M, AMDGPUMangledLibFunc(AMDGPULibFunc::EI_EXP2, FInfo));
if (!ExpExpr)		if (!ExpExpr)
return false;		return false;

bool needlog = false;		bool needlog = false;
bool needabs = false;		bool needabs = false;
bool needcopysign = false;		bool needcopysign = false;
Constant *cnval = nullptr;		Constant *cnval = nullptr;
if (getVecSize(FInfo) == 1) {		if (getVecSize(FInfo) == 1) {
[69 lines not shown; context: if (getVecSize(FInfo) == 1) {]
}		}
} else		} else
return false;		return false;
}		}
}		}

Value *nval;		Value *nval;
if (needabs) {		if (needabs) {
Constant *AbsExpr = getFunction(M, AMDGPULibFunc(AMDGPULibFunc::EI_FABS,		Constant *AbsExpr =
FInfo));		getFunction(M, AMDGPUMangledLibFunc(AMDGPULibFunc::EI_FABS, FInfo));
if (!AbsExpr)		if (!AbsExpr)
return false;		return false;
nval = CreateCallEx(B, AbsExpr, opr0, "__fabs");		nval = CreateCallEx(B, AbsExpr, opr0, "__fabs");
} else {		} else {
nval = cnval ? cnval : opr0;		nval = cnval ? cnval : opr0;
}		}
if (needlog) {		if (needlog) {
Constant *LogExpr = getFunction(M, AMDGPULibFunc(AMDGPULibFunc::EI_LOG2,		Constant *LogExpr =
FInfo));		getFunction(M, AMDGPUMangledLibFunc(AMDGPULibFunc::EI_LOG2, FInfo));
if (!LogExpr)		if (!LogExpr)
return false;		return false;
nval = CreateCallEx(B,LogExpr, nval, "__log2");		nval = CreateCallEx(B,LogExpr, nval, "__log2");
}		}

if (FInfo.getId() == AMDGPULibFunc::EI_POWN) {		if (FInfo.getId() == AMDGPULibFunc::EI_POWN) {
// convert int(32) to fp(f32 or f64)		// convert int(32) to fp(f32 or f64)
opr1 = B.CreateSIToFP(opr1, nval->getType(), "pownI2F");		opr1 = B.CreateSIToFP(opr1, nval->getType(), "pownI2F");
[24 lines not shown; context: bool AMDGPULibCalls::fold_pow(CallInst *CI, IRBuilder<> &B,]
DEBUG(errs() << "AMDIC: " << *CI << " ---> "		DEBUG(errs() << "AMDIC: " << *CI << " ---> "
<< "exp2(" << *opr1 << " * log2(" << *opr0 << "))\n");		<< "exp2(" << *opr1 << " * log2(" << *opr0 << "))\n");
replaceCall(nval);		replaceCall(nval);

return true;		return true;
}		}

bool AMDGPULibCalls::fold_rootn(CallInst *CI, IRBuilder<> &B,		bool AMDGPULibCalls::fold_rootn(CallInst *CI, IRBuilder<> &B,
const FuncInfo &FInfo) {		const MangledFuncInfo &FInfo) {
Value *opr0 = CI->getArgOperand(0);		Value *opr0 = CI->getArgOperand(0);
Value *opr1 = CI->getArgOperand(1);		Value *opr1 = CI->getArgOperand(1);

ConstantInt *CINT = dyn_cast<ConstantInt>(opr1);		ConstantInt *CINT = dyn_cast<ConstantInt>(opr1);
if (!CINT) {		if (!CINT) {
return false;		return false;
}		}
int ci_opr1 = (int)CINT->getSExtValue();		int ci_opr1 = (int)CINT->getSExtValue();
if (ci_opr1 == 1) { // rootn(x, 1) = x		if (ci_opr1 == 1) { // rootn(x, 1) = x
DEBUG(errs() << "AMDIC: " << *CI		DEBUG(errs() << "AMDIC: " << *CI
<< " ---> " << *opr0 << "\n");		<< " ---> " << *opr0 << "\n");
replaceCall(opr0);		replaceCall(opr0);
return true;		return true;
}		}
if (ci_opr1 == 2) { // rootn(x, 2) = sqrt(x)		if (ci_opr1 == 2) { // rootn(x, 2) = sqrt(x)
std::vector<const Type*> ParamsTys;		std::vector<const Type*> ParamsTys;
ParamsTys.push_back(opr0->getType());		ParamsTys.push_back(opr0->getType());
Module *M = CI->getModule();		Module *M = CI->getModule();
if (Constant *FPExpr = getFunction(M, AMDGPULibFunc(AMDGPULibFunc::EI_SQRT,		if (Constant *FPExpr = getFunction(
FInfo))) {		M, AMDGPUMangledLibFunc(AMDGPULibFunc::EI_SQRT, FInfo))) {
DEBUG(errs() << "AMDIC: " << *CI << " ---> sqrt(" << *opr0 << ")\n");		DEBUG(errs() << "AMDIC: " << *CI << " ---> sqrt(" << *opr0 << ")\n");
Value *nval = CreateCallEx(B,FPExpr, opr0, "__rootn2sqrt");		Value *nval = CreateCallEx(B,FPExpr, opr0, "__rootn2sqrt");
replaceCall(nval);		replaceCall(nval);
return true;		return true;
}		}
} else if (ci_opr1 == 3) { // rootn(x, 3) = cbrt(x)		} else if (ci_opr1 == 3) { // rootn(x, 3) = cbrt(x)
Module *M = CI->getModule();		Module *M = CI->getModule();
if (Constant *FPExpr = getFunction(M, AMDGPULibFunc(AMDGPULibFunc::EI_CBRT,		if (Constant *FPExpr = getFunction(
FInfo))) {		M, AMDGPUMangledLibFunc(AMDGPULibFunc::EI_CBRT, FInfo))) {
DEBUG(errs() << "AMDIC: " << *CI << " ---> cbrt(" << *opr0 << ")\n");		DEBUG(errs() << "AMDIC: " << *CI << " ---> cbrt(" << *opr0 << ")\n");
Value *nval = CreateCallEx(B,FPExpr, opr0, "__rootn2cbrt");		Value *nval = CreateCallEx(B,FPExpr, opr0, "__rootn2cbrt");
replaceCall(nval);		replaceCall(nval);
return true;		return true;
}		}
} else if (ci_opr1 == -1) { // rootn(x, -1) = 1.0/x		} else if (ci_opr1 == -1) { // rootn(x, -1) = 1.0/x
DEBUG(errs() << "AMDIC: " << *CI << " ---> 1.0 / " << *opr0 << "\n");		DEBUG(errs() << "AMDIC: " << *CI << " ---> 1.0 / " << *opr0 << "\n");
Value *nval = B.CreateFDiv(ConstantFP::get(opr0->getType(), 1.0),		Value *nval = B.CreateFDiv(ConstantFP::get(opr0->getType(), 1.0),
opr0,		opr0,
"__rootn2div");		"__rootn2div");
replaceCall(nval);		replaceCall(nval);
return true;		return true;
} else if (ci_opr1 == -2) { // rootn(x, -2) = rsqrt(x)		} else if (ci_opr1 == -2) { // rootn(x, -2) = rsqrt(x)
std::vector<const Type*> ParamsTys;		std::vector<const Type*> ParamsTys;
ParamsTys.push_back(opr0->getType());		ParamsTys.push_back(opr0->getType());
Module *M = CI->getModule();		Module *M = CI->getModule();
if (Constant *FPExpr = getFunction(M, AMDGPULibFunc(AMDGPULibFunc::EI_RSQRT,		if (Constant *FPExpr = getFunction(
FInfo))) {		M, AMDGPUMangledLibFunc(AMDGPULibFunc::EI_RSQRT, FInfo))) {
DEBUG(errs() << "AMDIC: " << *CI << " ---> rsqrt(" << *opr0 << ")\n");		DEBUG(errs() << "AMDIC: " << *CI << " ---> rsqrt(" << *opr0 << ")\n");
Value *nval = CreateCallEx(B,FPExpr, opr0, "__rootn2rsqrt");		Value *nval = CreateCallEx(B,FPExpr, opr0, "__rootn2rsqrt");
replaceCall(nval);		replaceCall(nval);
return true;		return true;
}		}
}		}
return false;		return false;
}		}
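fold_rootn above rewrites `rootn(x, n)` only for a handful of constant exponents. The sketch below restates that mapping as a standalone table and evaluates each rewrite numerically, which can help when checking the fold by hand. `RootnFold`, `classifyRootn`, and `applyFold` are hypothetical names introduced for illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical enumeration of the rewrites performed by fold_rootn.
enum class RootnFold { None, Identity, Sqrt, Cbrt, Recip, Rsqrt };

// Maps the constant exponent n of rootn(x, n) to the rewrite chosen above.
RootnFold classifyRootn(int N) {
  switch (N) {
  case 1:  return RootnFold::Identity; // rootn(x, 1)  = x
  case 2:  return RootnFold::Sqrt;     // rootn(x, 2)  = sqrt(x)
  case 3:  return RootnFold::Cbrt;     // rootn(x, 3)  = cbrt(x)
  case -1: return RootnFold::Recip;    // rootn(x, -1) = 1.0 / x
  case -2: return RootnFold::Rsqrt;    // rootn(x, -2) = rsqrt(x)
  default: return RootnFold::None;     // left for the library to handle
  }
}

// Evaluates the rewritten expression on the host; returns NaN for None.
double applyFold(RootnFold Fold, double X) {
  switch (Fold) {
  case RootnFold::Identity: return X;
  case RootnFold::Sqrt:     return std::sqrt(X);
  case RootnFold::Cbrt:     return std::cbrt(X);
  case RootnFold::Recip:    return 1.0 / X;
  case RootnFold::Rsqrt:    return 1.0 / std::sqrt(X);
  case RootnFold::None:     break;
  }
  return std::nan("");
}
```

Any other exponent, such as 4, is classified as `None`, matching the `return false` at the end of fold_rootn.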

bool AMDGPULibCalls::fold_fma_mad(CallInst *CI, IRBuilder<> &B,		bool AMDGPULibCalls::fold_fma_mad(CallInst *CI, IRBuilder<> &B,
const FuncInfo &FInfo) {		const MangledFuncInfo &FInfo) {
Value *opr0 = CI->getArgOperand(0);		Value *opr0 = CI->getArgOperand(0);
Value *opr1 = CI->getArgOperand(1);		Value *opr1 = CI->getArgOperand(1);
Value *opr2 = CI->getArgOperand(2);		Value *opr2 = CI->getArgOperand(2);

ConstantFP *CF0 = dyn_cast<ConstantFP>(opr0);		ConstantFP *CF0 = dyn_cast<ConstantFP>(opr0);
ConstantFP *CF1 = dyn_cast<ConstantFP>(opr1);		ConstantFP *CF1 = dyn_cast<ConstantFP>(opr1);
if ((CF0 && CF0->isZero()) || (CF1 && CF1->isZero())) {		if ((CF0 && CF0->isZero()) || (CF1 && CF1->isZero())) {
// fma/mad(a, b, c) = c if a=0 || b=0		// fma/mad(a, b, c) = c if a=0 || b=0
[27 lines not shown; context: if (CF->isZero()) {]
return true;		return true;
}		}
}		}

return false;		return false;
}		}

// Get a scalar native builtin signle argument FP function		// Get a scalar native builtin signle argument FP function
Constant* AMDGPULibCalls::getNativeFunction(Module* M, const FuncInfo& FInfo) {		Constant *AMDGPULibCalls::getNativeFunction(Module *M,
		const MangledFuncInfo &FInfo) {
if (getArgType(FInfo) == AMDGPULibFunc::F64 || !HasNative(FInfo.getId()))		if (getArgType(FInfo) == AMDGPULibFunc::F64 || !HasNative(FInfo.getId()))
return nullptr;		return nullptr;
FuncInfo nf = FInfo;		MangledFuncInfo nf = FInfo;
nf.setPrefix(AMDGPULibFunc::NATIVE);		nf.setPrefix(AMDGPULibFunc::NATIVE);
return getFunction(M, nf);		return getFunction(M, nf);
}		}

// fold sqrt -> native_sqrt (x)		// fold sqrt -> native_sqrt (x)
bool AMDGPULibCalls::fold_sqrt(CallInst *CI, IRBuilder<> &B,		bool AMDGPULibCalls::fold_sqrt(CallInst *CI, IRBuilder<> &B,
const FuncInfo &FInfo) {		const MangledFuncInfo &FInfo) {
if (getArgType(FInfo) == AMDGPULibFunc::F32 && (getVecSize(FInfo) == 1) &&		if (getArgType(FInfo) == AMDGPULibFunc::F32 && (getVecSize(FInfo) == 1) &&
(FInfo.getPrefix() != AMDGPULibFunc::NATIVE)) {		(FInfo.getPrefix() != AMDGPULibFunc::NATIVE)) {
if (Constant *FPExpr = getNativeFunction(		if (Constant *FPExpr = getNativeFunction(
CI->getModule(), AMDGPULibFunc(AMDGPULibFunc::EI_SQRT, FInfo))) {		CI->getModule(),
		AMDGPUMangledLibFunc(AMDGPULibFunc::EI_SQRT, FInfo))) {
Value *opr0 = CI->getArgOperand(0);		Value *opr0 = CI->getArgOperand(0);
DEBUG(errs() << "AMDIC: " << *CI << " ---> "		DEBUG(errs() << "AMDIC: " << *CI << " ---> "
<< "sqrt(" << *opr0 << ")\n");		<< "sqrt(" << *opr0 << ")\n");
Value *nval = CreateCallEx(B,FPExpr, opr0, "__sqrt");		Value *nval = CreateCallEx(B,FPExpr, opr0, "__sqrt");
replaceCall(nval);		replaceCall(nval);
return true;		return true;
}		}
}		}
return false;		return false;
}		}

// fold sin, cos -> sincos.		// fold sin, cos -> sincos.
bool AMDGPULibCalls::fold_sincos(CallInst *CI, IRBuilder<> &B,		bool AMDGPULibCalls::fold_sincos(CallInst *CI, IRBuilder<> &B,
AliasAnalysis *AA) {		AliasAnalysis *AA) {
AMDGPULibFunc fInfo;		auto Info = AMDGPULibFunc::parse(CI->getCalledFunction()->getName());
if (!AMDGPULibFunc::parse(CI->getCalledFunction()->getName(), fInfo))		AMDGPUMangledLibFunc *pInfo = cast<AMDGPUMangledLibFunc>(Info.get());
		if (!pInfo)
return false;		return false;

		AMDGPUMangledLibFunc &fInfo = *pInfo;
assert(fInfo.getId() == AMDGPULibFunc::EI_SIN ||		assert(fInfo.getId() == AMDGPULibFunc::EI_SIN ||
fInfo.getId() == AMDGPULibFunc::EI_COS);		fInfo.getId() == AMDGPULibFunc::EI_COS);
bool const isSin = fInfo.getId() == AMDGPULibFunc::EI_SIN;		bool const isSin = fInfo.getId() == AMDGPULibFunc::EI_SIN;

Value *CArgVal = CI->getArgOperand(0);		Value *CArgVal = CI->getArgOperand(0);
BasicBlock * const CBB = CI->getParent();		BasicBlock * const CBB = CI->getParent();

int const MaxScan = 30;		int const MaxScan = 30;
[40 lines not shown; context: bool AMDGPULibCalls::fold_sincos(CallInst *CI, IRBuilder<> &B,]
}		}

if (!UI) return false;		if (!UI) return false;

// Merge the sin and cos.		// Merge the sin and cos.

// for OpenCL 2.0 we have only generic implementation of sincos		// for OpenCL 2.0 we have only generic implementation of sincos
// function.		// function.
AMDGPULibFunc nf(AMDGPULibFunc::EI_SINCOS, fInfo);		AMDGPUMangledLibFunc nf(AMDGPULibFunc::EI_SINCOS, fInfo);
nf.Leads[0].PtrKind = AMDGPULibFunc::GENERIC;		nf.Leads[0].PtrKind = AMDGPULibFunc::GENERIC;
Function *Fsincos = dyn_cast_or_null<Function>(getFunction(M, nf));		Function *Fsincos = dyn_cast_or_null<Function>(getFunction(M, nf));
if (!Fsincos) return false;		if (!Fsincos) return false;

BasicBlock::iterator ItOld = B.GetInsertPoint();		BasicBlock::iterator ItOld = B.GetInsertPoint();
AllocaInst *Alloc = insertAlloca(UI, B, "__sincos_");		AllocaInst *Alloc = insertAlloca(UI, B, "__sincos_");
B.SetInsertPoint(UI);		B.SetInsertPoint(UI);

[45 lines not shown; context: AllocaInst* AMDGPULibCalls::insertAlloca(CallInst *UI, IRBuilder<> &B,]
B.SetInsertPoint(&*ItNew);		B.SetInsertPoint(&*ItNew);
AllocaInst *Alloc = B.CreateAlloca(RetType, 0,		AllocaInst *Alloc = B.CreateAlloca(RetType, 0,
std::string(prefix) + UI->getName());		std::string(prefix) + UI->getName());
Alloc->setAlignment(UCallee->getParent()->getDataLayout()		Alloc->setAlignment(UCallee->getParent()->getDataLayout()
.getTypeAllocSize(RetType));		.getTypeAllocSize(RetType));
return Alloc;		return Alloc;
}		}

bool AMDGPULibCalls::evaluateScalarMathFunc(FuncInfo &FInfo,		bool AMDGPULibCalls::evaluateScalarMathFunc(MangledFuncInfo &FInfo,
double& Res0, double& Res1,		double &Res0, double &Res1,
Constant *copr0, Constant *copr1,		Constant *copr0, Constant *copr1,
Constant *copr2) {		Constant *copr2) {
// By default, opr0/opr1/opr3 holds values of float/double type.		// By default, opr0/opr1/opr3 holds values of float/double type.
// If they are not float/double, each function has to its		// If they are not float/double, each function has to its
// operand separately.		// operand separately.
double opr0=0.0, opr1=0.0, opr2=0.0;		double opr0=0.0, opr1=0.0, opr2=0.0;
ConstantFP *fpopr0 = dyn_cast_or_null<ConstantFP>(copr0);		ConstantFP *fpopr0 = dyn_cast_or_null<ConstantFP>(copr0);
ConstantFP *fpopr1 = dyn_cast_or_null<ConstantFP>(copr1);		ConstantFP *fpopr1 = dyn_cast_or_null<ConstantFP>(copr1);
[177 lines not shown; context: bool AMDGPULibCalls::evaluateScalarMathFunc(MangledFuncInfo &FInfo,]
case AMDGPULibFunc::EI_MAD:		case AMDGPULibFunc::EI_MAD:
Res0 = opr0 * opr1 + opr2;		Res0 = opr0 * opr1 + opr2;
return true;		return true;
}		}

return false;		return false;
}		}

bool AMDGPULibCalls::evaluateCall(CallInst *aCI, FuncInfo &FInfo) {		bool AMDGPULibCalls::evaluateCall(CallInst *aCI, MangledFuncInfo &FInfo) {
int numArgs = (int)aCI->getNumArgOperands();		int numArgs = (int)aCI->getNumArgOperands();
if (numArgs > 3)		if (numArgs > 3)
return false;		return false;

Constant *copr0 = nullptr;		Constant *copr0 = nullptr;
Constant *copr1 = nullptr;		Constant *copr1 = nullptr;
Constant *copr2 = nullptr;		Constant *copr2 = nullptr;
if (numArgs > 0) {		if (numArgs > 0) {
[143 lines not shown]

lib/Target/AMDGPU/AMDGPULibFunc.h

[20 lines not shown]
class AMDGPULibFunc {		class AMDGPULibFunc {
public:		public:
enum EFuncId {		enum EFuncId {
EI_NONE,		EI_NONE,

// IMPORTANT: enums below should go in ascending by 1 value order		// IMPORTANT: enums below should go in ascending by 1 value order
// because they are used as indexes in the mangling rules table.		// because they are used as indexes in the mangling rules table.
// don't use explicit value assignment.		// don't use explicit value assignment.
		//
		// There are two types of library functions: those with mangled
		// name and those with unmangled name. The enums for the library
		// functions with mangled name are defined before enums for the
		// library functions with unmangled name. The enum for the last
		// library function with mangled name is EI_LAST_MANGLED.
		//
		// Library functions with mangled name.
EI_ABS,		EI_ABS,
EI_ABS_DIFF,		EI_ABS_DIFF,
EI_ACOS,		EI_ACOS,
EI_ACOSH,		EI_ACOSH,
EI_ACOSPI,		EI_ACOSPI,
EI_ADD_SAT,		EI_ADD_SAT,
EI_ALL,		EI_ALL,
EI_ANY,		EI_ANY,
[102 lines not shown; context: enum EFuncId {]
EI_NEXTAFTER,		EI_NEXTAFTER,
EI_NORMALIZE,		EI_NORMALIZE,
EI_POPCOUNT,		EI_POPCOUNT,
EI_POW,		EI_POW,
EI_POWN,		EI_POWN,
EI_POWR,		EI_POWR,
EI_PREFETCH,		EI_PREFETCH,
EI_RADIANS,		EI_RADIANS,
EI_READ_PIPE,
EI_RECIP,		EI_RECIP,
EI_REMAINDER,		EI_REMAINDER,
EI_REMQUO,		EI_REMQUO,
EI_RESERVE_READ_PIPE,		EI_RESERVE_READ_PIPE,
EI_RESERVE_WRITE_PIPE,		EI_RESERVE_WRITE_PIPE,
EI_RHADD,		EI_RHADD,
EI_RINT,		EI_RINT,
EI_ROOTN,		EI_ROOTN,
[51 lines not shown; context: enum EFuncId {]
EI_WORK_GROUP_SCAN_EXCLUSIVE_MAX,		EI_WORK_GROUP_SCAN_EXCLUSIVE_MAX,
EI_WORK_GROUP_SCAN_EXCLUSIVE_MIN,		EI_WORK_GROUP_SCAN_EXCLUSIVE_MIN,
EI_WORK_GROUP_SCAN_INCLUSIVE_ADD,		EI_WORK_GROUP_SCAN_INCLUSIVE_ADD,
EI_WORK_GROUP_SCAN_INCLUSIVE_MAX,		EI_WORK_GROUP_SCAN_INCLUSIVE_MAX,
EI_WORK_GROUP_SCAN_INCLUSIVE_MIN,		EI_WORK_GROUP_SCAN_INCLUSIVE_MIN,
EI_WRITE_IMAGEF,		EI_WRITE_IMAGEF,
EI_WRITE_IMAGEI,		EI_WRITE_IMAGEI,
EI_WRITE_IMAGEUI,		EI_WRITE_IMAGEUI,
EI_WRITE_PIPE,
EI_NCOS,		EI_NCOS,
EI_NEXP2,		EI_NEXP2,
EI_NFMA,		EI_NFMA,
EI_NLOG2,		EI_NLOG2,
EI_NRCP,		EI_NRCP,
EI_NRSQRT,		EI_NRSQRT,
EI_NSIN,		EI_NSIN,
EI_NSQRT,		EI_NSQRT,
EI_FTZ,		EI_FTZ,
EI_FLDEXP,		EI_FLDEXP,
EI_CLASS,		EI_CLASS,
EI_RCBRT,		EI_RCBRT,
		EI_LAST_MANGLED =
		EI_RCBRT, /* The last library function with mangled name */

		// Library functions with unmangled name.
		EI_READ_PIPE_2,
		rampitec (Done): There is already EI_READ_PIPE above. Is it mangled? Then why these are unmangled?
		yaxunl (author, Done): clang does not emit mangled read_pipe functions. Will remove EI_READ_PIPE.
		EI_READ_PIPE_4,
		EI_WRITE_PIPE_2,
		EI_WRITE_PIPE_4,

EX_INTRINSICS_COUNT		EX_INTRINSICS_COUNT
};		};

enum ENamePrefix {		enum ENamePrefix {
NOPFX,		NOPFX,
NATIVE,		NATIVE,
HALF		HALF
[59 lines not shown; context: struct Param {]
}		}
Param() { reset(); }		Param() { reset(); }

template <typename Stream>		template <typename Stream>
void mangleItanium(Stream& os);		void mangleItanium(Stream& os);
};		};

public:		public:
static bool parse(StringRef mangledName, AMDGPULibFunc &iInfo);		static std::unique_ptr<AMDGPULibFunc> parse(StringRef mangledName);

AMDGPULibFunc();		explicit AMDGPULibFunc() {}
AMDGPULibFunc(EFuncId id, const AMDGPULibFunc& copyFrom);		virtual ~AMDGPULibFunc() {}

		virtual unsigned getNumArgs() const = 0;

ENamePrefix getPrefix() const { return FKind; }
EFuncId getId() const { return FuncId; }		EFuncId getId() const { return FuncId; }

std::string getName() const;		bool isMangled() const {
unsigned getNumArgs() const;		return static_cast<unsigned>(FuncId) <=
		static_cast<unsigned>(EI_LAST_MANGLED);
		}

FunctionType* getFunctionType(Module& M) const;		void setId(EFuncId id) { FuncId = id; }
		virtual bool parseFuncName(StringRef &mangledName) = 0;

std::string mangle() const;		/// \return The mangled function name for mangled library functions
		/// and unmangled function name for unmangled library functions.
		virtual std::string mangle() const = 0;

void setPrefix(ENamePrefix pfx) { FKind = pfx; }		void setName(StringRef N) { Name = N; }
void setId(EFuncId id) { FuncId = id; }

		virtual FunctionType *getFunctionType(Module &M) const = 0;
static Function* getFunction(llvm::Module *M, const AMDGPULibFunc& fInfo);		static Function *getFunction(llvm::Module *M, const AMDGPULibFunc &fInfo);

static Function* getOrInsertFunction(llvm::Module *M,		static Function *getOrInsertFunction(llvm::Module *M,
const AMDGPULibFunc& fInfo);		const AMDGPULibFunc &fInfo);

static StringRef getUnmangledName(const StringRef& mangledName);		protected:
		EFuncId FuncId;
		std::string Name;
		};

		class AMDGPUMangledLibFunc : public AMDGPULibFunc {
		public:
Param Leads[2];		Param Leads[2];

		explicit AMDGPUMangledLibFunc();
		explicit AMDGPUMangledLibFunc(EFuncId id,
		const AMDGPUMangledLibFunc &copyFrom);
		unsigned getNumArgs() const override;
		bool parseFuncName(StringRef &mangledName) override;
		// Methods for support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const AMDGPULibFunc *F) { return F->isMangled(); }
		rampitec (Done): StringRef designed to be passed by value, here and below.
		yaxunl (author, Done): This comes from the original implementation. Will fix.
		rampitec (Done): Not done.
		vpykhtin (Done): Initially (non-const) StringRef& was introduced intentionally to have mangledName with stripped name on return.
		yaxunl (author, Not Done): Only const StringRef& is changed to StringRef. non-const StringRef& is not changed.

		std::string getUnmangledName() const;
		FunctionType *getFunctionType(Module &M) const override;

		ENamePrefix getPrefix() const { return FKind; }
		void setPrefix(ENamePrefix pfx) { FKind = pfx; }

		static StringRef getUnmangledName(StringRef MangledName);

		std::string mangle() const override;

private:		private:
EFuncId FuncId;
ENamePrefix FKind;		ENamePrefix FKind;
std::string Name;

void reset();

std::string mangleNameItanium() const;		std::string mangleNameItanium() const;
bool parseItanuimName(StringRef& mangledName);

std::string mangleName(const StringRef& name) const;		std::string mangleName(StringRef Name) const;
bool parseName(const StringRef& mangledName);		bool parseUnmangledName(StringRef MangledName);

template <typename Stream>		template <typename Stream> void writeName(Stream &OS) const;
void writeName(Stream& OS) const;
};		};

		class AMDGPUUnmangledLibFunc : public AMDGPULibFunc {
		FunctionType *FuncTy;

		public:
		explicit AMDGPUUnmangledLibFunc();
		unsigned getNumArgs() const override;
		// Methods for support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const AMDGPULibFunc *F) { return !F->isMangled(); }
		bool parseFuncName(StringRef &Name) override;
		std::string mangle() const override { return Name; }
		void setFunctionType(FunctionType *FT) { FuncTy = FT; }
		FunctionType *getFunctionType(Module &M) const override { return FuncTy; }
		};
}		}
#endif // _AMDGPU_LIBFUNC_H_		#endif // _AMDGPU_LIBFUNC_H_
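
The inline thread on the header separates two cases: a read-only name argument, which is passed by value, and a parser argument that is deliberately a non-const reference so the caller gets back the unparsed remainder. A minimal sketch of that distinction follows. It assumes std::string_view as a stand-in for llvm::StringRef; eatLengthPrefixed and unmangledName are hypothetical names modelled on eatLengthPrefixedName and getUnmangledName, not the code in this revision.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for eatLengthPrefixedName: reads "<len><name>" from
// the front of Mangled and strips what it consumed. The parameter is a
// non-const reference on purpose, so the caller sees the remainder.
inline std::string_view eatLengthPrefixed(std::string_view &Mangled) {
  std::size_t Len = 0, Digits = 0;
  while (Digits < Mangled.size() &&
         std::isdigit(static_cast<unsigned char>(Mangled[Digits]))) {
    Len = Len * 10 + static_cast<std::size_t>(Mangled[Digits] - '0');
    ++Digits;
  }
  // Reject a missing length, a zero length, or a length past the end.
  // On failure Mangled is left unchanged.
  if (Digits == 0 || Len == 0 || Len > Mangled.size() - Digits)
    return {};
  std::string_view Name = Mangled.substr(Digits, Len);
  Mangled.remove_prefix(Digits + Len);
  return Name;
}

// Read-only query: the view is taken by value, so the function works on its
// own copy and the caller's view is never modified.
inline std::string_view unmangledName(std::string_view Mangled) {
  if (Mangled.substr(0, 2) != "_Z")
    return {};
  Mangled.remove_prefix(2);
  return eatLengthPrefixed(Mangled);
}
```

In this sketch, unmangledName("_Z3powff") yields "pow" and leaves the caller's view untouched, while calling eatLengthPrefixed on a view holding "3powff" returns "pow" and leaves "ff" in that view for the parameter parser.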

lib/Target/AMDGPU/AMDGPULibFunc.cpp

[...]
struct ManglingRule {
unsigned char Param[5];		unsigned char Param[5];

int maxLeadIndex() const { return (std::max)(Lead[0], Lead[1]); }		int maxLeadIndex() const { return (std::max)(Lead[0], Lead[1]); }
int getNumLeads() const { return (Lead[0] ? 1 : 0) + (Lead[1] ? 1 : 0); }		int getNumLeads() const { return (Lead[0] ? 1 : 0) + (Lead[1] ? 1 : 0); }

unsigned getNumArgs() const;		unsigned getNumArgs() const;
};		};

		// Information about library functions with unmangled names.
		class UnmangledFuncInfo {
		StringRef const Name;
		unsigned NumArgs;

		// Table for all lib functions with unmangled names.
		static const UnmangledFuncInfo Table[];

		// Number of entries in Table.
		static const unsigned TableSize;

		// Map function name to index.
		class NameMap : public StringMap<unsigned> {
		public:
		NameMap() {
		for (unsigned I = 0; I != TableSize; ++I)
		(*this)[Table[I].Name] = I;
		}
		};
		friend class NameMap;
		static NameMap Map;

		public:
		using ID = AMDGPULibFunc::EFuncId;
		UnmangledFuncInfo() = default;
		UnmangledFuncInfo(StringRef _Name, unsigned _NumArgs)
		: Name(_Name), NumArgs(_NumArgs) {}
		// Get index to Table by function name.
		static bool lookup(StringRef Name, ID &Id);
		static unsigned toIndex(ID Id) {
		assert(static_cast<unsigned>(Id) >
		static_cast<unsigned>(AMDGPULibFunc::EI_LAST_MANGLED) &&
		"Invalid unmangled library function");
		return static_cast<unsigned>(Id) - 1 -
		static_cast<unsigned>(AMDGPULibFunc::EI_LAST_MANGLED);
		}
		static ID toFuncId(unsigned Index) {
		assert(Index < TableSize && "Invalid unmangled library function");
		return static_cast<ID>(
		Index + 1 + static_cast<unsigned>(AMDGPULibFunc::EI_LAST_MANGLED));
		}
		static unsigned getNumArgs(ID Id) { return Table[toIndex(Id)].NumArgs; }
		static StringRef getName(ID Id) { return Table[toIndex(Id)].Name; }
		};

unsigned ManglingRule::getNumArgs() const {		unsigned ManglingRule::getNumArgs() const {
unsigned I=0;		unsigned I=0;
while (I < (sizeof Param/sizeof Param[0]) && Param[I]) ++I;		while (I < (sizeof Param/sizeof Param[0]) && Param[I]) ++I;
return I;		return I;
}		}

// This table describes function formal argument type rules. The order of rules		// This table describes function formal argument type rules. The order of rules
// corresponds to the EFuncId enum at AMDGPULibFunc.h		// corresponds to the EFuncId enum at AMDGPULibFunc.h
[...]
{ "nextafter" , {1}, {E_ANY,E_COPY}},		{ "nextafter" , {1}, {E_ANY,E_COPY}},
{ "normalize" , {1}, {E_ANY}},		{ "normalize" , {1}, {E_ANY}},
{ "popcount" , {1}, {E_ANY}},		{ "popcount" , {1}, {E_ANY}},
{ "pow" , {1}, {E_ANY,E_COPY}},		{ "pow" , {1}, {E_ANY,E_COPY}},
{ "pown" , {1}, {E_ANY,E_SETBASE_I32}},		{ "pown" , {1}, {E_ANY,E_SETBASE_I32}},
{ "powr" , {1}, {E_ANY,E_COPY}},		{ "powr" , {1}, {E_ANY,E_COPY}},
{ "prefetch" , {1}, {E_CONSTPTR_ANY,EX_SIZET}},		{ "prefetch" , {1}, {E_CONSTPTR_ANY,EX_SIZET}},
{ "radians" , {1}, {E_ANY}},		{ "radians" , {1}, {E_ANY}},
{ "read_pipe" , {4}, {E_COPY,EX_RESERVEDID,EX_UINT,E_ANY}},
{ "recip" , {1}, {E_ANY}},		{ "recip" , {1}, {E_ANY}},
{ "remainder" , {1}, {E_ANY,E_COPY}},		{ "remainder" , {1}, {E_ANY,E_COPY}},
{ "remquo" , {1,3}, {E_ANY,E_COPY,E_ANY}},		{ "remquo" , {1,3}, {E_ANY,E_COPY,E_ANY}},
{ "reserve_read_pipe" , {1}, {E_ANY,EX_UINT}},		{ "reserve_read_pipe" , {1}, {E_ANY,EX_UINT}},
{ "reserve_write_pipe" , {1}, {E_ANY,EX_UINT}},		{ "reserve_write_pipe" , {1}, {E_ANY,EX_UINT}},
{ "rhadd" , {1}, {E_ANY,E_COPY}},		{ "rhadd" , {1}, {E_ANY,E_COPY}},
{ "rint" , {1}, {E_ANY}},		{ "rint" , {1}, {E_ANY}},
{ "rootn" , {1}, {E_ANY,E_SETBASE_I32}},		{ "rootn" , {1}, {E_ANY,E_SETBASE_I32}},
[...]
{ "work_group_scan_exclusive_max" , {1}, {E_ANY}},		{ "work_group_scan_exclusive_max" , {1}, {E_ANY}},
{ "work_group_scan_exclusive_min" , {1}, {E_ANY}},		{ "work_group_scan_exclusive_min" , {1}, {E_ANY}},
{ "work_group_scan_inclusive_add" , {1}, {E_ANY}},		{ "work_group_scan_inclusive_add" , {1}, {E_ANY}},
{ "work_group_scan_inclusive_max" , {1}, {E_ANY}},		{ "work_group_scan_inclusive_max" , {1}, {E_ANY}},
{ "work_group_scan_inclusive_min" , {1}, {E_ANY}},		{ "work_group_scan_inclusive_min" , {1}, {E_ANY}},
{ "write_imagef" , {1}, {E_ANY,E_IMAGECOORDS,EX_FLOAT4}},		{ "write_imagef" , {1}, {E_ANY,E_IMAGECOORDS,EX_FLOAT4}},
{ "write_imagei" , {1}, {E_ANY,E_IMAGECOORDS,EX_INTV4}},		{ "write_imagei" , {1}, {E_ANY,E_IMAGECOORDS,EX_INTV4}},
{ "write_imageui" , {1}, {E_ANY,E_IMAGECOORDS,EX_UINTV4}},		{ "write_imageui" , {1}, {E_ANY,E_IMAGECOORDS,EX_UINTV4}},
{ "write_pipe" , {4}, {E_COPY,EX_RESERVEDID,EX_UINT,E_ANY}},
{ "ncos" , {1}, {E_ANY} },		{ "ncos" , {1}, {E_ANY} },
{ "nexp2" , {1}, {E_ANY} },		{ "nexp2" , {1}, {E_ANY} },
{ "nfma" , {1}, {E_ANY, E_COPY, E_COPY} },		{ "nfma" , {1}, {E_ANY, E_COPY, E_COPY} },
{ "nlog2" , {1}, {E_ANY} },		{ "nlog2" , {1}, {E_ANY} },
{ "nrcp" , {1}, {E_ANY} },		{ "nrcp" , {1}, {E_ANY} },
{ "nrsqrt" , {1}, {E_ANY} },		{ "nrsqrt" , {1}, {E_ANY} },
{ "nsin" , {1}, {E_ANY} },		{ "nsin" , {1}, {E_ANY} },
{ "nsqrt" , {1}, {E_ANY} },		{ "nsqrt" , {1}, {E_ANY} },
{ "ftz" , {1}, {E_ANY} },		{ "ftz" , {1}, {E_ANY} },
{ "fldexp" , {1}, {E_ANY, EX_UINT} },		{ "fldexp" , {1}, {E_ANY, EX_UINT} },
{ "class" , {1}, {E_ANY, EX_UINT} },		{ "class" , {1}, {E_ANY, EX_UINT} },
{ "rcbrt" , {1}, {E_ANY} },		{ "rcbrt" , {1}, {E_ANY} },
};		};

		// Library functions with unmangled name.
		const UnmangledFuncInfo UnmangledFuncInfo::Table[] = {
		{"__read_pipe_2", 4},
		{"__read_pipe_4", 6},
		{"__write_pipe_2", 4},
		{"__write_pipe_4", 6},
		};

		const unsigned UnmangledFuncInfo::TableSize =
		sizeof(UnmangledFuncInfo::Table) / sizeof(UnmangledFuncInfo::Table[0]);

		UnmangledFuncInfo::NameMap UnmangledFuncInfo::Map;

static const struct ManglingRulesMap : public StringMap<int> {		static const struct ManglingRulesMap : public StringMap<int> {
ManglingRulesMap()		ManglingRulesMap()
: StringMap<int>(sizeof(manglingRules)/sizeof(manglingRules[0])) {		: StringMap<int>(sizeof(manglingRules)/sizeof(manglingRules[0])) {
int Id = 0;		int Id = 0;
for (auto Rule : manglingRules)		for (auto Rule : manglingRules)
insert({ Rule.Name, Id++ });		insert({ Rule.Name, Id++ });
}		}
} manglingRulesMap;		} manglingRulesMap;
[...]
if (Len <= 0 || static_cast<size_t>(Len) > mangledName.size())
return StringRef();		return StringRef();
StringRef Res = mangledName.substr(0, Len);		StringRef Res = mangledName.substr(0, Len);
drop_front(mangledName, Len);		drop_front(mangledName, Len);
return Res;		return Res;
}		}

} // end anonymous namespace		} // end anonymous namespace

AMDGPULibFunc::AMDGPULibFunc() {		AMDGPUMangledLibFunc::AMDGPUMangledLibFunc() {
reset();
}

AMDGPULibFunc::AMDGPULibFunc(EFuncId id, const AMDGPULibFunc& copyFrom)
: FuncId(id) {
FKind = copyFrom.FKind;
Leads[0] = copyFrom.Leads[0];
Leads[1] = copyFrom.Leads[1];
}

void AMDGPULibFunc::reset() {
FuncId = EI_NONE;		FuncId = EI_NONE;
FKind = NOPFX;		FKind = NOPFX;
Leads[0].reset();		Leads[0].reset();
Leads[1].reset();		Leads[1].reset();
Name.clear();		Name.clear();
}		}

		AMDGPUUnmangledLibFunc::AMDGPUUnmangledLibFunc() { FuncId = EI_NONE; }

		AMDGPUMangledLibFunc::AMDGPUMangledLibFunc(
		EFuncId id, const AMDGPUMangledLibFunc &copyFrom) {
		FuncId = id;
		FKind = copyFrom.FKind;
		Leads[0] = copyFrom.Leads[0];
		Leads[1] = copyFrom.Leads[1];
		}

///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////
// Demangling		// Demangling

static int parseVecSize(StringRef& mangledName) {		static int parseVecSize(StringRef& mangledName) {
size_t const Len = eatNumber(mangledName);		size_t const Len = eatNumber(mangledName);
switch (Len) {		switch (Len) {
case 2: case 3: case 4: case 8: case 16:		case 2: case 3: case 4: case 8: case 16:
return Len;		return Len;
[...]
AMDGPULibFunc::ENamePrefix Pfx =
.Default(AMDGPULibFunc::NOPFX);		.Default(AMDGPULibFunc::NOPFX);

if (Pfx != AMDGPULibFunc::NOPFX)		if (Pfx != AMDGPULibFunc::NOPFX)
mangledName = P.second;		mangledName = P.second;

return Pfx;		return Pfx;
}		}

bool AMDGPULibFunc::parseName(const StringRef& fullName) {		bool AMDGPUMangledLibFunc::parseUnmangledName(StringRef FullName) {
FuncId = static_cast<EFuncId>(manglingRulesMap.lookup(fullName));		FuncId = static_cast<EFuncId>(manglingRulesMap.lookup(FullName));
return FuncId != EI_NONE;		return FuncId != EI_NONE;
}		}

///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////
// Itanium Demangling		// Itanium Demangling

namespace {		namespace {
struct ItaniumParamParser {		struct ItaniumParamParser {
[...]
if (::isDigit(TC)) {
}		}
}		}
if (res.ArgType == 0) return false;		if (res.ArgType == 0) return false;
Prev.VectorSize = res.VectorSize;		Prev.VectorSize = res.VectorSize;
Prev.ArgType = res.ArgType;		Prev.ArgType = res.ArgType;
return true;		return true;
}		}

bool AMDGPULibFunc::parseItanuimName(StringRef& mangledName) {		bool AMDGPUMangledLibFunc::parseFuncName(StringRef &mangledName) {
StringRef Name = eatLengthPrefixedName(mangledName);		StringRef Name = eatLengthPrefixedName(mangledName);
FKind = parseNamePrefix(Name);		FKind = parseNamePrefix(Name);
if (!parseName(Name)) return false;		if (!parseUnmangledName(Name))
		return false;

const ManglingRule& Rule = manglingRules[FuncId];		const ManglingRule& Rule = manglingRules[FuncId];
ItaniumParamParser Parser;		ItaniumParamParser Parser;
for (int I=0; I < Rule.maxLeadIndex(); ++I) {		for (int I=0; I < Rule.maxLeadIndex(); ++I) {
Param P;		Param P;
if (!Parser.parseItaniumParam(mangledName, P))		if (!Parser.parseItaniumParam(mangledName, P))
return false;		return false;

if ((I + 1) == Rule.Lead[0]) Leads[0] = P;		if ((I + 1) == Rule.Lead[0]) Leads[0] = P;
if ((I + 1) == Rule.Lead[1]) Leads[1] = P;		if ((I + 1) == Rule.Lead[1]) Leads[1] = P;
}		}
return true;		return true;
}		}

bool AMDGPULibFunc::parse(StringRef mangledName, AMDGPULibFunc& iInfo) {		bool AMDGPUUnmangledLibFunc::parseFuncName(StringRef &Name) {
iInfo.reset();		if (!UnmangledFuncInfo::lookup(Name, FuncId))
if (mangledName.empty())
return false;		return false;
		setName(Name);
if (eatTerm(mangledName, "_Z")) {		return true;
return iInfo.parseItanuimName(mangledName);
}		}
return false;
		std::unique_ptr<AMDGPULibFunc> AMDGPULibFunc::parse(StringRef FuncName) {
		if (FuncName.empty())
		return std::unique_ptr<AMDGPULibFunc>();

		std::unique_ptr<AMDGPULibFunc> LibF;
		if (eatTerm(FuncName, "_Z"))
		LibF = make_unique<AMDGPUMangledLibFunc>();
		else
		LibF = make_unique<AMDGPUUnmangledLibFunc>();
		if (LibF->parseFuncName(FuncName))
		return LibF;

		return std::unique_ptr<AMDGPULibFunc>();
}		}

StringRef AMDGPULibFunc::getUnmangledName(const StringRef& mangledName) {		StringRef AMDGPUMangledLibFunc::getUnmangledName(StringRef mangledName) {
StringRef S = mangledName;		StringRef S = mangledName;
if (eatTerm(S, "_Z"))		if (eatTerm(S, "_Z"))
return eatLengthPrefixedName(S);		return eatLengthPrefixedName(S);
return StringRef();		return StringRef();
}		}


///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////
// Mangling		// Mangling

template <typename Stream>		template <typename Stream>
void AMDGPULibFunc::writeName(Stream& OS) const {		void AMDGPUMangledLibFunc::writeName(Stream &OS) const {
const char *Pfx = "";		const char *Pfx = "";
switch (FKind) {		switch (FKind) {
case NATIVE: Pfx = "native_"; break;		case NATIVE: Pfx = "native_"; break;
case HALF: Pfx = "half_"; break;		case HALF: Pfx = "half_"; break;
default: break;		default: break;
}		}
if (!Name.empty()) {		if (!Name.empty()) {
OS << Pfx << Name;		OS << Pfx << Name;
} else if (FuncId != EI_NONE) {		} else if (FuncId != EI_NONE) {
OS << Pfx;		OS << Pfx;
const StringRef& S = manglingRules[FuncId].Name;		const StringRef& S = manglingRules[FuncId].Name;
OS.write(S.data(), S.size());		OS.write(S.data(), S.size());
}		}
}		}

std::string AMDGPULibFunc::mangle() const {		std::string AMDGPUMangledLibFunc::mangle() const { return mangleNameItanium(); }
return mangleNameItanium();
}

///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////
// Itanium Mangling		// Itanium Mangling

static const char *getItaniumTypeName(AMDGPULibFunc::EType T) {		static const char *getItaniumTypeName(AMDGPULibFunc::EType T) {
switch (T) {		switch (T) {
case AMDGPULibFunc::U8: return "h";		case AMDGPULibFunc::U8: return "h";
case AMDGPULibFunc::U16: return "t";		case AMDGPULibFunc::U16: return "t";
[...]
void operator()(Stream& os, AMDGPULibFunc::Param p) {
os << getItaniumTypeName((AMDGPULibFunc::EType)p.ArgType);		os << getItaniumTypeName((AMDGPULibFunc::EType)p.ArgType);

exit:		exit:
if (Ptr.ArgType) Str.push_back(Ptr);		if (Ptr.ArgType) Str.push_back(Ptr);
}		}
};		};
} // namespace		} // namespace

std::string AMDGPULibFunc::mangleNameItanium() const {		std::string AMDGPUMangledLibFunc::mangleNameItanium() const {
SmallString<128> Buf;		SmallString<128> Buf;
raw_svector_ostream S(Buf);		raw_svector_ostream S(Buf);
SmallString<128> NameBuf;		SmallString<128> NameBuf;
raw_svector_ostream Name(NameBuf);		raw_svector_ostream Name(NameBuf);
writeName(Name);		writeName(Name);
const StringRef& NameStr = Name.str();		const StringRef& NameStr = Name.str();
S << "_Z" << static_cast<int>(NameStr.size()) << NameStr;		S << "_Z" << static_cast<int>(NameStr.size()) << NameStr;

[...]
if (P.VectorSize > 1)
T = VectorType::get(T, P.VectorSize);		T = VectorType::get(T, P.VectorSize);
if (P.PtrKind != AMDGPULibFunc::BYVALUE)		if (P.PtrKind != AMDGPULibFunc::BYVALUE)
T = useAddrSpace ? T->getPointerTo((P.PtrKind & AMDGPULibFunc::ADDR_SPACE)		T = useAddrSpace ? T->getPointerTo((P.PtrKind & AMDGPULibFunc::ADDR_SPACE)
- 1)		- 1)
: T->getPointerTo();		: T->getPointerTo();
return T;		return T;
}		}

FunctionType* AMDGPULibFunc::getFunctionType(Module& M) const {		FunctionType *AMDGPUMangledLibFunc::getFunctionType(Module &M) const {
LLVMContext& C = M.getContext();		LLVMContext& C = M.getContext();
std::vector<Type*> Args;		std::vector<Type*> Args;
ParamIterator I(Leads, manglingRules[FuncId]);		ParamIterator I(Leads, manglingRules[FuncId]);
Param P;		Param P;
while ((P=I.getNextParam()).ArgType != 0)		while ((P=I.getNextParam()).ArgType != 0)
Args.push_back(getIntrinsicParamType(C, P, true));		Args.push_back(getIntrinsicParamType(C, P, true));

return FunctionType::get(		return FunctionType::get(
getIntrinsicParamType(C, getRetType(FuncId, Leads), true),		getIntrinsicParamType(C, getRetType(FuncId, Leads), true),
Args, false);		Args, false);
}		}

unsigned AMDGPULibFunc::getNumArgs() const {		unsigned AMDGPUMangledLibFunc::getNumArgs() const {
return manglingRules[FuncId].getNumArgs();		return manglingRules[FuncId].getNumArgs();
}		}

std::string AMDGPULibFunc::getName() const {		unsigned AMDGPUUnmangledLibFunc::getNumArgs() const {
		return UnmangledFuncInfo::getNumArgs(FuncId);
		}

		std::string AMDGPUMangledLibFunc::getUnmangledName() const {
SmallString<128> Buf;		SmallString<128> Buf;
raw_svector_ostream OS(Buf);		raw_svector_ostream OS(Buf);
writeName(OS);		writeName(OS);
return OS.str();		return OS.str();
}		}

Function *AMDGPULibFunc::getFunction(Module *M, const AMDGPULibFunc& fInfo) {		Function *AMDGPULibFunc::getFunction(Module *M, const AMDGPULibFunc &fInfo) {
std::string FuncName = fInfo.mangle();		std::string FuncName = fInfo.mangle();
Function *F = dyn_cast_or_null<Function>(		Function *F = dyn_cast_or_null<Function>(
M->getValueSymbolTable().lookup(FuncName));		M->getValueSymbolTable().lookup(FuncName));

// check formal with actual types conformance		// check formal with actual types conformance
if (F && !F->isDeclaration()		if (F && !F->isDeclaration()
&& !F->isVarArg()		&& !F->isVarArg()
&& F->arg_size() == fInfo.getNumArgs()) {		&& F->arg_size() == fInfo.getNumArgs()) {
return F;		return F;
}		}
return nullptr;		return nullptr;
}		}

Function *AMDGPULibFunc::getOrInsertFunction(Module *M,		Function *AMDGPULibFunc::getOrInsertFunction(Module *M,
const AMDGPULibFunc& fInfo) {		const AMDGPULibFunc &fInfo) {
std::string const FuncName = fInfo.mangle();		std::string const FuncName = fInfo.mangle();
Function *F = dyn_cast_or_null<Function>(		Function *F = dyn_cast_or_null<Function>(
M->getValueSymbolTable().lookup(FuncName));		M->getValueSymbolTable().lookup(FuncName));

// check formal with actual types conformance		// check formal with actual types conformance
if (F && !F->isDeclaration()		if (F && !F->isDeclaration()
&& !F->isVarArg()		&& !F->isVarArg()
&& F->arg_size() == fInfo.getNumArgs()) {		&& F->arg_size() == fInfo.getNumArgs()) {
[...]
if (hasPtr) {
LLVMContext &Ctx = M->getContext();		LLVMContext &Ctx = M->getContext();
Attr.addAttribute(Ctx, AttributeList::FunctionIndex, Attribute::ReadOnly);		Attr.addAttribute(Ctx, AttributeList::FunctionIndex, Attribute::ReadOnly);
Attr.addAttribute(Ctx, AttributeList::FunctionIndex, Attribute::NoUnwind);		Attr.addAttribute(Ctx, AttributeList::FunctionIndex, Attribute::NoUnwind);
C = M->getOrInsertFunction(FuncName, FuncTy, Attr);		C = M->getOrInsertFunction(FuncName, FuncTy, Attr);
}		}

return cast<Function>(C);		return cast<Function>(C);
}		}

		bool UnmangledFuncInfo::lookup(StringRef Name, ID &Id) {
		auto Loc = Map.find(Name);
		rampitec: I believe we can go without bruteforce search loop.
		yaxunl: will add a map.
		rampitec: Not done.
		if (Loc != Map.end()) {
		Id = toFuncId(Loc->second);
		return true;
		}
		Id = AMDGPULibFunc::EI_NONE;
		return false;
		}
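
The lookup above replaces a linear search with a name map, and the unmangled IDs are placed after the last mangled ID so that an ID converts to a table index by subtracting a fixed offset. A minimal standalone sketch of that scheme follows. It assumes std::unordered_map and std::string_view in place of StringMap and StringRef, and the enum values (Sin, Cos, LastMangled and the pipe IDs) are hypothetical placeholders rather than the real EFuncId entries. Only the four table rows are taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <unordered_map>

// Hypothetical IDs: mangled functions come first and the unmangled ones
// follow LastMangled, mirroring the EI_LAST_MANGLED offset in the diff.
enum FuncId : unsigned {
  None = 0, Sin = 1, Cos = 2, LastMangled = Cos,
  ReadPipe2, ReadPipe4, WritePipe2, WritePipe4
};

struct UnmangledInfo {
  std::string_view Name;
  unsigned NumArgs;
};

// Same four entries as UnmangledFuncInfo::Table in the diff.
constexpr UnmangledInfo Table[] = {
    {"__read_pipe_2", 4},
    {"__read_pipe_4", 6},
    {"__write_pipe_2", 4},
    {"__write_pipe_4", 6},
};
constexpr std::size_t TableSize = sizeof(Table) / sizeof(Table[0]);

// ID -> table index: subtract the offset of the first unmangled ID.
inline unsigned toIndex(FuncId Id) {
  assert(Id > LastMangled && "not an unmangled library function");
  return Id - 1 - LastMangled;
}

// Table index -> ID: the inverse of toIndex.
inline FuncId toFuncId(unsigned Index) {
  assert(Index < TableSize && "index out of range");
  return static_cast<FuncId>(Index + 1 + LastMangled);
}

// Name -> ID through a map built once from the table, instead of a loop
// over the table on every query.
inline bool lookup(std::string_view Name, FuncId &Id) {
  static const std::unordered_map<std::string_view, unsigned> Map = [] {
    std::unordered_map<std::string_view, unsigned> M;
    for (unsigned I = 0; I != TableSize; ++I)
      M[Table[I].Name] = I;
    return M;
  }();
  auto Loc = Map.find(Name);
  if (Loc == Map.end()) {
    Id = None;
    return false;
  }
  Id = toFuncId(Loc->second);
  return true;
}
```

The sketch builds the map in a function-local static so that it is populated on first use, after the table exists. That is a choice made for this example only and is not a claim about how the revision orders its static initializers.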

test/CodeGen/AMDGPU/simplify-libcalls.ll

; RUN: opt -S -O1 -mtriple=amdgcn-- -amdgpu-simplify-libcall <%s | FileCheck -check-prefix=GCN -check-prefix=GCN-POSTLINK %s		; RUN: opt -S -O1 -mtriple=amdgcn-- -amdgpu-simplify-libcall <%s | opt -instnamer -S | FileCheck -check-prefix=GCN -check-prefix=GCN-POSTLINK %s
		rampitec: -instnamer is not needed here.
; RUN: opt -S -O1 -mtriple=amdgcn-- -amdgpu-simplify-libcall -amdgpu-prelink <%s | FileCheck -check-prefix=GCN -check-prefix=GCN-PRELINK %s		; RUN: opt -S -O1 -mtriple=amdgcn-- -amdgpu-simplify-libcall -amdgpu-prelink <%s | opt -instnamer -S | FileCheck -check-prefix=GCN -check-prefix=GCN-PRELINK %s
; RUN: opt -S -O1 -mtriple=amdgcn-- -amdgpu-use-native -amdgpu-prelink <%s | FileCheck -check-prefix=GCN -check-prefix=GCN-NATIVE %s		; RUN: opt -S -O1 -mtriple=amdgcn-- -amdgpu-use-native -amdgpu-prelink <%s | opt -instnamer -S | FileCheck -check-prefix=GCN -check-prefix=GCN-NATIVE %s

; GCN-LABEL: {{^}}define amdgpu_kernel void @test_sincos		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_sincos
; GCN-POSTLINK: tail call fast float @_Z3sinf(		; GCN-POSTLINK: tail call fast float @_Z3sinf(
; GCN-POSTLINK: tail call fast float @_Z3cosf(		; GCN-POSTLINK: tail call fast float @_Z3cosf(
; GCN-PRELINK: call fast float @_Z6sincosfPU3AS4f(		; GCN-PRELINK: call fast float @_Z6sincosfPU3AS4f(
; GCN-NATIVE: tail call fast float @_Z10native_sinf(		; GCN-NATIVE: tail call fast float @_Z10native_sinf(
; GCN-NATIVE: tail call fast float @_Z10native_cosf(		; GCN-NATIVE: tail call fast float @_Z10native_cosf(
define amdgpu_kernel void @test_sincos(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_sincos(float addrspace(1)* nocapture %a) {
[...]
entry:
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pow_c		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pow_c
; GCN: %__powx2 = fmul fast float %tmp, %tmp		; GCN: %__powx2 = fmul fast float %tmp, %tmp
; GCN: %__powx21 = fmul fast float %__powx2, %__powx2		; GCN: %__powx21 = fmul fast float %__powx2, %__powx2
; GCN: %__powx22 = fmul fast float %__powx2, %tmp		; GCN: %__powx22 = fmul fast float %__powx2, %tmp
; GCN: %0 = fmul fast float %__powx21, %__powx21		; GCN: %[[r0:.*]] = fmul fast float %__powx21, %__powx21
; GCN: %__powprod3 = fmul fast float %0, %__powx22		; GCN: %__powprod3 = fmul fast float %[[r0]], %__powx22
define amdgpu_kernel void @test_pow_c(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_pow_c(float addrspace(1)* nocapture %a) {
entry:		entry:
%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1		%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1
%tmp = load float, float addrspace(1)* %arrayidx, align 4		%tmp = load float, float addrspace(1)* %arrayidx, align 4
%call = tail call fast float @_Z3powff(float %tmp, float 1.100000e+01)		%call = tail call fast float @_Z3powff(float %tmp, float 1.100000e+01)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}define amdgpu_kernel void @test_powr_c		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_powr_c
; GCN: %__powx2 = fmul fast float %tmp, %tmp		; GCN: %__powx2 = fmul fast float %tmp, %tmp
; GCN: %__powx21 = fmul fast float %__powx2, %__powx2		; GCN: %__powx21 = fmul fast float %__powx2, %__powx2
; GCN: %__powx22 = fmul fast float %__powx2, %tmp		; GCN: %__powx22 = fmul fast float %__powx2, %tmp
; GCN: %0 = fmul fast float %__powx21, %__powx21		; GCN: %[[r0:.*]] = fmul fast float %__powx21, %__powx21
; GCN: %__powprod3 = fmul fast float %0, %__powx22		; GCN: %__powprod3 = fmul fast float %[[r0]], %__powx22
define amdgpu_kernel void @test_powr_c(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_powr_c(float addrspace(1)* nocapture %a) {
entry:		entry:
%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1		%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1
%tmp = load float, float addrspace(1)* %arrayidx, align 4		%tmp = load float, float addrspace(1)* %arrayidx, align 4
%call = tail call fast float @_Z4powrff(float %tmp, float 1.100000e+01)		%call = tail call fast float @_Z4powrff(float %tmp, float 1.100000e+01)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

declare float @_Z4powrff(float, float)		declare float @_Z4powrff(float, float)

; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pown_c		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pown_c
; GCN: %__powx2 = fmul fast float %tmp, %tmp		; GCN: %__powx2 = fmul fast float %tmp, %tmp
; GCN: %__powx21 = fmul fast float %__powx2, %__powx2		; GCN: %__powx21 = fmul fast float %__powx2, %__powx2
; GCN: %__powx22 = fmul fast float %__powx2, %tmp		; GCN: %__powx22 = fmul fast float %__powx2, %tmp
; GCN: %0 = fmul fast float %__powx21, %__powx21		; GCN: %[[r0:.*]] = fmul fast float %__powx21, %__powx21
; GCN: %__powprod3 = fmul fast float %0, %__powx22		; GCN: %__powprod3 = fmul fast float %[[r0]], %__powx22
define amdgpu_kernel void @test_pown_c(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_pown_c(float addrspace(1)* nocapture %a) {
entry:		entry:
%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1		%arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1
%tmp = load float, float addrspace(1)* %arrayidx, align 4		%tmp = load float, float addrspace(1)* %arrayidx, align 4
%call = tail call fast float @_Z4pownfi(float %tmp, i32 11)		%call = tail call fast float @_Z4pownfi(float %tmp, i32 11)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

declare float @_Z4pownfi(float, i32)		declare float @_Z4pownfi(float, i32)

; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pow		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pow
; GCN-POSTLINK: tail call fast float @_Z3powff(float %tmp, float 1.013000e+03)		; GCN-POSTLINK: tail call fast float @_Z3powff(float %tmp, float 1.013000e+03)
; GCN-PRELINK: %__fabs = tail call fast float @_Z4fabsf(float %tmp)		; GCN-PRELINK: %__fabs = tail call fast float @_Z4fabsf(float %tmp)
; GCN-PRELINK: %__log2 = tail call fast float @_Z4log2f(float %__fabs)		; GCN-PRELINK: %__log2 = tail call fast float @_Z4log2f(float %__fabs)
; GCN-PRELINK: %__ylogx = fmul fast float %__log2, 1.013000e+03		; GCN-PRELINK: %__ylogx = fmul fast float %__log2, 1.013000e+03
; GCN-PRELINK: %__exp2 = tail call fast float @_Z4exp2f(float %__ylogx)		; GCN-PRELINK: %__exp2 = tail call fast float @_Z4exp2f(float %__ylogx)
; GCN-PRELINK: %0 = bitcast float %tmp to i32		; GCN-PRELINK: %[[r0:.*]] = bitcast float %tmp to i32
; GCN-PRELINK: %__pow_sign = and i32 %0, -2147483648		; GCN-PRELINK: %__pow_sign = and i32 %[[r0]], -2147483648
; GCN-PRELINK: %1 = bitcast float %__exp2 to i32		; GCN-PRELINK: %[[r1:.*]] = bitcast float %__exp2 to i32
; GCN-PRELINK: %2 = or i32 %__pow_sign, %1		; GCN-PRELINK: %[[r2:.*]] = or i32 %__pow_sign, %[[r1]]
; GCN-PRELINK: %3 = bitcast float addrspace(1)* %a to i32 addrspace(1)*		; GCN-PRELINK: %[[r3:.*]] = bitcast float addrspace(1)* %a to i32 addrspace(1)*
; GCN-PRELINK: store i32 %2, i32 addrspace(1)* %3, align 4		; GCN-PRELINK: store i32 %[[r2]], i32 addrspace(1)* %[[r3]], align 4
define amdgpu_kernel void @test_pow(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_pow(float addrspace(1)* nocapture %a) {
entry:		entry:
%tmp = load float, float addrspace(1)* %a, align 4		%tmp = load float, float addrspace(1)* %a, align 4
%call = tail call fast float @_Z3powff(float %tmp, float 1.013000e+03)		%call = tail call fast float @_Z3powff(float %tmp, float 1.013000e+03)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

[...]
; GCN-POSTLINK: tail call fast float @_Z4pownfi(float %tmp, i32 %conv)		; GCN-POSTLINK: tail call fast float @_Z4pownfi(float %tmp, i32 %conv)
; GCN-PRELINK: %conv = fptosi float %tmp1 to i32		; GCN-PRELINK: %conv = fptosi float %tmp1 to i32
; GCN-PRELINK: %__fabs = tail call fast float @_Z4fabsf(float %tmp)		; GCN-PRELINK: %__fabs = tail call fast float @_Z4fabsf(float %tmp)
; GCN-PRELINK: %__log2 = tail call fast float @_Z4log2f(float %__fabs)		; GCN-PRELINK: %__log2 = tail call fast float @_Z4log2f(float %__fabs)
; GCN-PRELINK: %pownI2F = sitofp i32 %conv to float		; GCN-PRELINK: %pownI2F = sitofp i32 %conv to float
; GCN-PRELINK: %__ylogx = fmul fast float %__log2, %pownI2F		; GCN-PRELINK: %__ylogx = fmul fast float %__log2, %pownI2F
; GCN-PRELINK: %__exp2 = tail call fast float @_Z4exp2f(float %__ylogx)		; GCN-PRELINK: %__exp2 = tail call fast float @_Z4exp2f(float %__ylogx)
; GCN-PRELINK: %__yeven = shl i32 %conv, 31		; GCN-PRELINK: %__yeven = shl i32 %conv, 31
; GCN-PRELINK: %0 = bitcast float %tmp to i32		; GCN-PRELINK: %[[r0:.*]] = bitcast float %tmp to i32
; GCN-PRELINK: %__pow_sign = and i32 %__yeven, %0		; GCN-PRELINK: %__pow_sign = and i32 %__yeven, %[[r0]]
; GCN-PRELINK: %1 = bitcast float %__exp2 to i32		; GCN-PRELINK: %[[r1:.*]] = bitcast float %__exp2 to i32
; GCN-PRELINK: %2 = or i32 %__pow_sign, %1		; GCN-PRELINK: %[[r2:.*]] = or i32 %__pow_sign, %[[r1]]
; GCN-PRELINK: %3 = bitcast float addrspace(1)* %a to i32 addrspace(1)*		; GCN-PRELINK: %[[r3:.*]] = bitcast float addrspace(1)* %a to i32 addrspace(1)*
; GCN-PRELINK: store i32 %2, i32 addrspace(1)* %3, align 4		; GCN-PRELINK: store i32 %[[r2]], i32 addrspace(1)* %[[r3]], align 4
define amdgpu_kernel void @test_pown(float addrspace(1)* nocapture %a) {		define amdgpu_kernel void @test_pown(float addrspace(1)* nocapture %a) {
entry:		entry:
%tmp = load float, float addrspace(1)* %a, align 4		%tmp = load float, float addrspace(1)* %a, align 4
%arrayidx1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1		%arrayidx1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1
%tmp1 = load float, float addrspace(1)* %arrayidx1, align 4		%tmp1 = load float, float addrspace(1)* %arrayidx1, align 4
%conv = fptosi float %tmp1 to i32		%conv = fptosi float %tmp1 to i32
%call = tail call fast float @_Z4pownfi(float %tmp, i32 %conv)		%call = tail call fast float @_Z4pownfi(float %tmp, i32 %conv)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
[...]
entry:
%arrayidx1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1		%arrayidx1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1
%tmp1 = addrspacecast float addrspace(1)* %arrayidx1 to float addrspace(4)*		%tmp1 = addrspacecast float addrspace(1)* %arrayidx1 to float addrspace(4)*
%call = tail call fast float @_Z6sincosfPU3AS4f(float %tmp, float addrspace(4)* %tmp1)		%call = tail call fast float @_Z6sincosfPU3AS4f(float %tmp, float addrspace(4)* %tmp1)
store float %call, float addrspace(1)* %a, align 4		store float %call, float addrspace(1)* %a, align 4
ret void		ret void
}		}

declare float @_Z6sincosfPU3AS4f(float, float addrspace(4)*)		declare float @_Z6sincosfPU3AS4f(float, float addrspace(4)*)

		%opencl.pipe_t = type opaque
		%opencl.reserve_id_t = type opaque

		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_read_pipe(%opencl.pipe_t addrspace(1)* %p, i32 addrspace(1)* %ptr)
		; GCN-PRELINK: call i32 @__read_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.*}}, i32 addrspace(4)* %{{.*}}) #[[NOUNWIND:[0-9]+]]
		; GCN-PRELINK: call i32 @__read_pipe_4_4(%opencl.pipe_t addrspace(1)* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 2, i32 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		define amdgpu_kernel void @test_read_pipe(%opencl.pipe_t addrspace(1)* %p, i32 addrspace(1)* %ptr) local_unnamed_addr {
		entry:
		%0 = bitcast i32 addrspace(1)* %ptr to i8 addrspace(1)*
		rampitec: Run tests through opt -instnamer.
		yaxunl (author): Will fix.
		rampitec: Not done.
		rampitec: I still see it.
		yaxunl (author): I've added -instnamer to the RUN line. Do you mean I should run the original .ll through opt -instnamer so that the .ll contains named instructions?
		rampitec: Yes, there should be no numbered variables in the test, as it has to be easily editable.
		%1 = addrspacecast i8 addrspace(1)* %0 to i8 addrspace(4)*
		%2 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p, i8 addrspace(4)* %1, i32 4, i32 4) #0
		%3 = tail call %opencl.reserve_id_t* @__reserve_read_pipe(%opencl.pipe_t addrspace(1)* %p, i32 2, i32 4, i32 4)
		%4 = tail call i32 @__read_pipe_4(%opencl.pipe_t addrspace(1)* %p, %opencl.reserve_id_t* %3, i32 2, i8 addrspace(4)* %1, i32 4, i32 4) #0
		tail call void @__commit_read_pipe(%opencl.pipe_t addrspace(1)* %p, %opencl.reserve_id_t* %3, i32 4, i32 4)
		ret void
		}

		declare i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)*, i8 addrspace(4)*, i32, i32)

		declare %opencl.reserve_id_t* @__reserve_read_pipe(%opencl.pipe_t addrspace(1)*, i32, i32, i32)

		declare i32 @__read_pipe_4(%opencl.pipe_t addrspace(1)*, %opencl.reserve_id_t*, i32, i8 addrspace(4)*, i32, i32)

		declare void @__commit_read_pipe(%opencl.pipe_t addrspace(1)*, %opencl.reserve_id_t*, i32, i32)

		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_write_pipe(%opencl.pipe_t addrspace(1)* %p, i32 addrspace(1)* %ptr)
		; GCN-PRELINK: call i32 @__write_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.*}}, i32 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__write_pipe_4_4(%opencl.pipe_t addrspace(1)* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 2, i32 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		define amdgpu_kernel void @test_write_pipe(%opencl.pipe_t addrspace(1)* %p, i32 addrspace(1)* %ptr) local_unnamed_addr {
		entry:
		%0 = bitcast i32 addrspace(1)* %ptr to i8 addrspace(1)*
		%1 = addrspacecast i8 addrspace(1)* %0 to i8 addrspace(4)*
		%2 = tail call i32 @__write_pipe_2(%opencl.pipe_t addrspace(1)* %p, i8 addrspace(4)* %1, i32 4, i32 4) #0
		%3 = tail call %opencl.reserve_id_t* @__reserve_write_pipe(%opencl.pipe_t addrspace(1)* %p, i32 2, i32 4, i32 4) #0
		%4 = tail call i32 @__write_pipe_4(%opencl.pipe_t addrspace(1)* %p, %opencl.reserve_id_t* %3, i32 2, i8 addrspace(4)* %1, i32 4, i32 4) #0
		tail call void @__commit_write_pipe(%opencl.pipe_t addrspace(1)* %p, %opencl.reserve_id_t* %3, i32 4, i32 4) #0
		ret void
		}

		declare i32 @__write_pipe_2(%opencl.pipe_t addrspace(1)*, i8 addrspace(4)*, i32, i32) local_unnamed_addr

		declare %opencl.reserve_id_t* @__reserve_write_pipe(%opencl.pipe_t addrspace(1)*, i32, i32, i32) local_unnamed_addr

		declare i32 @__write_pipe_4(%opencl.pipe_t addrspace(1)*, %opencl.reserve_id_t*, i32, i8 addrspace(4)*, i32, i32) local_unnamed_addr

		declare void @__commit_write_pipe(%opencl.pipe_t addrspace(1)*, %opencl.reserve_id_t*, i32, i32) local_unnamed_addr

		%struct.S = type { [100 x i32] }

		; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pipe_size
		; GCN-PRELINK: call i32 @__read_pipe_2_1(%opencl.pipe_t addrspace(1)* %{{.*}} i8 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_2(%opencl.pipe_t addrspace(1)* %{{.*}} i16 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.*}} i32 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_8(%opencl.pipe_t addrspace(1)* %{{.*}} i64 addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_16(%opencl.pipe_t addrspace(1)* %{{.*}}, <2 x i64> addrspace(4)* %{{.*}}) #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_32(%opencl.pipe_t addrspace(1)* %{{.*}}, <4 x i64> addrspace(4)* %{{.*}} #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_64(%opencl.pipe_t addrspace(1)* %{{.*}}, <8 x i64> addrspace(4)* %{{.*}} #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2_128(%opencl.pipe_t addrspace(1)* %{{.*}}, <16 x i64> addrspace(4)* %{{.*}} #[[NOUNWIND]]
		; GCN-PRELINK: call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %{{.*}}, i8 addrspace(4)* %{{.*}} i32 400, i32 4) #[[NOUNWIND]]
		define amdgpu_kernel void @test_pipe_size(%opencl.pipe_t addrspace(1)* %p1, i8 addrspace(1)* %ptr1, %opencl.pipe_t addrspace(1)* %p2, i16 addrspace(1)* %ptr2, %opencl.pipe_t addrspace(1)* %p4, i32 addrspace(1)* %ptr4, %opencl.pipe_t addrspace(1)* %p8, i64 addrspace(1)* %ptr8, %opencl.pipe_t addrspace(1)* %p16, <2 x i64> addrspace(1)* %ptr16, %opencl.pipe_t addrspace(1)* %p32, <4 x i64> addrspace(1)* %ptr32, %opencl.pipe_t addrspace(1)* %p64, <8 x i64> addrspace(1)* %ptr64, %opencl.pipe_t addrspace(1)* %p128, <16 x i64> addrspace(1)* %ptr128, %opencl.pipe_t addrspace(1)* %pu, %struct.S addrspace(1)* %ptru) local_unnamed_addr #0 {
		entry:
		%0 = addrspacecast i8 addrspace(1)* %ptr1 to i8 addrspace(4)*
		%1 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p1, i8 addrspace(4)* %0, i32 1, i32 1) #0
		%2 = bitcast i16 addrspace(1)* %ptr2 to i8 addrspace(1)*
		%3 = addrspacecast i8 addrspace(1)* %2 to i8 addrspace(4)*
		%4 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p2, i8 addrspace(4)* %3, i32 2, i32 2) #0
		%5 = bitcast i32 addrspace(1)* %ptr4 to i8 addrspace(1)*
		%6 = addrspacecast i8 addrspace(1)* %5 to i8 addrspace(4)*
		%7 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p4, i8 addrspace(4)* %6, i32 4, i32 4) #0
		%8 = bitcast i64 addrspace(1)* %ptr8 to i8 addrspace(1)*
		%9 = addrspacecast i8 addrspace(1)* %8 to i8 addrspace(4)*
		%10 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p8, i8 addrspace(4)* %9, i32 8, i32 8) #0
		%11 = bitcast <2 x i64> addrspace(1)* %ptr16 to i8 addrspace(1)*
		%12 = addrspacecast i8 addrspace(1)* %11 to i8 addrspace(4)*
		%13 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p16, i8 addrspace(4)* %12, i32 16, i32 16) #0
		%14 = bitcast <4 x i64> addrspace(1)* %ptr32 to i8 addrspace(1)*
		%15 = addrspacecast i8 addrspace(1)* %14 to i8 addrspace(4)*
		%16 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p32, i8 addrspace(4)* %15, i32 32, i32 32) #0
		%17 = bitcast <8 x i64> addrspace(1)* %ptr64 to i8 addrspace(1)*
		%18 = addrspacecast i8 addrspace(1)* %17 to i8 addrspace(4)*
		%19 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p64, i8 addrspace(4)* %18, i32 64, i32 64) #0
		%20 = bitcast <16 x i64> addrspace(1)* %ptr128 to i8 addrspace(1)*
		%21 = addrspacecast i8 addrspace(1)* %20 to i8 addrspace(4)*
		%22 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %p128, i8 addrspace(4)* %21, i32 128, i32 128) #0
		%23 = bitcast %struct.S addrspace(1)* %ptru to i8 addrspace(1)*
		%24 = addrspacecast i8 addrspace(1)* %23 to i8 addrspace(4)*
		%25 = tail call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %pu, i8 addrspace(4)* %24, i32 400, i32 4) #0
		ret void
		}

		; GCN-PRELINK: attributes #[[NOUNWIND]] = { nounwind }
		attributes #0 = { nounwind }
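
The CHECK lines in the pipe tests above suggest a simple rule for which calls get rewritten. The following is a minimal Python sketch of that rule as it can be read off the test expectations only; it is not the pass's actual code. The function name `specialize_pipe_call` and the exact set of eligible callees are assumptions made for illustration, and the test does not show whether the real transformation has an upper size limit beyond the 128-byte case.

```python
from typing import Optional, Tuple

# Callees the test shows being rewritten (assumed to be the complete set).
PIPE_CALLEES = ("__read_pipe_2", "__read_pipe_4",
                "__write_pipe_2", "__write_pipe_4")


def specialize_pipe_call(name: str, size: int, align: int) -> Optional[Tuple[str, str]]:
    """Return (specialized callee name, pointee type of the data pointer),
    or None when the call would be left unchanged.

    Rule inferred from the CHECK lines: the packet size must equal the
    packet alignment and be a power of two.
    """
    if name not in PIPE_CALLEES:
        return None
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    if size != align or not is_power_of_two:
        return None
    # Up to 8 bytes the tests expect a plain integer pointee (i8 .. i64);
    # larger packets are expected as vectors of i64.
    pointee = f"i{size * 8}" if size <= 8 else f"<{size // 8} x i64>"
    return f"{name}_{size}", pointee


if __name__ == "__main__":
    for size, align in [(1, 1), (4, 4), (16, 16), (128, 128), (400, 4)]:
        print(size, align, specialize_pipe_call("__read_pipe_2", size, align))
```

Under these assumptions, the `i32 4, i32 4` call in `@test_read_pipe` maps to `__read_pipe_2_4` with an `i32` pointee, while the `%struct.S` case (`i32 400, i32 4`) is left as a plain `__read_pipe_2` call, matching the last CHECK line of `@test_pipe_size`.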


[AMDGPU] Transform __read_pipe_* and __write_pipe_* (Closed, Public)

Diff 113672

lib/Target/AMDGPU/AMDGPULibCalls.cpp

lib/Target/AMDGPU/AMDGPULibFunc.h

lib/Target/AMDGPU/AMDGPULibFunc.cpp

test/CodeGen/AMDGPU/simplify-libcalls.ll
