Page MenuHomePhabricator

[AMDGPU] Transform __read_pipe_* and __write_pipe_*
ClosedPublic

Authored by yaxunl on Aug 17 2017, 7:36 AM.

Details

Summary

When packet size equals packet align and is power of 2, transform
read_pipe* and write_pipe* to specialized library function.

Diff Detail

Repository
rL LLVM

Event Timeline

yaxunl created this revision.Aug 17 2017, 7:36 AM
rampitec requested changes to this revision.Aug 17 2017, 10:26 AM

Please use AMDGPULibFunc instead of manual function name manipulations.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
598 ↗(On Diff #111514)

"llvm::" prefix is not needed.

616 ↗(On Diff #111514)

You should not use getOrInsertFunction directly, but use AMDGPULibCalls::getFunction().
Since you are not using it you also fail to check we are in pre-link.

645 ↗(On Diff #111514)

You should use AMDGPULibFunc and parser/mangler used across the whole this source.

This revision now requires changes to proceed.Aug 17 2017, 10:26 AM
yaxunl marked 3 inline comments as done.Aug 20 2017, 2:38 PM
yaxunl added inline comments.
lib/Target/AMDGPU/AMDGPULibCalls.cpp
598 ↗(On Diff #111514)

will remove

616 ↗(On Diff #111514)

whether in pre-link can be checked by whether the function is declaration.

__read_pipe_* and __write_pipe_* are unmangled functions. AMDGPULibFunc::getFunction relies on argument information encoded in mangled function names, which does not work for unmangled functions.

645 ↗(On Diff #111514)

will refactor this.

yaxunl updated this revision to Diff 111907.Aug 20 2017, 2:47 PM
yaxunl edited edge metadata.
yaxunl marked 3 inline comments as done.
yaxunl removed 1 blocking reviewer(s): vpykhtin.

refactor AMDGPULibFunc to handle unmangled lib functions.

yaxunl updated this revision to Diff 111912.Aug 20 2017, 2:51 PM

add check for declaration to avoid post-link transformation of pipe functions.

rampitec requested changes to this revision.Aug 22 2017, 9:31 AM

You still fail to use getFunction and thus also fail to check prelink.

lib/Target/AMDGPU/AMDGPULibCalls.cpp
67 ↗(On Diff #111912)

It is better hide these details from the client. AMDGPULibFunc should care about all of it and FuncInfo shall be sufficient structure to describe a function. Simplifier should not care about details.

616 ↗(On Diff #111514)

You are still using M->getOrInsertFunction() directly.

lib/Target/AMDGPU/AMDGPULibFunc.cpp
987 ↗(On Diff #111912)

I believe we can go without bruteforce search loop.

lib/Target/AMDGPU/AMDGPULibFunc.h
240 ↗(On Diff #111912)

There is already EI_READ_PIPE above. Is it mangled? Then why these are unmangled?

366 ↗(On Diff #111912)

StringRef designed to be passed by value, here and below.

test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗(On Diff #111912)

Run tests through opt -instnamer.

This revision now requires changes to proceed.Aug 22 2017, 9:31 AM

Clang does not emit mangled functions for pipe builtin functions. Instead, it emits unmangled functions e.g. __read_pipe_*. These functions are not overloaded and will be implemented by device library.

The original implementation of AMDGPULibFunc assumes all library functions have mangled name. It extracts function argument information based on the mangled function name and use it to implement getOrInsertFunction.

For unmangled library functions, those functions regarding mangling or function parameter info are either irrelevant or useless. Those functions are needed for mangled functions because the mangling requires such information.

Therefore I subclass AMDGPULibFunc as AMDGPUMangledLibFunc and AMDGPUUnmangledLibFunc and move all the stuff needed for mangling to AMDGPUMangledLibFunc.

For AMDGPUUnmangledLibFunc, since there is no argument type information from mangling, AMDGPULibFunc::getOrInsertFunction cannot be implemented. There is no special trick about identifying whether it is prelink or not in the original implementation. Just checks whether the function is declaration or not. I see no point of implementing a special getOrInsertFunction for AMDGPUUnmangledLibFunc.

Clang does not emit mangled functions for pipe builtin functions. Instead, it emits unmangled functions e.g. __read_pipe_*. These functions are not overloaded and will be implemented by device library.

The original implementation of AMDGPULibFunc assumes all library functions have mangled name. It extracts function argument information based on the mangled function name and use it to implement getOrInsertFunction.

For unmangled library functions, those functions regarding mangling or function parameter info are either irrelevant or useless. Those functions are needed for mangled functions because the mangling requires such information.

Therefore I subclass AMDGPULibFunc as AMDGPUMangledLibFunc and AMDGPUUnmangledLibFunc and move all the stuff needed for mangling to AMDGPUMangledLibFunc.

For AMDGPUUnmangledLibFunc, since there is no argument type information from mangling, AMDGPULibFunc::getOrInsertFunction cannot be implemented. There is no special trick about identifying whether it is prelink or not in the original implementation. Just checks whether the function is declaration or not. I see no point of implementing a special getOrInsertFunction for AMDGPUUnmangledLibFunc.

What about EI_READ_PIPE? Is it mangled? Shall it be removed?

Whatever you subclass all the changes shall be contained within AMDGPULibFunc and not in its clients. If some of the arguments on FuncInfo are unused in this case that is fine. You can also check the former implementation from HSAIL which was also handling EDG style magling (unmagled in therms you are using here).

All in all that is only legal to do it on prelink, and that is exactly what getFunction() does.

yaxunl marked 8 inline comments as done.Aug 22 2017, 10:11 AM
yaxunl added inline comments.
lib/Target/AMDGPU/AMDGPULibCalls.cpp
67 ↗(On Diff #111912)

The handling of mangled lib function requires function argument information, which is not available for unmangled lib function.

The transformation of unmangled lib function does not require function argument information. There is no point of implementing many of the member functions of mangled lib functions.

lib/Target/AMDGPU/AMDGPULibFunc.cpp
987 ↗(On Diff #111912)

will add a map.

lib/Target/AMDGPU/AMDGPULibFunc.h
240 ↗(On Diff #111912)

clang does not emit mangled read_pipe functions. Will remove EI_READ_PIPE.

366 ↗(On Diff #111912)

This comes from the original implementation. Will fix.

test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗(On Diff #111912)

will fix.

yaxunl marked 5 inline comments as done.Aug 22 2017, 10:23 AM

Clang does not emit mangled functions for pipe builtin functions. Instead, it emits unmangled functions e.g. __read_pipe_*. These functions are not overloaded and will be implemented by device library.

The original implementation of AMDGPULibFunc assumes all library functions have mangled name. It extracts function argument information based on the mangled function name and use it to implement getOrInsertFunction.

For unmangled library functions, those functions regarding mangling or function parameter info are either irrelevant or useless. Those functions are needed for mangled functions because the mangling requires such information.

Therefore I subclass AMDGPULibFunc as AMDGPUMangledLibFunc and AMDGPUUnmangledLibFunc and move all the stuff needed for mangling to AMDGPUMangledLibFunc.

For AMDGPUUnmangledLibFunc, since there is no argument type information from mangling, AMDGPULibFunc::getOrInsertFunction cannot be implemented. There is no special trick about identifying whether it is prelink or not in the original implementation. Just checks whether the function is declaration or not. I see no point of implementing a special getOrInsertFunction for AMDGPUUnmangledLibFunc.

What about EI_READ_PIPE? Is it mangled? Shall it be removed?

Whatever you subclass all the changes shall be contained within AMDGPULibFunc and not in its clients. If some of the arguments on FuncInfo are unused in this case that is fine. You can also check the former implementation from HSAIL which was also handling EDG style magling (unmagled in therms you are using here).

All in all that is only legal to do it on prelink, and that is exactly what getFunction() does.

For pipe functions, clang currently does not do any mangling at all. It is not like the EDG style mangling.

As I said, getFunction() requires function argument information, which is not available for unmangled functions. To implement getFunction() requires providing argument information for unmangled functions, which is time consuming and also useless.

As I said, getFunction() requires function argument information, which is not available for unmangled functions. To implement getFunction() requires providing argument information for unmangled functions, which is time consuming and also useless.

You can just omit filling it. You have FuncId and that is all you need in this case.

As I said, getFunction() requires function argument information, which is not available for unmangled functions. To implement getFunction() requires providing argument information for unmangled functions, which is time consuming and also useless.

You can just omit filling it. You have FuncId and that is all you need in this case.

I need to get a function with different (transformed) function type.

Run it without -amdgpu-prelink. It will fail to link. It will also fail to build library.

Run it without -amdgpu-prelink. It will fail to link. It will also fail to build library.

The device library has not implemented these functions yet. I think that's why it fails to link.

I will investigate why it fails to build library.

Run it without -amdgpu-prelink. It will fail to link. It will also fail to build library.

The device library has not implemented these functions yet. I think that's why it fails to link.

I will investigate why it fails to build library.

It fails because you do not use getFunction, effectively skipping prelinck check.

Run it without -amdgpu-prelink. It will fail to link. It will also fail to build library.

The device library has not implemented these functions yet. I think that's why it fails to link.

I will investigate why it fails to build library.

It fails because you do not use getFunction, effectively skipping prelinck check.

For mangled lib functions, my patch does not change how they are handled. They still go through getFunction.

For unmangled lib functions, I only transform them if they are declarations. In post-linking pass, they are already linked and are not declarations, therefore they stay unchanged.

I am wondering why there will be link failure.

Run it without -amdgpu-prelink. It will fail to link. It will also fail to build library.

The device library has not implemented these functions yet. I think that's why it fails to link.

I will investigate why it fails to build library.

It fails because you do not use getFunction, effectively skipping prelinck check.

For mangled lib functions, my patch does not change how they are handled. They still go through getFunction.

For unmangled lib functions, I only transform them if they are declarations. In post-linking pass, they are already linked and are not declarations, therefore they stay unchanged.

I am wondering why there will be link failure.

Library build works before link, but you do not check that prelink transfirmations allowed.

Run it without -amdgpu-prelink. It will fail to link. It will also fail to build library.

The device library has not implemented these functions yet. I think that's why it fails to link.

I will investigate why it fails to build library.

It fails because you do not use getFunction, effectively skipping prelinck check.

For mangled lib functions, my patch does not change how they are handled. They still go through getFunction.

For unmangled lib functions, I only transform them if they are declarations. In post-linking pass, they are already linked and are not declarations, therefore they stay unchanged.

I am wondering why there will be link failure.

Library build works before link, but you do not check that prelink transfirmations allowed.

library contains definition of unmangled functions, since they are not declaration, the pass will not change them.

vpykhtin edited edge metadata.Sep 1 2017, 6:35 AM

Splitting AMDGPULibFunc in two classes looks a huge overkill. How about modifying AMDGPULibFunc::parse so it could accept unmangled names and just return an enum id for the function (using some fast lookup approach)? Type info for such functions can be left unpopulated and supposed to be handled by the client (as in fold_read_write_pipe).

yaxunl marked 5 inline comments as done.Sep 1 2017, 7:02 AM

Splitting AMDGPULibFunc in two classes looks a huge overkill. How about modifying AMDGPULibFunc::parse so it could accept unmangled names and just return an enum id for the function (using some fast lookup approach)? Type info for such functions can be left unpopulated and supposed to be handled by the client (as in fold_read_write_pipe).

The unmangled lib function is different from mangled function in the way how the function names and type information are handled. I have found a way to reuse the interface mangle, getOrInsertFunction, getFunction, and getFunctionType. However to achieve that we really need to have different classes for mangled and unmangled lib function and take advantages of some virtual functions. Having different classes for unmangled and mangled lib functions also have a cleaner design where all name mangling stuff are kept in where they belong.

vpykhtin accepted this revision.Sep 1 2017, 9:32 AM

Ok, unmangled part looks different indeed. If the issue with pre-link checking is solved this patch is ok with me.

rampitec added inline comments.Sep 1 2017, 9:55 AM
lib/Target/AMDGPU/AMDGPULibCalls.cpp
616 ↗(On Diff #111514)

I see you are now checking for the declaration vs definition, but it is still only legal on pre-link and that is not checked.

lib/Target/AMDGPU/AMDGPULibFunc.cpp
987 ↗(On Diff #111912)

Not done.

lib/Target/AMDGPU/AMDGPULibFunc.h
366 ↗(On Diff #111912)

Not done.

test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗(On Diff #111912)

Not done.

yaxunl updated this revision to Diff 113672.Sep 2 2017, 8:29 PM
yaxunl edited edge metadata.
yaxunl marked an inline comment as done.

Revised by Stas' comments. Use AMDGPULibFunc::getOrInsertFunction to create function.

yaxunl updated this revision to Diff 113673.Sep 2 2017, 8:40 PM
yaxunl marked 4 inline comments as done.

Minor change of test.

vpykhtin added inline comments.Sep 4 2017, 3:15 AM
lib/Target/AMDGPU/AMDGPULibFunc.h
366 ↗(On Diff #111912)

Initially (non-const) StringRef& was introduced intentionally to have mangledName with stripped name on return.

yaxunl marked an inline comment as done.Sep 4 2017, 5:40 AM
yaxunl added inline comments.
lib/Target/AMDGPU/AMDGPULibCalls.cpp
616 ↗(On Diff #111514)

Now use AMDGPULibFunc::getOrInsertFunction().

lib/Target/AMDGPU/AMDGPULibFunc.h
366 ↗(On Diff #111912)

Only const StringRef& is changed to StringRef. non-const StringRef& is not changed.

rampitec added inline comments.Sep 4 2017, 10:30 AM
lib/Target/AMDGPU/AMDGPULibCalls.cpp
67 ↗(On Diff #111912)

The whole point of moving logic to distinguish between mangled and unmangled into the parser is to have no massive changes in the client like this and to let client to not bother differentiating on every other line.

636 ↗(On Diff #113673)

The point of using it is to check its result and bail if null. Also no modifications shall be done before this check.

test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗(On Diff #111912)

I still see it.

yaxunl marked 2 inline comments as done.Sep 4 2017, 2:55 PM
yaxunl added inline comments.
test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗(On Diff #111912)

I've add -instnamer to the RUN line. Do you mean I should get the original .ll through opt -instnamer so that the .ll contains named instructions?

rampitec added inline comments.Sep 4 2017, 4:30 PM
test/CodeGen/AMDGPU/simplify-libcalls.ll
693 ↗(On Diff #111912)

Yes, there shall be no numbered variables in the test as it has to be easily editable.

yaxunl updated this revision to Diff 113906.Sep 5 2017, 1:24 PM
yaxunl marked an inline comment as done.

Revised by Stas' comments.

rampitec added inline comments.Sep 5 2017, 2:00 PM
test/CodeGen/AMDGPU/simplify-libcalls.ll
1 ↗(On Diff #113906)

-instanamer is not needed here.

yaxunl updated this revision to Diff 113909.Sep 5 2017, 2:04 PM

Remove -instnamer from RUN line.

rampitec added inline comments.Sep 5 2017, 2:09 PM
lib/Target/AMDGPU/AMDGPULibCalls.cpp
612 ↗(On Diff #113906)

Please move the actual IR change below "return false" statement.

yaxunl updated this revision to Diff 113912.Sep 5 2017, 2:25 PM

Revised by Stas' comments.

rampitec accepted this revision.Sep 5 2017, 2:32 PM

Thanks!

This revision is now accepted and ready to land.Sep 5 2017, 2:32 PM
This revision was automatically updated to reflect the committed changes.