This is an archive of the discontinued LLVM Phabricator instance.

[clang][CodeGen] Fix address space for sret
AbandonedPublic

Authored by yaxunl on Apr 27 2021, 11:58 AM.

Details

Reviewers
rjmccall
Summary

sret is returned through temporary variables allocated on stack,
therefore it should use alloca address space.

Currently clang use default address space for sret pointers. This
causes inefficient code generated for AMDGPU backend since
alloca address space is 32 bit whereas generic pointer is 64 bit.
It also causes assertions where alloca address space is expected.

This patch uses alloca address space for sret pointers. It is NFC
for targets alloca address space of which is default address space.

Diff Detail

Event Timeline

yaxunl created this revision.Apr 27 2021, 11:58 AM
yaxunl requested review of this revision.Apr 27 2021, 11:58 AM
yaxunl updated this revision to Diff 341054.Apr 27 2021, 7:09 PM

fix invalid bitcast due to sret pointer passed to ctor

yaxunl updated this revision to Diff 341257.Apr 28 2021, 10:26 AM

cast return value to default address space since it is expected. also fix debug info

I think this is intentional; requiring the indirect-result parameter to be in the alloca address space would prevent direct initialization of non-temporary memory, which is an important optimization in C++.

I think this is intentional; requiring the indirect-result parameter to be in the alloca address space would prevent direct initialization of non-temporary memory, which is an important optimization in C++.

You mean situations like this? https://godbolt.org/z/KnPs6znK8

Address of a global variable is directly passed as the sret arg to the function returning a struct, instead of creating a temporary struct variable and passing its address as the sret arg?

However, this is forbidden in CUDA/HIP: https://godbolt.org/z/1hjsrn9Tn

So can we assume sret arg is always in alloca addr space for CUDA/HIP?

I think this is intentional; requiring the indirect-result parameter to be in the alloca address space would prevent direct initialization of non-temporary memory, which is an important optimization in C++.

You mean situations like this? https://godbolt.org/z/KnPs6znK8

Address of a global variable is directly passed as the sret arg to the function returning a struct, instead of creating a temporary struct variable and passing its address as the sret arg?

That's one specific case of copy-elision, but there are others, e.g.: https://godbolt.org/z/boTeMseaM

I suppose you could disable general copy-elision. But don't you have aggressive optimizations to strength-reduce generic pointer operations?

yaxunl abandoned this revision.Apr 29 2021, 7:45 AM

I think this is intentional; requiring the indirect-result parameter to be in the alloca address space would prevent direct initialization of non-temporary memory, which is an important optimization in C++.

You mean situations like this? https://godbolt.org/z/KnPs6znK8

Address of a global variable is directly passed as the sret arg to the function returning a struct, instead of creating a temporary struct variable and passing its address as the sret arg?

That's one specific case of copy-elision, but there are others, e.g.: https://godbolt.org/z/boTeMseaM

I suppose you could disable general copy-elision. But don't you have aggressive optimizations to strength-reduce generic pointer operations?

Thanks for the example. I agree that sret should be in default addr space.