The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.
Details
Diff Detail
- Build Status
Buildable 6535 Build 6535: arc lint + arc unit
Event Timeline
What are you using this for? I don't see how an unpredictable address like this is useful
The particular use-case this is required for only uses the top part of the address that is returned, so the imprecision isn't a problem. The onus is on the programmer to use barriers if a more precise value is required.
Even then that's a pretty big assumption relying on the high 32-bits. What are you doing with the address? This might be better served by something more targeted
Even then that's a pretty big assumption relying on the high 32-bits. What are you doing with the address? This might be better served by something more targeted
The use-case is for code that follows a convention that ensures the high address is always the value that is required. This intrinsic is unlikely to be widely used, but is probably more useful than an intrinsic that just returns the high address - although I accept that is debatable!
Low & high bits and use cases aside, isn't this something that could be covered by adding support for PC in the read_register intrinsic?
I'm sure that is possible, but I'm put off by the LLVM LangRef description of read_register which includes the following:
Warning: So far it only works with the stack pointer on selected architectures (ARM, AArch64, PowerPC and x86_64). Significant amount of work is needed to support other registers...
Apart from that warning, the advantage of s_getpc is that it just exposes an instruction which has some documentation and known behaviour, while adding new behaviour to read_register would be more obscure.
read/write register work for AMDGPU, though there is the hazard you can easily create impossible cases of VGPR->SGPR copies we don't attempt to handle
read/write register would be inappropriate for AMDGPU because we don't actually have an addressable PC register, and it's accessed through an instruction like this
[AMDGPU] add intrinsic for s_getpc
The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc
Updated comment to clarify that it is provided to allow a specific style
of position independent code to determine the high part of its address when
it is known (through convention) that the code and any data of interest does
not cross a 4Gb address boundary.
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
576 | Do you have the patch for the clang component of this? | |
577 | You could also debatably make it speculatable | |
lib/Target/AMDGPU/SOPInstructions.td | ||
950–952 | You should be able to add this directly to the instruction definition without a standalone pattern. The comment is also unnecessary | |
test/CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll | ||
16 | Not that it matters, but this doesn't match the readnone it was declared with |
Amendments addressing review comments:
Corresponding Clang changes are in D33276.
I have made the intrinsic speculatable.
I have moved the intrinsic pattern to be next to the instruction def, but I'm too much of a Tablegen novice to be able to roll them in together.
Corrected attributes in test.
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/2604
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/3280
$ svn commit -m 'Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.'
Sending include/llvm/IR/IntrinsicsAMDGPU.td
Sending lib/Target/AMDGPU/SOPInstructions.td
Deleting test/CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll
Transmitting file data ..
Committed revision 303902.
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll | ||
---|---|---|
9 ↗ | (On Diff #100242) | @timcorringham, Add amdgpu_kernel convention. |
Do you have the patch for the clang component of this?