This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] add intrinsic for s_getpc
ClosedPublic

Authored by timcorringham on May 4 2017, 6:54 AM.

Details

Summary

The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

Event Timeline

timcorringham created this revision.May 4 2017, 6:54 AM
arsenm edited edge metadata.May 4 2017, 10:24 AM

What are you using this for? I don't see how an unpredictable address like this is useful

What are you using this for? I don't see how an unpredictable address like this is useful

The particular use-case this is required for only uses the top part of the address that is returned, so the imprecision isn't a problem. The onus is on the programmer to use barriers if a more precise value is required.

arsenm added a comment.May 5 2017, 9:56 AM

What are you using this for? I don't see how an unpredictable address like this is useful

The particular use-case this is required for only uses the top part of the address that is returned, so the imprecision isn't a problem. The onus is on the programmer to use barriers if a more precise value is required.

Even then that's a pretty big assumption relying on the high 32-bits. What are you doing with the address? This might be better served by something more targeted

Even then that's a pretty big assumption relying on the high 32-bits. What are you doing with the address? This might be better served by something more targeted

The use-case is for code that follows a convention that ensures the high address is always the value that is required. This intrinsic is unlikely to be widely used, but is probably more useful than an intrinsic that just returns the high address - although I accept that is debatable!

Low & high bits and use cases aside, isn't this something that could be covered by adding support for PC in the read_register intrinsic?

Low & high bits and use cases aside, isn't this something that could be covered by adding support for PC in the read_register intrinsic?

I'm sure that is possible, but I'm put off by the LLVM LangRef description of read_register which includes the following:

Warning: So far it only works with the stack pointer on selected architectures (ARM, AArch64, PowerPC and x86_64). Significant amount of work is needed to support other registers...

Apart from that warning, the advantage of s_getpc is that it just exposes an instruction which has some documentation and known behaviour, while adding new behaviour to read_register would be more obscure.

read/write register work for AMDGPU, though there is the hazard you can easily create impossible cases of VGPR->SGPR copies we don't attempt to handle

read/write register would be inappropriate for AMDGPU because we don't actually have an addressable PC register, and it's accessed through an instruction like this

[AMDGPU] add intrinsic for s_getpc

The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc

Updated comment to clarify that it is provided to allow a specific style
of position independent code to determine the high part of its address when
it is known (through convention) that the code and any data of interest does
not cross a 4Gb address boundary.

arsenm added inline comments.May 15 2017, 10:04 AM
include/llvm/IR/IntrinsicsAMDGPU.td
576

Do you have the patch for the clang component of this?

577

You could also debatably make it speculatable

lib/Target/AMDGPU/SOPInstructions.td
941–943

You should be able to add this directly to the instruction definition without a standalone pattern. The comment is also unnecessary

test/CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll
16

Not that it matters, but this doesn't match the readnone it was declared with

Amendments addressing review comments:

Corresponding Clang changes are in D33276.

I have made the intrinsic speculatable.

I have moved the intrinsic pattern to be next to the instruction def, but I'm too much of a Tablegen novice to be able to roll them in together.

Corrected attributes in test.

Amendments addressing review comments:

Corresponding Clang changes are in D33276.

I have made the intrinsic speculatable.

I have moved the intrinsic pattern to be next to the instruction def, but I'm too much of a Tablegen novice to be able to roll them in together.

Corrected attributes in test.

See SI_PS_LIVE, you just need something like [(set i64:$sdst, (int_foo))]

Updated s_getpc instruction definition to include intrinsic.

arsenm accepted this revision.May 23 2017, 9:46 AM

LGTM

This revision is now accepted and ready to land.May 23 2017, 9:46 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.May 25 2017, 12:20 PM

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/2604
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/3280

$ svn commit -m 'Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.'
Sending include/llvm/IR/IntrinsicsAMDGPU.td
Sending lib/Target/AMDGPU/SOPInstructions.td
Deleting test/CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll
Transmitting file data ..
Committed revision 303902.

kzhuravl added inline comments.May 25 2017, 12:24 PM
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll
9 ↗(On Diff #100242)

@timcorringham, Add amdgpu_kernel convention.