⚙ D32862 [AMDGPU] add intrinsic for s

timcorringham created this revision.May 4 2017, 6:54 AM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. · View Herald TranscriptMay 4 2017, 6:54 AM

Harbormaster completed remote builds in B6147: Diff 97820.May 4 2017, 6:54 AM

What are you using this for? I don't see how an unpredictable address like this is useful

In D32862#746257, @arsenm wrote:

What are you using this for? I don't see how an unpredictable address like this is useful

The particular use-case this is required for only uses the top part of the address that is returned, so the imprecision isn't a problem. The onus is on the programmer to use barriers if a more precise value is required.

In D32862#746896, @timcorringham wrote:

In D32862#746257, @arsenm wrote:

What are you using this for? I don't see how an unpredictable address like this is useful

The particular use-case this is required for only uses the top part of the address that is returned, so the imprecision isn't a problem. The onus is on the programmer to use barriers if a more precise value is required.

Even then that's a pretty big assumption relying on the high 32-bits. What are you doing with the address? This might be better served by something more targeted

Even then that's a pretty big assumption relying on the high 32-bits. What are you doing with the address? This might be better served by something more targeted

The use-case is for code that follows a convention that ensures the high address is always the value that is required. This intrinsic is unlikely to be widely used, but is probably more useful than an intrinsic that just returns the high address - although I accept that is debatable!

Low & high bits and use cases aside, isn't this something that could be covered by adding support for PC in the read_register intrinsic?

In D32862#748486, @nhaehnle wrote:

Low & high bits and use cases aside, isn't this something that could be covered by adding support for PC in the read_register intrinsic?

I'm sure that is possible, but I'm put off by the LLVM LangRef description of read_register which includes the following:

Warning: So far it only works with the stack pointer on selected architectures (ARM, AArch64, PowerPC and x86_64). Significant amount of work is needed to support other registers...

Apart from that warning, the advantage of s_getpc is that it just exposes an instruction which has some documentation and known behaviour, while adding new behaviour to read_register would be more obscure.

read/write register work for AMDGPU, though there is the hazard you can easily create impossible cases of VGPR->SGPR copies we don't attempt to handle

read/write register would be inappropriate for AMDGPU because we don't actually have an addressable PC register, and it's accessed through an instruction like this

[AMDGPU] add intrinsic for s_getpc

The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc

Updated comment to clarify that it is provided to allow a specific style
of position independent code to determine the high part of its address when
it is known (through convention) that the code and any data of interest does
not cross a 4Gb address boundary.

arsenm added inline comments.May 15 2017, 10:04 AM

include/llvm/IR/IntrinsicsAMDGPU.td
576	Do you have the patch for the clang component of this?
577	You could also debatably make it speculatable
lib/Target/AMDGPU/SOPInstructions.td
950–952	You should be able to add this directly to the instruction definition without a standalone pattern. The comment is also unnecessary
test/CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll
16	Not that it matters, but this doesn't match the readnone it was declared with

timcorringham mentioned this in D33276: [AMDGPU] add __builtin_amdgcn_s_getpc.May 17 2017, 7:02 AM

Amendments addressing review comments:

Corresponding Clang changes are in D33276.

I have made the intrinsic speculatable.

I have moved the intrinsic pattern to be next to the instruction def, but I'm too much of a Tablegen novice to be able to roll them in together.

Corrected attributes in test.

In D32862#757385, @timcorringham wrote:

Amendments addressing review comments:

Corresponding Clang changes are in D33276.

I have made the intrinsic speculatable.

I have moved the intrinsic pattern to be next to the instruction def, but I'm too much of a Tablegen novice to be able to roll them in together.

Corrected attributes in test.

See SI_PS_LIVE, you just need something like [(set i64:$sdst, (int_foo))]

Updated s_getpc instruction definition to include intrinsic.

LGTM

This revision is now accepted and ready to land.May 23 2017, 9:46 AM

Closed by commit rL303859: [AMDGPU] add intrinsic for s_getpc (authored by timcorringham). · Explain WhyMay 25 2017, 7:04 AM

This revision was automatically updated to reflect the committed changes.

timcorringham mentioned this in rL303861: [AMDGPU] add __builtin_amdgcn_s_getpc.May 25 2017, 7:16 AM

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/2604
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/3280

$ svn commit -m 'Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.'
Sending include/llvm/IR/IntrinsicsAMDGPU.td
Sending lib/Target/AMDGPU/SOPInstructions.td
Deleting test/CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll
Transmitting file data ..
Committed revision 303902.

kzhuravl added inline comments.May 25 2017, 12:24 PM

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll
9 ↗	(On Diff #100242)	@timcorringham, Add `amdgpu_kernel` convention.

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] add intrinsic for s_getpc
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 99412

include/llvm/IR/IntrinsicsAMDGPU.td

lib/Target/AMDGPU/SOPInstructions.td

test/CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll