This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Implement readlane/readfirstlane intrinsics to expose the instructions.
ClosedPublic

Authored by cfang on Jul 18 2016, 4:52 PM.

Details

Summary

This patch defines a new register class (VM0_32 = VReg_32 + M0) and implements readlane and readfirstlane intrinsics.

Diff Detail

Event Timeline

cfang updated this revision to Diff 64421.Jul 18 2016, 4:52 PM
cfang retitled this revision from to AMDGPU/SI: Implement readlane/readfirstlane intrinsics to expose the instructions..
cfang updated this object.
cfang added reviewers: arsenm, tstellarAMD.
cfang added subscribers: arsenm, llvm-commits.
arsenm added inline comments.Jul 18 2016, 4:59 PM
include/llvm/IR/IntrinsicsAMDGPU.td
395

Comment typo and not needed

400

Ditto

lib/Target/AMDGPU/SIInstructions.td
2364–2365

This is a very simple pattern so should go with the instruction definition's pattern

2371–2373

Ditto

test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll
7–11

Should also include a test which has an immediate source to make sure that it is moved into a register. Another that uses inline asm to put a value in m0 would also be useful (same for the other intrinsic too)

test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll
8

Attributes not needed on call site

15

Use attribute group for the nounwind also

cfang updated this revision to Diff 64615.Jul 19 2016, 5:16 PM

update based on Matt's comments:

  1. Remove unnecessary comments in intrinsic definitions;
  2. move the pattern defs into instructions;
  3. add imm src test cases;
  4. remove unnecessary attributes at call sites;
  5. put "nounwind" into a new attribute group.
NOTE: M0 test case is not added in this update.
arsenm edited edge metadata.Jul 19 2016, 6:14 PM

I think you should still add the m0 testcase even though it isn't folded

cfang updated this revision to Diff 64711.Jul 20 2016, 10:54 AM
cfang edited edge metadata.

Add "m0" testcase based on Matt's request.

TODO: shoudl fold m0 into the src operand of readlane/readfirstlane. This is at a lower priority and can be done in a separate patch.

lib/Target/AMDGPU/SIInstructions.td
1225

You should use the type in the pattern rather than the register class.

1598

Same here.

2361

Extra whitespace.

Do you plan to add code to DivergenceAnalysis to recognize these intrinsics? We might eventually want to use a readfirstlane intrinsic from Mesa to hint when certain indices can be assumed to be uniform.

(Right now, LLVM ends up deciding where to insert the readfirstlane in those cases, since they are situations where buffer resource descriptors are loaded which must end up in SGPRs anyway. By giving a readfirstlane hint, we could perhaps generate more efficient code.)

cfang updated this revision to Diff 66053.Jul 28 2016, 4:34 PM

Update based on Tom's Comments:

  1. Use the type in the pattern instead of register class.
  2. remove extra white space.
cfang added a comment.Jul 28 2016, 4:37 PM

Do you plan to add code to DivergenceAnalysis to recognize these intrinsics? We might eventually want to use a readfirstlane intrinsic from Mesa to hint when certain indices can be assumed to be uniform.

(Right now, LLVM ends up deciding where to insert the readfirstlane in those cases, since they are situations where buffer resource descriptors are loaded which must end up in SGPRs anyway. By giving a readfirstlane hint, we could perhaps generate more efficient code.)

I do not have this plan right now. But we will definitely consider your suggestion.

cfang added a comment.Aug 1 2016, 10:52 AM

Hi, Matt and Tom:

Is there any further suggestions on this patch? Thanks.

Hi, Matt and Tom:

Is there any further suggestions on this patch? Thanks.

Need further suggestion what to do on this. I think readlane and readfirstlane are very important.

tstellarAMD accepted this revision.Aug 19 2016, 10:04 AM
tstellarAMD edited edge metadata.
This revision is now accepted and ready to land.Aug 19 2016, 10:04 AM
This revision was automatically updated to reflect the committed changes.