This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics.
ClosedPublic

Authored by cfang on Feb 25 2016, 10:02 AM.

Details

Summary

These are instructions introduced in VI+ Chips. We defined the instructions in this patch, and introduce intrinsics
llvm.amdgcn.ds.permute/llvm.amdgcn.ds.bpermute to expose them.

Diff Detail

Event Timeline

cfang updated this revision to Diff 49084.Feb 25 2016, 10:02 AM
cfang retitled this revision from to AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics..
cfang updated this object.
cfang added reviewers: arsenm, tstellarAMD.
cfang added subscribers: arsenm, llvm-commits.
arsenm added inline comments.Feb 25 2016, 11:13 AM
lib/Target/AMDGPU/SIInstrInfo.cpp
230–231

Checking just offset0 should be sufficient, and can be moved above

lib/Target/AMDGPU/VIInstructions.td
131

Are we sure these don't real M0?

test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll
4–5

I would prefer splitting the 2 separate intrinsics into separate patches. These are also missing the readnone (which shoulda also use attribute groups)

lib/Target/AMDGPU/VIInstructions.td
131

It reads M0, but it is supposed to ignore its value, so for our purposes we can treat it as if it doesn't read M0.

cfang updated this revision to Diff 49202.Feb 26 2016, 9:25 AM

Update the patch based on Matt's Review:

  1. Check only Offset0Imm and move the check one line ahead.
  2. It is safe to remove M0 from the Uses list for ds_permute.ds_bpermute.
  3. split the LIT test for ds_permute and ds_bpermute separately.
cfang added inline comments.Feb 26 2016, 9:28 AM
test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll
5–6

I just split the test case. If you want to split the intrinsics and/or instruction definitions, I can do it in the integration. Thanks.

arsenm added inline comments.Feb 27 2016, 12:36 AM
lib/Target/AMDGPU/VIInstructions.td
136–139

Why can't these be patterns on the instruction definition instead of standalone Pats?

cfang updated this revision to Diff 49427.Feb 29 2016, 3:01 PM

Move the pattern for ds_permute intrinsic code generation into
the instruction definition, based on Matt's comment.

arsenm accepted this revision.Feb 29 2016, 3:13 PM
arsenm edited edge metadata.

LGTM

test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll
6

Might want to check that there are 2 VGPR operands

This revision is now accepted and ready to land.Feb 29 2016, 3:13 PM
This revision was automatically updated to reflect the committed changes.