These instructions were introduced in VI+ chips. This patch defines the instructions and introduces the intrinsics
llvm.amdgcn.ds.permute and llvm.amdgcn.ds.bpermute to expose them.
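
For readers unfamiliar with the two instructions, here is a rough host-side model of the lane semantics as they are commonly described for ds_permute_b32 and ds_bpermute_b32. It is not code from this patch. The names `Lanes`, `kWaveSize`, `ds_permute_model` and `ds_bpermute_model` are hypothetical, a 64-lane wavefront is assumed, and the EXEC mask and the instruction's immediate offset are deliberately left out.

```cpp
#include <array>
#include <cstdint>

// Assumption: 64 lanes per wavefront; EXEC mask and immediate offset are ignored.
constexpr int kWaveSize = 64;
using Lanes = std::array<uint32_t, kWaveSize>;

// Backward permute ("pull"): lane i reads the value held by lane addr[i] / 4.
// The address is a byte offset, hence the division by 4.
Lanes ds_bpermute_model(const Lanes& addr, const Lanes& data) {
  Lanes out{};
  for (int i = 0; i < kWaveSize; ++i)
    out[i] = data[(addr[i] / 4) % kWaveSize];
  return out;
}

// Forward permute ("push"): lane i sends its value to lane addr[i] / 4.
// Lanes that nobody writes to stay 0. When several lanes target the same
// destination, this model lets the highest-numbered source lane win; that
// tie-break is a choice made for the sketch, not a statement about hardware.
Lanes ds_permute_model(const Lanes& addr, const Lanes& data) {
  Lanes out{};
  for (int i = 0; i < kWaveSize; ++i)
    out[(addr[i] / 4) % kWaveSize] = data[i];
  return out;
}
```

Both operations take two per-lane inputs (an address and a data value) and produce one per-lane result, which matches the two VGPR source operands discussed in the review comments.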
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
230–231 | Checking just offset0 should be sufficient, and can be moved above | |
lib/Target/AMDGPU/VIInstructions.td | ||
131 | Are we sure these don't read M0? | |
test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll | ||
4–5 | I would prefer splitting the 2 separate intrinsics into separate patches. These are also missing the readnone attribute (which should also use attribute groups) |
lib/Target/AMDGPU/VIInstructions.td | ||
---|---|---|
131 | It reads M0, but it is supposed to ignore its value, so for our purposes we can treat it as if it doesn't read M0. |
Updated the patch based on Matt's review:
- Check only Offset0Imm and move the check one line earlier.
- It is safe to remove M0 from the Uses list for ds_permute/ds_bpermute.
- Split the LIT test into separate files for ds_permute and ds_bpermute.
test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll | ||
---|---|---|
5–6 | I just split the test case. If you want to split the intrinsics and/or instruction definitions, I can do it in the integration. Thanks. |
lib/Target/AMDGPU/VIInstructions.td | ||
---|---|---|
136–139 | Why can't these be patterns on the instruction definition instead of standalone Pats? |
Move the pattern for ds_permute intrinsic code generation into
the instruction definition, based on Matt's comment.
LGTM
test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll | ||
---|---|---|
6 | Might want to check that there are 2 VGPR operands |