This patch defines a new register class (VM0_32 = VReg_32 + M0) and implements readlane and readfirstlane intrinsics.
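The sketch below shows, as a toy Python model, what the two intrinsics compute across a wavefront. readlane returns the source value held by an explicitly chosen lane. readfirstlane returns the value held by the lowest-numbered active lane. In both cases one value is broadcast to the whole wave.

It makes two assumptions: a 64-lane GCN wavefront, and that lane 0 is read when the exec mask is all zeros. The function names are hypothetical helpers, not LLVM APIs.

```python
from typing import List

WAVE_SIZE = 64  # assumption: 64 lanes per GCN wavefront


def readfirstlane(src: List[int], exec_mask: int) -> int:
    """Toy model of llvm.amdgcn.readfirstlane.

    Returns src from the lowest-numbered active lane in exec_mask.
    Assumption: with an all-zero exec mask, lane 0 is read.
    """
    if len(src) != WAVE_SIZE:
        raise ValueError("expected one value per lane")
    if exec_mask == 0:
        return src[0]
    # exec_mask & -exec_mask isolates the lowest set bit; its position is the lane.
    first_lane = (exec_mask & -exec_mask).bit_length() - 1
    return src[first_lane]


def readlane(src: List[int], lane: int) -> int:
    """Toy model of llvm.amdgcn.readlane: src from an explicitly selected lane.

    The lane index is expected to be the same for the whole wave (uniform).
    """
    if len(src) != WAVE_SIZE:
        raise ValueError("expected one value per lane")
    if not 0 <= lane < WAVE_SIZE:
        raise ValueError("lane index out of range")
    return src[lane]


if __name__ == "__main__":
    vgpr = [100 + i for i in range(WAVE_SIZE)]  # lane i holds 100 + i
    print(readfirstlane(vgpr, 0b10100000))  # lanes 5 and 7 active -> 105
    print(readlane(vgpr, 7))                # -> 107
```

Because the result is the same for every lane, the patch can place it in a scalar register.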
Event Timeline
| File | Line(s) | Comment |
|---|---|---|
| include/llvm/IR/IntrinsicsAMDGPU.td | 395 | Comment typo and not needed |
| include/llvm/IR/IntrinsicsAMDGPU.td | 400 | Ditto |
| lib/Target/AMDGPU/SIInstructions.td | 2364–2365 | This is a very simple pattern, so it should go with the instruction definition's pattern |
| lib/Target/AMDGPU/SIInstructions.td | 2371–2373 | Ditto |
| test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll | 7–11 | Should also include a test with an immediate source to make sure that it is moved into a register. Another test that uses inline asm to put a value in m0 would also be useful (same for the other intrinsic) |
| test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll | 8 | Attributes are not needed on the call site |
| test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll | 15 | Use an attribute group for the nounwind as well |
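The reviewer asked for an immediate-source test, and the reason follows from the new register class. Under VM0_32 = VReg_32 + M0, only a 32-bit VGPR or M0 is a legal source operand. Anything else has to be copied into a VGPR first.

The sketch below is a hypothetical Python model of that selection decision. It is not LLVM's instruction selector. The assembly strings are simplified, and the destination and scratch register names are parameters chosen for illustration. Treating non-M0 SGPRs as illegal is an assumption of this model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Operand:
    kind: str  # "vgpr", "sgpr", "m0" or "imm" (toy operand kinds)
    text: str  # register name or immediate as printed


def in_vm0_32(op: Operand) -> bool:
    # VM0_32 = VReg_32 + M0: any 32-bit VGPR, or the M0 register itself.
    return op.kind in ("vgpr", "m0")


def select_readfirstlane(src: Operand, dst: str, free_vgpr: str) -> List[str]:
    """Hypothetical toy selector: copy src into a VGPR first if it is not in VM0_32."""
    code: List[str] = []
    if not in_vm0_32(src):
        # Immediates (and, in this model, non-M0 SGPRs) are not legal sources.
        code.append(f"v_mov_b32 {free_vgpr}, {src.text}")
        src = Operand("vgpr", free_vgpr)
    code.append(f"v_readfirstlane_b32 {dst}, {src.text}")
    return code


if __name__ == "__main__":
    for line in select_readfirstlane(Operand("imm", "32"), "s0", "v1"):
        print(line)
```

An immediate source produces an extra move into a VGPR, which is what the requested test would check for. An m0 source can be used directly.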
Updated based on Matt's comments:
- Removed unnecessary comments in the intrinsic definitions;
- Moved the pattern definitions into the instruction definitions;
- Added test cases with immediate sources;
- Removed unnecessary attributes at call sites;
- Put "nounwind" into a new attribute group.
Added an "m0" test case (a value placed in m0 with inline asm) at Matt's request.
TODO: We should fold m0 into the src operand of readlane/readfirstlane. This is lower priority and can be done in a separate patch.
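One plausible reading of this TODO is a peephole. When a VGPR is only a copy of m0 and is used only as a readfirstlane source, the readfirstlane can read m0 directly and the copy can be dropped.

The toy Python pass below illustrates that interpretation. It is not the planned LLVM change. It makes several simplifying assumptions: instructions are (opcode, dst, src) tuples, code is straight-line and SSA-like, m0 is not redefined between the copy and its uses, and only readfirstlane is handled.

```python
from typing import List, Tuple

# Toy instruction form: (opcode, dst, src). Assumes straight-line, SSA-like code
# in which m0 is not redefined between a copy and its uses.
Instr = Tuple[str, str, str]


def fold_m0_into_readfirstlane(code: List[Instr]) -> List[Instr]:
    # VGPRs defined as plain copies of m0.
    m0_copies = {dst for op, dst, src in code if op == "v_mov_b32" and src == "m0"}
    # Fold a copy only if every use of it is a readfirstlane source.
    foldable = {
        reg for reg in m0_copies
        if all(op == "v_readfirstlane_b32" for op, _, src in code if src == reg)
    }
    folded: List[Instr] = []
    for op, dst, src in code:
        if op == "v_mov_b32" and dst in foldable:
            continue  # the copy is dead once its users read m0 directly
        if op == "v_readfirstlane_b32" and src in foldable:
            src = "m0"
        folded.append((op, dst, src))
    return folded


if __name__ == "__main__":
    before = [("v_mov_b32", "v0", "m0"), ("v_readfirstlane_b32", "s0", "v0")]
    print(fold_m0_into_readfirstlane(before))
```

If the copied VGPR has any other use, the pass leaves the code unchanged, because the copy is still needed.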
Do you plan to add code to DivergenceAnalysis to recognize these intrinsics? We might eventually want to use a readfirstlane intrinsic from Mesa to hint when certain indices can be assumed to be uniform.
(Right now, LLVM decides where to insert the readfirstlane in those cases, because these are situations where buffer resource descriptors are loaded, and those must end up in SGPRs anyway. With a readfirstlane hint, we could perhaps generate more efficient code.)
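To illustrate what "recognizing these intrinsics" would mean for a divergence analysis, here is a toy forward propagation in Python. Divergence starts at the work-item ID, flows through ordinary operations, and stops at readfirstlane/readlane, whose results are the same for every lane.

This is a hypothetical model over straight-line code, not LLVM's DivergenceAnalysis. The rule sets and value names are assumptions for illustration.

```python
from typing import List, Set, Tuple

# Toy IR instruction: (result name, callee or opcode, operand names).
Inst = Tuple[str, str, List[str]]

# Hypothetical rule sets for this sketch.
DIVERGENT_SOURCES = {"llvm.amdgcn.workitem.id.x"}
ALWAYS_UNIFORM = {"llvm.amdgcn.readfirstlane", "llvm.amdgcn.readlane"}


def divergent_values(insts: List[Inst]) -> Set[str]:
    """Forward propagation over straight-line code given in definition order."""
    divergent: Set[str] = set()
    for name, op, args in insts:
        if op in ALWAYS_UNIFORM:
            continue  # one value broadcast to the whole wavefront
        if op in DIVERGENT_SOURCES or any(a in divergent for a in args):
            divergent.add(name)
    return divergent


if __name__ == "__main__":
    prog = [
        ("tid", "llvm.amdgcn.workitem.id.x", []),
        ("idx", "add", ["tid", "base"]),
        ("uidx", "llvm.amdgcn.readfirstlane", ["idx"]),  # hint: treat as uniform
        ("addr", "mul", ["uidx", "stride"]),
    ]
    print(sorted(divergent_values(prog)))  # ['idx', 'tid']
```

With the hint in place, "addr" is classified as uniform, so a value derived from it (such as a resource descriptor index) could stay in SGPRs.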
Updated based on Tom's comments:
- Used the type in the pattern instead of the register class.
- Removed extra whitespace.
I need further suggestions on what to do with this patch. I think readlane and readfirstlane are very important.