Add an LLVM intrinsic / Clang Builtin to expose the v_cmp_ne_i32 instruction.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Do you mean an LLVM IR icmp instruction or a generic intrinsic that takes a condition code as its third input?
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
394 ↗ | (On Diff #64938) | Remove the GCCBuiltins, they don't work with overloaded intrinsics |
400 ↗ | (On Diff #64938) | the 3rd parameter should be i32 |
lib/Target/AMDGPU/AMDGPUISelLowering.h | ||
231 ↗ | (On Diff #64938) | Needs a comment that this is setcc with the full mask result |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1655–1656 ↗ | (On Diff #64938) | These should be put towards the end of the cases |
1657 ↗ | (On Diff #64938) | Variables should be capitalized and camel case. What happens if the cond code is out of range? There should probably be a clamp |
1658 ↗ | (On Diff #64938) | Extra spaces between type and name |
1659 ↗ | (On Diff #64938) | This looks like it goes over 80 characters |
lib/Target/AMDGPU/SIInstructions.td | ||
2365–2366 ↗ | (On Diff #64938) | This can just be a class. You can also try adding the pattern dag to the v_cmp instruction definition patterns list (although I'm not 100% sure if the multiple patterns actually work). A multiclass might help if you don't want to repeat for i32/i64 |
2372–2373 ↗ | (On Diff #64938) | All compare types should be defined. Additionally i64 and the FP ones are missing |
test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ne.ll | ||
6–12 ↗ | (On Diff #64938) | There should be a test for every condition code and i32/i64 |
8 ↗ | (On Diff #64938) | nounwind should also be an attribute group |
9 ↗ | (On Diff #64938) | Call site doesn't need the at tributes |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
400 ↗ | (On Diff #64938) | Also, should the return type be i64 instead of double? |
Code changes based on Matt's comments.
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
400 ↗ | (On Diff #65358) | Yes, code has been changed accordingly. Thanks! |
lib/Target/AMDGPU/AMDGPUISelLowering.h | ||
---|---|---|
231 ↗ | (On Diff #65358) | Should have space after the //, and it should be capitalized and punctuated. Maybe clearer would be a compare with a result bit per item in the wavefront or something, mask result sounds more ambiguous maybe |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1909–1910 ↗ | (On Diff #65358) | You should do the range check before the static_cast since I think it is undefined behavior to have an out of bounds enum value inserted. This also won't work for fcmp, each should be handled in its own case with its own range check for the specific compare types' range |
lib/Target/AMDGPU/SIInstructions.td | ||
2374–2375 ↗ | (On Diff #65358) | The unsigned should use the _U32 compare |
2383–2384 ↗ | (On Diff #65358) | Ditto |
2385–2386 ↗ | (On Diff #65358) | Ditto |
2399–2411 ↗ | (On Diff #65358) | This also needs to be done for the unordered compares |
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll | ||
102–104 ↗ | (On Diff #65358) | Missing unordered compares |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
392 ↗ | (On Diff #65427) | You should remove this comment |
397 ↗ | (On Diff #65427) | And this one |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1907–1909 ↗ | (On Diff #65427) | Instead of an assert, how about returning undef? this should also have a test. Same if the operand isn't really constant, you'll need to do the dyn_cast yourself |
1918–1922 ↗ | (On Diff #65427) | Should refer to FCmpInst |
lib/Target/AMDGPU/SIInstructions.td | ||
2413–2425 ↗ | (On Diff #65427) | These are not the correct unordered comparison instructions, refer to the existing set of fcmp patterns for which to use |
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2413–2425 ↗ | (On Diff #65427) | Unordered compares should select the V_CMP_N* instructions. Take a look at the instruction definitions to see which condition matches to which instruction. |
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll | ||
---|---|---|
199 ↗ | (On Diff #65427) | You should also add a test that uses fabs on the inputs to make sure that source modifiers are folded |
- Add dyn_cast for type converting.
- Fixed not using correct FCmpInst type.
- Fixed incorrectly use of unordered insutrctions
- Added fabs as input for fcmp test.
The title of the commit is also inaccurate, it should be intrinsics for compare with the full wavefront result
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll | ||
---|---|---|
200 ↗ | (On Diff #65538) | Still missing these tests |
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll | ||
---|---|---|
200 ↗ | (On Diff #65538) | I have just created one "define void @v_fcmp_f32_oeq_with_fabs(i64 addrspace(1)* %out, float %src, float %a) #1" and put it one the top of tests. Should I write fabs tests for all fcmp comparisons? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1907 ↗ | (On Diff #65632) | Space before = |
1921 ↗ | (On Diff #65632) | Ditto |
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll | ||
7 ↗ | (On Diff #65632) | Missing test for invalid condition code value |
9 ↗ | (On Diff #65632) | You can move the |s outside of the regex and then you don't have to escape them |
11 ↗ | (On Diff #65632) | Don't need call site attributes |
11–16 ↗ | (On Diff #65632) | Still should test that both operands can have the source modifiers folded |
29 ↗ | (On Diff #65632) | it doesn't really matter, but there's no reason this test needs to under-align the stores, Fix these to be align 8 or remove the aligns |
test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll | ||
6 ↗ | (On Diff #65632) | Missing test for invalid condition code value |
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2391–2392 ↗ | (On Diff #65761) | Spaces before the types and the next ( |
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll | ||
28 ↗ | (On Diff #65761) | You should drop the suffix here to strengthen the test. It would be best to reduce to just v_cmp because something could commute the instruction |
test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll | ||
16 ↗ | (On Diff #65761) | Ditto |