This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Add intrinsics for compare with the full wavefront result, such as v_cmp_ne_i32, etc..
ClosedPublic

Authored by wdng on Jul 18 2016, 2:57 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

wdng updated this revision to Diff 64393.Jul 18 2016, 2:57 PM
wdng retitled this revision from to AMDGPU : Add an LLVM intrinsic / Clang Builtin to expose the v_cmp_ne_i32 instruction..
wdng updated this object.
wdng added reviewers: tstellarAMD, arsenm.
wdng set the repository for this revision to rL LLVM.
arsenm requested changes to this revision.Jul 18 2016, 2:58 PM
arsenm edited edge metadata.

We should only do a general icmp, not an intrinsic for specific compare opcodes

This revision now requires changes to proceed.Jul 18 2016, 2:58 PM

We should only do a general icmp, not an intrinsic for specific compare opcodes

Do you mean an LLVM IR icmp instruction or a generic intrinsic that takes a condition code as its third input?

We should only do a general icmp, not an intrinsic for specific compare opcodes

Do you mean an LLVM IR icmp instruction or a generic intrinsic that takes a condition code as its third input?

Yes

wdng updated this revision to Diff 64928.Jul 21 2016, 11:12 AM
wdng edited edge metadata.

Use a general way to implement v_cmp_ne_i32.

wdng updated this revision to Diff 64938.Jul 21 2016, 12:11 PM
wdng edited edge metadata.

Fixed a data type issue for fcmp intrinsic definition.

arsenm added inline comments.Jul 21 2016, 4:12 PM
include/llvm/IR/IntrinsicsAMDGPU.td
394 ↗(On Diff #64938)

Remove the GCCBuiltins, they don't work with overloaded intrinsics

400 ↗(On Diff #64938)

the 3rd parameter should be i32

lib/Target/AMDGPU/AMDGPUISelLowering.h
231 ↗(On Diff #64938)

Needs a comment that this is setcc with the full mask result

lib/Target/AMDGPU/SIISelLowering.cpp
1655–1656 ↗(On Diff #64938)

These should be put towards the end of the cases

1657 ↗(On Diff #64938)

Variables should be capitalized and camel case.

What happens if the cond code is out of range? There should probably be a clamp

1658 ↗(On Diff #64938)

Extra spaces between type and name

1659 ↗(On Diff #64938)

This looks like it goes over 80 characters

lib/Target/AMDGPU/SIInstructions.td
2365–2366 ↗(On Diff #64938)

This can just be a class. You can also try adding the pattern dag to the v_cmp instruction definition patterns list (although I'm not 100% sure if the multiple patterns actually work). A multiclass might help if you don't want to repeat for i32/i64

2372–2373 ↗(On Diff #64938)

All compare types should be defined. Additionally i64 and the FP ones are missing

test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ne.ll
6–12 ↗(On Diff #64938)

There should be a test for every condition code and i32/i64

8 ↗(On Diff #64938)

nounwind should also be an attribute group

9 ↗(On Diff #64938)

Call site doesn't need the at tributes

include/llvm/IR/IntrinsicsAMDGPU.td
400 ↗(On Diff #64938)

Also, should the return type be i64 instead of double?

wdng updated this revision to Diff 65358.Jul 25 2016, 9:25 AM
wdng marked 13 inline comments as done.

Code changes based on Matt's comments.

include/llvm/IR/IntrinsicsAMDGPU.td
400 ↗(On Diff #65358)

Yes, code has been changed accordingly. Thanks!

arsenm added inline comments.Jul 25 2016, 12:09 PM
lib/Target/AMDGPU/AMDGPUISelLowering.h
231 ↗(On Diff #65358)

Should have space after the //, and it should be capitalized and punctuated. Maybe clearer would be a compare with a result bit per item in the wavefront or something, mask result sounds more ambiguous maybe

lib/Target/AMDGPU/SIISelLowering.cpp
1909–1910 ↗(On Diff #65358)

You should do the range check before the static_cast since I think it is undefined behavior to have an out of bounds enum value inserted.

This also won't work for fcmp, each should be handled in its own case with its own range check for the specific compare types' range

lib/Target/AMDGPU/SIInstructions.td
2374–2375 ↗(On Diff #65358)

The unsigned should use the _U32 compare

2383–2384 ↗(On Diff #65358)

Ditto

2385–2386 ↗(On Diff #65358)

Ditto

2399–2411 ↗(On Diff #65358)

This also needs to be done for the unordered compares

test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
102–104 ↗(On Diff #65358)

Missing unordered compares

wdng updated this revision to Diff 65427.Jul 25 2016, 3:09 PM
wdng marked 7 inline comments as done.

Changes based on Matt's comments.

arsenm added inline comments.Jul 25 2016, 3:14 PM
include/llvm/IR/IntrinsicsAMDGPU.td
392 ↗(On Diff #65427)

You should remove this comment

397 ↗(On Diff #65427)

And this one

lib/Target/AMDGPU/SIISelLowering.cpp
1907–1909 ↗(On Diff #65427)

Instead of an assert, how about returning undef? this should also have a test. Same if the operand isn't really constant, you'll need to do the dyn_cast yourself

1918–1922 ↗(On Diff #65427)

Should refer to FCmpInst

lib/Target/AMDGPU/SIInstructions.td
2413–2425 ↗(On Diff #65427)

These are not the correct unordered comparison instructions, refer to the existing set of fcmp patterns for which to use

lib/Target/AMDGPU/SIInstructions.td
2413–2425 ↗(On Diff #65427)

Unordered compares should select the V_CMP_N* instructions. Take a look at the instruction definitions to see which condition matches to which instruction.

arsenm added inline comments.Jul 25 2016, 3:18 PM
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
199 ↗(On Diff #65427)

You should also add a test that uses fabs on the inputs to make sure that source modifiers are folded

wdng updated this revision to Diff 65536.Jul 26 2016, 9:08 AM
wdng marked 6 inline comments as done.
  1. Add dyn_cast for type converting.
  2. Fixed not using correct FCmpInst type.
  3. Fixed incorrectly use of unordered insutrctions
  4. Added fabs as input for fcmp test.
wdng updated this revision to Diff 65537.Jul 26 2016, 9:10 AM

Upload correct diff with cached LIT tests update.

wdng updated this revision to Diff 65538.Jul 26 2016, 9:12 AM

Upload correct diff file.

The title of the commit is also inaccurate, it should be intrinsics for compare with the full wavefront result

test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
200 ↗(On Diff #65538)

Still missing these tests

wdng added inline comments.Jul 26 2016, 4:13 PM
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
200 ↗(On Diff #65538)

I have just created one "define void @v_fcmp_f32_oeq_with_fabs(i64 addrspace(1)* %out, float %src, float %a) #1" and put it one the top of tests. Should I write fabs tests for all fcmp comparisons?

wdng retitled this revision from AMDGPU : Add an LLVM intrinsic / Clang Builtin to expose the v_cmp_ne_i32 instruction. to AMDGPU : Add intrinsics for compare with the full wavefront result, such as v_cmp_ne_i32, etc...Jul 26 2016, 4:14 PM
arsenm added inline comments.Jul 26 2016, 4:21 PM
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
6 ↗(On Diff #65538)

Use the attribute group

10–15 ↗(On Diff #65538)

The call site does not need the attribute specified. Can you also test the other operand? The check line should check the actual operands, this currently does not actually check much

wdng updated this revision to Diff 65630.Jul 26 2016, 4:44 PM

Modified one LIT test to check operands.

wdng updated this revision to Diff 65632.Jul 26 2016, 4:50 PM

Fixed corrupted diff patch.

arsenm added inline comments.Jul 26 2016, 6:09 PM
lib/Target/AMDGPU/SIISelLowering.cpp
1907 ↗(On Diff #65632)

Space before =

1921 ↗(On Diff #65632)

Ditto

test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
7 ↗(On Diff #65632)

Missing test for invalid condition code value

9 ↗(On Diff #65632)

You can move the |s outside of the regex and then you don't have to escape them

11 ↗(On Diff #65632)

Don't need call site attributes

11–16 ↗(On Diff #65632)

Still should test that both operands can have the source modifiers folded

29 ↗(On Diff #65632)

it doesn't really matter, but there's no reason this test needs to under-align the stores, Fix these to be align 8 or remove the aligns

test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
6 ↗(On Diff #65632)

Missing test for invalid condition code value

wdng updated this revision to Diff 65761.Jul 27 2016, 9:55 AM
wdng marked 6 inline comments as done.

Add illegal cond code LIT tests.

arsenm added inline comments.Jul 27 2016, 12:01 PM
lib/Target/AMDGPU/SIInstructions.td
2391–2392 ↗(On Diff #65761)

Spaces before the types and the next (

test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
28 ↗(On Diff #65761)

You should drop the suffix here to strengthen the test. It would be best to reduce to just v_cmp because something could commute the instruction

test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
16 ↗(On Diff #65761)

Ditto

wdng updated this revision to Diff 65788.Jul 27 2016, 12:14 PM
wdng marked 4 inline comments as done.
  1. Rename LIT test function names.
  2. Add space between type and parenthesis.
arsenm accepted this revision.Jul 27 2016, 4:00 PM
arsenm edited edge metadata.

LGTM, you can drop the "such as" part from your commit message

This revision is now accepted and ready to land.Jul 27 2016, 4:00 PM
This revision was automatically updated to reflect the committed changes.