Page MenuHomePhabricator

Transform illegal intrinsics to V_ILLEGAL
ClosedPublic

Authored by Leonc on Apr 13 2022, 10:39 AM.

Details

Summary

Related tasks:

  • SWDEV-240194
  • SWDEV-309417
  • SWDEV-334876

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Leonc added a comment.Sun, Jul 24, 7:21 PM
  • Fix existing test failures.
bcahoon added inline comments.Mon, Jul 25, 7:03 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4415 ↗(On Diff #447180)

At this location, ILLEGAL is a memory opcode, but SelectionDAGDumper will crash because when trying to print its memops.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
6636

I think it may be easiest to create the ILLEGAL instruction here rather than creating it after the call to lowerImage. That way, there doesn't have to be copies of the code that creates ILLEGAL.

For example,
MachineSDNode *NewNode = DAG.getMachineNode(AMDGPU::V_ILLEGAL, DL, MVT::Other, Op.getOperand(0));
return DAG.getMergeValues({ DAG.getUNDEF(Op.getValueType()), SDValue(NewNode, 0) }, DL);

The UNDEF value replaces return result for the intrinsic, and the V_ILLEGAL replaces the chain result. It appears, though, that V_ILLEGAL is removed if the intrinsic chain result is not used anywhere. I'm not sure if that is a bad thing since it means it's not really needed?

7121

Can this go in lowerImage? It seems like this will mark some cases as illegal that should be handled either in DAGToDAG or with patterns.

Leonc updated this revision to Diff 448074.Wed, Jul 27, 9:31 AM
Leonc marked an inline comment as done.
  • Revert changes to return values.
  • Refactor code.
  • Address comments.
bcahoon added inline comments.Wed, Jul 27, 3:33 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
540 ↗(On Diff #448074)

Will ILLEGAL be removed?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
6637

No need for braces

8401

The v_illegal should have an operand for the chain operand in the original instruction, if one exists.

t0 EntryToken
res,ch = intrinsic t0, <other opernads>
->
res = undef
ch = v_illegal t0

Also, if there is a use of the chain definition in the original instruction, then there is no need to add the code below.

Missing testscases. Also should include some assembler and disassembler tests for V_ILLEGAL

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8401

I'd expect to use the original chain, but it also doesn't really matter given that it's filler anyway

Leonc updated this revision to Diff 448301.Thu, Jul 28, 4:33 AM
Leonc marked 8 inline comments as done.
  • Address comments.
Leonc added inline comments.Thu, Jul 28, 4:34 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
540 ↗(On Diff #448074)

Thanks I missed that one.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
7852

I think this is unnecessary. We still need to call lowerImage for the return value when it succeeds.

8401

Thanks @bcahoon & @arsenm.

I added a check for MemSDNode and pass the chain to V_ILLEGAL if it exists.

Leonc updated this revision to Diff 448660.Fri, Jul 29, 9:20 AM
  • Handling for atomics plus tests.

A couple more tests are needed. A test for for the image intrinsic. Also an assembler test for the v_illegal encoding.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
7827

no braces needed

llvm/lib/Target/AMDGPU/SIInstructions.td
3368

Do we need different version for gfx10?

Leonc updated this revision to Diff 448997.Mon, Aug 1, 5:28 AM
  • Add different versions of V_ILLEGAL for different targets.
Leonc marked 2 inline comments as done.Mon, Aug 1, 6:07 AM
foad added a comment.Mon, Aug 1, 6:32 AM

The rationale is for library code that looks like:

void example(float *p, float v) {
  if (_ISA_verison == 9008 || __ISA_verison == 9010)
     global_atomic_fadd(p, v);
  else
     generic_atomic_fadd(p, v);
}

At optimization levels > 0, the dead code is eliminated, but at optimization level 0, the dead code remain and the compiler attempts to generate an instruction. That instruction would never be executed due to the condition, but it still needs to be generated. In these cases, we don't have multilibs (target specific libraries).

Right, but why can't global_atomic_fadd and generic_atomic_fadd be functions that just contain a call to the corresponding builtin and have an appropriate "target-cpu"= attribute? I guess you would need to do inlining in the backend once you know what subtarget you are compiling for. Is that viable?

llvm/test/CodeGen/AMDGPU/v_illegal-atomics.ll
2

File should not be executable.

bcahoon added inline comments.Mon, Aug 1, 7:28 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
3364

Do we want different mnemonics, or should the same mnemonic be used for the cases when the encoding is all 0's or all 1's?

When you add the llvm-mc tests, it will also be a good idea to add a test that pipes the output of llvm-mc through llvm-objdump to the that the different encodings show up properly for the two cases.

Leonc updated this revision to Diff 449018.Mon, Aug 1, 7:42 AM
Leonc marked an inline comment as done.
  • Fix indentation, braces, and file mode.
Leonc marked an inline comment as done.Mon, Aug 1, 7:43 AM
Leonc added inline comments.
llvm/lib/Target/AMDGPU/SIInstructions.td
3364

Do we want different mnemonics, or should the same mnemonic be used for the cases when the encoding is all 0's or all 1's?

@arsenm asked for separate definitions:

I would prefer to define a separate V_ILLEGAL that uses all 1s pre gfx10

Leonc added inline comments.Mon, Aug 1, 7:56 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8399

Ternary args are the wrong way round. Running local tests before I update the diff to fix this.

Leonc updated this revision to Diff 449021.Mon, Aug 1, 8:01 AM
  • Fix ternary args.
Leonc updated this revision to Diff 449602.Wed, Aug 3, 2:15 AM
  • Add v_illegal assembly tests.
  • Update instruction definitions.
Leonc marked an inline comment as done.Wed, Aug 3, 2:19 AM
Leonc added inline comments.
llvm/test/MC/AMDGPU/v_illegal-atomics.s
2–3

Adding modifications to pipe the compiler output through objdump as these tests aren't very helpful.

arsenm added a comment.Wed, Aug 3, 6:46 AM

Right, but why can't global_atomic_fadd and generic_atomic_fadd be functions that just contain a call to the corresponding builtin and have an appropriate "target-cpu"= attribute? I guess you would need to do inlining in the backend once you know what subtarget you are compiling for. Is that viable?

The problem is in the called function with the appropriate attributes. The function isn't deleted and we still need to produce something. The current situation is it works in cases where the wrong subtarget also happens to have the same encoding, but breaks on targets that never had an equivalent instruction encoded

arsenm added inline comments.Wed, Aug 3, 7:04 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
3364

The instruction is actually named v_illegal, so you shouldn't invent new names here. I would name these V_ILLEGAL for the gfx10 version and V_ILLEGAL_gfx6_gfx7_gf8_gfx9?

I also think V_ILLEGAL is available in a vop3 encoding (for gfx10) but probably should define that in a separate patch for the benefit of the disassembler

arsenm added inline comments.Wed, Aug 3, 7:06 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8401

Checking for MemSDNode doesn't make sense, other nodes can have chains. You would need to pass it in from the source if it applies. However I don't think it's worth the complexity, since this really is filler content anyway

Leonc added inline comments.Wed, Aug 3, 7:12 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8401

Thanks, I agree.

foad added a comment.Wed, Aug 3, 7:12 AM

Right, but why can't global_atomic_fadd and generic_atomic_fadd be functions that just contain a call to the corresponding builtin and have an appropriate "target-cpu"= attribute? I guess you would need to do inlining in the backend once you know what subtarget you are compiling for. Is that viable?

The problem is in the called function with the appropriate attributes. The function isn't deleted and we still need to produce something. The current situation is it works in cases where the wrong subtarget also happens to have the same encoding, but breaks on targets that never had an equivalent instruction encoded

If target-cpu doesn't let you target different CPUs for different functions then I don't understand what it does.

arsenm added a comment.Wed, Aug 3, 7:24 AM

If target-cpu doesn't let you target different CPUs for different functions then I don't understand what it does.

For the relevant cases in the library, we key off the subtarget feature in target-features. The problem is in the final codegen we've propagated the real cpu to the leaf function, which then artificially has an incompatible target-features. We would have to have some code fixup the target-cpu to some random other device with compatible features

Leonc updated this revision to Diff 449723.Wed, Aug 3, 11:08 AM
  • Address comments.
Leonc marked 2 inline comments as done.Wed, Aug 3, 11:12 AM
Leonc added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8401

It turns out the chain is necessary. An assertion in RemoveNodeFromCSEMaps fails when Op is an atomic intrinsic.

arsenm added inline comments.Wed, Aug 3, 2:25 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8401

Yes, a chain is needed but you should be able to use the entry node. If you want to forward it in from the call sites, that would also be OK

Leonc updated this revision to Diff 450001.Thu, Aug 4, 8:54 AM
  • Address comments and fix tests.
Leonc marked an inline comment as done.Thu, Aug 4, 8:55 AM
arsenm added a comment.Thu, Aug 4, 1:39 PM

Could also use a disassembler test

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8397

SDLoc DL(Op)

Leonc updated this revision to Diff 450279.Fri, Aug 5, 6:26 AM
  • Address comments.
Leonc marked an inline comment as done.Fri, Aug 5, 6:26 AM
arsenm added inline comments.Fri, Aug 5, 6:48 AM
llvm/test/CodeGen/AMDGPU/v_illegal-atomics.ll
41

This looks backwards. gfx10 is getting the -1 encoding and gfx9 is getting 0

54

gfx10 has v_illegal, so this should be 0x0 and printed as v_illegal

Leonc added inline comments.Fri, Aug 5, 6:58 AM
llvm/test/CodeGen/AMDGPU/v_illegal-atomics.ll
41

Are you sure gfx9 is getting the zero encoding? I thought this line was "address: encoding".
If this was wrong wouldn't the tests in v_illegal-atomics.s also be wrong?

arsenm added inline comments.Fri, Aug 5, 7:02 AM
llvm/test/CodeGen/AMDGPU/v_illegal-atomics.ll
41

You're right, gfx9 looks right. The gfx1030 line with .long 0xffffffff looks wrong.

Leonc added inline comments.Fri, Aug 5, 7:07 AM
llvm/test/CodeGen/AMDGPU/v_illegal-atomics.ll
41

I'm guessing gfx1030 is due to the disassebler. We know the encoding is correct from the tests in v_illegal-atomics.s.
I'll look into it.

Leonc updated this revision to Diff 450320.Fri, Aug 5, 10:25 AM
  • Address comments.
Leonc marked 3 inline comments as done.Fri, Aug 5, 10:26 AM
arsenm accepted this revision.Fri, Aug 5, 1:07 PM

LGTM except for the stray debug print. We also probably should clean up the the encoding to rely on subtarget features instead of the generation check

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8396

DRop this debug printing

This revision is now accepted and ready to land.Fri, Aug 5, 1:07 PM
Leonc updated this revision to Diff 450368.Fri, Aug 5, 1:11 PM
  • Remove debug code.
arsenm accepted this revision.Fri, Aug 5, 1:11 PM
Leonc marked an inline comment as done.Fri, Aug 5, 1:12 PM
Leonc added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8396

Thanks, my mistake.

Leonc marked an inline comment as done.Fri, Aug 5, 1:13 PM
Leonc added a comment.Fri, Aug 5, 1:27 PM

We also probably should clean up the the encoding to rely on subtarget features instead of the generation check

Is there a generic way to do that without adding maintenance overhead, or is it better that we don't implicitly support new subtargets/generations in case they have different encodings of v_illegal?

This revision was landed with ongoing or failed builds.Sat, Aug 6, 12:59 AM
This revision was automatically updated to reflect the committed changes.