This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add G_AMDGPU_MAD_64_32 instructions
ClosedPublic

Authored by nhaehnle on May 3 2022, 5:11 AM.

Details

Summary

These generic instructions are trivially selected to
V_MAD_[IU]64_[IU]32 instructions when run on the VALU.

When at least both factors are scalar, it is usually better to execute
some or all of the instruction on the SALU. To this end, we lower the
instruction to simpler instructions that are supported on the SALU
when applying the register bank mapping.

Diff Detail

Event Timeline

nhaehnle created this revision.May 3 2022, 5:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 5:11 AM
nhaehnle requested review of this revision.May 3 2022, 5:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 5:11 AM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.May 3 2022, 6:37 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3168

Don't the selector patterns work? This one should work without much fuss

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1624

I would assume we form this after regbank select and don't need to legalize it based on the regbank

nhaehnle added inline comments.May 4 2022, 4:11 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3168

Which selector patterns do you mean? SelectionDAG has custom code for this as well because of the vcc_out (see AMDGPUDAGToDAGISel::SelectMAD_64_32).

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1624

No, see the follow-up patch in the stack which introduces custom legalization for G_MUL.

I did look into combining the results of the generic G_MUL legalization, but the patterns get really messy and expensive. It's cleaner (and faster in terms of compile-time) to generate the right code directly from legalization.

I suppose one could look at a hybrid of (custom G_MUL legalization into non-target-specific GMIR, but with better suitable patterns) + (matching GMIR into V_MAD_U64_U32 after RegBankSelect). However, merely matching G_MUL + G_UMULH + G_ADDO + G_ADDE is already quite expensive.

arsenm added inline comments.May 4 2022, 4:55 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3168

Oh right, I forgot about the VCC output as usual (the original CI manual didn't mention it and I still haven't caught up)

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1641

Most places invert this and use IsSigned (saves a few characters I guess)

1649–1651

Can you use m_ZeroInt?

1680–1692

Can you use constrainOpWithReadfirstlane?

1773

You could use ApplyRegBankMapping to avoid all of these setRegBank calls

nhaehnle updated this revision to Diff 428407.May 10 2022, 9:21 AM
nhaehnle marked 2 inline comments as done.
  • address review comments
  • build on a factored-out buildReadFirstLane (see D125324)
nhaehnle added inline comments.May 10 2022, 9:22 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1641

How strongly do you feel about that? I used IsUnsigned because it puts the more common opcode first in ternary ?: operators.

1649–1651

Good point.

1680–1692

Not quite, because MulHi may actually be used in two separate places. It really feels more logical to insert the readfirstlane when the producing instruction is built. However, I looked into factoring out the code that creates readfirstlane (and added a new parent commit for that).

1773

I looked into that, but ApplyRegBankMapping unconditionally applies the same bank to everything. This code needs different banks throughout, and it doesn't seem possible to adjust ApplyRegBankMapping.

arsenm added inline comments.May 16 2022, 3:30 PM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1641

Not very

arsenm accepted this revision.May 16 2022, 3:55 PM
This revision is now accepted and ready to land.May 16 2022, 3:55 PM
This revision was landed with ongoing or failed builds.May 27 2022, 10:36 AM
This revision was automatically updated to reflect the committed changes.