This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Select VGPR versions of MFMA if possible
ClosedPublic

Authored by rampitec on Jan 13 2022, 2:32 PM.

Details

Summary

We can select _vgprcd versions of MAI instructions and have no
AGPRs with the whole budget left for VGPRs if:

  1. This is a kernel;
  2. It has no calls;
  3. It runs at least on 2 waves thus having not more that 256 VGPRs.
  4. There is no inline asm requesting AGPRs.

Diff Detail

Event Timeline

rampitec created this revision.Jan 13 2022, 2:32 PM
rampitec requested review of this revision.Jan 13 2022, 2:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2022, 2:32 PM
Herald added a subscriber: wdng. · View Herald Transcript
rampitec retitled this revision from [AMDGPU] Select VGPR versions of MFMA is possible to [AMDGPU] Select VGPR versions of MFMA if possible.Jan 13 2022, 3:46 PM
rampitec added a subscriber: b-sumner.

Same thing is possible even with calls but requires a full call graph examination. If annotate kernel features pass will move to attributer it shall be possible to analyze inline asm in calls and propagate this info.

arsenm added inline comments.Jan 17 2022, 4:49 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
103

Relying on the calls check here is unreliable. Since you only really need to query this before selection, can't we just see when an asm statement containing AGPRs is hit?

llvm/test/CodeGen/AMDGPU/mfma-bf16-vgpr-cd-select.ll
3

Move the -global-isel flag to the first argument

rampitec added inline comments.Jan 17 2022, 4:53 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
103

This function already relyies on this. Then it is not sufficient to check for asm, as a called function may use agprs.

arsenm added inline comments.Jan 17 2022, 4:55 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
103

I know it does, and it's not good. Unlike the current use case for checking for calls, you don't need to know this up-front for the ABI so you don't have to stick this here

rampitec added inline comments.Jan 17 2022, 4:58 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
103

I could add a check for calls where I am checking asm. The scan is there anyway. Is that better?

There must be no calls for this to be legal.

arsenm added inline comments.Jan 17 2022, 5:08 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
103

Yes. Technically the legalizer is free to introduce new calls, but we happen to never do that.

Since this is only used in a pattern predicate, and we will have visited asm and call sites by that time, you can check both of those things in the predicate and cache it

rampitec updated this revision to Diff 400938.Jan 18 2022, 11:55 AM
rampitec marked an inline comment as done.
  • Check for calls along with inline asm not relying on function attribute.
  • Moved -global-isel flag in tests.
rampitec marked 2 inline comments as done.Jan 18 2022, 11:55 AM
rampitec marked an inline comment as done.
arsenm added inline comments.Jan 26 2022, 4:20 PM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4253

Doesn't this need a subtarget check?

llvm/lib/Target/AMDGPU/VOP3PInstructions.td
448

The point of checking it here is you don't need to inspect the original IR. You can check if there are any AGPR uses in the machine function

rampitec marked an inline comment as done.Jan 27 2022, 9:52 AM
rampitec added inline comments.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4253

It is in the mayNeedAGPRs() itself.

llvm/lib/Target/AMDGPU/VOP3PInstructions.td
448

This is selection, there MF is not yet selected. Plus it needs a check for calls.

rampitec marked an inline comment as done.Feb 3 2022, 7:53 PM

Ping

Ping. It has passed targeted tests on HW.

arsenm added inline comments.Feb 7 2022, 11:57 AM
llvm/lib/Target/AMDGPU/VOP3PInstructions.td
448

We know if there are calls at selection time already, and have already lowered all asm and should have seen the AGPRs already

rampitec added inline comments.Feb 7 2022, 12:57 PM
llvm/lib/Target/AMDGPU/VOP3PInstructions.td
448

In fact we don't. As discussed offline we only know it for current bb. This test will fail:

declare <32 x float> @llvm.amdgcn.mfma.f32.32x32x1f32(float, float, <32 x float>, i32, i32, i32)

define amdgpu_kernel void @test_mfma_f32_32x32x1f32_call(<32 x float> addrspace(1)* %arg) #0 {
bb1:
  %in.1 = load <32 x float>, <32 x float> addrspace(1)* %arg
  %mai.1 = tail call <32 x float> @llvm.amdgcn.mfma.f32.32x32x1f32(float 1.0, float 2.0, <32 x float> %in.1, i32 1, i32 2, i32 3)
  store <32 x float> %mai.1, <32 x float> addrspace(1)* %arg
  br label %bb2

bb2:
  call void @foo()
  ret void
}

declare void @foo()

attributes #0 = { "amdgpu-flat-work-group-size"="1,256" "amdgpu-waves-per-eu"="2" }
arsenm accepted this revision.Feb 7 2022, 5:13 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/VOP3PInstructions.td
448

Add this test for future reference? Probably should make it a conditional branch

This revision is now accepted and ready to land.Feb 7 2022, 5:13 PM
rampitec updated this revision to Diff 406882.Feb 8 2022, 10:00 AM
rampitec marked an inline comment as done.

Added test with more than one BB.

This revision was landed with ongoing or failed builds.Feb 8 2022, 10:19 AM
This revision was automatically updated to reflect the committed changes.