This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Set up User SGPRs for queue_ptr only when necessary
ClosedPublic

Authored by cfang on Feb 14 2022, 12:00 PM.

Details

Summary

In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if target suppots getDoorbellID, queue_ptr is also not necessary. Futher, code object version 5 introduces new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap in the backend.

Diff Detail

Event Timeline

cfang created this revision.Feb 14 2022, 12:00 PM
cfang requested review of this revision.Feb 14 2022, 12:00 PM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Feb 14 2022, 12:04 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
55

This really ought to be something read from the IR

429–430

This should recognize both the intrinsic and load from the specific offset from the implicitarg ptr, similar to the new hostcall handling. We still should be able to infer no queue ptr with it in memory

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
653 ↗(On Diff #408540)

I don't see why this query needs to change, the code object version can be considered when setting QueuePtr initially

cfang added inline comments.Feb 14 2022, 2:25 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
429–430

This is different from the case of hostcall handling. We are handling aperture bases in the backend. We do not have explicit intrinsic call for implicitarf ptr.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
653 ↗(On Diff #408540)

I intend to factor out same thing like QuterPre && (CodeObjectVersion < 5).
Can we introduce a global function in AMDGPU space like:
bool AMDGPU::needsQueuePreUserSGPRs(MachineFunctionInfo MFI) to achieve that purpose?

arsenm added inline comments.Feb 14 2022, 2:27 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
429–430

It's not different because some subtargets still use the queue pointer from here (pre gfx9)

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
653 ↗(On Diff #408540)

No, I mean it's inappropriate to check the version here. The version+subtarget should change whether QueuePtr is set, not add an additional check here

cfang added inline comments.Feb 14 2022, 2:59 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
429–430

I know some subtargets still use the queue pointer. However,
you suggest we use similar approach as we handle hostcall.
But we actually have the different case. For hostcall, we are using implicitarg_ptr + offset, but for aperture bases, we do not have implicitarg_ptr
intrinsic call at all.

arsenm added inline comments.Feb 14 2022, 3:13 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
429–430

The logical queue pointer value still exists and we can infer that it's not needed, just like hostcall in this case

cfang updated this revision to Diff 408907.Feb 15 2022, 9:13 AM
cfang marked an inline comment as not done.

Remove the introduced SIMachineFunctionInfo::needsQueuePtrUserSGPRs function.
Also correct the formats.

arsenm added inline comments.Feb 21 2022, 5:49 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
429–430

We should still be tracking the logical queue pointer

Please fix the commit description so that the first line is self-contained and separated from the rest by a blank line. This matters a lot when looking at the output of "git log ---oneline".

cfang edited the summary of this revision. (Show Details)Feb 22 2022, 3:42 PM
cfang edited the summary of this revision. (Show Details)
cfang updated this revision to Diff 411553.Feb 25 2022, 4:33 PM

Rebase and did a few minor changes.

cfang added inline comments.Feb 25 2022, 4:38 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
429–430

Can you be explicit what is "logical queue pointer" here> And why do we need to trace it?

cfang updated this revision to Diff 413617.Mar 7 2022, 1:36 PM

Add the check for queue_ptr through implicitarg_ptr + offset (200) under V5.
Add corresponding LIT test.

Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 1:36 PM
cfang marked an inline comment as done.
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
55

As discussed in another review (https://reviews.llvm.org/D119027
"IIUC the global opt var is the best we have right now, and any improvement to that situation is orthogonal to this change. I would vote that this not block the patch under review!"), we are using the global opt var for this change.

sameerds accepted this revision.Mar 7 2022, 8:59 PM

Please wait for a day to give other reviewers a chance to respond.

This revision is now accepted and ready to land.Mar 7 2022, 8:59 PM
arsenm added inline comments.Mar 8 2022, 7:33 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
426

Typo acess

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
166–171

This isn't a scalable solution for all of the inputs. Should have an enum with offsets or something

cfang added inline comments.Mar 8 2022, 9:25 AM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
166–171

Will do this in a following patch because the enum of the implicit kernel arguments will also be used there. Still thinking of the appropriate definition.

arsenm accepted this revision.Mar 8 2022, 9:28 AM
This revision was landed with ongoing or failed builds.Mar 9 2022, 10:15 AM
This revision was automatically updated to reflect the committed changes.