This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Always reserve VGPR for AGPR copies on gfx908
ClosedPublic

Authored by arsenm on Feb 15 2022, 6:27 PM.

Details

Reviewers
rampitec
cdevadas
ronlieb
Group Reviewers
Restricted Project
Summary

Just because there aren't AGPRs in the original program doesn't mean
the register allocator can't choose to use them (unless we were to
forcibly reserve all AGPRs if there weren't any uses). This happens in
high pressure situations and introduces copies to avoid spills.

In this test, the allocator ends up introducing a copy from SGPR to
AGPR which requires an intermediate VGPR. I don't believe it would
introduce a copy from AGPR to AGPR in this situation, since it would
be trying to use an intermediate with a different class.

Theoretically this is also broken on gfx90a, but I have been unable to
come up with a testcase.

Diff Detail

Event Timeline

arsenm created this revision.Feb 15 2022, 6:27 PM
arsenm requested review of this revision.Feb 15 2022, 6:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2022, 6:27 PM
Herald added a subscriber: wdng. · View Herald Transcript

What will happen to a very small kernel which could use 24 vgprs or less? Will it still use 24 vgprs and have occupancy 10?

ronlieb accepted this revision.Feb 16 2022, 2:38 PM
ronlieb added a subscriber: ronlieb.

lgtm

This revision is now accepted and ready to land.Feb 16 2022, 2:38 PM

What will happen to a very small kernel which could use 24 vgprs or less? Will it still use 24 vgprs and have occupancy 10?

No, it will still use v32. I haven't been able to construct a testcase where this would happen that can fit in so few registers

What will happen to a very small kernel which could use 24 vgprs or less? Will it still use 24 vgprs and have occupancy 10?

No, it will still use v32. I haven't been able to construct a testcase where this would happen that can fit in so few registers

A simple testcase is an empty kernel or a kernel loading and storing a single dword. Do you mean this will have v32 used?

What will happen to a very small kernel which could use 24 vgprs or less? Will it still use 24 vgprs and have occupancy 10?

No, it will still use v32. I haven't been able to construct a testcase where this would happen that can fit in so few registers

A simple testcase is an empty kernel or a kernel loading and storing a single dword. Do you mean this will have v32 used?

No, I mean if you end up with an AGPR->AGPR copy in your tiny kernel, you would end up using v0-v24, and then jump to v32. We could try to be fancier by shifting the reserved register around, or raise the VGPR allocation if we know we need to reserve it. Either one is a separate patch with greater potential for breakage

rampitec accepted this revision.Feb 16 2022, 3:26 PM

OK. I was mostly concerned about simple kernels without agprs.

OK. I was mostly concerned about simple kernels without agprs.

Plus for the moment we still try to use the scavenger and still have the reserved as a fallback. We need to stop doing that, but I have a separate patch for that