This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Clean up LDS-related occupancy calculations
ClosedPublic

Authored by nhaehnle on Dec 6 2022, 2:33 PM.

Details

Summary

Occupancy is expressed as waves per SIMD. This means that we need to
take into account the number of SIMDs per "CU" or, to be more precise,
the number of SIMDs over which a workgroup may be distributed.

getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs
into account at all.

At the same time, we need to take into account that WGP mode offers
access to a larger total amount of LDS, since this can affect how
non-power-of-two LDS allocations are rounded. To make this work
consistently, we distinguish between (available) local memory size and
addressable local memory size (which is always limited by 64kB on
gfx10+, even with WGP mode).

This change results in a massive amount of test churn. A lot of it is
caused by the fact that the default work group size is 1024, which means
that (due to rounding effects) the default occupancy on older hardware
is 8 instead of 10, which affects scheduling via register pressure
estimates. I've adjusted most tests by just running the UTC tools, but
in some cases I manually changed the work group size to 32 or 64 to make
sure that work group size chunkiness has no effect.

Diff Detail

Event Timeline

nhaehnle created this revision.Dec 6 2022, 2:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 6 2022, 2:33 PM
nhaehnle requested review of this revision.Dec 6 2022, 2:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 6 2022, 2:33 PM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Dec 6 2022, 2:41 PM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
146

I'd like to avoid growing the number of generation based feature checks in favor of separate subtarget features

nhaehnle added inline comments.Dec 8 2022, 11:37 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
146

Hmm. I was using the same kind of check as what's already in AMDGPUBaseInfo. And it's not clear to me what the feature here would be that makes sense.

It is unfortunate that CuMode was picked out as the feature instead of WgpMode. Should we add CuMode as a feature to everything <gfx10? Or somehow figure out how to invert the sense of the feature, and make WgpMode the feature that has to be added explicitly? Or something else entirely?

arsenm added inline comments.Dec 8 2022, 4:11 PM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
146

I'd prefer inverting it but that's probably a good bit of migration pain.

The current feature is the dynamic switch; I was thinking another feature that says it has the dynamic switch. That's what we do for sramecc

nhaehnle added inline comments.Dec 9 2022, 2:11 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
146

Would we still have a need for for the static feature if the dynamic switch was inverted? I see three options for this change right now:

  1. Commit is as-is.
  2. First invert to feature WgpMode instead of feature CuMode.
  3. Add a static feature SupportsWgpMode.

I believe option 2 is the best long-term path, but it feels like a pretty deep rabbit hole and I wouldn't want to make this change dependent on it. Option 3 could be done reasonably quickly, but it's not clear to me that it's actually better than option 1, if we're supposed to eventually invert anyway (at which point that static feature becomes unused?).

arsenm accepted this revision.Dec 9 2022, 2:17 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
146

It's easy to remove the static feature after it's unused. It doesn't need to leak out into user IR in target-features attributes. I'm OK with this for now, up to you if you want to try for one of the better options

This revision is now accepted and ready to land.Dec 9 2022, 2:17 PM
This revision was landed with ongoing or failed builds.Jan 23 2023, 12:43 PM
This revision was automatically updated to reflect the committed changes.