Page MenuHomePhabricator

[AMDGPU] Add maximum NSA size limit ISA feature
ClosedPublic

Authored by critson on May 28 2021, 5:30 PM.

Details

Summary

Add maximum NSA size limit as an ISA feature.
Use this to reduce NSA usage on GFX10.1 to avoid stability issues
with 4 and 5 dwords NSA instructions.
Maintain use of longer NSA instructions on GFX10.3.

Note: this also contains some minor fixes for GlobalISel which
did not work correctly with non-NSA form instructions on GFX10.

Diff Detail

Event Timeline

critson created this revision.May 28 2021, 5:30 PM
critson requested review of this revision.May 28 2021, 5:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2021, 5:30 PM
arsenm added inline comments.Jun 1 2021, 2:23 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4188

Why does this need to round up? We should be able to directly handle non powers of 2

critson added inline comments.Jun 1 2021, 11:06 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4188

I think the point after which this needs to round up should be higher, i.e. 6 instead of 4.
The problem I am hitting is that it seems instruction definitions for the appropriate sizes are missing at the moment in a lot of cases.
Perhaps the MIMG definitions predate when we added VReg_160 / VReg_192?
I guess I can try and fix those first.

However I think we will have to round up for larger vector sizes as we do not have arbitrary register sizes beyond VReg_192?
There are plenty of instructions which take arbitrary sizes beyond 6 VGPRs, so can only be defined with a single VReg_256/VReg_512 argument.

arsenm added inline comments.Jun 2 2021, 8:20 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4188

We should add the register classes eventually. If we are going to workaround the missing classes, this isn't the place for it. The selector can fixup the classes when we actually need to pick a class

critson updated this revision to Diff 349810.Jun 4 2021, 4:18 AM
  • GlobalISel: move vaddr widening from legalizer to selection
critson marked 2 inline comments as done.Jun 4 2021, 4:19 AM
critson added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4188

I have moved this to the selector, but it seems kind of messy -- which makes me suspect I am doing it wrong.

foad added inline comments.Jun 4 2021, 4:22 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1695 ↗(On Diff #349810)

divideCeil(..., 32)

llvm/lib/Target/AMDGPU/GCNSubtarget.h
139

Is there a reason this can't be unsigned?

critson marked 3 inline comments as done.Jun 4 2021, 6:43 PM
critson added inline comments.
llvm/lib/Target/AMDGPU/GCNSubtarget.h
139

The tablegen part needs to be int, but this can be unsigned.

critson updated this revision to Diff 350005.Jun 4 2021, 6:44 PM
critson marked an inline comment as done.
  • Address reviewer comments
critson updated this revision to Diff 353296.Jun 21 2021, 1:21 AM
  • Simplify based on MIMG support for v5/v6/v7
  • Update with gfx1013 target
foad added inline comments.Jun 21 2021, 1:38 AM
llvm/lib/Target/AMDGPU/AMDGPU.td
632–637

It seems a bit odd to define it like this gives the impression that the limit varies on different subtargets by design, which it doesn't really. The only reason to limit it to 5 is to "avoid stability issues" (i.e. work around hardware bugs). But I'm not sure what else to suggest.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4156–4161

This part looks like a separate clean-up or latent bug fix?

4377

Don't need the cast

4415

Don't need the cast (that was the whole point of making it unsigned).

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
6216

Don't need the cast.

critson updated this revision to Diff 353330.Jun 21 2021, 3:59 AM
critson marked 2 inline comments as done.
  • Remove casts
llvm/lib/Target/AMDGPU/AMDGPU.td
632–637

I agree it is a bit odd, but does provide us flexibility if we need to change the limit again for other hardware issues.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4156–4161

It is a fix for a bug triggered by changes in this diff.
I can move it to another review, but I do not think I'll be able to build a test case for it.

foad accepted this revision.Jun 21 2021, 5:23 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/AMDGPU.td
632–637

Fair enough

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4156–4161

Fair enough

This revision is now accepted and ready to land.Jun 21 2021, 5:23 AM
This revision was automatically updated to reflect the committed changes.