If more registers are needed for VAddr then the NSA format allows then the
final register can act as a contiguous set of remaining addresses. Update
legalizer to pack register for this new format and allow instruction
selection to use NSA encoding when number of addresses exceeds max size.
Also update SIShrinkInstructions to handle partial NSA.
Details
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
9,320 ms | x64 debian > MLIR.Examples/standalone::test.toy |
Event Timeline
Do we need verifier checks for the NSA restrictions on gfx11?
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1863 | Add a comment to explain there's a limit on gfx11 | |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
4984 | Can you make the descriptive of the NSA behavior rather than check/refer to gfx11+? | |
5001 | Comment that this is packing the last register? | |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
4639 | Avoid generation check? | |
4643 | This is just RI.getRegSizeInBits |
Since PartialNSA is now a SubtargetFeature it can be turned off with -mattr=+partial-nsa-encoding to get old behavior.
Thank you for implementing this.
I have left a few minor comments inline.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
6598 | This condition is redundant, as you already placed the same condition in UsePartialNSA? | |
6669 | Redundant condition as above -- presumably else branch implies UseNSA, so should not need to be a duplicate of it? | |
llvm/test/CodeGen/AMDGPU/cluster_stores.ll | ||
462 | It is interesting this has been able to reduce register pressure over GFX10. | |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll | ||
1675 | This doesn't need NSA. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
6496 | How is this change related? | |
6598 | I think it would be simpler overall to call getBuildDwordsVector(DAG, DL, ArrayRef(VAddrs).drop_front(NSAMaxSize - 1)) here, instead of adding another argument to getBuildDwordsVector. | |
6670 | Can you write append_range(Ops, ArrayRef(VAddrs).take_front(NSAMaxSize - 1))? |
Add a comment to explain there's a limit on gfx11