This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Disable NSA for BVH instructions when appropriate
ClosedPublic

Authored by critson on May 27 2021, 2:05 AM.

Details

Summary

Check maximum NSA size when selecting NSA or non-NSA BVH instructions.

Diff Detail

Event Timeline

critson created this revision.May 27 2021, 2:05 AM
critson requested review of this revision.May 27 2021, 2:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2021, 2:05 AM

Any reason to force SA?

Any reason to force SA?

This is primarily for testing stability issues with long form NSA instructions.
I am half minded that it does not need to go into the code base, but feel like the code should at least exist somewhere, such as a this review.

critson updated this revision to Diff 353298.Jun 21 2021, 1:22 AM
  • Rework this based on NSA limit functionality
critson retitled this revision from [AMDGPU] Add options to disable NSA for BVH instructions to [AMDGPU] Disable NSA for BVH instructions when appropriate.Jun 21 2021, 1:23 AM
critson edited the summary of this revision. (Show Details)
foad added inline comments.Jun 21 2021, 1:43 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4794

Maybe use a 2 by 2 by 2 array of opcodes?

critson updated this revision to Diff 353329.Jun 21 2021, 3:52 AM
  • Use array for opcode look up
critson updated this revision to Diff 361583.Jul 25 2021, 11:59 PM

Ping.
It would be good to get this merged now the other NSA limiting features are in.

foad added inline comments.Jul 26 2021, 7:15 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4825

Why do we have to round up to 8 or 16?

4829

Can this be undef instead of 0? Can we push the same register N times instead of creating N different registers?

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4260

Why does this hard code 512, when the comment says 256 or 512?

critson updated this revision to Diff 361938.Jul 27 2021, 1:25 AM
  • Remove unnecessary padding.
  • Properly set bank size
critson marked 4 inline comments as done.Jul 27 2021, 1:29 AM
critson added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4825

I cannot remember why I did this for GlobalIsel.
It certainly works without, perhaps it didn't when I first wrote this.

foad added inline comments.Jul 27 2021, 2:19 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4826–4827

Nit: we generally avoid explicit createGenericVirtualRegister calls. You can write: Register MergedOps = B.buildMerge(OpTy, Ops).getReg(0);

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
7419

Same question as for globalisel: do we need to round up at all here? Rounding up to 8 certainly seems odd now that we have v5, v6, v7 classes.

7421

Can we use undef instead of zero to avoid having to materialise a constant?

critson marked 4 inline comments as done.Jul 30 2021, 12:59 AM
critson added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4826–4827

Sure, I think I just copied the style of the code above.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
7419

BVH minimum size is 256-bits, so the new MIMG v5/v6/v7 are not relevant here.
I have however rewritten this code to only do anything above 8 VGPRs.

critson updated this revision to Diff 362982.Jul 30 2021, 12:59 AM
critson marked 2 inline comments as done.
  • Address reviewer comments
foad added a comment.Aug 2 2021, 2:06 AM

LGTM, thanks!

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
7421

Nit: I think you can do Ops.append(16 - Ops.size(), Undef).

This revision was not accepted when it landed; it landed in state Needs Review.Aug 2 2021, 4:11 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

Sorry, I just realized you had not accepted this when I landed it (after addressing the nit). Please let me know if you have any concerns.

foad added a comment.EditedAug 2 2021, 5:17 AM

Sorry, I just realized you had not accepted this when I landed it

Oh, did I forget to press the "accept" button? If so that was just an oversight on my part.