Check maximum NSA size when selecting NSA or non-NSA BVH instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This is primarily for testing stability issues with long form NSA instructions.
I am half minded that it does not need to go into the code base, but feel like the code should at least exist somewhere, such as a this review.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
4794 | Maybe use a 2 by 2 by 2 array of opcodes? |
Ping.
It would be good to get this merged now the other NSA limiting features are in.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
4825 | Why do we have to round up to 8 or 16? | |
4829 | Can this be undef instead of 0? Can we push the same register N times instead of creating N different registers? | |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
4260 | Why does this hard code 512, when the comment says 256 or 512? |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
4825 | I cannot remember why I did this for GlobalIsel. |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
4826–4827 | Nit: we generally avoid explicit createGenericVirtualRegister calls. You can write: Register MergedOps = B.buildMerge(OpTy, Ops).getReg(0); | |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
7419 | Same question as for globalisel: do we need to round up at all here? Rounding up to 8 certainly seems odd now that we have v5, v6, v7 classes. | |
7421 | Can we use undef instead of zero to avoid having to materialise a constant? |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
4826–4827 | Sure, I think I just copied the style of the code above. | |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
7419 | BVH minimum size is 256-bits, so the new MIMG v5/v6/v7 are not relevant here. |
LGTM, thanks!
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
7421 | Nit: I think you can do Ops.append(16 - Ops.size(), Undef). |
Sorry, I just realized you had not accepted this when I landed it (after addressing the nit). Please let me know if you have any concerns.
Oh, did I forget to press the "accept" button? If so that was just an oversight on my part.
Maybe use a 2 by 2 by 2 array of opcodes?