This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable `permlanex16` selection with `+16-bit-insts,+gfx10-insts`
AbandonedPublic

Authored by Pierre-vh on Oct 28 2022, 5:43 AM.

Details

Summary

ROCm device libs can emit permlanex w/ the +gfx10-insts attribute, and it counts on the optimizer to remove the call if the GPU is <GFX10.
When built at O0 it caused codegen issues as Clang allowed this intrinsic to go through with just +gfx10-insts, but the backend wanted the GPU to be >=GFX10 as well.

This patch allows selecting that intrinsic when just minimum required attributes are present. That is, +gfx10-insts & +16-bit-insts.

Depends on D136944

Diff Detail

Event Timeline

Pierre-vh created this revision.Oct 28 2022, 5:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 5:43 AM
Pierre-vh requested review of this revision.Oct 28 2022, 5:43 AM
foad added inline comments.Oct 28 2022, 5:57 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
185

Yuck. Why is this required?

Pierre-vh updated this revision to Diff 471519.Oct 28 2022, 6:21 AM
Pierre-vh marked an inline comment as done.

Fix getConstantBusLimit

Pierre-vh added inline comments.Oct 28 2022, 6:23 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
185

Ah, this is indeed ugly and I forgot to fix it.
Fixed it.

The reason why it's needed is because legalizeOperandsVOP3 sort of assumes (and rightly so) that it'll always return 2 for permlane(x).
If it returns 1, it messes up the legalization because the loop below the permlane-specific logic will undo the legalization.
Also if it returns 1 for PERMLANE, verification will fail complaining about constant bus limit being exceeded, but the instruction needs 2 sgpr operands so it doesn't make sense anyway.

foad added inline comments.Oct 28 2022, 6:23 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
185

What happens if you don't change this at all?

foad added inline comments.Oct 28 2022, 6:25 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
185

(Sorry I see you already answered that.)

foad added inline comments.Oct 28 2022, 6:31 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
185

This still makes me nervous. Are there parts of the compiler that assume this'll return 1 for pre-GFX10 instructions? Have you tried compiling a large body of code with -mcpu=gfx900 -mattr=+gfx10insts to see if the compiler crashes horribly?

Maybe it would be cleaner to special-case v_permlane in legalizeOperandsVOP3 and/or verifyInstruction.

arsenm added inline comments.Oct 28 2022, 8:57 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
557

Nothing here requires or implies a need for mad64_32?

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
185

I'd also rather not touch this.

This is another case where I feel we would be better off directly emitting subtarget specific instructions and reading the property out of the instruction properties

Pierre-vh planned changes to this revision.Nov 2 2022, 2:02 AM
Pierre-vh added inline comments.Nov 3 2022, 6:12 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
557

I don't really know, the assert was just tripping for some reason. Maybe I can just remove the assert and that's it?

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
185

What could be the alternative then? Just special-casing the instructions? Or not touching getConstantBusLimit at all?
I'm not sure the latter is a good idea, it should really return 2 for those instructions, no?

Pierre-vh abandoned this revision.Nov 15 2022, 12:26 AM