This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Add support for return values (8 patches)
AbandonedPublic

Authored by mareko on Jan 6 2016, 4:59 PM.

Details

Summary

These 8 patches add support for return values and other stuff allowing compiling parts of shaders separately and concatenating their binaries to form a complete shader. This feature will be used by Mesa.

There is a hack for ISD::MERGE_VALUES which I'm not too proud of. For some reason, getCopyToReg(Constant) is translated to V_MOV, which is problematic if the output should be an SGPR.

It already works except for a bug in SIFoldOperands that sometimes generates completely wrong code if some return values occur. I still have to fix that.

Please review.

Diff Detail

Event Timeline

mareko updated this revision to Diff 44182.Jan 6 2016, 4:59 PM
mareko retitled this revision from to AMDGPU/SI: Add new target attribute InitialPSInputAddr.
mareko updated this object.
mareko retitled this revision from AMDGPU/SI: Add new target attribute InitialPSInputAddr to AMDGPU/SI: Add support for return values (8 patches).Jan 6 2016, 5:00 PM

For some reason, getCopyToReg(Constant) is translated to V_MOV, which is problematic if the output should be an SGPR.

We should be emitting s_mov_b32 for constants instead of v_mov_b32. This can be done by changing the pattern in SIInstructions.td. I may have a patch for this somewhere.

lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26

Will different shader types have different calling conventions? If so, it might make sense to create a separate CC_SI def for each shader type.

mareko added inline comments.Jan 6 2016, 5:35 PM
lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26

The calling convention will always be the same, but the number of SGPR and VGPR inputs and outputs can be vary even among shaders of the same type.

mareko abandoned this revision.Jan 7 2016, 8:16 AM
arsenm edited edge metadata.Jan 7 2016, 10:19 AM

I assume this is still broken for divergent returns?

lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26

Is it actually possible to initialize this many SGPRs? I thought you could only do the 16 user SGPRs

arsenm added inline comments.Jan 7 2016, 10:26 AM
lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26

I think this should be split into a separate CC. I don't think the calling convention should change depending on the type, at least for non-kernel compute functions.

lib/Target/AMDGPU/SIISelLowering.cpp
950

I think you will still encounter ZExt/SExt

lib/Target/AMDGPU/SIMachineFunctionInfo.h
69

New public fields should not be added here

mareko added inline comments.Jan 7 2016, 12:18 PM
lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26

Is it actually possible to initialize this many SGPRs? I thought you could only do the 16 user SGPRs

If you concatenate shader binaries, you are not limited by what the hardware can preload.

16 user SGPRs + several hw-preloaded SGPRs + any SGPRs additionally returned by the previous shader part, which can return any number of SGPRs. Note that the compiled function can be in the middle part of the shader, meaning the compiler doesn't know which instructions will be executed before the beginning and which instructions will be executed after the end. For example, if a shader part doesn't use user SGPRs, it doesn't have to declare them. The shader can still use user SGPRs in a previous shader part. If the next shader part expects user SGPRs at the standard locations, the current part must copy inputs to outputs as-is (this is a no-op, since it doesn't move anything, but it prevents the compiler from overwriting the registers).

40 SGPRs should be enough for now, but if we wanted, we can get 102 input SGPRs very easily.

I think this should be split into a separate CC. I don't think the calling
convention should change depending on the type, at least for non-kernel
compute functions.

Why? The calling convention is very flexible. It doesn't break existing applications. Shaders can get and return anything, even void. An input SGPR is marked by the "inreg" or "byval" flag, otherwise it's a VGPR. Output SGPRs must be declared as i32, VGPRs are f32.

lib/Target/AMDGPU/SIISelLowering.cpp
950

Even if we only use f32 and i32 in Mesa?

lib/Target/AMDGPU/SIMachineFunctionInfo.h
69

Private it is then.

nhaehnle added inline comments.Jan 8 2016, 7:02 AM
lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26

We can only get 69 SGPRs easily: 80 SGPRs on Tonga/Iceland, of which 6 may be reserved for vcc, xnack_mask, flat_scr, and 5 are potentially needed for the scratch buffer descriptor and scratch wave offset. In any case, it doesn't seem like it'd ever be an issue.

(I've idly wondered in the past whether we couldn't fold the scratch wave offset into the buffer descriptor, which would save one SGPR. And theoretically, vcc could be abused for input/output as well, and xnack_mask at least in the GPU shader. But I wouldn't call those "easily" ;))

arsenm added inline comments.Jan 8 2016, 11:09 AM
lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26

Because changing the register class based on the type doesn't make any sense for a general calling convention. Compute will never want this, which will probably always use VGPR arguments except for a few special cases.