This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Add support for return values (8 patches)
Abandoned Public

Authored by mareko on Jan 6 2016, 4:59 PM.


Details

Reviewers

• tstellarAMD
nhaehnle
arsenm

Summary

These 8 patches add support for return values and other features that allow parts of a shader to be compiled separately and their binaries concatenated to form a complete shader. This feature will be used by Mesa.

There is a hack for ISD::MERGE_VALUES which I'm not too proud of. For some reason, getCopyToReg(Constant) is translated to V_MOV, which is problematic if the output should be an SGPR.

It already works except for a bug in SIFoldOperands that sometimes generates completely wrong code for some return values. I still have to fix that.

Please review.

Diff Detail

Event Timeline

mareko updated this revision to Diff 44182. Jan 6 2016, 4:59 PM

mareko retitled this revision to AMDGPU/SI: Add new target attribute InitialPSInputAddr.

mareko updated this object.

mareko added reviewers: • tstellarAMD, arsenm, nhaehnle.

Herald added a subscriber: arsenm. Jan 6 2016, 4:59 PM

mareko retitled this revision from AMDGPU/SI: Add new target attribute InitialPSInputAddr to AMDGPU/SI: Add support for return values (8 patches). Jan 6 2016, 5:00 PM

> For some reason, getCopyToReg(Constant) is translated to V_MOV, which is problematic if the output should be an SGPR.

We should be emitting s_mov_b32 for constants instead of v_mov_b32. This can be done by changing the pattern in SIInstructions.td. I may have a patch for this somewhere.

lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26	Will different shader types have different calling conventions? If so, it might make sense to create a separate CC_SI def for each shader type.

mareko added inline comments. Jan 6 2016, 5:35 PM

lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26	The calling convention will always be the same, but the number of SGPR and VGPR inputs and outputs can vary even among shaders of the same type.

mareko abandoned this revision. Jan 7 2016, 8:16 AM

I assume this is still broken for divergent returns?

lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26	Is it actually possible to initialize this many SGPRs? I thought you could only do the 16 user SGPRs

arsenm added inline comments. Jan 7 2016, 10:26 AM

lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26	I think this should be split into a separate CC. I don't think the calling convention should change depending on the type, at least for non-kernel compute functions.
lib/Target/AMDGPU/SIISelLowering.cpp
950	I think you will still encounter ZExt/SExt
lib/Target/AMDGPU/SIMachineFunctionInfo.h
69	New public fields should not be added here

mareko added inline comments. Jan 7 2016, 12:18 PM

lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26	> Is it actually possible to initialize this many SGPRs? I thought you could only do the 16 user SGPRs
If you concatenate shader binaries, you are not limited by what the hardware can preload. 16 user SGPRs + several hw-preloaded SGPRs + any SGPRs additionally returned by the previous shader part, which can return any number of SGPRs.
Note that the compiled function can be in the middle part of the shader, meaning the compiler doesn't know which instructions will be executed before the beginning and which instructions will be executed after the end.
For example, if a shader part doesn't use user SGPRs, it doesn't have to declare them. The shader can still use user SGPRs in a previous shader part. If the next shader part expects user SGPRs at the standard locations, the current part must copy inputs to outputs as-is (this is a no-op, since it doesn't move anything, but it prevents the compiler from overwriting the registers).
40 SGPRs should be enough for now, but if we wanted, we can get 102 input SGPRs very easily.
> I think this should be split into a separate CC. I don't think the calling convention should change depending on the type, at least for non-kernel compute functions.
Why? The calling convention is very flexible. It doesn't break existing applications. Shaders can get and return anything, even void. An input SGPR is marked by the "inreg" or "byval" flag, otherwise it's a VGPR. Output SGPRs must be declared as i32, VGPRs are f32.
lib/Target/AMDGPU/SIISelLowering.cpp
950	Even if we only use f32 and i32 in Mesa?
lib/Target/AMDGPU/SIMachineFunctionInfo.h
69	Private it is then.
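
The register-file rule mareko describes for AMDGPUCallingConv.td (inputs flagged "inreg" or "byval" land in SGPRs, all other inputs in VGPRs; i32 return values are SGPRs, f32 return values are VGPRs) can be sketched as a small standalone model. This is not LLVM code: the names `Arg`, `RegFile`, `inputRegFile`, `outputRegFile` and `assignInputs` are hypothetical, only 32-bit values are modeled (the i64 register pairs are ignored), and the limits of 40 SGPRs and 136 VGPRs are taken from the register lists in this diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the rule described in the comment; not LLVM code.
enum class RegFile { SGPR, VGPR };
enum class RetType { I32, F32 };

struct Arg {
  bool inreg;  // models the "inreg" argument flag
  bool byval;  // models the "byval" argument flag
};

// Inputs: "inreg" or "byval" selects an SGPR, anything else a VGPR.
inline RegFile inputRegFile(const Arg &a) {
  return (a.inreg || a.byval) ? RegFile::SGPR : RegFile::VGPR;
}

// Outputs: i32 return values go to SGPRs, f32 return values to VGPRs.
inline RegFile outputRegFile(RetType t) {
  return t == RetType::I32 ? RegFile::SGPR : RegFile::VGPR;
}

// Hand out registers in list order, one counter per register file.
// Limits mirror the lists in this diff: SGPR0..SGPR39 and VGPR0..VGPR135.
inline std::vector<std::string> assignInputs(const std::vector<Arg> &args) {
  const unsigned kMaxSgprs = 40, kMaxVgprs = 136;
  unsigned nextS = 0, nextV = 0;
  std::vector<std::string> out;
  for (const Arg &a : args) {
    if (inputRegFile(a) == RegFile::SGPR)
      out.push_back(nextS < kMaxSgprs ? "SGPR" + std::to_string(nextS++)
                                      : std::string("<none>"));
    else
      out.push_back(nextV < kMaxVgprs ? "VGPR" + std::to_string(nextV++)
                                      : std::string("<none>"));
  }
  return out;
}
```

In this model the two register files are numbered independently, so an "inreg" argument that follows a plain argument still receives the next free SGPR.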

nhaehnle added inline comments. Jan 8 2016, 7:02 AM

lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26	We can only get 69 SGPRs easily: 80 SGPRs on Tonga/Iceland, of which 6 may be reserved for vcc, xnack_mask, flat_scr, and 5 are potentially needed for the scratch buffer descriptor and scratch wave offset. In any case, it doesn't seem like it'd ever be an issue. (I've idly wondered in the past whether we couldn't fold the scratch wave offset into the buffer descriptor, which would save one SGPR. And theoretically, vcc could be abused for input/output as well, and xnack_mask at least in the GPU shader. But I wouldn't call those "easily" ;))
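
The SGPR budget quoted in nhaehnle's comment is simple subtraction, written out here as a compile-time check. The helper name `availableSgprs` is hypothetical; the three numbers (80 total on Tonga/Iceland, 6 possibly reserved for vcc, xnack_mask and flat_scr, 5 possibly needed for scratch) are the ones stated in the comment.

```cpp
#include <cassert>

// Hypothetical helper; the inputs are the figures quoted in the comment.
constexpr unsigned availableSgprs(unsigned total, unsigned reservedSpecial,
                                  unsigned reservedScratch) {
  return total - reservedSpecial - reservedScratch;
}

// 80 SGPRs - 6 (vcc, xnack_mask, flat_scr) - 5 (scratch descriptor and
// scratch wave offset) = 69 SGPRs usable for inputs and outputs.
static_assert(availableSgprs(80, 6, 5) == 69, "80 - 6 - 5 = 69");
```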

arsenm added inline comments. Jan 8 2016, 11:09 AM

lib/Target/AMDGPU/AMDGPUCallingConv.td
20–26	Because changing the register class based on the type doesn't make any sense for a general calling convention. Compute will never want this, which will probably always use VGPR arguments except for a few special cases.

Revision Contents

Path

Size

lib/

Target/

AMDGPU/

AMDGPUAsmPrinter.cpp

2 lines

AMDGPUCallingConv.td

67 lines

AMDGPUISelLowering.h

2 lines

AMDGPUISelLowering.cpp

6 lines

2 lines

2 lines

7 lines

112 lines

17 lines

SIMachineFunctionInfo.h

2 lines

SIMachineFunctionInfo.cpp

4 lines

Utils/

AMDGPUBaseInfo.h

2 lines

AMDGPUBaseInfo.cpp

21 lines

test/

CodeGen/

AMDGPU/

ret.ll

200 lines

Diff 44182

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

Show First 20 Lines • Show All 557 Lines • ▼ Show 20 Lines	if (STM.isVGPRSpillingEnabled(MFI)) {
OutStreamer->EmitIntValue(S_0286E8_WAVESIZE(KernelInfo.ScratchBlocks), 4);		OutStreamer->EmitIntValue(S_0286E8_WAVESIZE(KernelInfo.ScratchBlocks), 4);
}		}
}		}

if (MFI->getShaderType() == ShaderType::PIXEL) {		if (MFI->getShaderType() == ShaderType::PIXEL) {
OutStreamer->EmitIntValue(R_00B02C_SPI_SHADER_PGM_RSRC2_PS, 4);		OutStreamer->EmitIntValue(R_00B02C_SPI_SHADER_PGM_RSRC2_PS, 4);
OutStreamer->EmitIntValue(S_00B02C_EXTRA_LDS_SIZE(KernelInfo.LDSBlocks), 4);		OutStreamer->EmitIntValue(S_00B02C_EXTRA_LDS_SIZE(KernelInfo.LDSBlocks), 4);
OutStreamer->EmitIntValue(R_0286CC_SPI_PS_INPUT_ENA, 4);		OutStreamer->EmitIntValue(R_0286CC_SPI_PS_INPUT_ENA, 4);
		OutStreamer->EmitIntValue(MFI->PSInputEna, 4);
		OutStreamer->EmitIntValue(R_0286D0_SPI_PS_INPUT_ADDR, 4);
OutStreamer->EmitIntValue(MFI->PSInputAddr, 4);		OutStreamer->EmitIntValue(MFI->PSInputAddr, 4);
}		}
}		}

void AMDGPUAsmPrinter::EmitAmdKernelCodeT(const MachineFunction &MF,		void AMDGPUAsmPrinter::EmitAmdKernelCodeT(const MachineFunction &MF,
const SIProgramInfo &KernelInfo) const {		const SIProgramInfo &KernelInfo) const {
const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
const AMDGPUSubtarget &STM = MF.getSubtarget<AMDGPUSubtarget>();		const AMDGPUSubtarget &STM = MF.getSubtarget<AMDGPUSubtarget>();

lib/Target/AMDGPU/AMDGPUCallingConv.td

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	// Inversion of CCIfInReg			// Inversion of CCIfInReg
	class CCIfNotInReg<CCAction A> : CCIf<"!ArgFlags.isInReg()", A> {}			class CCIfNotInReg<CCAction A> : CCIf<"!ArgFlags.isInReg()", A> {}

	// Calling convention for SI			// Calling convention for SI
	def CC_SI : CallingConv<[			def CC_SI : CallingConv<[

	CCIfInReg<CCIfType<[f32, i32] , CCAssignToReg<[			CCIfInReg<CCIfType<[f32, i32] , CCAssignToReg<[
	SGPR0, SGPR1, SGPR2, SGPR3, SGPR4, SGPR5, SGPR6, SGPR7,			SGPR0, SGPR1, SGPR2, SGPR3, SGPR4, SGPR5, SGPR6, SGPR7,
	SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,			SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,
	SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21			SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21, SGPR22, SGPR23,
				SGPR24, SGPR25, SGPR26, SGPR27, SGPR28, SGPR29, SGPR30, SGPR31,
				SGPR32, SGPR33, SGPR34, SGPR35, SGPR36, SGPR37, SGPR38, SGPR39
	]>>>,			]>>>,
				tstellarAMD: Will different shader types have different calling conventions? If so, it might make sense to create a separate CC_SI def for each shader type.
				mareko: The calling convention will always be the same, but the number of SGPR and VGPR inputs and outputs can vary even among shaders of the same type.
				arsenm: Is it actually possible to initialize this many SGPRs? I thought you could only do the 16 user SGPRs
				arsenm: I think this should be split into a separate CC. I don't think the calling convention should change depending on the type, at least for non-kernel compute functions.
				mareko: > Is it actually possible to initialize this many SGPRs? I thought you could only do the 16 user SGPRs
				If you concatenate shader binaries, you are not limited by what the hardware can preload. 16 user SGPRs + several hw-preloaded SGPRs + any SGPRs additionally returned by the previous shader part, which can return any number of SGPRs.
				Note that the compiled function can be in the middle part of the shader, meaning the compiler doesn't know which instructions will be executed before the beginning and which instructions will be executed after the end.
				For example, if a shader part doesn't use user SGPRs, it doesn't have to declare them. The shader can still use user SGPRs in a previous shader part. If the next shader part expects user SGPRs at the standard locations, the current part must copy inputs to outputs as-is (this is a no-op, since it doesn't move anything, but it prevents the compiler from overwriting the registers).
				40 SGPRs should be enough for now, but if we wanted, we can get 102 input SGPRs very easily.
				> I think this should be split into a separate CC. I don't think the calling convention should change depending on the type, at least for non-kernel compute functions.
				Why? The calling convention is very flexible. It doesn't break existing applications. Shaders can get and return anything, even void. An input SGPR is marked by the "inreg" or "byval" flag, otherwise it's a VGPR. Output SGPRs must be declared as i32, VGPRs are f32.
				nhaehnle: We can only get 69 SGPRs easily: 80 SGPRs on Tonga/Iceland, of which 6 may be reserved for vcc, xnack_mask, flat_scr, and 5 are potentially needed for the scratch buffer descriptor and scratch wave offset. In any case, it doesn't seem like it'd ever be an issue. (I've idly wondered in the past whether we couldn't fold the scratch wave offset into the buffer descriptor, which would save one SGPR. And theoretically, vcc could be abused for input/output as well, and xnack_mask at least in the GPU shader. But I wouldn't call those "easily" ;))
				arsenm: Because changing the register class based on the type doesn't make any sense for a general calling convention. Compute will never want this, which will probably always use VGPR arguments except for a few special cases.

	CCIfInReg<CCIfType<[i64] , CCAssignToRegWithShadow<			CCIfInReg<CCIfType<[i64] , CCAssignToRegWithShadow<
	[ SGPR0, SGPR2, SGPR4, SGPR6, SGPR8, SGPR10, SGPR12, SGPR14 ],			[ SGPR0, SGPR2, SGPR4, SGPR6, SGPR8, SGPR10, SGPR12, SGPR14,
	[ SGPR1, SGPR3, SGPR5, SGPR7, SGPR9, SGPR11, SGPR13, SGPR15 ]			SGPR16, SGPR18, SGPR20, SGPR22, SGPR24, SGPR26, SGPR28, SGPR30,
				SGPR32, SGPR34, SGPR36, SGPR38 ],
				[ SGPR1, SGPR3, SGPR5, SGPR7, SGPR9, SGPR11, SGPR13, SGPR15,
				SGPR17, SGPR19, SGPR21, SGPR23, SGPR25, SGPR27, SGPR29, SGPR31,
				SGPR33, SGPR35, SGPR37, SGPR39 ]
	>>>,			>>>,

				// 32*4 + 4 is the minimum for a fetch shader consumer with 32 inputs.
	CCIfNotInReg<CCIfType<[f32, i32] , CCAssignToReg<[			CCIfNotInReg<CCIfType<[f32, i32] , CCAssignToReg<[
	VGPR0, VGPR1, VGPR2, VGPR3, VGPR4, VGPR5, VGPR6, VGPR7,			VGPR0, VGPR1, VGPR2, VGPR3, VGPR4, VGPR5, VGPR6, VGPR7,
	VGPR8, VGPR9, VGPR10, VGPR11, VGPR12, VGPR13, VGPR14, VGPR15,			VGPR8, VGPR9, VGPR10, VGPR11, VGPR12, VGPR13, VGPR14, VGPR15,
	VGPR16, VGPR17, VGPR18, VGPR19, VGPR20, VGPR21, VGPR22, VGPR23,			VGPR16, VGPR17, VGPR18, VGPR19, VGPR20, VGPR21, VGPR22, VGPR23,
	VGPR24, VGPR25, VGPR26, VGPR27, VGPR28, VGPR29, VGPR30, VGPR31			VGPR24, VGPR25, VGPR26, VGPR27, VGPR28, VGPR29, VGPR30, VGPR31,
				VGPR32, VGPR33, VGPR34, VGPR35, VGPR36, VGPR37, VGPR38, VGPR39,
				VGPR40, VGPR41, VGPR42, VGPR43, VGPR44, VGPR45, VGPR46, VGPR47,
				VGPR48, VGPR49, VGPR50, VGPR51, VGPR52, VGPR53, VGPR54, VGPR55,
				VGPR56, VGPR57, VGPR58, VGPR59, VGPR60, VGPR61, VGPR62, VGPR63,
				VGPR64, VGPR65, VGPR66, VGPR67, VGPR68, VGPR69, VGPR70, VGPR71,
				VGPR72, VGPR73, VGPR74, VGPR75, VGPR76, VGPR77, VGPR78, VGPR79,
				VGPR80, VGPR81, VGPR82, VGPR83, VGPR84, VGPR85, VGPR86, VGPR87,
				VGPR88, VGPR89, VGPR90, VGPR91, VGPR92, VGPR93, VGPR94, VGPR95,
				VGPR96, VGPR97, VGPR98, VGPR99, VGPR100, VGPR101, VGPR102, VGPR103,
				VGPR104, VGPR105, VGPR106, VGPR107, VGPR108, VGPR109, VGPR110, VGPR111,
				VGPR112, VGPR113, VGPR114, VGPR115, VGPR116, VGPR117, VGPR118, VGPR119,
				VGPR120, VGPR121, VGPR122, VGPR123, VGPR124, VGPR125, VGPR126, VGPR127,
				VGPR128, VGPR129, VGPR130, VGPR131, VGPR132, VGPR133, VGPR134, VGPR135
	]>>>,			]>>>,

	CCIfByVal<CCIfType<[i64] , CCAssignToRegWithShadow<			CCIfByVal<CCIfType<[i64] , CCAssignToRegWithShadow<
	[ SGPR0, SGPR2, SGPR4, SGPR6, SGPR8, SGPR10, SGPR12, SGPR14 ],			[ SGPR0, SGPR2, SGPR4, SGPR6, SGPR8, SGPR10, SGPR12, SGPR14,
	[ SGPR1, SGPR3, SGPR5, SGPR7, SGPR9, SGPR11, SGPR13, SGPR15 ]			SGPR16, SGPR18, SGPR20, SGPR22, SGPR24, SGPR26, SGPR28, SGPR30,
				SGPR32, SGPR34, SGPR36, SGPR38 ],
				[ SGPR1, SGPR3, SGPR5, SGPR7, SGPR9, SGPR11, SGPR13, SGPR15,
				SGPR17, SGPR19, SGPR21, SGPR23, SGPR25, SGPR27, SGPR29, SGPR31,
				SGPR33, SGPR35, SGPR37, SGPR39 ]
	>>>			>>>

	]>;			]>;

				def RetCC_SI : CallingConv<[
				CCIfType<[i32] , CCAssignToReg<[
				SGPR0, SGPR1, SGPR2, SGPR3, SGPR4, SGPR5, SGPR6, SGPR7,
				SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,
				SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21, SGPR22, SGPR23,
				SGPR24, SGPR25, SGPR26, SGPR27, SGPR28, SGPR29, SGPR30, SGPR31,
				SGPR32, SGPR33, SGPR34, SGPR35, SGPR36, SGPR37, SGPR38, SGPR39
				]>>,

				// 32*4 + 4 is the minimum for a fetch shader with 32 outputs.
				CCIfType<[f32] , CCAssignToReg<[
				VGPR0, VGPR1, VGPR2, VGPR3, VGPR4, VGPR5, VGPR6, VGPR7,
				VGPR8, VGPR9, VGPR10, VGPR11, VGPR12, VGPR13, VGPR14, VGPR15,
				VGPR16, VGPR17, VGPR18, VGPR19, VGPR20, VGPR21, VGPR22, VGPR23,
				VGPR24, VGPR25, VGPR26, VGPR27, VGPR28, VGPR29, VGPR30, VGPR31,
				VGPR32, VGPR33, VGPR34, VGPR35, VGPR36, VGPR37, VGPR38, VGPR39,
				VGPR40, VGPR41, VGPR42, VGPR43, VGPR44, VGPR45, VGPR46, VGPR47,
				VGPR48, VGPR49, VGPR50, VGPR51, VGPR52, VGPR53, VGPR54, VGPR55,
				VGPR56, VGPR57, VGPR58, VGPR59, VGPR60, VGPR61, VGPR62, VGPR63,
				VGPR64, VGPR65, VGPR66, VGPR67, VGPR68, VGPR69, VGPR70, VGPR71,
				VGPR72, VGPR73, VGPR74, VGPR75, VGPR76, VGPR77, VGPR78, VGPR79,
				VGPR80, VGPR81, VGPR82, VGPR83, VGPR84, VGPR85, VGPR86, VGPR87,
				VGPR88, VGPR89, VGPR90, VGPR91, VGPR92, VGPR93, VGPR94, VGPR95,
				VGPR96, VGPR97, VGPR98, VGPR99, VGPR100, VGPR101, VGPR102, VGPR103,
				VGPR104, VGPR105, VGPR106, VGPR107, VGPR108, VGPR109, VGPR110, VGPR111,
				VGPR112, VGPR113, VGPR114, VGPR115, VGPR116, VGPR117, VGPR118, VGPR119,
				VGPR120, VGPR121, VGPR122, VGPR123, VGPR124, VGPR125, VGPR126, VGPR127,
				VGPR128, VGPR129, VGPR130, VGPR131, VGPR132, VGPR133, VGPR134, VGPR135
				]>>
				]>;

	// Calling convention for R600			// Calling convention for R600
	def CC_R600 : CallingConv<[			def CC_R600 : CallingConv<[
	CCIfInReg<CCIfType<[v4f32, v4i32] , CCAssignToReg<[			CCIfInReg<CCIfType<[v4f32, v4i32] , CCAssignToReg<[
	T0_XYZW, T1_XYZW, T2_XYZW, T3_XYZW, T4_XYZW, T5_XYZW, T6_XYZW, T7_XYZW,			T0_XYZW, T1_XYZW, T2_XYZW, T3_XYZW, T4_XYZW, T5_XYZW, T6_XYZW, T7_XYZW,
	T8_XYZW, T9_XYZW, T10_XYZW, T11_XYZW, T12_XYZW, T13_XYZW, T14_XYZW, T15_XYZW,			T8_XYZW, T9_XYZW, T10_XYZW, T11_XYZW, T12_XYZW, T13_XYZW, T14_XYZW, T15_XYZW,
	T16_XYZW, T17_XYZW, T18_XYZW, T19_XYZW, T20_XYZW, T21_XYZW, T22_XYZW,			T16_XYZW, T17_XYZW, T18_XYZW, T19_XYZW, T20_XYZW, T21_XYZW, T22_XYZW,
	T23_XYZW, T24_XYZW, T25_XYZW, T26_XYZW, T27_XYZW, T28_XYZW, T29_XYZW,			T23_XYZW, T24_XYZW, T25_XYZW, T26_XYZW, T27_XYZW, T28_XYZW, T29_XYZW,
	T30_XYZW, T31_XYZW, T32_XYZW			T30_XYZW, T31_XYZW, T32_XYZW

lib/Target/AMDGPU/AMDGPUISelLowering.h

Show First 20 Lines • Show All 103 Lines • ▼ Show 20 Lines	protected:
/// from the LLVM IR Function and fixup the ISD:InputArg values before		/// from the LLVM IR Function and fixup the ISD:InputArg values before
/// passing them to AnalyzeFormalArguments()		/// passing them to AnalyzeFormalArguments()
void getOriginalFunctionArgs(SelectionDAG &DAG,		void getOriginalFunctionArgs(SelectionDAG &DAG,
const Function *F,		const Function *F,
const SmallVectorImpl<ISD::InputArg> &Ins,		const SmallVectorImpl<ISD::InputArg> &Ins,
SmallVectorImpl<ISD::InputArg> &OrigIns) const;		SmallVectorImpl<ISD::InputArg> &OrigIns) const;
void AnalyzeFormalArguments(CCState &State,		void AnalyzeFormalArguments(CCState &State,
const SmallVectorImpl<ISD::InputArg> &Ins) const;		const SmallVectorImpl<ISD::InputArg> &Ins) const;
		void AnalyzeReturn(CCState &State,
		const SmallVectorImpl<ISD::OutputArg> &Outs) const;

public:		public:
AMDGPUTargetLowering(TargetMachine &TM, const AMDGPUSubtarget &STI);		AMDGPUTargetLowering(TargetMachine &TM, const AMDGPUSubtarget &STI);

bool isFAbsFree(EVT VT) const override;		bool isFAbsFree(EVT VT) const override;
bool isFNegFree(EVT VT) const override;		bool isFNegFree(EVT VT) const override;
bool isTruncateFree(EVT Src, EVT Dest) const override;		bool isTruncateFree(EVT Src, EVT Dest) const override;
bool isTruncateFree(Type *Src, Type *Dest) const override;		bool isTruncateFree(Type *Src, Type *Dest) const override;

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

	//===---------------------------------------------------------------------===//			//===---------------------------------------------------------------------===//

	void AMDGPUTargetLowering::AnalyzeFormalArguments(CCState &State,			void AMDGPUTargetLowering::AnalyzeFormalArguments(CCState &State,
	const SmallVectorImpl<ISD::InputArg> &Ins) const {			const SmallVectorImpl<ISD::InputArg> &Ins) const {

	State.AnalyzeFormalArguments(Ins, CC_AMDGPU);			State.AnalyzeFormalArguments(Ins, CC_AMDGPU);
	}			}

				void AMDGPUTargetLowering::AnalyzeReturn(CCState &State,
				const SmallVectorImpl<ISD::OutputArg> &Outs) const {

				State.AnalyzeReturn(Outs, RetCC_SI);
				}

	SDValue AMDGPUTargetLowering::LowerReturn(			SDValue AMDGPUTargetLowering::LowerReturn(
	SDValue Chain,			SDValue Chain,
	CallingConv::ID CallConv,			CallingConv::ID CallConv,
	bool isVarArg,			bool isVarArg,
	const SmallVectorImpl<ISD::OutputArg> &Outs,			const SmallVectorImpl<ISD::OutputArg> &Outs,
	const SmallVectorImpl<SDValue> &OutVals,			const SmallVectorImpl<SDValue> &OutVals,
	SDLoc DL, SelectionDAG &DAG) const {			SDLoc DL, SelectionDAG &DAG) const {
	return DAG.getNode(AMDGPUISD::RET_FLAG, DL, MVT::Other, Chain);			return DAG.getNode(AMDGPUISD::RET_FLAG, DL, MVT::Other, Chain);

lib/Target/AMDGPU/AMDGPUInstrInfo.td

	// Flow Control DAG Nodes			// Flow Control DAG Nodes
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	def IL_brcond : SDNode<"AMDGPUISD::BRANCH_COND", SDTIL_BRCond, [SDNPHasChain]>;			def IL_brcond : SDNode<"AMDGPUISD::BRANCH_COND", SDTIL_BRCond, [SDNPHasChain]>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Call/Return DAG Nodes			// Call/Return DAG Nodes
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	def IL_retflag : SDNode<"AMDGPUISD::RET_FLAG", SDTNone,			def IL_retflag : SDNode<"AMDGPUISD::RET_FLAG", SDTNone,
	[SDNPHasChain, SDNPOptInGlue]>;			[SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;

lib/Target/AMDGPU/SIDefines.h

	#define S_00B84C_LDS_SIZE(x) (((x) & 0x1FF) << 15)			#define S_00B84C_LDS_SIZE(x) (((x) & 0x1FF) << 15)
	#define G_00B84C_LDS_SIZE(x) (((x) >> 15) & 0x1FF)			#define G_00B84C_LDS_SIZE(x) (((x) >> 15) & 0x1FF)
	#define C_00B84C_LDS_SIZE 0xFF007FFF			#define C_00B84C_LDS_SIZE 0xFF007FFF
	#define S_00B84C_EXCP_EN(x) (((x) & 0x7F) << 24)			#define S_00B84C_EXCP_EN(x) (((x) & 0x7F) << 24)
	#define G_00B84C_EXCP_EN(x) (((x) >> 24) & 0x7F)			#define G_00B84C_EXCP_EN(x) (((x) >> 24) & 0x7F)
	#define C_00B84C_EXCP_EN			#define C_00B84C_EXCP_EN

	#define R_0286CC_SPI_PS_INPUT_ENA 0x0286CC			#define R_0286CC_SPI_PS_INPUT_ENA 0x0286CC
				#define R_0286D0_SPI_PS_INPUT_ADDR 0x0286D0

	#define R_00B848_COMPUTE_PGM_RSRC1 0x00B848			#define R_00B848_COMPUTE_PGM_RSRC1 0x00B848
	#define S_00B848_VGPRS(x) (((x) & 0x3F) << 0)			#define S_00B848_VGPRS(x) (((x) & 0x3F) << 0)
	#define G_00B848_VGPRS(x) (((x) >> 0) & 0x3F)			#define G_00B848_VGPRS(x) (((x) >> 0) & 0x3F)
	#define C_00B848_VGPRS 0xFFFFFFC0			#define C_00B848_VGPRS 0xFFFFFFC0
	#define S_00B848_SGPRS(x) (((x) & 0x0F) << 6)			#define S_00B848_SGPRS(x) (((x) & 0x0F) << 6)
	#define G_00B848_SGPRS(x) (((x) >> 6) & 0x0F)			#define G_00B848_SGPRS(x) (((x) >> 6) & 0x0F)
	#define C_00B848_SGPRS 0xFFFFFC3F			#define C_00B848_SGPRS 0xFFFFFC3F
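
The register-field macros in this header come in triples. Reading the prefixes as S_ (shift a value into its field), G_ (extract the field) and C_ (mask that clears the field) is an interpretation of the definitions shown here, not something the review states. A minimal sketch of how such a triple is typically combined follows, using the COMPUTE_PGM_RSRC1 macros copied from the hunk; the helpers `packRsrc1` and `setSgprsField` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Copied from the SIDefines.h hunk in this diff.
#define S_00B848_VGPRS(x) (((x) & 0x3F) << 0)
#define G_00B848_VGPRS(x) (((x) >> 0) & 0x3F)
#define C_00B848_VGPRS 0xFFFFFFC0
#define S_00B848_SGPRS(x) (((x) & 0x0F) << 6)
#define G_00B848_SGPRS(x) (((x) >> 6) & 0x0F)
#define C_00B848_SGPRS 0xFFFFFC3F

// Hypothetical helper: build a register word from two field values.
inline uint32_t packRsrc1(uint32_t vgprs, uint32_t sgprs) {
  return S_00B848_VGPRS(vgprs) | S_00B848_SGPRS(sgprs);
}

// Hypothetical helper: overwrite only the SGPRS field of an existing word.
// The C_ mask zeroes bits 6..9, then the S_ macro ORs in the new value.
inline uint32_t setSgprsField(uint32_t reg, uint32_t value) {
  return (reg & C_00B848_SGPRS) | S_00B848_SGPRS(value);
}
```

Because each S_ macro masks its argument before shifting, an out-of-range value is truncated to the field width rather than spilling into neighbouring fields.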

lib/Target/AMDGPU/SIISelLowering.h

Show First 20 Lines • Show All 89 Lines • ▼ Show 20 Lines	bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
Type *Ty) const override;		Type *Ty) const override;

SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,		SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,
bool isVarArg,		bool isVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins,		const SmallVectorImpl<ISD::InputArg> &Ins,
SDLoc DL, SelectionDAG &DAG,		SDLoc DL, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals) const override;		SmallVectorImpl<SDValue> &InVals) const override;

		SDValue LowerReturn(SDValue Chain,
		CallingConv::ID CallConv,
		bool isVarArg,
		const SmallVectorImpl<ISD::OutputArg> &Outs,
		const SmallVectorImpl<SDValue> &OutVals,
		SDLoc DL, SelectionDAG &DAG) const override;

MachineBasicBlock * EmitInstrWithCustomInserter(MachineInstr * MI,		MachineBasicBlock * EmitInstrWithCustomInserter(MachineInstr * MI,
MachineBasicBlock * BB) const override;		MachineBasicBlock * BB) const override;
bool enableAggressiveFMAFusion(EVT VT) const override;		bool enableAggressiveFMAFusion(EVT VT) const override;
EVT getSetCCResultType(const DataLayout &DL, LLVMContext &Context,		EVT getSetCCResultType(const DataLayout &DL, LLVMContext &Context,
EVT VT) const override;		EVT VT) const override;
MVT getScalarShiftAmountTy(const DataLayout &, EVT) const override;		MVT getScalarShiftAmountTy(const DataLayout &, EVT) const override;
bool isFMAFasterThanFMulAndFAdd(EVT VT) const override;		bool isFMAFasterThanFMulAndFAdd(EVT VT) const override;
SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const override;		SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const override;

lib/Target/AMDGPU/SIISelLowering.cpp

Show First 20 Lines • Show All 592 Lines • ▼ Show 20 Lines	SDValue SITargetLowering::LowerFormalArguments(
SmallVector<ISD::InputArg, 16> Splits;		SmallVector<ISD::InputArg, 16> Splits;
BitVector Skipped(Ins.size());		BitVector Skipped(Ins.size());

for (unsigned i = 0, e = Ins.size(), PSInputNum = 0; i != e; ++i) {		for (unsigned i = 0, e = Ins.size(), PSInputNum = 0; i != e; ++i) {
const ISD::InputArg &Arg = Ins[i];		const ISD::InputArg &Arg = Ins[i];

// First check if it's a PS input addr		// First check if it's a PS input addr
if (Info->getShaderType() == ShaderType::PIXEL && !Arg.Flags.isInReg() &&		if (Info->getShaderType() == ShaderType::PIXEL && !Arg.Flags.isInReg() &&
!Arg.Flags.isByVal()) {		!Arg.Flags.isByVal() && PSInputNum <= 15) {

assert((PSInputNum <= 15) && "Too many PS inputs!");

if (!Arg.Used) {		if (!Arg.Used && !(Info->PSInputAddr & (1 << PSInputNum))) {
// We can safely skip PS inputs		// We can safely skip PS inputs
Skipped.set(i);		Skipped.set(i);
++PSInputNum;		++PSInputNum;
continue;		continue;
}		}

Info->PSInputAddr |= 1 << PSInputNum++;		Info->PSInputEna |= 1 << PSInputNum;
		Info->PSInputAddr |= 1 << PSInputNum;
		++PSInputNum;
}		}

// Second split vertices into their elements		// Second split vertices into their elements
if (Info->getShaderType() != ShaderType::COMPUTE && Arg.VT.isVector()) {		if (Info->getShaderType() != ShaderType::COMPUTE && Arg.VT.isVector()) {
ISD::InputArg NewArg = Arg;		ISD::InputArg NewArg = Arg;
NewArg.Flags.setSplit();		NewArg.Flags.setSplit();
NewArg.VT = Arg.VT.getVectorElementType();		NewArg.VT = Arg.VT.getVectorElementType();

Show All 14 Lines	SDValue SITargetLowering::LowerFormalArguments(
}		}

SmallVector<CCValAssign, 16> ArgLocs;		SmallVector<CCValAssign, 16> ArgLocs;
CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), ArgLocs,		CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), ArgLocs,
*DAG.getContext());		*DAG.getContext());

// At least one interpolation mode must be enabled or else the GPU will hang.		// At least one interpolation mode must be enabled or else the GPU will hang.
if (Info->getShaderType() == ShaderType::PIXEL &&		if (Info->getShaderType() == ShaderType::PIXEL &&
(Info->PSInputAddr & 0x7F) == 0) {		(Info->PSInputEna & 0x7F) == 0) {
		Info->PSInputEna |= 1;

		if (!(Info->PSInputAddr & 0x1)) {
Info->PSInputAddr |= 1;		Info->PSInputAddr |= 1;
CCInfo.AllocateReg(AMDGPU::VGPR0);		CCInfo.AllocateReg(AMDGPU::VGPR0);
CCInfo.AllocateReg(AMDGPU::VGPR1);		CCInfo.AllocateReg(AMDGPU::VGPR1);
}		}
		}

if (Info->getShaderType() == ShaderType::COMPUTE) {		if (Info->getShaderType() == ShaderType::COMPUTE) {
getOriginalFunctionArgs(DAG, DAG.getMachineFunction().getFunction(), Ins,		getOriginalFunctionArgs(DAG, DAG.getMachineFunction().getFunction(), Ins,
Splits);		Splits);
}		}

// FIXME: How should these inputs interact with inreg / custom SGPR inputs?		// FIXME: How should these inputs interact with inreg / custom SGPR inputs?
if (Info->hasPrivateSegmentBuffer()) {		if (Info->hasPrivateSegmentBuffer()) {
▲ Show 20 Lines • Show All 211 Lines • ▼ Show 20 Lines	SDValue SITargetLowering::LowerFormalArguments(
}		}

if (Chains.empty())		if (Chains.empty())
return Chain;		return Chain;

return DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chains);		return DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chains);
}		}

		SDValue SITargetLowering::LowerReturn(SDValue Chain,
		CallingConv::ID CallConv,
		bool isVarArg,
		const SmallVectorImpl<ISD::OutputArg> &Outs,
		const SmallVectorImpl<SDValue> &OutVals,
		SDLoc DL, SelectionDAG &DAG) const {
		MachineFunction &MF = DAG.getMachineFunction();
		SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
		Info->ReturnsVoid = Outs.size() == 0;

		SmallVector<ISD::OutputArg, 48> Splits;
		SmallVector<SDValue, 48> SplitVals;

		// Split vectors into their elements.
		for (unsigned i = 0, e = Outs.size(); i != e; ++i) {
		const ISD::OutputArg &Out = Outs[i];

		if (Out.VT.isVector()) {
		MVT VT = Out.VT.getVectorElementType();
		ISD::OutputArg NewOut = Out;
		NewOut.Flags.setSplit();
		NewOut.VT = VT;

		// We want the original number of vector elements here, e.g.
		// three or five, not four or eight.
		unsigned NumElements = Out.ArgVT.getVectorNumElements();

		for (unsigned j = 0; j != NumElements; ++j) {
		SDValue Elem = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, VT, OutVals[i],
		DAG.getConstant(j, DL, MVT::i32));
		SplitVals.push_back(Elem);
		Splits.push_back(NewOut);
		NewOut.PartOffset += NewOut.VT.getStoreSize();
		}
		} else {
		SplitVals.push_back(OutVals[i]);
		Splits.push_back(Out);
		}
		}

		// CCValAssign - represent the assignment of the return value to a location.
		SmallVector<CCValAssign, 48> RVLocs;

		// CCState - Info about the registers and stack slots.
		CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), RVLocs,
		*DAG.getContext());

		// Analyze outgoing return values.
		AnalyzeReturn(CCInfo, Splits);

		SDValue Flag;
		SmallVector<SDValue, 48> RetOps;
		RetOps.push_back(Chain); // Operand #0 = Chain (updated below)

		// Copy the result values into the output registers.
		for (unsigned i = 0, realRVLocIdx = 0;
		i != RVLocs.size();
		++i, ++realRVLocIdx) {
		CCValAssign &VA = RVLocs[i];
		assert(VA.isRegLoc() && "Can only return in registers!");

		SDValue Arg = SplitVals[realRVLocIdx];

		// Copied from other backends.
		switch (VA.getLocInfo()) {
		default: llvm_unreachable("Unknown loc info!");
		case CCValAssign::Full:
		break;
		case CCValAssign::BCvt:
		Arg = DAG.getNode(ISD::BITCAST, DL, VA.getLocVT(), Arg);
		break;
		}
		arsenm: I think you will still encounter ZExt/SExt
		mareko: Even if we only use f32 and i32 in Mesa?

		Chain = DAG.getCopyToReg(Chain, DL, VA.getLocReg(), Arg, Flag);
		Flag = Chain.getValue(1);
		RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
		}

		// Update chain and glue.
		RetOps[0] = Chain;
		if (Flag.getNode())
		RetOps.push_back(Flag);

		return DAG.getNode(AMDGPUISD::RET_FLAG, DL, MVT::Other, RetOps);
		}

MachineBasicBlock * SITargetLowering::EmitInstrWithCustomInserter(		MachineBasicBlock * SITargetLowering::EmitInstrWithCustomInserter(
MachineInstr * MI, MachineBasicBlock * BB) const {		MachineInstr * MI, MachineBasicBlock * BB) const {

switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
default:		default:
return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);		return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);
case AMDGPU::BRANCH:		case AMDGPU::BRANCH:
return BB;		return BB;
▲ Show 20 Lines • Show All 1,404 Lines • ▼ Show 20 Lines
/// \brief Legalize target independent instructions (e.g. INSERT_SUBREG)		/// \brief Legalize target independent instructions (e.g. INSERT_SUBREG)
/// with frame index operands.		/// with frame index operands.
/// LLVM assumes that inputs are to these instructions are registers.		/// LLVM assumes that inputs are to these instructions are registers.
void SITargetLowering::legalizeTargetIndependentNode(SDNode *Node,		void SITargetLowering::legalizeTargetIndependentNode(SDNode *Node,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {

SmallVector<SDValue, 8> Ops;		SmallVector<SDValue, 8> Ops;
for (unsigned i = 0; i < Node->getNumOperands(); ++i) {		for (unsigned i = 0; i < Node->getNumOperands(); ++i) {
if (!isFrameIndexOp(Node->getOperand(i))) {		if (!isFrameIndexOp(Node->getOperand(i)) &&
		(Node->getOpcode() != ISD::MERGE_VALUES ||
		!isa<ConstantSDNode>(Node->getOperand(i)))) {
Ops.push_back(Node->getOperand(i));		Ops.push_back(Node->getOperand(i));
continue;		continue;
}		}

SDLoc DL(Node);		SDLoc DL(Node);
Ops.push_back(SDValue(DAG.getMachineNode(AMDGPU::S_MOV_B32, DL,		Ops.push_back(SDValue(DAG.getMachineNode(AMDGPU::S_MOV_B32, DL,
Node->getOperand(i).getValueType(),		Node->getOperand(i).getValueType(),
Node->getOperand(i)), 0));		Node->getOperand(i)), 0));
▲ Show 20 Lines • Show All 228 Lines • Show Last 20 Lines

lib/Target/AMDGPU/SIInsertWaits.cpp

Show First 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	private:
/// \brief Different export instruction types seen since last wait.		/// \brief Different export instruction types seen since last wait.
unsigned ExpInstrTypesSeen;		unsigned ExpInstrTypesSeen;

/// \brief Type of the last opcode.		/// \brief Type of the last opcode.
InstType LastOpcodeType;		InstType LastOpcodeType;

bool LastInstWritesM0;		bool LastInstWritesM0;

		/// \brief Whether the machine function returns void
		bool ReturnsVoid;

/// \brief Get increment/decrement amount for this instruction.		/// \brief Get increment/decrement amount for this instruction.
Counters getHwCounts(MachineInstr &MI);		Counters getHwCounts(MachineInstr &MI);

/// \brief Is operand relevant for async execution?		/// \brief Is operand relevant for async execution?
bool isOpRelevant(MachineOperand &Op);		bool isOpRelevant(MachineOperand &Op);

/// \brief Get register interval an operand affects.		/// \brief Get register interval an operand affects.
RegInterval getRegInterval(const TargetRegisterClass *RC,		RegInterval getRegInterval(const TargetRegisterClass *RC,
▲ Show 20 Lines • Show All 222 Lines • ▼ Show 20 Lines	void SIInsertWaits::pushInstruction(MachineBasicBlock &MBB,
}		}
}		}

bool SIInsertWaits::insertWait(MachineBasicBlock &MBB,		bool SIInsertWaits::insertWait(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
const Counters &Required) {		const Counters &Required) {

// End of program? No need to wait on anything		// End of program? No need to wait on anything
if (I != MBB.end() && I->getOpcode() == AMDGPU::S_ENDPGM)		// A function not returning void needs to wait, because other bytecode will
		// be appended after it and we don't know what it will be.
		if (I != MBB.end() && I->getOpcode() == AMDGPU::S_ENDPGM && ReturnsVoid)
return false;		return false;

// Figure out if the async instructions execute in order		// Figure out if the async instructions execute in order
bool Ordered[3];		bool Ordered[3];

// VM_CNT is always ordered		// VM_CNT is always ordered
Ordered[0] = true;		Ordered[0] = true;

▲ Show 20 Lines • Show All 126 Lines • ▼ Show 20 Lines	TRI =
static_cast<const SIRegisterInfo *>(MF.getSubtarget().getRegisterInfo());		static_cast<const SIRegisterInfo *>(MF.getSubtarget().getRegisterInfo());

MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();

WaitedOn = ZeroCounts;		WaitedOn = ZeroCounts;
LastIssued = ZeroCounts;		LastIssued = ZeroCounts;
LastOpcodeType = OTHER;		LastOpcodeType = OTHER;
LastInstWritesM0 = false;		LastInstWritesM0 = false;
		ReturnsVoid = MF.getInfo<SIMachineFunctionInfo>()->ReturnsVoid;

memset(&UsedRegs, 0, sizeof(UsedRegs));		memset(&UsedRegs, 0, sizeof(UsedRegs));
memset(&DefinedRegs, 0, sizeof(DefinedRegs));		memset(&DefinedRegs, 0, sizeof(DefinedRegs));

for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();		for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
BI != BE; ++BI) {		BI != BE; ++BI) {

MachineBasicBlock &MBB = *BI;		MachineBasicBlock &MBB = *BI;
for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();		for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
I != E; ++I) {		I != E; ++I) {

// Wait for everything before a barrier.		// Wait for everything before a barrier.
if (I->getOpcode() == AMDGPU::S_BARRIER)		if (I->getOpcode() == AMDGPU::S_BARRIER)
Changes |= insertWait(MBB, I, LastIssued);		Changes |= insertWait(MBB, I, LastIssued);
else		else
Changes |= insertWait(MBB, I, handleOperands(*I));		Changes |= insertWait(MBB, I, handleOperands(*I));

pushInstruction(MBB, I);		pushInstruction(MBB, I);
handleSendMsg(MBB, I);		handleSendMsg(MBB, I);
}		}

// Wait for everything at the end of the MBB		// Wait for everything at the end of the MBB
Changes |= insertWait(MBB, MBB.getFirstTerminator(), LastIssued);		Changes |= insertWait(MBB, MBB.getFirstTerminator(), LastIssued);

		// Functions returning something shouldn't contain S_ENDPGM, because other
		// bytecode will be appended after it.
		if (!ReturnsVoid) {
		MachineBasicBlock::iterator I = MBB.getFirstTerminator();
		assert(I != MBB.end());
		if (I->getOpcode() == AMDGPU::S_ENDPGM)
		I->eraseFromParent();
		}
}		}

return Changes;		return Changes;
}		}
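
The SIInsertWaits change above combines two rules: a part that returns void may skip the final wait because the program ends at S_ENDPGM, while a part that returns values must wait for outstanding work and must not keep S_ENDPGM, since another binary will be appended after it. The following is a rough standalone model of that behaviour, assuming a toy instruction stream; `Op` and `finalizePart` are hypothetical names and the real pass tracks hardware counters rather than a single flag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcodes; the real pass operates on MachineInstr.
enum class Op { Export, Alu, Wait, EndPgm };

// Returns the instruction stream as one shader part would be emitted.
std::vector<Op> finalizePart(const std::vector<Op> &code, bool returnsVoid) {
  bool pending = false; // true while an export has not been waited on
  std::vector<Op> out;
  for (Op op : code) {
    if (op == Op::Export)
      pending = true;
    if (op == Op::Wait)
      pending = false;

    if (op == Op::EndPgm) {
      if (returnsVoid) {
        out.push_back(op); // program really ends here, no wait needed
        continue;
      }
      if (pending) {
        out.push_back(Op::Wait); // the appended part cannot know what is in flight
        pending = false;
      }
      continue; // drop the terminator so that another part can follow
    }
    out.push_back(op);
  }
  return out;
}
```

With `returnsVoid == false`, the stream `{Export, EndPgm}` comes out as `{Export, Wait}`, which corresponds to the `s_waitcnt expcnt(0)` and `GCN-NOT: s_endpgm` checks in the test file.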

lib/Target/AMDGPU/SIMachineFunctionInfo.h

Show First 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	class SIMachineFunctionInfo : public AMDGPUMachineFunction {
unsigned WorkGroupIDYSystemSGPR;		unsigned WorkGroupIDYSystemSGPR;
unsigned WorkGroupIDZSystemSGPR;		unsigned WorkGroupIDZSystemSGPR;
unsigned WorkGroupInfoSystemSGPR;		unsigned WorkGroupInfoSystemSGPR;
unsigned PrivateSegmentWaveByteOffsetSystemSGPR;		unsigned PrivateSegmentWaveByteOffsetSystemSGPR;

public:		public:
// FIXME: Make private		// FIXME: Make private
unsigned LDSWaveSpillSize;		unsigned LDSWaveSpillSize;
		unsigned PSInputEna;
unsigned PSInputAddr;		unsigned PSInputAddr;
std::map<unsigned, unsigned> LaneVGPRs;		std::map<unsigned, unsigned> LaneVGPRs;
unsigned ScratchOffsetReg;		unsigned ScratchOffsetReg;
unsigned NumUserSGPRs;		unsigned NumUserSGPRs;
unsigned NumSystemSGPRs;		unsigned NumSystemSGPRs;
		bool ReturnsVoid;
		arsenm: New public fields should not be added here
		mareko: Private it is then.

private:		private:
bool HasSpilledSGPRs;		bool HasSpilledSGPRs;
bool HasSpilledVGPRs;		bool HasSpilledVGPRs;

// Feature bits required for inputs passed in user SGPRs.		// Feature bits required for inputs passed in user SGPRs.
bool PrivateSegmentBuffer : 1;		bool PrivateSegmentBuffer : 1;
bool DispatchPtr : 1;		bool DispatchPtr : 1;
▲ Show 20 Lines • Show All 207 Lines • Show Last 20 Lines
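
Following the review comment on `ReturnsVoid`, one possible shape for the private variant is sketched below. This is only an illustration under assumed names: `FunctionInfoSketch`, `returnsVoid()` and `setIfReturnsVoid()` are hypothetical and are not taken from this diff.

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-in for SIMachineFunctionInfo.
class FunctionInfoSketch {
  // Private by default; the initial value mirrors ReturnsVoid(true) in the
  // constructor initializer list of the patch.
  bool ReturnsVoid = true;

public:
  // Read access for passes such as SIInsertWaits.
  bool returnsVoid() const { return ReturnsVoid; }

  // Write access for the lowering code that inspects the return type.
  void setIfReturnsVoid(bool Value) { ReturnsVoid = Value; }
};
```

A pass would then query `Info.returnsVoid()` instead of reading the field directly.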

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	: AMDGPUMachineFunction(MF),
GridWorkGroupCountYUserSGPR(AMDGPU::NoRegister),		GridWorkGroupCountYUserSGPR(AMDGPU::NoRegister),
GridWorkGroupCountZUserSGPR(AMDGPU::NoRegister),		GridWorkGroupCountZUserSGPR(AMDGPU::NoRegister),
WorkGroupIDXSystemSGPR(AMDGPU::NoRegister),		WorkGroupIDXSystemSGPR(AMDGPU::NoRegister),
WorkGroupIDYSystemSGPR(AMDGPU::NoRegister),		WorkGroupIDYSystemSGPR(AMDGPU::NoRegister),
WorkGroupIDZSystemSGPR(AMDGPU::NoRegister),		WorkGroupIDZSystemSGPR(AMDGPU::NoRegister),
WorkGroupInfoSystemSGPR(AMDGPU::NoRegister),		WorkGroupInfoSystemSGPR(AMDGPU::NoRegister),
PrivateSegmentWaveByteOffsetSystemSGPR(AMDGPU::NoRegister),		PrivateSegmentWaveByteOffsetSystemSGPR(AMDGPU::NoRegister),
LDSWaveSpillSize(0),		LDSWaveSpillSize(0),
		PSInputEna(0),
PSInputAddr(0),		PSInputAddr(0),
NumUserSGPRs(0),		NumUserSGPRs(0),
NumSystemSGPRs(0),		NumSystemSGPRs(0),
		ReturnsVoid(true),
HasSpilledSGPRs(false),		HasSpilledSGPRs(false),
HasSpilledVGPRs(false),		HasSpilledVGPRs(false),
PrivateSegmentBuffer(false),		PrivateSegmentBuffer(false),
DispatchPtr(false),		DispatchPtr(false),
QueuePtr(false),		QueuePtr(false),
DispatchID(false),		DispatchID(false),
KernargSegmentPtr(false),		KernargSegmentPtr(false),
FlatScratchInit(false),		FlatScratchInit(false),
GridWorkgroupCountX(false),		GridWorkgroupCountX(false),
GridWorkgroupCountY(false),		GridWorkgroupCountY(false),
GridWorkgroupCountZ(false),		GridWorkgroupCountZ(false),
WorkGroupIDX(true),		WorkGroupIDX(true),
WorkGroupIDY(false),		WorkGroupIDY(false),
WorkGroupIDZ(false),		WorkGroupIDZ(false),
WorkGroupInfo(false),		WorkGroupInfo(false),
PrivateSegmentWaveByteOffset(false),		PrivateSegmentWaveByteOffset(false),
WorkItemIDX(true),		WorkItemIDX(true),
WorkItemIDY(false),		WorkItemIDY(false),
WorkItemIDZ(false) {		WorkItemIDZ(false) {
const AMDGPUSubtarget &ST = MF.getSubtarget<AMDGPUSubtarget>();		const AMDGPUSubtarget &ST = MF.getSubtarget<AMDGPUSubtarget>();
const Function *F = MF.getFunction();		const Function *F = MF.getFunction();

		PSInputAddr = AMDGPU::getInitialPSInputAddr(*F);

const MachineFrameInfo *FrameInfo = MF.getFrameInfo();		const MachineFrameInfo *FrameInfo = MF.getFrameInfo();

if (getShaderType() == ShaderType::COMPUTE)		if (getShaderType() == ShaderType::COMPUTE)
KernargSegmentPtr = true;		KernargSegmentPtr = true;

if (F->hasFnAttribute("amdgpu-work-group-id-y"))		if (F->hasFnAttribute("amdgpu-work-group-id-y"))
WorkGroupIDY = true;		WorkGroupIDY = true;

▲ Show 20 Lines • Show All 109 Lines • Show Last 20 Lines

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

	Show All 39 Lines

	MCSection *getHSARodataReadonlyAgentSection(MCContext &Ctx);			MCSection *getHSARodataReadonlyAgentSection(MCContext &Ctx);

	bool isGroupSegment(const GlobalValue *GV);			bool isGroupSegment(const GlobalValue *GV);
	bool isGlobalSegment(const GlobalValue *GV);			bool isGlobalSegment(const GlobalValue *GV);
	bool isReadOnlySegment(const GlobalValue *GV);			bool isReadOnlySegment(const GlobalValue *GV);

	unsigned getShaderType(const Function &F);			unsigned getShaderType(const Function &F);
				unsigned getInitialPSInputAddr(const Function &F);


	bool isSI(const MCSubtargetInfo &STI);			bool isSI(const MCSubtargetInfo &STI);
	bool isCI(const MCSubtargetInfo &STI);			bool isCI(const MCSubtargetInfo &STI);
	bool isVI(const MCSubtargetInfo &STI);			bool isVI(const MCSubtargetInfo &STI);

	/// If \p Reg is a pseudo reg, return the correct hardware register given			/// If \p Reg is a pseudo reg, return the correct hardware register given
	/// \p STI otherwise return \p Reg.			/// \p STI otherwise return \p Reg.
	unsigned getMCReg(unsigned Reg, const MCSubtargetInfo &STI);			unsigned getMCReg(unsigned Reg, const MCSubtargetInfo &STI);

	} // end namespace AMDGPU			} // end namespace AMDGPU
	} // end namespace llvm			} // end namespace llvm

	#endif			#endif

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

	Show First 20 Lines • Show All 100 Lines • ▼ Show 20 Lines
	bool isGlobalSegment(const GlobalValue *GV) {			bool isGlobalSegment(const GlobalValue *GV) {
	return GV->getType()->getAddressSpace() == AMDGPUAS::GLOBAL_ADDRESS;			return GV->getType()->getAddressSpace() == AMDGPUAS::GLOBAL_ADDRESS;
	}			}

	bool isReadOnlySegment(const GlobalValue *GV) {			bool isReadOnlySegment(const GlobalValue *GV) {
	return GV->getType()->getAddressSpace() == AMDGPUAS::CONSTANT_ADDRESS;			return GV->getType()->getAddressSpace() == AMDGPUAS::CONSTANT_ADDRESS;
	}			}

	static const char ShaderTypeAttribute[] = "ShaderType";			static unsigned getIntegerAttribute(const Function &F, const char *Name,
				unsigned Default) {
	unsigned getShaderType(const Function &F) {			Attribute A = F.getFnAttribute(Name);
	Attribute A = F.getFnAttribute(ShaderTypeAttribute);			unsigned Result = Default;
	unsigned ShaderType = ShaderType::COMPUTE;

	if (A.isStringAttribute()) {			if (A.isStringAttribute()) {
	StringRef Str = A.getValueAsString();			StringRef Str = A.getValueAsString();
	if (Str.getAsInteger(0, ShaderType)) {			if (Str.getAsInteger(0, Result)) {
	LLVMContext &Ctx = F.getContext();			LLVMContext &Ctx = F.getContext();
	Ctx.emitError("can't parse shader type");			Ctx.emitError("can't parse shader type");
	}			}
	}			}
	return ShaderType;			return Result;
				}

				unsigned getShaderType(const Function &F) {
				return getIntegerAttribute(F, "ShaderType", ShaderType::COMPUTE);
				}

				unsigned getInitialPSInputAddr(const Function &F) {
				return getIntegerAttribute(F, "InitialPSInputAddr", 0);
	}			}
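
The refactoring above turns the shader-type lookup into a generic "read an integer from a string attribute, else use a default" helper. Below is a minimal standalone sketch of that pattern, assuming attributes are modelled as a plain string map; `AttrMap` and `getIntegerAttr` are hypothetical names, and `std::strtoul` with base 0 only approximates the radix auto-detection requested by `getAsInteger(0, ...)`. Failure is reported through a flag so that a caller could name the offending attribute in its diagnostic.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical model of a function's string attributes (name -> value).
using AttrMap = std::map<std::string, std::string>;

// Returns Default when the attribute is absent. Sets Failed and returns
// Default when a value is present but is not a valid unsigned integer.
unsigned getIntegerAttr(const AttrMap &Attrs, const std::string &Name,
                        unsigned Default, bool &Failed) {
  Failed = false;
  auto It = Attrs.find(Name);
  if (It == Attrs.end())
    return Default;

  const std::string &Str = It->second;
  // strtoul accepts leading whitespace and signs; reject those up front.
  if (Str.empty() || !std::isdigit(static_cast<unsigned char>(Str[0]))) {
    Failed = true;
    return Default;
  }

  char *End = nullptr;
  errno = 0;
  // Base 0 auto-detects decimal, octal ("0" prefix) and hex ("0x" prefix).
  unsigned long Value = std::strtoul(Str.c_str(), &End, 0);
  if (*End != '\0' || errno == ERANGE || Value > UINT_MAX) {
    Failed = true;
    return Default;
  }
  return static_cast<unsigned>(Value);
}
```

With attributes like those in the test file, `"InitialPSInputAddr"="119"` yields 119, and a function without the attribute yields the supplied default.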

	bool isSI(const MCSubtargetInfo &STI) {			bool isSI(const MCSubtargetInfo &STI) {
	return STI.getFeatureBits()[AMDGPU::FeatureSouthernIslands];			return STI.getFeatureBits()[AMDGPU::FeatureSouthernIslands];
	}			}

	bool isCI(const MCSubtargetInfo &STI) {			bool isCI(const MCSubtargetInfo &STI) {
	return STI.getFeatureBits()[AMDGPU::FeatureSeaIslands];			return STI.getFeatureBits()[AMDGPU::FeatureSeaIslands];
	Show All 27 Lines

test/CodeGen/AMDGPU/ret.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
				; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

				attributes #0 = { "ShaderType"="1" }

				declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)

				; GCN-LABEL: {{^}}vgpr:
				; GCN: v_mov_b32_e32 v1, v0
				; GCN-DAG: v_add_f32_e32 v0, 1.0, v1
				; GCN-DAG: exp 15, 0, 1, 1, 1, v1, v1, v1, v1
				; GCN: s_waitcnt expcnt(0)
				; GCN-NOT: s_endpgm
				define {float, float} @vgpr([9 x <16 x i8>] addrspace(2)* byval, i32 inreg, i32 inreg, float) #0 {
				call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %3, float %3, float %3, float %3)
				%x = fadd float %3, 1.0
				%a = insertvalue {float, float} undef, float %x, 0
				%b = insertvalue {float, float} %a, float %3, 1
				ret {float, float} %b
				}

				; GCN-LABEL: {{^}}vgpr_literal:
				; GCN: v_mov_b32_e32 v4, v0
				; GCN-DAG: v_mov_b32_e32 v0, 1.0
				; GCN-DAG: v_mov_b32_e32 v1, 2.0
				; GCN-DAG: v_mov_b32_e32 v2, 4.0
				; GCN-DAG: v_mov_b32_e32 v3, -1.0
				; GCN: exp 15, 0, 1, 1, 1, v4, v4, v4, v4
				; GCN: s_waitcnt expcnt(0)
				; GCN-NOT: s_endpgm
				define {float, float, float, float} @vgpr_literal([9 x <16 x i8>] addrspace(2)* byval, i32 inreg, i32 inreg, float) #0 {
				call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %3, float %3, float %3, float %3)
				ret {float, float, float, float} {float 1.0, float 2.0, float 4.0, float -1.0}
				}


				; GCN-LABEL: {{^}}vgpr_ps_addr0:
				; GCN-NOT: v_mov_b32_e32 v0
				; GCN-NOT: v_mov_b32_e32 v1
				; GCN-NOT: v_mov_b32_e32 v2
				; GCN: v_mov_b32_e32 v3, v4
				; GCN: v_mov_b32_e32 v4, v6
				; GCN-NOT: s_endpgm
				attributes #1 = { "ShaderType"="0" "InitialPSInputAddr"="0" }
				define {float, float, float, float, float} @vgpr_ps_addr0([9 x <16 x i8>] addrspace(2)* byval, i32 inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, float, float, float, float) #1 {
				%i0 = extractelement <2 x i32> %4, i32 0
				%i1 = extractelement <2 x i32> %4, i32 1
				%i2 = extractelement <2 x i32> %7, i32 0
				%i3 = extractelement <2 x i32> %8, i32 0
				%f0 = bitcast i32 %i0 to float
				%f1 = bitcast i32 %i1 to float
				%f2 = bitcast i32 %i2 to float
				%f3 = bitcast i32 %i3 to float
				%r0 = insertvalue {float, float, float, float, float} undef, float %f0, 0
				%r1 = insertvalue {float, float, float, float, float} %r0, float %f1, 1
				%r2 = insertvalue {float, float, float, float, float} %r1, float %f2, 2
				%r3 = insertvalue {float, float, float, float, float} %r2, float %f3, 3
				%r4 = insertvalue {float, float, float, float, float} %r3, float %12, 4
				ret {float, float, float, float, float} %r4
				}


				; GCN-LABEL: {{^}}vgpr_ps_addr1:
				; GCN-DAG: v_mov_b32_e32 v0, v2
				; GCN-DAG: v_mov_b32_e32 v1, v3
				; GCN: v_mov_b32_e32 v2, v4
				; GCN-DAG: v_mov_b32_e32 v3, v6
				; GCN-DAG: v_mov_b32_e32 v4, v8
				; GCN-NOT: s_endpgm
				attributes #2 = { "ShaderType"="0" "InitialPSInputAddr"="1" }
				define {float, float, float, float, float} @vgpr_ps_addr1([9 x <16 x i8>] addrspace(2)* byval, i32 inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, float, float, float, float) #2 {
				%i0 = extractelement <2 x i32> %4, i32 0
				%i1 = extractelement <2 x i32> %4, i32 1
				%i2 = extractelement <2 x i32> %7, i32 0
				%i3 = extractelement <2 x i32> %8, i32 0
				%f0 = bitcast i32 %i0 to float
				%f1 = bitcast i32 %i1 to float
				%f2 = bitcast i32 %i2 to float
				%f3 = bitcast i32 %i3 to float
				%r0 = insertvalue {float, float, float, float, float} undef, float %f0, 0
				%r1 = insertvalue {float, float, float, float, float} %r0, float %f1, 1
				%r2 = insertvalue {float, float, float, float, float} %r1, float %f2, 2
				%r3 = insertvalue {float, float, float, float, float} %r2, float %f3, 3
				%r4 = insertvalue {float, float, float, float, float} %r3, float %12, 4
				ret {float, float, float, float, float} %r4
				}


				; GCN-LABEL: {{^}}vgpr_ps_addr119:
				; GCN-DAG: v_mov_b32_e32 v0, v2
				; GCN-DAG: v_mov_b32_e32 v1, v3
				; GCN: v_mov_b32_e32 v2, v6
				; GCN: v_mov_b32_e32 v3, v8
				; GCN: v_mov_b32_e32 v4, v12
				; GCN-NOT: s_endpgm
				attributes #3 = { "ShaderType"="0" "InitialPSInputAddr"="119" }
				define {float, float, float, float, float} @vgpr_ps_addr119([9 x <16 x i8>] addrspace(2)* byval, i32 inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, float, float, float, float) #3 {
				%i0 = extractelement <2 x i32> %4, i32 0
				%i1 = extractelement <2 x i32> %4, i32 1
				%i2 = extractelement <2 x i32> %7, i32 0
				%i3 = extractelement <2 x i32> %8, i32 0
				%f0 = bitcast i32 %i0 to float
				%f1 = bitcast i32 %i1 to float
				%f2 = bitcast i32 %i2 to float
				%f3 = bitcast i32 %i3 to float
				%r0 = insertvalue {float, float, float, float, float} undef, float %f0, 0
				%r1 = insertvalue {float, float, float, float, float} %r0, float %f1, 1
				%r2 = insertvalue {float, float, float, float, float} %r1, float %f2, 2
				%r3 = insertvalue {float, float, float, float, float} %r2, float %f3, 3
				%r4 = insertvalue {float, float, float, float, float} %r3, float %12, 4
				ret {float, float, float, float, float} %r4
				}


				; GCN-LABEL: {{^}}vgpr_ps_addr418:
				; GCN-NOT: v_mov_b32_e32 v0
				; GCN-NOT: v_mov_b32_e32 v1
				; GCN-NOT: v_mov_b32_e32 v2
				; GCN: v_mov_b32_e32 v3, v4
				; GCN: v_mov_b32_e32 v4, v8
				; GCN-NOT: s_endpgm
				attributes #4 = { "ShaderType"="0" "InitialPSInputAddr"="418" }
				define {float, float, float, float, float} @vgpr_ps_addr418([9 x <16 x i8>] addrspace(2)* byval, i32 inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, float, float, float, float) #4 {
				%i0 = extractelement <2 x i32> %4, i32 0
				%i1 = extractelement <2 x i32> %4, i32 1
				%i2 = extractelement <2 x i32> %7, i32 0
				%i3 = extractelement <2 x i32> %8, i32 0
				%f0 = bitcast i32 %i0 to float
				%f1 = bitcast i32 %i1 to float
				%f2 = bitcast i32 %i2 to float
				%f3 = bitcast i32 %i3 to float
				%r0 = insertvalue {float, float, float, float, float} undef, float %f0, 0
				%r1 = insertvalue {float, float, float, float, float} %r0, float %f1, 1
				%r2 = insertvalue {float, float, float, float, float} %r1, float %f2, 2
				%r3 = insertvalue {float, float, float, float, float} %r2, float %f3, 3
				%r4 = insertvalue {float, float, float, float, float} %r3, float %12, 4
				ret {float, float, float, float, float} %r4
				}


				; GCN-LABEL: {{^}}sgpr:
				; GCN: s_add_i32 s0, s3, 2
				; GCN: s_mov_b32 s2, s3
				; GCN-NOT: s_endpgm
				define {i32, i32, i32} @sgpr([9 x <16 x i8>] addrspace(2)* byval, i32 inreg, i32 inreg, float) #0 {
				%x = add i32 %2, 2
				%a = insertvalue {i32, i32, i32} undef, i32 %x, 0
				%b = insertvalue {i32, i32, i32} %a, i32 %1, 1
				%c = insertvalue {i32, i32, i32} %a, i32 %2, 2
				ret {i32, i32, i32} %c
				}


				; GCN-LABEL: {{^}}sgpr_literal:
				; GCN: s_mov_b32 s0, 5
				; XGCN-NOT: s_mov_b32 s0, s0
				; GCN-DAG: s_mov_b32 s1, 6
				; GCN-DAG: s_mov_b32 s2, 7
				; GCN-DAG: s_mov_b32 s3, 8
				; GCN-NOT: s_endpgm
				define {i32, i32, i32, i32} @sgpr_literal([9 x <16 x i8>] addrspace(2)* byval, i32 inreg, i32 inreg, float) #0 {
				%x = add i32 %2, 2
				ret {i32, i32, i32, i32} {i32 5, i32 6, i32 7, i32 8}
				}


				; GCN-LABEL: {{^}}both:
				; GCN: v_mov_b32_e32 v1, v0
				; GCN-DAG: exp 15, 0, 1, 1, 1, v1, v1, v1, v1
				; GCN-DAG: v_add_f32_e32 v0, 1.0, v1
				; GCN-DAG: s_add_i32 s0, s3, 2
				; GCN-DAG: s_mov_b32 s1, s2
				; GCN: s_mov_b32 s2, s3
				; GCN: s_waitcnt expcnt(0)
				; GCN-NOT: s_endpgm
				define {float, i32, float, i32, i32} @both([9 x <16 x i8>] addrspace(2)* byval, i32 inreg, i32 inreg, float) #0 {
				call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %3, float %3, float %3, float %3)
				%v = fadd float %3, 1.0
				%s = add i32 %2, 2
				%a0 = insertvalue {float, i32, float, i32, i32} undef, float %v, 0
				%a1 = insertvalue {float, i32, float, i32, i32} %a0, i32 %s, 1
				%a2 = insertvalue {float, i32, float, i32, i32} %a1, float %3, 2
				%a3 = insertvalue {float, i32, float, i32, i32} %a2, i32 %1, 3
				%a4 = insertvalue {float, i32, float, i32, i32} %a3, i32 %2, 4
				ret {float, i32, float, i32, i32} %a4
				}


				; GCN-LABEL: {{^}}structure_literal:
				; GCN: v_mov_b32_e32 v3, v0
				; GCN-DAG: v_mov_b32_e32 v0, 1.0
				; GCN-DAG: s_mov_b32 s0, 2
				; GCN-DAG: s_mov_b32 s1, 3
				; GCN-DAG: v_mov_b32_e32 v1, 2.0
				; GCN-DAG: v_mov_b32_e32 v2, 4.0
				; GCN-DAG: exp 15, 0, 1, 1, 1, v3, v3, v3, v3
				define {{float, i32}, {i32, <2 x float>}} @structure_literal([9 x <16 x i8>] addrspace(2)* byval, i32 inreg, i32 inreg, float) #0 {
				call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %3, float %3, float %3, float %3)
				ret {{float, i32}, {i32, <2 x float>}} {{float, i32} {float 1.0, i32 2}, {i32, <2 x float>} {i32 3, <2 x float> <float 2.0, float 4.0>}}
				}

Revision Contents

Diff 44182

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

lib/Target/AMDGPU/AMDGPUCallingConv.td

lib/Target/AMDGPU/AMDGPUISelLowering.h

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

lib/Target/AMDGPU/AMDGPUInstrInfo.td

lib/Target/AMDGPU/SIDefines.h

lib/Target/AMDGPU/SIISelLowering.h

lib/Target/AMDGPU/SIISelLowering.cpp

lib/Target/AMDGPU/SIInsertWaits.cpp

lib/Target/AMDGPU/SIMachineFunctionInfo.h

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

test/CodeGen/AMDGPU/ret.ll
