This is an archive of the discontinued LLVM Phabricator instance.

I also thought we were trying to get rid of group static size. It's broken with LDS relocations which we need to move towards. Can we just switch directly to using a relocation here?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5584	Why does it specifically need to be a 0 sized array? I think this would depend purely on the linkage, or treat any 0 sized type the same way

This revision now requires changes to proceed. Jun 24 2020, 1:05 PM

scchan added a subscriber: scchan. Jun 24 2020, 1:24 PM

scchan added inline comments.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5588	Don't you need to make sure the static size gives you an offset with the correct alignment?

I just found that change for the non-HSA/PAL environment. I need to check how it works and how it fits with the other tests. For now, this is a critical change to ensure we don't have to change the original source code too much. Is it possible to address the relocation in the longer run (say, 1~3 weeks) to avoid the tight schedule?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5584	That's the only syntax accepted by HIP-Clang. At a minimum, the zero-sized type should be checked to ensure developers understand how dynamic shared memory is used.

hliao marked an inline comment as done. Jun 24 2020, 1:50 PM

hliao added inline comments.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5588	I remember that the size should always be DWORD aligned. Let me check the code that calculates it.

Harbormaster completed remote builds in B61609: Diff 273132. Jun 24 2020, 2:41 PM

My understanding is this feature is equivalent to the OpenCL dynamic group segment allocation. The runtime would presumably implement it in a similar way.

So the HIP runtime must take the static LDS size, round up to the alignment requirement of the dynamic allocation (OpenCL just uses the maximally aligned OpenCL data type), then add the size of the dynamic LDS. The AQL packet group segment field is set to the total LDS size.

In OpenCL there can be multiple kernel arguments, and the LDS address is passed to each. But for HIP there is only one dynamic area denoted by this weird extern. How is the dynamic LDS storage accessed? Is the address passed as an implicit kernel argument, or does the compiler implicitly use the aligned static LDS size?
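The size computation described above can be sketched as follows. This is a minimal model, assuming the runtime already knows the kernel's static LDS size and the alignment the dynamic region needs; `alignUpTo` and `groupSegmentSize` are hypothetical helpers, not HIP runtime or LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round Value up to a multiple of Alignment, which must
// be a nonzero power of two.
uint64_t alignUpTo(uint64_t Value, uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  return (Value + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical model of the group segment size a runtime would place in the
// AQL packet: the static LDS, padded so the dynamic region starts aligned,
// followed by the dynamic LDS size requested at launch.
uint64_t groupSegmentSize(uint64_t StaticLDSSize, uint64_t DynLDSAlign,
                          uint64_t DynLDSSize) {
  uint64_t DynBase = alignUpTo(StaticLDSSize, DynLDSAlign); // offset of dynamic LDS
  return DynBase + DynLDSSize;
}
```

For example, 20 bytes of static LDS with a 16-byte dynamic alignment put the dynamic region at offset 32, so a 100-byte dynamic request gives a 132-byte group segment.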

I don't think this actually works since you could have multiple 0 sized objects, and they would both get the same address. I think this has to be an external relocation

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5584	That's a bit too much HIP-specific logic. Also, what does this do if there is more than one? How can these return different addresses?
5588	The global has an explicit alignment that needs to be respected

In D82496#2112763, @t-tye wrote:

My understanding is this feature is equivalent to the OpenCL dynamic group segment allocation. The runtime would presumably implement it in a similar way.

So the HIP runtime must take the static LDS size, round up to the alignment requirement of the dynamic allocation (OpenCL just uses the maximally aligned OpenCL data type), then add the size of the dynamic LDS. The AQL packet group segment field is set to the total LDS size.

In OpenCL there can be multiple kernel arguments, and the LDS address is passed to each. But for HIP there is only one dynamic area denoted by this weird extern. How is the dynamic LDS storage accessed? Is the address passed as an implicit kernel argument, or does the compiler implicitly use the aligned static LDS size?

That's the point. For compatibility with CUDA, a kernel that needs multiple dynamically sized arrays must declare a single extern unsized array, and developers divide it into multiple arrays themselves using its address. This is quite similar to the local memory parameter in OpenCL. Thus, all extern unsized __shared__ arrays are mapped onto the same address, and the developer is responsible for dividing that region into multiple arrays. This is the same in HCC and HIP-Clang, except that here we want to maximize compatibility and avoid changing source code too much.
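A plain C++ model of the developer-side convention described above: because every unsized `extern __shared__` array resolves to the same base address, a kernel needing several dynamic arrays carves them out of that one region by byte offset. This is not HIP or CUDA code; `SubArray` and `partitionDynamicShared` are hypothetical names used only for illustration. The offsets are only correctly aligned if the region's base is at least as aligned as the most-aligned element type, which is why the alignment of the extern declaration matters here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One array the developer carves out of the single dynamic shared region.
struct SubArray {
  std::size_t ElemSize;  // bytes per element
  std::size_t ElemAlign; // element alignment, a power of two
  std::size_t Count;     // number of elements
};

// Returns the byte offset of each sub-array from the start of the dynamic
// region, laid out in order with alignment padding, and stores the total
// number of bytes the launch must request in TotalBytes.
std::vector<std::size_t> partitionDynamicShared(const std::vector<SubArray> &Parts,
                                                std::size_t &TotalBytes) {
  std::vector<std::size_t> Offsets;
  std::size_t Cursor = 0;
  for (const SubArray &P : Parts) {
    // Pad so this sub-array starts on its element alignment.
    Cursor = (Cursor + P.ElemAlign - 1) & ~(P.ElemAlign - 1);
    Offsets.push_back(Cursor);
    Cursor += P.ElemSize * P.Count;
  }
  TotalBytes = Cursor;
  return Offsets;
}
```

For instance, three 1-byte elements, two 4-byte elements (4-byte aligned) and one 8-byte element (8-byte aligned) land at offsets 0, 4 and 16, for 24 bytes in total.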

Address the alignment issue.

Harbormaster completed remote builds in B67429: Diff 283833. Aug 7 2020, 1:55 AM

Also missing the GlobalISel handling

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
31	I don't think this should be mutable
68	Rename to getAllocatedLDSSize()? I think there should be a separate method to get the size plus the roundup to the dynamic alignment

arsenm added inline comments. Aug 7 2020, 1:36 PM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
34	This doesn't need to be MaybeAlign, just Align. Also expand to Dynamic? Also should probably elaborate that this is used for the case where a dynamically sized global is used
94	Set is the wrong word here; ensureDynamicLDSAlign()?
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5585	This should check if the allocated size of GV->getValueType() is 0, not special case 0 sized arrays
5589	Should be the alignment of the aggregate itself, not the element type

For the GlobalISel part, you'll need D84638, and the global lowering should introduce the intrinsic, not the machine pseudo

arsenm added inline comments. Aug 7 2020, 2:01 PM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
34	This also needs to be added to the MachineFunctionInfo serialization
69	I think having this here actually breaks the calculation of the total size for all of the statically known globals

Add GlobalISel and MIR support.

hliao marked an inline comment as done. Aug 9 2020, 11:15 PM

hliao added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
68	To report LDS usage accurately, the final LDSSize needs to count the padding due to dynamic shared memory alignment. Now, the re-alignment of LDS is done explicitly before final instruction selection.
69	The re-alignment is done explicitly, just once, right before final instruction selection. BTW, `getLDSSize` should only be considered valid after instruction selection.

Harbormaster completed remote builds in B67661: Diff 284255. Aug 9 2020, 11:40 PM

arsenm added inline comments. Aug 10 2020, 12:54 PM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
2808–2811	(On Diff #284255)	I think these should remain distinct queries/fields, not fixed up at an arbitrary point. GlobalISel will miss this, for example. The asm printer would query the kind that accounts for the dynamic padding
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2284	This should not special case 0 sized arrays and should check the allocated size
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
38	This can still just be Align
96	This isn't a set since it does more than set the value. ensureDynamicLDSAlign?
102–103	This shouldn't be a mutation, but return the aligned up size. totalLDSAllocSize()?
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5585	This should not special case 0 sized arrays. This is 0 allocation size
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll
50	The default should be 1

hliao marked an inline comment as done. Aug 10 2020, 1:05 PM

hliao added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
2808–2811	(On Diff #284255)	GlobalISel calls `adjustLDSSizeForDynLDSAlign` similarly before the finalization of ISel.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2284	I tend to be restrictive here to follow how this is used in HIP, where a zero-sized array always has a zero allocated size. If other clients need similar usage with general zero-allocated-size types, we can extend it accordingly.
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
102–103	Don't we need to report the accurate usage of LDS? Doesn't the alignment padding need to be counted in the final LDSSize as well?

arsenm added inline comments. Aug 10 2020, 1:07 PM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
2808–2811 ↗	(On Diff #284255)	Having extra state that needs to be made consistent is bad. It's better to just track the two independent fields and do the roundup when needed at the end
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2284	Types don't mean anything. Any 0 sized globals are getting allocated to the same address. We're just going to miscompile other 0 sized types. We have no reason to treat other 0 sized types differently
2286–2287	Should use the aggregate alignment, not the element

Add preFinalizeLowering so that both DAGISel and GISel share the same path
to adjust LDS size.

hliao marked an inline comment as done. Aug 17 2020, 1:48 PM

hliao added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2286–2287	Propose to add `preFinalizeLowering` before pseudo instruction expansion so that both GISel and DAGISel have the chance to adjust LDS size.

arsenm added inline comments. Aug 17 2020, 2:14 PM

llvm/include/llvm/CodeGen/TargetLowering.h
2794	(On Diff #286135)	The last thing we need is more callbacks called at random points
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2286–2287	It's really cleaner to just have this compute what you care about at the point you care about it. Having a point where this needs to be made consistent is both worse from a serialization perspective, and from an optimization point since theoretically we could spill into the padding later

hliao added inline comments. Aug 17 2020, 2:40 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2286–2287	But, as we discussed, this is a short-term solution until the linker can perform per-kernel LDS resolution. As spilling to LDS is not implemented yet, this short-term solution should be kept as simple as possible and eventually reverted. To pad LDS for the shared memory array, we have to wait until all static LDS objects are allocated. That should be the point where `amdgcn_groupstaticsize` is about to be expanded, i.e., before finalizing lowering.

arsenm added inline comments. Aug 17 2020, 2:44 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2286–2287	Short term workaround or not, the less mutable state the better. I see no advantage to accumulating the allocated + padding size in a single variable vs. keeping the two separate. You have to track both in MFI anyway, and accumulating like this loses information.

Harbormaster completed remote builds in B68671: Diff 286135. Aug 17 2020, 2:57 PM

Remove adjustLDSSizeForDynLDSAlign.

Harbormaster completed remote builds in B68708: Diff 286197. Aug 17 2020, 10:34 PM

arsenm added inline comments. Aug 18 2020, 3:28 PM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
41	Leftover comment: Just before the final selection, LDSSize is adjusted accordingly.
42	This doesn't need to be MaybeAlign, it can just be Align and default to 1
102–103	ensureDynamicLDSAlign(), and don't need to conditionally set it. This also does not need to mutate LDSSize
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir
126	Should be 1

hliao marked 4 inline comments as done. Aug 18 2020, 8:10 PM

Revise per the review comments.

Harbormaster completed remote builds in B68840: Diff 286463. Aug 18 2020, 8:40 PM

Rebase

Harbormaster completed remote builds in B68852: Diff 286479. Aug 18 2020, 11:19 PM

Minor coding style fix.

Harbormaster completed remote builds in B68892: Diff 286555. Aug 19 2020, 7:41 AM

arsenm added inline comments. Aug 19 2020, 8:48 AM

llvm/include/llvm/CodeGen/MIRYamlMapping.h
168–171	Should add parser tests for these cases
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59	This is an independent field and should not be changed here
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
99	This should still not be modifying LDSSize. This is still missing an independent query to give the static + rounded size

hliao added inline comments. Aug 19 2020, 9:10 AM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59	As static LDS allocations and dynamic LDS alignment updates are processed in program order or in reverse program order, we need to collect all static LDS usage and the dynamic LDS alignment. Since we removed the previous single-point adjustment, we need to update LDSSize whenever there is a static LDS allocation or a dynamic LDS alignment update.
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
99	Valid LDS queries should be done after instruction selection. `LDSSize` is ONLY modified within instruction selection through static LDS allocation and dynamic LDS alignment update.

hliao added inline comments. Aug 19 2020, 9:14 AM

llvm/include/llvm/CodeGen/MIRYamlMapping.h
168–171	llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir covers that.

arsenm added inline comments. Aug 19 2020, 9:27 AM

llvm/include/llvm/CodeGen/MIRYamlMapping.h
168–171	It doesn't cover the error cases
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59	OK yes, this is in the right place now. However, it should be where the alignment is updated
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5588	This logic should be moved into allocateLDSGlobal

hliao added inline comments. Aug 19 2020, 9:39 AM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59	As we don't know which one will be processed last, we need to update `LDSSize` in both cases to ensure the correct one is calculated.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5588	The allocation of dynamic LDS is not handled by the compiler. We only collect the alignment.

arsenm added inline comments. Aug 19 2020, 9:49 AM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59	This is actually terrible and we're going to burn extra padding, but I guess it's conservatively correct

Add the invalid case for alignment parsing.

hliao marked an inline comment as done. Aug 19 2020, 9:50 AM

hliao added inline comments.

llvm/include/llvm/CodeGen/MIRYamlMapping.h
168–171	Add an invalid case.

hliao marked an inline comment as done. Aug 19 2020, 9:54 AM

hliao added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
59	By maintaining `StaticLDSSize`, the padding for dynamic LDS is applied only once. However, we need to keep `LDSSize` updated whenever there is a static LDS allocation or a dynamic LDS alignment update.
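The update rule described above can be modeled as follows, assuming static allocations and dynamic alignment updates may arrive in either order. `EagerLDSModel` is a hypothetical, LLVM-free stand-in loosely following the `StaticLDSSize`, `DynLDSAlign` and `LDSSize` fields in this diff; it is not the landed implementation. Because `LDSSize` is recomputed from the two independent fields on every update, the cached value always equals the static size rounded up to the dynamic alignment, whichever update happens last.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to a multiple of Alignment (a nonzero power of two).
uint64_t padTo(uint64_t Value, uint64_t Alignment) {
  return (Value + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical model: StaticLDSSize tracks only static allocations,
// DynLDSAlign tracks the largest dynamic LDS alignment seen, and LDSSize is
// refreshed from both whenever either changes.
struct EagerLDSModel {
  uint64_t StaticLDSSize = 0;
  uint64_t DynLDSAlign = 1; // 1 means no extra padding
  uint64_t LDSSize = 0;     // StaticLDSSize padded to DynLDSAlign

  // Assign an offset to a static global and grow the static size.
  uint64_t allocateStatic(uint64_t Size, uint64_t Alignment) {
    uint64_t Offset = StaticLDSSize = padTo(StaticLDSSize, Alignment);
    StaticLDSSize += Size;
    LDSSize = padTo(StaticLDSSize, DynLDSAlign);
    return Offset;
  }

  // Record the alignment of a zero-sized (dynamic) LDS global.
  void updateDynAlign(uint64_t Alignment) {
    if (Alignment <= DynLDSAlign)
      return; // a smaller alignment never changes the padding
    DynLDSAlign = Alignment;
    LDSSize = padTo(StaticLDSSize, DynLDSAlign);
  }
};
```

Processing the same two static globals (4 bytes at 4-byte alignment, then 10 bytes at 2-byte alignment) and one 16-byte alignment update in either order yields a static size of 14 bytes and a padded `LDSSize` of 16.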

Harbormaster completed remote builds in B68915: Diff 286591. Aug 19 2020, 10:51 AM

Rebase.

Harbormaster completed remote builds in B69034: Diff 286819. Aug 20 2020, 8:19 AM

arsenm added inline comments. Aug 20 2020, 12:55 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2284	Should get global's alignment, not just the type
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5588	This should take the alignment from the global, not just the type alignment
llvm/test/CodeGen/AMDGPU/GlobalISel/hip.extern.shared.array.ll
11	Needs some tests with larger explicit alignments
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-dynlds-align-invalid-case.mir
2–3	This isn't XFAIL, it's run with not and check the error message output
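The alignment rule the reviewers converged on can be sketched like this: an explicit `align` on the global takes precedence, and only otherwise is the ABI alignment of the value type used, mirroring how `DataLayout::getValueOrABITypeAlignment` is called in this diff. Since all zero-sized extern globals share one base, that base must satisfy the strictest of their alignments. `DynGlobal`, `effectiveAlign` and `dynamicRegionAlign` are hypothetical names, and the model ignores everything else the real lowering does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// A zero-sized LDS global: optional explicit alignment plus the ABI
// alignment of its value type (for [0 x i32], that of i32).
struct DynGlobal {
  std::optional<uint64_t> ExplicitAlign;
  uint64_t TypeABIAlign;
};

// Explicit alignment wins; otherwise fall back to the type's ABI alignment.
uint64_t effectiveAlign(const DynGlobal &G) {
  return G.ExplicitAlign.value_or(G.TypeABIAlign);
}

// The shared base must satisfy every dynamic global's alignment.
uint64_t dynamicRegionAlign(const std::vector<DynGlobal> &Globals) {
  uint64_t Result = 1;
  for (const DynGlobal &G : Globals)
    Result = std::max(Result, effectiveAlign(G));
  return Result;
}
```

A test with an unnaturally large explicit alignment, such as `align 16` on an array of 4-byte elements, is exactly the case where using only the type alignment would place the dynamic region too early.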

Check the explicit alignment if any and fix the negative test case.

hliao marked 4 inline comments as done. Aug 20 2020, 1:43 PM

hliao added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2284	Good catch, fixed in the latest revision.

hliao marked an inline comment as done. Aug 20 2020, 2:00 PM

Harbormaster completed remote builds in B69077: Diff 286896. Aug 20 2020, 2:35 PM

arsenm added inline comments. Aug 20 2020, 4:36 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5585	Should probably add a comment explaining this is dynamically allocated or something
llvm/test/CodeGen/AMDGPU/hip.extern.shared.array.ll
4–11	None of these use an unnaturally high alignment

Add comment and revisit the test case.

arsenm accepted this revision. Aug 20 2020, 6:15 PM

This revision is now accepted and ready to land. Aug 20 2020, 6:15 PM

Harbormaster completed remote builds in B69096: Diff 286927. Aug 20 2020, 6:22 PM

Closed by commit rG5257a60ee02e: [amdgpu] Add codegen support for HIP dynamic shared memory. (authored by hliao). Aug 20 2020, 6:29 PM

This revision was automatically updated to reflect the committed changes.

hliao added a commit: rG5257a60ee02e: [amdgpu] Add codegen support for HIP dynamic shared memory..

Revision Contents

Path | Size
llvm/include/llvm/CodeGen/MIRYamlMapping.h | 16 lines
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 19 lines
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h | 18 lines
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp | 21 lines
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 25 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h | 2 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | 32 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/hip.extern.shared.array.ll | 140 lines
llvm/test/CodeGen/AMDGPU/hip.extern.shared.array.ll | 138 lines
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-dynlds-align-invalid-case.mir | 14 lines
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir | 21 lines
llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll | 4 lines

Diff 286934

llvm/include/llvm/CodeGen/MIRYamlMapping.h

@@ 153 lines not shown @@ static StringRef input(StringRef Scalar, void *, MaybeAlign &Alignment) {
     if (n > 0 && !isPowerOf2_64(n))
       return "must be 0 or a power of two";
     Alignment = MaybeAlign(n);
     return StringRef();
   }
   static QuotingType mustQuote(StringRef) { return QuotingType::None; }
 };
 
+template <> struct ScalarTraits<Align> {
+  static void output(const Align &Alignment, void *, llvm::raw_ostream &OS) {
+    OS << Alignment.value();
+  }
+  static StringRef input(StringRef Scalar, void *, Align &Alignment) {
+    unsigned long long N;
+    if (getAsUnsignedInteger(Scalar, 10, N))
+      return "invalid number";
+    if (!isPowerOf2_64(N))
+      return "must be a power of two";
+    Alignment = Align(N);
+    return StringRef();
+  }
+  static QuotingType mustQuote(StringRef) { return QuotingType::None; }
+};
+
 } // end namespace yaml
 } // end namespace llvm
 
 LLVM_YAML_IS_SEQUENCE_VECTOR(llvm::yaml::StringValue)
 LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR(llvm::yaml::FlowStringValue)
 LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR(llvm::yaml::UnsignedValue)
 
 namespace llvm {
@@ 483 lines not shown @@
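The difference between the two YAML mappings in this hunk is easy to miss: the `MaybeAlign` traits accept `0` (meaning no alignment), while the new `Align` traits reject it, because `isPowerOf2_64(0)` is false. The LLVM-free sketch below models the new `input` logic so the accepted and rejected inputs can be checked directly; `parseAlignScalar` is a hypothetical name, and `std::from_chars` stands in for `getAsUnsignedInteger`, whose edge cases may differ.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string>
#include <string_view>
#include <system_error>

// Returns an empty string on success (and sets Alignment), or the error
// message the ScalarTraits<Align>::input logic would report.
std::string parseAlignScalar(std::string_view Scalar, uint64_t &Alignment) {
  uint64_t N = 0;
  const char *First = Scalar.data();
  const char *Last = First + Scalar.size();
  auto [Ptr, Ec] = std::from_chars(First, Last, N, 10);
  if (Ec != std::errc() || Ptr != Last)
    return "invalid number"; // empty, non-numeric, trailing junk, or overflow
  if (N == 0 || (N & (N - 1)) != 0)
    return "must be a power of two"; // 0 is rejected, unlike MaybeAlign
  Alignment = N;
  return "";
}
```

Inputs like `0`, `12` or `8x` are the kind of error cases a negative MIR parser test would exercise.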

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

@@ 2,273 lines not shown @@ if (AS == AMDGPUAS::LOCAL_ADDRESS || AS == AMDGPUAS::REGION_ADDRESS) {
     // TODO: We could emit code to handle the initialization somewhere.
     if (!AMDGPUTargetLowering::hasDefinedInitializer(GV)) {
       const SITargetLowering *TLI = ST.getTargetLowering();
       if (!TLI->shouldUseLDSConstAddress(GV)) {
         MI.getOperand(1).setTargetFlags(SIInstrInfo::MO_ABS32_LO);
         return true; // Leave in place;
       }
 
+      if (AS == AMDGPUAS::LOCAL_ADDRESS && GV->hasExternalLinkage()) {
+        Type *Ty = GV->getValueType();
+        // HIP uses an unsized array `extern __shared__ T s[]` or similar
+        // zero-sized type in other languages to declare the dynamic shared
+        // memory which size is not known at the compile time. They will be
+        // allocated by the runtime and placed directly after the static
+        // allocated ones. They all share the same offset.
+        if (B.getDataLayout().getTypeAllocSize(Ty).isZero()) {
+          // Adjust alignment for that dynamic shared memory array.
+          MFI->setDynLDSAlign(B.getDataLayout(), *cast<GlobalVariable>(GV));
+          LLT S32 = LLT::scalar(32);
+          auto Sz =
+              B.buildIntrinsic(Intrinsic::amdgcn_groupstaticsize, {S32}, false);
+          B.buildIntToPtr(DstReg, Sz);
+          MI.eraseFromParent();
+          return true;
+        }
+      }
+
       B.buildConstant(
           DstReg,
           MFI->allocateLDSGlobal(B.getDataLayout(), *cast<GlobalVariable>(GV)));
       MI.eraseFromParent();
       return true;
     }
 
     const Function &Fn = MF.getFunction();
@@ 2,360 lines not shown @@
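The legalization logic in this hunk can be summarized with an LLVM-free model: an external LDS global whose allocated size is zero, whatever its IR type, gets no offset of its own and is lowered to the kernel's group static size, so every such global aliases the same address. `LDSGlobal` and `assignLDSAddresses` are hypothetical, and resolving the dynamic address only after all static globals are placed is a simplification of the real pipeline, assumed here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal description of an LDS global for this model.
struct LDSGlobal {
  bool ExternalLinkage;
  uint64_t AllocSize; // allocated size of the value type in bytes
  uint64_t Align;     // effective alignment, a power of two
};

uint64_t alignUp(uint64_t V, uint64_t A) { return (V + A - 1) & ~(A - 1); }

// Assign an address to each global. Static globals are packed in order; every
// zero-sized external global is placed at the static size padded to the
// largest dynamic alignment, so they all share one address.
std::vector<uint64_t> assignLDSAddresses(const std::vector<LDSGlobal> &Globals) {
  uint64_t StaticSize = 0, DynAlign = 1;
  std::vector<uint64_t> Addr(Globals.size(), 0);
  std::vector<bool> IsDynamic(Globals.size(), false);
  for (std::size_t I = 0; I < Globals.size(); ++I) {
    const LDSGlobal &G = Globals[I];
    if (G.ExternalLinkage && G.AllocSize == 0) {
      IsDynamic[I] = true; // dynamic LDS: only its alignment is recorded
      if (G.Align > DynAlign)
        DynAlign = G.Align;
      continue;
    }
    Addr[I] = StaticSize = alignUp(StaticSize, G.Align);
    StaticSize += G.AllocSize;
  }
  // The dynamic base is known only after every static global is placed.
  uint64_t DynBase = alignUp(StaticSize, DynAlign);
  for (std::size_t I = 0; I < Globals.size(); ++I)
    if (IsDynamic[I])
      Addr[I] = DynBase;
  return Addr;
}
```

With a 12-byte static global, a 16-byte-aligned dynamic array, a 2-byte static global and a second 4-byte-aligned dynamic array, the static globals land at 0 and 12 while both dynamic arrays share address 16.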

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h

	//===-- AMDGPUMachineFunctionInfo.h -------------------------------- C++ --=//			//===-- AMDGPUMachineFunctionInfo.h -------------------------------- C++ --=//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINEFUNCTION_H			#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINEFUNCTION_H
	#define LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINEFUNCTION_H			#define LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINEFUNCTION_H

				#include "Utils/AMDGPUBaseInfo.h"
	#include "llvm/ADT/DenseMap.h"			#include "llvm/ADT/DenseMap.h"
	#include "llvm/CodeGen/MachineFunction.h"			#include "llvm/CodeGen/MachineFunction.h"
	#include "Utils/AMDGPUBaseInfo.h"			#include "llvm/Support/Alignment.h"

	namespace llvm {			namespace llvm {

	class GCNSubtarget;			class GCNSubtarget;

	class AMDGPUMachineFunction : public MachineFunctionInfo {			class AMDGPUMachineFunction : public MachineFunctionInfo {
	/// A map to keep track of local memory objects and their offsets within the			/// A map to keep track of local memory objects and their offsets within the
	/// local memory space.			/// local memory space.
	SmallDenseMap<const GlobalValue *, unsigned, 4> LocalMemoryObjects;			SmallDenseMap<const GlobalValue *, unsigned, 4> LocalMemoryObjects;

	protected:			protected:
	uint64_t ExplicitKernArgSize = 0; // Cache for this.			uint64_t ExplicitKernArgSize = 0; // Cache for this.
	Align MaxKernArgAlign; // Cache for this.			Align MaxKernArgAlign; // Cache for this.

	/// Number of bytes in the LDS that are being used.			/// Number of bytes in the LDS that are being used.
	unsigned LDSSize = 0;			unsigned LDSSize = 0;
				arsenmUnsubmitted Not Done Reply Inline Actions I don't think this should be mutable arsenm: I don't think this should be mutable

				/// Number of bytes in the LDS allocated statically. This field is only used
				/// in the instruction selector and not part of the machine function info.
				arsenmUnsubmitted Not Done Reply Inline Actions This doesn't need to be MaybeAlign, just Align. Also expand to Dynamic? Also should probably elaborate that this is used for the case where a dynamically sized global is used arsenm: This doesn't need to be MaybeAlign, just Align. Also expand to Dynamic? Also should probably…
				arsenmUnsubmitted Not Done Reply Inline Actions This also needs to be added to the MachineFunctionInfo serialization arsenm: This also needs to be added to the MachineFunctionInfo serialization
				unsigned StaticLDSSize = 0;

				/// Align for dynamic shared memory if any. Dynamic shared memory is
				/// allocated directly after the static one, i.e., LDSSize. Need to pad
				arsenmUnsubmitted Not Done Reply Inline Actions This can still just be Align arsenm: This can still just be Align
				/// LDSSize to ensure that dynamic one is aligned accordingly.
				/// The maximal alignment is updated during IR translation or lowering
				/// stages.
				arsenmUnsubmitted Done Reply Inline Actions Leftover comment: Just before the final selection, LDSSize is adjusted accordingly. arsenm: Leftover comment: Just before the final selection, LDSSize is adjusted accordingly.
				Align DynLDSAlign;
				arsenmUnsubmitted Done Reply Inline Actions This doesn't need to be MaybeAlign, it can just be Align and default to 1 arsenm: This doesn't need to be MaybeAlign, it can just be Align and default to 1

	// State of MODE register, assumed FP mode.			// State of MODE register, assumed FP mode.
	AMDGPU::SIModeRegisterDefaults Mode;			AMDGPU::SIModeRegisterDefaults Mode;

	// Kernels + shaders. i.e. functions called by the driver and not called			// Kernels + shaders. i.e. functions called by the driver and not called
	// by other functions.			// by other functions.
	bool IsEntryFunction = false;			bool IsEntryFunction = false;

	bool NoSignedZerosFPMath = false;			bool NoSignedZerosFPMath = false;

	// Function may be memory bound.			// Function may be memory bound.
	bool MemoryBound = false;			bool MemoryBound = false;

	// Kernel may need limited waves per EU for better performance.			// Kernel may need limited waves per EU for better performance.
	bool WaveLimiter = false;			bool WaveLimiter = false;

	public:			public:
	AMDGPUMachineFunction(const MachineFunction &MF);			AMDGPUMachineFunction(const MachineFunction &MF);

	uint64_t getExplicitKernArgSize() const {			uint64_t getExplicitKernArgSize() const {
	return ExplicitKernArgSize;			return ExplicitKernArgSize;
	}			}

	unsigned getMaxKernArgAlign() const { return MaxKernArgAlign.value(); }			unsigned getMaxKernArgAlign() const { return MaxKernArgAlign.value(); }

	unsigned getLDSSize() const {			unsigned getLDSSize() const {
				arsenmUnsubmitted Not Done Reply Inline Actions Rename to getAllocatedLDSSize()? I think there should be a separate method to get the size plus the roundup to the dynamic alignment arsenm: Rename to getAllocatedLDSSize()? I think there should be a separate method to get the size plus…
				hliaoAuthorUnsubmitted Done Reply Inline Actions To report the LDS usage accurately, the final LDSSize need to count the padding due to dynamic shared memory alignment. Now, the re-alignment on LDS is explicitly done before the final instruction selection. hliao: To report the LDS usage accurately, the final LDSSize need to count the padding due to dynamic…
	return LDSSize;			return LDSSize;
				arsenmUnsubmitted Done Reply Inline Actions I think having this here actually breaks the calculation of the total size for all of the statically known globals arsenm: I think having this here actually breaks the calculation of the total size for all of the…
				hliaoAuthorUnsubmitted Done Reply Inline Actions The re-alignment is done explicitly just before final instruction selection and just once. BTW, `getLDSSize` should be only valid after instruction selection. hliao: The re-alignment is done explicitly just before final instruction selection and just once. BTW…
	}			}

	AMDGPU::SIModeRegisterDefaults getMode() const {			AMDGPU::SIModeRegisterDefaults getMode() const {
	return Mode;			return Mode;
	}			}

	bool isEntryFunction() const {			bool isEntryFunction() const {
	return IsEntryFunction;			return IsEntryFunction;
	}			}

	bool hasNoSignedZerosFPMath() const {			bool hasNoSignedZerosFPMath() const {
	return NoSignedZerosFPMath;			return NoSignedZerosFPMath;
	}			}

	bool isMemoryBound() const {			bool isMemoryBound() const {
	return MemoryBound;			return MemoryBound;
	}			}

	bool needsWaveLimiter() const {			bool needsWaveLimiter() const {
	return WaveLimiter;			return WaveLimiter;
	}			}

	unsigned allocateLDSGlobal(const DataLayout &DL, const GlobalVariable &GV);			unsigned allocateLDSGlobal(const DataLayout &DL, const GlobalVariable &GV);

				Align getDynLDSAlign() const { return DynLDSAlign; }
				arsenmUnsubmitted Not Done Reply Inline Actions Set is the wrong word here; ensureDynamicLDSAlign()? arsenm: Set is the wrong word here; ensureDynamicLDSAlign()?

				void setDynLDSAlign(const DataLayout &DL, const GlobalVariable &GV);
				arsenmUnsubmitted Not Done Reply Inline Actions This isn't a set since it does more than set the value. ensureDynamicLDSAlign? arsenm: This isn't a set since it does more than set the value. ensureDynamicLDSAlign?
	};			};

	}			}
				arsenm (not done): This should still not be modifying LDSSize. This is still missing an independent query to give the static + rounded size.
				hliao (author, done): Valid LDS queries should be done after instruction selection. `LDSSize` is ONLY modified within instruction selection, through static LDS allocation and dynamic LDS alignment updates.
	#endif			#endif
				arsenm (not done): This shouldn't be a mutation; it should return the aligned-up size. totalLDSAllocSize()?
				hliao (author, done): Don't we need to report the accurate usage of LDS? Doesn't the alignment padding need to be counted in the final LDSSize as well?
				arsenm (done): ensureDynamicLDSAlign(), and it doesn't need to be set conditionally. This also does not need to mutate LDSSize.
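One reading of arsenm's suggestion: allocation mutates only the static size, the alignment update is monotonic (`ensureDynamicLDSAlign()`), and the padded total is derived on demand by a const query (`totalLDSAllocSize()`). The sketch below is a self-contained model, not LLVM code: `LDSInfoSketch` and `alignToPow2` are hypothetical names, and plain `uint64_t` values stand in for `llvm::Align`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Alignment (assumed power of two).
static uint64_t alignToPow2(uint64_t Value, uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  return (Value + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical model of the interface suggested in review.
class LDSInfoSketch {
  uint64_t StaticLDSSize = 0; // bytes used by statically allocated LDS
  uint64_t DynLDSAlign = 1;   // strictest alignment seen for dynamic LDS

public:
  // Allocate a static LDS global and return its offset.
  uint64_t allocateStatic(uint64_t Size, uint64_t Alignment) {
    uint64_t Offset = alignToPow2(StaticLDSSize, Alignment);
    StaticLDSSize = Offset + Size;
    return Offset;
  }

  // Only ever raises the alignment; no conditional "set" is needed.
  void ensureDynamicLDSAlign(uint64_t Alignment) {
    DynLDSAlign = std::max(DynLDSAlign, Alignment);
  }

  uint64_t staticLDSSize() const { return StaticLDSSize; }

  // Independent query: static size rounded up so dynamic LDS starts aligned.
  uint64_t totalLDSAllocSize() const {
    return alignToPow2(StaticLDSSize, DynLDSAlign);
  }
};
```

Because the total is computed rather than stored, the order in which static globals and dynamic-LDS alignments are encountered no longer matters, and `StaticLDSSize` is never inflated by padding.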

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp

[… 43 unchanged lines hidden …]	if (!Entry.second)
return Entry.first->second;		return Entry.first->second;

Align Alignment =		Align Alignment =
DL.getValueOrABITypeAlignment(GV.getAlign(), GV.getValueType());		DL.getValueOrABITypeAlignment(GV.getAlign(), GV.getValueType());

/// TODO: We should sort these to minimize wasted space due to alignment		/// TODO: We should sort these to minimize wasted space due to alignment
/// padding. Currently the padding is decided by the first encountered use		/// padding. Currently the padding is decided by the first encountered use
/// during lowering.		/// during lowering.
unsigned Offset = LDSSize = alignTo(LDSSize, Alignment);		unsigned Offset = StaticLDSSize = alignTo(StaticLDSSize, Alignment);

Entry.first->second = Offset;		Entry.first->second = Offset;
LDSSize += DL.getTypeAllocSize(GV.getValueType());		StaticLDSSize += DL.getTypeAllocSize(GV.getValueType());

		// Update the LDS size considering the padding to align the dynamic shared
		// memory.
		LDSSize = alignTo(StaticLDSSize, DynLDSAlign);
		arsenm (not done): This is an independent field and should not be changed here.
		hliao (author, done): Static LDS allocations and dynamic LDS alignment updates may be processed in program order or in reverse, so we need to collect all static LDS usage and the dynamic LDS alignment. Since we removed the previous single-point adjustment, we need to update LDSSize whenever there is a static LDS allocation or a dynamic LDS alignment update.
		arsenm (not done): OK yes, this is in the right place now. However, it should be where the alignment is updated.
		hliao (author, done): As we don't know which one will be processed last, we need to update `LDSSize` in both cases to ensure the correct value is calculated.
		arsenm (not done): This is actually terrible and we're going to burn extra padding, but I guess it's conservatively correct.
		hliao (author, done): By maintaining `StaticLDSSize`, the padding for dynamic LDS is applied only once. However, we still need to update it whenever there is a static LDS allocation or a dynamic LDS alignment update.

return Offset;		return Offset;
}		}

		void AMDGPUMachineFunction::setDynLDSAlign(const DataLayout &DL,
		const GlobalVariable &GV) {
		assert(DL.getTypeAllocSize(GV.getValueType()).isZero());

		Align Alignment =
		DL.getValueOrABITypeAlignment(GV.getAlign(), GV.getValueType());
		if (Alignment <= DynLDSAlign)
		return;

		LDSSize = alignTo(StaticLDSSize, Alignment);
		DynLDSAlign = Alignment;
		}
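hliao's order-independence argument can be checked with a small model of this revision's bookkeeping, in which both `allocateLDSGlobal` and `setDynLDSAlign` refresh `LDSSize`. This is a hypothetical, simplified sketch (sizes and alignments as plain integers, no `DataLayout`, no per-global offset map), not the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Alignment (assumed power of two).
static uint64_t alignToPow2(uint64_t Value, uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  return (Value + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical model: both entry points recompute LDSSize, so the final
// value does not depend on which one runs last.
struct PatchLDSModel {
  uint64_t StaticLDSSize = 0;
  uint64_t DynLDSAlign = 1;
  uint64_t LDSSize = 0;

  uint64_t allocateLDSGlobal(uint64_t Size, uint64_t Alignment) {
    uint64_t Offset = StaticLDSSize = alignToPow2(StaticLDSSize, Alignment);
    StaticLDSSize += Size;
    // Keep the padding in front of the dynamic LDS region up to date.
    LDSSize = alignToPow2(StaticLDSSize, DynLDSAlign);
    return Offset;
  }

  void setDynLDSAlign(uint64_t Alignment) {
    if (Alignment <= DynLDSAlign)
      return;
    LDSSize = alignToPow2(StaticLDSSize, Alignment);
    DynLDSAlign = Alignment;
  }
};
```

Seeing a 67-byte static global before or after an 8-byte dynamic alignment yields the same `LDSSize` of 72 either way, while `StaticLDSSize` stays at 67, so the padding is never compounded.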

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

[… 5,565 unchanged lines hidden …]	buildPCRelGlobalAddress(SelectionDAG &DAG, const GlobalValue *GV,
}		}
return DAG.getNode(AMDGPUISD::PC_ADD_REL_OFFSET, DL, PtrVT, PtrLo, PtrHi);		return DAG.getNode(AMDGPUISD::PC_ADD_REL_OFFSET, DL, PtrVT, PtrLo, PtrHi);
}		}

SDValue SITargetLowering::LowerGlobalAddress(AMDGPUMachineFunction *MFI,		SDValue SITargetLowering::LowerGlobalAddress(AMDGPUMachineFunction *MFI,
SDValue Op,		SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
GlobalAddressSDNode *GSD = cast<GlobalAddressSDNode>(Op);		GlobalAddressSDNode *GSD = cast<GlobalAddressSDNode>(Op);
		SDLoc DL(GSD);
		EVT PtrVT = Op.getValueType();

const GlobalValue *GV = GSD->getGlobal();		const GlobalValue *GV = GSD->getGlobal();
if ((GSD->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS &&		if ((GSD->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS &&
shouldUseLDSConstAddress(GV)) ||		shouldUseLDSConstAddress(GV)) ||
GSD->getAddressSpace() == AMDGPUAS::REGION_ADDRESS ||		GSD->getAddressSpace() == AMDGPUAS::REGION_ADDRESS ||
GSD->getAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS)		GSD->getAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS) {
		if (GSD->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS &&
		GV->hasExternalLinkage()) {
		Type *Ty = GV->getValueType();
		arsenm (not done): Why does it specifically need to be a 0 sized array? I think this would depend purely on the linkage, or treat any 0 sized type the same way.
		hliao (author, done): That's the only syntax accepted under HIP clang. At least, zero-sized types should be checked to ensure developers understand the usage of the dynamic shared memory.
		arsenm (not done): That's a bit too much HIP-specific logic. Also, what does this do if there is more than one? How can these return different addresses?
		// HIP uses an unsized array `extern __shared__ T s[]` or similar
		arsenm (not done): This should check if the allocated size of GV->getValueType() is 0, not special-case 0 sized arrays.
		arsenm (done): This should not special-case 0 sized arrays. This is 0 allocation size.
		arsenm (not done): Should probably add a comment explaining this is dynamically allocated or something.
		// zero-sized type in other languages to declare the dynamic shared
		// memory whose size is not known at compile time. They will be
		// allocated by the runtime and placed directly after the static
		scchan (not done): Don't you need to check whether the static size gives you an offset with the correct alignment?
		hliao (author, done): I remember that the size should always be DWORD aligned. Let me check the code that calculates it.
		arsenm (not done): The global has an explicit alignment that needs to be respected.
		arsenm (not done): This logic should be moved into allocateLDSGlobal.
		hliao (author, done): The allocation of dynamic LDS is not handled by the compiler. We only collect the alignment.
		arsenm (done): This should take the alignment from the global, not just the type alignment.
		// allocations. They all share the same offset.
		arsenm (not done): Should be the alignment of the aggregate itself, not the element type.
		if (DAG.getDataLayout().getTypeAllocSize(Ty).isZero()) {
		assert(PtrVT == MVT::i32 && "32-bit pointer is expected.");
		// Adjust alignment for that dynamic shared memory array.
		MFI->setDynLDSAlign(DAG.getDataLayout(), *cast<GlobalVariable>(GV));
		return SDValue(
		DAG.getMachineNode(AMDGPU::GET_GROUPSTATICSIZE, DL, PtrVT), 0);
		}
		}
return AMDGPUTargetLowering::LowerGlobalAddress(MFI, Op, DAG);		return AMDGPUTargetLowering::LowerGlobalAddress(MFI, Op, DAG);
		}
SDLoc DL(GSD);
EVT PtrVT = Op.getValueType();

if (GSD->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {		if (GSD->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {
SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, GSD->getOffset(),		SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, GSD->getOffset(),
SIInstrInfo::MO_ABS32_LO);		SIInstrInfo::MO_ABS32_LO);
return DAG.getNode(AMDGPUISD::LDS, DL, MVT::i32, GA);		return DAG.getNode(AMDGPUISD::LDS, DL, MVT::i32, GA);
}		}

if (shouldEmitFixup(GV))		if (shouldEmitFixup(GV))
[… 6,072 unchanged lines hidden …]
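The classification arsenm asks for (any external LDS global whose allocation size is zero, not only `[0 x T]` arrays) reduces to a simple predicate. The sketch below models it with hypothetical types (`AddrSpace`, `GlobalDesc`) in place of LLVM's `GlobalValue` and `DataLayout`; in the patch the size test is `getTypeAllocSize(Ty).isZero()`. Every global that satisfies it lowers to the same address (the aligned static LDS size), which is what the code comment means by "They all share the same offset".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the properties LowerGlobalAddress inspects.
enum class AddrSpace { Global, Local, Region, Private };

struct GlobalDesc {
  AddrSpace AS;
  bool HasExternalLinkage;
  uint64_t AllocSize; // would come from DataLayout::getTypeAllocSize
};

// Any external LDS global with zero allocation size is treated as dynamic
// LDS, whether its type is [0 x T], an empty struct, or similar.
bool isDynamicLDSGlobal(const GlobalDesc &G) {
  return G.AS == AddrSpace::Local && G.HasExternalLinkage && G.AllocSize == 0;
}
```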

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

[… 271 unchanged lines hidden …]	static void mapping(IO &YamlIO, SIMode &Mode) {
YamlIO.mapOptional("fp64-fp16-output-denormals", Mode.FP64FP16OutputDenormals, true);		YamlIO.mapOptional("fp64-fp16-output-denormals", Mode.FP64FP16OutputDenormals, true);
}		}
};		};

struct SIMachineFunctionInfo final : public yaml::MachineFunctionInfo {		struct SIMachineFunctionInfo final : public yaml::MachineFunctionInfo {
uint64_t ExplicitKernArgSize = 0;		uint64_t ExplicitKernArgSize = 0;
unsigned MaxKernArgAlign = 0;		unsigned MaxKernArgAlign = 0;
unsigned LDSSize = 0;		unsigned LDSSize = 0;
		Align DynLDSAlign;
bool IsEntryFunction = false;		bool IsEntryFunction = false;
bool NoSignedZerosFPMath = false;		bool NoSignedZerosFPMath = false;
bool MemoryBound = false;		bool MemoryBound = false;
bool WaveLimiter = false;		bool WaveLimiter = false;
bool HasSpilledSGPRs = false;		bool HasSpilledSGPRs = false;
bool HasSpilledVGPRs = false;		bool HasSpilledVGPRs = false;
uint32_t HighBitsOf32BitAddress = 0;		uint32_t HighBitsOf32BitAddress = 0;

[… 13 unchanged lines hidden …]
};		};

template <> struct MappingTraits<SIMachineFunctionInfo> {		template <> struct MappingTraits<SIMachineFunctionInfo> {
static void mapping(IO &YamlIO, SIMachineFunctionInfo &MFI) {		static void mapping(IO &YamlIO, SIMachineFunctionInfo &MFI) {
YamlIO.mapOptional("explicitKernArgSize", MFI.ExplicitKernArgSize,		YamlIO.mapOptional("explicitKernArgSize", MFI.ExplicitKernArgSize,
UINT64_C(0));		UINT64_C(0));
YamlIO.mapOptional("maxKernArgAlign", MFI.MaxKernArgAlign, 0u);		YamlIO.mapOptional("maxKernArgAlign", MFI.MaxKernArgAlign, 0u);
YamlIO.mapOptional("ldsSize", MFI.LDSSize, 0u);		YamlIO.mapOptional("ldsSize", MFI.LDSSize, 0u);
		YamlIO.mapOptional("dynLDSAlign", MFI.DynLDSAlign, Align());
YamlIO.mapOptional("isEntryFunction", MFI.IsEntryFunction, false);		YamlIO.mapOptional("isEntryFunction", MFI.IsEntryFunction, false);
YamlIO.mapOptional("noSignedZerosFPMath", MFI.NoSignedZerosFPMath, false);		YamlIO.mapOptional("noSignedZerosFPMath", MFI.NoSignedZerosFPMath, false);
YamlIO.mapOptional("memoryBound", MFI.MemoryBound, false);		YamlIO.mapOptional("memoryBound", MFI.MemoryBound, false);
YamlIO.mapOptional("waveLimiter", MFI.WaveLimiter, false);		YamlIO.mapOptional("waveLimiter", MFI.WaveLimiter, false);
YamlIO.mapOptional("hasSpilledSGPRs", MFI.HasSpilledSGPRs, false);		YamlIO.mapOptional("hasSpilledSGPRs", MFI.HasSpilledSGPRs, false);
YamlIO.mapOptional("hasSpilledVGPRs", MFI.HasSpilledVGPRs, false);		YamlIO.mapOptional("hasSpilledVGPRs", MFI.HasSpilledVGPRs, false);
YamlIO.mapOptional("scratchRSrcReg", MFI.ScratchRSrcReg,		YamlIO.mapOptional("scratchRSrcReg", MFI.ScratchRSrcReg,
StringValue("$private_rsrc_reg"));		StringValue("$private_rsrc_reg"));
[… 628 unchanged lines hidden …]

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

[… 531 unchanged lines hidden …]	convertArgumentInfo(const AMDGPUFunctionArgInfo &ArgInfo,

if (Any)		if (Any)
return AI;		return AI;

return None;		return None;
}		}

yaml::SIMachineFunctionInfo::SIMachineFunctionInfo(		yaml::SIMachineFunctionInfo::SIMachineFunctionInfo(
const llvm::SIMachineFunctionInfo& MFI,		const llvm::SIMachineFunctionInfo &MFI, const TargetRegisterInfo &TRI)
const TargetRegisterInfo &TRI)
: ExplicitKernArgSize(MFI.getExplicitKernArgSize()),		: ExplicitKernArgSize(MFI.getExplicitKernArgSize()),
MaxKernArgAlign(MFI.getMaxKernArgAlign()),		MaxKernArgAlign(MFI.getMaxKernArgAlign()), LDSSize(MFI.getLDSSize()),
LDSSize(MFI.getLDSSize()),		DynLDSAlign(MFI.getDynLDSAlign()), IsEntryFunction(MFI.isEntryFunction()),
IsEntryFunction(MFI.isEntryFunction()),
NoSignedZerosFPMath(MFI.hasNoSignedZerosFPMath()),		NoSignedZerosFPMath(MFI.hasNoSignedZerosFPMath()),
MemoryBound(MFI.isMemoryBound()),		MemoryBound(MFI.isMemoryBound()), WaveLimiter(MFI.needsWaveLimiter()),
WaveLimiter(MFI.needsWaveLimiter()),
HasSpilledSGPRs(MFI.hasSpilledSGPRs()),		HasSpilledSGPRs(MFI.hasSpilledSGPRs()),
HasSpilledVGPRs(MFI.hasSpilledVGPRs()),		HasSpilledVGPRs(MFI.hasSpilledVGPRs()),
HighBitsOf32BitAddress(MFI.get32BitAddressHighBits()),		HighBitsOf32BitAddress(MFI.get32BitAddressHighBits()),
ScratchRSrcReg(regToString(MFI.getScratchRSrcReg(), TRI)),		ScratchRSrcReg(regToString(MFI.getScratchRSrcReg(), TRI)),
FrameOffsetReg(regToString(MFI.getFrameOffsetReg(), TRI)),		FrameOffsetReg(regToString(MFI.getFrameOffsetReg(), TRI)),
StackPtrOffsetReg(regToString(MFI.getStackPtrOffsetReg(), TRI)),		StackPtrOffsetReg(regToString(MFI.getStackPtrOffsetReg(), TRI)),
ArgInfo(convertArgumentInfo(MFI.getArgInfo(), TRI)),		ArgInfo(convertArgumentInfo(MFI.getArgInfo(), TRI)), Mode(MFI.getMode()) {
Mode(MFI.getMode()) {}		}

void yaml::SIMachineFunctionInfo::mappingImpl(yaml::IO &YamlIO) {		void yaml::SIMachineFunctionInfo::mappingImpl(yaml::IO &YamlIO) {
MappingTraits<SIMachineFunctionInfo>::mapping(YamlIO, *this);		MappingTraits<SIMachineFunctionInfo>::mapping(YamlIO, *this);
}		}

bool SIMachineFunctionInfo::initializeBaseYamlFields(		bool SIMachineFunctionInfo::initializeBaseYamlFields(
const yaml::SIMachineFunctionInfo &YamlMFI) {		const yaml::SIMachineFunctionInfo &YamlMFI) {
ExplicitKernArgSize = YamlMFI.ExplicitKernArgSize;		ExplicitKernArgSize = YamlMFI.ExplicitKernArgSize;
MaxKernArgAlign = assumeAligned(YamlMFI.MaxKernArgAlign);		MaxKernArgAlign = assumeAligned(YamlMFI.MaxKernArgAlign);
LDSSize = YamlMFI.LDSSize;		LDSSize = YamlMFI.LDSSize;
		DynLDSAlign = YamlMFI.DynLDSAlign;
HighBitsOf32BitAddress = YamlMFI.HighBitsOf32BitAddress;		HighBitsOf32BitAddress = YamlMFI.HighBitsOf32BitAddress;
IsEntryFunction = YamlMFI.IsEntryFunction;		IsEntryFunction = YamlMFI.IsEntryFunction;
NoSignedZerosFPMath = YamlMFI.NoSignedZerosFPMath;		NoSignedZerosFPMath = YamlMFI.NoSignedZerosFPMath;
MemoryBound = YamlMFI.MemoryBound;		MemoryBound = YamlMFI.MemoryBound;
WaveLimiter = YamlMFI.WaveLimiter;		WaveLimiter = YamlMFI.WaveLimiter;
HasSpilledSGPRs = YamlMFI.HasSpilledSGPRs;		HasSpilledSGPRs = YamlMFI.HasSpilledSGPRs;
HasSpilledVGPRs = YamlMFI.HasSpilledVGPRs;		HasSpilledVGPRs = YamlMFI.HasSpilledVGPRs;
return false;		return false;
[… 19 unchanged lines hidden …]

llvm/test/CodeGen/AMDGPU/GlobalISel/hip.extern.shared.array.ll

This file was added.

				; RUN: llc -global-isel -mtriple=amdgcn--amdhsa -mcpu=gfx900 -verify-machineinstrs -o - %s | FileCheck %s

				@lds0 = addrspace(3) global [512 x float] undef
				@lds1 = addrspace(3) global [256 x float] undef
				@lds2 = addrspace(3) global [4096 x float] undef
				@lds3 = addrspace(3) global [67 x i8] undef

				@dynamic_shared0 = external addrspace(3) global [0 x float]
				@dynamic_shared1 = external addrspace(3) global [0 x double]
				@dynamic_shared2 = external addrspace(3) global [0 x double], align 4
				@dynamic_shared3 = external addrspace(3) global [0 x double], align 16
				arsenm (done): Needs some tests with larger explicit alignments.

				; CHECK-LABEL: {{^}}dynamic_shared_array_0:
				; CHECK: v_add_u32_e32 v{{[0-9]+}}, 0x800, v{{[0-9]+}}
				define amdgpu_kernel void @dynamic_shared_array_0(float addrspace(1)* %out) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%arrayidx0 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds0, i32 0, i32 %tid.x
				%val0 = load float, float addrspace(3)* %arrayidx0, align 4
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val0, float addrspace(3)* %arrayidx1, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}dynamic_shared_array_1:
				; CHECK: v_lshlrev_b32_e32 {{v[0-9]+}}, 2, {{v[0-9]+}}
				; CHECK: v_lshlrev_b32_e32 {{v[0-9]+}}, 2, {{v[0-9]+}}
				; CHECK: v_lshlrev_b32_e32 [[IDX:v[0-9]+]], 2, {{v[0-9]+}}
				; CHECK: v_add_u32_e32 {{v[0-9]+}}, 0xc00, [[IDX]]
				define amdgpu_kernel void @dynamic_shared_array_1(float addrspace(1)* %out, i32 %cond) {
				entry:
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%idx.0 = add nsw i32 %tid.x, 64
				%tmp = icmp eq i32 %cond, 0
				br i1 %tmp, label %if, label %else

				if: ; preds = %entry
				%arrayidx0 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds0, i32 0, i32 %idx.0
				%val0 = load float, float addrspace(3)* %arrayidx0, align 4
				br label %endif

				else: ; preds = %entry
				%arrayidx1 = getelementptr inbounds [256 x float], [256 x float] addrspace(3)* @lds1, i32 0, i32 %idx.0
				%val1 = load float, float addrspace(3)* %arrayidx1, align 4
				br label %endif

				endif: ; preds = %else, %if
				%val = phi float [ %val0, %if ], [ %val1, %else ]
				%arrayidx = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val, float addrspace(3)* %arrayidx, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}dynamic_shared_array_2:
				; CHECK: v_lshlrev_b32_e32 [[IDX:v[0-9]+]], 2, {{v[0-9]+}}
				; CHECK: v_add_u32_e32 {{v[0-9]+}}, 0x4000, [[IDX]]
				define amdgpu_kernel void @dynamic_shared_array_2(i32 %idx) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%vidx = add i32 %tid.x, %idx
				%arrayidx0 = getelementptr inbounds [4096 x float], [4096 x float] addrspace(3)* @lds2, i32 0, i32 %vidx
				%val0 = load float, float addrspace(3)* %arrayidx0, align 4
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val0, float addrspace(3)* %arrayidx1, align 4
				ret void
				}

				; The offset to the dynamic shared memory array should be aligned on the type
				; specified.
				; CHECK-LABEL: {{^}}dynamic_shared_array_3:
				; CHECK: v_lshlrev_b32_e32 [[IDX:v[0-9]+]], 2, {{v[0-9]+}}
				; CHECK: v_add_u32_e32 {{v[0-9]+}}, 0x44, [[IDX]]
				define amdgpu_kernel void @dynamic_shared_array_3(i32 %idx) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%vidx = add i32 %tid.x, %idx
				%arrayidx0 = getelementptr inbounds [67 x i8], [67 x i8] addrspace(3)* @lds3, i32 0, i32 %vidx
				%val0 = load i8, i8 addrspace(3)* %arrayidx0, align 4
				%val1 = uitofp i8 %val0 to float
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val1, float addrspace(3)* %arrayidx1, align 4
				ret void
				}

				; The offset to the dynamic shared memory array should be aligned on the
				; maximal one.
				; CHECK-LABEL: {{^}}dynamic_shared_array_4:
				; CHECK: v_mov_b32_e32 [[DYNLDS:v[0-9]+]], 0x48
				; CHECK: v_lshlrev_b32_e32 [[IDX:v[0-9]+]], 2, {{v[0-9]+}}
				; CHECK: v_add_u32_e32 {{v[0-9]+}}, [[DYNLDS]], [[IDX]]
				define amdgpu_kernel void @dynamic_shared_array_4(i32 %idx) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%vidx = add i32 %tid.x, %idx
				%arrayidx0 = getelementptr inbounds [67 x i8], [67 x i8] addrspace(3)* @lds3, i32 0, i32 %vidx
				%val0 = load i8, i8 addrspace(3)* %arrayidx0, align 4
				%val1 = uitofp i8 %val0 to float
				%val2 = uitofp i8 %val0 to double
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val1, float addrspace(3)* %arrayidx1, align 4
				%arrayidx2 = getelementptr inbounds [0 x double], [0 x double] addrspace(3)* @dynamic_shared1, i32 0, i32 %tid.x
				store double %val2, double addrspace(3)* %arrayidx2, align 4
				ret void
				}

				; Honor the explicit alignment from the specified variable.
				; CHECK-LABEL: {{^}}dynamic_shared_array_5:
				; CHECK: v_mov_b32_e32 [[DYNLDS:v[0-9]+]], 0x44
				; CHECK: v_lshlrev_b32_e32 [[IDX:v[0-9]+]], 2, {{v[0-9]+}}
				; CHECK: v_add_u32_e32 {{v[0-9]+}}, [[DYNLDS]], [[IDX]]
				define amdgpu_kernel void @dynamic_shared_array_5(i32 %idx) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%vidx = add i32 %tid.x, %idx
				%arrayidx0 = getelementptr inbounds [67 x i8], [67 x i8] addrspace(3)* @lds3, i32 0, i32 %vidx
				%val0 = load i8, i8 addrspace(3)* %arrayidx0, align 4
				%val1 = uitofp i8 %val0 to float
				%val2 = uitofp i8 %val0 to double
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val1, float addrspace(3)* %arrayidx1, align 4
				%arrayidx2 = getelementptr inbounds [0 x double], [0 x double] addrspace(3)* @dynamic_shared2, i32 0, i32 %tid.x
				store double %val2, double addrspace(3)* %arrayidx2, align 4
				ret void
				}

				; Honor the explicit alignment from the specified variable.
				; CHECK-LABEL: {{^}}dynamic_shared_array_6:
				; CHECK: v_mov_b32_e32 [[DYNLDS:v[0-9]+]], 0x50
				; CHECK: v_lshlrev_b32_e32 [[IDX:v[0-9]+]], 2, {{v[0-9]+}}
				; CHECK: v_add_u32_e32 {{v[0-9]+}}, [[DYNLDS]], [[IDX]]
				define amdgpu_kernel void @dynamic_shared_array_6(i32 %idx) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%vidx = add i32 %tid.x, %idx
				%arrayidx0 = getelementptr inbounds [67 x i8], [67 x i8] addrspace(3)* @lds3, i32 0, i32 %vidx
				%val0 = load i8, i8 addrspace(3)* %arrayidx0, align 4
				%val1 = uitofp i8 %val0 to float
				%val2 = uitofp i8 %val0 to double
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val1, float addrspace(3)* %arrayidx1, align 4
				%arrayidx2 = getelementptr inbounds [0 x double], [0 x double] addrspace(3)* @dynamic_shared3, i32 0, i32 %tid.x
				store double %val2, double addrspace(3)* %arrayidx2, align 4
				ret void
				}

				declare i32 @llvm.amdgcn.workitem.id.x()
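The constants in the CHECK lines follow from rounding each kernel's static LDS usage up to the strictest alignment among the dynamic LDS globals it references: `[512 x float]` is 2048 bytes (0x800), `[512 x float]` plus `[256 x float]` is 3072 (0xc00), `[4096 x float]` is 16384 (0x4000), and the 67-byte `[67 x i8]` rounds to 68 (0x44) for 4-byte alignment, 72 (0x48) for the 8-byte `double` alignment, and 80 (0x50) for the explicit `align 16`. `dynamic_shared_array_5` stays at 0x44 because `@dynamic_shared2` declares its `double` array with an explicit `align 4`. A minimal sketch of that arithmetic, using a hypothetical `dynLDSOffset` helper:

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Alignment (assumed power of two).
static uint64_t alignToPow2(uint64_t Value, uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  return (Value + Alignment - 1) & ~(Alignment - 1);
}

// Offset of the dynamic LDS array: static LDS bytes rounded up to the
// largest alignment required by any dynamic LDS global in the kernel.
uint64_t dynLDSOffset(uint64_t StaticBytes, uint64_t MaxDynAlign) {
  return alignToPow2(StaticBytes, MaxDynAlign);
}
```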

llvm/test/CodeGen/AMDGPU/hip.extern.shared.array.ll

This file was added.

				; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx900 -verify-machineinstrs -o - %s | FileCheck %s

				@lds0 = addrspace(3) global [512 x float] undef
				@lds1 = addrspace(3) global [256 x float] undef
				@lds2 = addrspace(3) global [4096 x float] undef
				@lds3 = addrspace(3) global [67 x i8] undef

				@dynamic_shared0 = external addrspace(3) global [0 x float]
				@dynamic_shared1 = external addrspace(3) global [0 x double]
				@dynamic_shared2 = external addrspace(3) global [0 x double], align 4
				@dynamic_shared3 = external addrspace(3) global [0 x double], align 16
				arsenm (not done): None of these use an unnaturally high alignment.

				; CHECK-LABEL: {{^}}dynamic_shared_array_0:
				; CHECK: v_add_u32_e32 v{{[0-9]+}}, 0x800, v{{[0-9]+}}
				define amdgpu_kernel void @dynamic_shared_array_0(float addrspace(1)* %out) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%arrayidx0 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds0, i32 0, i32 %tid.x
				%val0 = load float, float addrspace(3)* %arrayidx0, align 4
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val0, float addrspace(3)* %arrayidx1, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}dynamic_shared_array_1:
				; CHECK: v_mov_b32_e32 [[DYNLDS:v[0-9]+]], 0xc00
				; CHECK: v_lshl_add_u32 {{v[0-9]+}}, {{v[0-9]+}}, 2, [[DYNLDS]]
				define amdgpu_kernel void @dynamic_shared_array_1(float addrspace(1)* %out, i32 %cond) {
				entry:
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%idx.0 = add nsw i32 %tid.x, 64
				%tmp = icmp eq i32 %cond, 0
				br i1 %tmp, label %if, label %else

				if: ; preds = %entry
				%arrayidx0 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds0, i32 0, i32 %idx.0
				%val0 = load float, float addrspace(3)* %arrayidx0, align 4
				br label %endif

				else: ; preds = %entry
				%arrayidx1 = getelementptr inbounds [256 x float], [256 x float] addrspace(3)* @lds1, i32 0, i32 %idx.0
				%val1 = load float, float addrspace(3)* %arrayidx1, align 4
				br label %endif

				endif: ; preds = %else, %if
				%val = phi float [ %val0, %if ], [ %val1, %else ]
				%arrayidx = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val, float addrspace(3)* %arrayidx, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}dynamic_shared_array_2:
				; CHECK: v_mov_b32_e32 [[DYNLDS:v[0-9]+]], 0x4000
				; CHECK: v_lshl_add_u32 {{v[0-9]+}}, {{v[0-9]+}}, 2, [[DYNLDS]]
				define amdgpu_kernel void @dynamic_shared_array_2(i32 %idx) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%vidx = add i32 %tid.x, %idx
				%arrayidx0 = getelementptr inbounds [4096 x float], [4096 x float] addrspace(3)* @lds2, i32 0, i32 %vidx
				%val0 = load float, float addrspace(3)* %arrayidx0, align 4
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val0, float addrspace(3)* %arrayidx1, align 4
				ret void
				}

				; The offset to the dynamic shared memory array should be aligned on the type
				; specified.
				; CHECK-LABEL: {{^}}dynamic_shared_array_3:
				; CHECK: v_mov_b32_e32 [[DYNLDS:v[0-9]+]], 0x44
				; CHECK: v_lshl_add_u32 {{v[0-9]+}}, {{v[0-9]+}}, 2, [[DYNLDS]]
				define amdgpu_kernel void @dynamic_shared_array_3(i32 %idx) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%vidx = add i32 %tid.x, %idx
				%arrayidx0 = getelementptr inbounds [67 x i8], [67 x i8] addrspace(3)* @lds3, i32 0, i32 %vidx
				%val0 = load i8, i8 addrspace(3)* %arrayidx0, align 4
				%val1 = uitofp i8 %val0 to float
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val1, float addrspace(3)* %arrayidx1, align 4
				ret void
				}

				; The offset to the dynamic shared memory array should be aligned on the
				; maximal one.
				; CHECK-LABEL: {{^}}dynamic_shared_array_4:
				; CHECK: v_mov_b32_e32 [[DYNLDS:v[0-9]+]], 0x48
				; CHECK-DAG: v_lshl_add_u32 {{v[0-9]+}}, {{v[0-9]+}}, 2, [[DYNLDS]]
				; CHECK-DAG: v_lshl_add_u32 {{v[0-9]+}}, {{v[0-9]+}}, 3, [[DYNLDS]]
				define amdgpu_kernel void @dynamic_shared_array_4(i32 %idx) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%vidx = add i32 %tid.x, %idx
				%arrayidx0 = getelementptr inbounds [67 x i8], [67 x i8] addrspace(3)* @lds3, i32 0, i32 %vidx
				%val0 = load i8, i8 addrspace(3)* %arrayidx0, align 4
				%val1 = uitofp i8 %val0 to float
				%val2 = uitofp i8 %val0 to double
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val1, float addrspace(3)* %arrayidx1, align 4
				%arrayidx2 = getelementptr inbounds [0 x double], [0 x double] addrspace(3)* @dynamic_shared1, i32 0, i32 %tid.x
				store double %val2, double addrspace(3)* %arrayidx2, align 4
				ret void
				}

				; Honor the explicit alignment from the specified variable.
				; CHECK-LABEL: {{^}}dynamic_shared_array_5:
				; CHECK: v_mov_b32_e32 [[DYNLDS:v[0-9]+]], 0x44
				; CHECK-DAG: v_lshl_add_u32 {{v[0-9]+}}, {{v[0-9]+}}, 2, [[DYNLDS]]
				; CHECK-DAG: v_lshl_add_u32 {{v[0-9]+}}, {{v[0-9]+}}, 3, [[DYNLDS]]
				define amdgpu_kernel void @dynamic_shared_array_5(i32 %idx) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%vidx = add i32 %tid.x, %idx
				%arrayidx0 = getelementptr inbounds [67 x i8], [67 x i8] addrspace(3)* @lds3, i32 0, i32 %vidx
				%val0 = load i8, i8 addrspace(3)* %arrayidx0, align 4
				%val1 = uitofp i8 %val0 to float
				%val2 = uitofp i8 %val0 to double
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val1, float addrspace(3)* %arrayidx1, align 4
				%arrayidx2 = getelementptr inbounds [0 x double], [0 x double] addrspace(3)* @dynamic_shared2, i32 0, i32 %tid.x
				store double %val2, double addrspace(3)* %arrayidx2, align 4
				ret void
				}

				; Honor the explicit alignment from the specified variable.
				; CHECK-LABEL: {{^}}dynamic_shared_array_6:
				; CHECK: v_mov_b32_e32 [[DYNLDS:v[0-9]+]], 0x50
				; CHECK-DAG: v_lshl_add_u32 {{v[0-9]+}}, {{v[0-9]+}}, 2, [[DYNLDS]]
				; CHECK-DAG: v_lshl_add_u32 {{v[0-9]+}}, {{v[0-9]+}}, 3, [[DYNLDS]]
				define amdgpu_kernel void @dynamic_shared_array_6(i32 %idx) {
				%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%vidx = add i32 %tid.x, %idx
				%arrayidx0 = getelementptr inbounds [67 x i8], [67 x i8] addrspace(3)* @lds3, i32 0, i32 %vidx
				%val0 = load i8, i8 addrspace(3)* %arrayidx0, align 4
				%val1 = uitofp i8 %val0 to float
				%val2 = uitofp i8 %val0 to double
				%arrayidx1 = getelementptr inbounds [0 x float], [0 x float] addrspace(3)* @dynamic_shared0, i32 0, i32 %tid.x
				store float %val1, float addrspace(3)* %arrayidx1, align 4
				%arrayidx2 = getelementptr inbounds [0 x double], [0 x double] addrspace(3)* @dynamic_shared3, i32 0, i32 %tid.x
				store double %val2, double addrspace(3)* %arrayidx2, align 4
				ret void
				}

				declare i32 @llvm.amdgcn.workitem.id.x()

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-dynlds-align-invalid-case.mir

This file was added.

				# RUN: not llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -verify-machineinstrs %s -o - 2>&1 | FileCheck %s

				---
				arsenm (done): This isn't an XFAIL; run it with `not` and check the error message output.
				# CHECK: error: YAML:8:16: must be a power of two

				name: dyn_lds_with_alignment
				machineFunctionInfo:
				  dynLDSAlign: 9

				body: |
				  bb.0:
				    S_ENDPGM 0

				...
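The invalid case feeds `dynLDSAlign: 9` and expects the MIR parser to reject it with "must be a power of two". The check itself is ordinary bit arithmetic.

Below is a minimal sketch of such a validation, not the parser's actual code. The function name and the use of `std::optional<std::string>` for error reporting are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical validator for a dynLDSAlign value read from MIR YAML.
// Returns an error message, or std::nullopt if the value is acceptable.
inline std::optional<std::string> validateDynLDSAlign(uint64_t Value) {
  // A power of two has exactly one bit set; this also rejects 0.
  bool IsPowerOfTwo = Value != 0 && (Value & (Value - 1)) == 0;
  if (!IsPowerOfTwo)
    return std::string("must be a power of two");
  return std::nullopt;
}
```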

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir

# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -verify-machineinstrs %s -o - | FileCheck -check-prefixes=FULL,ALL %s
# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -simplify-mir -verify-machineinstrs %s -o - | FileCheck -check-prefixes=SIMPLE,ALL %s


---
# ALL-LABEL: name: kernel0
# FULL: machineFunctionInfo:
# FULL-NEXT: explicitKernArgSize: 128
# FULL-NEXT: maxKernArgAlign: 64
# FULL-NEXT: ldsSize: 2048
		# FULL-NEXT: dynLDSAlign: 1
# FULL-NEXT: isEntryFunction: true
# FULL-NEXT: noSignedZerosFPMath: false
# FULL-NEXT: memoryBound: true
# FULL-NEXT: waveLimiter: true
# FULL-NEXT: hasSpilledSGPRs: false
# FULL-NEXT: hasSpilledVGPRs: false
# FULL-NEXT: scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'
# FULL-NEXT: frameOffsetReg: '$sgpr12'
(57 unchanged lines not shown)

# FIXME: Should be able to not print section for simple
---
# ALL-LABEL: name: no_mfi
# FULL: machineFunctionInfo:
# FULL-NEXT: explicitKernArgSize: 0
# FULL-NEXT: maxKernArgAlign: 1
# FULL-NEXT: ldsSize: 0
		# FULL-NEXT: dynLDSAlign: 1
# FULL-NEXT: isEntryFunction: false
# FULL-NEXT: noSignedZerosFPMath: false
# FULL-NEXT: memoryBound: false
# FULL-NEXT: waveLimiter: false
# FULL-NEXT: hasSpilledSGPRs: false
# FULL-NEXT: hasSpilledVGPRs: false
# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'
# FULL-NEXT: frameOffsetReg: '$fp_reg'
(24 unchanged lines not shown)
...

---
# ALL-LABEL: name: empty_mfi
# FULL: machineFunctionInfo:
# FULL-NEXT: explicitKernArgSize: 0
# FULL-NEXT: maxKernArgAlign: 1
# FULL-NEXT: ldsSize: 0
		# FULL-NEXT: dynLDSAlign: 1
		arsenm: Should be 1
# FULL-NEXT: isEntryFunction: false
# FULL-NEXT: noSignedZerosFPMath: false
# FULL-NEXT: memoryBound: false
# FULL-NEXT: waveLimiter: false
# FULL-NEXT: hasSpilledSGPRs: false
# FULL-NEXT: hasSpilledVGPRs: false
# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'
# FULL-NEXT: frameOffsetReg: '$fp_reg'
(25 unchanged lines not shown)
...

---
# ALL-LABEL: name: empty_mfi_entry_func
# FULL: machineFunctionInfo:
# FULL-NEXT: explicitKernArgSize: 0
# FULL-NEXT: maxKernArgAlign: 1
# FULL-NEXT: ldsSize: 0
		# FULL-NEXT: dynLDSAlign: 1
# FULL-NEXT: isEntryFunction: true
# FULL-NEXT: noSignedZerosFPMath: false
# FULL-NEXT: memoryBound: false
# FULL-NEXT: waveLimiter: false
# FULL-NEXT: hasSpilledSGPRs: false
# FULL-NEXT: hasSpilledVGPRs: false
# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'
# FULL-NEXT: frameOffsetReg: '$fp_reg'
(107 unchanged lines not shown; enclosing context: machineFunctionInfo:)
  hasSpilledSGPRs: true
  hasSpilledVGPRs: true

body: |
  bb.0:
    S_ENDPGM 0

...

		---
		# ALL-LABEL: name: dyn_lds_with_alignment

		# FULL: ldsSize: 0
		# FULL-NEXT: dynLDSAlign: 8

		# SIMPLE: dynLDSAlign: 8
		name: dyn_lds_with_alignment
		machineFunctionInfo:
		  dynLDSAlign: 8

		body: |
		  bb.0:
		    S_ENDPGM 0

		...
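Every function without dynamic LDS serializes `dynLDSAlign: 1`, while `dyn_lds_with_alignment` round-trips an explicit 8. One way to picture the field is as a value that:
- starts at 1, the default requested in review;
- grows to the largest alignment among the dynamic shared arrays a function references.

The struct below is a hypothetical model of that behaviour. It is not the actual `SIMachineFunctionInfo` or `AMDGPUMachineFunction` code, and its member names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of per-function dynamic LDS alignment bookkeeping.
struct DynLDSAlignModel {
  uint64_t DynLDSAlign = 1; // default: no extra alignment requirement

  // Record a dynamic shared array referenced by the function; the function's
  // requirement becomes the strictest alignment seen so far.
  void noteDynamicSharedArray(uint64_t Align) {
    assert(Align != 0 && (Align & (Align - 1)) == 0 && "Align must be a power of two");
    DynLDSAlign = std::max(DynLDSAlign, Align);
  }
};
```

Under this model, a function that touches no dynamic shared array keeps `DynLDSAlign` at 1. That is what the `dynLDSAlign: 1` CHECK lines expect.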

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll

; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -stop-after finalize-isel -o %t.mir %s
; RUN: llc -run-pass=none -verify-machineinstrs %t.mir -o - | FileCheck %s

; Test that SIMachineFunctionInfo can be round trip serialized through
; MIR.

@lds = addrspace(3) global [512 x float] undef, align 4

; CHECK-LABEL: {{^}}name: kernel
; CHECK: machineFunctionInfo:
; CHECK-NEXT: explicitKernArgSize: 128
; CHECK-NEXT: maxKernArgAlign: 64
; CHECK-NEXT: ldsSize: 0
		; CHECK-NEXT: dynLDSAlign: 1
; CHECK-NEXT: isEntryFunction: true
; CHECK-NEXT: noSignedZerosFPMath: false
; CHECK-NEXT: memoryBound: false
; CHECK-NEXT: waveLimiter: false
; CHECK-NEXT: hasSpilledSGPRs: false
; CHECK-NEXT: hasSpilledVGPRs: false
; CHECK-NEXT: scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
; CHECK-NEXT: frameOffsetReg: '$fp_reg'
(19 unchanged lines not shown; enclosing context: define amdgpu_kernel void @kernel(i32 %arg0, i64 %arg1, <16 x i32> %arg2) {)
ret void
}

; CHECK-LABEL: {{^}}name: ps_shader
; CHECK: machineFunctionInfo:
; CHECK-NEXT: explicitKernArgSize: 0
; CHECK-NEXT: maxKernArgAlign: 1
; CHECK-NEXT: ldsSize: 0
		; CHECK-NEXT: dynLDSAlign: 1
		arsenm: The default should be 1
; CHECK-NEXT: isEntryFunction: true
; CHECK-NEXT: noSignedZerosFPMath: false
; CHECK-NEXT: memoryBound: false
; CHECK-NEXT: waveLimiter: false
; CHECK-NEXT: hasSpilledSGPRs: false
; CHECK-NEXT: hasSpilledVGPRs: false
; CHECK-NEXT: scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
; CHECK-NEXT: frameOffsetReg: '$fp_reg'
(14 unchanged lines not shown; enclosing context: define amdgpu_ps void @ps_shader(i32 %arg0, i32 inreg %arg1) {)
ret void
}

; CHECK-LABEL: {{^}}name: function
; CHECK: machineFunctionInfo:
; CHECK-NEXT: explicitKernArgSize: 0
; CHECK-NEXT: maxKernArgAlign: 1
; CHECK-NEXT: ldsSize: 0
		; CHECK-NEXT: dynLDSAlign: 1
; CHECK-NEXT: isEntryFunction: false
; CHECK-NEXT: noSignedZerosFPMath: false
; CHECK-NEXT: memoryBound: false
; CHECK-NEXT: waveLimiter: false
; CHECK-NEXT: hasSpilledSGPRs: false
; CHECK-NEXT: hasSpilledVGPRs: false
; CHECK-NEXT: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
; CHECK-NEXT: frameOffsetReg: '$sgpr33'
(13 unchanged lines not shown; enclosing context: define void @function() {)
ret void
}

; CHECK-LABEL: {{^}}name: function_nsz
; CHECK: machineFunctionInfo:
; CHECK-NEXT: explicitKernArgSize: 0
; CHECK-NEXT: maxKernArgAlign: 1
; CHECK-NEXT: ldsSize: 0
		; CHECK-NEXT: dynLDSAlign: 1
; CHECK-NEXT: isEntryFunction: false
; CHECK-NEXT: noSignedZerosFPMath: true
; CHECK-NEXT: memoryBound: false
; CHECK-NEXT: waveLimiter: false
; CHECK-NEXT: hasSpilledSGPRs: false
; CHECK-NEXT: hasSpilledVGPRs: false
; CHECK-NEXT: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
; CHECK-NEXT: frameOffsetReg: '$sgpr33'
(64 unchanged lines not shown)


[amdgpu] Add codegen support for HIP dynamic shared memory. (Closed, Public)


Diff 286934

llvm/include/llvm/CodeGen/MIRYamlMapping.h

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/hip.extern.shared.array.ll

llvm/test/CodeGen/AMDGPU/hip.extern.shared.array.ll

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-dynlds-align-invalid-case.mir

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll
