This is an archive of the discontinued LLVM Phabricator instance.

Scalarization for global uniform loads
Closed, Public

Authored by alex-t on Nov 21 2016, 8:55 AM.


Details

Reviewers

rampitec
nhaustov
tstellarAMD
vpykhtin
arsenm

Commits

rG18009560c59d: [AMDGPU] Scalarization of global uniform loads.
rL289076: [AMDGPU] Scalarization of global uniform loads.

Summary

LLC can currently select a scalar load for a uniform memory access only if the access is to the read-only memory address space. This restriction originates from the fact that, in hardware prior to VI, the vector and scalar caches are not coherent. With MemoryDependenceAnalysis we can check that the memory location corresponding to the memory operand of the LOAD is not clobbered along any path from the function entry.
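To make the summary's condition concrete, here is a minimal sketch of a "not clobbered on any path from the entry" check over a toy CFG. `Block`, `isClobberedOnSomePath`, and `canUseScalarLoad` are hypothetical names, not code from the patch: the patch queries MemoryDependenceAnalysis, whereas this model simply compares location names and assumes the load is the first instruction of its block.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy CFG block (hypothetical model, not an LLVM class).
struct Block {
  std::vector<int> Preds;          // indices of predecessor blocks
  std::vector<std::string> Writes; // memory locations stored to in this block
};

// Returns true if Loc may be written on some path from the function entry to
// the block containing the load. The load is assumed to be the first
// instruction of LoadBB, and any write to the same named location counts as
// a clobber (a stand-in for a real alias query).
bool isClobberedOnSomePath(const std::vector<Block> &CFG, int LoadBB,
                           const std::string &Loc) {
  assert(LoadBB >= 0 && LoadBB < static_cast<int>(CFG.size()));
  std::vector<int> Worklist(CFG[LoadBB].Preds);
  std::unordered_set<int> Visited;
  while (!Worklist.empty()) {
    int BB = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(BB).second)
      continue; // already explored; this also terminates on loops
    for (const std::string &W : CFG[BB].Writes)
      if (W == Loc)
        return true;
    for (int P : CFG[BB].Preds)
      Worklist.push_back(P);
  }
  return false;
}

// A uniform address whose location is never clobbered before the load is a
// candidate for a scalar (s_load) instruction.
bool canUseScalarLoad(bool PtrIsUniform, const std::vector<Block> &CFG,
                      int LoadBB, const std::string &Loc) {
  return PtrIsUniform && !isClobberedOnSomePath(CFG, LoadBB, Loc);
}
```

In a diamond CFG where only one arm stores to `out`, a load of `in` at the join block stays scalarizable while a load of `out` does not.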

Diff Detail

Event Timeline

alex-t updated this revision to Diff 78729. Nov 21 2016, 8:55 AM

alex-t retitled this revision to Scalarization for global uniform loads.

alex-t updated this object.

Herald added a reviewer: tstellarAMD. Nov 21 2016, 8:55 AM

Herald added subscribers: nhaehnle, arsenm.

alex-t added reviewers: rampitec, nhaustov, vpykhtin, arsenm. Nov 21 2016, 8:56 AM

Related patch that takes a slightly different approach: https://reviews.llvm.org/D19493

This is not valid to run on just any function: the value must not be written to memory anywhere on the paths starting from the kernel entry. We currently inline everything, but once we have calls, this will be an error.

rampitec added inline comments. Nov 21 2016, 10:50 AM

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
94	What is wrong with Def?
lib/Target/AMDGPU/SIISelLowering.cpp
535	I see it already exists in SITargetLowering::isMemOpUniform(), but we need a better way to identify a kernarg than UndefValue, especially because the very same logic will later be used for users' non-inlined functions.
536	I do not follow the logic. Why are GlobalValue and Constant pointers always no-clobber?
lib/Target/AMDGPU/SMInstructions.td
226	It also has to be uniform.
test/CodeGen/AMDGPU/global_smrd.ll
1	Please add -verify-machineinstrs
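The note that the global case "also has to be uniform" can be made concrete with a plain C++ model of the smrd_load predicate from Diff 78729 (shown further down). `AddrSpace`, `LoadInfo`, `smrdLoadAsWritten`, and `smrdLoadIntended` are hypothetical names, and the boolean fields only stand in for the real SITargetLowering queries. The model also makes visible that, because `&&` binds tighter than `||`, the alignment check in that version guards only the constant-address-space branch.

```cpp
#include <cassert>

// Hypothetical stand-ins for what the smrd_load PatFrag inspects.
enum class AddrSpace { Global, Constant, Flat };

struct LoadInfo {
  unsigned Alignment;
  AddrSpace AS;
  bool Uniform;   // models isMemOpUniform(N)
  bool NoClobber; // models isMemOpHasNoClobberedMemOperand(N)
};

// Same truth table as the Diff 78729 predicate. The explicit parentheses
// show how that expression groups: neither the alignment check nor
// uniformity guards the global branch.
bool smrdLoadAsWritten(const LoadInfo &L) {
  return (L.Alignment >= 4 && (L.AS == AddrSpace::Constant && L.Uniform)) ||
         (L.AS == AddrSpace::Global && L.NoClobber);
}

// One reading of the requested fix: alignment and uniformity are required
// for both address spaces, and global loads additionally need "no clobber".
bool smrdLoadIntended(const LoadInfo &L) {
  if (L.Alignment < 4 || !L.Uniform)
    return false;
  return L.AS == AddrSpace::Constant ||
         (L.AS == AddrSpace::Global && L.NoClobber);
}
```

A divergent but unclobbered global load is accepted by the first function and rejected by the second, which is exactly the case the review comment points at.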

Fixes according to the reviewers' requests

Herald added a subscriber: wdng. Nov 22 2016, 7:33 AM

alex-t added inline comments. Nov 22 2016, 7:33 AM

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
94	Since I could not come up with an example where the instruction that defines the pointer may be of vector (non-uniform) kind, I just deleted "isDef()" as you suggested.
lib/Target/AMDGPU/SIISelLowering.cpp
535	I don't understand this either. From my observation, a "kernarg" is always an "llvm::Argument". The code itself was obviously copied from SITargetLowering::isMemOpUniform(). BTW, what check would you suggest specifically for kernel arguments?
536	This is not "logic", just "copy-paste" :)

I prefer the approach of changing the address space of the pointer, rather than adding an additional metadata node that the backend needs to check.

Also, this needs more tests. You can borrow the ones from the patch I mentioned earlier.

In D26917#601398, @tstellarAMD wrote:

Related patch that takes a slightly different approach: https://reviews.llvm.org/D19493

Tom, your patch is cool. The only thing I don't like about it is that you have to change the address space of "not-clobberable" pointers. I cannot take into account all possible passes that may (or may not) rely on the correct (unchanged) address space. Imagine that someone later invents a very cool optimization that is legal for distinctly read-only memory but not for global memory (even if it is not clobbered).

The problem with this patch is that I have to change a huge number of tests.
I looked into several of the failed lit tests.

The reason is as follows:

Most of the tests are intended to be as simple as possible, which is why they don't use a divergent intrinsic unless the test needs it. As a result, they use uniform loads to retrieve the data.
Any arithmetic instruction taking this uniform data as operands becomes uniform as well. Since we use ISel to deduce the scalar/vector form of an operation, most of the instruction flow ends up scalar.

For example, this simple input

%b_ptr = getelementptr <2 x i32>, <2 x i32> addrspace(1)* %in, i32 1
%a = load <2 x i32>, <2 x i32> addrspace(1) * %in
%b = load <2 x i32>, <2 x i32> addrspace(1) * %b_ptr
%result = and <2 x i32> %a, %b
store <2 x i32> %result, <2 x i32> addrspace(1)* %out

produces a mostly scalar flow:

s_load_dwordx2 s[2:3], s[4:5], 0x8
s_load_dwordx2 s[0:1], s[4:5], 0x0
s_nop 0
s_waitcnt lgkmcnt(0)
s_load_dwordx2 s[4:5], s[2:3], 0x0
v_mov_b32_e32 v3, s1
s_load_dwordx2 s[2:3], s[2:3], 0x8
v_mov_b32_e32 v2, s0
s_waitcnt lgkmcnt(0)
s_and_b32 s3, s5, s3
s_and_b32 s2, s4, s2
v_mov_b32_e32 v0, s2
v_mov_b32_e32 v1, s3
flat_store_dwordx2 v[2:3], v[0:1]
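The propagation visible in this output (uniform loads make the `and` uniform, so the scalar form is chosen) can be illustrated with a toy model. `Node`, `isUniform`, and `selectOpcode` are hypothetical; real selection relies on DivergenceAnalysis and SelectionDAG patterns, not on opcode-name prefixes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy expression node (hypothetical). Leaves model sources such as a uniform
// scalar load (LeafUniform = true) or a work-item id (LeafUniform = false).
struct Node {
  std::string Op;                     // e.g. "and", "load", "tid"
  std::vector<const Node *> Operands; // empty for leaves
  bool LeafUniform;                   // only consulted for leaves
};

// A non-leaf value is uniform iff every operand is uniform.
bool isUniform(const Node &N) {
  if (N.Operands.empty())
    return N.LeafUniform;
  for (const Node *Op : N.Operands) {
    assert(Op && "null operand");
    if (!isUniform(*Op))
      return false;
  }
  return true;
}

// Chooses the scalar (SALU) or vector (VALU) spelling of a 32-bit opcode.
std::string selectOpcode(const Node &N) {
  return (isUniform(N) ? "s_" : "v_") + N.Op + "_b32";
}
```

With two uniform loads as operands the model picks `s_and_b32`, matching the listing above; mixing in one divergent operand flips it to `v_and_b32`.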

Regarding "I prefer the approach of changing the address space of the pointer, rather than adding an additional metadata node that the backend needs to check":

I just meant that the latter adds information to the IR, whereas the former loses information from the IR.

In D26917#602618, @tstellarAMD wrote:

I prefer the approach of changing the address space of the pointer, rather than adding an additional metadata node that the backend needs to check.

The problem with an address space cast from global to constant is that it is against the memory model. We have adopted the HSA memory model, in which constant does not alias with global. In fact, it does not even alias with flat.

This seems right to me, but it shall only run on kernel functions.

lib/Target/AMDGPU/SIISelLowering.cpp
535	Since it already exists let's keep it for now. I was thinking about a PseudoSourceValue.

In D26917#602851, @rampitec wrote:

In D26917#602618, @tstellarAMD wrote:

I prefer the approach of changing the address space of the pointer, rather than adding an additional metadata node that the backend needs to check.

The problem with an address space cast from global to constant is that it is against the memory model. We have adopted the HSA memory model, in which constant does not alias with global. In fact, it does not even alias with flat.

Ok, then I don't have any objections to this approach.

lib/Target/AMDGPU/SIISelLowering.cpp
536	Do we actually need all this code here? Isn't it enough just to check for the metadata?

In D26917#602882, @tstellarAMD wrote:

In D26917#602851, @rampitec wrote:

In D26917#602618, @tstellarAMD wrote:

I prefer the approach of changing the address space of the pointer, rather than adding an additional metadata node that the backend needs to check.

The problem with an address space cast from global to constant is that it is against the memory model. We have adopted the HSA memory model, in which constant does not alias with global. In fact, it does not even alias with flat.

Ok, then I don't have any objections to this approach.

There is one serious drawback in my approach: metadata cannot be set on an Argument. So even a trivial example like "load i32, i32 addrspace(1)* %arg" won't be scalarized. To pass any metadata to ISel I need an Instruction (i.e. a GEP). So I'd have to transform

load i32, i32 addrspace(1)* %arg

to

%gep = getelementptr i32, i32 addrspace(1)* %arg, i32 0
load i32, i32 addrspace(1)* %gep

to set "noclobber" on the GEP.

In D26917#609035, @alex-t wrote:

There is one serious drawback in my approach: metadata cannot be set on an Argument. So even a trivial example like "load i32, i32 addrspace(1)* %arg" won't be scalarized. To pass any metadata to ISel I need an Instruction (i.e. a GEP). So I'd have to transform

load i32, i32 addrspace(1)* %arg

to

%gep = getelementptr i32, i32 addrspace(1)* %arg, i32 0
load i32, i32 addrspace(1)* %gep

to set "noclobber" on GEP.

I do not think this is an issue. We used this with HSAIL for a long time and have seen no problems. Moreover, with call support the same will be needed for the uniformity metadata as well.
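The Argument limitation and the zero-index GEP workaround can be modeled in a few lines of plain C++. The `Value`/`Argument`/`Instruction` hierarchy below is a hypothetical stand-in, not LLVM's classes, and `tagPointer` and `ZeroOffsetGEP` are illustrative names; in LLVM the wrapper would be a real getelementptr and the tag a metadata node attached with Instruction::setMetadata.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Toy IR model (hypothetical; not LLVM's classes). Only instructions carry
// metadata, mirroring the limitation described for llvm::Argument.
struct Value {
  virtual ~Value() = default;
};

struct Argument : Value {};

struct Instruction : Value {
  std::set<std::string> Metadata;
};

// Stands in for "getelementptr i32, i32 addrspace(1)* %arg, i32 0": it
// computes the same address as Base but is an instruction.
struct ZeroOffsetGEP : Instruction {
  explicit ZeroOffsetGEP(Value *Base) : Base(Base) {}
  Value *Base;
};

// Tags the load's pointer with Kind and returns the pointer the load should
// use afterwards. An argument is wrapped in a new zero-offset GEP (owned by
// Pool) so that the tag has an instruction to live on.
Value *tagPointer(Value *Ptr, const std::string &Kind,
                  std::vector<std::unique_ptr<Instruction>> &Pool) {
  assert(Ptr && "null pointer operand");
  auto *I = dynamic_cast<Instruction *>(Ptr);
  if (!I) {
    Pool.push_back(std::make_unique<ZeroOffsetGEP>(Ptr));
    I = Pool.back().get();
  }
  I->Metadata.insert(Kind);
  return I;
}
```

Pointers that are already instructions are tagged in place, so the extra GEP only appears for the argument case discussed here.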

This is an improved implementation of global memory scalarization. It checks whether the memory location is clobbered along the CFG up to the function boundary. This approach is restricted by FunctionPass capabilities and cannot look beyond the current function, so we cannot check accesses to module-level variables made outside the function. That's why the analysis is restricted to kernels only, given that any function call (once implemented) will be considered a clobber for any memory location.
This implementation relies on the existing divergence analysis and does not attempt to improve its results.

Global load scalarization is SWITCHED OFF by default.
To enable it, use the "-amdgpu-scalarize-global-loads=true" llc option.

Further work is planned to improve the current implementation, namely:
Support for a constant expression as the pointer operand.
Caching the results of the DFS along the CFG for clobbering memory accesses, to shorten the search path and improve compile time on large CFGs.
There is currently no DFS depth limit, since I have relevant experience with the HSAIL backend and did not observe a serious compile-time impact even on large source files. If somebody feels it is necessary, I can add a depth limit.
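The caching and depth-limit ideas above could look roughly like the following sketch. It is a toy model, not the patch: `Block`, `ClobberQuery`, and its methods are hypothetical. It walks predecessors breadth-first rather than depth-first (so each block is first reached at its minimal depth), answers conservatively with "clobbered" when the limit is hit, treats any call as a clobber as described above, and caches only whole answers per (block, location) query, because caching per-block partial results on cyclic CFGs needs more care.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy CFG block (hypothetical model). A call clobbers every location.
struct Block {
  std::vector<int> Preds;
  std::vector<std::string> Stores;
  bool HasCall = false;
};

class ClobberQuery {
public:
  ClobberQuery(const std::vector<Block> &CFG, unsigned MaxDepth)
      : CFG(CFG), MaxDepth(MaxDepth) {}

  // True if Loc may be clobbered on some path reaching the start of BB.
  bool mayBeClobberedBefore(int BB, const std::string &Loc) {
    assert(BB >= 0 && BB < static_cast<int>(CFG.size()));
    std::string Key = std::to_string(BB) + ":" + Loc;
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // repeated query: no new search
    bool Result = search(BB, Loc);
    Cache.emplace(Key, Result);
    return Result;
  }

  unsigned numSearches() const { return NumSearches; }

private:
  // Breadth-first walk over predecessors; Depth counts edges from BB.
  bool search(int BB, const std::string &Loc) {
    ++NumSearches;
    std::deque<std::pair<int, unsigned>> Queue;
    std::vector<bool> Seen(CFG.size(), false);
    for (int P : CFG[BB].Preds)
      Queue.emplace_back(P, 1u);
    while (!Queue.empty()) {
      int Cur = Queue.front().first;
      unsigned Depth = Queue.front().second;
      Queue.pop_front();
      if (Seen[Cur])
        continue;
      Seen[Cur] = true;
      if (Depth > MaxDepth)
        return true; // search budget exhausted: assume clobbered
      const Block &B = CFG[Cur];
      if (B.HasCall)
        return true;
      for (const std::string &S : B.Stores)
        if (S == Loc)
          return true;
      for (int P : B.Preds)
        Queue.emplace_back(P, Depth + 1);
    }
    return false;
  }

  const std::vector<Block> &CFG;
  unsigned MaxDepth;
  std::unordered_map<std::string, bool> Cache;
  unsigned NumSearches = 0;
};
```

The conservative answer on hitting the limit only costs a missed scalarization, never a miscompile, which is why a depth limit is safe to add if compile time ever becomes a concern.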

rampitec added inline comments. Dec 2 2016, 2:27 PM

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
24	Includes should be alphabetically sorted.
76	Could you please move brace in line with the for loop or remove braces?
90	Could you please capitalize names "checklist" and "load"? Same for other variables.
93	Inconsistent indent.
94	It would be nice to have consistent spacing around "*".
99	I would suggest checking for kernel only once in runOnFunction.
104	I suppose this condition can never happen, as you have replaced all uses of this Ptr below in the else block.
107	Can you avoid const_cast and use const_iterator?
110	Inconsistent indent.

rampitec added inline comments. Dec 2 2016, 2:33 PM

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
104	Please disregard this comment, the code is indeed reachable.

Style fixed

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
24	Which symbol of the full file path should be used as a sorting key?
107	As you probably know, there is no consistent strategy regarding "const" in LLVM. MemoryDependenceAnalysis::getSimplePointerDependencyFrom, as well as other MDA interface methods, accepts a non-const iterator even though it does not change anything. That's why the only way is to make all the parameters and methods in the whole call stack non-const. I removed the "const" modifier everywhere along the code, so there are no const_casts any longer. I also changed getSimplePointerDependencyFrom to getPointerDependencyFrom, which queries invariant.load metadata and thus potentially provides better alias granularity.

rampitec added inline comments. Dec 5 2016, 9:49 AM

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
24	You did it right now ;) The key is a full string.
112	It is really better to just return right here if it is not a kernel.

alex-t added inline comments. Dec 5 2016, 11:01 AM

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
112	Really? What about loads from read-only memory? Aren't they still valid even in non-kernel functions?

rampitec added inline comments. Dec 5 2016, 11:08 AM

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
112	OK, I see your point. But then check for isKernelFunc first in the condition before doing the expensive DFS.
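The outcome of this thread (keep recording uniformity in every function, attempt the noclobber query only in kernels, and test the kernel condition first) can be sketched as below. `Annotation` and `annotateLoad` are hypothetical names; the clobber search is passed in as a callback so the example can count how often the expensive part runs.

```cpp
#include <cassert>
#include <functional>

// Hypothetical annotation result for one load.
struct Annotation {
  bool Uniform = false;   // stands in for amdgpu.uniform
  bool NoClobber = false; // stands in for amdgpu.noclobber
};

// Uniformity is recorded in every function, since it still matters for
// loads from read-only memory in non-kernels. The noclobber query is
// kernel-only, and testing IsKernel first lets && skip the search entirely.
Annotation annotateLoad(bool PtrIsUniform, bool IsKernel,
                        const std::function<bool()> &IsClobbered) {
  Annotation A;
  if (!PtrIsUniform)
    return A;
  A.Uniform = true;
  A.NoClobber = IsKernel && !IsClobbered();
  return A;
}
```

Putting the cheap test on the left of `&&` is what keeps non-kernel functions from paying for the CFG walk at all.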

alex-t updated this revision to Diff 80302. Dec 5 2016, 11:46 AM

alex-t marked an inline comment as done.

LGTM

This revision is now accepted and ready to land. Dec 5 2016, 12:02 PM

arsenm added inline comments. Dec 5 2016, 3:31 PM

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
19–20	Alphabetize
35	*LI
82–87	C++ style comments
lib/Target/AMDGPU/SIISelLowering.cpp
2629	Previous line
test/CodeGen/AMDGPU/global_smrd_cfg.ll
1	(On Diff #80302)	You don't need the -O2 since that's the default. Can you also change the check prefixes to GCN, and also run instnamer (same for the other tests)?

arsenm added inline comments. Dec 5 2016, 3:32 PM

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
88–89	Weird formatting
102–103	else on same line as previous }
105–106	Asterisks to right

alex-t updated this revision to Diff 80390. Dec 6 2016, 2:23 AM

alex-t edited edge metadata.

Closed by commit rL289076: [AMDGPU] Scalarization of global uniform loads. (authored by alex-t). Dec 8 2016, 9:39 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp	18 lines
lib/Target/AMDGPU/SIISelLowering.h	1 line
lib/Target/AMDGPU/SIISelLowering.cpp	27 lines
lib/Target/AMDGPU/SMInstructions.td	6 lines
test/CodeGen/AMDGPU/global_smrd.ll	27 lines

Diff 78729

lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp

...
 /// This pass adds amdgpu.uniform metadata to IR values so this information
 /// can be used during instruction selection.
 //
 //===----------------------------------------------------------------------===//

 #include "AMDGPU.h"
 #include "AMDGPUIntrinsicInfo.h"
 #include "llvm/Analysis/DivergenceAnalysis.h"
+#include "llvm/Analysis/MemoryDependenceAnalysis.h"
 #include "llvm/IR/InstVisitor.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"

 #define DEBUG_TYPE "amdgpu-annotate-uniform"

 using namespace llvm;

 namespace {

 class AMDGPUAnnotateUniformValues : public FunctionPass,
                        public InstVisitor<AMDGPUAnnotateUniformValues> {
   DivergenceAnalysis *DA;
+  MemoryDependenceResults * MDR;

 public:
   static char ID;
   AMDGPUAnnotateUniformValues() :
     FunctionPass(ID) { }
   bool doInitialization(Module &M) override;
   bool runOnFunction(Function &F) override;
   StringRef getPassName() const override {
     return "AMDGPU Annotate Uniform Values";
   }
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.addRequired<DivergenceAnalysis>();
+    AU.addRequired<MemoryDependenceWrapperPass>();
     AU.setPreservesAll();
   }

   void visitBranchInst(BranchInst &I);
   void visitLoadInst(LoadInst &I);

 };

 } // End anonymous namespace

 INITIALIZE_PASS_BEGIN(AMDGPUAnnotateUniformValues, DEBUG_TYPE,
                       "Add AMDGPU uniform metadata", false, false)
 INITIALIZE_PASS_DEPENDENCY(DivergenceAnalysis)
+INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)
 INITIALIZE_PASS_END(AMDGPUAnnotateUniformValues, DEBUG_TYPE,
                     "Add AMDGPU uniform metadata", false, false)

 char AMDGPUAnnotateUniformValues::ID = 0;

 static void setUniformMetadata(Instruction *I) {
   I->setMetadata("amdgpu.uniform", MDNode::get(I->getContext(), {}));
 }

 void AMDGPUAnnotateUniformValues::visitBranchInst(BranchInst &I) {
   if (I.isUnconditional())
     return;

   Value *Cond = I.getCondition();
   if (!DA->isUniform(Cond))
     return;

   setUniformMetadata(I.getParent()->getTerminator());
 }

 void AMDGPUAnnotateUniformValues::visitLoadInst(LoadInst &I) {
   Value *Ptr = I.getPointerOperand();
   if (!DA->isUniform(Ptr))
     return;

-  if (Instruction *PtrI = dyn_cast<Instruction>(Ptr))
+  if (Instruction *PtrI = dyn_cast<Instruction>(Ptr)) {
     setUniformMetadata(PtrI);
+    // TODO: DL->getPointerSize
+    MemDepResult mdr =
+      MDR->getSimplePointerDependencyFrom(MemoryLocation(Ptr),
+      true, BasicBlock::iterator(PtrI), PtrI->getParent(), &I);
+    if (!mdr.isClobber() && !mdr.isDef())
+      PtrI->setMetadata("amdgpu.noclobber", MDNode::get(PtrI->getContext(), {}));
+  }
 }

 bool AMDGPUAnnotateUniformValues::doInitialization(Module &M) {
   return false;
 }

 bool AMDGPUAnnotateUniformValues::runOnFunction(Function &F) {
   if (skipFunction(F))
     return false;

   DA = &getAnalysis<DivergenceAnalysis>();
+  MDR = &getAnalysis<MemoryDependenceWrapperPass>().getMemDep();
   visit(F);

   return true;
 }

 FunctionPass *
 llvm::createAMDGPUAnnotateUniformValues() {
   return new AMDGPUAnnotateUniformValues();
 }

lib/Target/AMDGPU/SIISelLowering.h

...
 public:

   EVT getOptimalMemOpType(uint64_t Size, unsigned DstAlign,
                           unsigned SrcAlign, bool IsMemset,
                           bool ZeroMemset,
                           bool MemcpyStrSrc,
                           MachineFunction &MF) const override;

   bool isMemOpUniform(const SDNode *N) const;
+  bool isMemOpHasNoClobberedMemOperand(const SDNode *N) const;
   bool isNoopAddrSpaceCast(unsigned SrcAS, unsigned DestAS) const override;

   TargetLoweringBase::LegalizeTypeAction
   getPreferredVectorAction(EVT VT) const override;

   bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
                                          Type *Ty) const override;

...

lib/Target/AMDGPU/SIISelLowering.cpp

...
   return AS == AMDGPUAS::GLOBAL_ADDRESS ||
          AS == AMDGPUAS::CONSTANT_ADDRESS;
 }

 bool SITargetLowering::isNoopAddrSpaceCast(unsigned SrcAS,
                                            unsigned DestAS) const {
   return isFlatGlobalAddrSpace(SrcAS) && isFlatGlobalAddrSpace(DestAS);
 }

+bool SITargetLowering::isMemOpHasNoClobberedMemOperand(const SDNode *N) const {
+  const MemSDNode *MemNode = cast<MemSDNode>(N);
+  const Value *Ptr = MemNode->getMemOperand()->getValue();
+
+  // UndefValue means this is a load of a kernel input. These are uniform.
+  // Sometimes LDS instructions have constant pointers.
+  // If Ptr is null, then that means this mem operand contains a
+  // PseudoSourceValue like GOT.
+  if (!Ptr || isa<UndefValue>(Ptr) || isa<Argument>(Ptr) ||
+      isa<Constant>(Ptr) || isa<GlobalValue>(Ptr))
+    return true;
+
+  const Instruction *I = dyn_cast<Instruction>(Ptr);
+  return I && I->getMetadata("amdgpu.noclobber");
+}
+
 bool SITargetLowering::isMemOpUniform(const SDNode *N) const {
   const MemSDNode *MemNode = cast<MemSDNode>(N);
   const Value *Ptr = MemNode->getMemOperand()->getValue();

   // UndefValue means this is a load of a kernel input. These are uniform.
   // Sometimes LDS instructions have constant pointers.
   // If Ptr is null, then that means this mem operand contains a
   // PseudoSourceValue like GOT.
...
     AS = MFI->hasFlatScratchInit() ?
          AMDGPUAS::PRIVATE_ADDRESS : AMDGPUAS::GLOBAL_ADDRESS;

   unsigned NumElements = MemVT.getVectorNumElements();
   switch (AS) {
   case AMDGPUAS::CONSTANT_ADDRESS:
     if (isMemOpUniform(Load))
       return SDValue();
     // Non-uniform loads will be selected to MUBUF instructions, so they
-    // have the same legalization requires ments as global and private
+    // have the same legalization requirements as global and private
     // loads.
     //
     LLVM_FALLTHROUGH;
   case AMDGPUAS::GLOBAL_ADDRESS:
+  {
+    if (isMemOpUniform(Load) && isMemOpHasNoClobberedMemOperand(Load))
+      return SDValue();
+    // Non-uniform loads will be selected to MUBUF instructions, so they
+    // have the same legalization requirements as global and private
+    // loads.
+    //
+  }
+    LLVM_FALLTHROUGH;
   case AMDGPUAS::FLAT_ADDRESS:
     if (NumElements > 4)
       return SplitVectorLoad(Op, DAG);
     // v4 loads are supported for private and global memory.
     return SDValue();
   case AMDGPUAS::PRIVATE_ADDRESS: {
     // Depending on the setting of the private_element_size field in the
     // resource descriptor, we can only make private accesses up to a certain
...

lib/Target/AMDGPU/SMInstructions.td

...

 //===----------------------------------------------------------------------===//
 // Scalar Memory Patterns
 //===----------------------------------------------------------------------===//

 def smrd_load : PatFrag <(ops node:$ptr), (load node:$ptr), [{
   auto Ld = cast<LoadSDNode>(N);
   return Ld->getAlignment() >= 4 &&
-    Ld->getAddressSpace() == AMDGPUAS::CONSTANT_ADDRESS &&
-    static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpUniform(N);
+    (Ld->getAddressSpace() == AMDGPUAS::CONSTANT_ADDRESS &&
+    static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpUniform(N)) ||
+    (Ld->getAddressSpace() == AMDGPUAS::GLOBAL_ADDRESS &&
+    static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpHasNoClobberedMemOperand(N));
 }]>;

 def SMRDImm : ComplexPattern<i64, 2, "SelectSMRDImm">;
 def SMRDImm32 : ComplexPattern<i64, 2, "SelectSMRDImm32">;
 def SMRDSgpr : ComplexPattern<i64, 2, "SelectSMRDSgpr">;
 def SMRDBufferImm : ComplexPattern<i32, 1, "SelectSMRDBufferImm">;
 def SMRDBufferImm32 : ComplexPattern<i32, 1, "SelectSMRDBufferImm32">;
 def SMRDBufferSgpr : ComplexPattern<i32, 1, "SelectSMRDBufferSgpr">;
...

test/CodeGen/AMDGPU/global_smrd.ll

This file was added.

; RUN: llc -O2 -mtriple amdgcn--amdhsa -mcpu=fiji < %s | FileCheck %s

; CHECK: s_load_dwordx4
; CHECK-NOT: flat_load_dword

define amdgpu_kernel void @snork(float addrspace(1)* readonly %arg, float addrspace(1)* nocapture %arg1) {
bb:
  %tmp2 = load float, float addrspace(1)* %arg, align 4, !tbaa !8
  %tmp3 = fadd float %tmp2, 0.000000e+00
  %tmp4 = getelementptr inbounds float, float addrspace(1)* %arg, i64 1
  %tmp5 = load float, float addrspace(1)* %tmp4, align 4, !tbaa !8
  %tmp6 = fadd float %tmp3, %tmp5
  %tmp7 = getelementptr inbounds float, float addrspace(1)* %arg, i64 2
  %tmp8 = load float, float addrspace(1)* %tmp7, align 4, !tbaa !8
  %tmp9 = fadd float %tmp6, %tmp8
  %tmp10 = getelementptr inbounds float, float addrspace(1)* %arg, i64 3
  %tmp11 = load float, float addrspace(1)* %tmp10, align 4, !tbaa !8
  %tmp12 = fadd float %tmp9, %tmp11
  %tmp13 = getelementptr inbounds float, float addrspace(1)* %arg1
  store float %tmp12, float addrspace(1)* %tmp13, align 4, !tbaa !8
  ret void
}

!8 = !{!9, !9, i64 0}
!9 = !{!"float", !10, i64 0}
!10 = !{!"omnipotent char", !11, i64 0}
!11 = !{!"Simple C/C++ TBAA"}