This is an archive of the discontinued LLVM Phabricator instance.

Differential D91590

[NVPTX] Efficiently support dynamic index on CUDA kernel aggregate parameters.
Needs Review · Public

Authored by hliao on Nov 16 2020, 11:15 PM.

Details

Reviewers

tra
jlebar

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	370 ms	linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp

Event Timeline

hliao created this revision. Nov 16 2020, 11:15 PM

Herald added projects: Restricted Project, Restricted Project. Nov 16 2020, 11:15 PM

Herald added subscribers: llvm-commits, cfe-commits, arphaman and 4 others.

hliao requested review of this revision. Nov 16 2020, 11:15 PM

This is an experimental, demo-only patch I wrote in my spare time to eliminate the private memory usage in https://godbolt.org/z/EPPn6h. The attachment sample.tar.xz (2 KB) includes both the reference and the new IR, PTX, and SASS (sm_60) output. In the new code, the aggregate argument is loaded with an LDC instruction in SASS instead of MOV, because its address is not static. I don't have sm_60 hardware to verify that. Could you try it on real hardware?

BTW, according to the PTX ISA document, the parameter space is read-only for input parameters and write-only for output parameters. If that's right, even non-kernel functions may require a similar change, since those semantics differ from the language model, in which the argument variable can be modified in the function body.
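To make the trade-off concrete, here is a host-side C++ analogy. It is not NVPTX code and not part of the patch. Reading a dynamically indexed element through a `const` reference stands in for loading it in place from read-only parameter space. Taking the aggregate by value stands in for the current local copy. `Agg`, `readInPlace`, and `readFromCopy` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical aggregate standing in for the kernel argument in the sample.
struct Agg {
  std::array<int, 32> a;
};

// Analogy for the proposed lowering: the element is read in place through a
// reference to read-only storage, so a runtime index needs only an address
// and no private copy is made.
int readInPlace(const Agg &param, std::size_t i) {
  assert(i < param.a.size());
  return param.a[i];
}

// Analogy for the current lowering: the callee gets its own copy (the local
// "alloca"), which the language allows it to modify.
int readFromCopy(Agg local, std::size_t i) { return local.a[i]; }
```

Both functions return the same value. The only difference is whether a private copy of the whole aggregate is materialized first.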

Harbormaster completed remote builds in B79052: Diff 305665. Nov 16 2020, 11:54 PM

In D91590#2398842, @hliao wrote:

This is an experimental, demo-only patch I wrote in my spare time to eliminate the private memory usage in https://godbolt.org/z/EPPn6h. The attachment sample.tar.xz (2 KB) includes both the reference and the new IR, PTX, and SASS (sm_60) output. In the new code, the aggregate argument is loaded with an LDC instruction in SASS instead of MOV, because its address is not static. I don't have sm_60 hardware to verify that. Could you try it on real hardware?

I'll give it a try.

BTW, according to the PTX ISA document, the parameter space is read-only for input parameters and write-only for output parameters. If that's right, even non-kernel functions may require a similar change, since those semantics differ from the language model, in which the argument variable can be modified in the function body.

Regular functions currently handle parameters exactly the same way as kernels: via a copy to a local buffer, which can then be modified (https://godbolt.org/z/W9PY17).
So, if we need to change a parameter, it would have to be done on a local copy.
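The following is a minimal sketch of that fallback, again as a host-side C++ analogy. The incoming argument storage is only ever read, and every modification goes to an explicit local copy. `Agg` and `sumAfterClampingNegatives` are hypothetical.

```cpp
#include <array>
#include <cassert>

// Hypothetical aggregate argument.
struct Agg {
  std::array<int, 4> a;
};

// The argument is only read. All writes go to `local`, a copy made on entry,
// much like the local buffer a modifiable parameter needs.
int sumAfterClampingNegatives(const Agg &param) {
  Agg local = param;
  for (int &v : local.a)
    if (v < 0)
      v = 0;
  int sum = 0;
  for (int v : local.a)
    sum += v;
  return sum;
}
```

The caller's object is left untouched. Only a function that actually writes to its parameter pays for the copy.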

clang/test/CodeGenCUDA/kernel-args.cu, lines 13–14:
Is the idea here to rely on PTX to store the value in param space (so we do actually pass the parameter by value) and to represent it at the IR level as a reference to externally provided storage holding the value? So:
- C++ passes the argument by value.
- IR knows that PTX will store it somewhere in param space and uses `byref`.
- We still generate PTX in which the parameter is passed by value, but now we can access it directly via a reference to the param-space value.
- Presumably, for parameters we do want to modify, we'll need to fall back to having a local copy.

So far so good. However, we may now have a problem distinguishing C++-level arguments passed by value from those passed by reference: they will all look like `byref` at the IR level. That is, unless you rely on `addrspace(101)` to indicate that it's actually a `byval` in disguise. That looks plausible as long as we can guarantee that we never modify it, neither in the current function nor in any of the callees if we pass it by reference. I'm not particularly familiar with the AA machinery. I'd appreciate it if you could elaborate on how you see it all working end-to-end.
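Here is a toy model of the IR-level distinction in question. It uses a simplified argument descriptor, and `PtrArg` and `isByValInDisguise` are hypothetical, not LLVM API. Under this patch, a C++ by-value aggregate appears as a `byref` pointer in address space 101, whereas a C++ reference argument is an ordinary pointer without `byref`. The numbers follow NVPTX's `ADDRESS_SPACE_GENERIC` (0) and `ADDRESS_SPACE_PARAM` (101).

```cpp
#include <cassert>

// NVPTX address-space numbers (ADDRESS_SPACE_GENERIC, ADDRESS_SPACE_PARAM).
constexpr unsigned kGenericAS = 0;
constexpr unsigned kParamAS = 101;

// Hypothetical, simplified view of a pointer argument in IR.
struct PtrArg {
  bool hasByRef;       // carries `byref(<ty>)`
  bool hasByVal;       // carries `byval(<ty>)` (the pre-patch form)
  unsigned addrSpace;  // address space of the pointer type
};

// With this patch, a C++ by-value aggregate lowers to a `byref` pointer into
// param space. A C++ reference argument lowers to a plain pointer.
bool isByValInDisguise(const PtrArg &arg) {
  return arg.hasByRef && arg.addrSpace == kParamAS;
}
```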

I am legit excited about this if we could figure out how to make it work, but I don't have anything to add beyond what tra said.

As mentioned earlier, this is very experimental support. Even though the SASS looks reasonable, it still needs to be verified on real systems. Non-kernel functions appear to share the same path, so we should do something similar for them. The current approach (`nvptx-lower-args`) fixes this in the codegen phase by adding back an alloca to match the parameter-space semantics. Once that alloca is dynamically indexed, SROA won't promote it. Only instcombine eliminates such an alloca, and only when its sole modification is a copy from constant memory. Because instcombine could break certain patterns prepared during codegen preparation, it isn't run in the backend, so that dynamically indexed alloca is never removed.
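The following is a simplified model of the instcombine condition described above. It assumes every use of the alloca has already been classified. `UseKind` and `canForwardToConstantSource` are hypothetical. The model leaves out the further legality checks the real transform performs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of how an alloca is used.
enum class UseKind { Load, Store, CopyFromConstant, CopyFromOther };

// The alloca can be replaced by its source only if the single write to it is
// one copy from constant memory. Reads are fine, even dynamically indexed.
bool canForwardToConstantSource(const std::vector<UseKind> &uses) {
  int constantCopies = 0;
  for (UseKind u : uses) {
    switch (u) {
    case UseKind::Load:
      break;
    case UseKind::CopyFromConstant:
      ++constantCopies;
      break;
    case UseKind::Store:
    case UseKind::CopyFromOther:
      return false;  // any other write keeps the alloca alive
    }
  }
  return constantCopies == 1;
}
```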

clang/test/CodeGenCUDA/kernel-args.cu, lines 13–14:
It does the same thing as `nvptx-lower-args`, but applies it earlier, in the frontend. The upside is that the IR is then optimized by all of the middle-end optimizations. `instcombine` will remove that dynamically indexed `alloca` if it's only modified by a copy from constant memory, and AA teaches the compiler that the parameter space has the property of constantness. Even though we run SROA after `nvptx-lower-args`, we generally don't run `instcombine` in the backend, as it could break certain patterns prepared in the codegen preparation phase.

`byref` (newly added) in LLVM IR is different from by-reference in C++. The latter is translated into a plain pointer, while `byref` in LLVM IR says that the content behind that pointer should not be modified in the function body. So it won't be ambiguous on the IR side.

It's still possible for the backend to do something similar: once a `byval` argument is marked `readonly`, that `alloca` could be skipped.
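The constantness rule can be sketched as a pure function of the address space. The enum values are NVPTX's address-space numbers from `NVPTXBaseInfo.h`. `pointsToConstantAddressSpace` is a hypothetical stand-in for the first check in `NVPTXAAResult::pointsToConstantMemory` in `NVPTXAA.cpp` from this diff.

```cpp
#include <cassert>

// NVPTX address-space numbers, as in NVPTXBaseInfo.h.
enum AddressSpace : unsigned {
  ADDRESS_SPACE_GENERIC = 0,
  ADDRESS_SPACE_GLOBAL = 1,
  ADDRESS_SPACE_SHARED = 3,
  ADDRESS_SPACE_CONST = 4,
  ADDRESS_SPACE_LOCAL = 5,
  ADDRESS_SPACE_PARAM = 101,
};

// Constant memory and (input) parameter memory are never written by the
// function, so loads from them can be treated as loads from constant memory.
bool pointsToConstantAddressSpace(unsigned as) {
  return as == ADDRESS_SPACE_CONST || as == ADDRESS_SPACE_PARAM;
}
```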

hliao mentioned this in D91928: [nvptx] Skip alloca for read-only byval arguments. Nov 21 2020, 11:55 PM

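Revision Contents

Path	Size
clang/lib/CodeGen/TargetInfo.cpp	6 lines
clang/test/CodeGen/nvptx-abi.c	10 lines
clang/test/CodeGenCUDA/kernel-args-alignment.cu	2 lines
clang/test/CodeGenCUDA/kernel-args.cu	8 lines
clang/test/OpenMP/nvptx_unsupported_type_codegen.cpp	4 lines
llvm/lib/Target/NVPTX/CMakeLists.txt	1 line
llvm/lib/Target/NVPTX/NVPTX.h	2 lines
llvm/lib/Target/NVPTX/NVPTXAA.cpp	131 lines
llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp	3 lines
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp	3 lines
llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp	21 lines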

Diff 305665

clang/lib/CodeGen/TargetInfo.cpp

@@ ... @@ if (isAggregateTypeForABI(Ty)) {
     if (getContext().getLangOpts().CUDAIsDevice) {
       if (Ty->isCUDADeviceBuiltinSurfaceType())
         return ABIArgInfo::getDirect(
             CGInfo.getCUDADeviceBuiltinSurfaceDeviceType());
       if (Ty->isCUDADeviceBuiltinTextureType())
         return ABIArgInfo::getDirect(
             CGInfo.getCUDADeviceBuiltinTextureDeviceType());
     }
-    return getNaturalAlignIndirect(Ty, /* byval */ true);
+    return ABIArgInfo::getIndirectAliased(
+        getContext().getTypeAlignInChars(Ty),
+        getContext().getTargetAddressSpace(
+            getLangASFromTargetAS(/*ADDRESS_SPACE_PARAM*/ 101)),
+        false /*Realign*/, nullptr /*Padding*/);
   }

   if (const auto *EIT = Ty->getAs<ExtIntType>()) {
     if ((EIT->getNumBits() > 128) ||
         (!getContext().getTargetInfo().hasInt128Type() &&
          EIT->getNumBits() > 64))
       return getNaturalAlignIndirect(Ty, /* byval */ true);
   }
@@ ... @@
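Below is a condensed model of the NVPTX argument classification after this change. It returns a label instead of an `ABIArgInfo`. `ArgDesc` and `classifyNVPTXArg` are hypothetical. The `default` result stands for the rest of the classification logic, which this hunk does not show.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of an argument type.
struct ArgDesc {
  bool isAggregate;         // isAggregateTypeForABI(Ty)
  bool isSurfaceOrTexture;  // CUDA device builtin surface/texture type
  unsigned extIntBits;      // width of an _ExtInt, or 0 if not an _ExtInt
};

// Mirrors the branches in the hunk above.
std::string classifyNVPTXArg(const ArgDesc &d, bool cudaIsDevice,
                             bool hasInt128) {
  if (d.isAggregate) {
    if (cudaIsDevice && d.isSurfaceOrTexture)
      return "direct";  // passed as the builtin's device handle type
    // New behavior: indirect, aliasing storage in param space.
    return "indirect-aliased(addrspace 101)";
  }
  if (d.extIntBits > 128 || (!hasInt128 && d.extIntBits > 64))
    return "indirect-byval";  // unchanged by this patch
  return "default";           // rest of the classification, not shown here
}
```

The model makes one point visible: the new path applies to every ABI aggregate, not only to CUDA kernels. That is consistent with the OpenMP offloading test also changing from `byval` to `byref`.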

clang/test/CodeGen/nvptx-abi.c

@@ ... @@
   // CHECK-LABEL: @bar
   // CHECK: call %struct.float4_s @my_function
   ret = my_function();
   return ret.x;
 }

 void foo(float4_t x) {
   // CHECK-LABEL: @foo
-  // CHECK: %struct.float4_s* byval(%struct.float4_s) align 4 %x
+  // CHECK: %struct.float4_s addrspace(101)* byref(%struct.float4_s) align 4 %0
 }

 void fooN(float4_t x, float4_t y, float4_t z) {
   // CHECK-LABEL: @fooN
-  // CHECK: %struct.float4_s* byval(%struct.float4_s) align 4 %x
-  // CHECK: %struct.float4_s* byval(%struct.float4_s) align 4 %y
-  // CHECK: %struct.float4_s* byval(%struct.float4_s) align 4 %z
+  // CHECK: %struct.float4_s addrspace(101)* byref(%struct.float4_s) align 4 %0
+  // CHECK: %struct.float4_s addrspace(101)* byref(%struct.float4_s) align 4 %1
+  // CHECK: %struct.float4_s addrspace(101)* byref(%struct.float4_s) align 4 %2
 }

 typedef struct nested_s {
   unsigned long long x;
   float z[64];
   float4_t t;
 } nested_t;

 void baz(nested_t x) {
   // CHECK-LABEL: @baz
-  // CHECK: %struct.nested_s* byval(%struct.nested_s) align 8 %x)
+  // CHECK: %struct.nested_s addrspace(101)* byref(%struct.nested_s) align 8 %0)
 }

clang/test/CodeGenCUDA/kernel-args-alignment.cu

@@ ... @@
 // 1. offset 0, width 1
 // 2. offset 8 (because alignof(S) == 8), width 16
 // 3. offset 24, width 8
 // HOST-OLD: call i32 @cudaSetupArgument({{[^,]*}}, i64 1, i64 0)
 // HOST-OLD: call i32 @cudaSetupArgument({{[^,]*}}, i64 16, i64 8)
 // HOST-OLD: call i32 @cudaSetupArgument({{[^,]*}}, i64 8, i64 24)

 // DEVICE-LABEL: @_Z6kernelc1SPi
-// DEVICE-SAME: i8{{[^,]*}}, %struct.S* byval(%struct.S) align 8{{[^,]*}}, i32*
+// DEVICE-SAME: i8{{[^,]*}}, %struct.S addrspace(101)* byref(%struct.S) align 8{{[^,]*}}, i32*
 __global__ void kernel(char a, S s, int *b) {}
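The offsets in the comment above come from placing each argument at the next offset that satisfies its own alignment. The sketch below shows that arithmetic. It assumes, as the comment states, that `S` is 16 bytes with 8-byte alignment, and that pointers are 8 bytes. `alignUp`, `ArgLayout`, and `argOffsets` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round `offset` up to the next multiple of `align` (a power of two).
std::size_t alignUp(std::size_t offset, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & ~(align - 1);
}

struct ArgLayout {
  std::size_t size, align;
};

// Assign each argument the next suitably aligned offset, in order.
std::vector<std::size_t> argOffsets(const std::vector<ArgLayout> &args) {
  std::vector<std::size_t> offsets;
  std::size_t cur = 0;
  for (const ArgLayout &a : args) {
    cur = alignUp(cur, a.align);
    offsets.push_back(cur);
    cur += a.size;
  }
  return offsets;
}
```

For `kernel(char a, S s, int *b)`, the layouts `{1, 1}`, `{16, 8}`, `{8, 8}` give offsets 0, 8, and 24, matching the HOST-OLD checks.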

clang/test/CodeGenCUDA/kernel-args.cu

 // RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -fcuda-is-device \
 // RUN: -emit-llvm %s -o - | FileCheck -check-prefix=AMDGCN %s
 // RUN: %clang_cc1 -x cuda -triple nvptx64-nvidia-cuda- -fcuda-is-device \
 // RUN: -emit-llvm %s -o - | FileCheck -check-prefix=NVPTX %s
 #include "Inputs/cuda.h"

 struct A {
   int a[32];
   float *p;
 };

 // AMDGCN: define amdgpu_kernel void @_Z6kernel1A(%struct.A addrspace(4)* byref(%struct.A) align 8 %{{.+}})
-// NVPTX: define void @_Z6kernel1A(%struct.A* byval(%struct.A) align 8 %x)
+// NVPTX: define void @_Z6kernel1A(%struct.A addrspace(101)* byref(%struct.A) align 8 %0)
 __global__ void kernel(A x) {
 }

 class Kernel {
 public:
   // AMDGCN: define amdgpu_kernel void @_ZN6Kernel12memberKernelE1A(%struct.A addrspace(4)* byref(%struct.A) align 8 %{{.+}})
-  // NVPTX: define void @_ZN6Kernel12memberKernelE1A(%struct.A* byval(%struct.A) align 8 %x)
+  // NVPTX: define void @_ZN6Kernel12memberKernelE1A(%struct.A addrspace(101)* byref(%struct.A) align 8 %0)
   static __global__ void memberKernel(A x){}
   template<typename T> static __global__ void templateMemberKernel(T x) {}
 };


 template <typename T>
 __global__ void templateKernel(T x) {}

 void launch(void*);

 void test() {
   Kernel K;
   // AMDGCN: define amdgpu_kernel void @_Z14templateKernelI1AEvT_(%struct.A addrspace(4)* byref(%struct.A) align 8 %{{.+}}
-  // NVPTX: define void @_Z14templateKernelI1AEvT_(%struct.A* byval(%struct.A) align 8 %x)
+  // NVPTX: define void @_Z14templateKernelI1AEvT_(%struct.A addrspace(101)* byref(%struct.A) align 8 %0)
   launch((void*)templateKernel<A>);

   // AMDGCN: define amdgpu_kernel void @_ZN6Kernel20templateMemberKernelI1AEEvT_(%struct.A addrspace(4)* byref(%struct.A) align 8 %{{.+}}
-  // NVPTX: define void @_ZN6Kernel20templateMemberKernelI1AEEvT_(%struct.A* byval(%struct.A) align 8 %x)
+  // NVPTX: define void @_ZN6Kernel20templateMemberKernelI1AEEvT_(%struct.A addrspace(101)* byref(%struct.A) align 8 %0)
   launch((void*)Kernel::templateMemberKernel<A>);
 }

clang/test/OpenMP/nvptx_unsupported_type_codegen.cpp

@@ ... @@ struct T1 {
   char c;
   T1() : a(12), f(15) {}
   T1 &operator+(T1 &b) { f += b.a; return *this;}
 };

 #pragma omp declare target
 T a = T();
 T f = a;
-// CHECK: define{{ hidden | }}void @{{.+}}foo{{.+}}([[T]]* byval([[T]]) align {{.+}})
+// CHECK: define{{ hidden | }}void @{{.+}}foo{{.+}}([[T]] addrspace(101)* byref([[T]]) align {{.+}})
 void foo(T a = T()) {
   return;
 }
 // CHECK: define{{ hidden | }}[6 x i64] @{{.+}}bar{{.+}}()
 T bar() {
 // CHECK: bitcast [[T]]* %{{.+}} to [6 x i64]*
 // CHECK-NEXT: load [6 x i64], [6 x i64]* %{{.+}},
 // CHECK-NEXT: ret [6 x i64]
   return T();
 }
 // CHECK: define{{ hidden | }}void @{{.+}}baz{{.+}}()
 void baz() {
 // CHECK: call [6 x i64] @{{.+}}bar{{.+}}()
 // CHECK-NEXT: bitcast [[T]]* %{{.+}} to [6 x i64]*
 // CHECK-NEXT: store [6 x i64] %{{.+}}, [6 x i64]* %{{.+}},
   T t = bar();
 }
 T1 a1 = T1();
 T1 f1 = a1;
-// CHECK: define{{ hidden | }}void @{{.+}}foo1{{.+}}([[T1]]* byval([[T1]]) align {{.+}})
+// CHECK: define{{ hidden | }}void @{{.+}}foo1{{.+}}([[T1]] addrspace(101)* byref([[T1]]) align {{.+}})
 void foo1(T1 a = T1()) {
   return;
 }
 // CHECK: define{{ hidden | }}[[T1]] @{{.+}}bar1{{.+}}()
 T1 bar1() {
 // CHECK: load [[T1]], [[T1]]*
 // CHECK-NEXT: ret [[T1]]
   return T1();
 }
 // CHECK: define{{ hidden | }}void @{{.+}}baz1{{.+}}()
 void baz1() {
 // CHECK: call [[T1]] @{{.+}}bar1{{.+}}()
   T1 t = bar1();
 }
 #pragma omp end declare target

llvm/lib/Target/NVPTX/CMakeLists.txt

 add_llvm_component_group(NVPTX)

 set(LLVM_TARGET_DEFINITIONS NVPTX.td)

 tablegen(LLVM NVPTXGenAsmWriter.inc -gen-asm-writer)
 tablegen(LLVM NVPTXGenDAGISel.inc -gen-dag-isel)
 tablegen(LLVM NVPTXGenInstrInfo.inc -gen-instr-info)
 tablegen(LLVM NVPTXGenRegisterInfo.inc -gen-register-info)
 tablegen(LLVM NVPTXGenSubtargetInfo.inc -gen-subtarget)

 add_public_tablegen_target(NVPTXCommonTableGen)

 set(NVPTXCodeGen_sources
+  NVPTXAA.cpp
   NVPTXAllocaHoisting.cpp
   NVPTXAsmPrinter.cpp
   NVPTXAssignValidGlobalNames.cpp
   NVPTXFrameLowering.cpp
   NVPTXGenericToNVVM.cpp
   NVPTXISelDAGToDAG.cpp
   NVPTXISelLowering.cpp
   NVPTXImageOptimizer.cpp
@@ ... @@

llvm/lib/Target/NVPTX/NVPTX.h

@@ ... @@
 FunctionPass *createNVVMReflectPass(unsigned int SmVersion);
 MachineFunctionPass *createNVPTXPrologEpilogPass();
 MachineFunctionPass *createNVPTXReplaceImageHandlesPass();
 FunctionPass *createNVPTXImageOptimizerPass();
 FunctionPass *createNVPTXLowerArgsPass(const NVPTXTargetMachine *TM);
 FunctionPass *createNVPTXLowerAllocaPass();
 MachineFunctionPass *createNVPTXPeephole();
 MachineFunctionPass *createNVPTXProxyRegErasurePass();
+ImmutablePass *createNVPTXAAWrapperPass();
+ImmutablePass *createNVPTXExternalAAWrapperPass();

 namespace NVPTX {
 enum DrvInterface {
   NVCL,
   CUDA
 };

 // A field inside TSFlags needs a shift and a mask. The usage is
@@ ... @@

llvm/lib/Target/NVPTX/NVPTXAA.cpp

This file was added.

#include "MCTargetDesc/NVPTXBaseInfo.h"
#include "NVPTX.h"
#include "llvm/ADT/Triple.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Module.h"

namespace llvm {
void initializeNVPTXAAWrapperPass(PassRegistry &);
void initializeNVPTXExternalAAWrapperPass(PassRegistry &);
} // namespace llvm

#define DEBUG_TYPE "nvptx-aa"

using namespace llvm;

namespace {

class NVPTXAAResult : public AAResultBase<NVPTXAAResult> {
  friend AAResultBase<NVPTXAAResult>;

public:
  explicit NVPTXAAResult() : AAResultBase() {}
  NVPTXAAResult(NVPTXAAResult &&Arg) : AAResultBase(std::move(Arg)) {}

  bool invalidate(Function &F, const PreservedAnalyses &PA,
                  FunctionAnalysisManager::Invalidator &Inv);

  AliasResult alias(const MemoryLocation &LocA, const MemoryLocation &LocB,
                    AAQueryInfo &AAQI) {
    MemoryLocation L1 = LocA;
    MemoryLocation L2 = LocB;
    unsigned AS1 = L1.Ptr->getType()->getPointerAddressSpace();
    unsigned AS2 = L2.Ptr->getType()->getPointerAddressSpace();
    // If either pointer is generic, make it the first one.
    if (AS1 != ADDRESS_SPACE_GENERIC) {
      std::swap(L1, L2);
      std::swap(AS1, AS2);
    }
    if (AS1 == ADDRESS_SPACE_GENERIC) {
      // Resolve each generic pointer to the address space of its own
      // underlying object.
      const auto *O1 =
          getUnderlyingObject(L1.Ptr->stripPointerCastsAndInvariantGroups());
      AS1 = O1->getType()->getPointerAddressSpace();
      if (AS2 == ADDRESS_SPACE_GENERIC) {
        const auto *O2 =
            getUnderlyingObject(L2.Ptr->stripPointerCastsAndInvariantGroups());
        AS2 = O2->getType()->getPointerAddressSpace();
      }
      if (AS1 == ADDRESS_SPACE_PARAM || AS2 == ADDRESS_SPACE_PARAM) {
        if (AS1 != AS2)
          return NoAlias;
        // Fall back to the next alias analysis.
      } else if (AS1 != ADDRESS_SPACE_GENERIC && AS2 != ADDRESS_SPACE_GENERIC) {
        if (AS1 != AS2)
          return NoAlias;
        // Fall back to the next alias analysis.
      }
    } else if (AS1 != AS2) {
      // Neither pointer is generic, and distinct address spaces don't overlap.
      return NoAlias;
    }
    // Query the next alias analysis.
    return AAResultBase::alias(LocA, LocB, AAQI);
  }

  bool pointsToConstantMemory(const MemoryLocation &Loc, AAQueryInfo &AAQI,
                              bool OrLocal) {
    unsigned AS = Loc.Ptr->getType()->getPointerAddressSpace();
    // According to PTX ISA section 5.1.6.4, ``Function input parameters may be
    // read via `ld.param` and function return parameters may be written using
    // `st.param`; it is illegal to write to an input parameter or read from a
    // return parameter.'' It's safe to assume that parameter memory space is
    // constant.
    if (AS == ADDRESS_SPACE_CONST || AS == ADDRESS_SPACE_PARAM)
      return true;
    return AAResultBase::pointsToConstantMemory(Loc, AAQI, OrLocal);
  }
};

class NVPTXAAWrapper : public ImmutablePass {
  std::unique_ptr<NVPTXAAResult> Result;

public:
  static char ID;

  NVPTXAAWrapper() : ImmutablePass(ID) {
    initializeNVPTXAAWrapperPass(*PassRegistry::getPassRegistry());
  }

  NVPTXAAResult &getResult() { return *Result; }
  const NVPTXAAResult &getResult() const { return *Result; }

  bool doInitialization(Module &M) override {
    Result.reset(new NVPTXAAResult());
    return false;
  }

  bool doFinalization(Module &M) override {
    Result.reset();
    return false;
  }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.setPreservesAll();
  }
};

class NVPTXExternalAAWrapper : public ExternalAAWrapperPass {
public:
  static char ID;

  NVPTXExternalAAWrapper()
      : ExternalAAWrapperPass([](Pass &P, Function &F, AAResults &AAR) {
          if (auto *WrapperPass = P.getAnalysisIfAvailable<NVPTXAAWrapper>())
            AAR.addAAResult(WrapperPass->getResult());
        }) {}
};

} // End of anonymous namespace

char NVPTXAAWrapper::ID = 0;
char NVPTXExternalAAWrapper::ID = 0;

INITIALIZE_PASS(NVPTXAAWrapper, DEBUG_TYPE, "NVPTX AA Wrapper", true, true)
INITIALIZE_PASS(NVPTXExternalAAWrapper, "nvptx-external-aa-wrapper",
                "NVPTX ExternalAA Wrapper", true, true)

ImmutablePass *llvm::createNVPTXAAWrapperPass() { return new NVPTXAAWrapper(); }

ImmutablePass *llvm::createNVPTXExternalAAWrapperPass() {
  return new NVPTXExternalAAWrapper();
}
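The decision `NVPTXAAResult::alias` makes can be restated as a pure function of address spaces, with each generic pointer resolved through its own underlying object. `effectiveAS` and `nvptxAlias` are hypothetical names. `Defer` corresponds to falling through to `AAResultBase::alias`. The rule that param memory is disjoint from everything else is the patch's assumption, not a general LLVM guarantee.

```cpp
#include <cassert>

// NVPTX address-space numbers, as in NVPTXBaseInfo.h.
enum : unsigned {
  ADDRESS_SPACE_GENERIC = 0,
  ADDRESS_SPACE_GLOBAL = 1,
  ADDRESS_SPACE_SHARED = 3,
  ADDRESS_SPACE_CONST = 4,
  ADDRESS_SPACE_LOCAL = 5,
  ADDRESS_SPACE_PARAM = 101,
};

enum class AliasVerdict { NoAlias, Defer };

// A generic pointer is judged by the address space of its underlying object;
// any other pointer by its own address space.
unsigned effectiveAS(unsigned ptrAS, unsigned objectAS) {
  return ptrAS == ADDRESS_SPACE_GENERIC ? objectAS : ptrAS;
}

AliasVerdict nvptxAlias(unsigned as1, unsigned obj1, unsigned as2,
                        unsigned obj2) {
  unsigned a = effectiveAS(as1, obj1);
  unsigned b = effectiveAS(as2, obj2);
  // Patch assumption: param memory is reachable only through param-space
  // pointers, so it is disjoint even from unresolved generic pointers.
  if ((a == ADDRESS_SPACE_PARAM || b == ADDRESS_SPACE_PARAM) && a != b)
    return AliasVerdict::NoAlias;
  // Two different specific address spaces never overlap.
  if (a != ADDRESS_SPACE_GENERIC && b != ADDRESS_SPACE_GENERIC && a != b)
    return AliasVerdict::NoAlias;
  // Otherwise, let the next alias analysis decide.
  return AliasVerdict::Defer;
}
```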

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp

@@ ... @@ if (isKernelFunction(*F)) {
           O << "\t.param .samplerref ";
           CurrentFnSym->print(O, MAI);
           O << "_param_" << paramIndex;
         }
         continue;
       }
     }

-    if (!PAL.hasParamAttribute(paramIndex, Attribute::ByVal)) {
+    if (!PAL.hasParamAttribute(paramIndex, Attribute::ByVal) &&
+        !PAL.hasParamAttribute(paramIndex, Attribute::ByRef)) {
       if (Ty->isAggregateType() || Ty->isVectorTy() || Ty->isIntegerTy(128)) {
         // Just print .param .align <a> .b8 .param[size];
         // <a> = PAL.getparamalignment
         // size = typeallocsize of element type
         const Align align = DL.getValueOrABITypeAlignment(
             PAL.getParamAlignment(paramIndex), Ty);

         unsigned sz = DL.getTypeAllocSize(Ty);
@@ ... @@
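Aggregates passed in param space (`byval`, and with this patch `byref`) are declared in PTX as `.param .align <a> .b8 <name>[<size>]`. Each parameter is named `<function>_param_<index>`, as in the sampler case above. The helper below is hypothetical and only formats that string. It assumes 8-byte pointers, so `struct A` from `kernel-args.cu` is 136 bytes with 8-byte alignment.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the PTX declaration for an aggregate parameter that
// lives in param space, e.g. ".param .align 8 .b8 foo_param_0[16]".
std::string paramDecl(const std::string &fnSym, unsigned index,
                      unsigned align, unsigned sizeInBytes) {
  return ".param .align " + std::to_string(align) + " .b8 " + fnSym +
         "_param_" + std::to_string(index) + "[" +
         std::to_string(sizeInBytes) + "]";
}
```

For example, `paramDecl("_Z6kernel1A", 0, 8, 136)` yields `.param .align 8 .b8 _Z6kernel1A_param_0[136]`.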

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

@@ ... @@ if (theArgs[i]->use_empty()) {
       InVals.push_back(DAG.getNode(ISD::UNDEF, dl, Ins[InsIdx].VT));
       continue;
     }

     // In the following cases, assign a node order of "idx+1"
     // to newly created nodes. The SDNodes for params have to
     // appear in the same order as their order of appearance
     // in the original function. "idx+1" holds that order.
-    if (!PAL.hasParamAttribute(i, Attribute::ByVal)) {
+    if (!PAL.hasParamAttribute(i, Attribute::ByVal) &&
+        !PAL.hasParamAttribute(i, Attribute::ByRef)) {
       bool aggregateIsPacked = false;
       if (StructType *STy = dyn_cast<StructType>(Ty))
         aggregateIsPacked = STy->isPacked();

       SmallVector<EVT, 16> VTs;
       SmallVector<uint64_t, 16> Offsets;
       ComputePTXValueVTs(*this, DL, Ty, VTs, &Offsets, 0);
       assert(VTs.size() > 0 && "Unexpected empty type.");
@@ ... @@

llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp

@@ ... @@
 } // end anonymous namespace

 TargetPassConfig *NVPTXTargetMachine::createPassConfig(PassManagerBase &PM) {
   return new NVPTXPassConfig(*this, PM);
 }

 void NVPTXTargetMachine::adjustPassManager(PassManagerBuilder &Builder) {
   Builder.addExtension(
     PassManagerBuilder::EP_EarlyAsPossible,
     [&](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
+      PM.add(createNVPTXAAWrapperPass());
+      PM.add(createNVPTXExternalAAWrapperPass());
       PM.add(createNVVMReflectPass(Subtarget.getSmVersion()));
       PM.add(createNVVMIntrRangePass(Subtarget.getSmVersion()));
     });
+  Builder.addExtension(
+      PassManagerBuilder::EP_ModuleOptimizerEarly,
+      [&](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
+        PM.add(createNVPTXAAWrapperPass());
+        PM.add(createNVPTXExternalAAWrapperPass());
+      });
 }

 TargetTransformInfo
 NVPTXTargetMachine::getTargetTransformInfo(const Function &F) {
   return TargetTransformInfo(NVPTXTTIImpl(this, F));
 }

 void NVPTXPassConfig::addEarlyCSEOrGVNPass() {
@@ ... @@ void NVPTXPassConfig::addIRPasses() {
   // NVPTXLowerArgs is required for correctness and should be run right
   // before the address space inference passes.
   addPass(createNVPTXLowerArgsPass(&getNVPTXTargetMachine()));
   if (getOptLevel() != CodeGenOpt::None) {
     addAddressSpaceInferencePasses();
     addStraightLineScalarOptimizationPasses();
   }

+  addPass(createNVPTXAAWrapperPass());
+  addPass(createNVPTXExternalAAWrapperPass());
+
   // === LSR and other generic IR passes ===
   TargetPassConfig::addIRPasses();
   // EarlyCSE is not always strong enough to clean up what LSR produces. For
   // example, GVN can combine
   //
   //   %0 = add %a, %b
   //   %1 = add %b, %a
   //
@@ ... @@
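Registering the external AA wrapper lets the legacy alias-analysis aggregation consult `NVPTXAAResult` alongside the other analyses. Roughly, the aggregation asks each analysis in turn and returns the first answer that isn't `MayAlias`. That is why `NVPTXAAResult::alias` can defer to the next analysis instead of guessing. The toy model below illustrates that chaining. `Loc`, `queryChain`, `addrSpaceAA`, and `sameObjectAA` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class AliasResult { NoAlias, MayAlias, MustAlias };

// Hypothetical memory location: an address space plus an object id.
struct Loc {
  unsigned addrSpace;
  int object;
};

using AliasFn = std::function<AliasResult(const Loc &, const Loc &)>;

// Ask each analysis in order. The first definite answer wins, and MayAlias
// from one analysis just means "ask the next one".
AliasResult queryChain(const std::vector<AliasFn> &chain, const Loc &a,
                       const Loc &b) {
  for (const AliasFn &aa : chain) {
    AliasResult r = aa(a, b);
    if (r != AliasResult::MayAlias)
      return r;
  }
  return AliasResult::MayAlias;
}

// Address-space rule in the spirit of NVPTXAAResult (no generic pointers).
AliasResult addrSpaceAA(const Loc &a, const Loc &b) {
  return a.addrSpace != b.addrSpace ? AliasResult::NoAlias
                                    : AliasResult::MayAlias;
}

// Stand-in for a later, more general analysis.
AliasResult sameObjectAA(const Loc &a, const Loc &b) {
  return a.object == b.object ? AliasResult::MustAlias
                              : AliasResult::MayAlias;
}
```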