This is an archive of the discontinued LLVM Phabricator instance.

Differential D77777

[nvptx] Add `nvvm.texsurf.handle` internalizer.
Needs Review · Public

Authored by hliao on Apr 8 2020, 10:58 PM.

Details

Reviewers

tra

Summary

Replace them with the internal version, i.e. nvvm.texsurf.handle.internal just before the instruction selector.
Teach clang codegen to generate nvvm.texsurf.handle instead of nvvm.texsurf.handle.internal.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	150 ms	lldb-unit.Host/_/HostTests::Unknown Unit Message ("")

Event Timeline

hliao created this revision. · Apr 8 2020, 10:58 PM

Herald added a project: Restricted Project. · Apr 8 2020, 10:58 PM

Herald added subscribers: cfe-commits, hiraditya, mgorny, jholewinski.

Harbormaster failed remote builds in B52454: Diff 256191! · Apr 8 2020, 11:58 PM

Rebase to trunk.

Harbormaster failed remote builds in B52523: Diff 256321! · Apr 9 2020, 9:05 AM

The patch could use a more detailed description. Specifically, it does not describe the purpose of these changes.

Replace them with the internal version, i.e. nvvm.texsurf.handle.internal just before the instruction selector.

It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
If so, do we need 'internal' any more? Can we just rename internal and be done with it? Adding an extra pass just to replace one intrinsic with another seems to be unnecessary.

I may be missing something here. Why do we have internal and non-internal intrinsics at all? Do we need both?

In D77777#1972349, @tra wrote:

The patch could use a more detailed description. Specifically, it does not describe the purpose of these changes.

Replace them with the internal version, i.e. nvvm.texsurf.handle.internal just before the instruction selector.

It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
If so, do we need 'internal' any more? Can we just rename internal and be done with it? Adding an extra pass just to replace one intrinsic with another seems to be unnecessary.

I may be missing something here. Why do we have internal and non-internal intrinsics at all? Do we need both?

besides required by NVVM IR spec, the metadata in that intrinsic is a trick to prevent it from being sunk into common code during optimization in LLVM IR. NVPTX backend only handles the internal version. We need to internalize them for codegen. I will put a brief explanation in that pass.
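As a rough illustration of the rewrite described here, the following self-contained C++ sketch models replacing a two-operand `texsurf.handle` call with its one-operand internal form. It deliberately does not use LLVM's API: the `Call` struct and the `internalize` function are hypothetical stand-ins, and the real intrinsic names also carry a type suffix (such as `.p1i64` in the tests) that is omitted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an intrinsic call; this is not an LLVM class.
struct Call {
  std::string callee;
  std::vector<std::string> args;
};

// Rewrites texsurf.handle(metadata, global) into
// texsurf.handle.internal(global). Returns true if the call was changed.
bool internalize(Call &call) {
  if (call.callee != "llvm.nvvm.texsurf.handle")
    return false;
  assert(call.args.size() == 2 && "expected (metadata, global) operands");
  call.callee = "llvm.nvvm.texsurf.handle.internal";
  call.args.erase(call.args.begin()); // drop the metadata operand
  return true;
}
```

Calls to any other callee are left untouched, which mirrors the idea that the pass only targets this one intrinsic.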

Add more comments to explain what that pass does.

Harbormaster failed remote builds in B52636: Diff 256511! · Apr 10 2020, 12:30 AM

Fix a clang-tidy warning.

Harbormaster failed remote builds in B52640: Diff 256518! · Apr 10 2020, 2:08 AM

In D77777#1972720, @hliao wrote:

besides required by NVVM IR spec,

NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does not impose direct constraints on LLVM's design choices.

the metadata in that intrinsic is a trick to prevent it from being sunk into common code during optimization in LLVM IR.

This sounds like it may have been done that way in an attempt to work around a problem with intrinsics' constraints. We may want to check if there's a better way to do it now.
Right now both intrinsics are marked with [IntrNoMem] which may be the reason for compiler feeling free to move it around. We may need to give compiler correct information and then we may not need this just-in-time intrinsic replacement hack. I think it should be at least IntrArgMemOnly or, maybe IntrInaccessibleMemOrArgMemOnly.

NVPTX backend only handles the internal version.

This is obviously fixable.

In D77777#1974672, @tra wrote:

NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does not impose direct constraints on LLVM's design choices.

It would be an advantage and, sometimes, desirable to generate IR compatible to NVVM IR spec.

the metadata in that intrinsic is a trick to prevent it from being sunk into common code during optimization in LLVM IR.

This sounds like it may have been done that way in an attempt to work around a problem with intrinsics' constraints. We may want to check if there's a better way to do it now.
Right now both intrinsics are marked with [IntrNoMem] which may be the reason for compiler feeling free to move it around. We may need to give compiler correct information and then we may not need this just-in-time intrinsic replacement hack. I think it should be at least IntrArgMemOnly or, maybe IntrInaccessibleMemOrArgMemOnly.

That may not exactly model the behavior: for binding texture/surface support there is, in fact, no memory operation at all. Even with IntrArgMemOnly or similar attributes, optimizations could still sink the common code. The same trick is used by many intrinsics, such as read.register.

NVPTX backend only handles the internal version.

This is obviously fixable.

SDAG so far cannot handle metadata, and GISel doesn't have support either. Adding that support in TICG would be hard to justify for target-specific intrinsics, as metadata should not be used directly in code generation.

In D77777#1974849, @hliao wrote:

NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does not impose direct constraints on LLVM's design choices.

It would be an advantage and, sometimes, desirable to generate IR compatible to NVVM IR spec.

I'm not against it, but I think it's OK to make different choices if we have good reasons for that. NVIDIA didn't update LLVM since they've contributed the original implementation, so by now we're both far behind the current state of NVVM and quite a bit sideways due to the things LLVM has added to NVPTX backend.

This sounds like it may have been done that way in an attempt to work around a problem with intrinsics' constraints. We may want to check if there's a better way to do it now.
Right now both intrinsics are marked with [IntrNoMem] which may be the reason for compiler feeling free to move it around. We may need to give compiler correct information and then we may not need this just-in-time intrinsic replacement hack. I think it should be at least IntrArgMemOnly or, maybe IntrInaccessibleMemOrArgMemOnly.

That may not exactly model the behavior: for binding texture/surface support there is, in fact, no memory operation at all. Even with IntrArgMemOnly or similar attributes, optimizations could still sink the common code. The same trick is used by many intrinsics, such as read.register.

Can you give me an example where/how optimizer would break things? Is that because we're using metadata as an argument?

I've re-read NVVM docs and I can't say that I understand how it's supposed to work.
metadata holding the texture or surface variable alone is a rather odd notion and I'm not surprised that it's not handled well. In the end we do end up with a 'handle' which is an in-memory object. Perhaps it should be represented as a real variable with a metadata attribute. Then we can lower it as a handle, can enforce that only texture/surface instructions are allowed to use it and will have a way to tell LLVM what it's allowed to do.

I don't have a good picture of how it all will fit together in the end (or whether what I suggest makes sense), but the current implementation appears to be in need of rethinking.

In D77777#1974988, @tra wrote:

Can you give me an example where/how optimizer would break things? Is that because we're using metadata as an argument?

I've re-read NVVM docs and I can't say that I understand how it's supposed to work.
metadata holding the texture or surface variable alone is a rather odd notion and I'm not surprised that it's not handled well. In the end we do end up with a 'handle' which is an in-memory object. Perhaps it should be represented as a real variable with a metadata attribute. Then we can lower it as a handle, can enforce that only texture/surface instructions are allowed to use it and will have a way to tell LLVM what it's allowed to do.

I don't have a good picture of how it all will fit together in the end (or whether what I suggest makes sense), but the current implementation appears to be in need of rethinking.

The 1st argument of llvm.nvvm.texsurf.handle.internal, or the 2nd one of llvm.nvvm.texsurf.handle, must be kept as an immediate or constant value, i.e. that global variable. However, optimizations will find common code in the following

if (cond) {
  %hnd = texsurf.handle.internal(@tex1);
} else {
  %hnd = texsurf.handle.internal(@tex2);
}
= use(%hnd)

and hoist or sink it into

if (cond) {
  %ptr = @tex1;
} else {
  %ptr = @tex2;
}
%hnd = texsurf.handle.internal(%ptr);
= use(%hnd)

The backend cannot handle a non-immediate operand in texsurf.handle. A similar thing happens to read.register, as it also assumes its argument is always an immediate value.
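The failure mode described in this comment can be sketched with a small self-contained C++ model. This is only a toy under stated assumptions: `OperandKind`, `Operand`, `canSelectHandle`, and `sinkCommonCall` are hypothetical names, not LLVM classes or functions, and the model only captures the point that a merged (PHI-like) operand is no longer a specific global variable.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy model; none of these names are LLVM classes.
enum class OperandKind { Global, Phi };

struct Operand {
  OperandKind kind;
  std::string text; // "@tex1" for a global, "phi(@tex1, @tex2)" for a merge
};

// Models the backend requirement: the handle intrinsic can only be selected
// when its operand names one specific global variable.
bool canSelectHandle(const Operand &op) {
  return op.kind == OperandKind::Global;
}

// Models sinking two calls from sibling branches into one call after the
// merge point: distinct operands are replaced by a single merged value.
Operand sinkCommonCall(const Operand &thenOp, const Operand &elseOp) {
  assert(thenOp.kind == OperandKind::Global &&
         elseOp.kind == OperandKind::Global);
  if (thenOp.text == elseOp.text)
    return thenOp; // same global on both paths: operand stays constant
  return Operand{OperandKind::Phi,
                 "phi(" + thenOp.text + ", " + elseOp.text + ")"};
}
```

In this model, sinking calls that use @tex1 and @tex2 yields an operand that `canSelectHandle` rejects, which corresponds to the non-immediate operand the backend cannot handle.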

In D77777#1975178, @hliao wrote:
The 1st argument of llvm.nvvm.texsurf.handle.internal, or the 2nd one of llvm.nvvm.texsurf.handle, must be kept as an immediate or constant value, i.e. that global variable. However, optimizations will find common code in the following

if (cond) {
  %hnd = texsurf.handle.internal(@tex1);
} else {
  %hnd = texsurf.handle.internal(@tex2);
}
= use(%hnd)

and hoist or sink it into

if (cond) {
  %ptr = @tex1;
} else {
  %ptr = @tex2;
}
%hnd = texsurf.handle.internal(%ptr);
= use(%hnd)

The backend cannot handle a non-immediate operand in texsurf.handle. A similar thing happens to read.register, as it also assumes its argument is always an immediate value.

I wonder if we can use token types to represent the handle? https://reviews.llvm.org/D11861
@majnemer -- would this use case be suitable for the token type?

Also, if I read PTX docs correctly, it should be OK to pass texture handle address via an intermediate variable:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-sampler-and-surface-types

Creating pointers to opaque variables using mov, e.g., mov.u64 reg, opaque_var;. The resulting pointer may be stored to and loaded from memory, passed as a parameter to functions, and de-referenced by texture and surface load, store, and query instructions

We may not need the tokens and should be able to use regular pointer.

In D77777#1975406, @tra wrote:
I wonder if we can use token types to represent the handle? https://reviews.llvm.org/D11861
@majnemer -- would this use case be suitable for the token type?

If we could still make a PHI over tokens, they cannot serve this purpose. Check llvm::canReplaceOperandWithVariable for details.

In D77777#1975440, @tra wrote:

Also, if I read PTX docs correctly, it should be OK to pass texture handle address via an intermediate variable:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-sampler-and-surface-types

Creating pointers to opaque variables using mov, e.g., mov.u64 reg, opaque_var;. The resulting pointer may be stored to and loaded from memory, passed as a parameter to functions, and de-referenced by texture and surface load, store, and query instructions

We may not need the tokens and should be able to use regular pointer.

That handle is the output of the texsurf.handle intrinsic, not its input. Internally, the NVPTX backend needs to keep track of which global variables need to be a texref or surfref, and it requires the operand of texsurf.handle to be a global variable. Check NVPTXReplaceImageHandles.cpp, lines 167-175.

In D77777#1975628, @hliao wrote:

That handle is the output of the texsurf.handle intrinsic, not its input. Internally, the NVPTX backend needs to keep track of which global variables need to be a texref or surfref, and it requires the operand of texsurf.handle to be a global variable. Check NVPTXReplaceImageHandles.cpp, lines 167-175.

That's the point I'm trying to make -- existing code may not be the best way to implement this and should be improved. If we are serious about supporting textures & surfaces, then it may be worth making it work properly, as opposed to adding more hacks to get the old and till-now largely unused bits of LLVM to do what we want.

We could do something like this:

class instances with the texref attribute are lowered in a way that produces a global handle. It could be a handle-only, or the handle may be produced in addition to the object itself.
intrinsics accept texref pointers and follow standard LLVM rules/assumptions.
=> code is subject to regular LLVM optimizations
=> no need to have special passes to tweak IR just so. At worst, we may keep something similar to NVPTXReplaceImageHandles which would replace object references with handle references. We may not even need to do that. As far as LLVM is concerned texture handle is just a pointer, and the objects with texref attribute are lowered as a .texref global in PTX. It's user's responsibility to pass the right pointer to the intrinsic.

On a side note, the lowering of texture/surface instructions and intrinsics could use a major overhaul, too. It's currently excessively redundant and could be reduced to a much more concise tablegen-driven implementation.

In D77777#1975618, @hliao wrote:
If we could still make a PHI over tokens, they cannot serve this purpose. Check llvm::canReplaceOperandWithVariable for details.

It is not possible to PHI a token value. Token values disable the call to canReplaceOperandWithVariable.

Revision Contents

Path	Size
clang/lib/CodeGen/TargetInfo.cpp	12 lines
clang/test/CodeGenCUDA/surface.cu	2 lines
clang/test/CodeGenCUDA/texture.cu	4 lines
llvm/lib/Target/NVPTX/CMakeLists.txt	7 lines
llvm/lib/Target/NVPTX/NVPTX.h	1 line
llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp	6 lines
llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp	91 lines
llvm/test/CodeGen/NVPTX/tex-read-cuda.ll	24 lines

Diff 256518

clang/lib/CodeGen/TargetInfo.cpp

[...]
   static void emitBuiltinSurfTexDeviceCopy(CodeGenFunction &CGF, LValue Dst,
                                            LValue Src) {
     llvm::Value *Handle = nullptr;
     llvm::Constant *C =
         llvm::dyn_cast<llvm::Constant>(Src.getAddress(CGF).getPointer());
     // Lookup `addrspacecast` through the constant pointer if any.
     if (auto *ASC = llvm::dyn_cast_or_null<llvm::AddrSpaceCastOperator>(C))
       C = llvm::cast<llvm::Constant>(ASC->getPointerOperand());
     if (auto *GV = llvm::dyn_cast_or_null<llvm::GlobalVariable>(C)) {
+      // According to [NVVM IR Spec][1], `nvvm.texsurf.handle` should be used
+      // to access texture/surface memory. The first argument to that intrinsic
+      // is a metadata holding the texture or surface variable. The second
+      // argument to that intrinsic is the texture or surface variable itself.
+      // ---
+      // [1]: https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html
+      llvm::Value *MD = llvm::MetadataAsValue::get(
+          CGF.getLLVMContext(), llvm::ConstantAsMetadata::get(GV));
       // Load the handle from the specific global variable using
       // `nvvm.texsurf.handle.internal` intrinsic.
       Handle = CGF.EmitRuntimeCall(
-          CGF.CGM.getIntrinsic(llvm::Intrinsic::nvvm_texsurf_handle_internal,
+          CGF.CGM.getIntrinsic(llvm::Intrinsic::nvvm_texsurf_handle,
                                {GV->getType()}),
-          {GV}, "texsurf_handle");
+          {MD, GV}, "texsurf_handle");
     } else
       Handle = CGF.EmitLoadOfScalar(Src, SourceLocation());
     CGF.EmitStoreOfScalar(Handle, Dst);
   }
 };

 /// Checks if the type is unsupported directly by the current target.
 static bool isUnsupportedType(ASTContext &Context, QualType T) {
[...]

clang/test/CodeGenCUDA/surface.cu

[...]
 // On the host side, they remain in the original type.
 // HOST: @surf = internal global %struct.surface
 // HOST: @0 = private unnamed_addr constant [5 x i8] c"surf\00"
 surface<void, 2> surf;

 __attribute__((device)) int suld_2d_zero(surface<void, 2>, int, int) asm("llvm.nvvm.suld.2d.i32.zero");

 // DEVICE-LABEL: i32 @_Z3fooii(i32 %x, i32 %y)
-// DEVICE: call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @surf)
+// DEVICE: call i64 @llvm.nvvm.texsurf.handle.p1i64(metadata [[SURF:.*]], [[SURF]])
 // DEVICE: call i32 @llvm.nvvm.suld.2d.i32.zero(i64 %{{.*}}, i32 %{{.*}}, i32 %{{.*}})
 __attribute__((device)) int foo(int x, int y) {
   return suld_2d_zero(surf, x, y);
 }

 // HOST: define internal void @[[PREFIX:__cuda]]_register_globals
 // Texture references need registering with correct arguments.
 // HOST: call void @[[PREFIX]]RegisterSurface(i8** %0, i8*{{.*}}({{.*}}@surf{{.*}}), i8*{{.*}}({{.*}}@0{{.*}}), i8*{{.*}}({{.*}}@0{{.*}}), i32 2, i32 0)

 // They also need annotating in metadata.
 // DEVICE: !0 = !{i64 addrspace(1)* @surf, !"surface", i32 1}

clang/test/CodeGenCUDA/texture.cu

[...]
 struct v4f {
   float x, y, z, w;
 };

 __attribute__((device)) v4f tex2d_ld(texture<float, 2, ElementType>, float, float) asm("llvm.nvvm.tex.unified.2d.v4f32.f32");
 __attribute__((device)) v4f tex2d_ld(texture<float, 2, NormalizedFloat>, int, int) asm("llvm.nvvm.tex.unified.2d.v4f32.s32");

 // DEVICE-LABEL: float @_Z3fooff(float %x, float %y)
-// DEVICE: call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @tex)
+// DEVICE: call i64 @llvm.nvvm.texsurf.handle.p1i64(metadata [[TEX:.*]], [[TEX]])
 // DEVICE: call %struct.v4f @llvm.nvvm.tex.unified.2d.v4f32.f32(i64 %{{.*}}, float %{{.*}}, float %{{.*}})
-// DEVICE: call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @norm)
+// DEVICE: call i64 @llvm.nvvm.texsurf.handle.p1i64(metadata [[NORM:.*]], [[NORM]])
 // DEVICE: call %struct.v4f @llvm.nvvm.tex.unified.2d.v4f32.s32(i64 %{{.*}}, i32 %{{.*}}, i32 %{{.*}})
 __attribute__((device)) float foo(float x, float y) {
   return tex2d_ld(tex, x, y).x + tex2d_ld(norm, int(x), int(y)).x;
 }

 // HOST: define internal void @[[PREFIX:__cuda]]_register_globals
 // Texture references need registering with correct arguments.
 // HOST: call void @[[PREFIX]]RegisterTexture(i8** %0, i8*{{.*}}({{.*}}@tex{{.*}}), i8*{{.*}}({{.*}}@0{{.*}}), i8*{{.*}}({{.*}}@0{{.*}}), i32 2, i32 0, i32 0)
 // HOST: call void @[[PREFIX]]RegisterTexture(i8** %0, i8*{{.*}}({{.*}}@norm{{.*}}), i8*{{.*}}({{.*}}@1{{.*}}), i8*{{.*}}({{.*}}@1{{.*}}), i32 2, i32 1, i32 0)

 // They also need annotating in metadata.
 // DEVICE: !0 = !{i64 addrspace(1)* @tex, !"texture", i32 1}
 // DEVICE: !1 = !{i64 addrspace(1)* @norm, !"texture", i32 1}

llvm/lib/Target/NVPTX/CMakeLists.txt

 set(NVPTXCodeGen_sources
[...]
   NVPTXAssignValidGlobalNames.cpp
   NVPTXFrameLowering.cpp
   NVPTXGenericToNVVM.cpp
   NVPTXISelDAGToDAG.cpp
   NVPTXISelLowering.cpp
   NVPTXImageOptimizer.cpp
   NVPTXInstrInfo.cpp
   NVPTXLowerAggrCopies.cpp
-  NVPTXLowerArgs.cpp
   NVPTXLowerAlloca.cpp
-  NVPTXPeephole.cpp
+  NVPTXLowerArgs.cpp
   NVPTXMCExpr.cpp
+  NVPTXPeephole.cpp
   NVPTXPrologEpilogPass.cpp
+  NVPTXProxyRegErasure.cpp
   NVPTXRegisterInfo.cpp
   NVPTXReplaceImageHandles.cpp
   NVPTXSubtarget.cpp
   NVPTXTargetMachine.cpp
   NVPTXTargetTransformInfo.cpp
+  NVPTXTexSurfHandleInternalizer.cpp
   NVPTXUtilities.cpp
   NVVMIntrRange.cpp
   NVVMReflect.cpp
-  NVPTXProxyRegErasure.cpp
   )

 add_llvm_target(NVPTXCodeGen ${NVPTXCodeGen_sources})

 add_subdirectory(MCTargetDesc)
 add_subdirectory(TargetInfo)

llvm/lib/Target/NVPTX/NVPTX.h

[...]
 FunctionPass *createNVVMReflectPass(unsigned int SmVersion);
 MachineFunctionPass *createNVPTXPrologEpilogPass();
 MachineFunctionPass *createNVPTXReplaceImageHandlesPass();
 FunctionPass *createNVPTXImageOptimizerPass();
 FunctionPass *createNVPTXLowerArgsPass(const NVPTXTargetMachine *TM);
 FunctionPass *createNVPTXLowerAllocaPass();
 MachineFunctionPass *createNVPTXPeephole();
 MachineFunctionPass *createNVPTXProxyRegErasurePass();
+FunctionPass *createNVPTXTexSurfHandleInternalizerPass();

 namespace NVPTX {
 enum DrvInterface {
   NVCL,
   CUDA
 };

 // A field inside TSFlags needs a shift and a mask. The usage is
[...]

llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp

[...]
 public:
   NVPTXPassConfig(NVPTXTargetMachine &TM, PassManagerBase &PM)
       : TargetPassConfig(TM, PM) {}

   NVPTXTargetMachine &getNVPTXTargetMachine() const {
     return getTM<NVPTXTargetMachine>();
   }

   void addIRPasses() override;
+  bool addPreISel() override;
   bool addInstSelector() override;
   void addPreRegAlloc() override;
   void addPostRegAlloc() override;
   void addMachineSSAOptimization() override;

   FunctionPass *createTargetRegisterAllocator(bool) override;
   void addFastRegAlloc() override;
   void addOptimizedRegAlloc() override;
[...]
 void NVPTXPassConfig::addIRPasses() {
[...]
   // but EarlyCSE can do neither of them.
   if (getOptLevel() != CodeGenOpt::None) {
     addEarlyCSEOrGVNPass();
     if (!DisableLoadStoreVectorizer)
       addPass(createLoadStoreVectorizerPass());
   }
 }

+bool NVPTXPassConfig::addPreISel() {
+  addPass(createNVPTXTexSurfHandleInternalizerPass());
+  return false;
+}
+
 bool NVPTXPassConfig::addInstSelector() {
   const NVPTXSubtarget &ST = *getTM<NVPTXTargetMachine>().getSubtargetImpl();

   addPass(createLowerAggrCopies());
   addPass(createAllocaHoisting());
   addPass(createNVPTXISelDag(getNVPTXTargetMachine(), getOptLevel()));

   if (!ST.hasImageHandles())
[...]

llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp

This file was added.

				//===- NVPTXLowerAggrCopies.cpp - ------------------------------- C++ ---===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// \file
				//
				// According to [NVVM IR Spec][1], `nvvm.texsurf.handle` should be used to
				// access texture/surface memory. The first argument to that intrinsic is a
				// metadata holding the texture or surface variable. The second argument to
				// that intrinsic is the texture or surface variable itself. However, the
				// NVPTX backend cannot handle the metadata argument directly; it only handles
				// the internal version, `nvvm.texsurf.handle.internal`, which takes the
				// variable alone. This pass, scheduled just before instruction selection,
				// replaces each `nvvm.texsurf.handle` call with a call to
				// `nvvm.texsurf.handle.internal`.
				// ---
				// [1]: https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html
				//
				//===----------------------------------------------------------------------===//

				#include "NVPTX.h"
				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/IntrinsicsNVPTX.h"
				#include "llvm/Pass.h"

				using namespace llvm;

				#define DEBUG_TYPE "nvptx-texsurf-handle-internalizer"

				namespace llvm {
				void initializeTexSurfHandleInternalizerPass(PassRegistry &);
				} // namespace llvm

				namespace {

				class TexSurfHandleInternalizer : public FunctionPass {
				public:
				static char ID;

				TexSurfHandleInternalizer() : FunctionPass(ID) {
				initializeTexSurfHandleInternalizerPass(*PassRegistry::getPassRegistry());
				}

				StringRef getPassName() const override {
				return "Internalize `nvvm.texsurf.handle` intrinsics";
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				}

				bool runOnFunction(Function &F) override {
				bool Changed = false;
				for (auto &BB : F)
				for (auto BI = BB.begin(), BE = BB.end(); BI != BE; /*EMPTY*/) {
				IntrinsicInst *II = dyn_cast<IntrinsicInst>(&*BI++);
				if (!II || II->getIntrinsicID() != Intrinsic::nvvm_texsurf_handle)
				continue;
				assert(II->getArgOperand(1) ==
				cast<ValueAsMetadata>(
				cast<MetadataAsValue>(II->getArgOperand(0))->getMetadata())
				->getValue());
				// Replace it with the internal version.
				IRBuilder<> Builder(II);
				auto *NewII = Builder.CreateUnaryIntrinsic(
				Intrinsic::nvvm_texsurf_handle_internal, II->getArgOperand(1));
				II->replaceAllUsesWith(NewII);
				II->eraseFromParent();
				Changed = true;
				}
				return Changed;
				}
				};

				} // end of anonymous namespace

				FunctionPass *llvm::createNVPTXTexSurfHandleInternalizerPass() {
				return new TexSurfHandleInternalizer();
				}

				char TexSurfHandleInternalizer::ID = 0;

				INITIALIZE_PASS(TexSurfHandleInternalizer, "nvptx-texsurf-handle-internalizer",
				"Internalize texsurf-handle intrinsic", false, false)
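
As a rough illustration of the rewrite this pass performs, the two IR fragments below show one call before and after internalization. They are adapted from the calls in tex-read-cuda.ll; the value name `%h` is illustrative only, and the fragments are alternatives rather than lines of a single module.

```llvm
; Before the pass: the public intrinsic receives the texture variable twice,
; once wrapped as metadata and once as a plain pointer.
%h = tail call i64 @llvm.nvvm.texsurf.handle.p1i64(metadata !5, i64 addrspace(1)* @tex0)

; After the pass: the metadata operand is dropped and only the pointer
; operand is forwarded to the internal intrinsic.
%h = tail call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @tex0)

; Metadata node referenced by the first fragment.
!5 = !{i64 addrspace(1)* @tex0}
```

Because the pass forwards only the second operand, the assertion in runOnFunction is what checks that the metadata operand and the pointer operand name the same variable.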

llvm/test/CodeGen/NVPTX/tex-read-cuda.ll

	; RUN: llc < %s -march=nvptx -mcpu=sm_20 | FileCheck %s --check-prefix=SM20			; RUN: llc < %s -march=nvptx -mcpu=sm_20 | FileCheck %s --check-prefix=SM20
	; RUN: llc < %s -march=nvptx -mcpu=sm_30 | FileCheck %s --check-prefix=SM30			; RUN: llc < %s -march=nvptx -mcpu=sm_30 | FileCheck %s --check-prefix=SM30


	target triple = "nvptx-unknown-cuda"			target triple = "nvptx-unknown-cuda"

	declare { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64, i32)			declare { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64, i32)
	declare i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)*)			declare i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)*)
				declare i64 @llvm.nvvm.texsurf.handle.p1i64(metadata, i64 addrspace(1)*)

	; SM20-LABEL: .entry foo			; SM20-LABEL: .entry foo
	; SM30-LABEL: .entry foo			; SM30-LABEL: .entry foo
	define void @foo(i64 %img, float* %red, i32 %idx) {			define void @foo(i64 %img, float* %red, i32 %idx) {
	; SM20: ld.param.u64 %rd[[TEXREG:[0-9]+]], [foo_param_0];			; SM20: ld.param.u64 %rd[[TEXREG:[0-9]+]], [foo_param_0];
	; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXREG]], {%r{{[0-9]+}}}]			; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXREG]], {%r{{[0-9]+}}}]
	; SM30: ld.param.u64 %rd[[TEXREG:[0-9]+]], [foo_param_0];			; SM30: ld.param.u64 %rd[[TEXREG:[0-9]+]], [foo_param_0];
	; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXREG]], {%r{{[0-9]+}}}]			; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXREG]], {%r{{[0-9]+}}}]
	%val = tail call { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64 %img, i32 %idx)			%val = tail call { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64 %img, i32 %idx)
	%ret = extractvalue { float, float, float, float } %val, 0			%ret = extractvalue { float, float, float, float } %val, 0
	; SM20: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]			; SM20: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
	; SM30: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]			; SM30: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
	store float %ret, float* %red			store float %ret, float* %red
	ret void			ret void
	}			}


	@tex0 = internal addrspace(1) global i64 0, align 8			@tex0 = internal addrspace(1) global i64 0, align 8

	; SM20-LABEL: .entry bar			; SM20-LABEL: .entry bar
	; SM30-LABEL: .entry bar			; SM30-LABEL: .entry bar
	define void @bar(float* %red, i32 %idx) {			define void @bar(float* %red, i32 %idx) {
	; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0			; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0
	%texHandle = tail call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @tex0)			%texHandle = tail call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @tex0)
	; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]			; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]
	; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]			; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]
	%val = tail call { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64 %texHandle, i32 %idx)			%val = tail call { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64 %texHandle, i32 %idx)
	%ret = extractvalue { float, float, float, float } %val, 0			%ret = extractvalue { float, float, float, float } %val, 0
	; SM20: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]			; SM20: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
	; SM30: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]			; SM30: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
	store float %ret, float* %red			store float %ret, float* %red
	ret void			ret void
	}			}

	!nvvm.annotations = !{!1, !2, !3}			; SM20-LABEL: .entry bax
				; SM30-LABEL: .entry bax
				define void @bax(float* %red, i32 %idx) {
				; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0
				%texHandle = tail call i64 @llvm.nvvm.texsurf.handle.p1i64(metadata !5, i64 addrspace(1)* @tex0)
				; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]
				; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]
				%val = tail call { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64 %texHandle, i32 %idx)
				%ret = extractvalue { float, float, float, float } %val, 0
				; SM20: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
				; SM30: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
				store float %ret, float* %red
				ret void
				}

				!nvvm.annotations = !{!1, !2, !3, !4}
	!1 = !{void (i64, float*, i32)* @foo, !"kernel", i32 1}			!1 = !{void (i64, float*, i32)* @foo, !"kernel", i32 1}
	!2 = !{void (float*, i32)* @bar, !"kernel", i32 1}			!2 = !{void (float*, i32)* @bar, !"kernel", i32 1}
	!3 = !{i64 addrspace(1)* @tex0, !"texture", i32 1}			!3 = !{void (float*, i32)* @bax, !"kernel", i32 1}
				!4 = !{i64 addrspace(1)* @tex0, !"texture", i32 1}
				!5 = !{i64 addrspace(1)* @tex0}