This is an archive of the discontinued LLVM Phabricator instance.

[nvptx] Add `nvvm.texsurf.handle` internalizer.
Needs ReviewPublic

Authored by hliao on Apr 8 2020, 10:58 PM.

Details

Reviewers
tra
Summary
  • Replace them with the internal version, i.e. nvvm.texsurf.handle.internal just before the instruction selector.
  • Teach clang codegen to generate nvvm.texsurf.handle instead of nvvm.texsurf.handle.internal.

Diff Detail

Event Timeline

hliao created this revision.Apr 8 2020, 10:58 PM
hliao updated this revision to Diff 256321.Apr 9 2020, 8:48 AM

Rebase to trunk.

tra added a comment.Apr 9 2020, 10:45 AM

The patch could use a more detailed description. Specifically, it does not describe the purpose of these changes.

Replace them with the internal version, i.e. nvvm.texsurf.handle.internal just before the instruction selector.

It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
If so, do we need 'internal' any more? Can we just rename internal and be done with it? Adding an extra pass just to replace one intrinsic with another seems to be unnecessary.

I may be missing something here. Why do we have internal and non-internal intrinsics at all? Do we need both?

hliao added a comment.Apr 9 2020, 1:07 PM
In D77777#1972349, @tra wrote:

The patch could use a more detailed description. Specifically, it does not describe the purpose of these changes.

Replace them with the internal version, i.e. nvvm.texsurf.handle.internal just before the instruction selector.

It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
If so, do we need 'internal' any more? Can we just rename internal and be done with it? Adding an extra pass just to replace one intrinsic with another seems to be unnecessary.

I may be missing something here. Why do we have internal and non-internal intrinsics at all? Do we need both?

besides required by NVVM IR spec, the metadata in that intrinsic is a trick to prevent it from being sunk into common code during optimization in LLVM IR. NVPTX backend only handles the internal version. We need to internalize them for codegen. I will put a brief explanation in that pass.

hliao updated this revision to Diff 256511.Apr 9 2020, 11:24 PM

Add more comments to explain what that pass does.

hliao updated this revision to Diff 256518.Apr 10 2020, 12:47 AM

Fix a clang-tidy warning.

tra added a comment.Apr 10 2020, 10:48 AM
In D77777#1972349, @tra wrote:

The patch could use a more detailed description. Specifically, it does not describe the purpose of these changes.

Replace them with the internal version, i.e. nvvm.texsurf.handle.internal just before the instruction selector.

It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
If so, do we need 'internal' any more? Can we just rename internal and be done with it? Adding an extra pass just to replace one intrinsic with another seems to be unnecessary.

I may be missing something here. Why do we have internal and non-internal intrinsics at all? Do we need both?

besides required by NVVM IR spec,

NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does not impose direct constraints on LLVM's design choices.

the metadata in that intrinsic is a trick to prevent it from being sunk into common code during optimization in LLVM IR.

This sounds like it may have been done that way in an attempt to work around a problem with intrinsics' constraints. We may want to check if there's a better way to do it now.
Right now both intrinsics are marked with [IntrNoMem] which may be the reason for compiler feeling free to move it around. We may need to give compiler correct information and then we may not need this just-in-time intrinsic replacement hack. I think it should be at least IntrArgMemOnly or, maybe IntrInaccessibleMemOrArgMemOnly.

NVPTX backend only handles the internal version.

This is obviously fixable.

In D77777#1974672, @tra wrote:
In D77777#1972349, @tra wrote:

The patch could use a more detailed description. Specifically, it does not describe the purpose of these changes.

Replace them with the internal version, i.e. nvvm.texsurf.handle.internal just before the instruction selector.

It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
If so, do we need 'internal' any more? Can we just rename internal and be done with it? Adding an extra pass just to replace one intrinsic with another seems to be unnecessary.

I may be missing something here. Why do we have internal and non-internal intrinsics at all? Do we need both?

besides required by NVVM IR spec,

NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does not impose direct constraints on LLVM's design choices.

It would be an advantage and, sometimes, desirable to generate IR compatible to NVVM IR spec.

the metadata in that intrinsic is a trick to prevent it from being sunk into common code during optimization in LLVM IR.

This sounds like it may have been done that way in an attempt to work around a problem with intrinsics' constraints. We may want to check if there's a better way to do it now.
Right now both intrinsics are marked with [IntrNoMem] which may be the reason for compiler feeling free to move it around. We may need to give compiler correct information and then we may not need this just-in-time intrinsic replacement hack. I think it should be at least IntrArgMemOnly or, maybe IntrInaccessibleMemOrArgMemOnly.

That may not exactly model the behavior as, for binding texture/surface support, in fact, it's true that there's no memory operation at all. Even with InstArgMemOnly or similar attributes, it still won't be preventable for optimizations to sink common code. Such trick is played in lots of intrinsics, such as read.register and etc.

NVPTX backend only handles the internal version.

This is obviously fixable.

SDAG so far cannot handle metadata GISel doesn't have support either. Getting that supported in TICG won't justify too much for target-specific intrinsics as metadata should not directly be used in code generation.

tra added a comment.Apr 10 2020, 1:55 PM

NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does not impose direct constraints on LLVM's design choices.

It would be an advantage and, sometimes, desirable to generate IR compatible to NVVM IR spec.

I'm not against it, but I think it's OK to make different choices if we have good reasons for that. NVIDIA didn't update LLVM since they've contributed the original implementation, so by now we're both far behind the current state of NVVM and quite a bit sideways due to the things LLVM has added to NVPTX backend.

This sounds like it may have been done that way in an attempt to work around a problem with intrinsics' constraints. We may want to check if there's a better way to do it now.
Right now both intrinsics are marked with [IntrNoMem] which may be the reason for compiler feeling free to move it around. We may need to give compiler correct information and then we may not need this just-in-time intrinsic replacement hack. I think it should be at least IntrArgMemOnly or, maybe IntrInaccessibleMemOrArgMemOnly.

That may not exactly model the behavior as, for binding texture/surface support, in fact, it's true that there's no memory operation at all. Even with InstArgMemOnly or similar attributes, it still won't be preventable for optimizations to sink common code. Such trick is played in lots of intrinsics, such as read.register and etc.

Can you give me an example where/how optimizer would break things? Is that because were using metadata as an argument?

I've re-read NVVM docs and I can't say that I understand how it's supposed to work.
metadata holding the texture or surface variable alone is a rather odd notion and I'm not surprised that it's not handled well. In the end we do end up with a 'handle' which is an in-memory object. Perhaps it should be represented as a real variable with a metadata attribute. Then we can lower it as a handle, can enforce that only texture/surface instructions are allowed to use it and will have a way to tell LLVM what it's allowed to do.

I don't have a good picture of how it all will fit together in the end (or whether what I suggest makes sense), but the current implementation appears to be in need of rethinking.

hliao added a comment.Apr 10 2020, 2:47 PM
In D77777#1974988, @tra wrote:

NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does not impose direct constraints on LLVM's design choices.

It would be an advantage and, sometimes, desirable to generate IR compatible to NVVM IR spec.

I'm not against it, but I think it's OK to make different choices if we have good reasons for that. NVIDIA didn't update LLVM since they've contributed the original implementation, so by now we're both far behind the current state of NVVM and quite a bit sideways due to the things LLVM has added to NVPTX backend.

This sounds like it may have been done that way in an attempt to work around a problem with intrinsics' constraints. We may want to check if there's a better way to do it now.
Right now both intrinsics are marked with [IntrNoMem] which may be the reason for compiler feeling free to move it around. We may need to give compiler correct information and then we may not need this just-in-time intrinsic replacement hack. I think it should be at least IntrArgMemOnly or, maybe IntrInaccessibleMemOrArgMemOnly.

That may not exactly model the behavior as, for binding texture/surface support, in fact, it's true that there's no memory operation at all. Even with InstArgMemOnly or similar attributes, it still won't be preventable for optimizations to sink common code. Such trick is played in lots of intrinsics, such as read.register and etc.

Can you give me an example where/how optimizer would break things? Is that because were using metadata as an argument?

I've re-read NVVM docs and I can't say that I understand how it's supposed to work.
metadata holding the texture or surface variable alone is a rather odd notion and I'm not surprised that it's not handled well. In the end we do end up with a 'handle' which is an in-memory object. Perhaps it should be represented as a real variable with a metadata attribute. Then we can lower it as a handle, can enforce that only texture/surface instructions are allowed to use it and will have a way to tell LLVM what it's allowed to do.

I don't have a good picture of how it all will fit together in the end (or whether what I suggest makes sense), but the current implementation appears to be in need of rethinking.

the 1st argument in llvm.nvvm.texsurf.hande.internal or the 2nd one in llvm.nvvm.texsurf.handle must be kept as an immediate or constant value, i.e. that global variable. However, optimizations will find common code in the following

if (cond) {
  %hnd = texsurf.handle.internal(@tex1);
} else {
  %hnd = texsurf.handle.internal(@tex2)
}
= use(%hnd)

and hoist or sink it into

if (cond) {
  %ptr = @tex1;
} else {
  %ptr = @tex2;
}
%hnd = texsurf.handle.intenal(%ptr);
= use(%hnd)

The backend cannot handle non immediate operand in texsurf.handle. The similar thing happens to read.register as well as it also assumes its argument is always an immediate value.

tra added a subscriber: majnemer.Apr 10 2020, 4:25 PM

the 1st argument in llvm.nvvm.texsurf.hande.internal or the 2nd one in llvm.nvvm.texsurf.handle must be kept as an immediate or constant value, i.e. that global variable. However, optimizations will find common code in the following

if (cond) {
  %hnd = texsurf.handle.internal(@tex1);
} else {
  %hnd = texsurf.handle.internal(@tex2)
}
= use(%hnd)

and hoist or sink it into

if (cond) {
  %ptr = @tex1;
} else {
  %ptr = @tex2;
}
%hnd = texsurf.handle.intenal(%ptr);
= use(%hnd)

The backend cannot handle non immediate operand in texsurf.handle. The similar thing happens to read.register as well as it also assumes its argument is always an immediate value.

I wonder if we can use token types to represent the handle? https://reviews.llvm.org/D11861
@majnemer -- would this use case be suitable for the token type?

tra added a comment.Apr 10 2020, 4:47 PM

Also, if I read PTX docs correctly, it should be OK to pass texture handle address via an intermediate variable:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-sampler-and-surface-types

Creating pointers to opaque variables using mov, e.g., mov.u64 reg, opaque_var;. The resulting pointer may be stored to and loaded from memory, passed as a parameter to functions, and de-referenced by texture and surface load, store, and query instructions

We may not need the tokens and should be able to use regular pointer.

hliao added a comment.Apr 10 2020, 6:11 PM
In D77777#1975406, @tra wrote:

the 1st argument in llvm.nvvm.texsurf.hande.internal or the 2nd one in llvm.nvvm.texsurf.handle must be kept as an immediate or constant value, i.e. that global variable. However, optimizations will find common code in the following

if (cond) {
  %hnd = texsurf.handle.internal(@tex1);
} else {
  %hnd = texsurf.handle.internal(@tex2)
}
= use(%hnd)

and hoist or sink it into

if (cond) {
  %ptr = @tex1;
} else {
  %ptr = @tex2;
}
%hnd = texsurf.handle.intenal(%ptr);
= use(%hnd)

The backend cannot handle non immediate operand in texsurf.handle. The similar thing happens to read.register as well as it also assumes its argument is always an immediate value.

I wonder if we can use token types to represent the handle? https://reviews.llvm.org/D11861
@majnemer -- would this use case be suitable for the token type?

If we still could make PHI over token, it canont serve this purpose. Check llvm::canReplaceOperandWithVariable for operand for details.

hliao added a comment.Apr 10 2020, 6:15 PM
In D77777#1975440, @tra wrote:

Also, if I read PTX docs correctly, it should be OK to pass texture handle address via an intermediate variable:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-sampler-and-surface-types

Creating pointers to opaque variables using mov, e.g., mov.u64 reg, opaque_var;. The resulting pointer may be stored to and loaded from memory, passed as a parameter to functions, and de-referenced by texture and surface load, store, and query instructions

We may not need the tokens and should be able to use regular pointer.

That handle is the output of texsurf.handle intrinsic instead of its input. Internally within NVTPX backend, it needs to keep track of which global variable needs to be a texref or surfref and requires the operand of texsurf.handle must be a global variable. Check NVPTXReplaceImageHandles.cpp line 167 - 175.

tra added a comment.Apr 10 2020, 10:45 PM
In D77777#1975440, @tra wrote:

Also, if I read PTX docs correctly, it should be OK to pass texture handle address via an intermediate variable:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-sampler-and-surface-types

Creating pointers to opaque variables using mov, e.g., mov.u64 reg, opaque_var;. The resulting pointer may be stored to and loaded from memory, passed as a parameter to functions, and de-referenced by texture and surface load, store, and query instructions

We may not need the tokens and should be able to use regular pointer.

That handle is the output of texsurf.handle intrinsic instead of its input. Internally within NVTPX backend, it needs to keep track of which global variable needs to be a texref or surfref and requires the operand of texsurf.handle must be a global variable. Check NVPTXReplaceImageHandles.cpp line 167 - 175.

That's the point I'm trying to make -- existing code may not be the best way to implement this and should be improved. If we are serious about supporting textures & surfaces, then it may be worth making it work properly, as opposed to adding more hacks to get the old and till-now largely unused bits of LLVM do what we want.

We could do something like this:

  • class instances with the texref attribute are lowered in a way that produces a global handle. It could be a handle-only, or the handle may be produced in addition to the object itself.
  • intrinsics accept texref pointers and follow standard LLVM rules/assumptions.
  • => code is subject to regular LLVM optimizations
  • => no need to have special passes to tweak IR just so. At worst, we may keep something similar to NVPTXReplaceImageHandles which would replace object references with handle references. We may not even need to do that. As far as LLVM is concerned texture handle is just a pointer, and the objects with texref attribute are lowered as a .texref global in PTX. It's user's responsibility to pass the right pointer to the intrinsic.

On a side note, the lowering of texture/surface instructions and intrinsics could use a major overhaul, too. It's currently excessively redundant and could be reduced to a much more concise tablegen-driven implementation.

In D77777#1975406, @tra wrote:

the 1st argument in llvm.nvvm.texsurf.hande.internal or the 2nd one in llvm.nvvm.texsurf.handle must be kept as an immediate or constant value, i.e. that global variable. However, optimizations will find common code in the following

if (cond) {
  %hnd = texsurf.handle.internal(@tex1);
} else {
  %hnd = texsurf.handle.internal(@tex2)
}
= use(%hnd)

and hoist or sink it into

if (cond) {
  %ptr = @tex1;
} else {
  %ptr = @tex2;
}
%hnd = texsurf.handle.intenal(%ptr);
= use(%hnd)

The backend cannot handle non immediate operand in texsurf.handle. The similar thing happens to read.register as well as it also assumes its argument is always an immediate value.

I wonder if we can use token types to represent the handle? https://reviews.llvm.org/D11861
@majnemer -- would this use case be suitable for the token type?

If we still could make PHI over token, it canont serve this purpose. Check llvm::canReplaceOperandWithVariable for operand for details.

It is not possible to PHI a token value. Token values disable the call to canReplaceOperandWithVariable.