
May 28 2020

hliao accepted D80129: AMDGPU: Handle rewriting ptrmask for more address spaces.

LGTM

May 28 2020, 10:56 AM · Restricted Project
hliao added inline comments to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..
May 28 2020, 9:15 AM · Restricted Project
hliao added a comment to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

I'd still like to find a way to avoid a whole extra pass run for this. In the test here, the LoadStoreVectorizer should have vectorized these? Why didn't it?

May 28 2020, 8:45 AM · Restricted Project

May 27 2020

hliao accepted D80038: InferAddressSpaces: Handle ptrmask intrinsic.

LGTM

May 27 2020, 8:11 PM · Restricted Project
hliao committed rG03481287ca53: Refactor argument attribute specification in intrinsic definition. NFC. (authored by hliao).
Refactor argument attribute specification in intrinsic definition. NFC.
May 27 2020, 2:11 PM
hliao committed rGfa342b5c8054: Enable `align <n>` to be used in the intrinsic definition. (authored by hliao).
Enable `align <n>` to be used in the intrinsic definition.
May 27 2020, 2:11 PM
hliao closed D80422: Enable `align <n>` to be used in intrinsic definitions..
May 27 2020, 2:10 PM · Restricted Project
hliao updated the diff for D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

Rebase to the latest trunk.

May 27 2020, 2:09 PM · Restricted Project
hliao committed rG49688b3c306d: Fix `-Wpedantic` warning. NFC. (authored by hliao).
Fix `-Wpedantic` warning. NFC.
May 27 2020, 1:04 PM
hliao committed rGb0404681171d: Fix warning `-Wpedantic`. NFC. (authored by hliao).
Fix warning `-Wpedantic`. NFC.
May 27 2020, 9:12 AM
hliao updated the diff for D80422: Enable `align <n>` to be used in intrinsic definitions..

Remove the <tuple> header.

May 27 2020, 9:11 AM · Restricted Project
hliao added inline comments to D80422: Enable `align <n>` to be used in intrinsic definitions..
May 27 2020, 9:11 AM · Restricted Project
hliao updated the diff for D80422: Enable `align <n>` to be used in intrinsic definitions..
  • Introduce RetIndex and ArgIndex<n> to specify attribute indices.
  • Minor refinement following other review comments.
May 27 2020, 8:39 AM · Restricted Project

May 26 2020

hliao added a comment to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

Rewrite such transformations in LLVM IR as late codegen preparation.

May 26 2020, 2:11 PM · Restricted Project

May 24 2020

hliao updated the diff for D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

Rewrite such transformations in LLVM IR as late codegen preparation.

May 24 2020, 2:55 PM · Restricted Project

May 22 2020

hliao added a comment to D80237: [hip] Ensure pointer in struct argument has proper `addrspacecast`..

addrspacecast might be a real conversion. I feel like this is really going well beyond what argument coercion should be expected to do, and we need to step back and re-evaluate how we're doing this.

addrspacecast *must* be a no-op in terms of argument coercion.

So what does this mean exactly? If the ABI lowering uses argument coercion in a way that changes address spaces, it must ensure that the representations are different? So it's always *legal* to just do a memcpy here, we're just trying really hard to not do so.
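As a rough sketch of the situation this patch is concerned with (the struct and kernel below are made up for illustration, not taken from D80237): a HIP kernel takes a struct containing a pointer; when the ABI coerces that aggregate, the pointer member arrives in a non-generic address space and must be addrspacecast back to the generic pointer the source code expects.

// Hypothetical HIP example, not from D80237.
struct Payload {
  float *data; // a generic (flat) pointer at the source level
  int n;
};

__global__ void scale(Payload p) {
  // On amdgcn the kernel argument is passed in a non-generic address space;
  // when the coerced struct is unpacked, loading `data` as a plain `float *`
  // needs an addrspacecast back to the generic address space. If that cast is
  // a real conversion rather than a no-op, a plain bit-copy of the coerced
  // aggregate would yield a wrong pointer value.
  for (int i = threadIdx.x; i < p.n; i += blockDim.x)
    p.data[i] *= 2.0f;
}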

May 22 2020, 9:25 PM · Restricted Project
hliao added a comment to D80237: [hip] Ensure pointer in struct argument has proper `addrspacecast`..

addrspacecast might be a real conversion. I feel like this is really going well beyond what argument coercion should be expected to do, and we need to step back and re-evaluate how we're doing this.

May 22 2020, 8:21 PM · Restricted Project
hliao added inline comments to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..
May 22 2020, 6:14 PM · Restricted Project
hliao added inline comments to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..
May 22 2020, 2:30 PM · Restricted Project
hliao added inline comments to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..
May 22 2020, 11:47 AM · Restricted Project
hliao updated the diff for D80422: Enable `align <n>` to be used in intrinsic definitions..

Follow part of the clang-tidy suggestions on coding style.

May 22 2020, 9:39 AM · Restricted Project
hliao updated the diff for D80422: Enable `align <n>` to be used in intrinsic definitions..

Revise the comment in the intrinsic emitter.

May 22 2020, 7:28 AM · Restricted Project
hliao updated the diff for D80422: Enable `align <n>` to be used in intrinsic definitions..

Add tests & example changes in relevant AMDGPU intrinsics.

May 22 2020, 7:28 AM · Restricted Project
hliao added a comment to D80422: Enable `align <n>` to be used in intrinsic definitions..

To prepare for the refactoring in D80364, the intrinsics of interest should be specified with the alignment of the returned pointer. With this patch, amdgcn.dispatch.ptr is defined as follows

def int_amdgcn_dispatch_ptr :
  Intrinsic<[LLVMQualPointerType<llvm_i8_ty, 4>], [],
  [Align<-1, 4>, IntrNoMem, IntrSpeculatable]>;

Referring to the return index by a negative parameter index is a bit weird. How does this work for multiple return values? Is it [-2, -1]? This needs documenting somewhere

May 22 2020, 7:28 AM · Restricted Project

May 21 2020

hliao added a comment to D80422: Enable `align <n>` to be used in intrinsic definitions..

To prepare for the refactoring in D80364, the intrinsics of interest should be specified with the alignment of the returned pointer. With this patch, amdgcn.dispatch.ptr is defined as follows

May 21 2020, 9:38 PM · Restricted Project
hliao created D80422: Enable `align <n>` to be used in intrinsic definitions..
May 21 2020, 9:38 PM · Restricted Project
hliao added inline comments to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..
May 21 2020, 11:53 AM · Restricted Project
hliao added inline comments to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..
May 21 2020, 11:53 AM · Restricted Project

May 20 2020

hliao created D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..
May 20 2020, 11:58 PM · Restricted Project

May 19 2020

hliao updated the diff for D80237: [hip] Ensure pointer in struct argument has proper `addrspacecast`..

Revise following the review comments.

May 19 2020, 11:57 PM · Restricted Project
hliao created D80237: [hip] Ensure pointer in struct argument has proper `addrspacecast`..
May 19 2020, 1:44 PM · Restricted Project

May 7 2020

hliao added a comment to D79344: [cuda] Start diagnosing variables with bad target..
In D79344#2026349, @tra wrote:

Here's a slightly smaller variant which may be a good clue for tracking down the root cause. This one fails with:

var.cc:6:14: error: no matching function for call to 'copysign'
  double g = copysign(0, g);
             ^~~~~~~~
var.cc:5:56: note: candidate template ignored: substitution failure [with e = int, f = double]: reference to __host__ variable 'b' in __device__ function
__attribute__((device)) typename c<a<f>::b, double>::d copysign(e, f) {
                                         ~             ^
1 error generated when compiling for sm_60.

I suspect it's the handling of the non-type template parameter that may be breaking things in both cases.

template <typename> struct a { static const bool b = true; };
template <bool, class> struct c;
template <class h> struct c<true, h> { typedef h d; };
template <typename e, typename f>
__attribute__((device)) typename c<a<f>::b, double>::d copysign(e, f) {
  double g = copysign(0, g);
}
May 7 2020, 10:12 PM · Restricted Project
hliao added a comment to D79344: [cuda] Start diagnosing variables with bad target..
In D79344#2026180, @tra wrote:

The problem is reproducible in upstream clang. Let's see if I can reduce it to something simpler.

May 7 2020, 4:19 PM · Restricted Project
hliao added a comment to D79344: [cuda] Start diagnosing variables with bad target..
In D79344#2026025, @tra wrote:

We're calling copysign(int, double). The standard library provides copysign(double, double); CUDA provides only copysign(float, double). As far as C++ is concerned, both require one type conversion. I guess previously we would give the __device__ one provided by CUDA a higher preference, considering that the callee is a device function. Now both seem to have equal weight. I'm not sure how/why.

May 7 2020, 2:43 PM · Restricted Project

May 6 2020

hliao committed rG4ee5a04187aa: [amdgpu] Fix check of VCC. (authored by hliao).
[amdgpu] Fix check of VCC.
May 6 2020, 11:21 AM
hliao closed D79498: [amdgpu] Fix check of VCC..
May 6 2020, 11:21 AM · Restricted Project
hliao added a comment to D79498: [amdgpu] Fix check of VCC..

One alternative is to check isSuperRegisterEq(Reg, VCC) for conciseness and readability. Not sure whether we will have similar issues on other special registers with 16-bit subregs. But I admit that's costly compared to the current switch.
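A minimal sketch of that alternative (the helper name is hypothetical, and it assumes the AMDGPU register definitions and SIRegisterInfo are in scope):

// Hypothetical helper, not the code in D79498: query MCRegisterInfo's
// isSuperRegisterEq instead of enumerating VCC, VCC_LO, VCC_HI, and their
// 16-bit sub-registers in a switch.
static bool readsOrWritesVCC(const SIRegisterInfo &TRI, Register Reg) {
  // True when Reg is VCC itself or any sub-register of VCC; this walks the
  // super-register lists, which is the cost mentioned above.
  return TRI.isSuperRegisterEq(Reg, AMDGPU::VCC);
}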

May 6 2020, 9:07 AM · Restricted Project
hliao created D79498: [amdgpu] Fix check of VCC..
May 6 2020, 9:07 AM · Restricted Project
hliao committed rG6533c1da7fab: Revert "[MIR] Fix a bug in MIR printer." (authored by hliao).
Revert "[MIR] Fix a bug in MIR printer."
May 6 2020, 8:36 AM
hliao added a reverting change for rGe38018b80d8e: [MIR] Fix a bug in MIR printer.: rG6533c1da7fab: Revert "[MIR] Fix a bug in MIR printer.".
May 6 2020, 8:36 AM
hliao added a comment to rGe38018b80d8e: [MIR] Fix a bug in MIR printer..

That change is temporarily reverted.

May 6 2020, 8:35 AM
hliao added a comment to rGe38018b80d8e: [MIR] Fix a bug in MIR printer..

llvm/test/CodeGen/PowerPC/stack-coloring-vararg.mir needs adjusting after this fix. I will submit the fix soon after local verification.

May 6 2020, 8:35 AM
hliao committed rGe38018b80d8e: [MIR] Fix a bug in MIR printer. (authored by hliao).
[MIR] Fix a bug in MIR printer.
May 6 2020, 8:03 AM

May 5 2020

hliao committed rG9142c0b46bfe: [clang][codegen] Hoist parameter attribute setting in function prolog. (authored by hliao).
[clang][codegen] Hoist parameter attribute setting in function prolog.
May 5 2020, 12:59 PM
hliao committed rG276c8dde0b58: [clang][codegen] Refactor argument loading in function prolog. NFC. (authored by hliao).
[clang][codegen] Refactor argument loading in function prolog. NFC.
May 5 2020, 12:59 PM
hliao closed D79395: [clang][codegen] Hoist parameter attribute setting in function prolog..
May 5 2020, 12:59 PM · Restricted Project
hliao closed D79394: [clang][codegen] Refactor argument loading in function prolog. NFC..
May 5 2020, 12:58 PM · Restricted Project

May 4 2020

hliao updated the diff for D79395: [clang][codegen] Hoist parameter attribute setting in function prolog..

Add dependency.

May 4 2020, 11:07 PM · Restricted Project
hliao added a comment to D79213: [hip] Add noalias on restrict qualified coerced hip pointers.

I hoisted the parameter attribute preparation in D79395, which depends on D79394, a cleanup to make that hoist more straightforward.

May 4 2020, 11:07 PM · Restricted Project
hliao created D79395: [clang][codegen] Hoist parameter attribute setting in function prolog..
May 4 2020, 11:07 PM · Restricted Project
hliao created D79394: [clang][codegen] Refactor argument loading in function prolog. NFC..
May 4 2020, 11:07 PM · Restricted Project
hliao abandoned D79393: [clang][codegen] Refactor argument loading in function prolog. NFC..
May 4 2020, 10:35 PM · Restricted Project
hliao created D79393: [clang][codegen] Refactor argument loading in function prolog. NFC..
May 4 2020, 10:35 PM · Restricted Project
hliao added a comment to D79213: [hip] Add noalias on restrict qualified coerced hip pointers.

Any more comments? As this should be a performance-critical issue, shall we reach a conclusion and make progress on the next step?

May 4 2020, 2:32 PM · Restricted Project
hliao added a comment to D79344: [cuda] Start diagnosing variables with bad target..
In D79344#2018683, @tra wrote:
In D79344#2018561, @tra wrote:

This has a good chance of breaking existing code. It would be great to add an escape hatch option to revert to the old behavior if we run into problems. The change is relatively simple, so reverting it in case something goes wrong should work, too. Up to you.

Why? For the cases addressed in this patch, if there is existing code, it won't compile to generate a module file due to the missing symbol. Anything missing?

Logistics, mostly.

Overloading is a rather fragile area of CUDA. This is the area where clang and NVCC behave differently. Combined with the existing code that needs to work with both compilers, even minor changes in compiler behavior can result in unexpected issues. Stricter checks tend to expose existing code which happens to work (or to compile) when it should not have, but it's not always trivial to fix those quickly. Having an escape hatch allows us to deal with those issues. It allows the owner of the code to reproduce the problem while the rest of the world continues to work. Reverting is suboptimal as the end user is often not in a good position to build a compiler with your patch plumbed in and then plumb the patched compiler into their build system. Adding another compiler option to enable/disable the new behavior is much more manageable.

May 4 2020, 2:00 PM · Restricted Project
hliao added inline comments to D79344: [cuda] Start diagnosing variables with bad target..
May 4 2020, 1:26 PM · Restricted Project
hliao added inline comments to D79344: [cuda] Start diagnosing variables with bad target..
May 4 2020, 1:26 PM · Restricted Project
hliao added inline comments to D79344: [cuda] Start diagnosing variables with bad target..
May 4 2020, 12:55 PM · Restricted Project
hliao added a comment to D79344: [cuda] Start diagnosing variables with bad target..
In D79344#2018561, @tra wrote:

This has a good chance of breaking existing code. It would be great to add an escape hatch option to revert to the old behavior if we run into problems. The change is relatively simple, so reverting it in case something goes wrong should work, too. Up to you.

May 4 2020, 12:54 PM · Restricted Project
hliao updated the diff for D79344: [cuda] Start diagnosing variables with bad target..

Reformatting test code following pre-merge checks.

May 4 2020, 12:53 PM · Restricted Project
hliao added a comment to D79344: [cuda] Start diagnosing variables with bad target..

That test code just passed compilation on clang trunk when only assembly code is generated (https://godbolt.org/z/XYjRcT), but NVCC generates errors in all cases.

May 4 2020, 11:48 AM · Restricted Project
hliao created D79344: [cuda] Start diagnosing variables with bad target..
May 4 2020, 11:48 AM · Restricted Project

May 1 2020

hliao added inline comments to rGd1c43615ed06: [clang-format] Add the missing default argument..
May 1 2020, 10:31 PM · Restricted Project
hliao committed rGf3a3db8627e9: Add the missing '='. NFC. (authored by hliao).
Add the missing '='. NFC.
May 1 2020, 10:08 PM
hliao added inline comments to D79213: [hip] Add noalias on restrict qualified coerced hip pointers.
May 1 2020, 5:10 PM · Restricted Project

Apr 30 2020

hliao added a comment to D79213: [hip] Add noalias on restrict qualified coerced hip pointers.

Basically, I think that should be a generic issue. As argument coercion didn't handle pointers previously, we need to address this pointer-specific issue. That is, if the original argument has qualifiers that map onto LLVM attributes, the coerced argument should carry those qualifiers as well when the coerced argument is still a pointer like the original one. We may lose other useful attributes besides noalias, such as alignment, nonnull, etc.
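A small made-up HIP kernel to illustrate what is at stake (not from D79213): the __restrict__ qualifier on the pointer parameters maps to noalias in IR, and the point above is that such attributes, together with alignment and nonnull, should survive even when the argument is coerced to a differently typed pointer.

// Illustrative only: __restrict__ should still yield noalias on the (possibly
// coerced) IR arguments; losing it blocks reordering and vectorization of the
// loads and stores below.
__global__ void axpy(int n, float a,
                     const float *__restrict__ x, float *__restrict__ y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    y[i] += a * x[i];
}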

Apr 30 2020, 9:25 PM · Restricted Project
hliao committed rGd1c43615ed06: [clang-format] Add the missing default argument. (authored by hliao).
[clang-format] Add the missing default argument.
Apr 30 2020, 3:07 PM · Restricted Project

Apr 29 2020

hliao added a comment to D78655: [CUDA][HIP] Let lambda be host device by default.

say we capture a host variable reference in a device lambda.

Is that required to be an error? I know @AlexVlx added support to hcc at one point to capture host variables by reference. So it seems to be possible for it to work correctly. So it doesn't seem like reason enough to disallow implicit HD.

Apr 29 2020, 10:09 PM · Restricted Project
hliao added a comment to D78655: [CUDA][HIP] Let lambda be host device by default.

I thought the goal of adding HD/D attributes to lambdas is to make the static check easier, as a lambda used in device code, or a device lambda, is sensitive to captures. An invalid capture may accidentally cause an error without a static check, say we capture a host variable reference in a device lambda. That makes the final code invalid. Allowing a regular lambda to be used in a global or device function should be considered harmful.

Inferring a lambda function by default as __host__ __device__ does not mean skipping the check for harmful captures.

If we add such checks, it does not matter whether the __host__ __device__ attribute is explicit or implicit; they go through the same check.

How to infer the device/host-ness of a lambda function is a usability issue. It is orthogonal to the issue of missing diagnostics about captures.

Forcing users to explicitly mark a lambda function as __device__ __host__ itself does not help diagnose the harmful captures if such diags do not exist.

Let's think about a lambda function which captures references to host variables. If it is only used in host code, as in ordinary C++ host code, marking it host device implicitly does not change anything, since it is not emitted in device code. If it is used in device code, it will likely cause a memory fault at run time since currently we do not diagnose it. Does it help if we force users to mark it __device__ __host__? I don't think so. Users will just reluctantly add __device__ __host__ to it and they still end up with a memory fault. If we add the diagnostic about the harmful captures, it does not matter whether the __device__ __host__ attribute is explicit or implicit; users get the same diagnostic about harmful captures. So the effect is the same. However, usability is improved if users do not need to add __device__ __host__ by themselves.

Does inferring a lambda function as __device__ __host__ by default make the diagnostic about harmful captures more difficult? No. It should be the same as for lambdas with explicit __device__ __host__. This needs to be a deferred diagnostic like those we already have, which are only emitted if the function is really emitted. It does not matter whether the device/host attrs are implicit or explicit.
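To make the capture scenario above concrete, here is a made-up HIP/CUDA snippet (the apply kernel and all names are illustrative, not from D78655): a lambda captures a host variable by reference and is then launched on the device. Whether its device-side attribute is written explicitly or inferred, the captured reference points at host memory, so the device-side read faults at run time unless a (deferred) diagnostic flags the capture.

// Illustrative example of a harmful by-reference capture; compile as HIP/CUDA.
template <typename F>
__global__ void apply(F f, int *out) {
  *out = f(); // the closure is copied to the device, but the captured
              // reference still refers to host memory
}

void host_side(int *device_out) {
  int host_val = 42; // lives on the host stack
  // The capture compiles whether the lambda's device attribute is explicit or
  // inferred; dereferencing host_val on the device faults at run time.
  auto bad = [&host_val] __device__ () { return host_val + 1; };
  apply<<<1, 1>>>(bad, device_out);
}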

Apr 29 2020, 10:09 PM · Restricted Project
hliao updated the diff for D71227: [cuda][hip] Fix function overload resolution in the global initializer..

Rebase to trunk and resolve the conflict.

Apr 29 2020, 1:27 PM · Restricted Project

Apr 27 2020

hliao committed rG612720db874d: [hip] Remove test using `hip_pinned_shadow` attribute. NFC. (authored by hliao).
[hip] Remove test using `hip_pinned_shadow` attribute. NFC.
Apr 27 2020, 2:02 PM

Apr 25 2020

hliao added inline comments to D78655: [CUDA][HIP] Let lambda be host device by default.
Apr 25 2020, 9:32 AM · Restricted Project
hliao requested changes to D78655: [CUDA][HIP] Let lambda be host device by default.
In D78655#1997491, @tra wrote:

Summoning @rsmith as I'm sure that there are interesting corner cases in lambda handling that we didn't consider.

Making lambdas implicitly HD will make it easier to write code which can't be instantiated on one side of the compilation. That's probably observable via SFINAE, but I can't tell whether that matters.
By default I'd rather err on handling lambdas the same way as we do regular user-authored functions.

Apr 25 2020, 9:32 AM · Restricted Project

Apr 24 2020

hliao committed rG495bb8feb9af: Fix `-Wparentheses` warnings. NFC. (authored by hliao).
Fix `-Wparentheses` warnings. NFC.
Apr 24 2020, 12:27 PM

Apr 21 2020

hliao committed rG163bd9d85800: Fix `-Wpedantic` warnings. NFC. (authored by hliao).
Fix `-Wpedantic` warnings. NFC.
Apr 21 2020, 1:34 PM
hliao added a comment to D78128: Implement some functions in NativeSession..

http://45.33.8.238/linux/15836/step_12.txt

After fixing the -DLLVM_ENABLE_ABI_BREAKING_CHECKS=on problem locally, the unittest (unittests/DebugInfo/PDB/DebugInfoPDBTests) still fails.

(
An erroring Expected must be explicitly checked to avoid -DLLVM_ENABLE_ABI_BREAKING_CHECKS=on failures.

if (Expected<std::unique_ptr<PDBFile>> File = loadPdbFile(PdbPath, Allocator))
  return std::string(PdbPath);
else
  return File.takeError();

)

Apr 21 2020, 1:33 PM · Restricted Project
hliao committed rG86e3b735cd80: [hip] Claim builtin type `__float128` supported if the host target supports it. (authored by hliao).
[hip] Claim builtin type `__float128` supported if the host target supports it.
Apr 21 2020, 1:01 PM
hliao closed D78513: [hip] Claim builtin type `__float128` supported if the host target supports it..
Apr 21 2020, 1:00 PM · Restricted Project
hliao committed rG21529355e1ba: Fix `-Wparentheses` warnings. NFC. (authored by hliao).
Fix `-Wparentheses` warnings. NFC.
Apr 21 2020, 12:27 PM
hliao committed rGa13dce1d90cb: Fix build. NFC. (authored by hliao).
Fix build. NFC.
Apr 21 2020, 12:26 PM

Apr 20 2020

hliao added a comment to D78513: [hip] Claim builtin type `__float128` supported if the host target supports it..

Currently, if float128 instructions get to the amdgpu backend, are we going to crash?

Apr 20 2020, 2:38 PM · Restricted Project
hliao created D78513: [hip] Claim builtin type `__float128` supported if the host target supports it..
Apr 20 2020, 1:00 PM · Restricted Project

Apr 15 2020

hliao committed rG50472c422cbb: Remove extra ‘;’. NFC. (authored by hliao).
Remove extra ‘;’. NFC.
Apr 15 2020, 2:22 PM

Apr 14 2020

hliao updated the diff for D71227: [cuda][hip] Fix function overload resolution in the global initializer..

Rebase to trunk.

Apr 14 2020, 8:33 AM · Restricted Project

Apr 11 2020

hliao added a comment to D76365: [cuda][hip] Add CUDA builtin surface/texture reference support..
In D76365#1975784, @tra wrote:

It appears I can crash clang with some texture code: https://godbolt.org/z/5vdEwC

Apr 11 2020, 9:35 AM · Restricted Project

Apr 10 2020

hliao added a comment to D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer..
In D77777#1975440, @tra wrote:

Also, if I read PTX docs correctly, it should be OK to pass texture handle address via an intermediate variable:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-sampler-and-surface-types

Creating pointers to opaque variables using mov, e.g., mov.u64 reg, opaque_var;. The resulting pointer may be stored to and loaded from memory, passed as a parameter to functions, and de-referenced by texture and surface load, store, and query instructions

We may not need the tokens and should be able to use regular pointer.

Apr 10 2020, 6:19 PM · Restricted Project
hliao added a comment to D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer..
In D77777#1975406, @tra wrote:

The 1st argument of llvm.nvvm.texsurf.handle.internal or the 2nd one of llvm.nvvm.texsurf.handle must be kept as an immediate or constant value, i.e. that global variable. However, optimizations will find common code in the following

if (cond) {
  %hnd = texsurf.handle.internal(@tex1);
} else {
  %hnd = texsurf.handle.internal(@tex2)
}
= use(%hnd)

and hoist or sink it into

if (cond) {
  %ptr = @tex1;
} else {
  %ptr = @tex2;
}
%hnd = texsurf.handle.internal(%ptr);
= use(%hnd)

The backend cannot handle a non-immediate operand in texsurf.handle. A similar thing happens to read.register as well, as it also assumes its argument is always an immediate value.

I wonder if we can use token types to represent the handle? https://reviews.llvm.org/D11861
@majnemer -- would this use case be suitable for the token type?

Apr 10 2020, 6:19 PM · Restricted Project
hliao added a comment to D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer..
In D77777#1974988, @tra wrote:

NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does not impose direct constraints on LLVM's design choices.

It would be an advantage, and sometimes desirable, to generate IR compatible with the NVVM IR spec.

I'm not against it, but I think it's OK to make different choices if we have good reasons for that. NVIDIA didn't update LLVM since they've contributed the original implementation, so by now we're both far behind the current state of NVVM and quite a bit sideways due to the things LLVM has added to NVPTX backend.

This sounds like it may have been done that way in an attempt to work around a problem with intrinsics' constraints. We may want to check if there's a better way to do it now.
Right now both intrinsics are marked with [IntrNoMem], which may be the reason the compiler feels free to move them around. We may need to give the compiler correct information, and then we may not need this just-in-time intrinsic replacement hack. I think it should be at least IntrArgMemOnly or maybe IntrInaccessibleMemOrArgMemOnly.

That may not exactly model the behavior, as, for binding texture/surface support, there is in fact no memory operation at all. Even with IntrArgMemOnly or similar attributes, it still won't prevent optimizations from sinking common code. Such a trick is played by lots of intrinsics, such as read.register, etc.

Can you give me an example of where/how the optimizer would break things? Is that because we're using metadata as an argument?

I've re-read NVVM docs and I can't say that I understand how it's supposed to work.
metadata holding the texture or surface variable alone is a rather odd notion and I'm not surprised that it's not handled well. In the end we do end up with a 'handle' which is an in-memory object. Perhaps it should be represented as a real variable with a metadata attribute. Then we can lower it as a handle, can enforce that only texture/surface instructions are allowed to use it and will have a way to tell LLVM what it's allowed to do.

I don't have a good picture of how it all will fit together in the end (or whether what I suggest makes sense), but the current implementation appears to be in need of rethinking.

Apr 10 2020, 3:04 PM · Restricted Project
hliao added a comment to D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer..
In D77777#1974672, @tra wrote:
In D77777#1972349, @tra wrote:

The patch could use a more detailed description. Specifically, it does not describe the purpose of these changes.

Replace them with the internal version, i.e. nvvm.texsurf.handle.internal just before the instruction selector.

It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
If so, do we need 'internal' any more? Can we just rename internal and be done with it? Adding an extra pass just to replace one intrinsic with another seems to be unnecessary.

I may be missing something here. Why do we have internal and non-internal intrinsics at all? Do we need both?

Besides being required by the NVVM IR spec,

NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does not impose direct constraints on LLVM's design choices.

Apr 10 2020, 1:26 PM · Restricted Project
hliao updated the diff for D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer..

Fix a clang-tidy warning.

Apr 10 2020, 1:03 AM · Restricted Project
hliao committed rGb54b4ecac3e5: Fix `-Wextra` warning. NFC. (authored by hliao).
Fix `-Wextra` warning. NFC.
Apr 10 2020, 12:30 AM
hliao committed rG96c4ec8fdbd9: Remove extra whitespace. NFC. (authored by hliao).
Remove extra whitespace. NFC.
Apr 10 2020, 12:30 AM

Apr 9 2020

hliao updated the diff for D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer..

Add more comments to explain what that pass does.

Apr 9 2020, 11:25 PM · Restricted Project
hliao added a comment to D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer..
In D77777#1972349, @tra wrote:

The patch could use a more detailed description. Specifically, it does not describe the purpose of these changes.

Replace them with the internal version, i.e. nvvm.texsurf.handle.internal just before the instruction selector.

It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
If so, do we need 'internal' any more? Can we just rename internal and be done with it? Adding an extra pass just to replace one intrinsic with another seems to be unnecessary.

I may be missing something here. Why do we have internal and non-internal intrinsics at all? Do we need both?

Apr 9 2020, 1:07 PM · Restricted Project
hliao added a comment to D77743: [HIP] Emit symbols with kernel name in host binary.

The ambiguity issue is still there. That __global__ function generates different code if it's compiled as HIP by clang, or as non-HIP code by clang or other compilers. That will break resolving a symbol value to its device kernel name.

Apr 9 2020, 10:36 AM
hliao added a comment to D77743: [HIP] Emit symbols with kernel name in host binary.

In addition, we may also need to extend the registration to set up the mapping from that global variable to the host-side stub function, so that hipKernelLaunch (implemented as a function call instead of the kernel launch syntax) can call into that stub function to prepare the arguments.

hipKernelLaunch does not call the stub function. The stub function calls hipKernelLaunch. Therefore the user/runtime does not need to know about the stub function to launch a kernel.
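Roughly what the reply above describes, as a hand-written sketch (the stub name and the exact argument packing are illustrative and assumed, not copied from clang's HIP codegen; hipLaunchKernel and __hipPopCallConfiguration are the HIP runtime entry points being referred to): the compiler-generated stub packs the kernel arguments, pops the <<<...>>> call configuration, and calls the runtime launch API itself, so a caller only needs the stub.

#include <hip/hip_runtime.h>

// Illustrative host-side kernel stub; names are made up.
void __device_stub_my_kernel(float *a, int n) {
  void *args[] = {&a, &n};
  dim3 grid, block;
  size_t shared_mem;
  hipStream_t stream;
  // Retrieve the launch configuration pushed by the call site, then launch
  // through the runtime API.
  __hipPopCallConfiguration(&grid, &block, &shared_mem, &stream);
  hipLaunchKernel((const void *)&__device_stub_my_kernel,
                  grid, block, args, shared_mem, stream);
}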

Apr 9 2020, 9:51 AM
hliao added a comment to D77743: [HIP] Emit symbols with kernel name in host binary.

In addition, we may also need to extend the registration to set up the mapping from that global variable to the host-side stub function, so that hipKernelLaunch (implemented as a function call instead of the kernel launch syntax) can call into that stub function to prepare the arguments.

Apr 9 2020, 9:35 AM
hliao requested changes to D77743: [HIP] Emit symbols with kernel name in host binary.
In D77743#1970304, @tra wrote:

The kernel handle is a variable. Even if it has the same name as the kernel, it is OK for the debugger since the debugger does not put a breakpoint on a variable.

The patch appears to apply only to generated kernels. What happens when we take the address of the kernel directly?

a.hip: 
__global__ void kernel() {}

auto kernel_ref() {
  return kernel;
}

b.hip:
extern __global__ void kernel(); // access the handle var
something kernel_ref(); // returns the stub pointer?

void f() {
  auto x = kernel_ref();
  auto y = kernel(); 
  hipLaunchKernel(x,...); // x is the stub pointer. 
  hipLaunchKernel(y,...);
}

Will x and y contain the same value? For CUDA the answer would be yes as they both would contain the address of the host-side stub with the kernel's name.
In this case the external reference will point to the handle variable, but I'm not sure what kernel_ref() would return.
My guess is that it will be the stub address, which may be a problem. I may be wrong. It would be good to add a test to verify that we always get consistent results when we're referencing the kernel.

Apr 9 2020, 9:22 AM
hliao updated the diff for D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer..

Rebase to trunk.

Apr 9 2020, 8:48 AM · Restricted Project

Apr 8 2020

hliao created D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer..
Apr 8 2020, 11:12 PM · Restricted Project