
Feb 26 2019

yaxunl committed rGe739ac0e2555: [HIP] change kernel stub name (authored by yaxunl).
[HIP] change kernel stub name
Feb 26 2019, 6:03 PM
yaxunl committed rC354948: [HIP] change kernel stub name.
[HIP] change kernel stub name
Feb 26 2019, 6:03 PM
yaxunl committed rL354948: [HIP] change kernel stub name.
[HIP] change kernel stub name
Feb 26 2019, 6:03 PM
yaxunl closed D58518: [HIP] change kernel stub name.
Feb 26 2019, 6:03 PM · Restricted Project, Restricted Project
yaxunl committed rGfa49c3a888e8: [CUDA][HIP] Check calling convention based on function target (authored by yaxunl).
[CUDA][HIP] Check calling convention based on function target
Feb 26 2019, 2:27 PM
yaxunl committed rL354929: [CUDA][HIP] Check calling convention based on function target.
[CUDA][HIP] Check calling convention based on function target
Feb 26 2019, 2:27 PM
yaxunl committed rC354929: [CUDA][HIP] Check calling convention based on function target.
[CUDA][HIP] Check calling convention based on function target
Feb 26 2019, 2:27 PM
yaxunl closed D57716: [CUDA][HIP] Check calling convention based on function target.
Feb 26 2019, 2:27 PM · Restricted Project
yaxunl accepted D58623: [AMDGPU] Allow using integral non-type template parameters.

LGTM. Thanks!

Feb 26 2019, 9:38 AM · Restricted Project
yaxunl added a comment to D57716: [CUDA][HIP] Check calling convention based on function target.

ping

Feb 26 2019, 8:23 AM · Restricted Project
yaxunl added a reviewer for D58518: [HIP] change kernel stub name: rjmccall.
Feb 26 2019, 8:23 AM · Restricted Project, Restricted Project
yaxunl committed rGd83c74028db0: [OpenCL] Fix assertion due to blocks (authored by yaxunl).
[OpenCL] Fix assertion due to blocks
Feb 26 2019, 8:20 AM
yaxunl committed rL354893: [OpenCL] Fix assertion due to blocks.
[OpenCL] Fix assertion due to blocks
Feb 26 2019, 8:20 AM
yaxunl committed rC354893: [OpenCL] Fix assertion due to blocks.
[OpenCL] Fix assertion due to blocks
Feb 26 2019, 8:20 AM
yaxunl closed D58658: [OpenCL] Fix assertion due to blocks.
Feb 26 2019, 8:20 AM · Restricted Project
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

I would like to fix the validation issue only and leave the overload-resolution issue for the future.

As I understand it, the "validation issue" is just that you'd like a diagnostic to be emitted when resolving the template argument in order to force SFINAE to pick a different template. I think that's actually just the overload-resolution issue.

Feb 26 2019, 7:16 AM · Restricted Project

Feb 25 2019

yaxunl created D58658: [OpenCL] Fix assertion due to blocks.
Feb 25 2019, 5:39 PM · Restricted Project

Feb 22 2019

yaxunl added inline comments to D58518: [HIP] change kernel stub name.
Feb 22 2019, 3:47 PM · Restricted Project, Restricted Project
yaxunl reopened D58518: [HIP] change kernel stub name.
Feb 22 2019, 1:45 PM · Restricted Project, Restricted Project
yaxunl updated the diff for D58518: [HIP] change kernel stub name.

Fixed regressions.

Feb 22 2019, 1:45 PM · Restricted Project, Restricted Project
yaxunl updated the diff for D57716: [CUDA][HIP] Check calling convention based on function target.

modify test to use non-template functions.

Feb 22 2019, 8:44 AM · Restricted Project
yaxunl added inline comments to D57716: [CUDA][HIP] Check calling convention based on function target.
Feb 22 2019, 8:40 AM · Restricted Project

Feb 21 2019

yaxunl committed rG00ebc0cb92e9: revert r354615: [HIP] change kernel stub name (authored by yaxunl).
revert r354615: [HIP] change kernel stub name
Feb 21 2019, 8:20 PM
yaxunl committed rL354651: revert r354615: [HIP] change kernel stub name.
revert r354615: [HIP] change kernel stub name
Feb 21 2019, 8:19 PM
yaxunl committed rC354651: revert r354615: [HIP] change kernel stub name.
revert r354615: [HIP] change kernel stub name
Feb 21 2019, 8:19 PM
yaxunl updated the diff for D57716: [CUDA][HIP] Check calling convention based on function target.

Revised per Artem's comments.

Feb 21 2019, 2:04 PM · Restricted Project
yaxunl committed rG8d7cf0e2d4b5: [HIP] change kernel stub name (authored by yaxunl).
[HIP] change kernel stub name
Feb 21 2019, 12:13 PM
yaxunl committed rL354615: [HIP] change kernel stub name.
[HIP] change kernel stub name
Feb 21 2019, 12:12 PM
yaxunl committed rC354615: [HIP] change kernel stub name.
[HIP] change kernel stub name
Feb 21 2019, 12:11 PM
yaxunl closed D58518: [HIP] change kernel stub name.
Feb 21 2019, 12:11 PM · Restricted Project, Restricted Project
yaxunl updated the diff for D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

I would like to fix the validation issue only and leave the overload-resolution issue for the future.

Feb 21 2019, 11:43 AM · Restricted Project
yaxunl accepted D58509: [CodeGen] Fix string literal address space casting..

LGTM. Thanks!

Feb 21 2019, 11:28 AM · Restricted Project
yaxunl added a comment to D58518: [HIP] change kernel stub name.
In D58518#1406124, @tra wrote:

My guess is that this is needed because the HIP debugger can see symbols from both host and device executables at the same time. Is that so?

If that's the case, I guess HIP may have a similar naming problem for __host__ __device__ foo() if it's used on both host and device.
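For illustration, a minimal sketch of the symbol clash under discussion (the kernel name is made up, and the exact stub prefix chosen by the patch is an assumption here, not a statement of the implementation):

  // A __global__ kernel yields two symbols: the device-side kernel and a
  // host-side stub that launches it. If both carry the same mangled name, a
  // debugger that loads host and device symbols together cannot tell which
  // one a breakpoint on "vectorAdd" should bind to.
  __global__ void vectorAdd(float *a, const float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += b[i];
  }
  // device object: vectorAdd(...)   <- the real kernel
  // host object:   vectorAdd(...)   <- the launch stub, currently the same name
  // Giving the stub a distinct name (e.g. something like __device_stub__vectorAdd,
  // an assumed prefix) keeps the kernel name unique to the device side.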

Feb 21 2019, 11:24 AM · Restricted Project, Restricted Project
yaxunl created D58518: [HIP] change kernel stub name.
Feb 21 2019, 10:05 AM · Restricted Project, Restricted Project

Feb 15 2019

yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

It is totally unreasonable, at the time you are resolving a template argument, to investigate how the corresponding template parameter is used within the template and use that to shape how the template argument is resolved. That is simply not how the C++ template model works. Given that CUDA doesn't distinguish between host and device functions in the type system, if you are going to have a rule here, it has to be based on, at most, (1) the current semantic context (which may not even be a function), (2) the template being specialized, and (3) the declarations in the template-argument set.

As I've said before on a previous patch, I think the *best* rule would be to recognize a hard difference between host and device function types, probably by making function types default to being host function types and requiring function pointers that can store device function pointers to be explicitly annotated. However, that would not be source-compatible with ordinary CUDA, which is presumably unacceptable.

The second-best rule would be to preserve compatibility by making an unannotated function type still be "unknown whether host or device", but to also allow the creation of explicitly host-only and device-only function types. For source compatibility, DREs to functions would formally have the unknown function type. Converting a pointer to an unknown function into a pointer to a host function would do some basic checking on the operand expression (basically to verify that it's not obviously a device function), and resolving an overload set in the context of a host-only function pointer type would do the obvious filtering.

Otherwise, you're going to be stuck where you are right now, which is that you're messing around with heuristics because somebody added a language extension that isn't actually very well thought out. But if that's what you have to do, it's what you have to do. For this specific question, where you are trying to resolve an overloaded template argument, I think there are basically two sensible options.

  • You can filter the overloads by the host-ness of the template. This makes some sense, because it's probably most likely that a function template that takes a function as a template argument is going to call it — but not necessarily, because it very well might decide instead to call over to the device to invoke the function. Also, not all templates have a "host-ness"; that's pretty much exclusive to function templates.
  • You can filter the overloads by the host-ness of the current context. Again, this makes some sense because it's likely that a host function is trying to pass down a host function — but again, it's not hard to think of exceptions. And again, this has the problem that the context isn't always a function and so doesn't necessarily have a host-ness. Any sort of additional template-specific guidance seems doomed to gradually turn into the second design I mentioned above where you have the ability to be more specific about function types.

    For the time being, this is still a Clang extension, and while Artem mentioned that NVIDIA is investigating it, that's presumably still an investigation and we still have an opportunity to shape their thinking. So I would really recommend taking the second approach, or maybe even trying to convince them to take the first. (How common is higher-order programming on the device, anyway, that you can't break source compatibility for it?) For this specific line of inquiry, that would probably mean not trying to automatically use any particular filter on the overload set but instead just relying on the programmer to annotate what kind of function they want.
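To make the two filtering options above concrete, here is a minimal sketch of the construct in question (the names f, kernel, and hostCaller are illustrative, and no claim is made about which resolution any current compiler performs):

  __host__ void f();      // host overload
  __device__ void f();    // device overload, same type

  template <void (*Fn)()> __global__ void kernel() { Fn(); }

  __host__ void hostCaller() {
    // Filtering by the host-ness of the template: kernel is __global__
    // (device code), so the __device__ f would be chosen.
    // Filtering by the host-ness of the context: hostCaller is a host
    // function, so the __host__ f would be chosen.
    kernel<f><<<1, 1>>>();
  }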
Feb 15 2019, 9:28 PM · Restricted Project
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

But what we've just been talking about is not a validity rule, it's an overload-resolution rule. It's not *invalid* to use a device function as a template argument to a host function template (or to a class template, which of course is neither host nor device). All you need to do is to resolve otherwise-intractable overload ambiguities by matching with the host-ness of the current context, which there's probably already code to do for when an overload set is used as e.g. a function argument.

Feb 15 2019, 4:39 PM · Restricted Project

Feb 14 2019

yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.
In D56411#1398291, @tra wrote:

That said, does CUDA have a general rule resolving __host__ vs. __device__ overloads based on context? And does it allow overloading based solely on __host__ vs. __device__?

NVCC does not. Clang does. See https://goo.gl/EXnymm for the details.

AFAICT, NVIDIA is starting to consider adopting Clang's approach:
http://lists.llvm.org/pipermail/cfe-dev/2018-November/060070.html (original message from Bryce apparently didn't make it to the cfe-dev archive)

Okay. Probably the template-argument rule ought to be the same as the address-of-function rule, which I assume means that there's a final pass that resolves ambiguities in favor of functions that can be used from the current context, to the extent that that's meaningful. It's hard to tell because that document does not appear to include a formal specification.

Regardless, that has no effect on this patch.

Feb 14 2019, 1:00 PM · Restricted Project
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.
In D56411#1398291, @tra wrote:

That said, does CUDA have a general rule resolving __host__ vs. __device__ overloads based on context? And does it allow overloading based solely on __host__ vs. __device__?

NVCC does not. Clang does. See https://goo.gl/EXnymm for the details.

AFAICT, NVIDIA is starting to consider adopting Clang's approach:
http://lists.llvm.org/pipermail/cfe-dev/2018-November/060070.html (original message from Bryce apparently didn't make it to the cfe-dev archive)

Feb 14 2019, 11:28 AM · Restricted Project
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

I think the diagnostic should come during instantiation when you find an evaluated use of a host function within a device function.

It seems the body of a function template is checked only during parsing of the definition of the template itself. When a function template is instantiated, the body of the instantiated function is not checked again.

No, that's not correct. However, it's checked somewhat differently, and it's possible that the existing diagnostic is not set up to fire along all common paths. Try moving the diagnostic to MarkFunctionReferenced, and note that OdrUse will be false in all the unevaluated contexts.

You are right. After I disabled the current diagnostic, I saw PerformPendingInstantiations at the end of parsing the TU, where the AST of the instantiated function is iterated over and MarkFunctionReferenced is called. I will try to fix my patch as suggested. Thanks.

I have one concern. If we want to do overload resolution of a function-type template argument based on host/device-ness, we need to do that before template instantiation, right?

E.g., we have two functions with the same name f and the same type, but one is __host__ and the other is __device__, and we pass f as a template argument to a function template g. We want to choose the __device__ f if g itself is __device__ and the __host__ f if g itself is __host__. To do this, we have to do the check before template instantiation, right?

Yes, you would need to check that when resolving the overload to a single declaration. That would be separate from diagnosing uses.

That said, does CUDA have a general rule resolving __host__ vs. __device__ overloads based on context? And does it allow overloading based solely on __host__ vs. __device__?

Feb 14 2019, 9:03 AM · Restricted Project
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

I think the diagnostic should come during instantiation when you find an evaluated use of a host function within a device function.

It seems the body of a function template is checked only during parsing of the definition of the template itself. When a function template is instantiated, the body of the instantiated function is not checked again.

No, that's not correct. However, it's checked somewhat differently, and it's possible that the existing diagnostic is not set up to fire along all common paths. Try moving the diagnostic to MarkFunctionReferenced, and note that OdrUse will be false in all the unevaluated contexts.

You are right. After I disabled the current diagnostic, I saw PerformPendingInstantiations at the end of parsing the TU, where the AST of the instantiated function is iterated over and MarkFunctionReferenced is called. I will try to fix my patch as suggested. Thanks.

Feb 14 2019, 8:35 AM · Restricted Project

Feb 13 2019

yaxunl committed rGc18e9ecd4fc1: [CUDA][HIP] Use device side kernel and variable names when registering them (authored by yaxunl).
[CUDA][HIP] Use device side kernel and variable names when registering them
Feb 13 2019, 6:00 PM
yaxunl committed rL354004: [CUDA][HIP] Use device side kernel and variable names when registering them.
[CUDA][HIP] Use device side kernel and variable names when registering them
Feb 13 2019, 6:00 PM
yaxunl committed rC354004: [CUDA][HIP] Use device side kernel and variable names when registering them.
[CUDA][HIP] Use device side kernel and variable names when registering them
Feb 13 2019, 6:00 PM
yaxunl closed D58163: [CUDA][HIP] Use device side kernel and variable names when registering them.
Feb 13 2019, 5:59 PM · Restricted Project
yaxunl updated the diff for D58163: [CUDA][HIP] Use device side kernel and variable names when registering them.

Revised per Artem's comments.

Feb 13 2019, 11:42 AM · Restricted Project
yaxunl added inline comments to D58163: [CUDA][HIP] Use device side kernel and variable names when registering them.
Feb 13 2019, 11:27 AM · Restricted Project
yaxunl added inline comments to D58163: [CUDA][HIP] Use device side kernel and variable names when registering them.
Feb 13 2019, 11:08 AM · Restricted Project
yaxunl added a comment to rG7084b56ee2e0: [HIP] Handle compile -m options and propagate into LLC.

It seems we only need to remove -mattr=-code-object-v3. We do not need to handle the feature options, e.g. -mcode-object-v3, here, since they will automatically be passed to clang -cc1 and handled there. You can keep the test since it is still correct.

Feb 13 2019, 10:53 AM

Feb 12 2019

yaxunl created D58163: [CUDA][HIP] Use device side kernel and variable names when registering them.
Feb 12 2019, 7:57 PM · Restricted Project

Feb 11 2019

yaxunl accepted D56871: [AMDGPU] Require at least protected visibility for certain symbols.

LGTM. Thanks!

Feb 11 2019, 10:37 AM · Restricted Project
yaxunl created D58057: Allow bundle size to be 0 in clang-offload-bundler.
Feb 11 2019, 8:52 AM · Restricted Project

Feb 6 2019

yaxunl added a comment to D57829: [HIP] Disable emitting llvm.linker.options in device compilation.
In D57829#1387412, @tra wrote:

Could you elaborate on why you want to disable this metadata? I think the original idea of llvm.linker.options was that it should be ignored if the back-end does not support it.

Feb 6 2019, 11:16 AM
yaxunl created D57831: AMDGPU: set wchar_t and wint_t to be unsigned short on windows.
Feb 6 2019, 11:06 AM
yaxunl created D57829: [HIP] Disable emitting llvm.linker.options in device compilation.
Feb 6 2019, 10:53 AM

Feb 5 2019

yaxunl updated the diff for D57716: [CUDA][HIP] Check calling convention based on function target.

My last fix was not right. This patch fixes the issue by checking the calling convention based on whether the function is a host or device function.
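A minimal sketch of the situation this addresses (assuming an x86 host with an AMDGPU device; the declarations are illustrative):

  // In a device compilation the x86 host target is only auxiliary, yet a host
  // function may legitimately use an x86-only calling convention, and a device
  // function must not be held to host rules. The calling-convention check
  // should therefore follow the function's target, not the compilation mode.
  __host__ void __stdcall hostFn(int x);   // checked against the host (x86) target
  __device__ void deviceFn(int x);         // checked against the device target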

Feb 5 2019, 12:24 PM · Restricted Project

Feb 4 2019

yaxunl created D57716: [CUDA][HIP] Check calling convention based on function target.
Feb 4 2019, 2:36 PM · Restricted Project
yaxunl added a comment to D57707: Add Triple::isAMDGPU.

Needs unit test

Feb 4 2019, 12:48 PM
yaxunl updated the diff for D57707: Add Triple::isAMDGPU.

add usage of it

Feb 4 2019, 12:47 PM
yaxunl created D57707: Add Triple::isAMDGPU.
Feb 4 2019, 12:29 PM

Jan 31 2019

yaxunl committed rL352801: Do not copy long double and 128-bit fp format from aux target for AMDGPU.
Do not copy long double and 128-bit fp format from aux target for AMDGPU
Jan 31 2019, 1:57 PM
yaxunl committed rC352801: Do not copy long double and 128-bit fp format from aux target for AMDGPU.
Do not copy long double and 128-bit fp format from aux target for AMDGPU
Jan 31 2019, 1:57 PM
yaxunl closed D57527: Do not copy long double and 128-bit fp format from aux target for AMDGPU.
Jan 31 2019, 1:57 PM
yaxunl added a comment to D57527: Do not copy long double and 128-bit fp format from aux target for AMDGPU.

Explanatory comment, please. Otherwise LGTM.

Jan 31 2019, 1:04 PM
yaxunl added a comment to D57527: Do not copy long double and 128-bit fp format from aux target for AMDGPU.

Okay, so you silently have an incompatible ABI for anything in the system headers that mentions long double. Do you have any plans to address or work around that, or is the hope that it just doesn't matter?

I feel like this should be a special case for AMDGPU rather than a general behavior with aux targets.

If the host does not pass long double to the device, we will be fine. So we need to diagnose long double kernel arguments. However, I'd like to do that in a separate patch since we want to fix the regression first.

Okay. Do you also need to look for global structs and other ways that information might be passed? I suppose at some level you just have to document it as a danger and treat further diagnostics as QoI.

I created a pull request to document long double usage in HIP https://github.com/ROCm-Developer-Tools/HIP/pull/890
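A minimal sketch of the hazard being documented (assuming an x86-64 host, where long double is the 80-bit x87 type, and an AMDGPU device that treats it as a 64-bit double):

  // Passing long double across the host/device boundary is the dangerous case:
  // the two sides disagree on its size and representation.
  __global__ void risky(long double x);   // candidate for the diagnostic above
  __global__ void fine(double x);         // same representation on both sides
  // Host-only uses of long double (e.g. in system headers never emitted for
  // the device) do not cross the boundary.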

Jan 31 2019, 12:24 PM
yaxunl updated the diff for D57527: Do not copy long double and 128-bit fp format from aux target for AMDGPU.

Fix in AMDGPUTargetInfo.

Jan 31 2019, 12:23 PM
yaxunl added a comment to D57527: Do not copy long double and 128-bit fp format from aux target for AMDGPU.

Okay, so you silently have an incompatible ABI for anything in the system headers that mentions long double. Do you have any plans to address or work around that, or is the hope that it just doesn't matter?

I feel like this should be a special case for AMDGPU rather than a general behavior with aux targets.

Jan 31 2019, 11:09 AM
yaxunl created D57527: Do not copy long double and 128-bit fp format from aux target for AMDGPU.
Jan 31 2019, 10:47 AM

Jan 30 2019

yaxunl committed rL352620: [HIP] Fix size_t for MSVC environment.
[HIP] Fix size_t for MSVC environment
Jan 30 2019, 4:27 AM
yaxunl committed rC352620: [HIP] Fix size_t for MSVC environment.
[HIP] Fix size_t for MSVC environment
Jan 30 2019, 4:27 AM
yaxunl closed D56318: [HIP] Fix size_t for MSVC environment.
Jan 30 2019, 4:27 AM

Jan 29 2019

yaxunl updated the diff for D56318: [HIP] Fix size_t for MSVC environment.

Use const argument.

Jan 29 2019, 8:20 PM
yaxunl updated the diff for D56318: [HIP] Fix size_t for MSVC environment.

Revised per John's comments.

Jan 29 2019, 7:02 PM
yaxunl committed rL352488: [CUDA][HIP] Do not diagnose use of _Float16.
[CUDA][HIP] Do not diagnose use of _Float16
Jan 29 2019, 5:20 AM
yaxunl committed rC352488: [CUDA][HIP] Do not diagnose use of _Float16.
[CUDA][HIP] Do not diagnose use of _Float16
Jan 29 2019, 5:20 AM
yaxunl closed D57369: [CUDA][HIP] Do not diagnose use of _Float16.
Jan 29 2019, 5:20 AM

Jan 28 2019

yaxunl created D57369: [CUDA][HIP] Do not diagnose use of _Float16.
Jan 28 2019, 9:31 PM
yaxunl added a comment to D57188: Disable _Float16 for non ARM/SPIR Targets.

This change causes regressions for CUDA/HIP. As a single-source language, CUDA/HIP code contains both device and host code, and it uses separate compilation for host and device. In a host compilation, device functions are parsed but not emitted in IR. A device function may have a _Float16 argument, which is fine if the device target supports it. Host compilation should not diagnose use of _Float16 in device functions; however, the current implementation diagnoses any _Float16 usage in host compilation.
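A minimal sketch of the regression (the function is made up; it assumes a device target such as amdgcn that supports _Float16):

  // In a host compilation this device function is parsed but never emitted,
  // so diagnosing its use of _Float16 against the host target rejects
  // otherwise valid CUDA/HIP code.
  __device__ _Float16 halfAdd(_Float16 a, _Float16 b) { return a + b; }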

Jan 28 2019, 6:39 PM

Jan 24 2019

yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

I think the diagnostic should come during instantiation when you find an evaluated use of a host function within a device function.

It seems the body of a function template is checked only during parsing of the definition of the template itself. When a function template is instantiated, the body of the instantiated function is not checked again.

No, that's not correct. However, it's checked somewhat differently, and it's possible that the existing diagnostic is not set up to fire along all common paths. Try moving the diagnostic to MarkFunctionReferenced, and note that OdrUse will be false in all the unevaluated contexts.

Jan 24 2019, 12:25 PM · Restricted Project

Jan 21 2019

yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

I think the diagnostic should come during instantiation when you find an evaluated use of a host function within a device function.

It seems the body of a function template is checked only during parsing of the definition of the template itself. When a function template is instantiated, the body of the instantiated function is not checked again.

No, that's not correct. However, it's checked somewhat differently, and it's possible that the existing diagnostic is not set up to fire along all common paths. Try moving the diagnostic to MarkFunctionReferenced, and note that OdrUse will be false in all the unevaluated contexts.

Jan 21 2019, 8:24 PM · Restricted Project
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

I think the diagnostic should come during instantiation when you find an evaluated use of a host function within a device function.

Jan 21 2019, 2:14 PM · Restricted Project
yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

It's pretty unfortunate that all these fields have to be individually called out like this. Can you move all these basic layout fields into a separate struct (which can be a secondary base class of TargetInfo) which can then just be normally copied? Anything that needs special copy semantics, like the LLVM DataLayout (do you need to copy this?) doesn't need to go into that struct, just the basic POD things that determine fundamental type layouts and semantics.
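A standalone model of the suggested split, not the real clang classes (the names and the particular fields are assumptions): the plain layout knobs sit in a copyable base, so a GPU target can adopt the host target's layout with one assignment while keeping anything with special copy semantics out of the base.

  // POD fields that determine fundamental type layouts.
  struct TransferrableLayoutFields {
    unsigned PointerWidth = 64, PointerAlign = 64;
    unsigned LongWidth = 64, LongAlign = 64;
    unsigned LongDoubleWidth = 128, LongDoubleAlign = 128;
  };

  // The target itself; a DataLayout or similar would stay outside the base.
  struct ModelTargetInfo : TransferrableLayoutFields {
    void copyLayoutFrom(const ModelTargetInfo &Host) {
      // One slice-copy of the layout base; nothing else is touched.
      static_cast<TransferrableLayoutFields &>(*this) = Host;
    }
  };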

Jan 21 2019, 11:57 AM
yaxunl updated the diff for D56318: [HIP] Fix size_t for MSVC environment.

Separate layout-controlling flags into a base class of TargetInfo.

Jan 21 2019, 11:49 AM

Jan 16 2019

yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

This patch still doesn't make any sense. You don't need to do any special validation when passing a function as a template argument. When Sema instantiates the template definition, it'll rebuild the expressions that refer to the template parameter, which will trigger the normal checking for whether those expressions are illegally referencing a host function from the device, etc. All you need to do is suppress that checking (whether it happens in a template definition or not) for references from non-potentially-evaluated contexts.

If you look at line 6583 of lib/Sema/SemaTemplate.cpp, you will see that clang does the check if the function needs overload resolution. However, clang misses the check if the function does not need overload resolution. That's why I need to add the check at line 6593. All the other changes are just to support this check.

Why does clang not do the reference check when there is no overload resolution?

We should have already done the check for a non-overloaded function reference as part of building the DRE. See Sema::BuildDeclarationNameExpr. Template argument checking can resolve an overload set based on the type of the template parameter, so overload sets have to be treated specially there.

I think in the usual cases clang already does that check during template-argument parsing, so it does not need to do it again at line 6593. Unfortunately, the CUDA host/device check has to be skipped during template-argument parsing and deferred to line 6593.

Again, you really should not ever impose this restriction in template arguments.

Jan 16 2019, 7:45 AM · Restricted Project

Jan 11 2019

yaxunl updated the diff for D56318: [HIP] Fix size_t for MSVC environment.

Copy type information from AuxTarget.

Jan 11 2019, 11:10 AM

Jan 10 2019

yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

No, I understand that things like the function-call ABI should be different from the associated host ABI, but things like the size of long and the bit-field layout algorithm presumably shouldn't be, and that's the sort of thing that's configured by TargetInfo.

How about creating a ForwardingTargetInfo which has a pointer to AuxTarget and forwards to that target if it is not null? Then let AMDGPUTargetInfo inherit from that.

Why forward? You have, like, two supported host environments, right? Can you just have a subclass apiece of either MicrosoftX86_64TargetInfo or X86_64TargetInfo?

If that's unreasonable and you do need to forward, having a ForwardingTargetInfo sounds like a good idea, although I think you should require it to have an underlying target, and I think you need it to copy all the fields of that target.

Jan 10 2019, 1:03 PM
yaxunl committed rC350885: [HIP] Use nul instead of /dev/null when running on windows.
[HIP] Use nul instead of /dev/null when running on windows
Jan 10 2019, 12:15 PM
yaxunl committed rL350885: [HIP] Use nul instead of /dev/null when running on windows.
[HIP] Use nul instead of /dev/null when running on windows
Jan 10 2019, 12:15 PM
yaxunl closed D56225: [HIP] Use nul instead of /dev/null when running on windows.
Jan 10 2019, 12:15 PM
yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

No, I understand that things like the function-call ABI should be different from the associated host ABI, but things like the size of long and the bit-field layout algorithm presumably shouldn't be, and that's the sort of thing that's configured by TargetInfo.

Jan 10 2019, 10:49 AM
yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

If I was only concerned about size_t, your current solution would be fine. My concern is that you really need to match *all* of the associated CPU target's ABI choices, so your target really ought to be forwarding everything to that target by default and only selectively overriding it in order to support GPU-specific features. Probably the easiest way to do that is via inheritance.

Jan 10 2019, 10:30 AM
yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

Okay. Is there a reasonable way to make your targets delegate to a different TargetInfo implementation for most things so that you can generally match the host target for things like type sizes and alignments?

Jan 10 2019, 7:18 AM
yaxunl added inline comments to D56225: [HIP] Use nul instead of /dev/null when running on windows.
Jan 10 2019, 7:07 AM
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

This patch still doesn't make any sense. You don't need to do any special validation when passing a function as a template argument. When Sema instantiates the template definition, it'll rebuild the expressions that refer to the template parameter, which will trigger the normal checking for whether those expressions are illegally referencing a host function from the device, etc. All you need to do is suppress that checking (whether it happens in a template definition or not) for references from non-potentially-evaluated contexts.

Jan 10 2019, 4:49 AM · Restricted Project

Jan 9 2019

yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

But why? Why do you want to limit this to just template arguments instead of all sorts of similar contexts?

Jan 9 2019, 4:56 PM · Restricted Project
yaxunl updated the diff for D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

disable the check for more general cases.

Jan 9 2019, 4:28 PM · Restricted Project
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

Sema won't necessarily have resolved a template decl when parsing a template argument list, so trying to propagate that decl down to indicate that we're resolving a template argument is not a good approach.

I was going to suggest recording that we're within a template argument in the current ExpressionEvaluationContextRecord, but in fact there's an even simpler and more general solution: there's no reason to enforce this restriction in *any* unevaluated context. If someone wants to refer to a device function within a decltype or sizeof operand, that should be fine. So you should just conditionalize the diagnostic on whether this is within an unevaluated context.
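A minimal sketch of the unevaluated-context point (names are illustrative):

  __device__ int devFn();

  __host__ void hostFn() {
    using T = decltype(devFn());   // unevaluated: referring to a device
    auto n = sizeof(devFn());      // function here should be fine
    (void)n;
    // int x = devFn();            // an evaluated use is what gets diagnosed
  }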

Jan 9 2019, 11:28 AM · Restricted Project
yaxunl updated the diff for D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

Pass the template decl via the ExpressionEvaluationContextRecord.

Jan 9 2019, 11:23 AM · Restricted Project
yaxunl updated the diff for D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

Add a test for __host__ __device__.
Remove the flag IsParsingTemplateArgument in Sema; instead, check ExprEvalContexts to decide whether to disable the device/host consistency check.
I did not condition the check on the Unevaluated ExprEvalContext because the issue happens in the ConstantEvaluated context. Also, we do not want to remove the check entirely; we just want to defer it until the argument evaluation is done. When the deferred check is performed, the ExprEvalContext is still ConstantEvaluated but its kind is no longer EK_TemplateArgument, so we can use the expression kind to condition the check.

Jan 9 2019, 8:41 AM · Restricted Project
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.
__host__ void bar() {}
__device__ int bar() { return 0; }
__host__ __device__ void foo() { int x = bar(); }
template <void (*devF)()> __global__ void kernel() { devF();}

kernel<foo>();

We DTRT for this case. Here the host bar needs to return int since foo() expects that. I will add a test for that.

__host__ bar() should not need to return int if foo is inline (or templated), because then we should never codegen foo for host. I guess my question is, we should be sure that kernel<foo>() does not force an inline/templated foo to be codegen'ed for host. (Sorry that wasn't more clear before.)

Jan 9 2019, 7:59 AM · Restricted Project

Jan 8 2019

yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

Without reading the patch in detail (sorry) but looking mainly at the testcase: It looks like we're not checking how overloading and __host__ __device__ functions play into this. Maybe there are some additional edge-cases to explore/check.

Will add a test for __host__ __device__.

Jan 8 2019, 1:52 PM · Restricted Project