Page MenuHomePhabricator

grokos (George Rokos)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 16 2016, 1:22 PM (191 w, 1 d)

Recent Activity

Thu, Oct 10

grokos added inline comments to D68746: [Clang][OpenMP Offload] Move offload registration code to the wrapper.
Thu, Oct 10, 5:13 PM · Restricted Project

Tue, Oct 8

grokos committed rL374136: Request commit access for grokos.
Request commit access for grokos
Tue, Oct 8, 7:43 PM

Fri, Oct 4

grokos retitled D68369: Use named constant to indicate all lanes, to handle 32 and 64 wide architectures [NFC] from Use named constant to indicate all lanes, to handle 32 and 64 wide architectures to Use named constant to indicate all lanes, to handle 32 and 64 wide architectures [NFC].
Fri, Oct 4, 2:47 PM · Restricted Project, Restricted Project
grokos added a comment to D68369: Use named constant to indicate all lanes, to handle 32 and 64 wide architectures [NFC].

I also feel the patch must be marked as NFC, there is no functional change here.

Fri, Oct 4, 2:40 PM · Restricted Project, Restricted Project
grokos accepted D68369: Use named constant to indicate all lanes, to handle 32 and 64 wide architectures [NFC].
Fri, Oct 4, 2:40 PM · Restricted Project, Restricted Project
grokos added a comment to D68369: Use named constant to indicate all lanes, to handle 32 and 64 wide architectures [NFC].

I'm fine with the changes too. Maybe we should change the title because eventually this patch does not use -1 but rather the full representation (0xffff....).

Fri, Oct 4, 1:43 PM · Restricted Project, Restricted Project

Wed, Oct 2

grokos added a comment to D68369: Use named constant to indicate all lanes, to handle 32 and 64 wide architectures [NFC].

I'm wondering whether it would be better to define __kmpc_impl_all_lanes conditionally, i.e. as 0xFFFF for 32-wide and 0xFFFFFFFF for 64-wide architectures. I'm not sure I like this implicit signed-to-unsigned conversion here... Also, __kmpc_impl_lanemask_t is currently typedef'ed as uint32_t, if we want to support 64-wide lanes, then __kmpc_impl_lanemask_t must also be conditionally defined accordingly.

Wed, Oct 2, 7:37 PM · Restricted Project, Restricted Project
grokos added a comment to D68369: Use named constant to indicate all lanes, to handle 32 and 64 wide architectures [NFC].

Could alternatively add a constant to target_impl if preferred.

Wed, Oct 2, 6:54 PM · Restricted Project, Restricted Project

Fri, Sep 27

grokos added a comment to D68100: [OpenMP 5.0] declare mapper runtime implementation.

Lingda is right, we had faced the same issue in the loop trip count implementation. The loop trip count should be set per task but libomptarget has no notion of tasks, so we ended up engaging the host runtime (libomp) to store per-task information. Although it involves more work, I still believe that will be a more elegant solution.

Fri, Sep 27, 1:22 PM · Restricted Project
grokos added a comment to D68100: [OpenMP 5.0] declare mapper runtime implementation.

Please review when you have time, thanks

Maybe, it is better to add a single function to pass the list of mappers to the runtime and use the old functions rather than just duplicate them? Something like __tgt_mappers(...) and store them in the runtime. Plus, modify the original functions to use mappers if they were provided. Thoughts?

Fri, Sep 27, 12:00 PM · Restricted Project

Aug 23 2019

grokos added a comment to D66672: [OPENMP][NVPTX]Add __kmpc_syncwarp(int32_t) function..

I guess there is a clang-side patch as well which makes use of the new exposed kmpc function? Can you add it to the description? Also, can you add a basic test with a critical region that doesn't work correctly without this patch?

Aug 23 2019, 12:06 PM · Restricted Project
grokos abandoned D46185: [OpenMP] Allow nvptx sm_30 to be used as an offloading device.

As there has been no update from the original author for 1.5 years and what is posted here doesn't work anyway, I am abandoning this patch. If for any reason we need to enable sm_30 devices to be used for offloading, we can do it later on as part of redesigning the runtime by factoring out common logic.

Aug 23 2019, 11:47 AM · Restricted Project
grokos commandeered D46185: [OpenMP] Allow nvptx sm_30 to be used as an offloading device.
Aug 23 2019, 11:46 AM · Restricted Project

Aug 8 2019

grokos added a comment to D65340: [OpenMP][libomptarget] Add support for close map modifier.

I don't have any comments about this patch other than the ones @Hahnfeld mentioned.

Aug 8 2019, 12:19 PM · Restricted Project, Restricted Project

Aug 6 2019

grokos added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Aug 6 2019, 1:43 PM · Restricted Project, Restricted Project
grokos added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Aug 6 2019, 11:57 AM · Restricted Project, Restricted Project
grokos added inline comments to D64217: [OpenMP][NFCI] Cleanup the target state queue implementation.
Aug 6 2019, 9:44 AM · Restricted Project

Aug 2 2019

grokos accepted D65687: [libomptarget] Harmonize emitting CUDA errors and general debug messages..

This patch makes debug output more consistent, thanks!

Aug 2 2019, 6:57 PM · Restricted Project, Restricted Project
grokos added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Aug 2 2019, 10:43 AM · Restricted Project, Restricted Project

Jul 31 2019

grokos accepted D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions.

OK, it's good to go now.

Jul 31 2019, 2:31 PM · Restricted Project, Restricted Project

Jul 30 2019

grokos added a comment to D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions.

Since the mapper is not really implemented in this patch, if I add a test, it will be something like below:

__tgt_push_mapper_component(h, base0, begin0, size0, type0);
__tgt_push_mapper_component(h, base1, begin1, size1, type1);
auto total_size = __tgt_mapper_num_components(h);
printf("size=%d", total_size);
// CHECK: size=2

It seems to me this test is not meaningful. I can add a more meaningful test after all mapper patches are upstreamed.
Do you think we need a meaningless test like this now?

Jul 30 2019, 12:38 PM · Restricted Project, Restricted Project

Jul 29 2019

grokos added a comment to D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions.

Now that the clang-side patch is finalized, this is almost good to go. What is missing from this patch is a test. Can you add something under libomptarget/test/offloading/?

Sure, since this is very simple I'll add a simple test. Do you think the test should be in libomptarget/test/offloading/ or libomptarget/test/api/?

Jul 29 2019, 2:42 PM · Restricted Project, Restricted Project
grokos added a comment to D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions.

Now that the clang-side patch is finalized, this is almost good to go. What is missing from this patch is a test. Can you add something under libomptarget/test/offloading/?

Jul 29 2019, 2:00 PM · Restricted Project, Restricted Project
grokos added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Jul 29 2019, 11:05 AM · Restricted Project, Restricted Project

Jul 22 2019

grokos accepted D62398: [OPENMP][NVPTX]Perform memory flush if number of threads to sync is 1 or less..

This looks consistent with the changes in D62397, I think it's good to go.

Jul 22 2019, 12:27 PM · Restricted Project, Restricted Project
grokos accepted D65013: [OPENMP][NVPTX]Use __syncwarp() to reconverge the threads..

OK, looks good.

Jul 22 2019, 10:28 AM · Restricted Project, Restricted Project
grokos added a comment to D65013: [OPENMP][NVPTX]Use __syncwarp() to reconverge the threads..

Does this behavior of CUDA >= 9.0 affect only the parallel level counter? Do we need to propagate these changes to other functions?

Jul 22 2019, 9:55 AM · Restricted Project, Restricted Project

Jul 17 2019

grokos accepted D64808: [OPENMMP] Resolve lost LoopTripCnt for subsequent loops in same thread..

@grokos any comments here please?
hoping to get this patch in before the 9.0 branch.

Jul 17 2019, 6:52 AM · Restricted Project, Restricted Project

Jul 15 2019

grokos accepted D64571: [OPENMP]Fix threadid in __kmpc_omp_taskwait call for dependent target calls..

Looks good.

Jul 15 2019, 4:44 PM · Restricted Project, Restricted Project
grokos accepted D64648: [OPENMP][NVPTX]Fixed checks for cuda versions..

This change makes sense, looks good to me.

Jul 15 2019, 4:19 PM · Restricted Project, Restricted Project

Jul 9 2019

grokos abandoned D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.

Right, this patch is now obsolete.

Jul 9 2019, 3:43 PM · Restricted Project

Jul 2 2019

grokos added a comment to D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

A long time ago we had identified the problem with the loop trip count and if I recall correctly the proposed solution was to store the trip count per OpenMP task. The fix would be implemented in libomp because libomptarget has no notion of tasks.

Jul 2 2019, 2:58 PM · Restricted Project, Restricted Project
grokos added a reviewer for D64080: [OPENMP]Make __kmpc_push_tripcount thread safe.: AlexEichenberger.
Jul 2 2019, 2:49 PM · Restricted Project, Restricted Project

Jun 19 2019

grokos added inline comments to D63106: [OpenMP][libomptarget] Add support for declare target to clause under unified memory.
Jun 19 2019, 8:32 AM · Restricted Project, Restricted Project

Jun 17 2019

grokos added inline comments to D63106: [OpenMP][libomptarget] Add support for declare target to clause under unified memory.
Jun 17 2019, 3:00 AM · Restricted Project, Restricted Project

Jun 14 2019

grokos accepted D63106: [OpenMP][libomptarget] Add support for declare target to clause under unified memory.

LGTM

Jun 14 2019, 12:43 PM · Restricted Project, Restricted Project
grokos added inline comments to D63106: [OpenMP][libomptarget] Add support for declare target to clause under unified memory.
Jun 14 2019, 11:18 AM · Restricted Project, Restricted Project

Jun 4 2019

grokos added a comment to D62841: [openmp] Use libffi only when LLVM_ENABLE_FFI is on.

May I propose something extra? I guess the purpose of this patch is to disable libffi, for instance on systems where the library is not installed or for whatever reason the user doesn't want to use it. In that case, we should accommodate for the standalone build as well. We could introduce a new Cmake variable LIBOMPTARGET_ENABLE_FFI and

  • set it to LLVM_ENABLE_FFI if building in-tree or
  • have a default value of true if building standalone and let the user set it to false if they want to

Any thoughts?

Jun 4 2019, 3:06 PM · Restricted Project
grokos added a comment to D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions.

Moreover, I'd question the following things:

  1. Why are we back to __kmpc_? naming? Most other functions specific to libomptarget are called __tgt_?.

There are other functions in libomptarget starting with __kmpc_, for example, __kmpc_push_target_tripcount.
My understanding is anything that does not directly call the device starting with __kmpc_. The IBM and Intel compiler people seem to be okay with this naming.

Yes, this is the only function AFAICS and it has a comment "will be revised". All other functions related to mapping start with __tgt so unless there is good incentive, we should follow this naming convention.

Sure, I do prefer __tgt, as long as everyone else is okay with it.

Jun 4 2019, 12:00 PM · Restricted Project, Restricted Project

Jun 3 2019

grokos accepted D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

I formatted the description a bit. The patch looks good and the reasoning behind it is now obvious, in line with Nvidia's documentation.

Jun 3 2019, 5:29 PM · Restricted Project
grokos updated the summary of D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..
Jun 3 2019, 5:28 PM · Restricted Project
grokos added a comment to D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions.

I think this patch is wrong. Where are the previous additions of target_data and __tgt_target_data_mapper? Possibly you uploaded the diff between the previous revision and the current one, whereas you need to upload the diff between origin/master and your current HEAD!

Jun 3 2019, 1:54 PM · Restricted Project, Restricted Project
grokos added inline comments to D62397: [OPENMP][NVPTX]Relax flush directive..
Jun 3 2019, 1:45 PM · Restricted Project, Restricted Project
grokos accepted D60884: [OpenMP][libomptarget] Enable usage of unified memory for declare target link variables.

OK, there don't seem to be any more comments, looks good.

Jun 3 2019, 1:43 PM · Restricted Project, Restricted Project
grokos added inline comments to D60884: [OpenMP][libomptarget] Enable usage of unified memory for declare target link variables.
Jun 3 2019, 11:31 AM · Restricted Project, Restricted Project

May 31 2019

grokos added inline comments to D62397: [OPENMP][NVPTX]Relax flush directive..
May 31 2019, 3:22 PM · Restricted Project, Restricted Project
grokos added inline comments to D62397: [OPENMP][NVPTX]Relax flush directive..
May 31 2019, 3:13 PM · Restricted Project, Restricted Project
grokos added inline comments to D60884: [OpenMP][libomptarget] Enable usage of unified memory for declare target link variables.
May 31 2019, 3:00 PM · Restricted Project, Restricted Project

May 29 2019

grokos added a comment to D56274: [OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC..
May 29 2019, 12:10 PM · Restricted Project
grokos added a comment to D62581: CallSiteSplitting: Respect convergent.

Also, can you include a reference in the commit message pointing to D56274 as this was the case demonstrating the need for this fix? Thanks!

May 29 2019, 12:09 PM

May 25 2019

grokos added inline comments to D60884: [OpenMP][libomptarget] Enable usage of unified memory for declare target link variables.
May 25 2019, 12:45 PM · Restricted Project, Restricted Project

May 24 2019

grokos added a comment to D62199: [OPENMP][NVPTX]Fix barriers and parallel level counters, NFC..

I agree with some of your complaints, I disagree with others:

May 24 2019, 2:56 AM · Restricted Project, Restricted Project

May 23 2019

grokos requested changes to D62199: [OPENMP][NVPTX]Fix barriers and parallel level counters, NFC..

Let me address what has been written so far from my perspective and request some changes since there has been demand for them:

  1. No doubt there are 2 changes here, unrelated to one another. Yes, they could be split into 2 patches to strictly comply with community guidelines. And I would prefer it if the author had submitted two distinct patches. On the other hand, I don't like missing the forest for the trees. This is a 4-line patch, I figured out it is OK to let both changes in at once. Sure, the small size of a patch is not a free pass to doing whatever we want, but in this case committing both changes in one go doesn't do any harm, at least none that I can think of so that I would make a big deal out of it. There's the letter of the law and the spirit of the law, in cases where these two collide I tend to go with the latter.
  2. My bad that I didn't request a test for cache flushing, I was predisposed due to the NFC tag. I agree there is no way to test whether the cache has been flushed, but @ABataev you could provide a test case which fails without cache flushing so that you can demonstrate that this patch does indeed work. There must be such a test case, after all how did you figure out there is a problem there?
  3. The NFC bit does not apply to the barrier fix. If we keep both changes in one patch, that should be reflected in the title. I agree that not being able to test a change does not make it NFC. You can call it NTC (non-testable change), but not NFC.
  4. In case of volatile, it seems there's a gray line here between what constitutes a NFC patch. The way I see it, "volatile" prevents compiler optimizations, it doesn't change anything in the way this code works, the algorithms/data structures used etc., the only change is how the code is compiled. On the other hand, we could argue that since the compiled library behaves differently there is indeed a functional change. Once again, to me this is a gray line and there is no absolute answer. However, since here there are more objections than approvals, @ABataev can you change the patch's title and remove the NFC part altogether?
  5. Regarding justification for the need for volatile, I agree that the explanation of the problem is very short, but the solution makes sense given the nature of that problem - at least to me. I could have blocked the patch until the author finds a way to describe the issue in detail - in the meantime there would be a correctness problem in the library which could cause inexplicable failures/wrong results to the users. Even if the "fix" is more of a hack - which I don't believe - I would rather let this hack in than leave the correctness problem lying around indefinitely. From the description I understand that a thread may cache values from the parallelLevel array but at the same time those values change in RAM without the thread knowing about it. This leads to corrupt states where some threads have a stale view of the parallel level. The only way to demonstrate the problem is to post ptx code, which I think is too much in the scope of a review, that's why I don't see the point in providing more details. The author could point to two places in the code, one where a thread caches a value from that array and another one where another thread modifies the global value, but it is not always possible to track where those changes occur. This is in case I understood the problem correctly. If not, then @ABataev I am also requesting a detailed explanation of the problem because the current description leads to wrong conclusions.
  6. 1, 4 and 5 represent my point of view, which is of course subjective, anyone could have a different mentality. But calling my reviews "reviews" and my activity here "suspicious" doesn't contribute at all. In pretty much the same way the "this is your problem, I've got the owner's approval" reply is inappropriate (and in no way do I endorse this "argument" as a reply to reviewers' requests).
  7. Since February 2018 I have been the owner of the libomptarget project. The issue of libomptarget having no owner came up in one of our biweekly multi-company telecons, I was the only one who volunteered and at a subsequent F2F there was consensus for my appointment since I had written most of the code for the base library and the CUDA and generic-elf64 plugins.
  8. Regarding the nvptx device runtime, certainly it is a different project from the rest of libomptarget. I think the RTL is still considered "coupled" with libomptarget for historical reasons only (both the base library, the plugins and the RTL were all written by the same people at the same company in the scope of the same project). But in terms of further development and expertise needed, it is a completely different kettle of fish. From my current position I can only contribute to the RTL by reviewing patches, it is true that I no longer do any development in this RTL. If anyone feels that the nvptx RTL repository should be managed/owned separately from the rest of libomptarget, please speak up and I will raise the issue at the next community telecon.
May 23 2019, 8:55 PM · Restricted Project, Restricted Project

May 21 2019

grokos accepted D62199: [OPENMP][NVPTX]Fix barriers and parallel level counters, NFC..

LGTM

May 21 2019, 11:29 AM · Restricted Project, Restricted Project

May 20 2019

grokos accepted D60223: [OpenMP][libomptarget] Enable requires flags for target libraries..

I don't have any other comments or suggestions, the patch looks good.

May 20 2019, 12:29 PM · Restricted Project

May 17 2019

grokos added inline comments to D60223: [OpenMP][libomptarget] Enable requires flags for target libraries..
May 17 2019, 12:56 PM · Restricted Project

May 10 2019

grokos accepted D61801: [OPENMP][NVPTX]Simplify handling of thread limit, NFC..

LGTM

May 10 2019, 7:10 PM · Restricted Project
grokos accepted D61785: [OPENMP][NVPTX]Improve number of threads counter, NFC..

LGTM

May 10 2019, 11:49 AM · Restricted Project, Restricted Project

May 3 2019

grokos accepted D61526: [OPENMP][NVPTX]Improve thread limit counter, NFC..

Looks good.

May 3 2019, 12:42 PM · Restricted Project, Restricted Project
grokos accepted D61523: [OPENMP][NVPTX]Improve number of threads counter, NFC..

Looks good.

May 3 2019, 12:42 PM · Restricted Project

May 2 2019

grokos accepted D61459: [OPENMP][NVPTX]Improved several standard OpenMP functions, NFC..

Looks good.

May 2 2019, 4:35 PM · Restricted Project, Restricted Project
grokos accepted D61395: [OPENMP][NVPTX]Improve code by using parallel level counter..

LGTM

May 2 2019, 12:52 PM · Restricted Project, Restricted Project

Apr 26 2019

grokos accepted D60918: [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode..

I think it looks good now.

Apr 26 2019, 12:20 PM · Restricted Project, Restricted Project

Apr 24 2019

grokos added a comment to D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions.

I see that target_data_mapper pretty much copies the entire code from target_data_begin and target_data_end. That's 200+ duplicate lines of code. Couldn't we just add this check:

Apr 24 2019, 12:12 PM · Restricted Project, Restricted Project
grokos added a reviewer for D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions: grokos.
Apr 24 2019, 12:00 PM · Restricted Project, Restricted Project

Mar 25 2019

grokos accepted D55952: [libomptarget] Introduce LIBOMPTARGET_ENABLE_DEBUG cmake option..

OK, now it makes sense. I had forgotten what was in the description since the patch was first submitted. :)

Mar 25 2019, 7:38 PM · Restricted Project, Restricted Project

Mar 22 2019

grokos added a comment to D55952: [libomptarget] Introduce LIBOMPTARGET_ENABLE_DEBUG cmake option..

This revision looks much better. Please have a look at the inline comment.

Mar 22 2019, 8:08 PM · Restricted Project, Restricted Project

Jan 4 2019

grokos accepted D56332: [OPENMP][NVPTX]Fix dynamic scheduling..

Nice approach to optimizing the degree of collisions of atomic adds. LGTM.

Jan 4 2019, 7:48 PM
grokos accepted D56290: [OPENMP][NVPTX]General formatting/code improvement, NFC..

Looks good. Thanks!

Jan 4 2019, 12:01 PM
grokos accepted D56278: [OPENMP][NVPTX]Improve performance + reduce number of used registers..

LGTM. Thanks for all this work!

Jan 4 2019, 8:25 AM
grokos added inline comments to D56290: [OPENMP][NVPTX]General formatting/code improvement, NFC..
Jan 4 2019, 8:24 AM
grokos added inline comments to D56278: [OPENMP][NVPTX]Improve performance + reduce number of used registers..
Jan 4 2019, 8:06 AM

Jan 3 2019

grokos added inline comments to D55588: [OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu.
Jan 3 2019, 5:44 PM
grokos accepted D56274: [OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC..

I'll accept the patch for the sake of consistency and correctness of execution. Just one question:

which cost is too high

So should we expect a performance penalty until function copy is fixed in LLVM and we can revert back to __syncthreads()?

Jan 3 2019, 8:38 AM · Restricted Project
grokos accepted D55588: [OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu.

LGTM.

Jan 3 2019, 5:12 AM

Dec 31 2018

grokos added a comment to D55952: [libomptarget] Introduce LIBOMPTARGET_ENABLE_DEBUG cmake option..

I'm not sure what we are trying to do here. OMPTARGET_DEBUG is not an environment variable, it's a preprocessor definition. It is used by libomptarget/include/omptarget.h to include all necessary header files and define the DEBUGP macro appropriately, which in turn is used by libomptarget/src/private.h to define the DP macro used throughout the library.

Dec 31 2018, 9:30 AM · Restricted Project, Restricted Project

Dec 27 2018

grokos accepted D56100: [OPENMP][NVPTX]Fixed initialization of the data-sharing interface..
Dec 27 2018, 4:21 PM
grokos accepted D56102: [OPENMP][NVPTX]Added/fixed debugging messages, NFC..

LGTM.

Dec 27 2018, 8:26 AM
grokos added inline comments to D56102: [OPENMP][NVPTX]Added/fixed debugging messages, NFC..
Dec 27 2018, 8:20 AM
grokos accepted D56101: [OPENMP][NVPTX]Outline assert into noinline function, NFC..

Looks good.

Dec 27 2018, 8:19 AM
grokos added inline comments to D56100: [OPENMP][NVPTX]Fixed initialization of the data-sharing interface..
Dec 27 2018, 8:11 AM

Dec 18 2018

grokos added a comment to D55588: [OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu.

Since a similar solution was adopted for clang, I think we should let this one land. After all, it's a matter of consistency between the two projects.

Dec 18 2018, 7:16 PM

Nov 5 2018

grokos edited reviewers for D54079: [OPENMP] Add Hurd support, added: AndreyChurbanov, tlwilmar, jlpeyton, hbae; removed: gtbercea, caomhin, grokos.
Nov 5 2018, 6:47 AM

Oct 22 2018

grokos added a comment to D46185: [OpenMP] Allow nvptx sm_30 to be used as an offloading device.

Are we still interested in this patch or should we abandon it?

Oct 22 2018, 6:28 AM · Restricted Project

Oct 11 2018

grokos accepted D53141: [OpenMP][libomptarget] Add runtime function for pushing coalesced global records.

Looks good.

Oct 11 2018, 8:41 AM

Oct 1 2018

grokos accepted D52725: [libomptarget-nvptx] reduction: Determine if runtime uninitialized.

Looks good. Thanks for taking care of this issue!

Oct 1 2018, 7:16 AM
grokos accepted D52701: [libomptarget-nvptx] Enable asserts in bclib.

Looks good. I have no idea why this option was left there...

Oct 1 2018, 1:44 AM

Sep 29 2018

grokos accepted D51787: [libomptarget-nvptx] Fix ancestor_thread_num and team_size (non-SPMD).

Looks good.

Sep 29 2018, 12:35 PM

Sep 28 2018

grokos accepted D52655: [libomptarget-nvptx] Align data sharing stack.

Interesting. Are there other cases where alignment is needed?

Sep 28 2018, 9:18 AM
grokos accepted D51785: [libomptarget-nvptx] Ignore calls to dynamic API.

LG

Sep 28 2018, 8:39 AM
grokos accepted D51783: [libomptarget-nvptx] Fix number of threads in parallel.

LG

Sep 28 2018, 8:39 AM
grokos accepted D51687: [libomptarget-nvptx] Add testing infrastructure.

Then this revision is good to go!

Sep 28 2018, 6:38 AM

Sep 21 2018

grokos added a comment to D51875: [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime..

@Hahnfeld: Are the latest changes in line with your requirements/plans to reduce the memory footprint of the nvptx runtime?

Sep 21 2018, 7:14 AM

Sep 12 2018

grokos accepted D50188: [OpenMP][libomptarget] Simplify warp master selection for data sharing.

I completely missed this one. The proposed change makes sense.

Sep 12 2018, 6:23 AM

Sep 11 2018

grokos added inline comments to D51875: [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime..
Sep 11 2018, 7:53 AM

Sep 10 2018

grokos added a comment to D51687: [libomptarget-nvptx] Add testing infrastructure.

Great job, adding testing infrastructure has been on our list for a long time!

Sep 10 2018, 4:22 PM

Sep 6 2018

grokos added a comment to D51107: [LIBOMPTARGET] Add support for mapping of lambda captures..

BTW, the main problem here is that there is no mapping between host pointer and device pointer for the firstprivates, just like for the mapped data. We just need to rework the way we allocate the memory for the firstprivate to make MEMBER_OF to work.

I don't see this path taken in the latest patch. Is there a major drawback of doing this?

I did it using new tgtArgsPositions, which handles mappings between the firstprivates and the corresponding target pointers.

I think it's more consistent to handle all data mapping in a single loop (in target_data_begin) as discussed before.

I think this is a different problem that should be solved separately. It requires a redesign of the existing solution, while I'm just trying to implement a new feature.

Sep 6 2018, 8:51 AM

Sep 4 2018

grokos accepted D51624: [libomptarget][CUDA] Use cuDeviceGetAttribute, NFCI..

Looks good.

Sep 4 2018, 8:08 AM
grokos accepted D51623: [libomptarget] PR38704: Fix erase of ShadowPtrMap.

Good catch, thanks for the fix!

Sep 4 2018, 8:00 AM
grokos added inline comments to D51622: [libomptarget][NVPTX] Drop dead code and data structures, NFCI..
Sep 4 2018, 7:54 AM