grokos (George Rokos)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 16 2016, 1:22 PM (136 w, 11 h)

Recent Activity

Fri, Sep 21

grokos added a comment to D51875: [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime..

@Hahnfeld: Are the latest changes in line with your requirements/plans to reduce the memory footprint of the nvptx runtime?

Fri, Sep 21, 7:14 AM

Wed, Sep 12

grokos accepted D50188: [OpenMP][libomptarget] Simplify warp master selection for data sharing.

I completely missed this one. The proposed change makes sense.

Wed, Sep 12, 6:23 AM

Tue, Sep 11

grokos added inline comments to D51875: [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime..
Tue, Sep 11, 7:53 AM

Mon, Sep 10

grokos added a comment to D51687: [libomptarget-nvptx] Add testing infrastructure.

Great job, adding testing infrastructure has been on our list for a long time!

Mon, Sep 10, 4:22 PM

Thu, Sep 6

grokos added a comment to D51107: [LIBOMPTARGET] Add support for mapping of lambda captures..

BTW, the main problem here is that there is no mapping between host pointer and device pointer for the firstprivates, just like for the mapped data. We just need to rework the way we allocate the memory for the firstprivate to make MEMBER_OF to work.

I don't see this path taken in the latest patch. Is there a major drawback of doing this?

I did it using new tgtArgsPositions, which handles mappings between the firstprivates and the corresponding target pointers.

I think it's more consistent to handle all data mapping in a single loop (in target_data_begin) as discussed before.

I think this is a different problem that should be solved separately. It requires a redesign of the existing solution, while I'm just trying to implement a new feature.

Thu, Sep 6, 8:51 AM

Tue, Sep 4

grokos accepted D51624: [libomptarget][CUDA] Use cuDeviceGetAttribute, NFCI..

Looks good.

Tue, Sep 4, 8:08 AM
grokos accepted D51623: [libomptarget] PR38704: Fix erase of ShadowPtrMap.

Good catch, thanks for the fix!

Tue, Sep 4, 8:00 AM
grokos added inline comments to D51622: [libomptarget][NVPTX] Drop dead code and data structures, NFCI..
Tue, Sep 4, 7:54 AM

Mon, Sep 3

grokos added a comment to D51107: [LIBOMPTARGET] Add support for mapping of lambda captures..

@AlexEichenberger: Have you confirmed whether flag 0x400 is free to be used for LAMBDA_REF or has been reserved for something else in XL?

Mon, Sep 3, 2:52 PM

Aug 23 2018

grokos added a comment to D51107: [LIBOMPTARGET] Add support for mapping of lambda captures..

Is this patch about lambdas inside target regions or about lambdas containing a target region? From the description I assume it's the former. In that case, what happens if there are more than 1 lambdas in the target region?

  1. This is for lambdas inside target region.
  2. For each lambda we generate this sequence: <lambda1>, <capture1_1>, .., <capture1_n>, <lambda2>, <capture2_1>, ..., <capture2_m>, ...
Aug 23 2018, 5:53 AM

Aug 22 2018

grokos added a comment to D51107: [LIBOMPTARGET] Add support for mapping of lambda captures..

Is this patch about lambdas inside target regions or about lambdas containing a target region? From the description I assume it's the former. In that case, what happens if there are more than 1 lambdas in the target region?

Aug 22 2018, 11:43 AM

Aug 16 2018

grokos abandoned D44522: [Libomptarget] Full implementation of the target-offload-icv.
Aug 16 2018, 9:10 AM · Restricted Project

Aug 2 2018

grokos added a comment to D50158: [OpenMP] Add placeholder functions for the depend and nowait depend clauses for target data directives..

These functions are 1:1 the same as the current _nowait functions. If that's for handling depend clauses without nowait I think the compiler can just add another runtime call into libomp.

Aug 2 2018, 9:03 AM

Jul 20 2018

grokos added inline comments to D49564: [OPNEMP, NVPTX] Fixed sychronization construct + code cleanup..
Jul 20 2018, 4:54 PM

Jul 19 2018

grokos committed rOMP337455: [OpenMP][libomptarget] New map interface: remove translation code and ensure….
[OpenMP][libomptarget] New map interface: remove translation code and ensure…
Jul 19 2018, 6:48 AM
grokos committed rL337455: [OpenMP][libomptarget] New map interface: remove translation code and ensure….
[OpenMP][libomptarget] New map interface: remove translation code and ensure…
Jul 19 2018, 6:46 AM
grokos closed D44186: [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members.
Jul 19 2018, 6:46 AM · Restricted Project
grokos added a comment to D44186: [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members.

@ABataev: Can you put a link to the clang-side patch in the description so that we link the two patches together? Also, please let me know when you commit the clang patch so that I commit this one as well.

Jul 19 2018, 6:35 AM · Restricted Project
grokos added inline comments to D44186: [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members.
Jul 19 2018, 6:31 AM · Restricted Project
grokos updated the diff for D44186: [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members.

Added example code to demonstrate the need for padding in partially mapped structs.

Jul 19 2018, 6:24 AM · Restricted Project
grokos retitled D44186: [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members from [OpenMP] New clang/libomptarget map interface: remove translation code to [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members.
Jul 19 2018, 5:48 AM · Restricted Project

Jul 18 2018

grokos added a comment to D44186: [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members.

Hi George, when do you plan to commit this patch? I cleaned the patch for the clang and it is ready to be committed.

Jul 18 2018, 10:15 AM · Restricted Project

Jul 17 2018

grokos accepted D49418: Also support several images for elf.

LGTM

Jul 17 2018, 9:08 AM

Jul 13 2018

grokos accepted D49204: [OpenMP][libomptarget] Fix data sharing and globalization infrastructure to work in SPMD mode.

I don't have any further comments.

Jul 13 2018, 2:08 AM

Jul 12 2018

grokos added inline comments to D49204: [OpenMP][libomptarget] Fix data sharing and globalization infrastructure to work in SPMD mode.
Jul 12 2018, 9:17 AM

Jul 9 2018

grokos accepted D49036: [OPENMP, NVPTX] Support several images in the executable..

Looks good. Do you think we should do the same for the generic-elf-64 plugin?

Jul 9 2018, 10:41 AM

Jun 29 2018

grokos accepted D48732: [OPENMP, NVPTX] Sync threads before start ordered loops..

The addition looks good. I've seen this piece of code in quite a few places, maybe we need to refactor it into a mini-function. But let's revisit that in the future.

Jun 29 2018, 9:12 AM

Jun 25 2018

grokos accepted D48480: [OPENMP, NVPTX] Fixes for NVPTX RTL.

Looks good to me, thanks for submitting those fixes!

Jun 25 2018, 5:43 AM

Jun 18 2018

grokos accepted D48286: [OpenMP] [CUDA] Expose teamid to the debug path.
Jun 18 2018, 11:20 AM · Restricted Project
grokos added a comment to D48286: [OpenMP] [CUDA] Expose teamid to the debug path.

LGTM.

Jun 18 2018, 11:20 AM · Restricted Project

May 24 2018

grokos committed rOMP333225: [CUDA]Fix dynamic|guided scheduling..
[CUDA]Fix dynamic|guided scheduling.
May 24 2018, 2:53 PM
grokos committed rL333225: [CUDA]Fix dynamic|guided scheduling..
[CUDA]Fix dynamic|guided scheduling.
May 24 2018, 2:16 PM
grokos closed D47333: [CUDA]Fix dynamic|guided scheduling..
May 24 2018, 2:16 PM
grokos accepted D47333: [CUDA]Fix dynamic|guided scheduling..

I think this looks good. Definitely a correct implementation compared to what we have now!

May 24 2018, 10:23 AM

May 22 2018

grokos accepted D47131: [libomptarget-nvptx] loop: Determine if runtime uninitialized.

Looks good. I don't see any reason why performance would be affected and it is a necessary change for the sake of correctness.

May 22 2018, 8:13 AM

May 21 2018

grokos accepted D47130: [CMake] Unify install path for libraries.

LGTM.

May 21 2018, 7:51 AM
grokos added a comment to D44186: [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members.

Was the new map interface added to Clang and just missed it?

May 21 2018, 7:46 AM · Restricted Project

May 16 2018

grokos accepted D46930: [libomptarget-nvptx-bc] Pass found CUDA installations.

LGTM

May 16 2018, 8:32 AM
grokos accepted D46901: [libomptarget-nvptx] Test bitcode compiler flags and enable by default.

Looks good! Thanks for taking care of it!

May 16 2018, 8:31 AM

May 15 2018

grokos accepted D46840: [OpenMP][libomptarget] Add function for checking SPMD mode.

Looks good.

May 15 2018, 8:13 AM

May 14 2018

grokos requested changes to D46842: [OpenMP][libomptarget] Make bitcode library building depend on clang and llvm-linker being available .

I agree with Jonas, the initial version of the nvptx RTL that we tried to upstream was somewhat like what you try to do here and the consensus was that we do not want to have these dependencies.

May 14 2018, 1:50 PM

May 4 2018

grokos retitled D46210: [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side from [OpenMP] Use LIBOMPDEVICE_DEBUG env var to control debug messages on the device side to [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side.
May 4 2018, 12:06 PM · Restricted Project
grokos accepted D46210: [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side.

LGTM

May 4 2018, 12:06 PM · Restricted Project
grokos added inline comments to D46210: [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side.
May 4 2018, 8:52 AM · Restricted Project

May 2 2018

grokos added a comment to D46210: [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side.

May I suggest that we use something like LIBOMPTARGET_DEVICE_RTL_DEBUG instead of LIBOMPDEVICE_DEBUG? I think this name is more descriptive of what the env var does.

It's kind of long, but I am ok also if you do not want to shorten it.

May 2 2018, 9:23 AM · Restricted Project
grokos added a comment to D46210: [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side.

May I suggest that we use something like LIBOMPTARGET_DEVICE_RTL_DEBUG instead of LIBOMPDEVICE_DEBUG? I think this name is more descriptive of what the env var does.

May 2 2018, 8:02 AM · Restricted Project

Apr 27 2018

grokos added a comment to D46185: [OpenMP] Allow nvptx sm_30 to be used as an offloading device.

I don't think this code does atomic max. Actually, I don't think it does anything at all. First off, old_value and elem are variables which do not change inside the while-loop. So the first loop condition is either always true or always false. Secondly, atomicCAS returns the value that existed at address Buffer (regardless of whether or not the CAS operation succeeded) and not a boolean result. Also, old_value must be updated before each CAS attempt because some other thread may have changed it.

Apr 27 2018, 8:08 AM · Restricted Project

Apr 25 2018

grokos accepted D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

Looks good.

Apr 25 2018, 10:52 PM · Restricted Project

Apr 16 2018

grokos added a comment to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

update as suggested (except PRId64)

Isn't PRId64 working?

I did not find any usage of it on the device side. Not confident on using it. We can remove the warning first, and improve the format later?

It's not giving a warning because in CUDA long long int happens to be 64 bits long, so you can use %lld to print an int64_t. This is not guaranteed to be true on every platform, therefore the portable way to print an int64_t is using the PRId64 macro. That's what we do in the base library.

My understanding of PRId64 is just a pretty format for lld, which I did not master yet, maybe you can help me to understand those two macros better.

But not sure why %lld can be wrong with long long int, this is standard c/c++ which cuda is based on?

Apr 16 2018, 7:46 AM · Restricted Project

Apr 13 2018

grokos added a comment to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

update as suggested (except PRId64)

Isn't PRId64 working?

I did not find any usage of it on the device side. Not confident on using it. We can remove the warning first, and improve the format later?

Apr 13 2018, 3:39 PM · Restricted Project
grokos added a comment to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

update as suggested (except PRId64)

Apr 13 2018, 12:50 PM · Restricted Project

Apr 12 2018

grokos added a comment to D45220: [OpenMP] Search llvm package to find bc file compiler and linker if not specified by a user.

I see. I was trying to set the bc file utilities using find_pacakge(llvm...) instead of using COMPILER_ID and common it to an upper level as we search things like elf and ffi.

The intention was to keep the user interface the same as what we have now, (if not adding more). But now look back at the comments, I realized I may consider cuda compiler to be clang++, it is actually should be clang?

Apr 12 2018, 1:40 PM · Restricted Project
grokos added inline comments to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..
Apr 12 2018, 1:33 PM · Restricted Project
grokos accepted D45530: [OpenMP] Make bc file compilation sensitive to LIBOMPTARGET_NVPTX_DEBUG flag.

I think this is now good to go.

Apr 12 2018, 1:15 PM · Restricted Project

Apr 10 2018

grokos accepted D45415: [OpenMP] Remove extra warning when we build.

LGTM

Apr 10 2018, 7:43 AM · Restricted Project

Apr 5 2018

grokos added a comment to D45220: [OpenMP] Search llvm package to find bc file compiler and linker if not specified by a user.

This is only for compiler and linker used for building bclib. It is not the default build compiler.

With this patch, if a user does not specify compiler and linker for the bc library, but uses -DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=1, the cmake script will try to locate if the build compiler is clang and can be used for building the bc lib.

Apr 5 2018, 12:45 PM · Restricted Project
grokos added a comment to D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.

One caveat regarding Alexey's proposal: According to the CUDA programming guide, malloc on the device allocates space from a fixed-size heap. The default size of this heap is 8MB. If we run into a scenario where more than 8MB will be required for the reduction scratchpad, allocating the scratchpad from the device will fail. The heap size can be user-defined from the host, but for that to happen the host must know how large the scratchpad needs to be, which defeats the purpose of moving scratchpad allocation from the plugin to the nvptx runtime.

But you can change the limit using cudaThreadSetLimit

Apr 5 2018, 12:26 PM · Restricted Project
grokos added a comment to D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.

One caveat regarding Alexey's proposal: According to the CUDA programming guide, malloc on the device allocates space from a fixed-size heap. The default size of this heap is 8MB. If we run into a scenario where more than 8MB will be required for the reduction scratchpad, allocating the scratchpad from the device will fail. The heap size can be user-defined from the host, but for that to happen the host must know how large the scratchpad needs to be, which defeats the purpose of moving scratchpad allocation from the plugin to the nvptx runtime.

Apr 5 2018, 11:13 AM · Restricted Project
grokos added inline comments to D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.
Apr 5 2018, 11:05 AM · Restricted Project
grokos updated the diff for D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.
Apr 5 2018, 11:05 AM · Restricted Project
grokos created D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.
Apr 5 2018, 8:21 AM · Restricted Project

Apr 2 2018

grokos accepted D44992: [OpenMP] enable bc file compilation using the latest clang.

OK, so we can proceed with this solution and if we observe any performance problem in the future then I will push a different fix (instead of declaring __shared__ variables as extern, we can have getter functions in the .cu file in which the variable is defined which other .cu files can call to get a pointer/reference to the variable).

Apr 2 2018, 1:54 PM · Restricted Project

Mar 29 2018

grokos added a comment to D44992: [OpenMP] enable bc file compilation using the latest clang.

Right, I remember this issue. It is caused by the fact that clang does not support __shared__ variables to be extern. This restriction was explicitly introduced here: https://reviews.llvm.org/D25125.

Mar 29 2018, 8:27 AM · Restricted Project

Mar 28 2018

grokos added a comment to D44992: [OpenMP] enable bc file compilation using the latest clang.

What is the problem with building the bc library with the latest clang? I cannot reproduce the issue.

Mar 28 2018, 3:18 PM · Restricted Project

Mar 22 2018

grokos updated the summary of D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 22 2018, 1:04 PM · Restricted Project
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 22 2018, 12:57 PM · Restricted Project

Mar 21 2018

grokos accepted D44754: [OpenMP][libomptarget] Initialize global memory stack only once..

Looks good.

Mar 21 2018, 1:18 PM
grokos abandoned D38798: [OpenMP] Support for implicit "declare target" functions - Sema patch.

@ABataev came up with a much simpler solution to the implementation of declare target: https://reviews.llvm.org/rL327636

Mar 21 2018, 8:17 AM · Restricted Project
grokos abandoned D43026: [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors.

@ABataev came up with a much simpler solution to the implementation of declare target: https://reviews.llvm.org/rL327636

Mar 21 2018, 8:17 AM · Restricted Project, Restricted Project

Mar 20 2018

grokos accepted D44487: [OpenMP][libomptarget] Enable globalization for workers.

LGTM.

Mar 20 2018, 11:45 AM
grokos added inline comments to D44487: [OpenMP][libomptarget] Enable globalization for workers.
Mar 20 2018, 11:17 AM

Mar 16 2018

grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.

Rebase after bug fix.

Mar 16 2018, 7:18 PM · Restricted Project
grokos committed rL327763: Bugfix, extern declarations for libomp functions are `extern "C"` declarations.
Bugfix, extern declarations for libomp functions are `extern "C"` declarations
Mar 16 2018, 7:11 PM
grokos committed rOMP327763: Bugfix, extern declarations for libomp functions are `extern "C"` declarations.
Bugfix, extern declarations for libomp functions are `extern "C"` declarations
Mar 16 2018, 7:11 PM
grokos added a dependent revision for D44577: Read OMP_TARGET_OFFLOAD and provide API to access ICV: D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 16 2018, 1:54 PM
grokos added a dependency for D44522: [Libomptarget] Full implementation of the target-offload-icv: D44577: Read OMP_TARGET_OFFLOAD and provide API to access ICV.
Mar 16 2018, 1:54 PM · Restricted Project
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.

Updated the patch to query the target offload kind from libomp instead of parsing OMP_TARGET_OFFLOAD.

Mar 16 2018, 1:52 PM · Restricted Project
grokos added a comment to D44577: Read OMP_TARGET_OFFLOAD and provide API to access ICV.

Question: If the ICV is set to DEFAULT, then libomptarget must change it to either MANDATORY (if there are available devices in the system) or DISABLED (if there are no devices). Should libomptarget notify libomp about this change?

Mar 16 2018, 1:49 PM
grokos committed rOMP327740: Moved extern declarations to private header file, they are only used from….
Moved extern declarations to private header file, they are only used from…
Mar 16 2018, 1:43 PM
grokos committed rL327740: Moved extern declarations to private header file, they are only used from….
Moved extern declarations to private header file, they are only used from…
Mar 16 2018, 1:42 PM
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.

Waiting for the libomp-side patch for OMP_TARGET_OFFLOAD, I removed parsing this env var from within libomptarget and now this revision only implements the internal logic to handle whatever libomptarget will get by querying libomp. (there is already some code checking for OMP_TARGET_OFFLOAD==DISABLED, I'm leaving it there in order not to break systems which make use of this option right now)

Mar 16 2018, 9:27 AM · Restricted Project

Mar 15 2018

grokos accepted D44537: [OpenMP][libomptarget] Fix master warp check.

Looks good.

Mar 15 2018, 1:30 PM
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 15 2018, 1:24 PM · Restricted Project
grokos retitled D44522: [Libomptarget] Full implementation of the target-offload-icv from [Libomptarget] Behavior of library upon target offload failure to [Libomptarget] Full implementation of the target-offload-icv.
Mar 15 2018, 1:24 PM · Restricted Project
grokos added a comment to D44522: [Libomptarget] Full implementation of the target-offload-icv.

Right, I just read TR6 and everything is defined there. I'll revise the patch and upload a new diff.

Mar 15 2018, 10:24 AM · Restricted Project
grokos created D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 15 2018, 9:17 AM · Restricted Project
grokos accepted D44486: [OpenMP][libomptarget] Enable usage of shared memory slots.

This is obviously good to go!

Mar 15 2018, 8:35 AM
grokos accepted D44470: [OpenMP][libomptarget] Enable multiple frames per global memory slot.

Looks good.

Mar 15 2018, 8:34 AM

Mar 14 2018

grokos added inline comments to D44470: [OpenMP][libomptarget] Enable multiple frames per global memory slot.
Mar 14 2018, 7:54 PM
grokos added inline comments to D44470: [OpenMP][libomptarget] Enable multiple frames per global memory slot.
Mar 14 2018, 3:19 PM
grokos added inline comments to D44487: [OpenMP][libomptarget] Enable globalization for workers.
Mar 14 2018, 2:05 PM
grokos added inline comments to D44487: [OpenMP][libomptarget] Enable globalization for workers.
Mar 14 2018, 12:23 PM
grokos committed rOMP327556: [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread..
[libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread.
Mar 14 2018, 12:15 PM
grokos committed rL327556: [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread..
[libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread.
Mar 14 2018, 12:15 PM

Mar 12 2018

grokos accepted D44260: [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing..

LGTM.

Mar 12 2018, 8:06 AM

Mar 9 2018

grokos added inline comments to D44186: [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members.
Mar 9 2018, 2:48 PM · Restricted Project
grokos added inline comments to D44260: [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing..
Mar 9 2018, 1:20 PM
grokos added inline comments to D43197: [OpenMP] Add flag for linking runtime bitcode library.
Mar 9 2018, 7:36 AM

Mar 8 2018

grokos added inline comments to D44260: [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing..
Mar 8 2018, 10:57 AM
grokos accepted D44254: [OpenMP][libomptarget] Fix union..

I think the assignment makes more sense.

Mar 8 2018, 10:24 AM