grokos (George Rokos)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 16 2016, 1:22 PM (125 w, 5 d)

Recent Activity

Fri, Jul 13

grokos accepted D49204: [OpenMP][libomptarget] Fix data sharing and globalization infrastructure to work in SPMD mode.

I don't have any further comments.

Fri, Jul 13, 2:08 AM

Thu, Jul 12

grokos added inline comments to D49204: [OpenMP][libomptarget] Fix data sharing and globalization infrastructure to work in SPMD mode.
Thu, Jul 12, 9:17 AM

Mon, Jul 9

grokos accepted D49036: [OPENMP, NVPTX] Support several images in the executable..

Looks good. Do you think we should do the same for the generic-elf-64 plugin?

Mon, Jul 9, 10:41 AM

Fri, Jun 29

grokos accepted D48732: [OPENMP, NVPTX] Sync threads before start ordered loops..

The addition looks good. I've seen this piece of code in quite a few places, maybe we need to refactor it into a mini-function. But let's revisit that in the future.

Fri, Jun 29, 9:12 AM

Mon, Jun 25

grokos accepted D48480: [OPENMP, NVPTX] Fixes for NVPTX RTL.

Looks good to me, thanks for submitting those fixes!

Mon, Jun 25, 5:43 AM

Mon, Jun 18

grokos accepted D48286: [OpenMP] [CUDA] Expose teamid to the debug path.
Mon, Jun 18, 11:20 AM · Restricted Project
grokos added a comment to D48286: [OpenMP] [CUDA] Expose teamid to the debug path.

LGTM.

Mon, Jun 18, 11:20 AM · Restricted Project

May 24 2018

grokos committed rOMP333225: [CUDA]Fix dynamic|guided scheduling..
[CUDA]Fix dynamic|guided scheduling.
May 24 2018, 2:53 PM
grokos committed rL333225: [CUDA]Fix dynamic|guided scheduling..
[CUDA]Fix dynamic|guided scheduling.
May 24 2018, 2:16 PM
grokos closed D47333: [CUDA]Fix dynamic|guided scheduling..
May 24 2018, 2:16 PM
grokos accepted D47333: [CUDA]Fix dynamic|guided scheduling..

I think this looks good. Definitely a correct implementation compared to what we have now!

May 24 2018, 10:23 AM

May 22 2018

grokos accepted D47131: [libomptarget-nvptx] loop: Determine if runtime uninitialized.

Looks good. I don't see any reason why performance would be affected and it is a necessary change for the sake of correctness.

May 22 2018, 8:13 AM

May 21 2018

grokos accepted D47130: [CMake] Unify install path for libraries.

LGTM.

May 21 2018, 7:51 AM
grokos added a comment to D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.

Was the new map interface added to Clang and just missed it?

May 21 2018, 7:46 AM · Restricted Project

May 16 2018

grokos accepted D46930: [libomptarget-nvptx-bc] Pass found CUDA installations.

LGTM

May 16 2018, 8:32 AM
grokos accepted D46901: [libomptarget-nvptx] Test bitcode compiler flags and enable by default.

Looks good! Thanks for taking care of it!

May 16 2018, 8:31 AM

May 15 2018

grokos accepted D46840: [OpenMP][libomptarget] Add function for checking SPMD mode.

Looks good.

May 15 2018, 8:13 AM

May 14 2018

grokos requested changes to D46842: [OpenMP][libomptarget] Make bitcode library building depend on clang and llvm-linker being available .

I agree with Jonas, the initial version of the nvptx RTL that we tried to upstream was somewhat like what you try to do here and the consensus was that we do not want to have these dependencies.

May 14 2018, 1:50 PM

May 4 2018

grokos retitled D46210: [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side from [OpenMP] Use LIBOMPDEVICE_DEBUG env var to control debug messages on the device side to [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side.
May 4 2018, 12:06 PM · Restricted Project
grokos accepted D46210: [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side.

LGTM

May 4 2018, 12:06 PM · Restricted Project
grokos added inline comments to D46210: [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side.
May 4 2018, 8:52 AM · Restricted Project

May 2 2018

grokos added a comment to D46210: [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side.

May I suggest that we use something like LIBOMPTARGET_DEVICE_RTL_DEBUG instead of LIBOMPDEVICE_DEBUG? I think this name is more descriptive of what the env var does.

It's kind of long, but I am ok also if you do not want to shorten it.

May 2 2018, 9:23 AM · Restricted Project
grokos added a comment to D46210: [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side.

May I suggest that we use something like LIBOMPTARGET_DEVICE_RTL_DEBUG instead of LIBOMPDEVICE_DEBUG? I think this name is more descriptive of what the env var does.

May 2 2018, 8:02 AM · Restricted Project

Apr 27 2018

grokos added a comment to D46185: [OpenMP] Allow nvptx sm_30 to be used as an offloading device.

I don't think this code does atomic max. Actually, I don't think it does anything at all. First off, old_value and elem are variables which do not change inside the while-loop. So the first loop condition is either always true or always false. Secondly, atomicCAS returns the value that existed at address Buffer (regardless of whether or not the CAS operation succeeded) and not a boolean result. Also, old_value must be updated before each CAS attempt because some other thread may have changed it.

Apr 27 2018, 8:08 AM · Restricted Project

Apr 25 2018

grokos accepted D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

Looks good.

Apr 25 2018, 10:52 PM · Restricted Project

Apr 16 2018

grokos added a comment to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

update as suggested (except PRId64)

Isn't PRId64 working?

I did not find any usage of it on the device side. Not confident on using it. We can remove the warning first, and improve the format later?

It's not giving a warning because in CUDA long long int happens to be 64 bits long, so you can use %lld to print an int64_t. This is not guaranteed to be true on every platform, therefore the portable way to print an int64_t is using the PRId64 macro. That's what we do in the base library.

My understanding of PRId64 is just a pretty format for lld, which I did not master yet, maybe you can help me to understand those two macros better.

But not sure why %lld can be wrong with long long int, this is standard c/c++ which cuda is based on?

Apr 16 2018, 7:46 AM · Restricted Project

Apr 13 2018

grokos added a comment to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

update as suggested (except PRId64)

Isn't PRId64 working?

I did not find any usage of it on the device side. Not confident on using it. We can remove the warning first, and improve the format later?

Apr 13 2018, 3:39 PM · Restricted Project
grokos added a comment to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

update as suggested (except PRId64)

Apr 13 2018, 12:50 PM · Restricted Project

Apr 12 2018

grokos added a comment to D45220: [OpenMP] Search llvm package to find bc file compiler and linker if not specified by a user.

I see. I was trying to set the bc file utilities using find_pacakge(llvm...) instead of using COMPILER_ID and common it to an upper level as we search things like elf and ffi.

The intention was to keep the user interface the same as what we have now, (if not adding more). But now look back at the comments, I realized I may consider cuda compiler to be clang++, it is actually should be clang?

Apr 12 2018, 1:40 PM · Restricted Project
grokos added inline comments to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..
Apr 12 2018, 1:33 PM · Restricted Project
grokos accepted D45530: [OpenMP] Make bc file compilation sensitive to LIBOMPTARGET_NVPTX_DEBUG flag.

I think this is now good to go.

Apr 12 2018, 1:15 PM · Restricted Project

Apr 10 2018

grokos accepted D45415: [OpenMP] Remove extra warning when we build.

LGTM

Apr 10 2018, 7:43 AM · Restricted Project

Apr 5 2018

grokos added a comment to D45220: [OpenMP] Search llvm package to find bc file compiler and linker if not specified by a user.

This is only for compiler and linker used for building bclib. It is not the default build compiler.

With this patch, if a user does not specify compiler and linker for the bc library, but uses -DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=1, the cmake script will try to locate if the build compiler is clang and can be used for building the bc lib.

Apr 5 2018, 12:45 PM · Restricted Project
grokos added a comment to D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.

One caveat regarding Alexey's proposal: According to the CUDA programming guide, malloc on the device allocates space from a fixed-size heap. The default size of this heap is 8MB. If we run into a scenario where more than 8MB will be required for the reduction scratchpad, allocating the scratchpad from the device will fail. The heap size can be user-defined from the host, but for that to happen the host must know how large the scratchpad needs to be, which defeats the purpose of moving scratchpad allocation from the plugin to the nvptx runtime.

But you can change the limit using cudaThreadSetLimit

Apr 5 2018, 12:26 PM · Restricted Project
grokos added a comment to D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.

One caveat regarding Alexey's proposal: According to the CUDA programming guide, malloc on the device allocates space from a fixed-size heap. The default size of this heap is 8MB. If we run into a scenario where more than 8MB will be required for the reduction scratchpad, allocating the scratchpad from the device will fail. The heap size can be user-defined from the host, but for that to happen the host must know how large the scratchpad needs to be, which defeats the purpose of moving scratchpad allocation from the plugin to the nvptx runtime.

Apr 5 2018, 11:13 AM · Restricted Project
grokos added inline comments to D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.
Apr 5 2018, 11:05 AM · Restricted Project
grokos updated the diff for D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.
Apr 5 2018, 11:05 AM · Restricted Project
grokos created D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.
Apr 5 2018, 8:21 AM · Restricted Project

Apr 2 2018

grokos accepted D44992: [OpenMP] enable bc file compilation using the latest clang.

OK, so we can proceed with this solution and if we observe any performance problem in the future then I will push a different fix (instead of declaring __shared__ variables as extern, we can have getter functions in the .cu file in which the variable is defined which other .cu files can call to get a pointer/reference to the variable).

Apr 2 2018, 1:54 PM · Restricted Project

Mar 29 2018

grokos added a comment to D44992: [OpenMP] enable bc file compilation using the latest clang.

Right, I remember this issue. It is caused by the fact that clang does not support __shared__ variables to be extern. This restriction was explicitly introduced here: https://reviews.llvm.org/D25125.

Mar 29 2018, 8:27 AM · Restricted Project

Mar 28 2018

grokos added a comment to D44992: [OpenMP] enable bc file compilation using the latest clang.

What is the problem with building the bc library with the latest clang? I cannot reproduce the issue.

Mar 28 2018, 3:18 PM · Restricted Project

Mar 22 2018

grokos updated the summary of D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 22 2018, 1:04 PM · Restricted Project
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 22 2018, 12:57 PM · Restricted Project

Mar 21 2018

grokos accepted D44754: [OpenMP][libomptarget] Initialize global memory stack only once..

Looks good.

Mar 21 2018, 1:18 PM
grokos abandoned D38798: [OpenMP] Support for implicit "declare target" functions - Sema patch.

@ABataev came up with a much simpler solution to the implementation of declare target: https://reviews.llvm.org/rL327636

Mar 21 2018, 8:17 AM · Restricted Project
grokos abandoned D43026: [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors.

@ABataev came up with a much simpler solution to the implementation of declare target: https://reviews.llvm.org/rL327636

Mar 21 2018, 8:17 AM · Restricted Project, Restricted Project

Mar 20 2018

grokos accepted D44487: [OpenMP][libomptarget] Enable globalization for workers.

LGTM.

Mar 20 2018, 11:45 AM
grokos added inline comments to D44487: [OpenMP][libomptarget] Enable globalization for workers.
Mar 20 2018, 11:17 AM

Mar 16 2018

grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.

Rebase after bug fix.

Mar 16 2018, 7:18 PM · Restricted Project
grokos committed rL327763: Bugfix, extern declarations for libomp functions are `extern "C"` declarations.
Bugfix, extern declarations for libomp functions are `extern "C"` declarations
Mar 16 2018, 7:11 PM
grokos committed rOMP327763: Bugfix, extern declarations for libomp functions are `extern "C"` declarations.
Bugfix, extern declarations for libomp functions are `extern "C"` declarations
Mar 16 2018, 7:11 PM
grokos added a dependent revision for D44577: Read OMP_TARGET_OFFLOAD and provide API to access ICV: D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 16 2018, 1:54 PM
grokos added a dependency for D44522: [Libomptarget] Full implementation of the target-offload-icv: D44577: Read OMP_TARGET_OFFLOAD and provide API to access ICV.
Mar 16 2018, 1:54 PM · Restricted Project
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.

Updated the patch to query the target offload kind from libomp instead of parsing OMP_TARGET_OFFLOAD.

Mar 16 2018, 1:52 PM · Restricted Project
grokos added a comment to D44577: Read OMP_TARGET_OFFLOAD and provide API to access ICV.

Question: If the ICV is set to DEFAULT, then libomptarget must change it to either MANDATORY (if there are available devices in the system) or DISABLED (if there are no devices). Should libomptarget notify libomp about this change?

Mar 16 2018, 1:49 PM
grokos committed rOMP327740: Moved extern declarations to private header file, they are only used from….
Moved extern declarations to private header file, they are only used from…
Mar 16 2018, 1:43 PM
grokos committed rL327740: Moved extern declarations to private header file, they are only used from….
Moved extern declarations to private header file, they are only used from…
Mar 16 2018, 1:42 PM
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.

Waiting for the libomp-side patch for OMP_TARGET_OFFLOAD, I removed parsing this env var from within libomptarget and now this revision only implements the internal logic to handle whatever libomptarget will get by querying libomp. (there is already some code checking for OMP_TARGET_OFFLOAD==DISABLED, I'm leaving it there in order not to break systems which make use of this option right now)

Mar 16 2018, 9:27 AM · Restricted Project

Mar 15 2018

grokos accepted D44537: [OpenMP][libomptarget] Fix master warp check.

Looks good.

Mar 15 2018, 1:30 PM
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 15 2018, 1:24 PM · Restricted Project
grokos retitled D44522: [Libomptarget] Full implementation of the target-offload-icv from [Libomptarget] Behavior of library upon target offload failure to [Libomptarget] Full implementation of the target-offload-icv.
Mar 15 2018, 1:24 PM · Restricted Project
grokos added a comment to D44522: [Libomptarget] Full implementation of the target-offload-icv.

Right, I just read TR6 and everything is defined there. I'll revise the patch and upload a new diff.

Mar 15 2018, 10:24 AM · Restricted Project
grokos created D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 15 2018, 9:17 AM · Restricted Project
grokos accepted D44486: [OpenMP][libomptarget] Enable usage of shared memory slots.

This is obviously good to go!

Mar 15 2018, 8:35 AM
grokos accepted D44470: [OpenMP][libomptarget] Enable multiple frames per global memory slot.

Looks good.

Mar 15 2018, 8:34 AM

Mar 14 2018

grokos added inline comments to D44470: [OpenMP][libomptarget] Enable multiple frames per global memory slot.
Mar 14 2018, 7:54 PM
grokos added inline comments to D44470: [OpenMP][libomptarget] Enable multiple frames per global memory slot.
Mar 14 2018, 3:19 PM
grokos added inline comments to D44487: [OpenMP][libomptarget] Enable globalization for workers.
Mar 14 2018, 2:05 PM
grokos added inline comments to D44487: [OpenMP][libomptarget] Enable globalization for workers.
Mar 14 2018, 12:23 PM
grokos committed rOMP327556: [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread..
[libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread.
Mar 14 2018, 12:15 PM
grokos committed rL327556: [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread..
[libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread.
Mar 14 2018, 12:15 PM

Mar 12 2018

grokos accepted D44260: [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing..

LGTM.

Mar 12 2018, 8:06 AM

Mar 9 2018

grokos added inline comments to D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 9 2018, 2:48 PM · Restricted Project
grokos added inline comments to D44260: [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing..
Mar 9 2018, 1:20 PM
grokos added inline comments to D43197: [OpenMP] Add flag for linking runtime bitcode library.
Mar 9 2018, 7:36 AM

Mar 8 2018

grokos added inline comments to D44260: [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing..
Mar 8 2018, 10:57 AM
grokos accepted D44254: [OpenMP][libomptarget] Fix union..

I think the assignment makes more sense.

Mar 8 2018, 10:24 AM

Mar 7 2018

grokos accepted D43626: [OpenMP] Remove implicit data sharing using device shared memory from libomptarget.

I think the patch looks good. Let's remove the obsolete code in preparation for the new data-sharing scheme.

Mar 7 2018, 12:53 PM
grokos added inline comments to D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 7 2018, 9:16 AM · Restricted Project
grokos updated the summary of D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 7 2018, 8:24 AM · Restricted Project
grokos added inline comments to D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 7 2018, 8:23 AM · Restricted Project
grokos updated the diff for D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.

OK, I left the change to the internal device ID representation out of this patch.

Mar 7 2018, 7:59 AM · Restricted Project

Mar 6 2018

grokos retitled D44186: [OpenMP] New clang/libomptarget map interface: remove translation code from [Clang][OpenMP] New clang/libomptarget map interface: remove translation code to [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 6 2018, 7:26 PM · Restricted Project
grokos created D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 6 2018, 7:26 PM · Restricted Project

Mar 1 2018

grokos added a comment to D43626: [OpenMP] Remove implicit data sharing using device shared memory from libomptarget.

As far as I am concerned, I agree that the existing data sharing scheme should be removed in favour of a simpler future patch which will introduce the more generic data sharing scheme.

Mar 1 2018, 10:52 AM

Feb 12 2018

grokos accepted D43197: [OpenMP] Add flag for linking runtime bitcode library.

I don't have any other remarks, looks good.

Feb 12 2018, 11:44 AM
grokos added inline comments to D43197: [OpenMP] Add flag for linking runtime bitcode library.
Feb 12 2018, 9:13 AM

Feb 8 2018

grokos accepted D41724: [OpenMP][libomptarget] Enable the compilation of multiple bc libraries for runtime inlining.
Feb 8 2018, 3:15 PM
grokos added inline comments to D41724: [OpenMP][libomptarget] Enable the compilation of multiple bc libraries for runtime inlining.
Feb 8 2018, 7:49 AM

Feb 7 2018

grokos added a reviewer for D43026: [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors: power-llvm-team.
Feb 7 2018, 10:00 AM · Restricted Project, Restricted Project
grokos added a reviewer for D38798: [OpenMP] Support for implicit "declare target" functions - Sema patch: power-llvm-team.
Feb 7 2018, 10:00 AM · Restricted Project
grokos retitled D43026: [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors from [OpenMP] Support for implicit "declare target" functions - CodeGen patch to [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors.
Feb 7 2018, 9:21 AM · Restricted Project, Restricted Project
grokos created D43026: [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors.
Feb 7 2018, 8:44 AM · Restricted Project, Restricted Project

Feb 6 2018

grokos accepted D41485: [OpenMP][libomptarget] Add data sharing support in libomptarget.

Looks good. Thanks for submitting! I'll take care of duplicate extern declarations in another patch.

Feb 6 2018, 12:33 PM

Jan 30 2018

grokos accepted D42686: [libomptarget] Only use CUDA Driver API.

LGTM.

Jan 30 2018, 7:02 AM
grokos added inline comments to D42686: [libomptarget] Only use CUDA Driver API.
Jan 30 2018, 6:51 AM
grokos accepted D42643: [libomptarget] Check for library with CUDA Driver API.

Looks good!

Jan 30 2018, 6:39 AM
grokos added inline comments to D42686: [libomptarget] Only use CUDA Driver API.
Jan 30 2018, 6:24 AM

Jan 29 2018

grokos added inline comments to D42643: [libomptarget] Check for library with CUDA Driver API.
Jan 29 2018, 2:26 PM
grokos committed rOMP323649: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget….
[OpenMP] Initial implementation of OpenMP offloading library - libomptarget…
Jan 29 2018, 6:07 AM