grokos (George Rokos)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 16 2016, 1:22 PM (114 w, 18 h)

Recent Activity

Mon, Apr 16

grokos added a comment to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

update as suggested (except PRId64)

Isn't PRId64 working?

I did not find any usage of it on the device side. Not confident on using it. We can remove the warning first, and improve the format later?

It's not giving a warning because in CUDA long long int happens to be 64 bits long, so you can use %lld to print an int64_t. This is not guaranteed to be true on every platform, therefore the portable way to print an int64_t is using the PRId64 macro. That's what we do in the base library.

My understanding of PRId64 is just a pretty format for lld, which I did not master yet, maybe you can help me to understand those two macros better.

But not sure why %lld can be wrong with long long int, this is standard c/c++ which cuda is based on?

Mon, Apr 16, 7:46 AM · Restricted Project

Fri, Apr 13

grokos added a comment to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

update as suggested (except PRId64)

Isn't PRId64 working?

I did not find any usage of it on the device side. Not confident on using it. We can remove the warning first, and improve the format later?

Fri, Apr 13, 3:39 PM · Restricted Project
grokos added a comment to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..

update as suggested (except PRId64)

Fri, Apr 13, 12:50 PM · Restricted Project

Thu, Apr 12

grokos added a comment to D45220: [OpenMP] Search llvm package to find bc file compiler and linker if not specified by a user.

I see. I was trying to set the bc file utilities using find_pacakge(llvm...) instead of using COMPILER_ID and common it to an upper level as we search things like elf and ffi.

The intention was to keep the user interface the same as what we have now, (if not adding more). But now look back at the comments, I realized I may consider cuda compiler to be clang++, it is actually should be clang?

Thu, Apr 12, 1:40 PM · Restricted Project
grokos added inline comments to D45528: [OpenMP] Remove compilation warning when using clang to compile bc files..
Thu, Apr 12, 1:33 PM · Restricted Project
grokos accepted D45530: [OpenMP] Make bc file compilation sensitive to LIBOMPTARGET_NVPTX_DEBUG flag.

I think this is now good to go.

Thu, Apr 12, 1:15 PM · Restricted Project

Tue, Apr 10

grokos accepted D45415: [OpenMP] Remove extra warning when we build.

LGTM

Tue, Apr 10, 7:43 AM · Restricted Project

Thu, Apr 5

grokos added a comment to D45220: [OpenMP] Search llvm package to find bc file compiler and linker if not specified by a user.

This is only for compiler and linker used for building bclib. It is not the default build compiler.

With this patch, if a user does not specify compiler and linker for the bc library, but uses -DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=1, the cmake script will try to locate if the build compiler is clang and can be used for building the bc lib.

Thu, Apr 5, 12:45 PM · Restricted Project
grokos added a comment to D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.

One caveat regarding Alexey's proposal: According to the CUDA programming guide, malloc on the device allocates space from a fixed-size heap. The default size of this heap is 8MB. If we run into a scenario where more than 8MB will be required for the reduction scratchpad, allocating the scratchpad from the device will fail. The heap size can be user-defined from the host, but for that to happen the host must know how large the scratchpad needs to be, which defeats the purpose of moving scratchpad allocation from the plugin to the nvptx runtime.

But you can change the limit using cudaThreadSetLimit

Thu, Apr 5, 12:26 PM · Restricted Project
grokos added a comment to D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.

One caveat regarding Alexey's proposal: According to the CUDA programming guide, malloc on the device allocates space from a fixed-size heap. The default size of this heap is 8MB. If we run into a scenario where more than 8MB will be required for the reduction scratchpad, allocating the scratchpad from the device will fail. The heap size can be user-defined from the host, but for that to happen the host must know how large the scratchpad needs to be, which defeats the purpose of moving scratchpad allocation from the plugin to the nvptx runtime.

Thu, Apr 5, 11:13 AM · Restricted Project
grokos added inline comments to D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.
Thu, Apr 5, 11:05 AM · Restricted Project
grokos updated the diff for D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.
Thu, Apr 5, 11:05 AM · Restricted Project
grokos created D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad.
Thu, Apr 5, 8:21 AM · Restricted Project

Mon, Apr 2

grokos accepted D44992: [OpenMP] enable bc file compilation using the latest clang.

OK, so we can proceed with this solution and if we observe any performance problem in the future then I will push a different fix (instead of declaring __shared__ variables as extern, we can have getter functions in the .cu file in which the variable is defined which other .cu files can call to get a pointer/reference to the variable).

Mon, Apr 2, 1:54 PM · Restricted Project

Thu, Mar 29

grokos added a comment to D44992: [OpenMP] enable bc file compilation using the latest clang.

Right, I remember this issue. It is caused by the fact that clang does not support __shared__ variables to be extern. This restriction was explicitly introduced here: https://reviews.llvm.org/D25125.

Thu, Mar 29, 8:27 AM · Restricted Project

Wed, Mar 28

grokos added a comment to D44992: [OpenMP] enable bc file compilation using the latest clang.

What is the problem with building the bc library with the latest clang? I cannot reproduce the issue.

Wed, Mar 28, 3:18 PM · Restricted Project

Mar 22 2018

grokos updated the summary of D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 22 2018, 1:04 PM · Restricted Project
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 22 2018, 12:57 PM · Restricted Project

Mar 21 2018

grokos accepted D44754: [OpenMP][libomptarget] Initialize global memory stack only once..

Looks good.

Mar 21 2018, 1:18 PM
grokos abandoned D38798: [OpenMP] Support for implicit "declare target" functions - Sema patch.

@ABataev came up with a much simpler solution to the implementation of declare target: https://reviews.llvm.org/rL327636

Mar 21 2018, 8:17 AM · Restricted Project
grokos abandoned D43026: [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors.

@ABataev came up with a much simpler solution to the implementation of declare target: https://reviews.llvm.org/rL327636

Mar 21 2018, 8:17 AM · Restricted Project, Restricted Project

Mar 20 2018

grokos accepted D44487: [OpenMP][libomptarget] Enable globalization for workers.

LGTM.

Mar 20 2018, 11:45 AM
grokos added inline comments to D44487: [OpenMP][libomptarget] Enable globalization for workers.
Mar 20 2018, 11:17 AM

Mar 16 2018

grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.

Rebase after bug fix.

Mar 16 2018, 7:18 PM · Restricted Project
grokos committed rL327763: Bugfix, extern declarations for libomp functions are `extern "C"` declarations.
Bugfix, extern declarations for libomp functions are `extern "C"` declarations
Mar 16 2018, 7:11 PM
grokos committed rOMP327763: Bugfix, extern declarations for libomp functions are `extern "C"` declarations.
Bugfix, extern declarations for libomp functions are `extern "C"` declarations
Mar 16 2018, 7:11 PM
grokos added a dependent revision for D44577: Read OMP_TARGET_OFFLOAD and provide API to access ICV: D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 16 2018, 1:54 PM
grokos added a dependency for D44522: [Libomptarget] Full implementation of the target-offload-icv: D44577: Read OMP_TARGET_OFFLOAD and provide API to access ICV.
Mar 16 2018, 1:54 PM · Restricted Project
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.

Updated the patch to query the target offload kind from libomp instead of parsing OMP_TARGET_OFFLOAD.

Mar 16 2018, 1:52 PM · Restricted Project
grokos added a comment to D44577: Read OMP_TARGET_OFFLOAD and provide API to access ICV.

Question: If the ICV is set to DEFAULT, then libomptarget must change it to either MANDATORY (if there are available devices in the system) or DISABLED (if there are no devices). Should libomptarget notify libomp about this change?

Mar 16 2018, 1:49 PM
grokos committed rOMP327740: Moved extern declarations to private header file, they are only used from….
Moved extern declarations to private header file, they are only used from…
Mar 16 2018, 1:43 PM
grokos committed rL327740: Moved extern declarations to private header file, they are only used from….
Moved extern declarations to private header file, they are only used from…
Mar 16 2018, 1:42 PM
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.

Waiting for the libomp-side patch for OMP_TARGET_OFFLOAD, I removed parsing this env var from within libomptarget and now this revision only implements the internal logic to handle whatever libomptarget will get by querying libomp. (there is already some code checking for OMP_TARGET_OFFLOAD==DISABLED, I'm leaving it there in order not to break systems which make use of this option right now)

Mar 16 2018, 9:27 AM · Restricted Project

Mar 15 2018

grokos accepted D44537: [OpenMP][libomptarget] Fix master warp check.

Looks good.

Mar 15 2018, 1:30 PM
grokos updated the diff for D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 15 2018, 1:24 PM · Restricted Project
grokos retitled D44522: [Libomptarget] Full implementation of the target-offload-icv from [Libomptarget] Behavior of library upon target offload failure to [Libomptarget] Full implementation of the target-offload-icv.
Mar 15 2018, 1:24 PM · Restricted Project
grokos added a comment to D44522: [Libomptarget] Full implementation of the target-offload-icv.

Right, I just read TR6 and everything is defined there. I'll revise the patch and upload a new diff.

Mar 15 2018, 10:24 AM · Restricted Project
grokos created D44522: [Libomptarget] Full implementation of the target-offload-icv.
Mar 15 2018, 9:17 AM · Restricted Project
grokos accepted D44486: [OpenMP][libomptarget] Enable usage of shared memory slots.

This is obviously good to go!

Mar 15 2018, 8:35 AM
grokos accepted D44470: [OpenMP][libomptarget] Enable multiple frames per global memory slot.

Looks good.

Mar 15 2018, 8:34 AM

Mar 14 2018

grokos added inline comments to D44470: [OpenMP][libomptarget] Enable multiple frames per global memory slot.
Mar 14 2018, 7:54 PM
grokos added inline comments to D44470: [OpenMP][libomptarget] Enable multiple frames per global memory slot.
Mar 14 2018, 3:19 PM
grokos added inline comments to D44487: [OpenMP][libomptarget] Enable globalization for workers.
Mar 14 2018, 2:05 PM
grokos added inline comments to D44487: [OpenMP][libomptarget] Enable globalization for workers.
Mar 14 2018, 12:23 PM
grokos committed rOMP327556: [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread..
[libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread.
Mar 14 2018, 12:15 PM
grokos committed rL327556: [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread..
[libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread.
Mar 14 2018, 12:15 PM

Mar 12 2018

grokos accepted D44260: [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing..

LGTM.

Mar 12 2018, 8:06 AM

Mar 9 2018

grokos added inline comments to D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 9 2018, 2:48 PM · Restricted Project
grokos added inline comments to D44260: [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing..
Mar 9 2018, 1:20 PM
grokos added inline comments to D43197: [OpenMP] Add flag for linking runtime bitcode library.
Mar 9 2018, 7:36 AM

Mar 8 2018

grokos added inline comments to D44260: [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing..
Mar 8 2018, 10:57 AM
grokos accepted D44254: [OpenMP][libomptarget] Fix union..

I think the assignment makes more sense.

Mar 8 2018, 10:24 AM

Mar 7 2018

grokos accepted D43626: [OpenMP] Remove implicit data sharing using device shared memory from libomptarget.

I think the patch looks good. Let's remove the obsolete code in preparation for the new data-sharing scheme.

Mar 7 2018, 12:53 PM
grokos added inline comments to D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 7 2018, 9:16 AM · Restricted Project
grokos updated the summary of D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 7 2018, 8:24 AM · Restricted Project
grokos added inline comments to D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 7 2018, 8:23 AM · Restricted Project
grokos updated the diff for D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.

OK, I left the change to the internal device ID representation out of this patch.

Mar 7 2018, 7:59 AM · Restricted Project

Mar 6 2018

grokos retitled D44186: [OpenMP] New clang/libomptarget map interface: remove translation code from [Clang][OpenMP] New clang/libomptarget map interface: remove translation code to [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 6 2018, 7:26 PM · Restricted Project
grokos created D44186: [OpenMP] New clang/libomptarget map interface: remove translation code.
Mar 6 2018, 7:26 PM · Restricted Project

Mar 1 2018

grokos added a comment to D43626: [OpenMP] Remove implicit data sharing using device shared memory from libomptarget.

As far as I am concerned, I agree that the existing data sharing scheme should be removed in favour of a simpler future patch which will introduce the more generic data sharing scheme.

Mar 1 2018, 10:52 AM

Feb 12 2018

grokos accepted D43197: [OpenMP] Add flag for linking runtime bitcode library.

I don't have any other remarks, looks good.

Feb 12 2018, 11:44 AM
grokos added inline comments to D43197: [OpenMP] Add flag for linking runtime bitcode library.
Feb 12 2018, 9:13 AM

Feb 8 2018

grokos accepted D41724: [OpenMP][libomptarget] Enable the compilation of multiple bc libraries for runtime inlining.
Feb 8 2018, 3:15 PM
grokos added inline comments to D41724: [OpenMP][libomptarget] Enable the compilation of multiple bc libraries for runtime inlining.
Feb 8 2018, 7:49 AM

Feb 7 2018

grokos added a reviewer for D43026: [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors: power-llvm-team.
Feb 7 2018, 10:00 AM · Restricted Project, Restricted Project
grokos added a reviewer for D38798: [OpenMP] Support for implicit "declare target" functions - Sema patch: power-llvm-team.
Feb 7 2018, 10:00 AM · Restricted Project
grokos retitled D43026: [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors from [OpenMP] Support for implicit "declare target" functions - CodeGen patch to [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors.
Feb 7 2018, 9:21 AM · Restricted Project, Restricted Project
grokos created D43026: [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors.
Feb 7 2018, 8:44 AM · Restricted Project, Restricted Project

Feb 6 2018

grokos accepted D41485: [OpenMP][libomptarget] Add data sharing support in libomptarget.

Looks good. Thanks for submitting! I'll take care of duplicate extern declarations in another patch.

Feb 6 2018, 12:33 PM

Jan 30 2018

grokos accepted D42686: [libomptarget] Only use CUDA Driver API.

LGTM.

Jan 30 2018, 7:02 AM
grokos added inline comments to D42686: [libomptarget] Only use CUDA Driver API.
Jan 30 2018, 6:51 AM
grokos accepted D42643: [libomptarget] Check for library with CUDA Driver API.

Looks good!

Jan 30 2018, 6:39 AM
grokos added inline comments to D42686: [libomptarget] Only use CUDA Driver API.
Jan 30 2018, 6:24 AM

Jan 29 2018

grokos added inline comments to D42643: [libomptarget] Check for library with CUDA Driver API.
Jan 29 2018, 2:26 PM
grokos committed rOMP323649: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget….
[OpenMP] Initial implementation of OpenMP offloading library - libomptarget…
Jan 29 2018, 6:07 AM
grokos committed rL323649: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget….
[OpenMP] Initial implementation of OpenMP offloading library - libomptarget…
Jan 29 2018, 6:03 AM
grokos closed D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Jan 29 2018, 6:02 AM · Restricted Project
grokos added inline comments to D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Jan 29 2018, 6:00 AM · Restricted Project

Jan 26 2018

grokos updated the diff for D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..

Removed obsolete enum constant.

Jan 26 2018, 10:45 AM · Restricted Project
grokos updated the diff for D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..

Fixed formatting with clang-format.

Jan 26 2018, 10:40 AM · Restricted Project
grokos updated the diff for D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Jan 26 2018, 10:25 AM · Restricted Project
grokos abandoned D42584: [OpenMP] nvptx device RTL - final changes.
Jan 26 2018, 10:20 AM · Restricted Project
grokos removed a dependency for D42584: [OpenMP] nvptx device RTL - final changes: D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Jan 26 2018, 10:20 AM · Restricted Project
grokos removed a dependent revision for D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.: D42584: [OpenMP] nvptx device RTL - final changes.
Jan 26 2018, 10:20 AM · Restricted Project
grokos added a comment to D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..

I created a child revision with the final changes.

Jan 26 2018, 9:54 AM · Restricted Project
grokos added a dependency for D42584: [OpenMP] nvptx device RTL - final changes: D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Jan 26 2018, 9:54 AM · Restricted Project
grokos added a dependent revision for D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.: D42584: [OpenMP] nvptx device RTL - final changes.
Jan 26 2018, 9:54 AM · Restricted Project
grokos created D42584: [OpenMP] nvptx device RTL - final changes.
Jan 26 2018, 9:54 AM · Restricted Project

Jan 23 2018

grokos added a comment to D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..

Regarding the bitcode compiler/linker, we agreed on skipping looking for a suitable compiler in PATH, so for consistency I also skip looking in PATH for a linker. CMake only tries to locate llvm-link in the same directory as the clang binary (if clang is the CMAKE_C_COMPILER) or uses whatever the user has specified. Following @hfinkel 's suggestions, I updated the documentation to describe this behavior.

Jan 23 2018, 11:35 AM · Restricted Project
grokos added inline comments to D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Jan 23 2018, 11:28 AM · Restricted Project
grokos updated the diff for D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..

Made corrections according to the feedback I got. Regarding the general comments:

  1. I found only one more instance of threadID & (WARPSIZE-1) and replaced it with threadID % WARPSIZE. There are a few more bitwise operations in the code of the form threadID & ~(WARPSIZE-1) (note the bitwise NOT) which do something else - they round threadID down to the nearest multiple of WARPSIZE. Doing this with arithmetic operators would require something like threadID - threadID%WARPSIZE which I doubt the compiler could optimize (not to mention that the bitwise version is actually more easily recognizable as a round-to-a-multiple-of-2^x operation than the arithmetic version).
  2. I replaced all references to the built-in warpSize with the macro WARPSIZE. warpSize would have the advantage that it makes the code forward-compatible (if the warp size ever changes, the change will be reflected on this variable); however, it is not a compile-time constant and that would prevent the compiler from making certain optimizations (especially since the warp size is used in modulo operations for which we have assumed that the compiler can transform the modulo operation into bitwise operations). DS_Max_Worker_Warp_Size was also replaced with WARPSIZE.
Jan 23 2018, 11:28 AM · Restricted Project

Dec 20 2017

grokos added inline comments to D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Dec 20 2017, 11:58 AM · Restricted Project

Dec 12 2017

grokos added inline comments to D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Dec 12 2017, 8:17 AM · Restricted Project
grokos updated the diff for D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Dec 12 2017, 8:17 AM · Restricted Project

Dec 11 2017

grokos added inline comments to D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Dec 11 2017, 5:36 PM · Restricted Project
grokos updated the diff for D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Dec 11 2017, 5:35 PM · Restricted Project

Dec 7 2017

grokos updated the diff for D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..

Typos....

Dec 7 2017, 2:04 PM · Restricted Project
grokos added inline comments to D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..
Dec 7 2017, 1:51 PM · Restricted Project
grokos updated the diff for D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs..

Responded to comments and modified the code accordingly.

Dec 7 2017, 1:50 PM · Restricted Project
grokos committed rL320082: [OpenMP] NVPTX: Set default/minimum compute capability to sm_35.
[OpenMP] NVPTX: Set default/minimum compute capability to sm_35
Dec 7 2017, 12:28 PM