
gregrodgers (Greg Rodgers)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 22 2016, 7:51 AM (171 w, 3 d)

Recent Activity

Aug 23 2018

gregrodgers added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

I have a longer comment on header files, but let me first understand this patch.

Aug 23 2018, 1:37 PM

Aug 22 2018

gregrodgers added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.

I like the idea of using an automatic include as a cc1 option (-include). However, I would prefer a more general automatic include for OpenMP, not just for math functions (clang_cuda_device_functions.h). Clang CUDA automatically includes clang_cuda_runtime_wrapper.h, which in turn includes other files, such as clang_cuda_device_functions.h, as needed. Let's hypothetically call my proposed automatic include for OpenMP clang_openmp_runtime_wrapper.h.
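
A minimal sketch of what such a wrapper header could look like, modeled on the description above; the include guard, the #if condition, and the exact header spellings are illustrative assumptions, not the contents of any posted patch:

```
// clang_openmp_runtime_wrapper.h -- hypothetical automatic include for OpenMP
// device compilation, modeled on what clang_cuda_runtime_wrapper.h does for
// CUDA. Names and guards here are illustrative only.
#ifndef CLANG_OPENMP_RUNTIME_WRAPPER_H
#define CLANG_OPENMP_RUNTIME_WRAPPER_H

// Only during the OpenMP device pass for an NVPTX target: pull in the
// device-native function declarations, the same way CUDA compilation does.
#if defined(_OPENMP) && defined(__NVPTX__)
#include "clang_cuda_device_functions.h"
#endif

#endif // CLANG_OPENMP_RUNTIME_WRAPPER_H
```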

Aug 22 2018, 9:58 AM · Restricted Project

Jun 25 2018

gregrodgers added a comment to D48455: Remove hip.amdgcn.bc hc.amdgcn.bc from HIP Toolchains.

Why not provide a specific list of --hip-device-lib= options for VDI builds? I am not sure about defining functions inside headers instead of using a HIP bc library.

Jun 25 2018, 8:27 AM · Unknown Object (Project), Restricted Project

May 7 2018

gregrodgers added a comment to D46185: [OpenMP] Allow nvptx sm_30 to be used as an offloading device.

I agree that George's proposed RMW code is correct; this was my first attempt at an RMW code. Maybe we should implement atomicMax as a device function in an architecture-specific (e.g., sm_30) device library. That way the code in loop.cu can remain just a call to atomicMax. Such an implementation would need an overloaded atomicMax.
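
An illustrative sketch, not the code under review, of the kind of overloaded atomicMax such an architecture-specific device library could provide, written as a compare-and-swap RMW loop; the 64-bit unsigned overload is chosen here only because sm_30 lacks a native one:

```
// Hypothetical device-library overload; the real implementation would live in
// the architecture-specific library discussed above.
__device__ static inline unsigned long long
atomicMax(unsigned long long *address, unsigned long long val) {
  unsigned long long old = *address;
  unsigned long long assumed;
  do {
    assumed = old;
    if (assumed >= val)
      break;                          // current value is already >= val
    old = atomicCAS(address, assumed, val);
  } while (old != assumed);           // another thread intervened; retry
  return old;
}
```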

May 7 2018, 9:12 AM · Unknown Object (Project)

Apr 9 2018

Herald updated subscribers of D35129: Add DragonFlyBSD support to OpenMP.
Apr 9 2018, 10:48 AM

Apr 4 2018

gregrodgers added a comment to D44992: [OpenMP] enable bc file compilation using the latest clang.

So, will deviceRTLs/nvptx change? Instead of extern shared, what will it use for those data structures?

Apr 4 2018, 9:41 AM · Unknown Object (Project)

Apr 2 2018

gregrodgers added a comment to D44992: [OpenMP] enable bc file compilation using the latest clang.

Maybe my search is missing something, but the only place I see CUDARelocatableDeviceCode is in lib/Sema/SemaDeclAttr.cpp to allow for extern shared. How could this be causing slowness? I would think forcing extern to be global would be slower.
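
For readers outside the thread, a generic CUDA illustration of the "extern shared" idiom being discussed (the kernel name and buffers here are hypothetical, not the deviceRTL's actual declarations):

```
// Dynamically sized shared memory is declared extern and unsized, with the
// size supplied at kernel launch; a statically sized __shared__ array is the
// alternative shown alongside it.
__global__ void example_kernel(int *out) {
  extern __shared__ int dyn_buf[];   // size fixed by the launch configuration
  __shared__ int fixed_buf[256];     // statically sized shared memory
  dyn_buf[threadIdx.x] = threadIdx.x;
  fixed_buf[threadIdx.x % 256] = threadIdx.x;
  __syncthreads();
  out[threadIdx.x] = dyn_buf[threadIdx.x] + fixed_buf[threadIdx.x % 256];
}
```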

Apr 2 2018, 9:13 AM · Unknown Object (Project)

Feb 5 2018

gregrodgers added a comment to D42800: Let CUDA toolchain support amdgpu target.

Here are my replies to the inline comments. Everything should be fixed in the next revision.

Feb 5 2018, 2:18 PM

Feb 1 2018

gregrodgers added a comment to D42800: Let CUDA toolchain support amdgpu target.

Sorry, all my great inline comments got lost somehow. I am a newbie to Phabricator. I will try to reconstruct my comments.

Feb 1 2018, 6:45 PM
gregrodgers requested changes to D42800: Let CUDA toolchain support amdgpu target.

Thanks to everyone for the reviews. I hope I replied to all inline comments. Since I sent this to Sam to post, we discovered a major shortcoming. As tra points out, there are a lot of CUDA headers in the CUDA SDK that get processed. We are able to override asm() expansions with #undef and redefine them as equivalent amdgpu components, so the compiler never sees the asm(). I am sure we will need to add more redefines as we broaden our testing, but that is not the big problem.

We would like to be able to run cudaclang for AMD GPUs without an install of CUDA. Of course you must always install CUDA if any of your targeted GPUs are NVidia GPUs. To run cudaclang without CUDA when only non-NVidia GPUs are specified, we need an open set of headers and we must replace the fatbin tools used in the toolchain. The latter can be addressed by using the libomptarget methods for embedding multiple target GPU objects. The former is going to take a lot of work. I am going to be sending an updated patch that has the stubs for the open headers noted in clang_cuda_runtime_wrapper.h. They will be included with the CC1 flag -DUSE_OPEN_HEADERS__. This will be generated by the CUDA driver when it finds no CUDA installation and all target GPUs are not NVidia.
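
A purely illustrative sketch of the #undef/redefine pattern mentioned above; __SDK_SYNC_MACRO is a hypothetical stand-in for whichever CUDA SDK macro expands to inline PTX asm(), and the real list of overrides would live in the updated headers:

```
#if defined(__AMDGCN__)
// Drop the SDK definition whose body expands to NVPTX inline asm() ...
#ifdef __SDK_SYNC_MACRO
#undef __SDK_SYNC_MACRO
#endif
// ... and redefine it in terms of an equivalent AMDGPU builtin, so the
// compiler never sees the PTX asm() when targeting amdgcn.
#define __SDK_SYNC_MACRO() __builtin_amdgcn_s_barrier()
#endif
```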

Feb 1 2018, 6:42 PM

Dec 19 2016

gregrodgers added a comment to D27928: Add isGPU() to Triple.

Justin, the commonality between nvptx and amdgcn LLVM IR is exactly why I would like isGPU(). I actually do want to assume that "isGPU" <--> "isNVPTX || isAMDGCN".
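
For context, the predicate described there reduces to a one-liner on llvm::Triple. The sketch below spells it via the architecture enum rather than assuming other helper predicates; it is not the text of the posted diff:

```
// Member of llvm::Triple -- equivalent to "isNVPTX() || isAMDGCN()" as stated
// in the comment above.
bool isGPU() const {
  return getArch() == Triple::nvptx || getArch() == Triple::nvptx64 ||
         getArch() == Triple::amdgcn;
}
```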

Dec 19 2016, 6:14 PM · Unknown Object (Project)
gregrodgers added a comment to D27928: Add isGPU() to Triple.

I can email you a bigger patch from our development tree. I would rather not post it in public yet because it still needs some work. Here are two examples from this patch.

Dec 19 2016, 2:22 PM · Unknown Object (Project)
gregrodgers added a comment to D27928: Add isGPU() to Triple.

Thank you, Justin. Yes, I plan to use this extensively in clang for common OpenMP code generation, but I don't have those patches ready yet.
isGPU() may also be used for compiling CUDA code to LLVM IR as an alternative to isNVPTX(). I will discuss this with the Google authors first.
I reformatted to 80 columns. Thank you for your patience with a new contributor.

Dec 19 2016, 11:15 AM · Unknown Object (Project)
gregrodgers updated the diff for D27928: Add isGPU() to Triple.

Formatted to 80 characters.

Dec 19 2016, 11:01 AM · Unknown Object (Project)
gregrodgers retitled D27928 to Add isGPU() to Triple.
Dec 19 2016, 9:52 AM · Unknown Object (Project)