Page MenuHomePhabricator

gregrodgers (Greg Rodgers)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 22 2016, 7:51 AM (353 w, 3 d)

Recent Activity

Apr 24 2023

gregrodgers abandoned D147666: [OPENMP] Adds <install_path>/lib to rpath to avoid need to set LD_LIBRARY_PATH to find plugins..

Its ugly, but to avoid requirement to set LD_LIBRARY_PATH for end-users who may need LD_LIBRARY_PATH for their own application, we will modify the compiler installation with these bash commands where _INSTALL_DIR is the installation directory.

Apr 24 2023, 9:33 AM · Restricted Project, Restricted Project

Apr 21 2023

gregrodgers added a comment to D148808: [OpenMP][libomptarget][AMDGPU] Enable active HSA wait state.

Why separate KERNEL_BUSYWAIT and DATA_BUSYWAIT? I don't feel the need of two separate controls.
What are the units? Documentation?

Apr 21 2023, 5:47 AM · Unknown Object (Project), Restricted Project

Apr 19 2023

gregrodgers accepted D147511: [OpenMP] Fix nextgen plugin behavior when passing negative thread_limit clause.
Apr 19 2023, 1:24 PM · Unknown Object (Project), Restricted Project

Apr 5 2023

gregrodgers requested review of D147666: [OPENMP] Adds <install_path>/lib to rpath to avoid need to set LD_LIBRARY_PATH to find plugins..
Apr 5 2023, 3:26 PM · Restricted Project, Restricted Project

Mar 23 2023

gregrodgers accepted D146075: [flang][driver][openmp] Write MLIR for -save-temps.

This looks good. Please merge. I found it very useful especially in the context of other generated temp files generated after llvm llinking and optimization in the offload driver. For example just listing the temp files with ls -ltr provides a somewhat visual ordering of the compilation flow.

Mar 23 2023, 7:48 AM · Restricted Project, Restricted Project, Restricted Project

Oct 27 2022

gregrodgers updated the diff for D136641: [OPENMP] Fast cross-team reduction (xteamr) helper functions..

4th update to original D136631 review. Changes:

  • Remove unnecessary thread sync following LDS reduction.
  • Replaced the bool LDS __is_last_team with LDS copy of the team counter value returned from atomic inc. This removes a few more scalar registers. Performance and resource utilization noticably improved from all updates to this review.
  • Change parameter name inival to "rnv" (Reduction Null Value) for consistency.
  • Improved header docs and comment block for main function _xteamr_reduction.
Oct 27 2022, 9:37 AM · Unknown Object (Project), Restricted Project

Oct 26 2022

gregrodgers updated the diff for D136641: [OPENMP] Fast cross-team reduction (xteamr) helper functions..
  • removed duplicate variants of shfl that were the same between amdgcn and nvptx and simplified nvptx variant of shfl_xor_int
Oct 26 2022, 5:40 PM · Unknown Object (Project), Restricted Project
gregrodgers updated the diff for D136641: [OPENMP] Fast cross-team reduction (xteamr) helper functions..
  • improve performance and lower register utilization by deriving reduction constants from k and passing numteams from codegen to xteamr function. This changes the interface to the xteamr functions and removes dependencies on DeviceRTL mapping functions
Oct 26 2022, 4:33 PM · Unknown Object (Project), Restricted Project

Oct 25 2022

gregrodgers updated the diff for D136641: [OPENMP] Fast cross-team reduction (xteamr) helper functions..
  • fix lit test of xteamr/test_xteamr.cpp
Oct 25 2022, 6:17 AM · Unknown Object (Project), Restricted Project

Oct 24 2022

gregrodgers updated the diff for D136641: [OPENMP] Fast cross-team reduction (xteamr) helper functions..
  1. - fix for make check-openmp
Oct 24 2022, 4:22 PM · Unknown Object (Project), Restricted Project
gregrodgers updated the summary of D136641: [OPENMP] Fast cross-team reduction (xteamr) helper functions..
Oct 24 2022, 3:12 PM · Unknown Object (Project), Restricted Project
gregrodgers updated the summary of D136641: [OPENMP] Fast cross-team reduction (xteamr) helper functions..
Oct 24 2022, 3:05 PM · Unknown Object (Project), Restricted Project
gregrodgers requested review of D136641: [OPENMP] Fast cross-team reduction (xteamr) helper functions..
Oct 24 2022, 2:59 PM · Unknown Object (Project), Restricted Project

Oct 19 2022

gregrodgers retitled D135162: [OPENMP] New api ompx_get_device_num_units(int devid) from [OPENMP] New api ompx_get_team_procs(devid) to [OPENMP] New api ompx_get_device_num_units(int devid).
Oct 19 2022, 1:17 PM · Restricted Project, Unknown Object (Project)
gregrodgers updated the diff for D135162: [OPENMP] New api ompx_get_device_num_units(int devid).
  • change name from ompx_get_team_procs(devid) to ompx_get_device_num_units(devid). Internal names have also been changed
Oct 19 2022, 1:11 PM · Restricted Project, Unknown Object (Project)

Oct 17 2022

gregrodgers added a comment to D135162: [OPENMP] New api ompx_get_device_num_units(int devid).

Is the name ompx_get_device_num_units(devid) acceptable?

Oct 17 2022, 9:51 PM · Restricted Project, Unknown Object (Project)
gregrodgers added a comment to D135162: [OPENMP] New api ompx_get_device_num_units(int devid).

See answer to inline comment.

Oct 17 2022, 9:19 PM · Restricted Project, Unknown Object (Project)
gregrodgers added a comment to D135162: [OPENMP] New api ompx_get_device_num_units(int devid).

I think I get your rational now. A thread executes on a places proc. In fact many threads can execute on (map to) a place proc. We need the same sort of "mapping" for a team or thread group. We did not call a place a thread_proc.
As the name is now, a team_proc is a place where a team can run, That can be a CU, an SM, a VE, a host socket, a host numa domain, a node, or something. The code in this review covers all existing (in trunk) offload plugins that can execute a team.

Oct 17 2022, 8:41 PM · Restricted Project, Unknown Object (Project)
gregrodgers added a comment to D135162: [OPENMP] New api ompx_get_device_num_units(int devid).

"Team can technically map to lots of things and soon they will. What about "thread groups"? Or even "sockets"?
Also, there is some duplication I pointed out below."

Oct 17 2022, 1:21 PM · Restricted Project, Unknown Object (Project)
gregrodgers updated the diff for D135162: [OPENMP] New api ompx_get_device_num_units(int devid).
  • move NumberOfTeamProcs into DeviceData per comment from jdoerfert
Oct 17 2022, 12:42 PM · Restricted Project, Unknown Object (Project)
gregrodgers updated the diff for D135162: [OPENMP] New api ompx_get_device_num_units(int devid).
  • Merge branch 'main' into arcpatch-D135162
Oct 17 2022, 11:28 AM · Restricted Project, Unknown Object (Project)
gregrodgers updated the diff for D135162: [OPENMP] New api ompx_get_device_num_units(int devid).
  • Remove change to openmp/libomptarget/plugins/exports so this revision will merge with current trunk
Oct 17 2022, 10:24 AM · Restricted Project, Unknown Object (Project)
gregrodgers abandoned D136094: [OPENMP] fix D35162 to remove changes to plugins/exports.

Ignore this, I was just trying to fix https://reviews.llvm.org/D135162 and accidentally created a new revision

Oct 17 2022, 10:06 AM · Restricted Project, Unknown Object (Project)
gregrodgers requested review of D136094: [OPENMP] fix D35162 to remove changes to plugins/exports.
Oct 17 2022, 10:03 AM · Restricted Project, Unknown Object (Project)
gregrodgers added a comment to D135162: [OPENMP] New api ompx_get_device_num_units(int devid).

The change to openmp/libomptarget/plugins/exports needs to be deleted. The upstream changed this file to include all functions beginning with __tgt_rtl

Oct 17 2022, 8:48 AM · Restricted Project, Unknown Object (Project)

Oct 5 2022

gregrodgers added a comment to D135162: [OPENMP] New api ompx_get_device_num_units(int devid).

The new interface to query the number of physical processors (or whatever it should be called) is fine, but the name is a little bit questionable. It implies how an OpenMP team is mapped to the low level model. I think it is possible that the execution model mapping is per-kernel. It doesn't sound great to have that implication in a general interface name.

Oct 5 2022, 1:15 PM · Restricted Project, Unknown Object (Project)

Oct 4 2022

gregrodgers retitled D135162: [OPENMP] New api ompx_get_device_num_units(int devid) from [OPENMP] New api ompx_get_team_procs(devid) returns the number of physical processors that can run a team. For AMD, this is the number of CUs. for Nvidia, this is number of SMs. For CPUs, this COULD be number of sockets if multiple teams are... to [OPENMP] New api ompx_get_team_procs(devid) .
Oct 4 2022, 8:36 AM · Restricted Project, Unknown Object (Project)
gregrodgers requested review of D135162: [OPENMP] New api ompx_get_device_num_units(int devid).
Oct 4 2022, 8:28 AM · Restricted Project, Unknown Object (Project)
gregrodgers abandoned D73657: [OPENMP] Load plugins from same directory as the libomptarget.so and quick fail mechanism for offloading plugins.

No longer using this approach

Oct 4 2022, 8:02 AM · Restricted Project, Unknown Object (Project)

Dec 2 2021

gregrodgers accepted D114890: [OpenMP] Make the new device runtime the default.

this is ok as is

Dec 2 2021, 6:41 AM · Restricted Project

Dec 1 2021

gregrodgers requested changes to D114890: [OpenMP] Make the new device runtime the default.

I forgot to add the "request changes" action.

Dec 1 2021, 5:53 PM · Restricted Project
gregrodgers added a comment to D114890: [OpenMP] Make the new device runtime the default.

We want amdgcn to remain on old deviceRTL till we have verified it . I made inline comments on how this could be done.

Dec 1 2021, 5:52 PM · Restricted Project

Aug 26 2021

gregrodgers requested changes to D102043: [libomptarget] Look for plugins next to libomptarget.so.
Aug 26 2021, 1:19 PM · Unknown Object (Project)

Jul 22 2021

gregrodgers accepted D106581: [libomptarget][amdgpu][nfc] Normalise license headers.

This looks good. Thanks for the cleanup

Jul 22 2021, 12:22 PM · Unknown Object (Project)

May 18 2021

gregrodgers added inline comments to D99235: [HIP] Change to code object v4.
May 18 2021, 11:54 AM · Restricted Project

Apr 15 2021

gregrodgers accepted D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.

I am removing my objection with the understanding that we will either replace or enhance amdgpu-arch with the cross-architecture tool offload-arch as described in my comments above. Also, this patch changes amd toolchains to use amdgpu-arch. So it will not interfere with how I expect the cross-architecture tool to be used to greatly simplify openmp command line arguments.

Apr 15 2021, 6:00 AM · Restricted Project
gregrodgers added inline comments to D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.
Apr 15 2021, 5:19 AM · Restricted Project

Apr 14 2021

gregrodgers added a comment to D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.

Dependence on hsa is not necessary. The amdgpu and nvidia drivers both use PCI codes available in /sys . We should use architecture independent methods as much as possible.

Apr 14 2021, 12:38 PM · Restricted Project
gregrodgers requested changes to D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.

I have two serious concerns with this tool . 1. It does not provide the infrastructure to identify runtime capabilities to satisfy requirements of a compiled image. 2. It is not architecturally neutral or extensible to other architectures. I have a different proposed tool called offload-arch . Here is the help text for offload-arch.

Apr 14 2021, 7:05 AM · Restricted Project

Jul 29 2020

gregrodgers added a comment to D84743: [Clang][AMDGCN] Universal device offloading macros header.

This is all excellent feedback. Thank you.
I don't understand what I see on the godbolt link. So far, we have only tested with clang. We will test with gcc to understand the fail.
I will make the change to use numeric values for _DEVICE_ARCH and change "UNKNOWN" to some integer (maybe -1). The value _DEVICE_GPU is intended to be generational within a specific _DEVICE_ARCH.
To be clear, this is primarily for users or library writers to implement device specific or host-only code. This is not something that should be automatic. Users or library authors will opt-in with their own include. So maybe it does not belong in clang/lib/Headers.
As noted in the header comment, I expect the community to help keep this up to date as offloading technology evolves.

Jul 29 2020, 7:59 AM · Restricted Project, Restricted Project

Jun 12 2020

gregrodgers accepted D81109: llvm-link: Add support for archive files as inputs..

LGTM, only one new auto in an existing code with 11 previous autos. Looks like consistent usage.

Jun 12 2020, 9:12 AM · Unknown Object (Project), Restricted Project

May 14 2020

gregrodgers commandeered D42800: Let CUDA toolchain support amdgpu target.
May 14 2020, 4:21 PM
gregrodgers abandoned D42800: Let CUDA toolchain support amdgpu target.
May 14 2020, 4:21 PM
gregrodgers abandoned D27928: Add isGPU() to Triple.
May 14 2020, 4:21 PM · Unknown Object (Project)

Mar 30 2020

gregrodgers added a comment to D76987: Rename options --cuda-gpu-arch and --no-cuda-gpu-arch.

This was discussed on llvm-dev three years ago. Here is the thread.

Mar 30 2020, 10:49 AM · Restricted Project

Feb 6 2020

gregrodgers added a comment to D73657: [OPENMP] Load plugins from same directory as the libomptarget.so and quick fail mechanism for offloading plugins.

I will rework this patch to

  • Try dlopen on relative library name first to for LD_LIBRARY_PATH search. If that fails, I will load using full path name.
  • Reorganize the names arrays into a single array to avoid a counter.
Feb 6 2020, 8:37 PM · Restricted Project, Unknown Object (Project)

Jan 29 2020

gregrodgers added reviewers for D73657: [OPENMP] Load plugins from same directory as the libomptarget.so and quick fail mechanism for offloading plugins: ronl, ABataev, jdoerfert, RaviNarayanaswamy.
Jan 29 2020, 11:58 AM · Restricted Project, Unknown Object (Project)
gregrodgers created D73657: [OPENMP] Load plugins from same directory as the libomptarget.so and quick fail mechanism for offloading plugins.
Jan 29 2020, 11:49 AM · Restricted Project, Unknown Object (Project)

Aug 23 2018

gregrodgers added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

I have a longer comment on header files, but let me first understand this patch.

Aug 23 2018, 1:37 PM

Aug 22 2018

gregrodgers added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.

I like the idea of using an automatic include as a cc1 option (-include). However, I would prefer a more general automatic include for OpenMP, not just for math functions (clang_cuda_device_functions.h). Clang cuda automatically includes clang_cuda_runtime_wrapper.h. It includes other files as needed like clang_cuda_device_functions.h. Lets hypothetically call my proposed automatic include for OpenMP , clang_openmp_runtime_wrapper.h.

Aug 22 2018, 9:58 AM · Restricted Project

Jun 25 2018

gregrodgers added a comment to D48455: Remove hip.amdgcn.bc hc.amdgcn.bc from HIP Toolchains.

Why not provide a specific list of --hip-device-lib= for VDI builds? I am not sure about defining functions inside headers instead of using a hip bc lib.

Jun 25 2018, 8:27 AM · Unknown Object (Project), Restricted Project

May 7 2018

gregrodgers added a comment to D46185: [OpenMP] Allow nvptx sm_30 to be used as an offloading device.

I agree that George's RMW proposed code is correct. This was my first attempt at an RMW code. Maybe we should implement atomicMax as a device function in architecture-specific (e.g sm_30) device library. This way the code in loop.cu can remain just a call to atomicMax. Such an implementation would need an overloaded atomicMax.

May 7 2018, 9:12 AM · Unknown Object (Project)

Apr 4 2018

gregrodgers added a comment to D44992: [OpenMP] enable bc file compilation using the latest clang.

So , will the deviceRTLs/nvptx change? Instead of extern shared, what will it use for those data structures?

Apr 4 2018, 9:41 AM · Unknown Object (Project)

Apr 2 2018

gregrodgers added a comment to D44992: [OpenMP] enable bc file compilation using the latest clang.

Maybe my search is missing something, but the only place I see CUDARelocatableDeviceCode is in lib/Sema/SemaDeclAttr.cpp to allow for extern shared. How could this be causing slowness? I would think forcing extern to be global would be slower.

Apr 2 2018, 9:13 AM · Unknown Object (Project)

Feb 5 2018

gregrodgers added a comment to D42800: Let CUDA toolchain support amdgpu target.

Here my replys to the inline comments. Everything should be fixed in the next revision.

Feb 5 2018, 2:18 PM

Feb 1 2018

gregrodgers added a comment to D42800: Let CUDA toolchain support amdgpu target.

Sorry, all my great inline comments got lost somehow. I am a newbie to Phabricator. I will try to reconstruct my comments.

Feb 1 2018, 6:45 PM
gregrodgers requested changes to D42800: Let CUDA toolchain support amdgpu target.

Thanks to everyone for the reviews. I hope I replied to all inline comments. Since I sent this to Sam to post, we discovered a major shortcoming. As tra points out, there is a lot of cuda headers in the cuda sdk that are processed. We are able to override asm() expansions with #undef and redefine as an equivalent amdgpu component so the compiler never sees the asm(). I am sure we will need to add more redefines as we broaden our testing. But that is not the big problem. We would like to be able to run cudaclang for AMD GPUs without an install of cuda. Of course you must always install cuda if any of your targeted GPUs are NVidia GPUs. To run cudaclang without cuda when only non-NVidia gpus are specified, we need an open set of headers and we must replace the fatbin tools used in the toolchain. The later can be addressed by using the libomptarget methods for embedding multiple target GPU objects. The former is going to take a lot of work. I am going to be sending an updated patch that has the stubs for the open headers noted in clang_cuda_runtime_wrapper.h. They will be included with the CC1 flag -DUSE_OPEN_HEADERS__. This will be generated by the cuda driver when it finds no cuda installation and all target GPUs are not NVidia.

Feb 1 2018, 6:42 PM

Dec 19 2016

gregrodgers added a comment to D27928: Add isGPU() to Triple.

Justin, the commonality between nvptx and amdgcn LLVM IR is exactly why I would like isGPU(). I actually do want to assume that "isGPU" <--> "isNVPTX || isAMDGCN".

Dec 19 2016, 6:14 PM · Unknown Object (Project)
gregrodgers added a comment to D27928: Add isGPU() to Triple.

I can email you a bigger patch from our development tree. I would rather not post it in public yet because it still needs some work. Here are two examples from this patch.

Dec 19 2016, 2:22 PM · Unknown Object (Project)
gregrodgers added a comment to D27928: Add isGPU() to Triple.

Thank you Justin, Yes, I plan to use this extensively in clang for common OpenMP code generation. But I don't have those patches ready yet.
isGPU() may also be used for compilation of cuda code to LLVM IR as alternative to isNVPTX(). I will discuss with google authors first.
I formatted to 80 lines. Thank you for your patience with a new contributor.

Dec 19 2016, 11:15 AM · Unknown Object (Project)
gregrodgers updated the diff for D27928: Add isGPU() to Triple.

Formatted to 80 characters.

Dec 19 2016, 11:01 AM · Unknown Object (Project)
gregrodgers retitled D27928: Add isGPU() to Triple from to Add isGPU() to Triple.
Dec 19 2016, 9:52 AM · Unknown Object (Project)