User Details
- User Since
- Apr 23 2020, 6:41 PM (44 w, 2 d)
Thu, Feb 25
Add extra llc step to produce assembly in the linker.
So, neither emit-llvm-bc nor emit-llvm works well with save-temps. Therefore, I feel the current approach is still valid. This does not impact nvptx or any other target in any way, and I don't see how it could.
Wed, Feb 24
Tue, Feb 23
Here's a bit of background:
OffloadingPrefix was not getting properly set in the dependent actions of OffloadWrapperJobAction (which are backend [11] and assemble [12]). Since the backend [11] and assemble [12] host-wrapper actions use the same logic as the other host actions (3 & 4), they overwrite the files previously generated by the host-only actions.
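The overwrite described above can be illustrated with a minimal sketch. It assumes a simplified naming rule of base name, then offloading prefix, then extension. The helper names `tempFileName` and `collides`, the base name `foo`, and the prefix string `-wrapper` are hypothetical and are not taken from the clang driver.

```cpp
#include <string>

// Hypothetical naming rule for save-temps outputs:
// <base><offloading-prefix>.<ext>. This is an illustration only,
// not the clang driver's actual implementation.
std::string tempFileName(const std::string &base,
                         const std::string &offloadingPrefix,
                         const std::string &ext) {
  return base + offloadingPrefix + "." + ext;
}

// Returns true if two actions with the given prefixes would write the
// same object file for the (hypothetical) input base name "foo".
bool collides(const std::string &prefixA, const std::string &prefixB) {
  return tempFileName("foo", prefixA, "o") == tempFileName("foo", prefixB, "o");
}
```

With an empty prefix on both the host-only and the host-wrapper assemble actions, `collides("", "")` is true and the second job overwrites the first. Once the wrapper actions carry a distinct prefix, the names differ.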
Thu, Feb 18
Looks good to me. Thanks!
Wed, Feb 17
Fixed the assert.
Tue, Feb 16
It is because of how addClangTargetOptions is invoked. With save-temps, it is invoked for all the actions that result in a target cc1 call. That's why all these invocations have -emit-llvm-bc. I guess we need Action as an argument to addClangTargetOptions.
emit-llvm-bc does not correctly solve the problem. It works because the driver collapses the [input, compile, assemble, backend] actions into a single action, and this single command handles emit-llvm-bc properly. But when save-temps is specified, this collapsing does not happen, which messes up the command-line flags of the jobs and hence the output; for example, the preprocessor command also has -emit-llvm-bc.
This does fix save-temps, but only when -o is not specified. If -o is specified, the host object file and the host-wrapper object file (second-to-last phase) have the same name, which makes the linker fail. This does not seem to be related to this patch.
Mon, Feb 15
Addressed review comments.
Can you use -check-prefixes=GCN,GFX8 and GCN,GFX9 so that update_mir_test_checks will common up the identical ones?
It does not work. The script warns: WARNING: Ignoring common prefixes: {'GCN'}: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir
Addressed review comments.
Thu, Feb 11
Wed, Feb 10
I have removed libomptarget-device-bc-path and added the amdgcn one. For the diagnostic, instead of having one per architecture, I have used the same one and added a second parameter to specify the arch.
Tue, Feb 9
LGTM. Thanks!
LGTM, thanks for fixing it.
Mon, Feb 8
- Added check for nogpulib
- Fixed diagnostic message
Addressed review comments.
Accidentally missed some changes:
- Fix openmp-offload.c test failure
- Fix amdgpu-openmp-toolchain.c test failure
Ping!
Tue, Feb 2
- Use 0 for default -O option
- Rename addOptLevelArgs to addLLCOptArg
After addressing the review comments, I have internally verified the changes on a few simple test programs. They seem to be working fine.
Addressed review comments.
- Combined the toolchain creation logic for nvptx and amdgcn
- Replaced -Xopenmp-target with -emit-llvm-bc inside AMDGPUOpenMP.cpp
- Removed opt from pipeline
Mon, Feb 1
- Scalarize the vectors first
- Using widened operation for smaller types
Ping!
Jan 28 2021
Hi, apologies for the late reply; I got sidetracked by some other work.
Jan 27 2021
Jan 20 2021
- Moved common methods of HIP and OpenMP to base AMDGPUToolChain
- Removed unnecessary asserts
Fixed failing debian tests
Won't this just prevent us from building clang due to the missing cmake changes?
It compiles and builds fine; however, I wasn't aware that such sanity checking was present. It turns out that unknown files inside llvm/ lead cmake to report an error, but no such reporting happens inside clang. Maybe such checks are not enabled inside clang. Anyway, thanks for pointing it out. I will keep that in mind in the future.
Jan 19 2021
Fix clang-tidy error
Jan 14 2021
Moved ops close to ADDO
Jan 10 2021
Removed global-isel-abort=0
Jan 5 2021
Jan 4 2021
Jan 1 2021
Dec 20 2020
Update AMDGPU barrier intrinsic
Dec 7 2020
Ping!
Dec 3 2020
Looks good, thanks.
Dec 2 2020
Dec 1 2020
Ping!
Nov 23 2020
Ping!
Nov 18 2020
Nov 4 2020
Nov 3 2020
Ping!
Oct 28 2020
Removed redundant header
Oct 21 2020
LGTM, thanks!
Oct 20 2020
clang-format'ed the changes
Oct 6 2020
Oct 5 2020
Sep 23 2020
Sep 22 2020
Formatting and removed implicit uses
Removed unused code
Added lowerFor({V2S8})
Sep 20 2020
Sep 17 2020
Updated tests and clamped the number of elements to 2
Sep 7 2020
Sep 3 2020
Updated data_sharing_stack_init_common
The only places where it was accessed are here and here. Jon's observation is correct. The maximum number of threads on both amdgcn and nvptx is 1024. However, on amdgcn the wave size is 64, so the maximum number of waves is 16, while on nvptx the warp size is 32, so the maximum number of warps is 32.
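The arithmetic behind those limits can be checked at compile time with a short sketch. The helper `maxGroupsPerBlock` and the constant names are hypothetical and are not identifiers from the device runtime; only the values 1024, 64 and 32 come from the comment above.

```cpp
// Hypothetical helper: number of waves (amdgcn) or warps (nvptx) needed
// to cover maxThreads threads, rounding up.
constexpr unsigned maxGroupsPerBlock(unsigned maxThreads, unsigned groupSize) {
  return (maxThreads + groupSize - 1) / groupSize;
}

// Values quoted in the comment above; the names are illustrative only.
constexpr unsigned kMaxThreads = 1024;
constexpr unsigned kAmdgcnWaveSize = 64;
constexpr unsigned kNvptxWarpSize = 32;

static_assert(maxGroupsPerBlock(kMaxThreads, kAmdgcnWaveSize) == 16,
              "amdgcn: 1024 threads / 64 lanes per wave = 16 waves");
static_assert(maxGroupsPerBlock(kMaxThreads, kNvptxWarpSize) == 32,
              "nvptx: 1024 threads / 32 lanes per warp = 32 warps");
```

Any per-wave or per-warp array sized for one target would therefore need 16 entries on amdgcn and 32 on nvptx.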
Sep 2 2020
@arsenm , let me know if it is good to land.
Aug 30 2020
Updated review comments.
Aug 11 2020
Added support for vector types.