This patch adds support for printing analysis remarks about data globalization on the GPU. Globalization occurs when data is shared between the threads in a GPU context and must therefore be pushed to global or shared memory.
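For readers unfamiliar with the pattern, here is a minimal sketch of code that would typically need globalization. It is not taken from the patch or its test file; the function name `globalized_sum` and the variable names are hypothetical. The local `x` is declared by the thread that runs the target region but is shared with every thread of the nested parallel region, so on a GPU it cannot stay on a thread-private stack.

```cpp
#include <cassert>

// Hypothetical example, not part of the patch.
// Returns 0 + 1 + ... + (n - 1), computed inside an OpenMP target region.
int globalized_sum(int n) {
  assert(n >= 0);
  int result = 0;
#pragma omp target map(tofrom : result)
  {
    // x is declared by the thread executing the target region.
    int x = 0;
    // x is shared by default with all threads of this parallel region,
    // so when offloading to a GPU it has to live in memory that every
    // thread can reach (global or shared memory).
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
#pragma omp atomic
      x += i;
    }
    result = x;
  }
  return result;
}
```

Without OpenMP enabled the pragmas are ignored and the function runs serially on the host with the same result. Whether a remark is actually emitted for this exact code depends on how Clang lowers it, which is an assumption here.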
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I only tested it with the example file to check that it works. I tried calling opt on it directly, but that gave me an error, so I decided to get the patch set up first.
clang++ -fopenmp clang/test/OpenMP/declare_target_codegen_globalization.cpp -Rpass=openmp-opt -Rpass-analysis=openmp-opt -S -emit-llvm -fopenmp-targets=nvptx64-nvidia-cuda -O3
| llvm/lib/Transforms/IPO/OpenMPOpt.cpp | |
|---|---|
| 708 | Can't we use foreachUse? |
I think the problem is calling opt on the file, since it's combined with the nvptx IR. Is there an opt command-line option that handles this correctly?
Adding a test case and changing the analysis to use foreachUse. The nvptx file doesn't have line number information even after compiling with debugging symbols, so the remark just says "unknown." When you get the remarks from clang, the remark seems to be printed more times than necessary.
declare_target_codegen_globalization.cpp:7:5: remark: Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [-Rpass-analysis=openmp-opt]
int bar() {
    ^
remark: Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [-Rpass-analysis=openmp-opt]
remark: Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [-Rpass-analysis=openmp-opt]
Manually adding the debug info for the stack push, which Clang does not currently generate. A real solution will require modifying Clang's code generation.
LGTM
| llvm/test/Transforms/OpenMP/globalization_remarks.ll | |
|---|---|
| 64 | I think you want !31 but I guess it doesn't really matter. |
clang-format: please reformat the code