Using LIBOMPTARGET_PROFILER, duplicates are created from timing both Kernel functions and Data update functions.
I commented out the duplicate timescope and left them in the targetkernel and the targetdataupdate functions. This
way the timescope calls will be closer to the launching of the kernel and the data moving.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This doesn't look the right way to handle it. The initial design is to profile all functions in the call stack.
Thanks, @tianshilei1992 for your input.
I adjusted the timescope calls to address a problem with duplicate profiling slices. You can see them on chrome://tracing. The duplication occurred when both kernel functions and all data-moving functions were timed separately. Focusing the timescope calls on 'targetKernel' and 'targetdataupdate' functions provides a solution to the duplicate slices, and more accurate profiling data, while keeping the important profiling.
My intention was to strike a balance between resolving the duplication issue and keeping the important profiling. I understand the importance of comprehensive profiling and I'm open to further discussion.
(Edited - change of file - the first file was after the fix - now you see the duplicate slices)openmp/libomptarget/src/interface.cpp | ||
---|---|---|
85 | Keep this, put the RegionTypeMsg into the details field of the profile entry. | |
146 | Remove. | |
157 | Remove. | |
172 | Remove. | |
183 | Remove. | |
194 | Remove. | |
204–205 | Remove. | |
247 | Keep this. | |
298 | Remove, above is better / more true to the time spend "offloading" | |
323–324 | Remove. | |
openmp/libomptarget/src/omptarget.cpp | ||
533–534 | Remove, but verify all callees go through targetDataMapper<...> | |
575 | This is not tracked anymore. | |
1004–1005 | Remove, but verify all callees go through targetDataMapper<...> | |
1097–1098 | Remove, but verify all callees go through targetDataMapper<...> | |
1142 | Remove, but verify all callees go through targetDataMapper<...> |
From my perspective, the issue is not in the measurement. Having nested time regions is a normal thing. Using the same Ident also doesn't seem to be the problem. For most nested measured regions, the visualization shows the nesting correctly. I think, the problem is in the visualization. Without verifying the regions where the nesting fails and the slices are printed into the same line, I think that chrome://tracing has an issue, if nested regions start at the same timestamp due to time resolution. An example is
{"pid":2590370,"tid":2590370,"ph":"X","ts":2394356,"dur":22893,"name":"targetKernel","args":{"detail":";main.cpp;main;60;7;;"}}, {"pid":2590370,"tid":2590370,"ph":"X","ts":2394356,"dur":22894,"name":"__tgt_target_kernel","args":{"detail":";main.cpp;main;60;7;;"}}
This should be fixed in the visualization, not in the measurement.
I addressed @jdoerfert's recommendations while the discussion is ongoing.
@protze.joachim, from your description, I understand that the issue might not lie in the
measurement itself but rather in the visualization process. That was my initial thought as
well. Moreover, based on my brief understanding of the code, I noticed that __tgt_target_kernel
always calls targetKernel, resulting in the timescope being invoked twice at practically the
same time. Is this measurement necessary, or does it create a duplication that requires
chrome://tracing to remove one for proper visualization?
Could you share how the json looks now?
Having nested time regions is a normal thing.
Yes, but our nesting did never provide real benefit. We did not have good reason for the placement. Let's go back to minimal placement and then see where it makes sense to add nested timers.
If the region doesn't add value, removing this duplicate seems reasonable. I just commented on the technical tracing aspect.
This seems to haven broken the AMPGPU buildbot
https://lab.llvm.org/buildbot/#/builders/193/builds/37266
FAILED: openmp/libomptarget/src/CMakeFiles/omptarget.dir/interface.cpp.o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ --target=x86_64-unknown-linux-gnu -DGTEST_HAS_RTTI=0 -DOMPT_SUPPORT=1 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_LIBCPP_ENABLE_HARDENED_MODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/llvm/include -I/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/include -I/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic -O3 -DNDEBUG -fPIC -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT openmp/libomptarget/src/CMakeFiles/omptarget.dir/interface.cpp.o -MF openmp/libomptarget/src/CMakeFiles/omptarget.dir/interface.cpp.o.d -o openmp/libomptarget/src/CMakeFiles/omptarget.dir/interface.cpp.o -c /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:85:3: error: invalid operands to binary expression ('const char *' and 'const char *') 85 | TIMESCOPE_WITH_RTM_AND_IDENT(RegionTypeMsg, Loc); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/private.h:434:72: note: expanded from macro 'TIMESCOPE_WITH_RTM_AND_IDENT' 434 | llvm::TimeTraceScope TimeScope(__FUNCTION__, SI.getProfileLocation() + RegionTypeMsg) | ~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:196:7: error: unknown type name 'Loc' 196 | Loc, DeviceId, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, ArgNames, | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:196:12: error: unknown type name 'DeviceId'; did you mean 'DeviceTy'? 196 | Loc, DeviceId, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, ArgNames, | ^~~~~~~~ | DeviceTy /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/include/device.h:398:8: note: 'DeviceTy' declared here 398 | struct DeviceTy { | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:196:22: error: unknown type name 'ArgNum' 196 | Loc, DeviceId, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, ArgNames, | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:196:30: error: unknown type name 'ArgsBase' 196 | Loc, DeviceId, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, ArgNames, | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:196:40: error: unknown type name 'Args' 196 | Loc, DeviceId, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, ArgNames, | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:196:46: error: unknown type name 'ArgSizes' 196 | Loc, DeviceId, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, ArgNames, | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:196:56: error: unknown type name 'ArgTypes' 196 | Loc, DeviceId, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, ArgNames, | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:196:66: error: unknown type name 'ArgNames' 196 | Loc, DeviceId, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, ArgNames, | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:197:7: error: unknown type name 'ArgMappers' 197 | ArgMappers, targetDataUpdate, "Updating data within the OpenMP data region with update_mapper", "update"); | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:197:19: error: unknown type name 'targetDataUpdate' 197 | ArgMappers, targetDataUpdate, "Updating data within the OpenMP data region with update_mapper", "update"); | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:197:37: error: expected parameter declarator 197 | ArgMappers, targetDataUpdate, "Updating data within the OpenMP data region with update_mapper", "update"); | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:197:37: error: expected ')' /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:195:32: note: to match this '(' 195 | targetDataMapper<AsyncInfoTy>( | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:195:3: error: parameter name cannot have template arguments 195 | targetDataMapper<AsyncInfoTy>( | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:195:3: error: a type specifier is required for all declarations /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:197:112: error: expected ')' 197 | ArgMappers, targetDataUpdate, "Updating data within the OpenMP data region with update_mapper", "update"); | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:189:44: note: to match this '(' 189 | EXTERN void __tgt_target_data_update_mapper(ident_t *Loc, int64_t DeviceId, | ^ /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/openmp/libomptarget/src/interface.cpp:198:1: error: extraneous closing brace ('}') 198 | } | ^ 17 errors generated.
@jplehr thanks, I fixed the broken commit
fixed with this commit - 1ff0bdb86dbfb7c70bbbabcae3affff59a0e0806
Thanks for the timely fix! Build is now working.
Now there seem to be three test failures: https://lab.llvm.org/buildbot/#/builders/193/builds/37270
Depending on how much time you need, it may be better to revert the patch in the meantime.
Let me know if I can assist for the failing AMD case.
Keep this, put the RegionTypeMsg into the details field of the profile entry.