If we have a broken assumption we want to print a message to the user.
If the assumption is broken by many threads in many teams this can
become a problem. To avoid it we use a hash that tracks if a broken
assumption has (likely) been printed and avoid printing it again. This
is not fool proof and has some caveats that might cause problems in
the future (see comment) but it should improve the situation
considerably for now.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
openmp/libomptarget/DeviceRTL/src/Debug.cpp | ||
---|---|---|
43 ↗ | (On Diff #381008) | Seems better to wrap it with a function or macro like PRINTF_ONCE. |
openmp/libomptarget/DeviceRTL/src/Debug.cpp | ||
---|---|---|
44 ↗ | (On Diff #381008) | Unrelated, but why don't we use the standard format from assert.h along with using PRETTY_FUNCTION? |
openmp/libomptarget/DeviceRTL/src/Debug.cpp | ||
---|---|---|
44 ↗ | (On Diff #381008) | Idk, let's do that separate. |
openmp/libomptarget/DeviceRTL/include/Utils.h | ||
---|---|---|
80 | Are we c++17 in the deviceRTL? Can presumably be whatever clang trunk has implemented | |
99 | Or we could use zero for the initial value and write 1 into it. Seq_cst seems unnecessary - acq_rel? Global ordering on GPUs is expensive and not obviously required here, don't care which thread wins the race to print a debug message | |
openmp/libomptarget/DeviceRTL/src/Synchronization.cpp | ||
324 | Normally takes two orderings |
openmp/libomptarget/DeviceRTL/include/Utils.h | ||
---|---|---|
99 | Actually worse than that - seq cst induces a global order. To make that work, we probably change thread execution ordering. Want to minimise assert's influence on dynamic control flow |
openmp/libomptarget/DeviceRTL/include/Utils.h | ||
---|---|---|
80 | Runs w/o warnings for me. I'll use an integer type. | |
99 |
We could, but then we would pay a cost even if this code is dead. I'm fine either way.
This is used in front of a __builtin_trap, or at least guarded by a debug/configuration kind. It will not be fast but it doesn't have to be. | |
openmp/libomptarget/DeviceRTL/src/Synchronization.cpp | ||
324 | No it does not. This is not cas. |
openmp/libomptarget/DeviceRTL/src/Synchronization.cpp | ||
---|---|---|
324 | Yep, thanks! Did indeed misread as compare&swap |
Counter is per-tu and this is defined and used in a header. What happens if it's used in two translation units? Instantiations collide, linker picks one, we miss an output?
Not sure I care hugely about control, but if these are colliding between TUs we lose quite a lot of precision in the reporting. Wrapping the function in an anonymous namespace would be a fix
Replacing COUNTER with a defined enum value to uniquely identify which singleton counter to query between TUs.
Are we c++17 in the deviceRTL? Can presumably be whatever clang trunk has implemented