This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Ensure broken assumptions print once, not thousands of times.
ClosedPublic

Authored by jdoerfert on Oct 20 2021, 10:04 AM.

Details

Summary

If we have a broken assumption we want to print a message to the user.
If the assumption is broken by many threads in many teams this can
become a problem. To avoid it we use a hash that tracks if a broken
assumption has (likely) been printed and avoid printing it again. This
is not fool proof and has some caveats that might cause problems in
the future (see comment) but it should improve the situation
considerably for now.

Diff Detail

Event Timeline

jdoerfert created this revision.Oct 20 2021, 10:04 AM
jdoerfert requested review of this revision.Oct 20 2021, 10:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2021, 10:04 AM
Herald added a subscriber: sstefan1. · View Herald Transcript
openmp/libomptarget/DeviceRTL/src/Debug.cpp
43 ↗(On Diff #381008)

Seems better to wrap it with a function or macro like PRINTF_ONCE.

jhuber6 added inline comments.Oct 20 2021, 10:43 AM
openmp/libomptarget/DeviceRTL/src/Debug.cpp
44 ↗(On Diff #381008)

Unrelated, but why don't we use the standard format from assert.h along with using PRETTY_FUNCTION?

jdoerfert updated this revision to Diff 381122.Oct 20 2021, 5:30 PM

Use singleton flag per assumption

jdoerfert added inline comments.Oct 20 2021, 5:32 PM
openmp/libomptarget/DeviceRTL/src/Debug.cpp
44 ↗(On Diff #381008)

Idk, let's do that separate.

JonChesterfield added inline comments.
openmp/libomptarget/DeviceRTL/include/Utils.h
80

Are we c++17 in the deviceRTL? Can presumably be whatever clang trunk has implemented

99

Or we could use zero for the initial value and write 1 into it. Seq_cst seems unnecessary - acq_rel? Global ordering on GPUs is expensive and not obviously required here, don't care which thread wins the race to print a debug message

openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
324

Normally takes two orderings

openmp/libomptarget/DeviceRTL/include/Utils.h
99

Actually worse than that - seq cst induces a global order. To make that work, we probably change thread execution ordering. Want to minimise assert's influence on dynamic control flow

jdoerfert added inline comments.Oct 21 2021, 7:17 AM
openmp/libomptarget/DeviceRTL/include/Utils.h
80

Runs w/o warnings for me. I'll use an integer type.

99

Or we could use zero for the initial value and write 1 into it.

We could, but then we would pay a cost even if this code is dead. I'm fine either way.

...

This is used in front of a __builtin_trap, or at least guarded by a debug/configuration kind. It will not be fast but it doesn't have to be.

openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
324

No it does not. This is not cas.

jdoerfert updated this revision to Diff 382138.Oct 25 2021, 3:10 PM

Remove UB, use weaker atomics

JonChesterfield accepted this revision.Oct 25 2021, 3:17 PM
JonChesterfield added inline comments.
openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
324

Yep, thanks! Did indeed misread as compare&swap

This revision is now accepted and ready to land.Oct 25 2021, 3:17 PM
jhuber6 updated this revision to Diff 403766.Jan 27 2022, 12:57 PM

Fix leftover code

Counter is per-tu and this is defined and used in a header. What happens if it's used in two translation units? Instantiations collide, linker picks one, we miss an output?

Counter is per-tu and this is defined and used in a header. What happens if it's used in two translation units? Instantiations collide, linker picks one, we miss an output?

We can replace counter with an enum if we want more control.

Not sure I care hugely about control, but if these are colliding between TUs we lose quite a lot of precision in the reporting. Wrapping the function in an anonymous namespace would be a fix

jhuber6 updated this revision to Diff 403788.Jan 27 2022, 1:42 PM

Replacing COUNTER with a defined enum value to uniquely identify which singleton counter to query between TUs.

jhuber6 updated this revision to Diff 403802.Jan 27 2022, 2:22 PM

Changing to use namespace and counter again.