Page MenuHomePhabricator

[OpenMP] Enable profiling on multiple threads
Needs ReviewPublic

Authored by markdewing on Dec 16 2022, 2:20 PM.

Details

Summary

This patch fixes the issue where the built-in profiler doesn't produce data for multiple threads:
https://github.com/llvm/llvm-project/issues/57985

Before creating a TimeScope object, the code checks to see if a profiler instance has
been created for that thread. If not, it creates one and saves a pointer for later.
At shutdown, all the saved profiler instances are finished before the call to write out the data.

Diff Detail

Event Timeline

markdewing created this revision.Dec 16 2022, 2:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 16 2022, 2:20 PM
markdewing requested review of this revision.Dec 16 2022, 2:20 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript

This patch involves changes to the LLVM library and to OpenMP. I want one patch showing the LLVM library changes in context, though I can close this one and submit two separate patches if that would be better.

You don't need to split it necessarily. Can you add people that worked on the time profile before as reviewers though.

Isn't this duplicating the time profiler logic in libomptarget? I'm not sure if there is a better way but this seems unfortunate.

llvm/lib/Support/TimeProfiler.cpp
324 ↗(On Diff #483671)

Nit: Prof, similar in other places.

openmp/libomptarget/src/rtl.cpp
47

Why is this a pointer? And why a std::vector not llvm::SmallVector

markdewing added inline comments.Dec 21 2022, 9:49 PM
openmp/libomptarget/src/rtl.cpp
47

If it is not a pointer, the following error happens

a.out: /home/mdewing/nvme/software/llvm/git/llvm-project/llvm/lib/Support/TimeProfiler.cpp:154: void llvm::TimeTraceProfiler::write(llvm::raw_pwrite_stream&): Assertion `llvm::all_of(Instances.List, [](const auto &TTP) { return TTP->Stack.empty(); }) && "All profiler sections should be ended when calling write"' failed.

This seems to be happening because the first entry in the vector is missing. I did not track down the root cause, but my guess is something to do with constructor ordering.

This is only needed if you link OpenMP as static library, or always?

openmp/libomptarget/src/rtl.cpp
47

That doesn't make sense to me. What has an indirection here to do with the interaction with TimeProfiler.cpp.

What if we add an instances vector argument to`timeTraceProfilerFinishThread`.
Our threads would work as before but when they finish they use the runtime global value "Instances" which was set in the constructor via getTimeTraceProfilerInstances.
Net benefit, assuming I got it right, is that we only need one extra argument, one global pointer, and to expose the function.
Afterwards, they should all agree on the instances vector and our problems should be gone, no?

markdewing added inline comments.Jan 9 2023, 9:21 AM
openmp/libomptarget/src/rtl.cpp
47

I looked into this more, and it does have to do with the ordering of destructors. The "deinit" function has the destructor attribute with a priority. The ordering of those destructors and the destructors of file-scope objects is not defined. And it looks like the destructor for file-scope objects get called first. So the memory for this vector gets freed before it's used in "deinit".

markdewing updated this revision to Diff 487478.Jan 9 2023, 9:32 AM

Add comment. Change variable name.