Page MenuHomePhabricator

[OpenMP] Enable profiling on multiple threads
Needs ReviewPublic

Authored by markdewing on Dec 16 2022, 2:20 PM.



This patch fixes the issue where the built-in profiler doesn't produce data for multiple threads:

Before creating a TimeScope object, the code checks to see if a profiler instance has
been created for that thread. If not, it creates one and saves a pointer for later.
At shutdown, all the saved profiler instances are finished before the call to write out the data.

Diff Detail

Event Timeline

markdewing created this revision.Dec 16 2022, 2:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 16 2022, 2:20 PM
markdewing requested review of this revision.Dec 16 2022, 2:20 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript

This patch involves changes to the LLVM library and to OpenMP. I want one patch showing the LLVM library changes in context, though I can close this one and submit two separate patches if that would be better.

You don't need to split it necessarily. Can you add people that worked on the time profile before as reviewers though.

Isn't this duplicating the time profiler logic in libomptarget? I'm not sure if there is a better way but this seems unfortunate.

324 ↗(On Diff #483671)

Nit: Prof, similar in other places.


Why is this a pointer? And why a std::vector not llvm::SmallVector

markdewing added inline comments.Dec 21 2022, 9:49 PM

If it is not a pointer, the following error happens

a.out: /home/mdewing/nvme/software/llvm/git/llvm-project/llvm/lib/Support/TimeProfiler.cpp:154: void llvm::TimeTraceProfiler::write(llvm::raw_pwrite_stream&): Assertion `llvm::all_of(Instances.List, [](const auto &TTP) { return TTP->Stack.empty(); }) && "All profiler sections should be ended when calling write"' failed.

This seems to be happening because the first entry in the vector is missing. I did not track down the root cause, but my guess is something to do with constructor ordering.

This is only needed if you link OpenMP as static library, or always?


That doesn't make sense to me. What has an indirection here to do with the interaction with TimeProfiler.cpp.

What if we add an instances vector argument to`timeTraceProfilerFinishThread`.
Our threads would work as before but when they finish they use the runtime global value "Instances" which was set in the constructor via getTimeTraceProfilerInstances.
Net benefit, assuming I got it right, is that we only need one extra argument, one global pointer, and to expose the function.
Afterwards, they should all agree on the instances vector and our problems should be gone, no?

markdewing added inline comments.Jan 9 2023, 9:21 AM

I looked into this more, and it does have to do with the ordering of destructors. The "deinit" function has the destructor attribute with a priority. The ordering of those destructors and the destructors of file-scope objects is not defined. And it looks like the destructor for file-scope objects get called first. So the memory for this vector gets freed before it's used in "deinit".

markdewing updated this revision to Diff 487478.Jan 9 2023, 9:32 AM

Add comment. Change variable name.