This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Support kernel record and replay
ClosedPublic

Authored by ggeorgakoudis on Nov 29 2022, 9:08 AM.

Details

Summary

This patch adds functionality for recording and replaying the execution of OpenMP offload kernels, based on an original implementation by Steve Rangel. The patch extends libomptarget to extract a json description of the kernel, the device image binary, and a device memory snapshot before and after the execution of a recorded kernel. Kernel recording/replaying in libomptarget is controlled through env vars (LIBOMPTARGET_RECORD, LIBOMPTARGET_REPLAY). It provides a tool, llvm-omp-kernel-replay, for replaying a kernel using the extracted information with the ability to verify replayed execution using the post-execution device memory snapshot, also supporting changing the number of teams/threads for replaying.

Diff Detail

Event Timeline

ggeorgakoudis created this revision.Nov 29 2022, 9:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 29 2022, 9:08 AM
ggeorgakoudis requested review of this revision.Nov 29 2022, 9:08 AM
Herald added a project: Restricted Project. · View Herald Transcript
ggeorgakoudis retitled this revision from Support kernel record and replay to [OpenMP] Support kernel record and replay.Nov 29 2022, 9:09 AM
ggeorgakoudis edited the summary of this revision. (Show Details)Nov 29 2022, 9:12 AM
ggeorgakoudis added a reviewer: koparasy.
ggeorgakoudis edited the summary of this revision. (Show Details)Nov 29 2022, 9:18 AM
ggeorgakoudis edited the summary of this revision. (Show Details)Nov 29 2022, 10:29 AM

I have a lot of style/organization comments but nothing major. If nobody has a conceptual problem we merge this in once the code is reorganized and then take it from there.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
220

Let's put all these code snippets into a namespace or struct as static functions.
Then they are not "in the way" of the main logic.

if (isRecording())
  recording::saveImage(Image, Name);

Since there is state associated with recording, e.g., the memory pointers, we might even want a struct

if (RecordReplay.isRecording())
  RecordReplay.saveImage(Image, Name);
383

I'd start at more than 8Gb, and I also would like to catch 512Mb and such. Maybe we want a list of "common/known" memory sizes? 64, 32, 24, 20, 16, 10, 8, 6, 4, 2, 1, .5, .25, something like that? (Not that I know how much memory GPUs have...)

622

I imagine this needs to be integrated into the memory manager at some point to support free.
The manager keeps free lists and such and we can let it keep one list for all sizes bigger than what it tracks now.
When we record we go though all the allocated memory and dump it in a "sparse format" basically.
Replay then the opposite.
All that said, this can be a TODO for now explaining it.

737

These certainly need to go into some sort of helper functions.
A TODO for filtering is also required.
I'd personally rename "golden" to "original.output" or something descriptive, characters are free I guess.
No comments after "else". If anything, make it an assertion.

openmp/libomptarget/src/interface.cpp
249

Documentation.

openmp/libomptarget/src/omptarget.cpp
1755

No need for the block.
Above, the comments should be full sentences and actually describe more than the immediate code following.

openmp/libomptarget/tools/kernelreplay/llvm-omp-kernel-replay.cpp
55

This is potentially user facing, MB and such doesn't mean much.

67

Add a TODO that we should warn if the value was set before and is now changed.

110

I'm not 100% sold on the env var approach but I don't see a nice alternative that is not invasive.
That said, we should use a bitset and communicate if we need a ".replay" output for verification or not.
User might also want to choose a ".replay" output w/o verification via an option, but for perf testing, getting counters, etc. we should just skip that part.

124

DeviceId needs to be stored and read I assume. We might also want a cmd line option.

jplehr added a subscriber: jplehr.Nov 30 2022, 8:00 AM
ggeorgakoudis edited the summary of this revision. (Show Details)Nov 30 2022, 11:58 AM
kevinsala added inline comments.Dec 12 2022, 7:31 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
373

Are we deallocating this memory buffer? If that's handled by the memory manager, this memory will automatically be freed at manager's finalization. But anyway, I think it's a good practice if we delete it during device deinitialization.

620

I've several doubts/questions because I'm surely missing some context. I'm sorry if these comments do not apply in record/replay mode:

  1. Can multiple threads allocate data concurrently on record/replay mode? This part seems not thread-safe.
  2. We are ignoring the Kind parameter here and always returning TARGET_ALLOC_DEFAULT data (pre-allocated at initialization). That's not a problem? Isn't possible to request other types of memory (e.g., host memory)?
  3. This part does not handle the case where we exceed the maximum pre-allocated memory, right?
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
417 ↗(On Diff #478623)

Could we rename the envars to OMPX_RecordKernel and OMPX_ReplayKernel, as the rest of envars?

ggeorgakoudis marked 12 inline comments as done.

Update for comments

Remove unnecessary headers from the replay tool

ggeorgakoudis marked an inline comment as done.Dec 13 2022, 1:39 AM
ggeorgakoudis added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
620

Thank you for feedback, always welcome!

  1. Good thinking, added a lock guard when allocating memory and a TODO
  2. Yes, the patch is about the "common case". Added a TODO
  3. No, it does not. Maybe add a TODO though we try to allocate the whole of device memory
737

I'm not sure what you mean by filtering

openmp/libomptarget/tools/kernelreplay/llvm-omp-kernel-replay.cpp
110

I couldn't think of a better alternative either. I'm using setenv which is unix specific, I couldn't find a portable interface in LLVM.

I ended up adding an env var for saving the kernel device memory output, thinking that it may be the case that we don't want to save the output even when recording.

ggeorgakoudis marked an inline comment as done.Dec 13 2022, 1:41 AM
jdoerfert accepted this revision.Dec 13 2022, 2:29 PM

LG, I think we can address remaining issues in-tree which makes it easier to test it.

openmp/libomptarget/src/interface.cpp
259

Nit: formatting.

This revision is now accepted and ready to land.Dec 13 2022, 2:29 PM

Fix comment formatting

This revision was automatically updated to reflect the committed changes.