This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Wrap (v)printf in the new RT and use same handling for AMD
ClosedPublic

Authored by jdoerfert on Oct 25 2021, 7:19 PM.

Details

Summary

To support printf NVPTX and AMD targets are handled differently. The
latter performs host calls, which we don't really want in OpenMP, the
former translates printf calls to vprintf calls as the NVIDIA
runtime provides an implementation for the device of vprintf. This
patch unifies the AMD and NVPTX handling and emits for both calls to the
vprintf wrapper __llvm_omp_vprintf which we define in our new device
runtime. The main benefit of this wrapper is that we can more easily
control (and profile) the emission of printf calls in device code.

Note: Tests are coming.

Diff Detail

Event Timeline

jdoerfert created this revision.Oct 25 2021, 7:19 PM
jdoerfert requested review of this revision.Oct 25 2021, 7:19 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 25 2021, 7:19 PM
jdoerfert updated this revision to Diff 382171.Oct 25 2021, 8:23 PM

Actually use the new wrapper for OpenMP offload targeting AMD (and the new RT)

That's an interesting approach.

Do you happen to know where I can find details of the data format behind that void*? Have been meaning to look at writing printf for amdgpu as host side decoding of that buffer. If the compiler knows how long it is, that would be a useful third argument.

jdoerfert added inline comments.Oct 26 2021, 6:33 AM
clang/lib/CodeGen/CGGPUBuiltin.cpp
128

That's an interesting approach.

Do you happen to know where I can find details of the data format behind that void*? Have been meaning to look at writing printf for amdgpu as host side decoding of that buffer. If the compiler knows how long it is, that would be a useful third argument.

We actually do know. Above we allocate and fill the buffer. For the OpenMP wrapper you could easily add a third argument later in order to facilitate an OpenMP runtime printf impl. I would even like it to be target agnostic (e.g., replace the default CUDA route on request). That said, we should tackle that separately, wdyt?

JonChesterfield added a comment.EditedOct 26 2021, 7:04 AM

Nice! Yep, can add a size argument later. Will want it to control copying the payload over to the host. Or we could allocate a buffer that the corresponding runtime can handle directly (pinned/fine grain) and skip that copy

JonChesterfield accepted this revision.Oct 26 2021, 12:12 PM
This revision is now accepted and ready to land.Oct 26 2021, 12:12 PM

This doesn't apply against main, diff relative to something that isn't in main

-rebase on main

jdoerfert closed this revision.Nov 8 2021, 10:54 AM

Committed as part of D112680.