This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][GPU] Introduce the `ompx_dyn_cgroup_mem(<N>)` clause
ClosedPublic

Authored by jdoerfert on Jan 8 2023, 1:39 PM.

Details

Summary

Dynamic memory allows users to allocate fast shared memory when a kernel
is launched. We used to support a single size for all kernels via the
LIBOMPTARGET_SHARED_MEMORY_SIZE environment variable but now we can
control it per kernel invocation and use computed values.

Note: Only the nextgen plugins will allocate memory based on the clause.

Diff Detail

Event Timeline

jdoerfert created this revision.Jan 8 2023, 1:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 8 2023, 1:39 PM
jdoerfert requested review of this revision.Jan 8 2023, 1:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 8 2023, 1:39 PM

@ABataev I'm doing something wrong which causes the nowait target codegen to emit an assertion. Any idea what I'm missing? I tried to stay close to thread_limit for this one.

DeclRefExpr for Decl not entered in LocalDeclMap?
UNREACHABLE executed at .../clang/lib/CodeGen/CGExpr.cpp:2842!

@ABataev I'm doing something wrong which causes the nowait target codegen to emit an assertion. Any idea what I'm missing? I tried to stay close to thread_limit for this one.

DeclRefExpr for Decl not entered in LocalDeclMap?
UNREACHABLE executed at .../clang/lib/CodeGen/CGExpr.cpp:2842!

In which test do you see it?

jdoerfert added inline comments.Jan 9 2023, 10:34 AM
clang/test/OpenMP/target_ompx_dyn_cgroup_mem_codegen.cpp
109

@ABataev ^^^ here. I think when n is looked up.

ABataev added inline comments.Jan 9 2023, 11:05 AM
clang/lib/Sema/SemaOpenMP.cpp
15953

Try to use OMPD_task here and check if it works. I rather doubt you need to capture the expression and pass it to the target region. Also, check how device clause is implemented, most probably you need to do something similar. There is an implicit task created for nowait target, where need to allocate the space and pass the value.

jdoerfert added inline comments.Jan 9 2023, 1:20 PM
clang/lib/Sema/SemaOpenMP.cpp
15953

I found it. See below. Works now, with target and with task here. I leave target to match num_teams, device is needed in the outer call as well.

23829

I used Size here not ValExpr. Fixed it locally.

ABataev added inline comments.Jan 9 2023, 2:04 PM
clang/lib/Sema/SemaOpenMP.cpp
23829

Ah, yes, need to use the transformed expression, otherwise the declrefexpt won't be recognized as captured.

jhuber6 accepted this revision.Jan 19 2023, 4:12 PM

LGTM. Think we should have a static check if someone uses this with a non-GPU target?

This revision is now accepted and ready to land.Jan 19 2023, 4:12 PM
jdoerfert updated this revision to Diff 491105.Jan 21 2023, 3:05 PM

Add runtime test

This revision was landed with ongoing or failed builds.Jan 21 2023, 6:53 PM
This revision was automatically updated to reflect the committed changes.
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 21 2023, 6:53 PM