Page MenuHomePhabricator

[OpenMP] Overhaul `declare target` handling
ClosedPublic

Authored by jdoerfert on Thu, Apr 22, 1:39 AM.

Details

Summary

This patch fixes various issues with our prior declare target handling
and extends it to support omp begin declare target as well.

This started with PR49649 in mind, trying to provide a way for users to
avoid the "ref" global use introduced for globals with internal linkage.
From there it went down the rabbit hole, e.g., all variables, even
nohost ones, were emitted into the device code so it was impossible to
determine if "ref" was needed late in the game (based on the name only).
To make it really useful, begin declare target was needed as it can
carry the device_type. Not emitting variables eagerly had a ripple
effect. Finally, the precedence of the (explicit) declare target list
items needed to be taken into account, that meant we cannot just look
for any declare target attribute to make a decision. This caused the
handling of functions to require fixup as well.

I tried to clean up things while I was at it, e.g., we should not "parse
declarations and defintions" as part of OpenMP parsing, this will always
break at some point. Instead, we keep track what region we are in and
act on definitions and declarations instead, this is what we do for
declare variant and other begin/end directives already.

Highlights:

  • new diagnosis for restrictions specificed in the standard,
  • delayed emission of globals not mentioned in an explicit list of a declare target,
  • omission of nohost globals on the host and host globals on the device,
  • no explicit parsing of declarations in-between omp [begin] declare variant and the corresponding end anymore, regular parsing instead,
  • precedence for explicit mentions in declare target lists over implicit mentions in the declaration-definition-seq, and
  • omp allocate declarations will now replace an earlier emitted global, if necessary.

Notes:

The patch is larger than I hoped but it turns out that most changes do
on their own lead to "inconsistent states", which seem less desirable
overall.

After working through this I feel the standard should remove the
explicit declare target forms as the delayed emission is horrible.
That said, while we delay things anyway, it seems to me we check too
often for the current status even though that is often not sufficient to
act upon. There seems to be a lot of duplication that can probably be
trimmed down. Eagerly emitting some things seems pretty weak as an
argument to keep so much logic around.


Diff Detail

Event Timeline

jdoerfert created this revision.Thu, Apr 22, 1:39 AM
jdoerfert requested review of this revision.Thu, Apr 22, 1:39 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptThu, Apr 22, 1:39 AM

Could you check that it does not break the tests from https://github.com/clang-ykt/omptests? IIRC, "ref" was introduced to fix a bug with too early optimizations of the declare target variables defined in other modules. Check t-same-name-definitions especially, though I'm not sure that exactly this test caused adding of $refs.

clang/lib/CodeGen/CGOpenMPRuntime.cpp
2998–2999

Why did you decide to drop this?

clang/lib/Parse/ParseOpenMP.cpp
1736

Looks like it is still required to consume tok::annot_pragma_openmp_end

1742

try_emplace(ND, MI)

jdoerfert marked an inline comment as done.Thu, Apr 22, 9:02 AM

Could you check that it does not break the tests from https://github.com/clang-ykt/omptests? IIRC, "ref" was introduced to fix a bug with too early optimizations of the declare target variables defined in other modules. Check t-same-name-definitions especially, though I'm not sure that exactly this test caused adding of $refs.

I did run that test, works fine. The "$ref"s are still generated, except if you don't have a host version of a variable. Without it, there is no reason to have a ref because you cannot actually address the device version.

clang/lib/CodeGen/CGOpenMPRuntime.cpp
2998–2999

Because a missing host version should mean we do not create a region entry. Take a look at all the
situations in clang/test/OpenMP/declare_target_only_one_side_compilation.cpp for which we are missing a host entry. None of them should create a region entry info.

clang/lib/Parse/ParseOpenMP.cpp
1736

That's done in the caller now because the clauses are not always parsed.

ABataev added inline comments.Thu, Apr 22, 9:11 AM
clang/lib/Parse/ParseOpenMP.cpp
2134–2151

Add a RAII to control these new/delete or just create local var

jdoerfert marked an inline comment as done.Thu, Apr 22, 9:32 AM
jdoerfert added inline comments.
clang/lib/Parse/ParseOpenMP.cpp
2134–2151

Local variable and RAII do not always work, there is a delete in the end_declare_target below as well.

ABataev added inline comments.Fri, Apr 23, 6:40 AM
clang/lib/Parse/ParseOpenMP.cpp
2134–2151

But here you have new/delete pair within a single block. Either you can put the ar into stack/RAII or some extra checks are missing. Maybe must be something like this:

if (DKind == OMPD_declare_target)
  delete DTCI;

?

jdoerfert added inline comments.Fri, Apr 23, 8:52 AM
clang/lib/Parse/ParseOpenMP.cpp
2134–2151

If this is a directive without implicit mappings, the DTCI will be deleted right away. It is just used to collect the clauses (which include explicit mappings). If this is a directive with implicit mappings, which we cannot tell from the DKind alone, the DTCI will be stored in Sema until we reach the corresponding end directive. For the former case we could do stack/RAII, but not for the latter case, the scope is left and DTCI survives. I'll look into this again and try to avoid new/delete, feel free to look at the rest of the patch.

ABataev added inline comments.Fri, Apr 23, 9:17 AM
clang/lib/Parse/ParseOpenMP.cpp
2134–2151

Everything else looks good, just need to clarify the question with this new/delete. I missed if (HasImplicitMappings) check, so maybe this is correct but better to double-check.

jdoerfert marked 2 inline comments as done.Fri, Apr 23, 9:41 AM

I'm testing a new version that does not allocate the structure on the heap. Instead we copy it. It is really infrequent and it should be <40 bytes when that happens, let's not over-optimize ;)

I'm testing a new version that does not allocate the structure on the heap. Instead we copy it. It is really infrequent and it should be <40 bytes when that happens, let's not over-optimize ;)

I don't worry about optimizations here, just about possible memory leak.

Avoid heap memory, copy when necessary, add documentation, address comments

Looks good, but would be nice to check with sanitizers that there are no uses-after-delete.

clang/lib/CodeGen/CGDecl.cpp
2619–2620

const auto *

clang/lib/Parse/ParseOpenMP.cpp
2142–2143

Would be good to check how it works with sanitizers.

Looks good, but would be nice to check with sanitizers that there are no uses-after-delete.

There is no heap allocation anymore but I guess I could have messed up the stack stuff.
I don't have a sanitizer build handy but the buildbots will yell at me if I messed up, is that sufficient?

This revision is now accepted and ready to land.Fri, Apr 23, 1:07 PM

I'm fixing the tests now, also realized that the omp allocate handling now always creates definitions, which is bad.
I'll upload a new version soon, test case for this, which shows we also don't do a good job now, is here: https://godbolt.org/z/T9od5Mq7h

jdoerfert updated this revision to Diff 342419.Mon, May 3, 9:21 AM

Fix tests by not always emitting a definition for omp allocate

ABataev added inline comments.Mon, May 3, 9:45 AM
clang/lib/CodeGen/CGDecl.cpp
2618

Why need to remove constantness here?

2622

Skip

2623–2624

What about static locals, will it work?

jdoerfert added inline comments.Mon, May 3, 10:01 AM
clang/lib/CodeGen/CGDecl.cpp
2618

The problem is that the declaration, which might have been already generated in IR or not, had no address space attached before. Now we want to do that after the fact, potentially *way later*. To reuse the general codegen mechanism I modify the VarDecl (temporarily) by attaching the new address space and calling codegen again. If you have a better idea, I'm happy to change this.

2623–2624

need to check, will add a test.

ABataev added inline comments.Mon, May 3, 12:52 PM
clang/lib/CodeGen/CGDecl.cpp
2618

Why not just create it manually rather than rely on GetOrCreateLLVMGlobal?

jdoerfert added inline comments.Mon, May 3, 1:20 PM
clang/lib/CodeGen/CGDecl.cpp
2618

I'm not sure I follow. Creating a global is complicated, need to make sure all the linkage, visibility, address space stuff is set up, mangling, etc. Why would we duplicate all that code and risk messing things up now or in the future?

ABataev added inline comments.Mon, May 3, 1:26 PM
clang/lib/CodeGen/CGDecl.cpp
2618

And what about just setting the address-space on the LLVM type directly?

jdoerfert added inline comments.Mon, May 3, 5:44 PM
clang/lib/CodeGen/CGDecl.cpp
2618

It's part of the type, we can't change it after the fact (at least that is not an existing API).

ABataev added inline comments.Mon, May 3, 6:09 PM
clang/lib/CodeGen/CGDecl.cpp
2618

What about Value::mutateType?

jdoerfert added inline comments.Tue, May 4, 7:57 AM
clang/lib/CodeGen/CGDecl.cpp
2618

I'll give it a shot, though the comment makes me nervous:

/// Mutate the type of this Value to be of the specified type.                                                                                                                                                                                             
///
/// Note that this is an extremely dangerous operation which can create
/// completely invalid IR very easily.  It is strongly recommended that you
/// recreate IR objects with the right types instead of mutating them in
/// place.
jdoerfert updated this revision to Diff 342764.Tue, May 4, 9:02 AM

Use mutateType, add test for static variable in function with allocate directive

jdoerfert updated this revision to Diff 342765.Tue, May 4, 9:04 AM
jdoerfert marked 4 inline comments as done.

Remove const cast, clang-format

jdoerfert marked 7 inline comments as done.Tue, May 4, 9:09 AM
jdoerfert added inline comments.
clang/lib/CodeGen/CGDecl.cpp
2618

Seems to work with the mutate API. Let's try it out ;)

I'll wait for @ggeorgakoudis to update the tests with the script, then I'll adjust all clang tests again. FWIW, this also fixes an issue in OpenMC where declare target triggered this:

llvm-project/clang/lib/CodeGen/CodeGenFunction.cpp:719: void clang::CodeGen::CodeGenFunction::StartFunction(clang::GlobalDecl, clang::QualType, llvm::Function *, const clang::CodeGen::CGFunctionInfo &, const clang::CodeGen::FunctionArgList &, clang::SourceLocation, clang::SourceLocation): Assertion `CurFn->isDeclaration() && "Function already has body?"' failed.

I'll wait for @ggeorgakoudis to update the tests with the script, then I'll adjust all clang tests again. FWIW, this also fixes an issue in OpenMC where declare target triggered this:

llvm-project/clang/lib/CodeGen/CodeGenFunction.cpp:719: void clang::CodeGen::CodeGenFunction::StartFunction(clang::GlobalDecl, clang::QualType, llvm::Function *, const clang::CodeGen::CGFunctionInfo &, const clang::CodeGen::FunctionArgList &, clang::SourceLocation, clang::SourceLocation): Assertion `CurFn->isDeclaration() && "Function already has body?"' failed.

Nice!

jdoerfert updated this revision to Diff 343282.Wed, May 5, 9:51 PM

Update tests

jdoerfert updated this revision to Diff 343285.Wed, May 5, 10:10 PM

Reapply test changes accidentally dropped

jdoerfert updated this revision to Diff 343290.Wed, May 5, 11:12 PM

Address the last test failures (all ordering)

This revision was automatically updated to reflect the committed changes.

Hello, We are maintaining a downstream version of the monorepo based on the LLVM main branch. We have not transitioned to the new PM yet. In a recent attempt to merge the latest upstream commits into our monorepo we came across the following test failures after your commit.


For below test cases:

remarks_parallel_in_multiple_target_state_machines.c
remarks_parallel_in_target_state_machine.c

Thanks
greg

Hello, We are maintaining a downstream version of the monorepo based on the LLVM main branch. We have not transitioned to the new PM yet. In a recent attempt to merge the latest upstream commits into our monorepo we came across the following test failures after your commit.


For below test cases:

remarks_parallel_in_multiple_target_state_machines.c
remarks_parallel_in_target_state_machine.c

Thanks
greg

Can you add (and if you want upstream) a change that forces the use of the new pass manager for those tests?