This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][IR-Builder] Introduce the finalization stack
ClosedPublic

Authored by jdoerfert on Nov 14 2019, 11:01 AM.

Details

Summary

As a permanent and generic solution to the problem of variable
finalization (destructors, lastprivate, ...), this patch introduces the
finalization stack. The objects on the stack describe (1) the
(structured) regions the OpenMP-IR-Builder is currently constructing,
(2) if these are cancellable, and (3) the callback that will perform the
finalization (=cleanup) when necessary.

As the finalization can be necessary multiple times, at different source
locations, the callback takes the position at which code is currently
generated. This position will also encode the destination of the "region
exit" block *iff* the finalization call was issues for a region
generated by the OpenMPIRBuilder. For regions generated through the old
Clang OpenMP code geneneration, the "region exit" is determined by Clang
inside the finalization call instead (see getOMPCancelDestination).

As a first user, the parallel + cancel barrier interaction is changed.
In contrast to the temporary solution before, the barrier generation in
Clang does not need to be aware of the "CancelDestination" block.
Instead, the finalization callback is and, as described above, later
even that one does not need to be.

D70109 will be updated to use this scheme.

Event Timeline

jdoerfert created this revision.Nov 14 2019, 11:01 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptNov 14 2019, 11:01 AM
ABataev added inline comments.Nov 14 2019, 11:24 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
1494

Better to not capture all the object implicitly by reference, provide explicit list

1497

Use InsertPointGuard where possible rather than use saveIP/restoreIP explicitly

1501

Add a message for the assert.

1510–1520

Again, need RAII

llvm/include/llvm/Frontend/OpenMPIRBuilder.h
69–71

Maybe push... for const reference and emplace... for variadic template (just like in standard library)?

jdoerfert marked 5 inline comments as done.

Addressed comments

clang/lib/CodeGen/CGOpenMPRuntime.cpp
1510–1520

There is no appropriate scope to put it in.

llvm/include/llvm/Frontend/OpenMPIRBuilder.h
69–71

That would defeat the purpose of using move semantic. I can make it a push but we will then have to copy and not move.

ABataev added inline comments.Nov 15 2019, 7:26 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
1510–1520

You can check for OMPBuilder in the constructor/destructor.

llvm/include/llvm/Frontend/OpenMPIRBuilder.h
69–71

You can use it in emplace version while push... function will rely on copy semantics.

jdoerfert marked an inline comment as done.Nov 15 2019, 9:29 AM
jdoerfert added inline comments.
llvm/include/llvm/Frontend/OpenMPIRBuilder.h
69–71

I only call this once, why do I need multiple versions?

jdoerfert updated this revision to Diff 229573.Nov 15 2019, 9:36 AM

Replace conditional by RAII object

ABataev added inline comments.Nov 15 2019, 9:41 AM
llvm/include/llvm/Frontend/OpenMPIRBuilder.h
69–71

I thought that we may need it later in other cases, no?

ABataev added inline comments.Nov 15 2019, 9:44 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
1481–1509

Most of this code can be moved to the constructor of the PopStackRAII.

jdoerfert updated this revision to Diff 229582.Nov 15 2019, 9:51 AM
jdoerfert marked an inline comment as done.

Make it a specialized push-pop RAII object (just moved code)

jdoerfert marked an inline comment as done.Nov 15 2019, 9:52 AM
jdoerfert added inline comments.
llvm/include/llvm/Frontend/OpenMPIRBuilder.h
69–71

We only need this interface for situations where we want to test finalization in the presence of cancellation points that are generated by the OpenMPIRBuilder while the cancelled region is not. For example, once D70290 is in, we can remove the call from the barrier code (it will be part of D70290). If we always port the regions first (sections, for, ...) we don't need this interface at all.

jdoerfert marked 4 inline comments as done.Nov 21 2019, 9:05 AM

ping

llvm/include/llvm/Frontend/OpenMPIRBuilder.h
69–71

I'll make this a const reference and push back.

jdoerfert updated this revision to Diff 233462.Dec 11 2019, 3:33 PM

rebase + ping

Tests?

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
51 ↗(On Diff #233462)

llvm::function_ref?

jdoerfert marked 2 inline comments as done.Dec 11 2019, 4:03 PM

Tests?

This changes the implementation and does not add features. Thus, this is covered by the existing tests for the functionality, e.g. the unit tests to build barriers and the front-end barrier test.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
51 ↗(On Diff #233462)

The lambda that is passed in here might go out of scope so we need to own it. This is intentional.

ABataev added inline comments.Dec 11 2019, 4:34 PM
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
51 ↗(On Diff #233462)

Maybe better to save the intermediate data in CGOpenMPRuntime class rather than rely on owning lambda ref here? Clang does not use escaping decls and instead stores intermediate data explicitly. It really complicates analysis and potential source of resource leakage.

jdoerfert updated this revision to Diff 233472.Dec 11 2019, 4:53 PM
jdoerfert marked an inline comment as done.

Update the unit test

jdoerfert marked an inline comment as done.Dec 11 2019, 4:58 PM
jdoerfert added inline comments.
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
51 ↗(On Diff #233462)

I don't follow. Clang does use std::function to store callbacks that have to life for a while. Why is this different? What would be the benefit of having a function_ref here and a std::function in CGOpenMPRuntime? Note that the FinaliztionInfo object is created in the front-end and the std::function is assigned there already.

Unit tests: pass. 60744 tests passed, 0 failed and 726 were skipped.

clang-format: pass.

Build artifacts: console-log.txt, CMakeCache.txt, test-results.xml, diff.json

ABataev added inline comments.Dec 12 2019, 7:00 AM
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
51 ↗(On Diff #233462)

Clang uses it only in some rare cases, when it is really required, like typo correction or something like this. In other cases it is not recommended to use it.

I'm not saying that you need to store std::function in CGOpenMPRuntime class, I'm saying about necessary data.

jdoerfert marked an inline comment as done.Dec 12 2019, 9:42 AM
jdoerfert added inline comments.
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
51 ↗(On Diff #233462)

What necessary data? Can you please explain how you want it to look like and why? This version seems to work fine.

ABataev added inline comments.Dec 12 2019, 9:51 AM
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
51 ↗(On Diff #233462)

I'm not arguing that it does not work, I'm asking do you really need such a complex solution? Just like I said before, it is very hard to maintain and to understand the lifetime of lambda, so it is a potential source of resource leakage. All I'm asking is is there a way to implement it differently, nothing else.

jdoerfert marked an inline comment as done.Dec 12 2019, 10:37 AM
jdoerfert added inline comments.
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
51 ↗(On Diff #233462)

There are alternatives, all of which are more complex and come with the same "problems".

hfinkel added inline comments.Dec 12 2019, 11:18 AM
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
51 ↗(On Diff #233462)

@jdoerfert , can you please elaborate? What other designs might be possible?

It looks to me like this lambda is necessary to maintain separation between Clang's CodeGen and the OpenMPIRBuilder (the non-unit-test use in this patch only captures CGF). I'm not particularly concerned about lifetime management of the lambdas if they only need to capture CGF, and maybe the design of this implies that the lambdas will only capture objects that live as long as the code generation phase, but it is certainly true that whenever we have long-lived lambdas thought about lifetime of captured data is required.

jdoerfert marked an inline comment as done.Dec 12 2019, 12:06 PM
jdoerfert added inline comments.
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
51 ↗(On Diff #233462)

I'm confused. It seems people dislike the callback for potential risks even though, apparently, no one sees an actual problem nor an alternative without callbacks.

The alternative I mentioned before and the initial implementation were:

  • Initially I only had a function_ref here, which should work for now, but I doubt that this is any better.
  • Having a std::function in the frontend and a function_ref here. Though, that only doubles the problems.
rjmccall resigned from this revision.Dec 17 2019, 11:06 AM

I opposed the creation of this library on these terms in the first place, so I'm pretty sure I'm not on the hook to review it. Introducing an IRBuilder-level finalization stack sounds like it's going to be a huge mess if your goal is to plug this into other frontends.

I opposed the creation of this library on these terms in the first place, so I'm pretty sure I'm not on the hook to review it.

That's fine with me.

Introducing an IRBuilder-level finalization stack sounds like it's going to be a huge mess if your goal is to plug this into other frontends.

While I get that you don't want to review this, I would really like to understand why you think this would become a mess.

FWIW. The only code that is currently in Clang to make this work is going to be removed once the code generation is moved as well.

Introducing an IRBuilder-level finalization stack sounds like it's going to be a huge mess if your goal is to plug this into other frontends.

While I get that you don't want to review this, I would really like to understand why you think this would become a mess.

I guess it depends on what you're expecting to be able to achieve with this stack. Frontends have their own notion of what needs to be finalized and what can trigger control flow. If your finalization stack is purely for the convenience of your internal IR-generation, it's fine. If the need for a finalizer can cross the emission of arbitrary frontend code, or if your code needs to emit branches that might cross arbitrary frontend "scope" boundaries, you're going to be in trouble.

Introducing an IRBuilder-level finalization stack sounds like it's going to be a huge mess if your goal is to plug this into other frontends.

While I get that you don't want to review this, I would really like to understand why you think this would become a mess.

I guess it depends on what you're expecting to be able to achieve with this stack. Frontends have their own notion of what needs to be finalized and what can trigger control flow. If your finalization stack is purely for the convenience of your internal IR-generation, it's fine. If the need for a finalizer can cross the emission of arbitrary frontend code, or if your code needs to emit branches that might cross arbitrary frontend "scope" boundaries, you're going to be in trouble.

Let me explain what it does and why, maybe that helps:

The "finalizer" is a frontend provided callback as only the frontend "knows" what values to finalize and "how" to finalize them if we leave the scope. It is created only for OpenMP scopes as we only manage leaving those in the OMPBuilder. If we generate code for an OpenMP directive that will cause control flow to leave the region the finalizer is invoked. Other language finalization, e.g., because of nested control flow, will happen undisturbed. It works because OpenMP restricts what can happen in a region, e.g. you cannot return from an OpenMP parallel region, so all exits have to be "natural" or OpenMP related.

FWIW: We already have (basically) the same scheme in Clang, managed by the CGOpenMP and doing basically the same thing.

So it's never that OpenMP has things that need to be finalized before exiting (e.g. if something in frontend-emitted code throws an exception), it's just that OpenMP might need to generate its own control flow that breaks through arbitrary scopes in the frontend?

I would really hope that the OpenMP implementation in Clang would've used Clang's cleanup stack rather than inventing its own mechanism.

ABataev added a comment.EditedDec 17 2019, 4:07 PM

So it's never that OpenMP has things that need to be finalized before exiting (e.g. if something in frontend-emitted code throws an exception), it's just that OpenMP might need to generate its own control flow that breaks through arbitrary scopes in the frontend?

I would really hope that the OpenMP implementation in Clang would've used Clang's cleanup stack rather than inventing its own mechanism.

John, current implementation completely relies on Clang's cleanup stack.

That's what I figured. The thing is that that necessarily interacts correctly with everything in Clang's IRGen in ways that making a second stack that Clang's IRGen doesn't directly know about doesn't.

I think you either need to extract Clang's entire cleanup-stack concept into a generic frontend library, so that both Clang and your generic OpenMP lowering just manipulate that common stack, or you need to call back into the frontend to manage your cleanup.

So it's never that OpenMP has things that need to be finalized before exiting (e.g. if something in frontend-emitted code throws an exception), it's just that OpenMP might need to generate its own control flow that breaks through arbitrary scopes in the frontend?

Depending on how you look at it. OpenMP has to finalize some constructs, which usually means to place a call before we continue with the code that follows. It also introduce a new scope in which things like privatization happen, however these are almost completely handled through clang (see below). The one thing that comes to mind is lastprivate which is handled by OpenMP.

Btw. Exceptions have to be caught inside an OpenMP scope that has a finalizer as it would be UB otherwise (as with the return).

I would really hope that the OpenMP implementation in Clang would've used Clang's cleanup stack rather than inventing its own mechanism.

This patch uses (parts of) clangs cleanup logic and the existing CGOpenMP cleanup code (which is the existing OpenMP specific stack (CodeGenFunction::OMPCancelStack) this one will replace).


In the code that glues the old CGOpenMP to the new OMPBuilder, where we create the finalization callback, the existing getOMPCancelDestination is used as shown below.

CodeGenFunction::JumpDest Dest = CGF.getOMPCancelDestination(OMPD_parallel);
CGF.EmitBranchThroughCleanup(Dest);

In the follow up patch (D70290) that will lower #pragma omp parallel through the OMPIRBuilder, we don't need the clang::CodeGenFunction::OMPCancelDestination/clang::CodeGenFunction::OMPCancelStack stuff anymore (for the parallel) and the finalization callback becomes:

CodeGenFunction::JumpDest Dest = getJumpDestInCurrentScope(DestBB);
EmitBranchThroughCleanup(Dest);

With DestBB defined as the end of the OpenMP region/scope.


The additional stack is also needed because depending on the nesting we may or may not create control flow that can jump to the end of a scope. So

#pragma omp parallel
{
  {
    ... ;
    #pragma omp barrier // or an implicit barrier
    ... ;
  }
  parallel_exit:
}

will cause a control flow edge from the barrier to parallel_exit. This edge may not exist if there is another OpenMP region nested in the parallel.


I would really hope that the OpenMP implementation in Clang would've used Clang's cleanup stack rather than inventing its own mechanism.

John, current implementation completely relies on Clang's cleanup stack.

Let me rephrase the above statement less misleading:
The current implementation relies on Clang's cleanup stack *and* the OpenMP specific clang::CodeGenFunction::OMPCancelStack, which is what this patch will eventually replace.

That's what I figured.

Just to say this again:
Current OpenMP code generation keeps a second stack for exactly the same purpose as the one introduced here.

The thing is that that necessarily interacts correctly with everything in Clang's IRGen in ways that making a second stack that Clang's IRGen doesn't directly know about doesn't.

I think you either need to extract Clang's entire cleanup-stack concept into a generic frontend library, so that both Clang and your generic OpenMP lowering just manipulate that common stack, or you need to call back into the frontend to manage your cleanup.

I like the idea of a generic way to do this but I'm unsure if that is needed given the restrictions on OpenMP regions. We basically are not allowed to jump out by non-OpenMP means. What we do in the existing, and this rewritten stack is remember what OpenMP means will jump out of the current scope and to the end of the region. This can be reused in Flang as the restrictions hold there as well (as far as I know).

Okay, if this is just handling OpenMP-specific control flow and doesn't need to interact with what a reasonable frontend would do with cleanups, I'm appeased.

This revision is now accepted and ready to land.Dec 20 2019, 1:41 AM
This revision was automatically updated to reflect the committed changes.