Page MenuHomePhabricator

[OpenMPIRBuilder] introduce createStaticWorkshareLoop

Authored by ftynse on Dec 2 2020, 3:59 AM.



Introduce a function that creates a statically-scheduled workshare loop
out of a canonical loop created earlier by the OpenMPIRBuilder. This
basically amounts to injecting runtime calls to the preheader and the
after block and updating the trip count. Static scheduling kind is
currently hardcoded and needs to be extracted from the runtime library
into common TableGen definitions.

Diff Detail

Event Timeline

ftynse created this revision.Dec 2 2020, 3:59 AM
ftynse requested review of this revision.Dec 2 2020, 3:59 AM
SouraVX added a project: Restricted Project.
SouraVX added subscribers: AMDChirag, SouraVX.

Only Nits from me. @Meinersbur wdyt?


The preheader (only) solution is for static schedules but that is OK for now.




@Meinersbur: Eventually we might want to keep pointers identifying these different parts as we will add instructions to the blocks and therefore might break these invariants. Not necessary now though.


Nit: Ty or IVType


I'm not a fan of LoopInfo as a name, CLI could be our "go to" acronym?


Add a TODO above this block stating this should eventually move to the CLI or a "CLIUpdate" interface.

Meinersbur added inline comments.Dec 2 2020, 11:05 AM

The method;s name is "createStaticWorkshareLoop"


Return the modified CanonicalLoopInfo instead. The IP can be determined using getAfterIP().

Mostly for consistency with additional createXYZLoop methods, which create a new CanonicalLoopInfo instead of modifying an existing one.


Maybe make this member private (to the OpenMPIRBuilder implementation)? Other canonical loops such as tiling or OpenMPIRBuilder itself assuming a specific trip count during finialize() might depend on a specific trip count.

There's nothing like this now nor do I plan to add such things, but until we have a use case why front-end should directly modify the trip-cout, I'd hide this from the API.


add a call to assertOK() here?


Uses of it within the CanonicalLoop still needs to be identified, for changes like this. This makes explicitly storing it redundant. Same would also apply to most BasicBlocks which have a fixed position relative to the header block, so we could traverse the CFG to find them instead of storing them.

I don't have a good reason for either direction, I suggest to do what's easier for now.



LoopInfo clashes with llvm::LoopInfo and clang::LoopInfo types. The latter already makes it impossible to include some LLVM headers into CGLoopInfo.cpp


Could you emit the fini call into the getExit block instead? getAfterIP is meant for user code that executes, well, after the runtime boilerplate has finished.


Call LoopInfo->assertOK() when the changes are done?

Thanks @ftynse for this patch. This is great and will speed us up by a month.


No change required but just asking a question for my understanding,
Can createWorkshareLoop call createCanonicalLoop internally?


A barrier insertion with a TODO to look at nowait attribute?

Meinersbur added inline comments.Dec 2 2020, 11:09 AM

Did you consider using replaceAllUsesWith? setTripCount should already have removed the instances ub cond and latch.

Meinersbur added inline comments.Dec 2 2020, 11:12 AM

In the patch I am working on, I extracted most of createCanonicalLoop into a private createCanonicalLoopSkeleton to be used by other createXYZLoop methods (in my case: tiling).

However, this patch modifies an existing CanonicalLoopInfo, instead of creating a new one.

ftynse updated this revision to Diff 309487.Dec 4 2020, 2:03 AM
ftynse marked 18 inline comments as done.

Address reviews

ftynse added inline comments.Dec 4 2020, 2:11 AM

I only implemented the static schedule for now. My idea was to have createStatic, createDynamic, etc and eventually add a createWorkshareLoop that additionally takes a schedule kind flag and dispatches to the right function.


If I understood @Meinersbur's comment on another patch correctly, the overall design is to create canonical loops and transform them later (e.g., tile or workshare) with dedicated functions. createStaticWorkshareLoop follows this idea and therefore does _not_ create the loop, only the worksharing constructs.


Moved to implementation.


This replaces the uses of the induction variable (phi of the header block) with induction-variable + lower-bound-from-runtime-call, setTripCount updated the trip count (another argument in icmp in the condition block).

Switched to replaceAllUsesWithIf though.


Added a flag to the function. Let's have the caller decide if they need a barrier or not. For example, clang doesn't insert a barrier after the last loop in the "parallel" segment even if the loop doesn't have "nowait".

@ftynse would you mind having a look at D87247 ?

LGTM. Please wait for approval from @Meinersbur.


Nit: There is a create barrier function in the OpenMPIRBuilder.
CreateBarrier(Builder.saveIP(), llvm::omp::OMPD_barrier);

This revision is now accepted and ready to land.Dec 6 2020, 10:12 AM

Thanks for the patch. Just a couple of nits


I strongly suspect that @jdoerfert was referring to argument Type* type


Nit: To keep a consistent behavior throughout the builder, please change to:

if (!updateToLocation(Loc))
    return Loc.IP;

NIT: please name all alloca's above - easier to track in IR.

ftynse updated this revision to Diff 309845.Dec 7 2020, 2:26 AM
ftynse marked an inline comment as done.

Address more reviews.

ftynse marked an inline comment as done.Dec 7 2020, 2:28 AM
ftynse added inline comments.

It's already inconsistent, createCanonicalLoop doesn't follow this pattern, and we cannot return Loc.IP here because the return type is CanonicalLoopInfo *. Returning nullptr instead.


Yes, but it would emit another call to get_thread_num and OMPD_barrier could misled somebody to think the emitted code comes from a barrier directive, which it is not.

jdoerfert added inline comments.Dec 7 2020, 8:06 AM

but it would emit another call to get_thread_num

That is not a problem we address here. They are deduplicated later.

OMPD_barrier could misled somebody to think the emitted code comes from a barrier directive, which it is not.

The OpenMP runtime is integrated with existing tooling. Barriers "know" what caused them and that is available to profilers, debuggers, etc. We should not break with this tooling capability. The worksharing type for barriers is available via:
createBarrier(Loc, OMPD_for, /* ForceSimpleCall */ false, /* CheckCancelFlag */ false)
The last argument is false only for now. The cancel flag capabilities will be needed later as well, at least from the OpenMP frontends.

ftynse updated this revision to Diff 309938.Dec 7 2020, 9:24 AM
ftynse marked an inline comment as done.

Address even more review

This revision was landed with ongoing or failed builds.Dec 7 2020, 1:32 PM
This revision was automatically updated to reflect the committed changes.