Introduce a function that creates a statically-scheduled workshare loop
out of a canonical loop created earlier by the OpenMPIRBuilder. This
basically amounts to injecting runtime calls to the preheader and the
after block and updating the trip count. Static scheduling kind is
currently hardcoded and needs to be extracted from the runtime library
into common TableGen definitions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Only Nits from me. @Meinersbur wdyt?
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
270 | The preheader (only) solution is for static schedules but that is OK for now. | |
667 | nice! | |
751 | @Meinersbur: Eventually we might want to keep pointers identifying these different parts as we will add instructions to the blocks and therefore might break these invariants. Not necessary now though. | |
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
1006 | Nit: Ty or IVType | |
1019 | I'm not a fan of LoopInfo as a name, CLI could be our "go to" acronym? | |
1089 | Add a TODO above this block stating this should eventually move to the CLI or a "CLIUpdate" interface. |
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
270 | The method;s name is "createStaticWorkshareLoop" | |
281 | Return the modified CanonicalLoopInfo instead. The IP can be determined using getAfterIP(). Mostly for consistency with additional createXYZLoop methods, which create a new CanonicalLoopInfo instead of modifying an existing one. | |
746 | Maybe make this member private (to the OpenMPIRBuilder implementation)? Other canonical loops such as tiling or OpenMPIRBuilder itself assuming a specific trip count during finialize() might depend on a specific trip count. There's nothing like this now nor do I plan to add such things, but until we have a use case why front-end should directly modify the trip-cout, I'd hide this from the API. | |
750 | add a call to assertOK() here? | |
751 | Uses of it within the CanonicalLoop still needs to be identified, for changes like this. This makes explicitly storing it redundant. Same would also apply to most BasicBlocks which have a fixed position relative to the header block, so we could traverse the CFG to find them instead of storing them. I don't have a good reason for either direction, I suggest to do what's easier for now. | |
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
1019 | +1 LoopInfo clashes with llvm::LoopInfo and clang::LoopInfo types. The latter already makes it impossible to include some LLVM headers into CGLoopInfo.cpp | |
1092–1093 | Could you emit the fini call into the getExit block instead? getAfterIP is meant for user code that executes, well, after the runtime boilerplate has finished. | |
1095 | Call LoopInfo->assertOK() when the changes are done? |
Thanks @ftynse for this patch. This is great and will speed us up by a month.
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
282 | No change required but just asking a question for my understanding, | |
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
1094 | A barrier insertion with a TODO to look at nowait attribute? |
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
---|---|---|
1088 | Did you consider using replaceAllUsesWith? setTripCount should already have removed the instances ub cond and latch. |
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
282 | In the patch I am working on, I extracted most of createCanonicalLoop into a private createCanonicalLoopSkeleton to be used by other createXYZLoop methods (in my case: tiling). However, this patch modifies an existing CanonicalLoopInfo, instead of creating a new one. |
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
270 | I only implemented the static schedule for now. My idea was to have createStatic, createDynamic, etc and eventually add a createWorkshareLoop that additionally takes a schedule kind flag and dispatches to the right function. | |
282 | If I understood @Meinersbur's comment on another patch correctly, the overall design is to create canonical loops and transform them later (e.g., tile or workshare) with dedicated functions. createStaticWorkshareLoop follows this idea and therefore does _not_ create the loop, only the worksharing constructs. | |
746 | Moved to implementation. | |
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
1088 | This replaces the uses of the induction variable (phi of the header block) with induction-variable + lower-bound-from-runtime-call, setTripCount updated the trip count (another argument in icmp in the condition block). Switched to replaceAllUsesWithIf though. | |
1094 | Added a flag to the function. Let's have the caller decide if they need a barrier or not. For example, clang doesn't insert a barrier after the last loop in the "parallel" segment even if the loop doesn't have "nowait". |
LGTM. Please wait for approval from @Meinersbur.
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
---|---|---|
1106–1108 | Nit: There is a create barrier function in the OpenMPIRBuilder. |
Thanks for the patch. Just a couple of nits
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
---|---|---|
1006 | I strongly suspect that @jdoerfert was referring to argument Type* type | |
1032 | Nit: To keep a consistent behavior throughout the builder, please change to: if (!updateToLocation(Loc)) return Loc.IP; | |
1047–1049 | NIT: please name all alloca's above - easier to track in IR. |
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
---|---|---|
1032 | It's already inconsistent, createCanonicalLoop doesn't follow this pattern, and we cannot return Loc.IP here because the return type is CanonicalLoopInfo *. Returning nullptr instead. | |
1106–1108 | Yes, but it would emit another call to get_thread_num and OMPD_barrier could misled somebody to think the emitted code comes from a barrier directive, which it is not. |
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
---|---|---|
1106–1108 |
That is not a problem we address here. They are deduplicated later.
The OpenMP runtime is integrated with existing tooling. Barriers "know" what caused them and that is available to profilers, debuggers, etc. We should not break with this tooling capability. The worksharing type for barriers is available via: |
The preheader (only) solution is for static schedules but that is OK for now.