Changeset View
Standalone View
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
Show First 20 Lines • Show All 150 Lines • ▼ Show 20 Lines | public: | ||||
/// | /// | ||||
/// \returns The insertion point after the barrier. | /// \returns The insertion point after the barrier. | ||||
InsertPointTy CreateCancel(const LocationDescription &Loc, Value *IfCondition, | InsertPointTy CreateCancel(const LocationDescription &Loc, Value *IfCondition, | ||||
omp::Directive CanceledDirective); | omp::Directive CanceledDirective); | ||||
/// Generator for '#omp parallel' | /// Generator for '#omp parallel' | ||||
/// | /// | ||||
/// \param Loc The insert and source location description. | /// \param Loc The insert and source location description. | ||||
/// \param AllocaIP The insertion points to be used for alloca instructions. | |||||
/// \param BodyGenCB Callback that will generate the region code. | /// \param BodyGenCB Callback that will generate the region code. | ||||
/// \param PrivCB Callback to copy a given variable (think copy constructor). | /// \param PrivCB Callback to copy a given variable (think copy constructor). | ||||
/// \param FiniCB Callback to finalize variable copies. | /// \param FiniCB Callback to finalize variable copies. | ||||
/// \param IfCondition The evaluated 'if' clause expression, if any. | /// \param IfCondition The evaluated 'if' clause expression, if any. | ||||
/// \param NumThreads The evaluated 'num_threads' clause expression, if any. | /// \param NumThreads The evaluated 'num_threads' clause expression, if any. | ||||
/// \param ProcBind The value of the 'proc_bind' clause (see ProcBindKind). | /// \param ProcBind The value of the 'proc_bind' clause (see ProcBindKind). | ||||
/// \param IsCancellable Flag to indicate a cancellable parallel region. | /// \param IsCancellable Flag to indicate a cancellable parallel region. | ||||
/// | /// | ||||
/// \returns The insertion position *after* the parallel. | /// \returns The insertion position *after* the parallel. | ||||
IRBuilder<>::InsertPoint | IRBuilder<>::InsertPoint | ||||
CreateParallel(const LocationDescription &Loc, BodyGenCallbackTy BodyGenCB, | CreateParallel(const LocationDescription &Loc, InsertPointTy AllocaIP, | ||||
PrivatizeCallbackTy PrivCB, FinalizeCallbackTy FiniCB, | BodyGenCallbackTy BodyGenCB, PrivatizeCallbackTy PrivCB, | ||||
Value *IfCondition, Value *NumThreads, | FinalizeCallbackTy FiniCB, Value *IfCondition, | ||||
omp::ProcBindKind ProcBind, bool IsCancellable); | Value *NumThreads, omp::ProcBindKind ProcBind, | ||||
bool IsCancellable); | |||||
/// Generator for '#omp flush' | /// Generator for '#omp flush' | ||||
/// | /// | ||||
/// \param Loc The location where the flush directive was encountered | /// \param Loc The location where the flush directive was encountered | ||||
void CreateFlush(const LocationDescription &Loc); | void CreateFlush(const LocationDescription &Loc); | ||||
/// Generator for '#omp taskwait' | /// Generator for '#omp taskwait' | ||||
/// | /// | ||||
▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines | public: | ||||
/// The underlying LLVM-IR module | /// The underlying LLVM-IR module | ||||
Module &M; | Module &M; | ||||
/// The LLVM-IR Builder used to create IR. | /// The LLVM-IR Builder used to create IR. | ||||
IRBuilder<> Builder; | IRBuilder<> Builder; | ||||
/// Map to remember source location strings | /// Map to remember source location strings | ||||
StringMap<Constant *> SrcLocStrMap; | StringMap<Constant *> SrcLocStrMap; | ||||
fghanim: What's the benefit of this over just maintaining an alloca insertion point?
With this we will… | |||||
No functional difference. Basically the same reason why clang has an AllocIRBuilder (I think), we can use it independently of the other one. The alternative is to save the builder IP, set it to the alloca IP, do alloca stuff, reset the builder IP, when you use the AllocaIRBuilder below. Again, no functional difference, just less setting/resetting of IPs. jdoerfert: No functional difference. Basically the same reason why clang has an AllocIRBuilder (I think)… | |||||
Not Done ReplyInline ActionsI understand where you are coming from, and I understand that functionally there is mostly no change - for now. But I prefer we do a small struct with some RAII, over adding an entire builder. I think it's a bit overkill and will be a source to lots of trouble. Clang maintains an instruction AllocaInsertPt, not a specific builder. Speaking of which, This is completely from memory (i.e. check to make sure); I think we already save the previous AllocaInsertionPt in clang as part of the bodyGenCB. In which case, this is not needed; a nested parallel will always be generated by clang as part of the bodyGenCB, which in turn is going to save the Alloca insertion point for us. fghanim: I understand where you are coming from, and I understand that functionally there is mostly no… | |||||
Could you explain what kind of overkill you see here? And maybe also the lots of trouble you expect from a builder versus a insertion point? It is the same information, just packed in a different struct, right?
Right.
I think so.
Not strictly, no.
Yes.
Yes.
So far, because this will not be the only one that needs to create allocas. If you want to add the allocaIP to all runtime calls that create allocas, I'm fine with that too. jdoerfert: > I understand where you are coming from, and I understand that functionally there is mostly no… | |||||
Not Done ReplyInline Actions
Using an IRBuilder when an insertion point suffices
Currently, when using the OMPBuilder you need to juggle two IRBuilders, let's not make them 3 :)
no it is not. InsetrionPoint vs struct that contains Insertionpoint + other things. But that's beside the point I mean, is the AllocaBuilder providing a new functionality other than a convenient way to keep track of the Alloca InsertionPoint? fghanim: > Could you explain what kind of overkill you see here?
Using an IRBuilder when an insertion… | |||||
The stack is already here, implicitly, and actually on the stack. With
I disagree. The "other things" is the only interaction we have with the insertion point. So storing or accepting an insertion point is fine for the only purpose of creating a Builder and interacting with the builder. The difference is that we need to add more churn to update and reset the insertion point (what happens behind the scenes with the one line I quoted above).
I did not go this route because I think it complicates the API with little gain. I was also unsure what the flang situation looks like here. Do we have an alloca insertion point there too? jdoerfert: > Personally, I prefer some suitable structure to your needs and a stack (i.e. the way clang… | |||||
Not Done ReplyInline ActionsBut we haven't solved a problem, we just kicked it down the road with the TODO you added, with the only trade-off being convenience now. The AllocaBuilder is not usable by the frontend - the frontend doesn't & shouldn't know or care about any builders other than its own. However, an insertion point can be passed to the frontend (i.e. bodyCB). So keeping track of an Alloca insertion point across nests -similar to how clang does it- is what we need. I still prefer having a stack to push to - maybe even piggy back on the finalization stack by creating a struct that contains all relevant info per nest level. But you don't like it, fine. Then just pass an argument for allocaIP and be done with it. Flang (or clang or whichever other frontend) are required to emit allocas into function entry block by all LLVM passes that follow. As such, we can safely assume the outermost call is always going to have the AllocaIP in the entry block of the enclosing function. Which means the first time we call CreateParallel they can pass that as arg.. or pass an empty insertion point, in which case we already detect it. Adding a builder for every insertion point that we need to pass across nests or regions is not something we want to do for the reasons I mentioned earlier. Whether we like it or not, it is a builder we have to juggle along the other two, no matter how much we try to specialize its use, or keep it "hidden". fghanim: But we haven't solved a problem, we just kicked it down the road with the TODO you added, with… | |||||
/// Map to remember existing ident_t*. | /// Map to remember existing ident_t*. | ||||
DenseMap<std::pair<Constant *, uint64_t>, GlobalVariable *> IdentMap; | DenseMap<std::pair<Constant *, uint64_t>, GlobalVariable *> IdentMap; | ||||
/// Helper that contains information about regions we need to outline | /// Helper that contains information about regions we need to outline | ||||
/// during finalization. | /// during finalization. | ||||
struct OutlineInfo { | struct OutlineInfo { | ||||
using PostOutlineCBTy = std::function<void(Function &)>; | using PostOutlineCBTy = std::function<void(Function &)>; | ||||
Not Done ReplyInline ActionsNit: isn't this supposed to be part of one of the other patches you split of this? fghanim: Nit: isn't this supposed to be part of one of the other patches you split of this? | |||||
PostOutlineCBTy PostOutlineCB; | PostOutlineCBTy PostOutlineCB; | ||||
BasicBlock *EntryBB, *ExitBB; | BasicBlock *EntryBB, *ExitBB; | ||||
Not Done ReplyInline ActionsI see two benefits of passing entry and exit blocks, as opposed to what we used to do:
fghanim: I see two benefits of passing entry and exit blocks, as opposed to what we used to do:
1. less… | |||||
It is 2. Here, finalization steps will outline the inner region and introduce split blocks in the process. Those were not in the outer regions blocks vector which caused problems. The inputs/outputs are not allowed to change though, that invariant seems to hold fine. jdoerfert: It is 2. Here, finalization steps will outline the inner region and introduce split blocks in… | |||||
Not Done ReplyInline ActionsOh right. Thanks for explaining that. :) fghanim: Oh right. Thanks for explaining that. :)
Nit: if possible, add a simple assert to check that… | |||||
We can only if we remember them. I'll do it in debug builds I guess. jdoerfert: We can only if we remember them. I'll do it in debug builds I guess. | |||||
/// Collect all blocks in between EntryBB and ExitBB in both the given | /// Collect all blocks in between EntryBB and ExitBB in both the given | ||||
/// vector and set. | /// vector and set. | ||||
void collectBlocks(SmallPtrSetImpl<BasicBlock *> &BlockSet, | void collectBlocks(SmallPtrSetImpl<BasicBlock *> &BlockSet, | ||||
SmallVectorImpl<BasicBlock *> &BlockVector); | SmallVectorImpl<BasicBlock *> &BlockVector); | ||||
Not Done ReplyInline ActionsWhat is the benefit of passing blockSet when it is exclusively used inside of collectBlocks? fghanim: What is the benefit of passing `blockSet` when it is exclusively used inside of `collectBlocks`? | |||||
It's used in line 684 outside of collectBlocks. jdoerfert: It's used in line 684 outside of collectBlocks. | |||||
Not Done ReplyInline ActionsOh, Thanks. I missed that. :) fghanim: Oh, Thanks. I missed that. :)
In this case; I think both the Blockset, and BlockVector have the… | |||||
We could but we don't want to: jdoerfert: We could but we don't want to:
(1) that will make the currently logarithmic query in 684… | |||||
}; | }; | ||||
/// Collection of regions that need to be outlined during finalization. | /// Collection of regions that need to be outlined during finalization. | ||||
SmallVector<OutlineInfo, 16> OutlineInfos; | SmallVector<OutlineInfo, 16> OutlineInfos; | ||||
/// Add a new region that will be outlined later. | /// Add a new region that will be outlined later. | ||||
void addOutlineInfo(OutlineInfo &&OI) { OutlineInfos.emplace_back(OI); } | void addOutlineInfo(OutlineInfo &&OI) { OutlineInfos.emplace_back(OI); } | ||||
▲ Show 20 Lines • Show All 193 Lines • Show Last 20 Lines |
What's the benefit of this over just maintaining an alloca insertion point?
With this we will have 3 IRBuilders to maintain and keep track of: the clang (or flang) IRBuilder, the OMP IRBuilder and the allocaBuilder