We need to keep track of the alloca insertion point (which we already
communicate via the callback to the user) as we place allocas as well.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for the Patch. I have few questions to help me understand what's going on.
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
283 | What's the benefit of this over just maintaining an alloca insertion point? With this we will have 3 IRBuilders to maintain and keep track of: the clang (or flang) IRBuilder, the OMP IRBuilder and the allocaBuilder | |
292 | I see two benefits of passing entry and exit blocks, as opposed to what we used to do:
| |
297 | What is the benefit of passing blockSet when it is exclusively used inside of collectBlocks? |
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
283 | No functional difference. Basically the same reason why clang has an AllocIRBuilder (I think), we can use it independently of the other one. The alternative is to save the builder IP, set it to the alloca IP, do alloca stuff, reset the builder IP, when you use the AllocaIRBuilder below. Again, no functional difference, just less setting/resetting of IPs. | |
292 | It is 2. Here, finalization steps will outline the inner region and introduce split blocks in the process. Those were not in the outer regions blocks vector which caused problems. The inputs/outputs are not allowed to change though, that invariant seems to hold fine. | |
297 | It's used in line 684 outside of collectBlocks. |
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
283 | I understand where you are coming from, and I understand that functionally there is mostly no change - for now. But I prefer we do a small struct with some RAII, over adding an entire builder. I think it's a bit overkill and will be a source to lots of trouble. Clang maintains an instruction AllocaInsertPt, not a specific builder. Speaking of which, This is completely from memory (i.e. check to make sure); I think we already save the previous AllocaInsertionPt in clang as part of the bodyGenCB. In which case, this is not needed; a nested parallel will always be generated by clang as part of the bodyGenCB, which in turn is going to save the Alloca insertion point for us. | |
292 | Oh right. Thanks for explaining that. :) | |
297 | Oh, Thanks. I missed that. :) |
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
283 |
Could you explain what kind of overkill you see here? And maybe also the lots of trouble you expect from a builder versus a insertion point? It is the same information, just packed in a different struct, right?
Right.
I think so.
Not strictly, no.
Yes.
Yes.
So far, because this will not be the only one that needs to create allocas. If you want to add the allocaIP to all runtime calls that create allocas, I'm fine with that too. | |
292 | We can only if we remember them. I'll do it in debug builds I guess. | |
297 | We could but we don't want to: |
Overall this seems great Thanks! I have one minor concern:
This patch seems to do 2 things: Support for nested parallel regions(which was crashing earlier) and some infrastructure change(introducing AllocBuilder..).
I'm not sure of this, but is it possible to separate these as 2( or more) patches ? 1 for Nested parallel region support and other patch as a infrastructure change ?
Main benefit of this approach would be that this would be easier to track/maintain changes in future.
Thanks!
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
283 |
Using an IRBuilder when an insertion point suffices
Currently, when using the OMPBuilder you need to juggle two IRBuilders, let's not make them 3 :)
no it is not. InsetrionPoint vs struct that contains Insertionpoint + other things. But that's beside the point I mean, is the AllocaBuilder providing a new functionality other than a convenient way to keep track of the Alloca InsertionPoint? |
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
283 |
The stack is already here, implicitly, and actually on the stack. With
I disagree. The "other things" is the only interaction we have with the insertion point. So storing or accepting an insertion point is fine for the only purpose of creating a Builder and interacting with the builder. The difference is that we need to add more churn to update and reset the insertion point (what happens behind the scenes with the one line I quoted above).
I did not go this route because I think it complicates the API with little gain. I was also unsure what the flang situation looks like here. Do we have an alloca insertion point there too? |
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
283 | But we haven't solved a problem, we just kicked it down the road with the TODO you added, with the only trade-off being convenience now. The AllocaBuilder is not usable by the frontend - the frontend doesn't & shouldn't know or care about any builders other than its own. However, an insertion point can be passed to the frontend (i.e. bodyCB). So keeping track of an Alloca insertion point across nests -similar to how clang does it- is what we need. I still prefer having a stack to push to - maybe even piggy back on the finalization stack by creating a struct that contains all relevant info per nest level. But you don't like it, fine. Then just pass an argument for allocaIP and be done with it. Flang (or clang or whichever other frontend) are required to emit allocas into function entry block by all LLVM passes that follow. As such, we can safely assume the outermost call is always going to have the AllocaIP in the entry block of the enclosing function. Which means the first time we call CreateParallel they can pass that as arg.. or pass an empty insertion point, in which case we already detect it. Adding a builder for every insertion point that we need to pass across nests or regions is not something we want to do for the reasons I mentioned earlier. Whether we like it or not, it is a builder we have to juggle along the other two, no matter how much we try to specialize its use, or keep it "hidden". |
Thanks for the update. Just a couple of Nits, and a quick note
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h | ||
---|---|---|
288 | Nit: isn't this supposed to be part of one of the other patches you split of this? | |
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
433 | Here and elsewhere: You seem to have forgotten to undo the changes introduced by using a separate AllocaBuilder? Also, Nit: before using AllocaIP, either add an assert that it is not empty, or alternatively, add logic to just set it to entry block of outerfn if it is | |
490 | NIT: I think it would be better to give each a different name, to avoid unnecessary confusion. maybe OuterAllocaIP and innerAllocaIP? Also, we don't overwrite the outerAllocaIP in case it's needed later? |
I'll address the nits. Unsure if that is all. I also don't get the one comment.
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
---|---|---|
433 |
I don't get this. These changes are on purpose. |
I'll address the nits.
Thanks :)
Unsure if that is all.
depends on the comment about AllocaBuilder
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
---|---|---|
433 | Oh, my bad. I assumed that since we now pass allocaip to communicate where is the insertion point, using builder the way we used to is sufficient, and this was leftover code. So now what purpose does AllocaBuilder serve? |
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
---|---|---|
433 |
|
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
---|---|---|
433 |
Wouldn't a comment be better? |
What's the benefit of this over just maintaining an alloca insertion point?
With this we will have 3 IRBuilders to maintain and keep track of: the clang (or flang) IRBuilder, the OMP IRBuilder and the allocaBuilder