Before the conversion to LLVM-IR dialect and ultimately LLVM IR, FIR is
partially rewritten into a codegen form. This patch adds that pass, the
fircg dialect, and the small set of Ops in the fircg (sub) dialect.
Fircg is not part of the FIR dialect and should never be used outside of
the (closed) conversion to LLVM IR.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
55 | Is the notion of "extended form" for embox documented anywhere? If not can you expand the doc here to describe what it is? (same for all patterns) | |
133 | Nit: you can omit the , 8 everywhere, SmallVector now computes a default. (that is unless you know that 8 is better than the default in this particular case of course) | |
200 | Is this comment up-to-date? | |
235 | By making this a module pass, we're losing on parallelism. void runOnOperation() { Operation *op = getOperation(); if (auto func = op->dyn_cast<mlir::FuncOp>()) runOn(func, func.getBody()); if (auto global : op->dyn_cast<fir::GlobalOp>()) runOn(global, global.getRegion()); } | |
241 | This test looks spurious? The loop inside the body would not execute if there are no region right? | |
249 | I don't quite get the delayed erasing, this whole simplification could be done without keeping state in a vector: region.walk([] (Operation *op) { if (auto embox = dyn_cast<EmboxOp>(op)) { if (embox.getShape()) { op->erase(); return; } } } (note that `op->erase()` should already assert for `op->use_empty()`, no need to duplicate here) But also, could you do it more simply: region.walk([] (Operation *op) { if (isOpTriviallyDead(op)) op->erase(); } ? | |
flang/tools/fir-opt/fir-opt.cpp | ||
21 | What is the intended difference between registerFIRPasses and registerOptPasses ? | |
22 | Why this change? |
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
300 | NIT: These comments doesn't make sense here? Must be gone in while preparing the patch ? |
flang/include/flang/Optimizer/CodeGen/CGPasses.td | ||
---|---|---|
27 | I'm not sure whether we need OpenACC Dialect too ? (Since OpenACC doesn't lower as OpenMP) @clementval do you have any comments/thought on this ? |
flang/include/flang/Optimizer/CodeGen/CGPasses.td | ||
---|---|---|
27 | We will probably need it later but since it is not done yet it can be added in a later patch. |
flang/include/flang/Optimizer/CodeGen/CGPasses.td | ||
---|---|---|
27 | You only need to list here the dialects that this pass introduces that aren't in the input. |
flang/include/flang/Optimizer/CodeGen/CGPasses.td | ||
---|---|---|
27 | That's what I thought. Since there is no fir op to openacc op conversion it is not needed. At least for now. |
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
133 | Thanks. | |
200 | Removed. | |
235 | It might be possible, but a quick experiment showed a bunch of tests regressing. | |
241 | Removed. | |
249 | This should be a very general DCE like in mlir Transforms/.CSE.cpp. That seems to have been mangled but can be brought back. | |
flang/tools/fir-opt/fir-opt.cpp | ||
21 | Fixed. | |
22 | A synch problem with the source. |
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
300 | Removed. The pass is now defined in a tablegen file. |
(seems like you need to run git clang-format)
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
235 | What kind of regressions? | |
249 | Can you just call this instead? https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Transforms/RegionUtils.h#L57 In general generic algorithm in passes like this ends up being replicated everywhere when they can be exposed a general utilities, if the simplifyRegions does not fit the bill here can we introduce another utility there? | |
flang/tools/fir-opt/fir-opt.cpp | ||
21 | It is still unclear to me what is the intent here: registerOptimizerPasses is not documented, and registerMLIRPassesForFortranTools says "Register the standard passes we use" which seems like it should have all the required passes. Why do you introduce registerOptimizerPasses at all? | |
23 | This does not seem needed |
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
235 | Actually in case I wasn't clear, the snippet I wrote above requires to schedule the pass twice in a pipeline, the test would look like this: fir-opt --pass-pipeline="func(cg-rewrite),fir.global(cg-rewrite)" If you can provide an .mlir test where this regresses, I'd be happy to take a look. | |
flang/test/Fir/cg-ops.fir | ||
18 | Your test does not have any global op? |
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
235 | I spent some time working to make this pass multithreaded, including converting data structures to be thread local, etc. We have approximately 800 basic tests and many thousands of Fortran tests. The attempted changes caused many of these tests to regress and start failing. The existing pass accomplishes its task as is. There is no data-based argument that the pass is a bottleneck. The code is being upstreamed in a series of many patches, and there will be ample opportunity to write and debug a parallel algorithm in subsequent patches. Finally, anyone that wants to can contribute improvements on fir-dev. | |
249 | Possibly. For the time being, we prefer to keep the simple dead-code cleanup. | |
flang/tools/fir-opt/fir-opt.cpp | ||
21 | Upstreaming the code is being done in a series of many patches. I have chosen to group code in ways that make the upstreaming process more manageable. For example, the MLIR registration interfaces have been changing, and we want those calls isolated until they can be upstreamed. Furthermore, link times are already expensive, so we have made efforts for reduce that impact by selecting specific required libraries from MLIR. For these reasons there is no reason to change this now. There will be opportunities to regroup registration calls in subsequent patches |
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
235 |
The problem is that I could your pass right after you land it, but that will create issues for you downstream, and right now it seems you can't show upstream why it is an issue with a test for this pass. That seems problematic to me for upstream development right now. The "FuncOp Pass" vs "Module Pass" is more than just "waiting for a bottleneck: it is about system design as whole. The fact that you bring thread_local here seems fishy and makes me worried about what kind of skeletons we'll find down the road. | |
249 | I'm fine if you don't want to call simplifyRegions, but then can you refactor this as a pre-patch into a utility in a common place to avoid code duplication then? I'm not enthusiastic to see each pass reimplementing generic utility / algorithm, and I don't see really a reason why we should at all. This has community/project wise implications as well, the CIRCT folks were reporting performance issues with this kind of code path just a few days ago: https://llvm.discourse.group/t/speeding-up-canonicalize/3015 ; we were able to work together to improve the general tooling and infra around this. | |
flang/tools/fir-opt/fir-opt.cpp | ||
21 | Can you clarify how this is impacting link time here? Can you introduce the APIs when they make sense in-tree please! I have no way to figure if there is a good reason or not for what you send for review otherwise. I don't know in which state is your downstream project, but I'm concerned about the issue it is causing right now on the restriction it puts on the code structure and organization. For example the regressions you mentioned that you spot downstream but that can't be reproduced upstream with a lit tests. |
Re the comment on losing on parallelism, the uploaded code is functionally correct and complete. We (or the community at large) can certainly make this execute in parallel once the up-streaming is done. Fast execution and maintainability are in everyone’s interests, including ours, and we will be motivated to update the code after we've up streamed code gen and its tests.
We would like to upstream this part of the patch with an understanding that it will be fixed at a later date.
This is true, there are still a bunch of comment I have that are still undressed even the parallelism aside: i.e. refactoring DCE and not duplicating the registration function when it isn't needed or not justified.
We (or the community at large) can certainly make this execute in parallel once the up-streaming is done. Fast execution and maintainability are in everyone’s interests, including ours, and we will be motivated to update the code after we've up streamed code gen and its tests.
We would like to upstream this part of the patch with an understanding that it will be fixed at a later date.
This does not answer my previous question: are you OK with me sending a patch to fix the parallelism there immediately after this patch lands?
I am under the impression right now that this is not possible because it would break some downstream tests in some ways that are unclear to me at the moment.
The OpenMP for Flang team has consistently been asking for upstreaming the fir-dev branch. I made an initial attempt to upstream a portion in May last year (https://reviews.llvm.org/D79731) which was discarded since it
did not have any community support. The current situation presents a few difficulties for the OpenMP team.
- The OpenMP dialect is in the llvm-project repo, while FIR codegen is developed in the fir-dev repo. So we have to first make a patch to llvm-project/mlir and then wait for it to be merged into the fir-dev repo (which can take from one to two weeks) and then make the relevant changes in the fir-dev repo. If FIR codegen was also upstream then this delay and committing to multiple repositories can be avoided.
- Since the bridge code (parse-tree to FIR) and codegen is not available in llvm-project/flang, any commits that we make to fir-dev cannot be upstreamed. So all our changes are also increasing the diff between fir-dev and upstream llvm-project/flang. Left uncontrolled this might become an untameable monster and we might never be able to fully upstream fir-dev.
- Since the OpenMP code which works with FIR is in fir-dev we cannot often show the context to MLIR core team. On at least one occasion this has become an issue while seeking help. If the code is upstream this will facilitate better discussions with the MLIR core team.
Given the issues mentioned above, we favour a faster upstreaming process so that the entire community can work on a single code base.
The MLIR core team has been very helpful with the OpenMP dialect work and we have benefitted from their advise and review comments. I also fondly recall that Mehdi has stepped in to support Flang/F18 (https://lists.llvm.org/pipermail/llvm-dev/2020-January/138219.html) when there was an opinion to consider other candidates. Mehdi has been very gracious with his time and has provided several reviews for FIR upstreaming. It will be unwise to not consider his review comments. We should try to address the DCE and registration comments. If addressing a comment requires substantial work then may be we can considered for later with a suitable mechanism to track this publicly.
I hope we can soon find a way out of this impasse and make progress.
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
249 | Is the suggestion here to move the following DCE code to a file inside the mlir directory tree? // Clean up the region. void simplifyRegion(mlir::Region ®ion) { for (auto &block : region.getBlocks()) for (auto &op : block.getOperations()) { for (auto ® : op.getRegions()) simplifyRegion(reg); maybeEraseOp(&op); } doDCE(); } /// Run a simple DCE cleanup to remove any dead code after the rewrites. void doDCE() { std::vector<mlir::Operation *> workList; workList.swap(opsToErase); while (!workList.empty()) { for (auto *op : workList) { std::vector<mlir::Value> opOperands(op->operand_begin(), op->operand_end()); LLVM_DEBUG(llvm::dbgs() << "DCE on " << *op << '\n'); ++numDCE; op->erase(); for (auto opnd : opOperands) maybeEraseOp(opnd.getDefiningOp()); } workList.clear(); workList.swap(opsToErase); } } void maybeEraseOp(mlir::Operation *op) { if (!op) return; if (op->hasTrait<mlir::OpTrait::IsTerminator>()) return; if (mlir::isOpTriviallyDead(op)) opsToErase.push_back(op); } |
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
249 | Yes this isn't more involved that what you're mentioning. I'll be happy to help with this kind of things, I just don't know how to proceed at the moment. The DCE refactoring is trivial, I'm more concerned about the testing problem with the pass and the inability to have IR tests upstream when there are issues. |
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
249 | The original suggestion was to make this pass not work on ModuleOp so that it would allow multithreading. That suggestion did not contain enough information. The code snippet does not compile. Fixing the compilation led to the test included in this patch failing. Subsequently an incorrect guess was made that appeared to fix the test in this patch. However, the guessed at solution was wrong as other tests not merged regressed. This was all done in an effort to follow your original suggestion. Clearly it is just not possible to merge tests that require code that has not been merged. |
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
249 | I am not sure what you're trying to say here, would you like me to send a patch that can be applied on top of this one and change this pass in a way that works with the test in-tree? |
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
249 | Here is the patch: https://reviews.llvm.org/D99132 for this part of the discussion. |
LGTM
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | ||
---|---|---|
249 | Seems like you updated to use this version of the patch. I'm sorry it took so long to get there for this revision: if something isn't clear with any of my comment feel free to ask me to clarify and I'm always happy to send a patch when it helps! |
I made a trivial fix to get the builds to pass.
https://reviews.llvm.org/rG502f27e66fd9fe44cd45ec5acae3e18f15f2d8c6
I've just checked, all Flang public workers are back to :green:. Thank you for your prompt fix @kiranchandramohan ! @schweitz , thanks for working on this!
I'm not sure whether we need OpenACC Dialect too ? (Since OpenACC doesn't lower as OpenMP) @clementval do you have any comments/thought on this ?