Page MenuHomePhabricator

[flang][fir] Add the pre-code gen rewrite pass and codegen ops.
ClosedPublic

Authored by schweitz on Mar 5 2021, 11:21 AM.

Details

Summary

Before the conversion to LLVM-IR dialect and ultimately LLVM IR, FIR is
partially rewritten into a codegen form. This patch adds that pass, the
fircg dialect, and the small set of Ops in the fircg (sub) dialect.
Fircg is not part of the FIR dialect and should never be used outside of
the (closed) conversion to LLVM IR.

Diff Detail

Event Timeline

schweitz created this revision.Mar 5 2021, 11:21 AM
schweitz requested review of this revision.Mar 5 2021, 11:21 AM
Herald added a project: Restricted Project. · View Herald Transcript
mehdi_amini added inline comments.Mar 5 2021, 12:08 PM
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
55

Is the notion of "extended form" for embox documented anywhere? If not can you expand the doc here to describe what it is?
(rewriteStaticShape and rewriteDynamicShape aren't documented, but that may not be necessary with a longer description for the pattern here, ideally with snippets example)

(same for all patterns)

133

Nit: you can omit the , 8 everywhere, SmallVector now computes a default.

(that is unless you know that 8 is better than the default in this particular case of course)

200

Is this comment up-to-date?

235

By making this a module pass, we're losing on parallelism.
Can you make is an operation pass and filter here instead?

void runOnOperation()  {
  Operation *op = getOperation();
  if (auto func = op->dyn_cast<mlir::FuncOp>()) runOn(func, func.getBody());
  if (auto global : op->dyn_cast<fir::GlobalOp>()) runOn(global, global.getRegion());
}
241

This test looks spurious? The loop inside the body would not execute if there are no region right?

249

I don't quite get the delayed erasing, this whole simplification could be done without keeping state in a vector:

region.walk([] (Operation *op) {

if (auto embox = dyn_cast<EmboxOp>(op)) {
    if (embox.getShape()) {
      op->erase(); 
      return;
   }
}

}

(note that `op->erase()` should already assert for `op->use_empty()`, no need to duplicate here)

But also, could you do it more simply:

region.walk([] (Operation *op) {

if (isOpTriviallyDead(op)) op->erase();

}

?
flang/tools/fir-opt/fir-opt.cpp
21

What is the intended difference between registerFIRPasses and registerOptPasses ?

22

Why this change?

SouraVX added inline comments.
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
300

NIT: These comments doesn't make sense here? Must be gone in while preparing the patch ?
Could you please update them accordingly.

SouraVX added inline comments.Mar 8 2021, 10:08 AM
flang/include/flang/Optimizer/CodeGen/CGPasses.td
27

I'm not sure whether we need OpenACC Dialect too ? (Since OpenACC doesn't lower as OpenMP) @clementval do you have any comments/thought on this ?

clementval added inline comments.Mar 8 2021, 11:25 AM
flang/include/flang/Optimizer/CodeGen/CGPasses.td
27

We will probably need it later but since it is not done yet it can be added in a later patch.

mehdi_amini added inline comments.Mar 8 2021, 11:46 AM
flang/include/flang/Optimizer/CodeGen/CGPasses.td
27

You only need to list here the dialects that this pass introduces that aren't in the input.
So basically if you take a fir op (or other dialect) and you turn it into an OpenACC op then you need to add the dependency (same for the other dialects listed here).

clementval added inline comments.Mar 8 2021, 11:52 AM
flang/include/flang/Optimizer/CodeGen/CGPasses.td
27

That's what I thought. Since there is no fir op to openacc op conversion it is not needed. At least for now.

schweitz added inline comments.Mar 8 2021, 12:52 PM
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
133

Thanks.

200

Removed.

235

It might be possible, but a quick experiment showed a bunch of tests regressing.

241

Removed.

249

This should be a very general DCE like in mlir Transforms/.CSE.cpp. That seems to have been mangled but can be brought back.

flang/tools/fir-opt/fir-opt.cpp
21

Fixed.

22

A synch problem with the source.

schweitz updated this revision to Diff 329114.Mar 8 2021, 12:54 PM
schweitz added inline comments.Mar 8 2021, 1:25 PM
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
300

Removed. The pass is now defined in a tablegen file.

(seems like you need to run git clang-format)

flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
235

What kind of regressions?

249

Can you just call this instead? https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Transforms/RegionUtils.h#L57

In general generic algorithm in passes like this ends up being replicated everywhere when they can be exposed a general utilities, if the simplifyRegions does not fit the bill here can we introduce another utility there?

flang/tools/fir-opt/fir-opt.cpp
21

It is still unclear to me what is the intent here: registerOptimizerPasses is not documented, and registerMLIRPassesForFortranTools says "Register the standard passes we use" which seems like it should have all the required passes.

Why do you introduce registerOptimizerPasses at all?

23

This does not seem needed

mehdi_amini added inline comments.Mar 8 2021, 6:06 PM
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
235

Actually in case I wasn't clear, the snippet I wrote above requires to schedule the pass twice in a pipeline, the test would look like this: fir-opt --pass-pipeline="func(cg-rewrite),fir.global(cg-rewrite)"

If you can provide an .mlir test where this regresses, I'd be happy to take a look.

flang/test/Fir/cg-ops.fir
18

Your test does not have any global op?

schweitz updated this revision to Diff 329360.Mar 9 2021, 9:01 AM
schweitz updated this revision to Diff 331348.Mar 17 2021, 12:42 PM
schweitz marked 2 inline comments as done.Mar 17 2021, 12:57 PM
schweitz added inline comments.
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
235

I spent some time working to make this pass multithreaded, including converting data structures to be thread local, etc. We have approximately 800 basic tests and many thousands of Fortran tests. The attempted changes caused many of these tests to regress and start failing. The existing pass accomplishes its task as is. There is no data-based argument that the pass is a bottleneck. The code is being upstreamed in a series of many patches, and there will be ample opportunity to write and debug a parallel algorithm in subsequent patches. Finally, anyone that wants to can contribute improvements on fir-dev.

249

Possibly. For the time being, we prefer to keep the simple dead-code cleanup.

flang/tools/fir-opt/fir-opt.cpp
21

Upstreaming the code is being done in a series of many patches. I have chosen to group code in ways that make the upstreaming process more manageable. For example, the MLIR registration interfaces have been changing, and we want those calls isolated until they can be upstreamed. Furthermore, link times are already expensive, so we have made efforts for reduce that impact by selecting specific required libraries from MLIR. For these reasons there is no reason to change this now. There will be opportunities to regroup registration calls in subsequent patches

mehdi_amini added inline comments.Mar 17 2021, 1:39 PM
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
235

he code is being upstreamed in a series of many patches, and there will be ample opportunity to write and debug a parallel algorithm in subsequent patches

The problem is that I could your pass right after you land it, but that will create issues for you downstream, and right now it seems you can't show upstream why it is an issue with a test for this pass. That seems problematic to me for upstream development right now.

The "FuncOp Pass" vs "Module Pass" is more than just "waiting for a bottleneck: it is about system design as whole. The fact that you bring thread_local here seems fishy and makes me worried about what kind of skeletons we'll find down the road.
It is also in our experience almost impossible to come back and fix this later after it creeps everywhere in the compiler. We have a good example of this: LLVM itself. I tried for a while various approaches to revisit some of the LLVM internals but this is engrained too far by now. MLIR was designed with more care to avoid any global state and produce "crash reproducers" that can be as hermetic as possible for example. It'll be quite sad to me to give on all this so early and without very strong justifications.

249

I'm fine if you don't want to call simplifyRegions, but then can you refactor this as a pre-patch into a utility in a common place to avoid code duplication then?

I'm not enthusiastic to see each pass reimplementing generic utility / algorithm, and I don't see really a reason why we should at all.

This has community/project wise implications as well, the CIRCT folks were reporting performance issues with this kind of code path just a few days ago: https://llvm.discourse.group/t/speeding-up-canonicalize/3015 ; we were able to work together to improve the general tooling and infra around this.

flang/tools/fir-opt/fir-opt.cpp
21

Can you clarify how this is impacting link time here?

Can you introduce the APIs when they make sense in-tree please! I have no way to figure if there is a good reason or not for what you send for review otherwise. I don't know in which state is your downstream project, but I'm concerned about the issue it is causing right now on the restriction it puts on the code structure and organization. For example the regressions you mentioned that you spot downstream but that can't be reproduced upstream with a lit tests.

Re the comment on losing on parallelism, the uploaded code is functionally correct and complete. We (or the community at large) can certainly make this execute in parallel once the up-streaming is done. Fast execution and maintainability are in everyone’s interests, including ours, and we will be motivated to update the code after we've up streamed code gen and its tests.

We would like to upstream this part of the patch with an understanding that it will be fixed at a later date.

Re the comment on losing on parallelism, the uploaded code is functionally correct and complete.

This is true, there are still a bunch of comment I have that are still undressed even the parallelism aside: i.e. refactoring DCE and not duplicating the registration function when it isn't needed or not justified.

We (or the community at large) can certainly make this execute in parallel once the up-streaming is done. Fast execution and maintainability are in everyone’s interests, including ours, and we will be motivated to update the code after we've up streamed code gen and its tests.

We would like to upstream this part of the patch with an understanding that it will be fixed at a later date.

This does not answer my previous question: are you OK with me sending a patch to fix the parallelism there immediately after this patch lands?
I am under the impression right now that this is not possible because it would break some downstream tests in some ways that are unclear to me at the moment.

The OpenMP for Flang team has consistently been asking for upstreaming the fir-dev branch. I made an initial attempt to upstream a portion in May last year (https://reviews.llvm.org/D79731) which was discarded since it
did not have any community support. The current situation presents a few difficulties for the OpenMP team.

  1. The OpenMP dialect is in the llvm-project repo, while FIR codegen is developed in the fir-dev repo. So we have to first make a patch to llvm-project/mlir and then wait for it to be merged into the fir-dev repo (which can take from one to two weeks) and then make the relevant changes in the fir-dev repo. If FIR codegen was also upstream then this delay and committing to multiple repositories can be avoided.
  2. Since the bridge code (parse-tree to FIR) and codegen is not available in llvm-project/flang, any commits that we make to fir-dev cannot be upstreamed. So all our changes are also increasing the diff between fir-dev and upstream llvm-project/flang. Left uncontrolled this might become an untameable monster and we might never be able to fully upstream fir-dev.
  3. Since the OpenMP code which works with FIR is in fir-dev we cannot often show the context to MLIR core team. On at least one occasion this has become an issue while seeking help. If the code is upstream this will facilitate better discussions with the MLIR core team.

Given the issues mentioned above, we favour a faster upstreaming process so that the entire community can work on a single code base.

The MLIR core team has been very helpful with the OpenMP dialect work and we have benefitted from their advise and review comments. I also fondly recall that Mehdi has stepped in to support Flang/F18 (https://lists.llvm.org/pipermail/llvm-dev/2020-January/138219.html) when there was an opinion to consider other candidates. Mehdi has been very gracious with his time and has provided several reviews for FIR upstreaming. It will be unwise to not consider his review comments. We should try to address the DCE and registration comments. If addressing a comment requires substantial work then may be we can considered for later with a suitable mechanism to track this publicly.

I hope we can soon find a way out of this impasse and make progress.

flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
249

Is the suggestion here to move the following DCE code to a file inside the mlir directory tree?

// Clean up the region.
  void simplifyRegion(mlir::Region &region) {
    for (auto &block : region.getBlocks())
      for (auto &op : block.getOperations()) {
        for (auto &reg : op.getRegions())
          simplifyRegion(reg);
        maybeEraseOp(&op);
      }
    doDCE();
  }

  /// Run a simple DCE cleanup to remove any dead code after the rewrites.
  void doDCE() {
    std::vector<mlir::Operation *> workList;
    workList.swap(opsToErase);
    while (!workList.empty()) {
      for (auto *op : workList) {
        std::vector<mlir::Value> opOperands(op->operand_begin(),
                                            op->operand_end());
        LLVM_DEBUG(llvm::dbgs() << "DCE on " << *op << '\n');
        ++numDCE;
        op->erase();
        for (auto opnd : opOperands)
          maybeEraseOp(opnd.getDefiningOp());
      }
      workList.clear();
      workList.swap(opsToErase);
    }
  }

  void maybeEraseOp(mlir::Operation *op) {
    if (!op)
      return;
    if (op->hasTrait<mlir::OpTrait::IsTerminator>())
      return;
    if (mlir::isOpTriviallyDead(op))
      opsToErase.push_back(op);
  }
mehdi_amini added inline comments.Mon, Mar 22, 1:20 PM
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
249

Yes this isn't more involved that what you're mentioning. I'll be happy to help with this kind of things, I just don't know how to proceed at the moment.

The DCE refactoring is trivial, I'm more concerned about the testing problem with the pass and the inability to have IR tests upstream when there are issues.

schweitz added inline comments.Mon, Mar 22, 3:48 PM
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
249

The original suggestion was to make this pass not work on ModuleOp so that it would allow multithreading. That suggestion did not contain enough information. The code snippet does not compile. Fixing the compilation led to the test included in this patch failing. Subsequently an incorrect guess was made that appeared to fix the test in this patch. However, the guessed at solution was wrong as other tests not merged regressed. This was all done in an effort to follow your original suggestion.

Clearly it is just not possible to merge tests that require code that has not been merged.

mehdi_amini added inline comments.Mon, Mar 22, 6:23 PM
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
249

I am not sure what you're trying to say here, would you like me to send a patch that can be applied on top of this one and change this pass in a way that works with the test in-tree?

mehdi_amini added inline comments.Mon, Mar 22, 6:46 PM
flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
249

Here is the patch: https://reviews.llvm.org/D99132 for this part of the discussion.

schweitz updated this revision to Diff 332984.Wed, Mar 24, 7:31 AM

Respond to review comments.

mehdi_amini accepted this revision.Wed, Mar 24, 12:59 PM

LGTM

flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
249

Seems like you updated to use this version of the patch. I'm sorry it took so long to get there for this revision: if something isn't clear with any of my comment feel free to ask me to clarify and I'm always happy to send a patch when it helps!

This revision is now accepted and ready to land.Wed, Mar 24, 12:59 PM
This revision was landed with ongoing or failed builds.Wed, Mar 24, 7:27 PM
This revision was automatically updated to reflect the committed changes.

I made a trivial fix to get the builds to pass.
https://reviews.llvm.org/rG502f27e66fd9fe44cd45ec5acae3e18f15f2d8c6

I've just checked, all Flang public workers are back to :green:. Thank you for your prompt fix @kiranchandramohan ! @schweitz , thanks for working on this!