Page MenuHomePhabricator

[flang][driver] Add support for `-O{0|1|2|3}`

Authored by awarzynski on Jun 17 2022, 4:11 AM.



This patch adds support for most common optimisation compiler flags:
-O{0|1|2|3}. This is implemented in both the compiler and frontend
drivers. At this point, these options are only used to configure the
LLVM optimisation pipelines (aka middle-end). LLVM backend or MLIR/FIR
optimisations are not supported yet.

Previously, the middle-end pass manager was only required when
generating LLVM bitcode (i.e. for flang-new -c -emit-llvm <file> or
flang-new -fc1 -emit-llvm-bc <file>). With this change, it becomes
required for all frontend actions that are represented as
CodeGenAction and CodeGenAction::executeAction is refactored
accordingly (in the spirit of better code re-use).

Additionally, the -fdebug-pass-manager option is enabled to facilitate
testing. This flag can be used to configure the pass manager to print
the middle-end passes that are being run. Similar option exists in Clang
and the semantics in Flang are identical. This option translates to
extra configuration when setting up the pass manager. This is
implemented in CodeGenAction::runOptimizationPipeline.

This patch also adds some bolier plate code to manage code-gen options
("code-gen" refers to generating machine code in LLVM in this context).
This was extracted from Clang. In Clang, it simplifies defining code-gen
options and enables option marshalling. In Flang, option marshalling is
not yet supported (we might do at some point), but being able to
auto-generate some code with macros is beneficial. This will become
particularly apparent when we start adding more options (at least in
Clang, the list of code-gen options is rather long).

Diff Detail

Event Timeline

awarzynski created this revision.Jun 17 2022, 4:11 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
awarzynski requested review of this revision.Jun 17 2022, 4:11 AM

The CI failed.

The CI failed.

Thanks - I didn't notice any failures related to this change. Did I miss anything?

OK. This seems not to be related to this patch. I also see these failures in another review.

Thanks for this patch. This is one big progress to use flang-new for real workloads. Mostly looks good to me. I have several minor comments. If supporing -Ofast requires more changes, I prefer to push forward with the current patch.

summary: fronted -> frontend

Confirmed with the following case:

program m
  integer :: x = 1, y = 2, z = 0
  call add(x, y, z)
  print *, z
  subroutine add(a, b, c)
    integer :: a, b, c
    integer :: i
    do i = 1, 10
      c = c + a + b
$ flang-new -fc1 -emit-llvm test.f90 -O3
$ cat test.ll 
; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target triple = "aarch64-unknown-linux-gnu"

@_QFEz = internal unnamed_addr global i32 0
@_QQcl.2E2F746573742E66393000 = linkonce constant [11 x i8] c"./test.f90\00"

define void @_QQmain() local_unnamed_addr !dbg !3 {
  %1 = load i32, ptr @_QFEz, align 4, !dbg !7
  %2 = add i32 %1, 30, !dbg !7
  store i32 %2, ptr @_QFEz, align 4, !dbg !7
  %3 = tail call ptr @_FortranAioBeginExternalListOutput(i32 -1, ptr nonnull @_QQcl.2E2F746573742E66393000, i32 5), !dbg !12
  %4 = tail call i1 @_FortranAioOutputInteger32(ptr %3, i32 %2), !dbg !13
  %5 = tail call i32 @_FortranAioEndIoStatement(ptr %3), !dbg !12
  ret void, !dbg !14
; Function Attrs: argmemonly nofree norecurse nosync nounwind
define void @_QFPadd(ptr nocapture readonly %0, ptr nocapture readonly %1, ptr nocapture %2) local_unnamed_addr #0 !dbg !9 {

Will enabling Ofast require more changes in the flang frontend? If yes, it is OK to support it in another patch later.


I saw clang/include/clang/Basic/CodeGenOptions.def has the following:

VALUE_CODEGENOPT(OptimizationLevel, 2, 0) ///< The -O[0-3] option specified.
VALUE_CODEGENOPT(OptimizeSize, 2, 0) ///< If -Os (==1) or -Oz (==2) is specified.

Do -Os and -Oz need extra processing in flang drivers? If yes, it is OK to support it in another patch later.


Miss the following?

// Coding style:
awarzynski edited the summary of this revision. (Show Details)Jun 22 2022, 4:49 AM
rovka added inline comments.Jun 22 2022, 5:41 AM

I'm not sure I understand the difference between CODEGENOPT and VALUE_CODEGENOPT. Is it that CODEGENOPT is actually a kind of BOOL_CODEGENOPT, that should always have just 1 bit? Do we really need both?


This isn't used yet, can we skip it?


Shouldn't this comment go away too?


Thanks for taking a look, @peixin! Just to clarify - I'm not really looking into -Os, -Ofast or -Oz at the moment. But I'm always happy to review driver patches :)


I would much prefer to have it implemented it in a dedicated patch.

For every option, one has to consider the semantics in the compiler driver (flang-new) vs frontend driver (flang-new -fc1) and then, in case of code-gen flags, the difference between middle-end and back-end pass pipelines. So quite a few things :)

In general, I'm in favor of doing this incrementally, in small patches. This makes reviewing and testing easier and more self-contained. Is that OK?


Do -Os and -Oz need extra processing in flang drivers?

These are code-size optimisation flags, so the logic will be a bit different.

Also, I'm very much in favor of small, incremental and self-contained patches. Splitting this into regular and size optimizations seemed like a natural split.


Good catch, thanks!

awarzynski marked 3 inline comments as done.

Address comments from Peixin and Diana

Main change - removed ENUM_CODEGENOPT and VALUE_CODEGENOPT from

peixin accepted this revision.Jun 22 2022, 6:26 AM



In general, I'm in favor of doing this incrementally, in small patches. This makes reviewing and testing easier and more self-contained. Is that OK?

Cannot agree more.

This revision is now accepted and ready to land.Jun 22 2022, 6:26 AM



Nit: I have committed D126164, and you can rebase and use D directly, which is the style in Clang.cpp.

awarzynski added inline comments.Jun 22 2022, 10:22 AM

Thanks for the heads up and for seeing through! I'll update it now before I forget :)


At one point I convinved myself that I understand the difference, but now I'm re-rereading Clang's CodeGenOptions.def and I also see ... no difference 😂 .

Thanks for pointing this out! Let me remove it.


Yup, good suggestion!

Use D instead of TC.getDriver()

rovka added a comment.Jun 23 2022, 1:45 AM

I just realized I haven't pestered you enough about testing :) Can you add a test that -O4 indeed warns and uses -O3? Also, the summary says this should work in both the compilation and the frontend driver, but you're only testing with %flang_fc1.

awarzynski added a comment.EditedJun 23 2022, 2:19 AM

I just realized I haven't pestered you enough about testing :)

I did feel like I was missing something here, haha :) Updates arriving shortly!

rovka accepted this revision.Jun 27 2022, 2:03 AM

Cool, thanks!

This revision was automatically updated to reflect the committed changes.