This is an archive of the discontinued LLVM Phabricator instance.

[Flang] Add support to use LTO specific pipelines
ClosedPublic

Authored by mnadeem on Jan 23 2023, 4:50 PM.

Details

Summary

Thin and full LTO modes use different pre-link pipelines compared to regular compilation. This patch adds support for calling those pipelines.

This patch closely mimics clang implementation with the exception that I changed the codegen option name from PrepareForLTO to PrepareForFullLTO to be more precise.

With this patch:

  • Compilation for full LTO should be as we expect (except possibly missing optimizations enabled by module summaries which we do not produce yet)
  • thinLTO uses the correct prelink pipeline but will use the postlink backend for fullLTO due to missing metadata and summary in the llvm module. I have added a warning regarding this: flang-new: warning: the option '-flto=thin' is a work in progress

Diff Detail

Event Timeline

mnadeem created this revision.Jan 23 2023, 4:50 PM
Herald added a project: Restricted Project. · View Herald Transcript
mnadeem requested review of this revision.Jan 23 2023, 4:50 PM

Thanks @mnadeem for this patch. Have you tried LTO and seen improvements? Or should flang emit additional info for this to work well?

This patch closely mimics clang behaviour with two exceptions:
No standalone option -flto for fc1. This is not needed if we always pass flto=xxx
Change the codegen option name from PrepareForLTO to PrepareForFullLTO to be more precise.

Is this a simplification/improvement or is there a strong case for these changes in Flang?

Are additional changes required for emitting assembly or bitcode?

flang/test/Driver/driver-help-hidden.f90
47

Does this option's description need a change since there are multiple LTO modes?

mnadeem updated this revision to Diff 492178.Jan 25 2023, 10:37 AM
mnadeem edited the summary of this revision. (Show Details)

Thanks @mnadeem for this patch. Have you tried LTO and seen improvements? Or should flang emit additional info for this to work well?

This patch closely mimics clang behaviour with two exceptions:
No standalone option -flto for fc1. This is not needed if we always pass flto=xxx
Change the codegen option name from PrepareForLTO to PrepareForFullLTO to be more precise.

Is this a simplification/improvement or is there a strong case for these changes in Flang?

Are additional changes required for emitting assembly or bitcode?

We have tried full LTO on Aarch64 (without this patch) and have seen a few 3-8% improvements in SPEC and a 100+% improvement in leslie3d.
There were a couple of additional failures in SPEC that we have yet to inspect but AFAICT no issue with respect to bitcode/assembly changes.

Is this a simplification/improvement or is there a strong case for these changes in Flang?

I have updated the change and now the only difference is the name for PrepareForFullLTO for clarity.

flang/test/Driver/driver-help-hidden.f90
47

No I dont think so. -flto is full lto.
-flto= can take either full or thin.

We probably need tests for the following:
-> Additional linker invocation options are created. (Is this the plugin option?)
-> Test to ensure that the emitted object code contains the additional info require for LTO.

I see some additional info in the LLVM IR generated with LTO in clang. Example below. Isn't this info required for Flang as well?

^0 = module: (path: "", hash: (0, 0, 0, 0, 0))
^1 = gv: (name: "add", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 8, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0)))) ; guid = 2232412992676883508
^2 = flags: 8
^3 = blockcount: 1
flang/test/Driver/driver-help-hidden.f90
47

Ahh, OK. I got confused between -flto= and flto. Thanks.

tblah added a subscriber: tblah.Jan 26 2023, 3:34 AM

We probably need tests for the following:
-> Additional linker invocation options are created. (Is this the plugin option?)
-> Test to ensure that the emitted object code contains the additional info require for LTO.

I see some additional info in the LLVM IR generated with LTO in clang. Example below. Isn't this info required for Flang as well?

^0 = module: (path: "", hash: (0, 0, 0, 0, 0))
^1 = gv: (name: "add", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 8, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0)))) ; guid = 2232412992676883508
^2 = flags: 8
^3 = blockcount: 1

I'll plan to add a test for the linker invocation.

Regarding the additional info for LTO. I looked a bit and here is my understanding (I may be wrong):

The TODOs would be:

  • Add support for -f[no]split-lto-unit
  • Emit LTO summary
  • Use ThinLTOBitcodeWriterPass for thin lto

To match clang behaviour I am thinking of just adding the functionality necessary for full LTO and leaving the thinLTO part for later and give a warning on using thinLTO. What do you think?

To match clang behaviour I am thinking of just adding the functionality necessary for full LTO and leaving the thinLTO part for later and give a warning on using thinLTO. What do you think?

Fine with me, provided this is clearly documented in commit summary.

clang/lib/Driver/ToolChains/Flang.cpp
306–312

Are these options parsed by the Clang driver library? If so, could you add a comment?

flang/lib/Frontend/CompilerInvocation.cpp
138–148

Please follow https://github.com/llvm/llvm-project/blob/main/flang/lib/Frontend/CompilerInvocation.cpp#L9 for the coding style. I see that that was not followed in the block below. Please feel encouraged to update that as well :) Preferably in a separate patch, but no problem if this happens here. And sorry for not pointing out before.

140–147

I wouldn't worry about the diagnostic in the frontend driver - it's intended for developers. You could add an assert instead.

flang/test/Driver/lto-flags.f90
2

How about -flto=jobserver, -flto=auto and -fno-lto?

15

[nit] Could you add an empty line to make this easier to parse.

18

[nit] You can probably skip one of the empty lines here.

mnadeem updated this revision to Diff 492887.Jan 27 2023, 1:28 PM
mnadeem marked 8 inline comments as done.
mnadeem edited the summary of this revision. (Show Details)
  • Address comments.
  • Fix formatting for some old code as well.
  • Update commit message.
mnadeem added inline comments.Jan 27 2023, 1:30 PM
flang/lib/Frontend/CompilerInvocation.cpp
138–148

Changed to camelBack

mnadeem edited the summary of this revision. (Show Details)Jan 27 2023, 1:30 PM
mnadeem updated this revision to Diff 492921.Jan 27 2023, 3:15 PM

Add the new warning to a group to fix clang test failure.

Thanks for the updates! This mostly looks OK to me. I will be away for a week, but am happy for this to be merged once my comments are addressed.

thinLTO uses the correct prelink pipeline but will use the postlink backend for fullLTO due to missing information in the llvm module. I have added a warning regarding this: '-flto=thin' is only partially supported for flang

What does "partially" mean to the user? Bear in mind that you are exposing this to the end-user that will often be unfamiliar with the compiler internals. Perhaps rephrase as "work-in-progress" or "experimental"?

clang/include/clang/Basic/DiagnosticDriverKinds.td
116–118 ↗(On Diff #492921)
  1. I think that this diagnostic is unlikely to be re-used and you would probably just issue it inline.
  1. What do you mean by`flang`? The driver is called flang-new and the subproject is called Flang. ATM, flang is merely a sub-directory within llvm-project.
  1. You can skip "for flang" as the driver name will be displayed anyway (as in, this does not add any new information to the diagnostic).
clang/lib/Driver/ToolChains/Flang.cpp
306

It's "The Clang project" rather than "The clang project": https://clang.llvm.org/

mnadeem updated this revision to Diff 493030.Jan 28 2023, 1:54 PM
mnadeem marked an inline comment as done.
mnadeem edited the summary of this revision. (Show Details)
mnadeem added inline comments.Jan 28 2023, 1:58 PM
clang/include/clang/Basic/DiagnosticDriverKinds.td
116–118 ↗(On Diff #492921)

you would probably just issue it inline

Not sure what you mean by this, can you elaborate? Do you mean without adding it in DiagnosticDriverKinds.td? I don't know how to do that.

What do you mean by`flang`?
$ flang-new --help
OVERVIEW: flang LLVM compiler

You can skip "for flang" as the driver name will be displayed anyway (as in, this does not add any new information to the diagnostic).

Done

Would it help to compile a lto.f90 to lto.o and check with llvm-objdump that there is a special section?

mnadeem marked an inline comment as done.Feb 2 2023, 5:34 PM

Would it help to compile a lto.f90 to lto.o and check with llvm-objdump that there is a special section?

The .o is just llvm bitcode

clang/include/clang/Basic/DiagnosticDriverKinds.td
116–118 ↗(On Diff #492921)

made it inline

mnadeem updated this revision to Diff 494482.Feb 2 2023, 5:35 PM

Would it help to compile a lto.f90 to lto.o and check with llvm-objdump that there is a special section?

The .o is just llvm bitcode

Agreed. But you should be able to prove that LTO worked as expected. You should be able to distinguish LTO and non-LTO files. You only check the driver arguments.

Would it help to compile a lto.f90 to lto.o and check with llvm-objdump that there is a special section?

The .o is just llvm bitcode

Agreed. But you should be able to prove that LTO worked as expected. You should be able to distinguish LTO and non-LTO files. You only check the driver arguments.

I like this idea. And you could use llvm-dis instead that we already use in testing: https://github.com/llvm/llvm-project/blob/c7b1176e9afbfcc3da9482abbf7c1eb8793ff254/flang/test/Driver/emit-llvm-bc.f90#L3-L4

ping @mnadeem. Do you need any inputs or helpf to complete this patch?

mnadeem updated this revision to Diff 500838.Feb 27 2023, 10:23 AM

Add LTO bitcode test.

LGTM. Please wait for @awarzynski.

flang/test/Driver/lto-bc.f90
3

Nit: Test that the output is LLVM bitcode for LTO and not a native objectfile by disassembling it to LLVM IR.

Nit: This probably can be taken up separately. There is metadata about ThinLTO being off in full mode.
!5 = !{i32 1, !"ThinLTO", i32 0}

22

Nit: newline

flang/test/Driver/lto-flags.f90
12

I see "-plugin-opt=mcpu=generic" for -flto in clang as well. Is it different in flang and only applicable to thin mode?

This revision is now accepted and ready to land.Feb 28 2023, 5:56 AM
awarzynski added inline comments.Feb 28 2023, 8:24 AM
flang/test/Driver/lto-bc.f90
2–3

I thought that the purpose of this test was to demonstrate that adding -flto does change the output module. How come it does not?

mnadeem updated this revision to Diff 501214.Feb 28 2023, 10:25 AM
mnadeem marked an inline comment as done.

update comments in the test and add newline at the end

mnadeem marked an inline comment as done.Feb 28 2023, 10:30 AM
mnadeem added inline comments.
flang/test/Driver/lto-bc.f90
3

Nit: This probably can be taken up separately. There is metadata about ThinLTO being off in full mode.

I put the TODOs in the commit message.

I thought that the purpose of this test was to demonstrate that adding -flto does change the output module. How come it does not?

More work is needed to support that, there is some discussion above.

flang/test/Driver/lto-flags.f90
12

I'm only checking for "-plugin-opt=thinlto" which is specific to thin lto and a differentiator between thin/full. The mcpu plugin option is present in both clang and flang and thin/full lto but I dont test for it.

[nit]

After this patch:

Do you mean "after this patch lands, I will work on this things ..." , or "after this patch lands, the following becomes available"? You could disambiguate by replacing with "With this patch:", wdyt?

flang/test/Driver/lto-bc.f90
3

I put the TODOs in the commit message.

IMHO, the commit message should focus on "what" and "why" is happening within this particular patch instead.

More work is needed to support that, there is some discussion above.

Please, could you make sure that the explanation is added to the commit message? As for the test - if there's no difference regardless of the flag, I would also add this RUN line:

! RUN: %flang %s -c -o - | llvm-dis -o - | FileCheck %s
mnadeem edited the summary of this revision. (Show Details)Mar 2 2023, 10:19 AM
mnadeem marked an inline comment as done.Mar 2 2023, 10:44 AM

[nit]

After this patch:

Do you mean "after this patch lands, I will work on this things ..." , or "after this patch lands, the following becomes available"? You could disambiguate by replacing with "With this patch:", wdyt?

Done. Replaced with "With this patch:"

Hi @awarzynski I think we are going in circles here. Or perhaps I am not understanding what you want here, either way it is frustrating for me because I don't have a clear picture of what I need to do to get this patch in.

Please let me know exactly what I need to get this patch in.

This patch "adds support to use LTO specific pipelines". Exceptions are mentioned in the commit message and can be worked on incrementally. There is more work needed to add LTO specific data in the module and it can also be added incrementally. The commit message already mentions that we are missing some information in the LLVM module.

I put the TODOs in the commit message.

IMHO, the commit message should focus on "what" and "why" is happening within this particular patch instead.

I have removed the TODOs from the commit message.

More work is needed to support that, there is some discussion above.

Please, could you make sure that the explanation is added to the commit message?

The current commit message already mentions it but I added a couple more words. Let me know if it is still unclear and I will be happy to update it.
To me, more work == todo. The TODOs which I have now removed said:

  • Set module flags for thinlto and split lto unit
  • Emit LTO summary for thin and/or full lto

As for the test - if there's no difference regardless of the flag, I would also add this RUN line:

! RUN: %flang %s -c -o - | llvm-dis -o - | FileCheck %s

It would produce an error: llvm-dis: error: file doesn't start with bitcode header
I am guessing non-lto compilation is probably already tested somewhere, but I can add this if it adds any value. Let me know.

I thought that the purpose of this test was to demonstrate that adding -flto does change the output module. How come it does not?

tschuett suggested that I add a test that compiles an x.f90 file and check that there is a special section. I responded that it is just a (normal) bitcode file, implying that there is no special section at this point.

Then they said that we need to be able distinguish LTO and non-LTO files and you supported the idea. So I added a test that compiles a file with LTO and pipes the output to llvm-dis to make sure that it is indeed LLVM bitcode and not an object file because that is is the only differentiator between LTO and non-LTO output at this time.

awarzynski accepted this revision.Mar 2 2023, 12:55 PM

LGTM, many thanks for working on this!

I don't have a clear picture of what I need to do to get this patch in.

Please just ask for clarification if my comments are unclear. Also feel free to reach out on Flang's slack or via e-mail - I try to be responsive.

it is frustrating for me

I am sorry that you feel this way. I try to help with Flang driver as much as I can, but these days I'm distracted with other things and only review patches that others submit. And when there's a delay of more than a few days between revisions, I effectively have to remind myself the whole context. It's much easier for me when that's available in the summary.

Also, it seems that the LTO support is going to be split into multiple patches. IMHO it would make more sense to communicate the plan for this on Discourse rather than via a commit message (everything that's not explaining "what" and "why" is extra noise that could be avoided). This patch should implement the plan rather than define it.

tschuett suggested that I add a test that compiles an x.f90 file and check that there is a special section. I responded that it is just a (normal) bitcode file, implying that there is no special section at this point.

Yes, and together we have been able to come up with an alternative - that's great team work! You were able to address reviewer's comment and Flang gained a valuable regression tests ;-)

Testing the driver can be tricky, but it's not impossible.

As for the test - if there's no difference regardless of the flag, I would also add this RUN line:

! RUN: %flang %s -c -o - | llvm-dis -o - | FileCheck %s

It would produce an error: llvm-dis: error: file doesn't start with bitcode header

You can add -emit-llvm-bc. You can also add CHECK-NOT with stuff that is eventually going to appear in these modules at some point. That would help to understand/document what is coming. Usually there's a way to make things work in LLVM ;-) Feel free to ignore this suggestion.

Thanks awarzynski, I have been a bit busy but I will try to land this today after making the test changes.
In the future I'll try to communicate any planning beforehand, using discourse, to make reviewing easier. (I'm still a relatively new contributor to LLVM so still learning :) )

This revision was landed with ongoing or failed builds.Mar 9 2023, 1:28 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2023, 1:28 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
tblah added a comment.Mar 10 2023, 7:16 AM

We have tried full LTO on Aarch64 (without this patch) and have seen a few 3-8% improvements in SPEC and a 100+% improvement in leslie3d.
There were a couple of additional failures in SPEC that we have yet to inspect but AFAICT no issue with respect to bitcode/assembly changes.

Hi @mnadeem, thanks for this patch. I'm having some trouble reproducing the improvement on leslie3d. Can you share more information about your configuration? Which flags did you use?

We have tried full LTO on Aarch64 (without this patch) and have seen a few 3-8% improvements in SPEC and a 100+% improvement in leslie3d.
There were a couple of additional failures in SPEC that we have yet to inspect but AFAICT no issue with respect to bitcode/assembly changes.

Hi @mnadeem, thanks for this patch. I'm having some trouble reproducing the improvement on leslie3d. Can you share more information about your configuration? Which flags did you use?

-O3 -flto -mllvm -fdynamic-heap-array on an Aarch64 device.
It was tested back in January, so if the improvement is not showing up right now then it may be possible that some other patch/optimization enabled better codegen in non-lto too.