This is an archive of the discontinued LLVM Phabricator instance.

[clang][llvm][lld] FatLTO Prototype
AbandonedPublic

Authored by paulkirth on Aug 10 2022, 2:12 PM.

Details

Summary

This patch adds support to clang, llvm, and lld for fat-lto-objects.

Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature availble in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.

This patch adds support for that flag in clang, as well as setting the
necessary codegen options for the backend. Within LLVM, we add a new
EmbedBitcodePass that serializes the module to the object file.

Lastly we add a new -fat-lto-objects flag to LLD, and slightly change
how it chooses input files in the driver when the flag is set.

Users are expected to pass -ffat-lto-objects to clang in addition to
one of the -flto variants. Without the -flto flag,
-ffat-lto-objects has no effect.

In LLD, if -ffat-lto-objects are enabled and the input files are fat
object files, then the linker will chose the LTO compatible bitcode
sections embedded within the fat object and link them together using
LTO. Otherwise, standard object file linking is done using the assembly
section in the object files.

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Diff Detail

Event Timeline

arda created this revision.Aug 10 2022, 2:12 PM
arda requested review of this revision.Aug 10 2022, 2:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2022, 2:12 PM
arda removed a reviewer: MaskRay.Aug 10 2022, 2:14 PM
arda edited reviewers, added: phosek, paulkirth; removed: MaskRay.Aug 10 2022, 2:14 PM
arda updated this revision to Diff 451904.Aug 11 2022, 10:27 AM

Reverting the magic number based design. FatLTO object files stay as ELF files
with embedded bitcode. That's also how GCC does it. Current change checks if
there is a bitcode section named .fatlto.

arda updated this revision to Diff 451933.Aug 11 2022, 11:31 AM

Fix the order of bitcode and fatLTO section check.

arda updated this revision to Diff 451985.Aug 11 2022, 2:23 PM

Add the -fat-lto-objects flag as a prerequisite to checks.

arda updated this revision to Diff 452234.Aug 12 2022, 10:48 AM

Do not aggregate the .fatlto section when generating non-LTO executable.

I know this is WIP, but I think there are some Q's worth discussing as work progresses.

  1. How do "fat" objects interact w/ existing object types? i.e., regular object files and normal LTO objects?
  2. How do we control the behavior? can I just mix them? Are there combinations that should not work?
  3. Are the plans for testing? do we need special tests for the new format? or are the differences insignificant? can they be covered by existing tests?

I haven't seen these points discussed in the RFC or in these patches. How these are intended to work should probably be spelled out somewhere. Probably both in the code and in the documentation.

paulkirth added inline comments.Aug 15 2022, 10:04 AM
lld/ELF/Driver.cpp
2636–2640 ↗(On Diff #452234)

maybe it makes sense to check for pointer validity separately now that there are 2 uses?

arda marked an inline comment as done.Aug 15 2022, 10:28 AM
arda updated this revision to Diff 452735.Aug 15 2022, 10:28 AM

Check for the pointer validity separately

paulkirth added inline comments.Aug 15 2022, 10:32 AM
lld/ELF/Driver.cpp
2636 ↗(On Diff #452735)

LLVM coding standard prefers early return/continue to avoid nesting/indentation.

see https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code

arda updated this revision to Diff 452741.Aug 15 2022, 10:43 AM
arda marked an inline comment as done.

Check for the pointer validity separately with early continue.

arda updated this revision to Diff 453147.Aug 16 2022, 3:51 PM

Add a draft test. Not sure if this is the best way to verify the executables
are identical. I should probably ignore sections like .comment. I would
appreciate a feedback on the approach.

Is there a clang part of this change?

arda added a comment.Aug 17 2022, 10:04 AM

Is there a clang part of this change?

Yes,
https://reviews.llvm.org/D131090

I will be merging that patch to this one soon though.

arda updated this revision to Diff 453345.Aug 17 2022, 10:50 AM

Add clang and llvm related changes

Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2022, 10:50 AM
arda updated this revision to Diff 453386.Aug 17 2022, 11:54 AM

Replace section name from .fatlto to .llvm.lto

arda updated this revision to Diff 453550.Aug 18 2022, 12:42 AM

Use obj2yaml and yaml2obj to avoid unreadable object files

arda updated this revision to Diff 453751.Aug 18 2022, 12:24 PM

Add a clang Driver test

paulkirth added inline comments.Aug 18 2022, 2:59 PM
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
5080 ↗(On Diff #453751)

Since this has been merged into this commit, I'll restate my concerns here from D131090.

Does this just do what embed bitcode does? My understanding was that LTO generated IR is emitted into object files differently than the embed bitcode scenario. I'm basing that on the discussion at https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

What I mean is that the way EmbedBitcode worked was not TheRightThing according to that discussion. It appears as though your change just copies its logic.

So if I want to compile a C file and get a fat-LTO object, how does that work? When do the LTO passes get added to the compilation pipeline? I don't mean the big whole program optimization stuff that gets invoked at link-time, I mean the module level passes and summary generation? It's not clear to me when those will run after invoking clang on a source file, since I don't see any config for setting that up as part of codegen.

can you elaborate on how this is intended to work? I seem to be missing something. In particular, I'd like to some kind of documentation describing how all the pieces are intended to work. I'd also like to see something outlining any limitations, such as different combinations of object files/flags that are not expected to work and why?

arda updated this revision to Diff 454048.Aug 19 2022, 9:34 AM

Invoke -flto=thin in the first stage of -ffat-lto-objects

phosek added inline comments.Aug 19 2022, 11:09 AM
clang/lib/Driver/ToolChains/Clang.cpp
7125–7127 ↗(On Diff #454048)

Rather than assuming ThinLTO, I think this patch should assume Unified LTO as was suggested in https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977.

clang/test/Driver/fat-lto-objects.c
2–3 ↗(On Diff #454048)

I don't think this test is useful, IIUC it checks that Clang accepts the -ffat-lto-objects flag, but that's also checked by the test below.

lld/test/ELF/fatlto/Inputs/a-thinLTO.ll
1–66 ↗(On Diff #454048)

Rather than using the output of Clang which tends to include a lot of unnecessary stuff, I'd consider either writing the minimal .ll file manually, or pruning this one to drop everything that's not strictly needed for the test.

lld/test/ELF/fatlto/Inputs/a.c
1–23 ↗(On Diff #454048)

This seems unnecessarily complicated. You should only include code that's absolutely necessary for the test.

lld/test/ELF/fatlto/Inputs/a.h
1–3 ↗(On Diff #454048)

You can use forward declarations and avoid this header altogether.

lld/test/ELF/fatlto/Inputs/main-thinLTO.ll
1–48 ↗(On Diff #454048)

Rather than using the output of Clang which tends to include a lot of unnecessary stuff, I'd consider either writing the minimal .ll file manually, or pruning this one to drop everything that's not strictly needed for the test.

lld/test/ELF/fatlto/Inputs/main.c
1–10 ↗(On Diff #454048)

This seems unnecessarily complicated. You should only include code that's absolutely necessary for the test.

lld/test/ELF/fatlto/fatlto.test
4–6 ↗(On Diff #454048)

LLD tests cannot depend on clang, that's a layering violation. It also doesn't seem like the output is being used anywhere so this could be safely removed.

12 ↗(On Diff #454048)

The convention is to use uppercase names for prefixes.

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
5080 ↗(On Diff #453751)

I suspect that the approach we'll want to use is going to be akin to D130777 which uses a pass to control where in the pipeline the bitcode gets emitted.

paulkirth added inline comments.Aug 19 2022, 2:00 PM
lld/ELF/InputFiles.cpp
1721

Do we just keep going if there's an error? This is only checking if there's not an error, right? or does takeError do more than I remember?

phosek added inline comments.Aug 19 2022, 2:12 PM
lld/test/ELF/fatlto/Inputs/a-fatLTO.yaml
23 ↗(On Diff #454048)

I think it'd be much nicer if you inline the LLVM IR in the YAML file and have yaml2obj convert that to bitcode automatically (similarly to how yaml2obj also handles DWARF inside ELF). That's not supported now as far as I'm aware and should be implemented as a separate change. It might be worth leaving a comment here explaining how the content of this section was produced and having a TODO regarding the yaml2obj improvement.

arda updated this revision to Diff 454585.Aug 22 2022, 11:58 AM
arda marked 9 inline comments as done.

Reimplement embed-bitcode part as a separate pass

arda added inline comments.Aug 22 2022, 12:00 PM
lld/test/ELF/fatlto/fatlto.test
4–6 ↗(On Diff #454048)

Sorry for the confusion. These ones are just comments and do not run. I will remove them.

paulkirth added a comment.EditedAug 22 2022, 1:59 PM

I think the test strategy here needs to be:

  • Flags
    • clang, llvm, lld can parse the fat-object flags and configure themselves correctly
  • Format checking
    • llvm generates the correct object file sections when configured for fatlto
    • clang generates fatlto objects correctly (i.e. correctly sets up the codegen pipeline in llvm)
    • LLD can correctly link fat objects
  • Optimizations
    • the LTO pipeline is configured correctly
    • LLD can optimize fat objects the same way as normal bitcode files
clang/lib/Driver/ToolChains/Clang.cpp
7120–7122 ↗(On Diff #454585)

Do we always do this under -flto=full, now? Doesn't that just essentially disable full LTO, since you set it to 'thin' when compiling for fat objects?

clang/test/CodeGen/embed-lto-fatlto.c
4–5 ↗(On Diff #454585)

What property are you trying to check here? that the full LTO mode is replaced with thin?

lld/test/ELF/fatlto/fatlto.test
28–30 ↗(On Diff #454585)

Can this test fail given how you generate the linker's inputs and what you're comparing?

So, does the diff fail if you don't pass the -fat-lto-objects flag on Line 13?

arda marked 2 inline comments as done.Aug 22 2022, 9:36 PM
arda added inline comments.
lld/ELF/InputFiles.cpp
1721

We just keep going if there is an error. So we end up using the assembly and not the embedded LTO module when generating the object file if there is no embedded bitcode in the input files.

lld/test/ELF/fatlto/fatlto.test
28–30 ↗(On Diff #454585)

Yes, it fails.

sfertile added inline comments.
llvm/lib/Bitcode/Writer/EmbedBitcodePass.cpp
26 ↗(On Diff #454585)

From the discourse discussion:

  1. it was suggested that we remove the existing -fembed-bitcode functionality as Apple has stop supporting it.
  2. mentioned that MLGO uses the option to embed the bitcode at various points in the pipeline depending on if its using LTO our not.

Do we want the pass to be a bit more generic and be able to specify the global to use for embedding, and the section name to use as arguments? That way MLGO can keep using the section name it uses now . It also helps consuming tools to disambiguate between bitcode embedded for lto purpose from bitcode embedded for other purposes.

arda marked 2 inline comments as done.Aug 23 2022, 5:43 PM
arda added inline comments.
llvm/lib/Bitcode/Writer/EmbedBitcodePass.cpp
26 ↗(On Diff #454585)

I should have an answer in a couple of days. I will keep you updated.

arda marked an inline comment as done.Aug 24 2022, 10:53 PM
arda marked an inline comment as done.Aug 29 2022, 10:58 AM
arda added inline comments.
llvm/lib/Bitcode/Writer/EmbedBitcodePass.cpp
26 ↗(On Diff #454585)

We have talked to folks at MLGO and reached a decision that their use case is sufficiently different from ours that it should be handled separately. More detailed discussion here:

[[ https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977/15?u=arda | [RFC] -ffat-lto-objects support on Discours ]]

paulkirth commandeered this revision.Feb 23 2023, 10:31 AM
paulkirth edited reviewers, added: arda; removed: paulkirth.

I'll be updating this patch soon with a rebased version. Likely that will need to be split up into a series of smaller patches.

There is also some design work that needs to be addressed. Primarily around how/when bitcode sections are emitted and object code generated. It may also be necessary to revisit some of the details in its LLD support, given changes to the codebase since the original patch.

paulkirth retitled this revision from [WIP][Do NOT review] LLD related changes for -ffat-lto-objects support to [clang][llvm][lld] FatLTO Prototype.
paulkirth edited the summary of this revision. (Show Details)

Rebase revision. Abandon parent revision on unified lto.

  • Small changes to the LLD implementation were required.

Note: I've only rebased this change so far, I've made no attempt to improve its quality.

paulkirth added inline comments.Feb 23 2023, 10:40 AM
lld/test/ELF/fatlto/Inputs/a-fatLTO.yaml
23 ↗(On Diff #454048)

Should we just do this now? it seems like a useful feature.

phosek added inline comments.Feb 23 2023, 11:57 AM
lld/test/ELF/fatlto/Inputs/a-fatLTO.yaml
23 ↗(On Diff #454048)

I think that would be helpful.

Seems like the EmbedBitcodePass comes straight from D130777. It may make more sense to drop that part of the patch here, and rebase on top of it.

On the downside, from what I can see this patch and D130777 have an issue in that they don't really do the right thing in terms of optimization pipeline configuration or making a good LTO compatible bitcode section.

First, when we set

if (Args.hasArg(options::OPT_ffat_lto_objects))
  Output = types::TY_PP_Asm;

in Driver.cpp, it changes the type of the backend action we launch, which changes the values of IsLTO and IsThinLTO(https://github.com/llvm/llvm-project/blob/53689fdfb29767a12b4d5ad41c67a705a3c622de/clang/lib/CodeGen/BackendUtil.cpp#L898).
In turn, this alters how select an optimization pipeline in BackendUtil.cpp, since LTO flags aren't set up correctly. (https://github.com/llvm/llvm-project/blob/53689fdfb29767a12b4d5ad41c67a705a3c622de/clang/lib/CodeGen/BackendUtil.cpp#L978)

The other issue is that, from what I can tell, EmbbedBitcodePass just dumps the module as is, and doesn't try to make one appropriate for LTO/ThinLTO at all.

For now I want to try and reconcile these two issues.

There is a simple and crude way to handle the pass pipeline: We can create a new pass manager, build the correct LTO/ThinLTO pipeline, and register the EmbedBitcodePass at the end of it. If we run that, then we can hand the module back to the already configured per module PM and let it run. The downside to this is that the (Thin)LTO pipeline we set up will probably be missing things that were added outside of the default pipeline. I'm not sure how to address that. We could copy all of the passes, but that seems like it would have its own issues. The best course of action may be to simply factor those bits out into helper functions and try to keep the existing logic in place. I'm not sure there is a very satisfactory way to address the situation. If possible, running a second backend action would be my preference, but I don't see how we could get that to work without significant refactoring.

The situation is more straightforward for improving the EmbedBitcodePass. I think we just need to parameterize the pass and let it select the correct flavor of BitcodeWriter for LTO modules and summaries.

lld/test/ELF/fatlto/Inputs/a-fatLTO.yaml
23 ↗(On Diff #454048)

Alright, I'll look into that once I have a better idea of what we would like to change here.

paulkirth updated this revision to Diff 501341.Feb 28 2023, 5:15 PM

Allow EmbedBitcodePass to emit sections for Thin + Full LTO

  • Improve options handling when FatLTO is enabled, so LTO related flags are not dropped in CC1
  • Misc code cleanups.

I still view this as very much a WIP, but the prototype now can encode .llvm.lto sections with ThinLTO summaries, and seems to work better in terms of managing an optimizatipon pipeline.

paulkirth updated this revision to Diff 503568.Mar 8 2023, 4:47 PM

Minor update to tests.

  • drop yaml2obj in favor of using llc + objcopy
  • remove c files

Note one tests now fails, though.
For some reason w/ FatLTO the linker rewrites
xor %eax, %eax to movb $0x0, %al.

I haven't figured out why, since the input files were generated the same way w/ llc,
and I've never seen a linker do that. I also confirmed that the input sections were identical.

I've split this into smaller patch sets by project:

LLVM: https://reviews.llvm.org/D146776
Clang: https://reviews.llvm.org/D146777
LLD: https://reviews.llvm.org/D146778

If we're happy with that, then we can probably abandon this revision in favor of those.