This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Support bitcode input containing multiple modules
ClosedPublic

Authored by MaskRay on Jul 10 2023, 11:45 PM.

Details

Summary

When using -fsplit-lto-unit (explicitly specified or due to using
-fsanitize=cfi/-fwhole-program-vtables), the emitted LLVM IR contains a module
flag metadata "EnableSplitLTOUnit". If a module contains both type metadata
and "EnableSplitLTOUnit", ThinLTOBitcodeWriter.cpp will write two modules
into the bitcode file. Compiling the bitcode (not ThinLTO backend compilation)
will lead to an error due to parseIR requiring a single module.

% clang -flto=thin a.cc -c -o a.bc
% clang -c a.bc
% clang -fsplit-lto-unit -flto=thin a.cc -c -o a.bc
% clang -c a.bc
error: Expected a single module
1 error generated.

There are multiple ways to have just one module in a bitcode file
output: -Xclang -fno-lto-unit, not using features like -fsanitize=cfi,
using -fsanitize=cfi with -fno-split-lto-unit. I think whether a
bitcode input file contains 2 modules (internal implementation strategy)
should not be a criterion to require an additional driver option when
the user seek for a non-LTO compile action.

Let's place the extra module (if present) into CodeGenOptions::LinkBitcodeFiles
(originally for -cc1 -mlink-bitcode-file). Linker::linkModules will link the two
modules together. This patch makes the following commands work:

clang -S -emit-llvm a.bc
clang -S a.bc
clang -c a.bc

Diff Detail

Event Timeline

MaskRay created this revision.Jul 10 2023, 11:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2023, 11:45 PM
MaskRay requested review of this revision.Jul 10 2023, 11:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2023, 11:45 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

If I follow correctly, this is basically undoing the splitting that was done by the command that produced the bitcode file? I guess that could be useful. But it requires either renaming your object files from the default ".o" to ".bc", or explicitly passing "-x ir"? That seems unintuitive. Maybe it's better to put this behind some explicit flag?

If I follow correctly, this is basically undoing the splitting that was done by the command that produced the bitcode file?

Yes, undoing llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp change that would make the output bitcode file not usable as an input for non-LTO use cases.

I guess that could be useful. But it requires either renaming your object files from the default ".o" to ".bc", or explicitly passing "-x ir"? That seems unintuitive. Maybe it's better to put this behind some explicit flag?

Yes, specify -x ir or let the driver deduce the file type with the predefined extension .bc.

I think this is a less common operation (compiling with LTO but then using as non-LTO), so I think adding another option seems not necessary.

If I follow correctly, this is basically undoing the splitting that was done by the command that produced the bitcode file?

Yes, undoing llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp change that would make the output bitcode file not usable as an input for non-LTO use cases.

I guess that could be useful. But it requires either renaming your object files from the default ".o" to ".bc", or explicitly passing "-x ir"? That seems unintuitive. Maybe it's better to put this behind some explicit flag?

Yes, specify -x ir or let the driver deduce the file type with the predefined extension .bc.

I think this is a less common operation (compiling with LTO but then using as non-LTO), so I think adding another option seems not necessary.

Perhaps rephrase my argument:

There are multiple ways to have just one module in a bitcode file output: -Xclang -fno-lto-unit, not using features like -fsanitize=cfi, using -fsanitize=cfi with -fno-split-lto-unit.
I think whether a bitcode input file contains 2 modules (internal implementation strategy) should not be a criterion to require an additional driver option when the user seek for a non-LTO compile action.

This is a good way to resolve the inconsistencies I was looking at in my own review.

I think whether a bitcode input file contains 2 modules (internal implementation strategy) should not be a criterion to require an additional driver option

Agreed. Most users don't know about the internal structure of bitcode modules, so I think it's best to use a consistent interface for all bitcode files.

clang/lib/CodeGen/CodeGenAction.cpp
1174

Perhaps this variable name could be more descriptive. Maybe "FirstM" or something like that?

MaskRay updated this revision to Diff 543083.Jul 21 2023, 2:49 PM
MaskRay edited the summary of this revision. (Show Details)

rename a variable

MaskRay updated this revision to Diff 543084.Jul 21 2023, 2:50 PM

rename a variable

ormris accepted this revision.Jul 21 2023, 4:50 PM

LGTM

This revision is now accepted and ready to land.Jul 21 2023, 4:50 PM
This revision was landed with ongoing or failed builds.Jul 21 2023, 8:05 PM
This revision was automatically updated to reflect the committed changes.