This is an archive of the discontinued LLVM Phabricator instance.

[clang] Allow disassembly of multi-module bitcode files
Needs RevisionPublic

Authored by ormris on Jun 27 2023, 11:40 AM.

Details

Summary

Clang currently exits with an error message if asked to disassemble a
multi-module bitcode file which is used by ThinLTO to allow for Whole
Program Devirtualization and CFI. This might be confusing to some who are not
familiar with bitcode formats. In addition, multi-module bitcode files
are reasonably common. This patch makes clang's behavior consistent for
all common bitcode files.

Example usage:

clang -c -flto=thin -fwhole-program-vtables -fvisibility=hidden virtual.cpp
clang -x ir -S -emit-llvm virtual.o
# Writes virtual.0.ll and virtual.1.ll

Diff Detail

Event Timeline

ormris created this revision.Jun 27 2023, 11:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2023, 11:40 AM
ormris requested review of this revision.Jun 27 2023, 11:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2023, 11:40 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ormris updated this revision to Diff 535202.Jun 27 2023, 5:17 PM

clang-format

FWIW, the precommit CI Windows bot found an issue with the newly added test case. Maybe test the Itanium and MS ABIs separately? (Uncertain if an Itanium-only test is appropriate but that's another option.)

Yes, I'm working on a new test that's less target dependent.

ormris updated this revision to Diff 538867.Jul 10 2023, 4:31 PM

Attempt to fix pre-merge checks.

MaskRay added inline comments.Jul 10 2023, 6:16 PM
clang/test/Frontend/split-lto-ir-support.cpp
2

Without -fsanitize=cfi or -fwhole-program-vtables, -fsplit-lto-unit is not the default. You need to specify this option explicitly.

%clang is normally used for test/Driver tests. For other tests, prefer %clang_cc1.

16
MaskRay requested changes to this revision.Jul 10 2023, 10:03 PM

[clang] Allow disassembly of multi-module bitcode files

The subject confused me as I did not recognize what disassembly means :)

Clang currently exits with an error message if asked to disassemble a multi-module bitcode file. [...]

The description is hand-waving on how Clang uses multi-module bitcode files.
You can be more specific that multi-module bitcode files are for -fsanitize=cfi and -fwhole-program-vtables. And it will be more useful to include an example in the summary.

It seems that you want to do this:

myclang -Xclang -flto-unit -fsplit-lto-unit -flto=thin a.c -c -o a.bc
myclang -S -emit-llvm a.bc -o b.ll
ls b.0.ll b.1.ll

This is a bit odd as the -o file may no longer an output. This could be fixed, but is this functionality really useful?

% rm -f a.0.ll a.1.ll; myclang -c a.bc -o a.o ; ls a.[01].ll
a.0.ll  a.1.ll

We can just disassemble a bitcode with llvm-dis a.bc -o -. The nicer thing is that llvm-dis additionally calls ModuleSummaryIndex::print to print the module summary index (if present).
clang -S -emit-llvm -x ir a.bc output doesn't have this information.

clang/test/Frontend/split-lto-ir-support.cpp
2

We need to hard code a target triple as many targets don't support LTO.

This revision now requires changes to proceed.Jul 10 2023, 10:03 PM
This comment was removed by ormris.
ormris edited the summary of this revision. (Show Details)Jul 18 2023, 11:59 AM

You can be more specific that multi-module bitcode files are for -fsanitize=cfi and -fwhole-program-vtables. And it will be more useful to include an example in the summary.

Fixed.

This is a bit odd as the -o file may no longer an output. This could be fixed, but is this functionality really useful?

I think it's useful to allow this kind of disassembly. Users who want to quickly inspect a bitcode files produced by the compiler could run them through clang first. While llvm-dis is the normal tool to do this, but some users don't know or have access to llvm-dis.

The nicer thing is that llvm-dis additionally calls ModuleSummaryIndex::print to print the module summary index (if present).

True, though this functionality could be added. I think users are more likely to be concerned about how their code was optimized by the LTO pre-link pipeline, say, then what the summary contains.

Is a simpler change like D154923 sufficient? It also handles the case when we don't emit .ll, but .o or .s.

I don't like this doesn't write to the output file by -o too. I also think the output should just match llvm-dis output, instead of splitting into multiple files.