This is an archive of the discontinued LLVM Phabricator instance.

Embed bitcode in object file (clang cc1 part)
ClosedPublic

Authored by steven_wu on Feb 18 2016, 9:42 AM.

Details

Summary

Teach clang to embed bitcode inside bitcode. When -fembed-bitcode cc1
option is used, clang will embed both the input bitcode and cc1
commandline into the bitcode in special sections before compiling to
the object file. Using -fembed-bitcode-marker will only introduce a
marker in both sections.

Depends on D17390

Diff Detail

Repository
rL LLVM

Event Timeline

steven_wu updated this revision to Diff 48348.Feb 18 2016, 9:42 AM
steven_wu retitled this revision from to Embed bitcode in object file (clang cc1 part).
steven_wu updated this object.
steven_wu added a subscriber: cfe-commits.
rsmith edited edge metadata.Mar 18 2016, 10:28 AM

From a code standpoint, this looks fine to me, but I'd like to understand a bit more about what you aim to achieve with the various parts here.

include/clang/Frontend/CodeGenOptions.def
57 ↗(On Diff #48348)

What is this "marker only" mode for?

lib/CodeGen/BackendUtil.cpp
752–753 ↗(On Diff #48348)

{ on previous line.

792 ↗(On Diff #48348)

As mentioned in the discussion on cfe-dev, embedding the command line and embedding bitcode appear to be completely orthogonal and separate concerns. Can you explain why it makes sense to control both with the same flag?

793–794 ↗(On Diff #48348)

This doesn't seem like you're saving enough information to be able to replay the build (you're missing the current working directory, for instance). Can you clarify exactly what you want to be able to do with the embedded command-line options?

Hi Richard

Thanks for looking at the patch! Replies are inlined with the feedback.

Steven

include/clang/Frontend/CodeGenOptions.def
57 ↗(On Diff #48348)

marker only mode is used to speed up the compilation while still produce enough information in the final output to check if all the files containing a bitcode section.
-fembed-bitcode option will split the compilation into two stages and it adds measurable costs to compile time due to the extra process launch, the serialization and the verifier. -fembed-bitcode-marker is just a way to avoid that costs but still mark the section for the sanity check later (done by linker in our case).

lib/CodeGen/BackendUtil.cpp
792 ↗(On Diff #48348)

I thought the original discussion is that if the command line is needed for the bitcode embedding. I am happy to add option to toggle command line embedding. Here is the proposal:
-fembed-bitcode is implying -fembed-bitcode=all that embeds both bitcode and command line.
-fembed-bitcode=bitcode only embeds bitcode and no command line.
command line itself is useless so I will not provide an option to do that.

793–794 ↗(On Diff #48348)

From the original discussion, I mentioned that all paths are useless for the purpose of reproducing compilation from bitcode input. Bitcode is very self-contain that it has all the debug info and other source information included. Switching current working directory will not affect the object file generated from the bitcode. Please let me know if I missed something.
On the other hand, we have some backend options that will affect the compilation result. There are few options that are not properly encoded in the bitcode (see D17394) that should be passed from command line to reproduce the exact same compilation. The end goal is for command line section is that for -fembed-bitcode, there are two stages:

  1. source code -> bitcode
  2. bitcode -> object file

Command line section should store all the arguments in stage 2 so it can be replayed if needed.
Without D17394, too much information is embedded in the command line section (warning flags, include paths, etc., all embedded but ignored by clang if input is bitcode). D17394 is trying to trim down the options used in stage 2.

steven_wu updated this revision to Diff 52435.Apr 1 2016, 3:35 PM
steven_wu edited edge metadata.

Address the feedback from Richard:
Break -fembed-bitcode option into multiple -fembed-bitcode= options.
-fembed-bitcode=all will embed both bitcode and commandline and
-fembed-bitcode=bitcode will embed only the bitcode in the object file.

ping. The currently -fembed-bitcode option is only half working. I don't want to leave it like that too long.

steven_wu updated this revision to Diff 54559.Apr 21 2016, 12:00 PM

Ping.

Rebase the patch over trunk and tweak the testcase.

vsk added a subscriber: vsk.Apr 21 2016, 4:57 PM

Just a few nitpicks.

lib/CodeGen/BackendUtil.cpp
785 ↗(On Diff #54559)

You might see a 'drops const qualifier' warning on this line.

Also, calling OS.str() twice calls raw_ostream::flush() twice. Maybe it could be: OS.flush(); ArrayRef(Data.data(), Data.size()).

814 ↗(On Diff #54559)

You might see a 'drops const qualifier' warning on this line.

lib/Frontend/CompilerInvocation.cpp
670 ↗(On Diff #54559)

nit, can you use a range loop here?

steven_wu updated this revision to Diff 54597.Apr 21 2016, 5:24 PM

Thanks Vedant!

Address all of your review in this update.

rsmith added inline comments.May 5 2016, 3:03 PM
lib/CodeGen/BackendUtil.cpp
769–770 ↗(On Diff #54597)

Ping.

774–806 ↗(On Diff #54597)

Can you skip all this if Buf is empty (in the common case where the input is not an IR file)?

lib/Driver/Tools.cpp
3766–3771 ↗(On Diff #54597)

This changes the semantics of clang -fembed-bitcode -fembed-bitcode-marker to embed only a marker and no bitcode. Is that OK?

5724–5728 ↗(On Diff #54597)

Why is this necessary? -fembed-bitcode implies -disable-llvm-optzns anyway.

steven_wu added inline comments.May 5 2016, 4:15 PM
lib/CodeGen/BackendUtil.cpp
769–770 ↗(On Diff #54597)

Sorry, I missed this. Will fix.

774–806 ↗(On Diff #54597)

-fembed-bitcode-marker will actually take this path. -fembed-bitcode-marker will not split the compilation into two stages thus the input is not IR and Buf is empty but clang needs to generate a marker.

lib/Driver/Tools.cpp
3766–3771 ↗(On Diff #54597)

It wasn't clearly defined so I am ok with changing the semantics. It makes even more sense that we have different embed options.

5724–5728 ↗(On Diff #54597)

-disable-llvm-passes is for -save-temps option. -fembed-bitcode and -save-temps disable optzns differently.
-save-temps has following outputs: preprocess source, IR emitted from front end, assembly file, object file
-fembed-bitcode has following outputs: optimized IR, object file
They don't share the same intermediate output at all so they need to use -disable-llvm-optzns at different stages.

steven_wu updated this revision to Diff 56367.May 5 2016, 4:16 PM

Format update according to review.

rsmith added inline comments.May 5 2016, 4:53 PM
lib/CodeGen/BackendUtil.cpp
799–831 ↗(On Diff #56367)

I see, the "marker" is an empty llvm.embedded.module constant. Please add a comment to that effect; that's not obvious.

lib/Driver/Tools.cpp
5710–5714 ↗(On Diff #56367)

This means that -fembed-bitcode -save-temps will save different intermediate IR than -save-temps alone; that seems wrong. I would expect that process to save the unoptimized IR but embed the optimized IR. Getting that right will probably require adding another action to the pipeline for that combination of flags.

steven_wu updated this revision to Diff 56371.May 5 2016, 5:12 PM
steven_wu marked 2 inline comments as done.

Add comments to address the feedback fromt the review.

Attach a new patch with the comments

lib/CodeGen/BackendUtil.cpp
799–831 ↗(On Diff #56367)

Sure.

lib/Driver/Tools.cpp
5710–5714 ↗(On Diff #56367)

That is exactly the case. I will add a FIXME here.

rsmith accepted this revision.May 10 2016, 7:17 PM
rsmith edited edge metadata.
This revision is now accepted and ready to land.May 10 2016, 7:17 PM
This revision was automatically updated to reflect the committed changes.