This is an archive of the discontinued LLVM Phabricator instance.

ThinLTO Minimized Bitcode File Size Reduction
ClosedPublic

Authored by haojiewang on Jul 12 2017, 4:19 PM.

Details

Summary

Currently the ThinLTO minimized bitcode file only strip the debug info, but there is still a lot of information in the minimized bit code file that will be not used for thin linker. In this patch, most of the extra information is striped to reduce the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. Now the minimized bitcode file size is reduced to 15%-30% of the debug info stripped bitcode file size.

Event Timeline

haojiewang created this revision.Jul 12 2017, 4:19 PM
pcc added a reviewer: pcc.Jul 12 2017, 4:26 PM
mehdi_amini added inline comments.Jul 12 2017, 5:25 PM
include/llvm/Bitcode/BitcodeWriter.h
87

s/constrction/construction/

lib/Bitcode/Writer/BitcodeWriter.cpp
100–103

s/Abstrac/Abstract/

But in fact I don't think it class is abstract, is it?

4007

Is a "module summary" described anywhere?

lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
424–425

This should be obsolete right now?

Change some comment. Remove StripDebugInfo function while writing the minimized bitcode file.

haojiewang marked 3 inline comments as done.Jul 12 2017, 5:40 PM
haojiewang added inline comments.
lib/Bitcode/Writer/BitcodeWriter.cpp
4007

I got that name from ModuleSummaryIndex, but it's not a good name. Already changed it to minimized bitcode file.

lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
424–425

Yes, I think so. Already removed it.

haojiewang added inline comments.Jul 12 2017, 5:40 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
100–103

Yes, I also think so. I described it this way because the BitcodeWriterBase class's comment also says it is a abstract class, though I don't think it is. I was confused by this. Should I change both of them?

mehdi_amini added inline comments.Jul 12 2017, 5:48 PM
include/llvm/Bitcode/BitcodeWriter.h
89

Use a const reference then?

91

Same: if non-null required, why not using a reference?

144

Same?

lib/Bitcode/Writer/BitcodeWriter.cpp
4007

I have the same question for "minimized bitcode file" though, it may be some doc debt if this is already used in some place and not defined, just trying to flag these, not blocking your patch.

haojiewang marked 2 inline comments as done.Jul 12 2017, 6:04 PM
haojiewang added inline comments.
include/llvm/Bitcode/BitcodeWriter.h
89

But M is actually also required non-null. I just used the same args list with writeModule. So here using const references instead of pointer is better?

pcc edited edge metadata.Jul 12 2017, 6:11 PM

Please add a test that shows that the thinlink files can be read by llvm-lto2.

lib/Bitcode/Writer/BitcodeWriter.cpp
4031

With this call you're emitting a lot of unnecessary information into the minimized bitcode file. As I was suggesting on the other thread I wonder whether you can just emit "short" module-level records that just contain the name and the linkage (maybe see if you can drop the linkage, as it should already be available in the summary).

test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
49–68

This part of the test is no longer relevant.

50

I'd also check what we *are* emitting, not just what we're not emitting.

mehdi_amini added inline comments.Jul 12 2017, 6:15 PM
include/llvm/Bitcode/BitcodeWriter.h
89

I believe references carry the non_null information explicitly as part of the API.

haojiewang added inline comments.Jul 12 2017, 6:19 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
4031

Yes, this is what I'm going to do after this patch released. So do you suggest I do it in this patch or leave it to next patch?

test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
49–68

You mean that I'd better write a new test file, right? Along with the llvm-lto2 test.

haojiewang added inline comments.Jul 12 2017, 6:21 PM
include/llvm/Bitcode/BitcodeWriter.h
89

Sure, I mean should I also change pointer M to reference or just change the Index and ModHash? And also for pointer M in writeModule?

pcc added inline comments.Jul 12 2017, 6:33 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
4031

I would suggest writing an initial patch that changes the index bitcode reader to read the linkage from the summary instead of from the module, and then once that lands do what I suggested in this patch.

test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
49–68

I just mean that if you're now reading the file with llvm-bcanalyzer instead of llvm-dis you wouldn't expect the output to contain !llvm.dbg.cu either with or without your change, so it doesn't make sense to test for it. You can keep the rest of the test in this file.

Regarding llvm-lto2, you could either create a new test file or convert this one to use llvm-lto2 instead of llvm-lto.

haojiewang added inline comments.Jul 12 2017, 7:34 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
4031

If it is ok that we do that after this patch? Because I think module info have too much redundancy info for minimized bitcode file, most of which is necessary for IR bitcode file. We can add a new writeModuleInfo for minimized bitcode file after this patch, and then also change the bitcode reader. That can get 5%-10% benefit on size reduction.

pcc added inline comments.Jul 12 2017, 8:39 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
4031

Up to you but keep in mind that your code will be easier to review if you do things in the order I suggested, because we won't have to think about the design that we know you're going to replace, just the one that you're going to replace it with.

tejohnson added inline comments.Jul 13 2017, 9:36 AM
lib/Bitcode/Writer/BitcodeWriter.cpp
4007

The FE option is -fthin-link-bitcode, so perhaps ThinLinkBitcodeWriter? Admittedly I need to add documentation for this option.

4031

That change seems like an incremental follow on to this one. I.e. it affects the reader, whereas this patch just takes a first cut at reducing what we emit in the writer. So I am personally happy with this one going in first. Peter, are there things here that would be throwaway work if this went in first and that was a follow on?

pcc added inline comments.Jul 13 2017, 11:02 AM
lib/Bitcode/Writer/BitcodeWriter.cpp
4031

I can accept that it may be easier to develop the patches in this order, but as I mentioned it makes things harder to review (which I guess is the "throwaway work" -- if we go with this first I would need to carefully look at the body of writeModuleInfo to make sure that running it wouldn't cause problems in this new flow, even though we know that it won't make a difference in the end).

Generally when I write patches I try to make things easier for the reviewer. But as I say, up to you folks.

haojiewang added inline comments.Jul 13 2017, 3:14 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
4031

I checked the code and found a problem for getting linkage from the summary instead of module info. The current logic is that we first insert the ValueID-ValueInfo map, which requires GUID(getting from name, linkage and source file name) and ValueID. This will be done after reading the module info. Then when reading summary, we can get GUID from that map. If we want to get the linkage for a value when reading summary, we cannot get the GUID until we read that value's summary. But this value may appear before this in a ref list or call list of other value, which requires the value info for it, while we don't have that yet. So if we want to do it this way, the logic may become more complex: we have to store all the ref list and call list to a ValueID-XXList map, and update them to SummaryIndex when we finish reading all the summary part. Is there another way to solve this problem?

pcc added inline comments.Jul 13 2017, 4:36 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
4031

I see. In that case I think it makes sense to forget about moving the linkage as an initial step and change this patch so that it creates short module records containing just the name and the linkage. That would address my main design concern with this patch.

haojiewang added inline comments.Jul 13 2017, 5:07 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
4031

I have two questions for shortening the module records:

  1. If we only record the name and linkage, the index of linkage in module record will be changed(from 5 to 2). Should we change it only for summary or also change the original module info(put linkage right after the name)? We have to also support reading summary from full IR bitcode file, so I guess we should let them have the same format.
  2. Currently there are two versions of module record(one is getting name from VST and another is getting name from strtab). Should we add a third version? Or obsolete the old ones?
pcc added inline comments.Jul 13 2017, 6:36 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
4031
  1. I would just keep the linkage at offset 5 and use zeroes as padding.
  2. If you do that there should be no need for a new version.
haojiewang added inline comments.Jul 13 2017, 6:57 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
4031

Oh, I see. Thank you. I will upload the update tomorrow.

Add writeSimplifiedModuleInfo to ThinLinkBitcodeWriter, now only the source filename, global value name and linkage will be writen while writing module info for a thin link bitcode file.
All the comments, names that contain "minimized bitcode file" and "module summary" are changed to "thin link bitcode".
Change the non-null pointer to reference.
Update the test for thin link bitcode file.

haojiewang marked 27 inline comments as done.Jul 14 2017, 2:58 PM
haojiewang added inline comments.
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
49–68

Current thin link bitcode cannot be read by llvm-lto2. I'm tracing the reason and trying to find how to fix it. Do we need fix this in this patch?

pcc added inline comments.Jul 14 2017, 3:22 PM
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
49–68

How are you calling llvm-lto2? You're passing -thinlto-distributed-indexes, right?

haojiewang added inline comments.Jul 14 2017, 3:26 PM
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
49–68

Actually no, I just used the same command I saw in other test because I'm now familiar with llvm-lto2. What test command should I use?

pcc added inline comments.Jul 14 2017, 3:28 PM
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
49–68

Probably something like http://llvm-cs.pcc.me.uk/test/ThinLTO/X86/distributed_import.ll#10 with resolutions for f and g.

haojiewang added inline comments.Jul 14 2017, 4:59 PM
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
49–68

llvm-lto2 works, but this test cannot past. Because the symtab will not be built in this test case. And the full IR bitcode file for this test also cannot pass llvm-lto2 because there is no data layout in this test. Should I change this test to a more complete one or just leave it here and add another test?

pcc added inline comments.Jul 14 2017, 5:03 PM
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
49–68

I'd add a triple and datalayout to this test, it probably ought to have one anyway.

Update test for llvm-lto2.

Update the test for llvm-lto2.

haojiewang marked 7 inline comments as done.Jul 14 2017, 5:25 PM
tejohnson edited edge metadata.Jul 18 2017, 2:29 PM

This looks pretty good to me now, a few nits below. Please wait to see if Peter has any more comments.

include/llvm/Bitcode/BitcodeWriter.h
87

Elsewhere we describe this as the minimized bitcode file used for the thin link, can you mention that for your 2 interfaces? Specifically what is missing is that these are writing just what is necessary for the thin link.

90

nit: s/generating/generated/

133

ditto

lib/Bitcode/Writer/BitcodeWriter.cpp
4031

nit: s/using/used/

lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
424–425

nit: braces can be removed now.

tejohnson added inline comments.Jul 18 2017, 2:55 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
101

Just noticed, this should be ThinLinkBitcodeWriter now.

Update some comments.

haojiewang marked 6 inline comments as done.Jul 18 2017, 3:23 PM
pcc added inline comments.Jul 18 2017, 3:33 PM
include/llvm/Bitcode/BitcodeWriter.h
93

clang-format

136

clang-format

lib/Bitcode/Writer/BitcodeWriter.cpp
114

I think we can move this into the derived class now, together with Hasher and addToStrtab.

124

I think this change removes the need to support writing a module with a precomputed hash from ModuleBitcodeWriter. Can you please remove that functionality?

142

clang-format

test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
49–68

Sorry, I clearly wasn't paying attention when I referred you to the distributed_import.ll test above, as that test is testing for pretty much exactly what I thought wasn't already being tested for. May I ask you to remove the llvm-lto2 part and move the llvm-bcanalyzer part of this test to distributed_import.ll?

haojiewang added inline comments.Jul 18 2017, 4:11 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
114

Oh, sure. But I think this change will require a new addToStrtab for ThinLinkBitcodeWriter, which only add the str to StrtabBuilder without update GenerateHash.

124

Writing a module with a precomputed hash is still needed while split writing a module(splitAndWriteThinLTOBitcode), because I didn't touch the split writing in this patch. Maybe I should also change the writer in split writing?

pcc added inline comments.Jul 18 2017, 4:20 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
114

You can just change the callers to call StrtabBuilder.add(), that's what we did before we needed to add addToStrtab to fix the caching problem in ModuleBitcodeWriter.

124

You mean here?
http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#386

Yes, I think you can change that to call writeThinLinkBitcode.

haojiewang added inline comments.Jul 18 2017, 5:28 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
124

I just tried to change the split writing, but there were bunch of errors in test after I did that. There will be 11 test files should be updated if we change this, and most of them are because llvm-dis cannot read a thin link bitcode file. Should we do it in this patch or leave it the next patch? How should we update them, just changing llvm-dis to llvm-bcanalyzer or also adding the CHECKs and CHECK-NOTs as in the test file of this patch?

haojiewang added inline comments.Jul 18 2017, 5:32 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
124

I'm just not sure if I would affect other people's code if I do so much changes on the test files.

haojiewang added inline comments.Jul 18 2017, 5:44 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
124

Sorry, I just made a terrible mistake on this one. Actually there would be only 1 test file should be updated, which is for the llvm-dis problem. Besides, for the "NODEBUG" checks, I can just remove them because they are not needed after this patch, right? And after I add the CHECKs and CHECK-NOTs in distributed-import.ll, I don't have to do the check again for the split.ll test file, right?

pcc added inline comments.Jul 18 2017, 5:54 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
124

Yes, you can remove the NODEBUG checks from that file, and no other changes should be required there since we're already making sure we can read the resulting files with llvm-lto (ideally it should be changed to use llvm-lto2, but that can be done some other time).

  1. Format the code using clang-format.
  2. Move GenerateHash, Hasher, addToStrtab from ModuleBitcodeWriterBase to ModuleBitcodeWriter since they are not really used in ThinLinkBitcodeWriter. Now ThinLinkBitcodeWriter just use StrtabBuilder.add() directly to write symbol names to Strtab.
  3. Update splitAndWriteThinLTOBitcode so that only the necessary information will be written into ThinLTO parts.
  4. Functionally remove writing a module with a precomputed module hash.
  5. Update the test files. Remove the NODEBUG tests in split.ll and no-type-md.ll. Thin link bitcode file content check is moved into distributed-import.ll.
haojiewang marked 14 inline comments as done.Jul 18 2017, 6:43 PM
pcc added inline comments.Jul 19 2017, 11:51 AM
lib/Bitcode/Writer/BitcodeWriter.cpp
119

This comment doesn't really make sense anymore in the context of the base class. I'd replace the ModHash field in this class with a ModHash field in both derived classes with their own comments, since they really serve different purposes in each class.

4021

Inline into caller.

lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
380

Update comment

424–425

Update comment

test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll
48–68

You can remove everything from this line onwards I think.

test/Transforms/ThinLTOBitcodeWriter/split.ll
46 ↗(On Diff #107227)

Likewise.

haojiewang marked 6 inline comments as done.
  1. Move ModHash to derived classes.
  2. Inline a one line function.
  3. Comments update.
  4. Tests update.
pcc accepted this revision.Jul 19 2017, 6:43 PM

LGTM

lib/Bitcode/Writer/BitcodeWriter.cpp
212

I'd remove the second sentence.

4010

I'd reword as "generated while writing the module bitcode file".

4011

This could be const ModuleHash &ModHash;.

4059

Remove blank line.

4124

This comment isn't very useful, it just says the same thing as the code. I'd remove it.

I think the other comments in this function are self evident from the code as well, but up to you if you want to keep them.

This revision is now accepted and ready to land.Jul 19 2017, 6:43 PM
  1. Change ModHash in ThinLinkBitcodeWriter, writeThinLinkBitcode and WriteThinLinkBitcodeToFile to const.
  2. Update some comments.
haojiewang marked 5 inline comments as done.Jul 19 2017, 7:29 PM
haojiewang added inline comments.
lib/Bitcode/Writer/BitcodeWriter.cpp
4124

I just keep the module hash comment, since I inline that function.

haojiewang closed this revision.Jul 21 2017, 10:25 AM
haojiewang marked an inline comment as done.