Currently the ThinLTO minimized bitcode file only strip the debug info, but there is still a lot of information in the minimized bit code file that will be not used for thin linker. In this patch, most of the extra information is striped to reduce the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. Now the minimized bitcode file size is reduced to 15%-30% of the debug info stripped bitcode file size.
Details
Diff Detail
- Build Status
Buildable 8407 Build 8407: arc lint + arc unit
Event Timeline
include/llvm/Bitcode/BitcodeWriter.h | ||
---|---|---|
87 | s/constrction/construction/ | |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
100–103 | s/Abstrac/Abstract/ But in fact I don't think it class is abstract, is it? | |
4003 | Is a "module summary" described anywhere? | |
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
421–422 | This should be obsolete right now? |
Change some comment. Remove StripDebugInfo function while writing the minimized bitcode file.
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
100–103 | Yes, I also think so. I described it this way because the BitcodeWriterBase class's comment also says it is a abstract class, though I don't think it is. I was confused by this. Should I change both of them? |
include/llvm/Bitcode/BitcodeWriter.h | ||
---|---|---|
89 | Use a const reference then? | |
91 | Same: if non-null required, why not using a reference? | |
144 | Same? | |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
4003 | I have the same question for "minimized bitcode file" though, it may be some doc debt if this is already used in some place and not defined, just trying to flag these, not blocking your patch. |
include/llvm/Bitcode/BitcodeWriter.h | ||
---|---|---|
89 | But M is actually also required non-null. I just used the same args list with writeModule. So here using const references instead of pointer is better? |
Please add a test that shows that the thinlink files can be read by llvm-lto2.
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 | With this call you're emitting a lot of unnecessary information into the minimized bitcode file. As I was suggesting on the other thread I wonder whether you can just emit "short" module-level records that just contain the name and the linkage (maybe see if you can drop the linkage, as it should already be available in the summary). | |
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
36 | This part of the test is no longer relevant. | |
37 | I'd also check what we *are* emitting, not just what we're not emitting. |
include/llvm/Bitcode/BitcodeWriter.h | ||
---|---|---|
89 | I believe references carry the non_null information explicitly as part of the API. |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 | Yes, this is what I'm going to do after this patch released. So do you suggest I do it in this patch or leave it to next patch? | |
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
36 | You mean that I'd better write a new test file, right? Along with the llvm-lto2 test. |
include/llvm/Bitcode/BitcodeWriter.h | ||
---|---|---|
89 | Sure, I mean should I also change pointer M to reference or just change the Index and ModHash? And also for pointer M in writeModule? |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 | I would suggest writing an initial patch that changes the index bitcode reader to read the linkage from the summary instead of from the module, and then once that lands do what I suggested in this patch. | |
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
36 | I just mean that if you're now reading the file with llvm-bcanalyzer instead of llvm-dis you wouldn't expect the output to contain !llvm.dbg.cu either with or without your change, so it doesn't make sense to test for it. You can keep the rest of the test in this file. Regarding llvm-lto2, you could either create a new test file or convert this one to use llvm-lto2 instead of llvm-lto. |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 | If it is ok that we do that after this patch? Because I think module info have too much redundancy info for minimized bitcode file, most of which is necessary for IR bitcode file. We can add a new writeModuleInfo for minimized bitcode file after this patch, and then also change the bitcode reader. That can get 5%-10% benefit on size reduction. |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 | Up to you but keep in mind that your code will be easier to review if you do things in the order I suggested, because we won't have to think about the design that we know you're going to replace, just the one that you're going to replace it with. |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4003 | The FE option is -fthin-link-bitcode, so perhaps ThinLinkBitcodeWriter? Admittedly I need to add documentation for this option. | |
4027 | That change seems like an incremental follow on to this one. I.e. it affects the reader, whereas this patch just takes a first cut at reducing what we emit in the writer. So I am personally happy with this one going in first. Peter, are there things here that would be throwaway work if this went in first and that was a follow on? |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 | I can accept that it may be easier to develop the patches in this order, but as I mentioned it makes things harder to review (which I guess is the "throwaway work" -- if we go with this first I would need to carefully look at the body of writeModuleInfo to make sure that running it wouldn't cause problems in this new flow, even though we know that it won't make a difference in the end). Generally when I write patches I try to make things easier for the reviewer. But as I say, up to you folks. |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 | I checked the code and found a problem for getting linkage from the summary instead of module info. The current logic is that we first insert the ValueID-ValueInfo map, which requires GUID(getting from name, linkage and source file name) and ValueID. This will be done after reading the module info. Then when reading summary, we can get GUID from that map. If we want to get the linkage for a value when reading summary, we cannot get the GUID until we read that value's summary. But this value may appear before this in a ref list or call list of other value, which requires the value info for it, while we don't have that yet. So if we want to do it this way, the logic may become more complex: we have to store all the ref list and call list to a ValueID-XXList map, and update them to SummaryIndex when we finish reading all the summary part. Is there another way to solve this problem? |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 | I see. In that case I think it makes sense to forget about moving the linkage as an initial step and change this patch so that it creates short module records containing just the name and the linkage. That would address my main design concern with this patch. |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 | I have two questions for shortening the module records:
|
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 |
|
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4027 | Oh, I see. Thank you. I will upload the update tomorrow. |
Add writeSimplifiedModuleInfo to ThinLinkBitcodeWriter, now only the source filename, global value name and linkage will be writen while writing module info for a thin link bitcode file.
All the comments, names that contain "minimized bitcode file" and "module summary" are changed to "thin link bitcode".
Change the non-null pointer to reference.
Update the test for thin link bitcode file.
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
---|---|---|
36 | Current thin link bitcode cannot be read by llvm-lto2. I'm tracing the reason and trying to find how to fix it. Do we need fix this in this patch? |
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
---|---|---|
36 | How are you calling llvm-lto2? You're passing -thinlto-distributed-indexes, right? |
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
---|---|---|
36 | Actually no, I just used the same command I saw in other test because I'm now familiar with llvm-lto2. What test command should I use? |
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
---|---|---|
36 | Probably something like http://llvm-cs.pcc.me.uk/test/ThinLTO/X86/distributed_import.ll#10 with resolutions for f and g. |
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
---|---|---|
36 | llvm-lto2 works, but this test cannot past. Because the symtab will not be built in this test case. And the full IR bitcode file for this test also cannot pass llvm-lto2 because there is no data layout in this test. Should I change this test to a more complete one or just leave it here and add another test? |
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
---|---|---|
36 | I'd add a triple and datalayout to this test, it probably ought to have one anyway. |
This looks pretty good to me now, a few nits below. Please wait to see if Peter has any more comments.
include/llvm/Bitcode/BitcodeWriter.h | ||
---|---|---|
87 | Elsewhere we describe this as the minimized bitcode file used for the thin link, can you mention that for your 2 interfaces? Specifically what is missing is that these are writing just what is necessary for the thin link. | |
90 | nit: s/generating/generated/ | |
133 | ditto | |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
4027 | nit: s/using/used/ | |
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
421–422 | nit: braces can be removed now. |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
101 | Just noticed, this should be ThinLinkBitcodeWriter now. |
include/llvm/Bitcode/BitcodeWriter.h | ||
---|---|---|
93 | clang-format | |
136 | clang-format | |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
112–113 | I think this change removes the need to support writing a module with a precomputed hash from ModuleBitcodeWriter. Can you please remove that functionality? | |
112–113 | I think we can move this into the derived class now, together with Hasher and addToStrtab. | |
127–128 | clang-format | |
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
36 | Sorry, I clearly wasn't paying attention when I referred you to the distributed_import.ll test above, as that test is testing for pretty much exactly what I thought wasn't already being tested for. May I ask you to remove the llvm-lto2 part and move the llvm-bcanalyzer part of this test to distributed_import.ll? |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
112–113 | Oh, sure. But I think this change will require a new addToStrtab for ThinLinkBitcodeWriter, which only add the str to StrtabBuilder without update GenerateHash. | |
112–113 | Writing a module with a precomputed hash is still needed while split writing a module(splitAndWriteThinLTOBitcode), because I didn't touch the split writing in this patch. Maybe I should also change the writer in split writing? |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
112–113 | You can just change the callers to call StrtabBuilder.add(), that's what we did before we needed to add addToStrtab to fix the caching problem in ModuleBitcodeWriter. | |
112–113 | You mean here? Yes, I think you can change that to call writeThinLinkBitcode. |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
112–113 | I just tried to change the split writing, but there were bunch of errors in test after I did that. There will be 11 test files should be updated if we change this, and most of them are because llvm-dis cannot read a thin link bitcode file. Should we do it in this patch or leave it the next patch? How should we update them, just changing llvm-dis to llvm-bcanalyzer or also adding the CHECKs and CHECK-NOTs as in the test file of this patch? |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
112–113 | I'm just not sure if I would affect other people's code if I do so much changes on the test files. |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
112–113 | Sorry, I just made a terrible mistake on this one. Actually there would be only 1 test file should be updated, which is for the llvm-dis problem. Besides, for the "NODEBUG" checks, I can just remove them because they are not needed after this patch, right? And after I add the CHECKs and CHECK-NOTs in distributed-import.ll, I don't have to do the check again for the split.ll test file, right? |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
112–113 | Yes, you can remove the NODEBUG checks from that file, and no other changes should be required there since we're already making sure we can read the resulting files with llvm-lto (ideally it should be changed to use llvm-lto2, but that can be done some other time). |
- Format the code using clang-format.
- Move GenerateHash, Hasher, addToStrtab from ModuleBitcodeWriterBase to ModuleBitcodeWriter since they are not really used in ThinLinkBitcodeWriter. Now ThinLinkBitcodeWriter just use StrtabBuilder.add() directly to write symbol names to Strtab.
- Update splitAndWriteThinLTOBitcode so that only the necessary information will be written into ThinLTO parts.
- Functionally remove writing a module with a precomputed module hash.
- Update the test files. Remove the NODEBUG tests in split.ll and no-type-md.ll. Thin link bitcode file content check is moved into distributed-import.ll.
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
112–113 | This comment doesn't really make sense anymore in the context of the base class. I'd replace the ModHash field in this class with a ModHash field in both derived classes with their own comments, since they really serve different purposes in each class. | |
4017 | Inline into caller. | |
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
378–379 | Update comment | |
421–422 | Update comment | |
test/Transforms/ThinLTOBitcodeWriter/no-type-md.ll | ||
36 | You can remove everything from this line onwards I think. | |
test/Transforms/ThinLTOBitcodeWriter/split.ll | ||
44 | Likewise. |
- Move ModHash to derived classes.
- Inline a one line function.
- Comments update.
- Tests update.
LGTM
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
198 | I'd remove the second sentence. | |
4006 | I'd reword as "generated while writing the module bitcode file". | |
4007 | This could be const ModuleHash &ModHash;. | |
4055 | Remove blank line. | |
4120 | This comment isn't very useful, it just says the same thing as the code. I'd remove it. I think the other comments in this function are self evident from the code as well, but up to you if you want to keep them. |
- Change ModHash in ThinLinkBitcodeWriter, writeThinLinkBitcode and WriteThinLinkBitcodeToFile to const.
- Update some comments.
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
4120 | I just keep the module hash comment, since I inline that function. |
s/constrction/construction/