This is intended to provide a parallel (threaded) ThinLTO scheme
for linker plugin use through the libLTO C API.
Details
Diff Detail
Event Timeline
Did an initial pass through, some comments and questions below.
include/llvm-c/lto.h | ||
---|---|---|
562 | Wraps a single returned object file? (Comment for this and following structure are currently identical) | |
585 | Why "prior to" here and other places? | |
624 | Document return type? | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
53 | do you mean "If a symbol is not listed there, it will be optimized away if it is inlined into every usage"? | |
83 | Doxygen comments on these and other members as well? | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
65 | No, should not be lazy. That is just for reading the combined index. | |
176 | Why don't we want to set an opt level? | |
249 | This case was already handled above. | |
298 | So the linker must place any external symbols with references from other modules in this set, right? I see that you added a comment that suggest that to the thinlto_codegen_add_must_preserve_symbol interface. Might want to add something to that effect here as well. | |
tools/llvm-lto/llvm-lto.cpp | ||
491 | Is this temporary? | |
503 | Why are all these commented out? | |
tools/lto/lto.cpp | ||
288 | Why the FIXME? |
Thanks for the review, I'll fix all of what you found :)
include/llvm-c/lto.h | ||
---|---|---|
562 | Yes this is a single object, I'll update the comment. | |
585 | "prior to" is an unfortunate copy/paste, I'll remove. | |
624 | I'll add. | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
53 | Basically I expect the linker to tell me the list of symbols for which there is a reference from *outside* the ThinLTO modules. | |
83 | Sure! | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
65 | OK | |
176 | Not sure, we don't have a linker flag to do that, with current full LTO there is none either I think. Maybe passing -mllvm -O3 can have an impact? (I think the -mllvm are supposed to be "debug" flags though) This is something we'd like to figure out but not completely sure what is the best way (the front-end could encode this information in the bitcode for example). | |
249 | Good catch :) | |
298 | Any symbols with reference from other *non thinlto* modules to be exact. | |
tools/llvm-lto/llvm-lto.cpp | ||
491 | Yes! Will clean up | |
503 | Some debug code left over, I'll clean it up. | |
tools/lto/lto.cpp | ||
288 | Oh this API should just be removed. |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
---|---|---|
298 | So an external symbol with a reference from another thinlto module would not be in this set and would be internalized? I realize that you are remembering the original linkage and later setting it back. But what happens in the mean time with the internalize and globalDCE passes? Couldn't they cause one of the symbols referenced from another thinlto module to be internalized and then eliminated, since you don't see those references here? |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
---|---|---|
298 | You're right! |
More cleanup and a bugfix: cannot run globalopt too early because it breaks ThinLTO.
The issue is that globalopt (after internalize) may turning (former) globals into "unnamed address".
However this happens when optimizing the module, not when the module is used for importing in other modules. The global will be imported as available externally in these modules but won't be available in the source module because the "unamed address".
Mostly comment typo and wording suggestions
include/llvm-c/lto.h | ||
---|---|---|
580 | "Frees the code generator..." | |
589 | typo s/stay/stays/ | |
599 | Inconsistent capitalization. Some parameters here and in other interfaces are lower case, some here and other places start with upper case. | |
613 | s/match/matches/ | |
614 | maybe "and in future implementations may change" or "and may change in future implementations" | |
651 | Maybe "The intention is to make the bitcode files available for debugging after..." | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
12 | s/provide/provides/ | |
45 | s/provide/provides/ | |
55 | Think this comment is clearer if the last sentence changed to "If a symbol is not listed there, it will be optimized away if it is inlined into every usage." | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
150 | s/let/let's/ or s/let/so/ | |
151 | This will be hard to correlate with the original file. How about using some part of the source file name saved in the module here and in the later opt.bc dump? I.e. use the base name of the path returned by getSourceFileName(), but still add in 'count' to avoid conflicts when the same file name is used at multiple paths. | |
209 | s/let/let's/ or s/let/so/ | |
227 | s/possible/possibly/ | |
483 | Do we expect to reuse the same code generator multiple times? |
Update with a good candidate to be integrated as a first Proof-of-concept.
I am able to link every target in clang/llvm with this implementation.
Beyong the actual implementation of the CodeGenerator, I am especially interested to
"lock-down" the libLTO API so that we can start to have linker support for it.
Some misc comments below.
include/llvm-c/lto.h | ||
---|---|---|
666 | What is the default? | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
1 | Fix file name. | |
46 | How do PruningInterval and Expiration interact? Are they mutually exclusive, or does a module get pruned at the min of these? | |
99 | s/controling/controlling/ | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
65 | Why not lazy load metadata too? | |
84 | Include Module's SourceFileName to get more meaningful dump file names? | |
96 | This is doing more than internalization, so name should probably be adjusted to reflect that. I don't have a good name off the top of my head. Maybe split into 2 and call the first something like resolveLinkOnceAndWeak, and if that returns false then invoke the internalization handling (maybe outline keepIfPreserved and make it return a bool). | |
104 | Should available_externally also be kept in SingleModule mode? | |
112 | I'm confused by the comment. It is already local if private, so no need to internalize or add to Keep, which is consistent with what the code is doing. I just don't understand what the comment about being restrictive is saying. | |
160 | Combine the LinkOnce cases and just pick the appropriate Weak linkage based on the LinkOnce linkage, the handling is identical otherwise. | |
180 | Combine the Weak* cases, the handling is identical. | |
211 | This is doing more than internalization. How about a more generic name such as LinkageOptimization or something like that? | |
271 | This handling is already outlined in the other InternalizeModule() function, invoke here. | |
317 | Does this need to be guarded by lock? Comments for LockM indicate it is just to guard optimization. | |
329 | Why don't we want to do this when there are no preserved or cross referenced symbols? Couldn't the internalization be even more aggressive in that case? Also, wouldn't we at least want the linkonce/weak handling? | |
343 | Needs doxygen class description. Is "EarlyOptimizedModuleMap" more accurate though? | |
345 | When would this not be the case? | |
352 | Out of curiosity, why a StringMap for ModuleMap and an unordered_map from string here? They are both index by the identifier string. | |
410 | crossImportIntoModule? | |
592 | Doesn't instantiating here mean that the map isn't leveraged across threads? Oh, I see that this routine and the above promote() are only invoked from llvm-lto. Is there a reason why llvm-lto doesn't use ThinLTOCodeGenerator::run()? |
Adding Sergei as a reviewer as he expressed interest in the past.
lib/LTO/ThinLTOCodeGenerator.cpp | ||
---|---|---|
151 | The problem is that with static library, the name is funky. We need to replace parentheses and other special character with something like underscore. |
include/llvm-c/lto.h | ||
---|---|---|
699 | This, and the two functions that ensue, may also need to say LTO_API_VERSION=18? | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
1 | This is probably copied from LTOCodeGenerator.cpp. Can you confirm that the summary line is accurate? | |
44 | Do you need unordered_set here? I did not see unordered_set being used in this file. |
Hi Mehdi,
lib/LTO/ThinLTOCodeGenerator.cpp | ||
---|---|---|
426 | Although not really necessary for this patch, it would be nice to have the flags to disable the inliner and vectorizer like we currently have in full LTO. This is specially useful when investigating differences between full and thin. |
Thanks Teresa for the review. See answers inlined.
include/llvm-c/lto.h | ||
---|---|---|
666 | No guarantee provided by the API :) | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
46 | PruningInterval controls the interval between two check on the cache: i.e. if you set 2h the plugin will not bother checking for cache invalidation for the next two hours. | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
1 | It is not very helpful, but matches LTOCodeGenerator indeed. | |
65 | Aren't they almost immediately materialized anyway? I thought that laziness would have a cost for no real benefit maybe? | |
84 | SourceFileName can fit! Good point. | |
96 | I remove the internalization phase from this patch for now. This will change a lot with the graph in the summary. | |
104 | Good point. Dunno. | |
112 | This is copy/pasted from LTOCodeGenerator::applyRestriction. I read it as "don't need to do anything thing you won't internalize any more something already private" | |
160 | OK, I'll keep this for a future version of the patch, I removed the internalize for now. | |
317 | You need to block "reader" of OptimizedBuffer in case there is already a writer populating it. | |
329 | No preserved or cross referenced symbols means that you will internalize *everything* and then global DCE will remove *everything*. This is of little interest. So I just considered that empty() means the linker didn't provide any information at all. It helps implementing testing with llvm-lto as well. | |
345 | The buffers identifier are supplied by the linker and can be anything: extern void thinlto_codegen_add_module(thinlto_code_gen_t cg, const char *identifier, const char *data, int length); | |
352 | Here there is no good reason I think. I think that StringMap will allocate the entry with the key, so you want a value that is quite small (for rehashing / growing the map). | |
426 | Good point, will update to try using a set of flags as close as possible the LTOCodeGenerator | |
592 | The reason llvm-lto does not use run() is to test steps in isolation. |
Update: taking into account most comments, and removing the internalize phase as it will be very different with the summary-based importing coming next.
include/llvm-c/lto.h | ||
---|---|---|
668 | Is the max cache size set via the space that was available at the start of the compilation, or is the threshold updated so that if something else comes along and eats up some disk space the allowable max cache size is adjusted downward? | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
48 | It just wasn't clear how they interact, although your explanation was what I guessed. Maybe for the thinlto_codegen_set_cache_entry_expiration interface add that the module may be removed earlier due to the pruning interval. | |
lib/LTO/LTOModule.cpp | ||
80 | Also for future consider caching value unless it is expected to be called only once per module. | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
67 | The when lazy metadata linking is enabled, metadata parsing should be postponed until we actually go to do the import and linkInModule during the bulk importing. Since the source module is destroyed at the end of linkInModule, this should reduce the max amount of live metadata to one module at a time, rather than all modules we are importing from. I'm surprised it didn't reduce the max memory usage in a debug build. If we ever go back to post-pass metadata linking in the function importer (i.e. if we decide to do iterative importing rather than a single bulk import from each module), this will become more critical. However, if we move to summary-only importing decisions it will obviate the need to do this. Until we go to summary only import decisions, I would think it should reduce the max memory usage as per my first paragraph. Did you see a cost going with lazy metadata loading? | |
86 | Is this a TODO? If so, please add TODO comment. | |
110 | Add doxygyen comment describing ModuleMap. | |
114 | Ok. When you put this support back in your description above ("don't need to do anything thing you won't internalize any more something already private") is better than the comment about restriction. | |
331 | Ok, but I think you still want the linkonce/weak linkage changes? When you put this back this is more reason to split the internalization and linkonce/weak linkage changes into two routines and only invoke the latter if the sets are empty. | |
359 | What is the difference between this ModuleBuffer and the one loaded out of the ModuleMap in the below call to loadModuleFromBuffer? If they are the same, can loadModuleFromBuffer be called a single time and the resulting module optionally saved after? | |
379 | Unfortunately this means that ThinLTOCodeGenerator::run() is currently untested. Consider adding a mode to llvm-lto that does all thinlto steps (thin-action=all?) to test all the steps via this interface. | |
tools/llvm-lto/llvm-lto.cpp | ||
82 | Actual option below is "functionindex". But per the ref graph patch, this is broader than a function index, so I am looking at changing references to function index to something else anyway. So I would suggest changing the actual option below to "thinlto-index". | |
361 | s/mentionned/mentioned/ (here and below for import()). | |
373 | Adding all the modules isn't needed for promotion. Should be able to remove this loop. | |
398 | There is a lot of code duplication between these various functions, consider refactoring possibly in a follow-on patch. | |
414 | Why is the import() step a superset of promote and import, whereas the other steps (e.g. optimize()) only doing one thing? | |
428 | Untested. | |
484 | What if there is more than one occurrence? |
include/llvm-c/lto.h | ||
---|---|---|
668 | Mmmm, this is yet to be implemented. | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
48 | Now I'm unsure it was clear enough, because you wrote " the module may be removed earlier due to the pruning interval". A cache entry *can't* be remove earlier. The pruning interval means that will only *check* the entries. | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
86 | Will do. | |
110 | Will do. | |
331 | Good point. (I expect this to come back with a very different summary-based implementation) | |
359 | Good point, this is legacy from when there was all the internalization stage, will cleanup! | |
379 | OK. | |
tools/llvm-lto/llvm-lto.cpp | ||
373 | It isn't needed... for now ;) | |
398 | OK | |
414 | Same reason as above: when I wrote it against the pure summary-based importing, it was needed/helpful. I'll remove for now. | |
428 | Any suggestion on how to test that? This is basically like testing opt -O3? | |
484 | I'll add a check and error. |
include/llvm-c/lto.h | ||
---|---|---|
668 | I don't really understand what you mean by this, particularly the last sentence about using extra space during the link without any limit. Isn't this set/used during the link (which is running the ThinLTO steps in process)? | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
48 | Ah, so a clarifying comment would help then since I misunderstood. It sounds like the modules will be checked for expiration (and pruned from the cache) at the pruning interval. So with the default pruning interval of -1, the expiration is unused? I took these to be separate, complimentary, mechanisms for keeping the cache size down. Another option beyond just documenting the interactions well is to combine these parameters into a single interface that takes both the pruning interval and the expiration. | |
tools/llvm-lto/llvm-lto.cpp | ||
428 | Not sure, maybe just invoke this stage in your test after the importing action to make sure it succeeds (without necessarily checking for any specific optimization)? |
Sorry for the late entry... Some of my questions could have been already answered.
include/llvm-c/lto.h | ||
---|---|---|
736 | Does this handles commons properly? | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
149 | Can we theoretically have a mixture of opt levels? -O2/-Os? Should I be able to respect per library opt level? | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
68 | I do not know if it is relevant at this point, but for what it worth - my IR might have metadata that changes opt/codegen. | |
82 | Default to /tmp? | |
141 | Yes! ...and default should probably be O2... | |
145 | So I assume I should be able to control all of those... | |
185 | Sorry, I miss something - why is this unconditional? | |
364 | In general - don't you want to verify modules as they progress through the stages? I do it in regular LTO and it did help on more than one occasion :) | |
379 | +1 on Teresa's point about llvm-lto. | |
tools/llvm-lto/llvm-lto.cpp | ||
442 | Verify TheModule? |
Thanks for all the comments!
(Please see inline for the discussion)
include/llvm-c/lto.h | ||
---|---|---|
736 | Can you be more specific? I'm not sure what is specific with common on this aspect? | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
149 | Ideally I think we'd want this parameter to be recorded in the bitcode itself, what do you think? Here it is about the *codegen* opt level though, it won't impact the optimizer. | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
68 | slarin: how does it play with cross-module importing? If a function is defined in a module compiled with O3 but imported in a module compiled with O2? | |
82 | The way the client is enabling dumping temporaries is by providing a path. | |
145 | Yes, but we don't have an interface for these as of today. A serialization of the PMB options in the bitcode would help. | |
185 | As mentioned above, this is conditional in the saveTempBitcode function itself, which starts with if (SaveTempsDir.empty()) return; makes sense? | |
364 | We'll always verify the module once at the beginning of the optimizer pipeline, but I guess we could do it more frequently in assert builds. | |
tools/llvm-lto/llvm-lto.cpp | ||
82 | Will do. | |
442 | This is done in optimized() (see line 143 in ThinLTOCodeGenerator.cpp PMB.VerifyInput = true;) |
include/llvm-c/lto.h | ||
---|---|---|
736 | I probably not fully understand how this list should work... but this is not the proper place to figure it out - please ignore this comment for now. | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
149 |
Yes. Certainly.
...yes. My target has codegen properties that are exposed to a user, which might produce drastically different results if not set properly, but once again, ideally it should come from bitcode. | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
68 | If I know settings for each module I can handle these situations in platform specific way. Besides rough optimization levels I have different addressing modes used in different modules, and I might chose not to mix certain features at all... At this point my main concern seems to be revolving around general LTO issues (like mixing different optimization scopes into one) and might not be "thin"-lto specific. We had to jump through hoops for regular LTO, and I see very similar set of issues being designed in here as well... | |
82 | I see... | |
185 | Totally |
Thanks for the great review. Hopefully I didn't forget anything with this update.
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
---|---|---|
48 | Tried to make the doc very explicit, let me know what you think. | |
lib/LTO/LTOModule.cpp | ||
80 | It is supposed to be called once indeed. I think we can update the implementation in the future if needed. | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
67 | So just tested: With lazy loading of metadata: getLazyBitcodeModule takes 237ms and materializeMetadata takes 74ms (total 314ms) So no perf diff. I probably don't see any diff on the memory because most metadata will leak to the context and being free'd with the Module. So there is no real impact on the peak memory (just delaying a little bit). | |
379 | Done, it was valuable :) |
Thanks for adding the description! I do think that the formula for the new cache size is not right, see suggested fix below.
A couple misc comment typos, but LGTM once these the above is addressed.
include/llvm-c/lto.h | ||
---|---|---|
687 | s/term/terms/ | |
690 | "left over half the available space"? (i.e. add "half") Either that or "left over the free space" (since AvailableSpace = FreeSpace + CacheSize)? | |
694 | This doesn't seem right. I think P should be divided by 100 and not multiplied by it, since it is a percentage. And I think the percentage needs to be multiplied by something, not divided by AvailableSpace? Should this be: since it is described as the percentage of available space used for the cache. For this and the above suggestion, any change needs to be replicated below in ThinLTOCodeGenerator.h. | |
include/llvm/LTO/ThinLTOCodeGenerator.h | ||
105 | move 'an' to next line. | |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
67 | Ok, thanks for checking! |
Wraps a single returned object file? (Comment for this and following structure are currently identical)