This is an archive of the discontinued LLVM Phabricator instance.

libLTO: add a ThinLTOCodeGenerator on the model of LTOCodeGenerator.
ClosedPublic

Authored by mehdi_amini on Feb 10 2016, 1:32 AM.

Details

Summary

This is intended to provide a parallel (threaded) ThinLTO scheme
for linker plugin use through the libLTO C API.

Diff Detail

Event Timeline

mehdi_amini retitled this revision from to libLTO: add a ThinLTOCodeGenerator on the model of LTOCodeGenerator..
mehdi_amini updated this object.
mehdi_amini added reviewers: tejohnson, rafael.
mehdi_amini added a subscriber: llvm-commits.

(Remove empty line change)

tejohnson edited edge metadata.Feb 11 2016, 4:50 PM

Did an initial pass through, some comments and questions below.

include/llvm-c/lto.h
562

Wraps a single returned object file? (Comment for this and following structure are currently identical)

585

Why "prior to" here and other places?

624

Document return type?

include/llvm/LTO/ThinLTOCodeGenerator.h
53

do you mean "If a symbol is not listed there, it will be optimized away if it is inlined into every usage"?

83

Doxygen comments on these and other members as well?

lib/LTO/ThinLTOCodeGenerator.cpp
65

No, should not be lazy. That is just for reading the combined index.

176

Why don't we want to set an opt level?

249

This case was already handled above.

298

So the linker must place any external symbols with references from other modules in this set, right? I see that you added a comment that suggest that to the thinlto_codegen_add_must_preserve_symbol interface. Might want to add something to that effect here as well.

tools/llvm-lto/llvm-lto.cpp
279

Is this temporary?

291

Why are all these commented out?

tools/lto/lto.cpp
282

Why the FIXME?

Thanks for the review, I'll fix all of what you found :)

include/llvm-c/lto.h
562

Yes this is a single object, I'll update the comment.

585

"prior to" is an unfortunate copy/paste, I'll remove.

624

I'll add.

include/llvm/LTO/ThinLTOCodeGenerator.h
53

Basically I expect the linker to tell me the list of symbols for which there is a reference from *outside* the ThinLTO modules.

83

Sure!

lib/LTO/ThinLTOCodeGenerator.cpp
65

OK

176

Not sure, we don't have a linker flag to do that, with current full LTO there is none either I think. Maybe passing -mllvm -O3 can have an impact? (I think the -mllvm are supposed to be "debug" flags though)

This is something we'd like to figure out but not completely sure what is the best way (the front-end could encode this information in the bitcode for example).

249

Good catch :)

298

Any symbols with reference from other *non thinlto* modules to be exact.
I'll add a comment here.

tools/llvm-lto/llvm-lto.cpp
279

Yes! Will clean up

291

Some debug code left over, I'll clean it up.

tools/lto/lto.cpp
282

Oh this API should just be removed.
(Early stage of prototyping I was expecting the linker to tell me the exported/preserved symbols *for each individual files*, but I could see any benefit from it later so I reverted to a global list).

tejohnson added inline comments.Feb 11 2016, 10:16 PM
lib/LTO/ThinLTOCodeGenerator.cpp
298

So an external symbol with a reference from another thinlto module would not be in this set and would be internalized? I realize that you are remembering the original linkage and later setting it back. But what happens in the mean time with the internalize and globalDCE passes? Couldn't they cause one of the symbols referenced from another thinlto module to be internalized and then eliminated, since you don't see those references here?

mehdi_amini added inline comments.Feb 12 2016, 10:16 AM
lib/LTO/ThinLTOCodeGenerator.cpp
298

You're right!
What has to be inside are functions that are cross-referenced as well.
I was confused because I move the set of preserved symbols from "per module" to be on the code generator (i.e. one for all), but it is the union of the individual sets (if they still existed).

mehdi_amini marked 26 inline comments as done.
mehdi_amini edited edge metadata.

Update following Teresa's comment

All comments should be addressed!

More cleanup and a bugfix: cannot run globalopt too early because it breaks ThinLTO.

The issue is that globalopt (after internalize) may turning (former) globals into "unnamed address".
However this happens when optimizing the module, not when the module is used for importing in other modules. The global will be imported as available externally in these modules but won't be available in the source module because the "unamed address".

Mostly comment typo and wording suggestions

include/llvm-c/lto.h
580

"Frees the code generator..."

589

typo s/stay/stays/

599

Inconsistent capitalization. Some parameters here and in other interfaces are lower case, some here and other places start with upper case.

613

s/match/matches/

614

maybe "and in future implementations may change" or "and may change in future implementations"

651

Maybe "The intention is to make the bitcode files available for debugging after..."

include/llvm/LTO/ThinLTOCodeGenerator.h
12

s/provide/provides/

45

s/provide/provides/

55

Think this comment is clearer if the last sentence changed to "If a symbol is not listed there, it will be optimized away if it is inlined into every usage."

lib/LTO/ThinLTOCodeGenerator.cpp
150

s/let/let's/ or s/let/so/

151

This will be hard to correlate with the original file. How about using some part of the source file name saved in the module here and in the later opt.bc dump? I.e. use the base name of the path returned by getSourceFileName(), but still add in 'count' to avoid conflicts when the same file name is used at multiple paths.

209

s/let/let's/ or s/let/so/

227

s/possible/possibly/

483

Do we expect to reuse the same code generator multiple times?

mehdi_amini updated this revision to Diff 49867.Mar 4 2016, 6:16 PM

Update with a good candidate to be integrated as a first Proof-of-concept.

I am able to link every target in clang/llvm with this implementation.

Beyong the actual implementation of the CodeGenerator, I am especially interested to
"lock-down" the libLTO API so that we can start to have linker support for it.

Some misc comments below.

include/llvm-c/lto.h
666

What is the default?

include/llvm/LTO/ThinLTOCodeGenerator.h
2

Fix file name.

47

How do PruningInterval and Expiration interact? Are they mutually exclusive, or does a module get pruned at the min of these?

100

s/controling/controlling/

lib/LTO/ThinLTOCodeGenerator.cpp
66

Why not lazy load metadata too?

85

Include Module's SourceFileName to get more meaningful dump file names?

97

This is doing more than internalization, so name should probably be adjusted to reflect that. I don't have a good name off the top of my head. Maybe split into 2 and call the first something like resolveLinkOnceAndWeak, and if that returns false then invoke the internalization handling (maybe outline keepIfPreserved and make it return a bool).

105

Should available_externally also be kept in SingleModule mode?

113

I'm confused by the comment. It is already local if private, so no need to internalize or add to Keep, which is consistent with what the code is doing. I just don't understand what the comment about being restrictive is saying.

161

Combine the LinkOnce cases and just pick the appropriate Weak linkage based on the LinkOnce linkage, the handling is identical otherwise.

181

Combine the Weak* cases, the handling is identical.

212

This is doing more than internalization. How about a more generic name such as LinkageOptimization or something like that?

272

This handling is already outlined in the other InternalizeModule() function, invoke here.

318

Does this need to be guarded by lock? Comments for LockM indicate it is just to guard optimization.

330

Why don't we want to do this when there are no preserved or cross referenced symbols? Couldn't the internalization be even more aggressive in that case? Also, wouldn't we at least want the linkonce/weak handling?

344

Needs doxygen class description. Is "EarlyOptimizedModuleMap" more accurate though?

346

When would this not be the case?

353

Out of curiosity, why a StringMap for ModuleMap and an unordered_map from string here? They are both index by the identifier string.

411

crossImportIntoModule?

593

Doesn't instantiating here mean that the map isn't leveraged across threads? Oh, I see that this routine and the above promote() are only invoked from llvm-lto. Is there a reason why llvm-lto doesn't use ThinLTOCodeGenerator::run()?

Adding Sergei as a reviewer as he expressed interest in the past.

lib/LTO/ThinLTOCodeGenerator.cpp
151

The problem is that with static library, the name is funky. We need to replace parentheses and other special character with something like underscore.

ygao added a subscriber: ygao.Mar 7 2016, 11:09 AM
ygao added inline comments.Mar 7 2016, 11:30 AM
include/llvm-c/lto.h
699

This, and the two functions that ensue, may also need to say LTO_API_VERSION=18?
I think it is more natural to say "a list of global symbols" than "a list of all global symbols", but English is not my first language.

lib/LTO/ThinLTOCodeGenerator.cpp
2

This is probably copied from LTOCodeGenerator.cpp. Can you confirm that the summary line is accurate?

45

Do you need unordered_set here? I did not see unordered_set being used in this file.

Thanks Gao!

include/llvm-c/lto.h
699

Good catch. Will fix (both).

lib/LTO/ThinLTOCodeGenerator.cpp
45

It used in a more elaborate version of this code that I have locally, but not needed here. Will remove.

bruno added a subscriber: bruno.Mar 7 2016, 11:48 AM

Hi Mehdi,

lib/LTO/ThinLTOCodeGenerator.cpp
427

Although not really necessary for this patch, it would be nice to have the flags to disable the inliner and vectorizer like we currently have in full LTO. This is specially useful when investigating differences between full and thin.

mehdi_amini marked 27 inline comments as done.Mar 7 2016, 10:55 PM

Thanks Teresa for the review. See answers inlined.

include/llvm-c/lto.h
666

No guarantee provided by the API :)

include/llvm/LTO/ThinLTOCodeGenerator.h
47

PruningInterval controls the interval between two check on the cache: i.e. if you set 2h the plugin will not bother checking for cache invalidation for the next two hours.
Cache Expiration is per entry: when pruning you are deleting the entries older than this value.
I documented getter/setter, not enough?

lib/LTO/ThinLTOCodeGenerator.cpp
2

It is not very helpful, but matches LTOCodeGenerator indeed.

66

Aren't they almost immediately materialized anyway? I thought that laziness would have a cost for no real benefit maybe?
(I didn't see a change on the memory profile when lazy loading or not)

85

SourceFileName can fit! Good point.

97

I remove the internalization phase from this patch for now. This will change a lot with the graph in the summary.

105

Good point. Dunno.

113

This is copy/pasted from LTOCodeGenerator::applyRestriction. I read it as "don't need to do anything thing you won't internalize any more something already private"

161

OK, I'll keep this for a future version of the patch, I removed the internalize for now.

318

You need to block "reader" of OptimizedBuffer in case there is already a writer populating it.
(in case of contention on the lock with too many reader there would be a possible mitigation with atomic flags for a fast-path, but not worth it here I think)

330

No preserved or cross referenced symbols means that you will internalize *everything* and then global DCE will remove *everything*. This is of little interest. So I just considered that empty() means the linker didn't provide any information at all. It helps implementing testing with llvm-lto as well.

346

The buffers identifier are supplied by the linker and can be anything:

extern void thinlto_codegen_add_module(thinlto_code_gen_t cg,
                                       const char *identifier, const char *data,
                                       int length);
353

Here there is no good reason I think. I think that StringMap will allocate the entry with the key, so you want a value that is quite small (for rehashing / growing the map).

427

Good point, will update to try using a set of flags as close as possible the LTOCodeGenerator

593

The reason llvm-lto does not use run() is to test steps in isolation.

mehdi_amini marked 4 inline comments as done.

Update: taking into account most comments, and removing the internalize phase as it will be very different with the summary-based importing coming next.

tejohnson added inline comments.Mar 8 2016, 8:03 AM
include/llvm-c/lto.h
668

Is the max cache size set via the space that was available at the start of the compilation, or is the threshold updated so that if something else comes along and eats up some disk space the allowable max cache size is adjusted downward?

include/llvm/LTO/ThinLTOCodeGenerator.h
48

It just wasn't clear how they interact, although your explanation was what I guessed. Maybe for the thinlto_codegen_set_cache_entry_expiration interface add that the module may be removed earlier due to the pruning interval.

lib/LTO/LTOModule.cpp
85

Also for future consider caching value unless it is expected to be called only once per module.

lib/LTO/ThinLTOCodeGenerator.cpp
67

The when lazy metadata linking is enabled, metadata parsing should be postponed until we actually go to do the import and linkInModule during the bulk importing. Since the source module is destroyed at the end of linkInModule, this should reduce the max amount of live metadata to one module at a time, rather than all modules we are importing from. I'm surprised it didn't reduce the max memory usage in a debug build.

If we ever go back to post-pass metadata linking in the function importer (i.e. if we decide to do iterative importing rather than a single bulk import from each module), this will become more critical. However, if we move to summary-only importing decisions it will obviate the need to do this.

Until we go to summary only import decisions, I would think it should reduce the max memory usage as per my first paragraph. Did you see a cost going with lazy metadata loading?

86

Is this a TODO? If so, please add TODO comment.

110

Add doxygyen comment describing ModuleMap.

114

Ok. When you put this support back in your description above ("don't need to do anything thing you won't internalize any more something already private") is better than the comment about restriction.

331

Ok, but I think you still want the linkonce/weak linkage changes? When you put this back this is more reason to split the internalization and linkonce/weak linkage changes into two routines and only invoke the latter if the sets are empty.

359

What is the difference between this ModuleBuffer and the one loaded out of the ModuleMap in the below call to loadModuleFromBuffer? If they are the same, can loadModuleFromBuffer be called a single time and the resulting module optionally saved after?

379

Unfortunately this means that ThinLTOCodeGenerator::run() is currently untested. Consider adding a mode to llvm-lto that does all thinlto steps (thin-action=all?) to test all the steps via this interface.

tools/llvm-lto/llvm-lto.cpp
85

Actual option below is "functionindex". But per the ref graph patch, this is broader than a function index, so I am looking at changing references to function index to something else anyway. So I would suggest changing the actual option below to "thinlto-index".

272

What if there is more than one occurrence?

339

s/mentionned/mentioned/ (here and below for import()).

351

Adding all the modules isn't needed for promotion. Should be able to remove this loop.

376

There is a lot of code duplication between these various functions, consider refactoring possibly in a follow-on patch.

392

Why is the import() step a superset of promote and import, whereas the other steps (e.g. optimize()) only doing one thing?

406

Untested.

mehdi_amini marked 4 inline comments as done.Mar 8 2016, 9:41 AM
mehdi_amini added inline comments.
include/llvm-c/lto.h
668

Mmmm, this is yet to be implemented.
I'd say it applies to what we store on disk when the linker is *not running*. I.e. during the link you are allowed to use extra space without any limit.

include/llvm/LTO/ThinLTOCodeGenerator.h
48

Now I'm unsure it was clear enough, because you wrote " the module may be removed earlier due to the pruning interval". A cache entry *can't* be remove earlier. The pruning interval means that will only *check* the entries.

lib/LTO/ThinLTOCodeGenerator.cpp
86

Will do.

110

Will do.

331

Good point.

(I expect this to come back with a very different summary-based implementation)

359

Good point, this is legacy from when there was all the internalization stage, will cleanup!

379

OK.

tools/llvm-lto/llvm-lto.cpp
272

I'll add a check and error.

351

It isn't needed... for now ;)
I wrote this code against my prototype of summary-based importing where the promotion is driven by what the other modules export.
(I'll remove for now).

376

OK

392

Same reason as above: when I wrote it against the pure summary-based importing, it was needed/helpful. I'll remove for now.

406

Any suggestion on how to test that? This is basically like testing opt -O3?

tejohnson added inline comments.Mar 8 2016, 10:01 AM
include/llvm-c/lto.h
668

I don't really understand what you mean by this, particularly the last sentence about using extra space during the link without any limit. Isn't this set/used during the link (which is running the ThinLTO steps in process)?

include/llvm/LTO/ThinLTOCodeGenerator.h
48

Ah, so a clarifying comment would help then since I misunderstood. It sounds like the modules will be checked for expiration (and pruned from the cache) at the pruning interval. So with the default pruning interval of -1, the expiration is unused? I took these to be separate, complimentary, mechanisms for keeping the cache size down. Another option beyond just documenting the interactions well is to combine these parameters into a single interface that takes both the pruning interval and the expiration.

tools/llvm-lto/llvm-lto.cpp
406

Not sure, maybe just invoke this stage in your test after the importing action to make sure it succeeds (without necessarily checking for any specific optimization)?

slarin edited edge metadata.Mar 8 2016, 11:06 AM

Sorry for the late entry... Some of my questions could have been already answered.

include/llvm-c/lto.h
736

Does this handles commons properly?

include/llvm/LTO/ThinLTOCodeGenerator.h
149

Can we theoretically have a mixture of opt levels? -O2/-Os? Should I be able to respect per library opt level?

lib/LTO/ThinLTOCodeGenerator.cpp
68

I do not know if it is relevant at this point, but for what it worth - my IR might have metadata that changes opt/codegen.

82

Default to /tmp?

141

Yes! ...and default should probably be O2...

145

So I assume I should be able to control all of those...

185

Sorry, I miss something - why is this unconditional?

364

In general - don't you want to verify modules as they progress through the stages? I do it in regular LTO and it did help on more than one occasion :)

379

+1 on Teresa's point about llvm-lto.

tools/llvm-lto/llvm-lto.cpp
420

Verify TheModule?

mehdi_amini marked an inline comment as done.Mar 8 2016, 11:24 AM

Thanks for all the comments!
(Please see inline for the discussion)

include/llvm-c/lto.h
736

Can you be more specific? I'm not sure what is specific with common on this aspect?

include/llvm/LTO/ThinLTOCodeGenerator.h
149

Ideally I think we'd want this parameter to be recorded in the bitcode itself, what do you think?
Right now we don't even expose any linker flag for that.

Here it is about the *codegen* opt level though, it won't impact the optimizer.

lib/LTO/ThinLTOCodeGenerator.cpp
68

slarin: how does it play with cross-module importing? If a function is defined in a module compiled with O3 but imported in a module compiled with O2?

82

The way the client is enabling dumping temporaries is by providing a path.
Having a default would mean either always dumping temporaries (we don't want that...) or having a more complicated way of enabling dumping.

145

Yes, but we don't have an interface for these as of today. A serialization of the PMB options in the bitcode would help.

185

As mentioned above, this is conditional in the saveTempBitcode function itself, which starts with

if (SaveTempsDir.empty()) return;

makes sense?

364

We'll always verify the module once at the beginning of the optimizer pipeline, but I guess we could do it more frequently in assert builds.

tools/llvm-lto/llvm-lto.cpp
85

Will do.

420

This is done in optimized()

(see line 143 in ThinLTOCodeGenerator.cpp PMB.VerifyInput = true;)

slarin added inline comments.Mar 8 2016, 12:09 PM
include/llvm-c/lto.h
736

I probably not fully understand how this list should work... but this is not the proper place to figure it out - please ignore this comment for now.

include/llvm/LTO/ThinLTOCodeGenerator.h
149

Ideally I think we'd want this parameter to be recorded in the bitcode itself, what do you think?

Yes. Certainly.

it is about the *codegen*

...yes. My target has codegen properties that are exposed to a user, which might produce drastically different results if not set properly, but once again, ideally it should come from bitcode.

lib/LTO/ThinLTOCodeGenerator.cpp
68

If I know settings for each module I can handle these situations in platform specific way. Besides rough optimization levels I have different addressing modes used in different modules, and I might chose not to mix certain features at all... At this point my main concern seems to be revolving around general LTO issues (like mixing different optimization scopes into one) and might not be "thin"-lto specific. We had to jump through hoops for regular LTO, and I see very similar set of issues being designed in here as well...

82

I see...

185

Totally

mehdi_amini updated this revision to Diff 50087.Mar 8 2016, 4:07 PM
mehdi_amini marked 15 inline comments as done.
mehdi_amini edited edge metadata.

Taking comment into account

Thanks for the great review. Hopefully I didn't forget anything with this update.

include/llvm/LTO/ThinLTOCodeGenerator.h
48

Tried to make the doc very explicit, let me know what you think.

lib/LTO/LTOModule.cpp
85

It is supposed to be called once indeed. I think we can update the implementation in the future if needed.

lib/LTO/ThinLTOCodeGenerator.cpp
67

So just tested:

With lazy loading of metadata: getLazyBitcodeModule takes 237ms and materializeMetadata takes 74ms (total 314ms)
without lazy loading of metadata: getLazyBitcodeModule takes 316ms

So no perf diff.

I probably don't see any diff on the memory because most metadata will leak to the context and being free'd with the Module. So there is no real impact on the peak memory (just delaying a little bit).

379

Done, it was valuable :)

tejohnson accepted this revision.Mar 8 2016, 5:19 PM
tejohnson edited edge metadata.

Thanks for adding the description! I do think that the formula for the new cache size is not right, see suggested fix below.

A couple misc comment typos, but LGTM once these the above is addressed.

include/llvm-c/lto.h
687

s/term/terms/

690

"left over half the available space"? (i.e. add "half") Either that or "left over the free space" (since AvailableSpace = FreeSpace + CacheSize)?

694

This doesn't seem right. I think P should be divided by 100 and not multiplied by it, since it is a percentage. And I think the percentage needs to be multiplied by something, not divided by AvailableSpace?

Should this be:
NewCacheSize = AvailableSpace * P/100

since it is described as the percentage of available space used for the cache.

For this and the above suggestion, any change needs to be replicated below in ThinLTOCodeGenerator.h.

include/llvm/LTO/ThinLTOCodeGenerator.h
105

move 'an' to next line.

lib/LTO/ThinLTOCodeGenerator.cpp
67

Ok, thanks for checking!

This revision is now accepted and ready to land.Mar 8 2016, 5:19 PM