Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

libLTO: add a ThinLTOCodeGenerator on the model of LTOCodeGenerator.
ClosedPublic

Authored by mehdi_amini on Feb 10 2016, 1:32 AM.

Details

Summary

This is intended to provide a parallel (threaded) ThinLTO scheme
for linker plugin use through the libLTO C API.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Thanks Teresa for the review. See answers inlined.

include/llvm-c/lto.h
666

No guarantee provided by the API :)

include/llvm/LTO/ThinLTOCodeGenerator.h
47

PruningInterval controls the interval between two check on the cache: i.e. if you set 2h the plugin will not bother checking for cache invalidation for the next two hours.
Cache Expiration is per entry: when pruning you are deleting the entries older than this value.
I documented getter/setter, not enough?

lib/LTO/ThinLTOCodeGenerator.cpp
2

It is not very helpful, but matches LTOCodeGenerator indeed.

66

Aren't they almost immediately materialized anyway? I thought that laziness would have a cost for no real benefit maybe?
(I didn't see a change on the memory profile when lazy loading or not)

85

SourceFileName can fit! Good point.

97

I remove the internalization phase from this patch for now. This will change a lot with the graph in the summary.

105

Good point. Dunno.

113

This is copy/pasted from LTOCodeGenerator::applyRestriction. I read it as "don't need to do anything thing you won't internalize any more something already private"

161

OK, I'll keep this for a future version of the patch, I removed the internalize for now.

318

You need to block "reader" of OptimizedBuffer in case there is already a writer populating it.
(in case of contention on the lock with too many reader there would be a possible mitigation with atomic flags for a fast-path, but not worth it here I think)

330

No preserved or cross referenced symbols means that you will internalize *everything* and then global DCE will remove *everything*. This is of little interest. So I just considered that empty() means the linker didn't provide any information at all. It helps implementing testing with llvm-lto as well.

346

The buffers identifier are supplied by the linker and can be anything:

extern void thinlto_codegen_add_module(thinlto_code_gen_t cg,
                                       const char *identifier, const char *data,
                                       int length);
353

Here there is no good reason I think. I think that StringMap will allocate the entry with the key, so you want a value that is quite small (for rehashing / growing the map).

427

Good point, will update to try using a set of flags as close as possible the LTOCodeGenerator

593

The reason llvm-lto does not use run() is to test steps in isolation.

mehdi_amini marked 4 inline comments as done.

Update: taking into account most comments, and removing the internalize phase as it will be very different with the summary-based importing coming next.

tejohnson added inline comments.Mar 8 2016, 8:03 AM
include/llvm-c/lto.h
668

Is the max cache size set via the space that was available at the start of the compilation, or is the threshold updated so that if something else comes along and eats up some disk space the allowable max cache size is adjusted downward?

include/llvm/LTO/ThinLTOCodeGenerator.h
48

It just wasn't clear how they interact, although your explanation was what I guessed. Maybe for the thinlto_codegen_set_cache_entry_expiration interface add that the module may be removed earlier due to the pruning interval.

lib/LTO/LTOModule.cpp
80

Also for future consider caching value unless it is expected to be called only once per module.

lib/LTO/ThinLTOCodeGenerator.cpp
67

The when lazy metadata linking is enabled, metadata parsing should be postponed until we actually go to do the import and linkInModule during the bulk importing. Since the source module is destroyed at the end of linkInModule, this should reduce the max amount of live metadata to one module at a time, rather than all modules we are importing from. I'm surprised it didn't reduce the max memory usage in a debug build.

If we ever go back to post-pass metadata linking in the function importer (i.e. if we decide to do iterative importing rather than a single bulk import from each module), this will become more critical. However, if we move to summary-only importing decisions it will obviate the need to do this.

Until we go to summary only import decisions, I would think it should reduce the max memory usage as per my first paragraph. Did you see a cost going with lazy metadata loading?

86

Is this a TODO? If so, please add TODO comment.

110

Add doxygyen comment describing ModuleMap.

114

Ok. When you put this support back in your description above ("don't need to do anything thing you won't internalize any more something already private") is better than the comment about restriction.

331

Ok, but I think you still want the linkonce/weak linkage changes? When you put this back this is more reason to split the internalization and linkonce/weak linkage changes into two routines and only invoke the latter if the sets are empty.

359

What is the difference between this ModuleBuffer and the one loaded out of the ModuleMap in the below call to loadModuleFromBuffer? If they are the same, can loadModuleFromBuffer be called a single time and the resulting module optionally saved after?

379

Unfortunately this means that ThinLTOCodeGenerator::run() is currently untested. Consider adding a mode to llvm-lto that does all thinlto steps (thin-action=all?) to test all the steps via this interface.

tools/llvm-lto/llvm-lto.cpp
82

Actual option below is "functionindex". But per the ref graph patch, this is broader than a function index, so I am looking at changing references to function index to something else anyway. So I would suggest changing the actual option below to "thinlto-index".

369

s/mentionned/mentioned/ (here and below for import()).

381

Adding all the modules isn't needed for promotion. Should be able to remove this loop.

406

There is a lot of code duplication between these various functions, consider refactoring possibly in a follow-on patch.

422

Why is the import() step a superset of promote and import, whereas the other steps (e.g. optimize()) only doing one thing?

436

Untested.

551

What if there is more than one occurrence?

mehdi_amini marked 4 inline comments as done.Mar 8 2016, 9:41 AM
mehdi_amini added inline comments.
include/llvm-c/lto.h
668

Mmmm, this is yet to be implemented.
I'd say it applies to what we store on disk when the linker is *not running*. I.e. during the link you are allowed to use extra space without any limit.

include/llvm/LTO/ThinLTOCodeGenerator.h
48

Now I'm unsure it was clear enough, because you wrote " the module may be removed earlier due to the pruning interval". A cache entry *can't* be remove earlier. The pruning interval means that will only *check* the entries.

lib/LTO/ThinLTOCodeGenerator.cpp
86

Will do.

110

Will do.

331

Good point.

(I expect this to come back with a very different summary-based implementation)

359

Good point, this is legacy from when there was all the internalization stage, will cleanup!

379

OK.

tools/llvm-lto/llvm-lto.cpp
381

It isn't needed... for now ;)
I wrote this code against my prototype of summary-based importing where the promotion is driven by what the other modules export.
(I'll remove for now).

406

OK

422

Same reason as above: when I wrote it against the pure summary-based importing, it was needed/helpful. I'll remove for now.

436

Any suggestion on how to test that? This is basically like testing opt -O3?

551

I'll add a check and error.

tejohnson added inline comments.Mar 8 2016, 10:01 AM
include/llvm-c/lto.h
668

I don't really understand what you mean by this, particularly the last sentence about using extra space during the link without any limit. Isn't this set/used during the link (which is running the ThinLTO steps in process)?

include/llvm/LTO/ThinLTOCodeGenerator.h
48

Ah, so a clarifying comment would help then since I misunderstood. It sounds like the modules will be checked for expiration (and pruned from the cache) at the pruning interval. So with the default pruning interval of -1, the expiration is unused? I took these to be separate, complimentary, mechanisms for keeping the cache size down. Another option beyond just documenting the interactions well is to combine these parameters into a single interface that takes both the pruning interval and the expiration.

tools/llvm-lto/llvm-lto.cpp
436

Not sure, maybe just invoke this stage in your test after the importing action to make sure it succeeds (without necessarily checking for any specific optimization)?

slarin edited edge metadata.Mar 8 2016, 11:06 AM

Sorry for the late entry... Some of my questions could have been already answered.

include/llvm-c/lto.h
736

Does this handles commons properly?

include/llvm/LTO/ThinLTOCodeGenerator.h
149

Can we theoretically have a mixture of opt levels? -O2/-Os? Should I be able to respect per library opt level?

lib/LTO/ThinLTOCodeGenerator.cpp
68

I do not know if it is relevant at this point, but for what it worth - my IR might have metadata that changes opt/codegen.

82

Default to /tmp?

141

Yes! ...and default should probably be O2...

145

So I assume I should be able to control all of those...

185

Sorry, I miss something - why is this unconditional?

364

In general - don't you want to verify modules as they progress through the stages? I do it in regular LTO and it did help on more than one occasion :)

379

+1 on Teresa's point about llvm-lto.

tools/llvm-lto/llvm-lto.cpp
450

Verify TheModule?

mehdi_amini marked an inline comment as done.Mar 8 2016, 11:24 AM

Thanks for all the comments!
(Please see inline for the discussion)

include/llvm-c/lto.h
736

Can you be more specific? I'm not sure what is specific with common on this aspect?

include/llvm/LTO/ThinLTOCodeGenerator.h
149

Ideally I think we'd want this parameter to be recorded in the bitcode itself, what do you think?
Right now we don't even expose any linker flag for that.

Here it is about the *codegen* opt level though, it won't impact the optimizer.

lib/LTO/ThinLTOCodeGenerator.cpp
68

slarin: how does it play with cross-module importing? If a function is defined in a module compiled with O3 but imported in a module compiled with O2?

82

The way the client is enabling dumping temporaries is by providing a path.
Having a default would mean either always dumping temporaries (we don't want that...) or having a more complicated way of enabling dumping.

145

Yes, but we don't have an interface for these as of today. A serialization of the PMB options in the bitcode would help.

185

As mentioned above, this is conditional in the saveTempBitcode function itself, which starts with

if (SaveTempsDir.empty()) return;

makes sense?

364

We'll always verify the module once at the beginning of the optimizer pipeline, but I guess we could do it more frequently in assert builds.

tools/llvm-lto/llvm-lto.cpp
82

Will do.

450

This is done in optimized()

(see line 143 in ThinLTOCodeGenerator.cpp PMB.VerifyInput = true;)

slarin added inline comments.Mar 8 2016, 12:09 PM
include/llvm-c/lto.h
736

I probably not fully understand how this list should work... but this is not the proper place to figure it out - please ignore this comment for now.

include/llvm/LTO/ThinLTOCodeGenerator.h
149

Ideally I think we'd want this parameter to be recorded in the bitcode itself, what do you think?

Yes. Certainly.

it is about the *codegen*

...yes. My target has codegen properties that are exposed to a user, which might produce drastically different results if not set properly, but once again, ideally it should come from bitcode.

lib/LTO/ThinLTOCodeGenerator.cpp
68

If I know settings for each module I can handle these situations in platform specific way. Besides rough optimization levels I have different addressing modes used in different modules, and I might chose not to mix certain features at all... At this point my main concern seems to be revolving around general LTO issues (like mixing different optimization scopes into one) and might not be "thin"-lto specific. We had to jump through hoops for regular LTO, and I see very similar set of issues being designed in here as well...

82

I see...

185

Totally

mehdi_amini updated this revision to Diff 50087.Mar 8 2016, 4:07 PM
mehdi_amini marked 15 inline comments as done.
mehdi_amini edited edge metadata.

Taking comment into account

Thanks for the great review. Hopefully I didn't forget anything with this update.

include/llvm/LTO/ThinLTOCodeGenerator.h
48

Tried to make the doc very explicit, let me know what you think.

lib/LTO/LTOModule.cpp
80

It is supposed to be called once indeed. I think we can update the implementation in the future if needed.

lib/LTO/ThinLTOCodeGenerator.cpp
67

So just tested:

With lazy loading of metadata: getLazyBitcodeModule takes 237ms and materializeMetadata takes 74ms (total 314ms)
without lazy loading of metadata: getLazyBitcodeModule takes 316ms

So no perf diff.

I probably don't see any diff on the memory because most metadata will leak to the context and being free'd with the Module. So there is no real impact on the peak memory (just delaying a little bit).

379

Done, it was valuable :)

tejohnson accepted this revision.Mar 8 2016, 5:19 PM
tejohnson edited edge metadata.

Thanks for adding the description! I do think that the formula for the new cache size is not right, see suggested fix below.

A couple misc comment typos, but LGTM once these the above is addressed.

include/llvm-c/lto.h
687

s/term/terms/

690

"left over half the available space"? (i.e. add "half") Either that or "left over the free space" (since AvailableSpace = FreeSpace + CacheSize)?

694

This doesn't seem right. I think P should be divided by 100 and not multiplied by it, since it is a percentage. And I think the percentage needs to be multiplied by something, not divided by AvailableSpace?

Should this be:
NewCacheSize = AvailableSpace * P/100

since it is described as the percentage of available space used for the cache.

For this and the above suggestion, any change needs to be replicated below in ThinLTOCodeGenerator.h.

include/llvm/LTO/ThinLTOCodeGenerator.h
104

move 'an' to next line.

lib/LTO/ThinLTOCodeGenerator.cpp
67

Ok, thanks for checking!

This revision is now accepted and ready to land.Mar 8 2016, 5:19 PM