This is an archive of the discontinued LLVM Phabricator instance.

LTO: Simplify caching interface.
ClosedPublic

Authored by pcc on Sep 15 2016, 1:14 PM.

Details

Summary

The NativeObjectOutput class has a design problem: it mixes up the caching
policy with the interface for output streams, which makes the client-side
code hard to follow and would for example make it harder to replace the
cache implementation in an arbitrary client.

This change separates the two aspects by moving the caching policy
to a separate field in Config, replacing NativeObjectOutput with a
NativeObjectStream class which only deals with streams and does not need to
be overridden by most clients and introducing an AddFile callback for adding
files (e.g. from the cache) to the link.

Diff Detail

Repository
rL LLVM

Event Timeline

pcc updated this revision to Diff 71541.Sep 15 2016, 1:14 PM
pcc retitled this revision from to LTO: Simplify caching interface..
pcc updated this object.
pcc added reviewers: tejohnson, mehdi_amini.
pcc added a subscriber: llvm-commits.
mehdi_amini added inline comments.Sep 15 2016, 4:15 PM
include/llvm/LTO/LTO.h
309 ↗(On Diff #71541)

I'm not in favor of this, currently a client that don't need caching does not need to handle file input.

mehdi_amini edited edge metadata.Sep 15 2016, 4:20 PM

make it harder to replace the cache implementation in an arbitrary client.

The class CacheObjectOutput is just a possible implementation in-tree for convenience, but just subclassing the NativeObjectOutput for any other custom cache implementation is quite simple. This is not clear to me what you mean then.

include/llvm/LTO/LTO.h
309 ↗(On Diff #71541)

I'm not in favor of this, currently a client that don't need caching does not need to handle file input....

... currently.

pcc added inline comments.Sep 15 2016, 4:21 PM
include/llvm/LTO/LTO.h
309 ↗(On Diff #71541)

Handling file input is pretty much what linkers do all the time, so I wouldn't consider it to be particularly burdensome :)

Is there any specific client you have in mind where this would be a problem?

pcc added a comment.Sep 15 2016, 4:28 PM

The class CacheObjectOutput is just a possible implementation in-tree for convenience, but just subclassing the NativeObjectOutput for any other custom cache implementation is quite simple. This is not clear to me what you mean then.

My point is that caching should be orthogonal to streaming native files, i.e. the code in the client that streams native files shouldn't have anything to do with caching.

mehdi_amini added inline comments.Sep 15 2016, 4:40 PM
tools/llvm-lto2/llvm-lto2.cpp
203–204 ↗(On Diff #71541)

As a side note: this is really inefficient: this is duplicating the file through memory without any apparent reason?

pcc added inline comments.Sep 15 2016, 4:45 PM
tools/llvm-lto2/llvm-lto2.cpp
203–204 ↗(On Diff #71541)

This is what the original code was doing.

(I'd like this code to provide some way for test cases to match filenames rather than blindly copying the contents across, but that can be done separately.)

My point is that caching should be orthogonal to streaming native files, i.e. the code in the client that streams native files shouldn't have anything to do with caching.

I see your point and it seems totally valid / relevant.
(Sorry for the delay I'm a little bit swamped)

tejohnson edited edge metadata.Sep 22 2016, 11:46 AM

Sorry for the delay in review. Looks like a nice cleanup! Looks pretty good to me, but I have a question, which may mean that I am not understanding something.

tools/gold/gold-plugin.cpp
801 ↗(On Diff #71541)

Removing the "&& options::cache_dir.empty()" seems like it would cause gold-plugin to remove the cache file, which is why I added that here when I added caching.

pcc added inline comments.Sep 22 2016, 12:14 PM
tools/gold/gold-plugin.cpp
801 ↗(On Diff #71541)

The only files created by AddStream are temporary (modulo save-temps), which means that they cannot be cache files.

(On a related note I just noticed that this patch also fixes a bug in the old code: if an output was cached gold would create an unused temporary file.)

tejohnson added inline comments.Sep 22 2016, 1:16 PM
tools/gold/gold-plugin.cpp
801 ↗(On Diff #71541)

In r279883 I changed this so that the path to the cache entry is passed back to gold instead of the temp file (see where we set OutputName = EntryPath below. If IsTemporary is not set to false, then later when we call recordFile we end up adding Filename to the Cleanup vector which means it is removed later in the cleanup_hook.

But yes, it looks like we should not be calling getOutputFileName in the caching case.

pcc added inline comments.Sep 22 2016, 1:32 PM
tools/gold/gold-plugin.cpp
801 ↗(On Diff #71541)

The code where we set OutputName = EntryPath has been moved to AddFile. This callback isn't called at all if the file is cached (i.e. if we call AddFile for this output).

tejohnson added inline comments.Sep 22 2016, 1:56 PM
tools/gold/gold-plugin.cpp
801 ↗(On Diff #71541)

But it is called on a cache miss, where you also call AddFile and will use the cache entry directly. Looks like that happens when you destroy CacheStream. That's what I believe we are doing currently. So we don't want to remove the Filenames[Task] in that case either.

pcc added inline comments.Sep 22 2016, 2:07 PM
tools/gold/gold-plugin.cpp
801 ↗(On Diff #71541)

In the case of a cache miss we use an AddStream provided by the cache, which does not call the AddStream provided by the client.

tejohnson added inline comments.Sep 22 2016, 2:14 PM
tools/gold/gold-plugin.cpp
801 ↗(On Diff #71541)

Ah - that's what I was missing, or had forgotten, by the time I got here in the patch! Can you document that in the LTO::run interface? I went back and it doesn't mention that. I.e. that if caching is enabled, the AddStream provided to run() is not used? Might be good to document here and in llvm-lto2 as well that the AddStream we create here not used if caching enabled.

pcc added inline comments.Sep 22 2016, 2:20 PM
tools/gold/gold-plugin.cpp
801 ↗(On Diff #71541)

If there's at least one non-thin LTO input, we will use AddStream. But yes, it does seem sensible to document this more explicitly in LTO::run.

pcc updated this revision to Diff 72218.Sep 22 2016, 2:50 PM
pcc edited edge metadata.
  • Add note to LTO::run documentation
pcc marked an inline comment as done.Sep 22 2016, 2:55 PM
tejohnson accepted this revision.Sep 22 2016, 10:13 PM
tejohnson edited edge metadata.

This LGTM, but see if Mehdi has any more comments

This revision is now accepted and ready to land.Sep 22 2016, 10:13 PM
mehdi_amini added inline comments.Sep 22 2016, 10:25 PM
include/llvm/LTO/Config.h
60 ↗(On Diff #72218)

The "AddFile" part seems convoluted to me. It seems to be here only to satisfy the flow of having a stream as a returned type.

72 ↗(On Diff #72218)

Since you're claiming to make it easier to change the cache implementation in the description, can you clarify how would I implement a "in-memory" cache with this interface?

116 ↗(On Diff #72218)

No documented.

include/llvm/LTO/LTO.h
316 ↗(On Diff #72218)

Is there any specific client you have in mind where this would be a problem?

That's not the angle I'm looking at when looking at the API design.

For instance:
Why is NativeObjectCache Cache in in the config? While these callbacks are here?

pcc updated this revision to Diff 72309.Sep 23 2016, 10:44 AM
pcc edited edge metadata.
  • Move AddFile to cache and Cache to LTO::run
include/llvm/LTO/Config.h
60 ↗(On Diff #72218)

Now gone

72 ↗(On Diff #72218)

New interface would allow you to implement that within the cache either internally or with a new callback.

116 ↗(On Diff #72218)

Removed.

include/llvm/LTO/LTO.h
316 ↗(On Diff #72218)

This is resolved with the new interface.

mehdi_amini accepted this revision.Sep 23 2016, 1:42 PM
mehdi_amini edited edge metadata.

The high-level LGTM, if there is anything that comes up on the details we could still handle it post-commit.

This revision was automatically updated to reflect the committed changes.