This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Use std::map to get determistic imports files
ClosedPublic

Authored by tejohnson on Jun 29 2018, 11:31 AM.

Details

Summary

I noticed that the .imports files emitted for distributed ThinLTO
backends do not have consistent ordering. This is because StringMap
iteration order is not guaranteed to be deterministic. Since we already
have a std::map with this information, used when emitting the individual
index files (ModuleToSummariesForIndex), use it for the imports files as
well.

This issue is likely causing some unnecessary rebuilds of the ThinLTO
backends in our distributed build system as the imports files are inputs
to those backends.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson created this revision.Jun 29 2018, 11:31 AM

Does the iteration order of those map matters other than changing the hash? Will it change symbol resolution or importing different function?

The other option is MapVector, which preserve the insertion order, which might be faster than std::map.

Does the iteration order of those map matters other than changing the hash? Will it change symbol resolution or importing different function?

No effect on importing/symbol resolution etc. In fact, this is not an input to the compiler at all, it is used by the build system. The change in order will just make it look to the build system like it has changed when it hasn't.

The other option is MapVector, which preserve the insertion order, which might be faster than std::map.

The trickiness is that currently it may not be inserted in a deterministic order, because we use StringMap elsewhere in the thin link. So std::map is the most straightforward way to get the desired behavior.

Does the iteration order of those map matters other than changing the hash? Will it change symbol resolution or importing different function?

No effect on importing/symbol resolution etc. In fact, this is not an input to the compiler at all, it is used by the build system. The change in order will just make it look to the build system like it has changed when it hasn't.

I understand. But if the order is used by the build system, it can also affect the output binary. If the only goal is to keep the thin link function import map constant when the input doesn't change, std::map is fine. Can you add a test case for this?

The other option is MapVector, which preserve the insertion order, which might be faster than std::map.

The trickiness is that currently it may not be inserted in a deterministic order, because we use StringMap elsewhere in the thin link. So std::map is the most straightforward way to get the desired behavior.

This is unfortunate. I can't think of anything can be wrong because of that now but might need to revisit.

Does the iteration order of those map matters other than changing the hash? Will it change symbol resolution or importing different function?

No effect on importing/symbol resolution etc. In fact, this is not an input to the compiler at all, it is used by the build system. The change in order will just make it look to the build system like it has changed when it hasn't.

I understand. But if the order is used by the build system, it can also affect the output binary.

I'm not sure I understand how - additional ThinLTO backend processes could be unnecessarily redone, but they would have the same inputs and produce the same object file as before.

If the only goal is to keep the thin link function import map constant when the input doesn't change, std::map is fine. Can you add a test case for this?

I'll add a test that emits an imports file with multiple lines and check for them in order - it wouldn't necessarily fail without this change (we could get lucky), but it might catch any regression in this ordering.

The other option is MapVector, which preserve the insertion order, which might be faster than std::map.

The trickiness is that currently it may not be inserted in a deterministic order, because we use StringMap elsewhere in the thin link. So std::map is the most straightforward way to get the desired behavior.

This is unfortunate. I can't think of anything can be wrong because of that now but might need to revisit.

tejohnson updated this revision to Diff 153568.Jun 29 2018, 2:13 PM

Add test with multiple imports files that are expected to be ordered

steven_wu accepted this revision.Jul 10 2018, 9:24 AM

LGTM. Thanks!

This revision is now accepted and ready to land.Jul 10 2018, 9:24 AM

std::map is usually fairly slow, have you considered sorting at the time you write to the file instead? That would add a slow step when writing the files but wouldn't affect any in-memory operation.

std::map is usually fairly slow, have you considered sorting at the time you write to the file instead? That would add a slow step when writing the files but wouldn't affect any in-memory operation.

Actually...looking at the code and what I even wrote here in the description, I'm not sure why it didn't occur to me to simply use the std::map ModuleToSummariesForIndex we are already building for the purposes of writing the distributed index files. At least in the new LTO API, that is done just before calling EmitImportsFiles, so we have it on hand! For the old LTO API, that std::map is not being done yet, but this is just llvm-lto testing code, so I can add a call to the same llvm::gatherImportedSummariesForModule before invoking EmitImportsFiles.

Let me change my patch to do this...

Use the existing ModuleToSummariesForIndex instead.

tejohnson retitled this revision from [ThinLTO] Use std::map for import lists to get determistic imports files to [ThinLTO] Use std::map to get determistic imports files.Jul 10 2018, 12:17 PM
tejohnson edited the summary of this revision. (Show Details)
tejohnson added a reviewer: mehdi_amini.
This revision was automatically updated to reflect the committed changes.