This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Implemented threaded --build-id computation
ClosedPublic

Authored by grimar on Nov 1 2016, 9:45 AM.

Details

Summary

Patch switches computing of --build-id hash to tree.

This is the way when input data is splitted by chunks,
hash is computed for each one in threaded/non-threaded way.
At the end hash is conputed for result tree.

With or without -threads the result hash is the same.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar updated this revision to Diff 76582.Nov 1 2016, 9:45 AM
grimar retitled this revision from to [ELF] - Implemented --build-id=tree.
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar, evgeny777, emaste.
ruiu edited edge metadata.Nov 1 2016, 10:22 AM

I don't think we want to add one more different type of --build-id= option because we already have too many. We probably should use threads by default for "fast", "sha1" and "md5".

grimar updated this revision to Diff 76704.Nov 2 2016, 8:04 AM
grimar retitled this revision from [ELF] - Implemented --build-id=tree to [ELF] - Implemented threaded --build-id computation.
grimar edited edge metadata.
  • Reimplemented according to review comments.

Now each algorithm is threaded if -threads option used.

grimar updated this object.Nov 2 2016, 8:08 AM
ruiu added inline comments.Nov 2 2016, 10:10 AM
ELF/Config.h
147 ↗(On Diff #76704)

There's no real benefit to make it customziable, so please remove from Config.

ELF/SyntheticSections.cpp
61 ↗(On Diff #76704)

doTree sounds a bit weird. Maybe something like computeHash.

62 ↗(On Diff #76704)

You don't have to pass Buf because it's a class member.

63 ↗(On Diff #76704)

Please don't name Job or Worker because they sound like high level stuffs. They are in fact just an ArrayRef and a callback function.

65–67 ↗(On Diff #76704)

This is not a tree but an array.

I don't know why you are using the global allocator here. This can be a local std::vector variable, no?

69 ↗(On Diff #76704)

I think I don't understand the code. So, basically, what you want to do in this function is

  1. Split Buf into small chunks. So we need a function, say split(), which takes an ArrayRef<uint8_t> and returns std::vector<ArrayRef<uint8_t>>.
  2. Call a callback for each chunk.
  3. And then compute the final hash.

What we need to do here is not too complicated, so I think you can simplify the code.

96 ↗(On Diff #76704)

You can directly pass DoHash to the function, e.g., doTree(Buf, DoHash). Dohash sounds a bit weird too, but you don't really name it in this context.

doTree(Buf, [](...) {
  ...
});
emaste added inline comments.Nov 2 2016, 2:21 PM
ELF/InputSection.cpp
897 ↗(On Diff #76582)

This seems like a strange combination. Why not just SHAx?

907 ↗(On Diff #76582)

We don't have a way to select another size so "default" doesn't seem right

emaste added a comment.Nov 2 2016, 2:22 PM

(Sorry, comments were on an old version of the patch)

grimar added a comment.Nov 3 2016, 1:00 AM

(Sorry, comments were on an old version of the patch)

Just in case:

Why not just SHAx?

That seems how gold do. I am interested myself why such mix was implemented.
(https://sourceware.org/ml/binutils/2013-04/msg00134/t.patch).

Anyways current version looks much better for me.

grimar updated this revision to Diff 76847.Nov 3 2016, 5:41 AM

Addressed review comments.

ELF/Config.h
147 ↗(On Diff #76704)

Done.

ELF/SyntheticSections.cpp
61 ↗(On Diff #76704)

Done.

62 ↗(On Diff #76704)

I have to. Argument Buf was a data to hash, it is not the same that member which is a section content.
I renamed argument to Data to avoid confusion (actually I did not notice the member with the same name when wrote it).

63 ↗(On Diff #76704)

Ok, done.

65–67 ↗(On Diff #76704)

I expected that might be a bit faster. Removed.

69 ↗(On Diff #76704)

Reimplemented.

96 ↗(On Diff #76704)

Done.

ruiu added inline comments.Nov 3 2016, 11:48 AM
ELF/SyntheticSections.cpp
74–82 ↗(On Diff #76847)

This can be simplified.

while (Arr.size() > ChunkSize) {
  Ret.push_back(...);
  Arr = Arr.drop_front...;
}
if (!Arr.empty())
  Ret.push_back(Arr);
return Ret;
88 ↗(On Diff #76847)

Cb is not a good name. Let's name Hash.

93 ↗(On Diff #76847)

You can always call parallel_for_each. If multi-threads are not available, it will fall back to std::for_each.

109 ↗(On Diff #76847)

Replace [&] with [] because I think it doesn't capture anything. (And ditto for other functions.)

grimar added inline comments.Nov 4 2016, 1:34 AM
ELF/SyntheticSections.cpp
93 ↗(On Diff #76847)

Why we do that check in
void OutputSection<ELFT>::writeTo(uint8_t *Buf) then ?

if (Config->Threads) {
  parallel_for_each(Sections.begin(), Sections.end(),
                    [=](InputSection<ELFT> *C) { C->writeTo(Buf); });
} else {
  for (InputSection<ELFT> *C : Sections)
    C->writeTo(Buf);
}

I was thinking that it is for user to decide, if he/she uses -threads in command line, then threading is used if possible in LLD.
I think that might be helpfull. And also as I mentioned is consistent with OutputSection<ELFT>::writeTo.

grimar updated this revision to Diff 76919.Nov 4 2016, 9:56 AM
  • Addressed review comments.
ELF/SyntheticSections.cpp
74–82 ↗(On Diff #76847)

Done.

88 ↗(On Diff #76847)

Done.

109 ↗(On Diff #76847)

Removed & here, but others lambdas below still needs to capture HashSize.

ruiu accepted this revision.Nov 4 2016, 11:34 AM
ruiu edited edge metadata.

LGTM

This revision is now accepted and ready to land.Nov 4 2016, 11:34 AM
This revision was automatically updated to reflect the committed changes.