This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Lazy initialization of MergeInputSection class internals.
Abandoned (Public)

Authored by grimar on May 19 2016, 7:31 AM.

Details

Reviewers
ruiu
rafael
Summary

This patch implements a way to lazily fill the Offsets vector of the MergeInputSection class.

Now the MergeInputSection<ELFT>::Offsets vector is filled much later,
after all GC processing, right after the creation of output sections.
This avoids filling it with dead ranges of data
and opens the opportunity to implement decompression
so that only live sections are decompressed.
I am also sure that after that, this can easily be switched to
parallel_for processing.

This rework also helped to remove very similar code duplicated between
MergeOutputSection<ELFT>::addSection() and
MergeInputSection<ELFT>::MergeInputSection(); it is now combined
into MergeInputSection<ELFT>::prepareRanges().
The code from MergeOutputSection<ELFT>::addSection() is simply gone.
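
A minimal, self-contained sketch of the lazy-splitting idea (simplified types; collectLiveOffsets and the IsLive callback are hypothetical stand-ins, not the patch's actual code):

```cpp
// Hypothetical, simplified model of lazy splitting (not the patch's
// actual code): liveness is decided first by GC, and the offset list
// is built afterwards, so dead ranges never enter the vector.
#include <cstddef>
#include <functional>
#include <string_view>
#include <vector>

std::vector<size_t>
collectLiveOffsets(std::string_view Data,
                   const std::function<bool(size_t)> &IsLive) {
  std::vector<size_t> Offsets;
  size_t Off = 0;
  while (Off != Data.size()) {
    // Entries in a SHF_MERGE|SHF_STRINGS section are NUL-terminated.
    size_t End = Data.find('\0', Off);
    if (End == std::string_view::npos)
      break; // malformed input: no terminator
    if (IsLive(Off)) // dead pieces are simply never recorded
      Offsets.push_back(Off);
    Off = End + 1;
  }
  return Offsets;
}
```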

Diff Detail

Event Timeline

grimar updated this revision to Diff 57784.May 19 2016, 7:31 AM
grimar retitled this revision from to [ELF] - Lazy initialization of MergeInputSection class internals..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar.
grimar added inline comments.May 19 2016, 7:38 AM
ELF/InputSection.h:15

I'll remove that line; it is unnecessary.

ruiu edited edge metadata.May 19 2016, 8:21 AM

I think this change makes sense, but with this design, uncompressing sections would inevitably be serialized, because sections are uncompressed during both section garbage collection and section writing. I'm wondering if there's a way to parallelize it. (You don't really have to parallelize it at this point, but I'm thinking about a design that would allow us to parallelize it easily later by rewriting a for-loop as a parallel_for.)

If there is no GC, we can uncompress all input sections after name resolution is complete and before Writer::write(). We could simply visit all input sections and uncompress each one that is compressed.

If GC is present, the situation becomes a bit trickier, because we use getRangeAndSize() in MarkLive.cpp to set the live bits of pieces of input sections. However, we don't really need any section contents during GC; the live bit just says "a piece of data at this offset needs to be handled as live." Since we do not merge data at this point, we can still do without reading the section contents. Here's the observation:

  • Until Writer::write(), no one needs section contents.
  • MergeInputSections only need to know which offsets are live during GC.
  • We could uncompress, in parallel, all input sections whose live bit is on.

So I think there's room to improve this patch.
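
A minimal sketch of that parallel-uncompress idea (Section, Live, Compressed, and uncompress() are hypothetical stand-ins for the corresponding LLD types; the loop uses the C++17 parallel algorithms for illustration, while LLD has its own parallel_for_each helper):

```cpp
// Sketch only: uncompress all live, compressed sections in parallel,
// after GC has set the live bits and before sections are written.
#include <algorithm>
#include <execution>
#include <vector>

struct Section {
  bool Live = false;
  bool Compressed = false;
  void uncompress(); // hypothetical: inflate contents in place
};

void uncompressLiveSections(std::vector<Section *> &Sections) {
  std::for_each(std::execution::par, Sections.begin(), Sections.end(),
                [](Section *S) {
                  if (S && S->Live && S->Compressed)
                    S->uncompress();
                });
}
```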

In D20433#434391, @ruiu wrote:

I think this change makes sense, but with this design, uncompressing sections would inevitably be serialized, because sections are uncompressed during both section garbage collection and section writing. I'm wondering if there's a way to parallelize it. (You don't really have to parallelize it at this point, but I'm thinking about a design that would allow us to parallelize it easily later by rewriting a for-loop as a parallel_for.)

If there is no GC, we can uncompress all input sections after name resolution is complete and before Writer::write(). We could simply visit all input sections and uncompress each one that is compressed.

If GC is present, the situation becomes a bit trickier, because we use getRangeAndSize() in MarkLive.cpp to set the live bits of pieces of input sections. However, we don't really need any section contents during GC; the live bit just says "a piece of data at this offset needs to be handled as live." Since we do not merge data at this point, we can still do without reading the section contents. Here's the observation:

  • Until Writer::write(), no one needs section contents.
  • MergeInputSections only need to know which offsets are live during GC.
  • We could uncompress, in parallel, all input sections whose live bit is on.

So I think there's room to improve this patch.

Yep, that is true. So for this patch I would start by solving step 2 (knowing which offsets are live during GC) first, I guess.
I'll try to improve it on all fronts though. Thanks for sharing your ideas.

grimar updated this revision to Diff 58062.May 22 2016, 8:58 AM
grimar edited edge metadata.

Fully reimplemented the initial version of this.

Now the MergeInputSection<ELFT>::Offsets vector is filled much later,
after all GC processing, right after the creation of output sections.
This avoids filling it with dead ranges of data
and opens the opportunity to implement decompression
so that only live sections are decompressed.
I am also sure that after that, this can easily be switched to
parallel_for processing.

This rework also helped to remove very similar code duplicated between
MergeOutputSection<ELFT>::addSection() and
MergeInputSection<ELFT>::MergeInputSection(); it is now combined
into MergeInputSection<ELFT>::prepareRanges().
The code from MergeOutputSection<ELFT>::addSection() is simply gone.

Forgot to add: all tests pass OK.

grimar updated this object.May 22 2016, 9:01 AM
ruiu added a comment.May 22 2016, 6:06 PM

You don't need to introduce a new notion of range to do GC lazily. All you have to do is memorize the live offsets and apply them later. Please take a look at http://reviews.llvm.org/D20516
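
A minimal sketch of that "memorize now, apply later" approach (the names MergeSection, markLiveAt(), and splitIntoPieces() are hypothetical stand-ins; see D20516 for what was actually committed):

```cpp
// Sketch, assuming hypothetical names: during GC we only record which
// offsets are live; the pieces are built later, when contents are needed.
#include <cstddef>
#include <set>
#include <vector>

struct SectionPiece {
  size_t Offset;
  bool Live;
};

struct MergeSection {
  std::set<size_t> LiveOffsets; // filled during MarkLive

  void markLiveAt(size_t Off) { LiveOffsets.insert(Off); }

  // Called after GC: build all pieces, then apply the memorized bits.
  std::vector<SectionPiece>
  splitIntoPieces(const std::vector<size_t> &AllOffsets) {
    std::vector<SectionPiece> Pieces;
    for (size_t Off : AllOffsets)
      Pieces.push_back({Off, LiveOffsets.count(Off) != 0});
    return Pieces;
  }
};
```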

In D20433#436489, @ruiu wrote:

You don't need to introduce a new notion of range to do GC lazily. All you have to do is memorize the live offsets and apply them later. Please take a look at http://reviews.llvm.org/D20516

I am not doing GC lazily here. I think "memorize live offsets and apply them later" is almost exactly what this patch does.
In fact it memorizes live offsets, but it also does not keep dead pieces when building the list of them.

I think the problem here is that my patch was built on half-day-old source code; that is probably why it looks like I introduced something new,
but Pieces and Ranges are just about the same.

So your way is to apply the live bit and store all pieces; my way was to store only live pieces, which is why I even needed to save the
data size separately (I could not subtract piece 1's offset from piece 2's offset if the latter was dead; it was simply not in the list).

Actually I was sure that was the slightly better way; I tried to save all possible resources, including memory,
since storing dead pieces is a waste. But your approach looks cleaner in implementation to me, so I would go with it.
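
A simplified illustration of that size trade-off (not code from either patch; the Range struct here is hypothetical):

```cpp
#include <cstddef>

// If all pieces are kept (live and dead alike), a piece's size is
// implied by its neighbor and need not be stored:
//   Size(I) == Pieces[I + 1].Offset - Pieces[I].Offset
// If dead pieces are dropped from the vector, Pieces[I + 1] may be a
// non-adjacent piece, so each live piece has to carry its own size.
struct Range {
  size_t Offset;
  size_t Size; // needed only because dead neighbors are not stored
};
```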

ruiu added a comment.May 23 2016, 10:28 AM

r270455 should make it much easier to support compressed sections. Now I think you basically want to write an InputSection::uncompress() function and call it like this from the driver's link function:

  // For each section of file F: skip discarded and dead sections,
  // uncompress compressed ones, then split mergeable sections into pieces.
  for (InputSectionBase<ELFT> *S : F->getSections()) {
    if (!S || S == &InputSection<ELFT>::Discarded || !S->Live)
      continue;
    if (S->Compressed)
      S->uncompress();
    if (auto *MS = dyn_cast<MergeInputSection<ELFT>>(S))
      MS->splitIntoPieces();
  }
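
For reference, a minimal sketch of what such an uncompress() could do for a zlib-compressed section body (using raw zlib; the names here are illustrative, and a real implementation would read the uncompressed size from the Elf_Chdr header that precedes the compressed data):

```cpp
// Sketch only: inflate a zlib-compressed section body into a buffer.
#include <cstdint>
#include <vector>
#include <zlib.h>

bool uncompressSection(const uint8_t *Data, size_t CompressedSize,
                       size_t UncompressedSize, std::vector<uint8_t> &Out) {
  Out.resize(UncompressedSize);
  uLongf DestLen = UncompressedSize;
  // zlib's one-shot uncompress(); returns Z_OK on success.
  int Rc = ::uncompress(Out.data(), &DestLen, Data, CompressedSize);
  return Rc == Z_OK && DestLen == UncompressedSize;
}
```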
grimar abandoned this revision.May 24 2016, 1:03 AM