This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Scan .eh_frame sections precisely in order to eliminate unused LSDAs and personality routines.
Needs ReviewPublic

Authored by krisb on Sep 8 2017, 8:01 AM.

Details

Summary

This patch marks CIEs, LSDAs and personality routines with all their dependent sections
only if they are used in the main part of a program.

Diff Detail

Event Timeline

ikudrin created this revision.Sep 8 2017, 8:01 AM
ruiu added inline comments.Sep 11 2017, 5:17 PM
ELF/EhFrame.cpp
211–212

This is not new code, but does an FDE without any relocation make sense? If not, maybe we should report an error instead of marking them to be ignored by -1?

ELF/MarkLive.cpp
105

I do not fully understand this function probably because I do not fully understand the relationship between CIE, FDE, LSDA, personality functions, etc. Can you rewrite this comment so that this function is understandable for those who don't know the structure of .eh_frame? I don't think we need a lengthy comment here, but the current comment seems to require too much prior knowledge.

120

I'm worried if the hash table is too slow. Can you avoid using it?

301–302

EhInputSections are scanned multiple times?

ikudrin updated this revision to Diff 114832.Sep 12 2017, 7:20 AM
  • Substituted a sorted vector for a DenseMap.
  • Improved (hopefully) comments.
ikudrin added inline comments.Sep 12 2017, 7:35 AM
ELF/EhFrame.cpp
211–212

The polite behavior was introduced by @rafael to handle gold's '-r' output, see https://reviews.llvm.org/rL295141.

ELF/MarkLive.cpp
105

I've tried to extend the comment. How do you find it now?

301–302

A personality routine, which might be added in this scanning, depends on other functions which might not be added yet. These functions might also have EH information and so on. However, I suppose that it should converge quickly.

ikudrin updated this revision to Diff 115406.Sep 15 2017, 7:37 AM

Update the patch to incorporate the latest changes in the master branch.

ruiu added inline comments.Sep 15 2017, 8:05 AM
ELF/MarkLive.cpp
114–117

This function seems to get too long. Can you refactor?

ikudrin updated this revision to Diff 115608.Sep 18 2017, 1:46 AM

Refactored scanEhFrameSection.

ikudrin updated this revision to Diff 115663.Sep 18 2017, 8:43 AM

Formatted.

ruiu added inline comments.Sep 18 2017, 2:19 PM
ELF/MarkLive.cpp
142–144

Do you really need to care about that? It doesn't seem at least harmful to call Fn on the described function.

158–159

It is probably better to check if Piece->Live outside processFde and rename processFde scanFde.

166

Likewise, I'd check Cie->Live before calling the function and rename the function scanCie.

170–172

Is this code reachable? The GC traverses a graph consists of nodes (sections, fdes, cies, etc.) and edges (relocations). I wonder who can point to this dangling cie.

298–303

How many times does this loop usually repeat? Since this loop scans all EhInputSections on every iteration, I'm a bit worried about its performance.

ELF/SyntheticSections.cpp
458–473

It is odd that you had to check use Live and isFdeLive. If GC works correctly, all live sections are marked as such, so you don't need to use isFdeLive, no? I believe how this should work.

ikudrin updated this revision to Diff 116180.Sep 21 2017, 6:50 AM
ikudrin marked 2 inline comments as done.
  • Update to the current tip.
  • Address review comments.
grimar added a subscriber: grimar.

Added llvm-commits. to subscribers because otherwise notifications about this patch were never shown in mailing lists..

ikudrin marked 3 inline comments as done.Sep 21 2017, 7:18 AM
ikudrin added inline comments.
ELF/MarkLive.cpp
170–172

It's not a dangling CIE. It just doesn't refer to a personality function, which is not prohibited.
And the edge from an FDE to a CIE is not a relocation, but an indirect link through ID field in the FDE.

298–303

In case of programs which do not use EH, like clang or lld, there will be only one iteration.
A typical executable with EH on Linux, where a personal routine is located in a DSO, will require two iterations. And a new flag ExecInQueue helps to avoid scanning EH frames for the second time.
If a personality routine is added from a library, it'll require two full iterations, but hardly more, because all references should be satisfied at that point.

ELF/SyntheticSections.cpp
458–473

When --gc-sections is not specified, we mark all pieces alive by default. In that case markLive() is not called and we have to check the state here to discard FDEs for merged COMDAT sections, for example.

Ping. Is there anything I can do to proceed the patch?

ruiu edited edge metadata.Sep 27 2017, 8:08 PM

Sorry for the belated response. This patch seemed a bit too complicated to me, and I was thinking if there's a better way of doing it.

So, there are two tricky things in garbage-collecting .eh_frame section contents:

  1. .eh_frame sections consists of multiple section pieces, and each piece can be alive or dead. This is different from a regular section because a regular section is alive or dead as a whole.
  2. .eh_frame sections have reverse dependency to .text sections. Usually, if a section A has a relocation pointing to B, then B is alive if A is alive. However, for .eh_frame section pieces, this liveness propagates backwards.

To deal with the above problems, you scan all .eh_frame section pieces in this patch. So, our algorithm with this patch is no longer a simple graph traversal algorithm. I think that's the reason why it feels a bit too complicated to me. But I think if you add data members to InputSection for reverse dependencies, you can keep it as a simple graph traversal.

How about this?

  1. Add a new member, EhSectionPieces vector, to InputSection. It holds the input section's FDEs and CIEs if exists. At the beginning of markLive, scan all .eh_frame sections to initialize the vectors.
  2. To represent edges between InputSections through .eh_frame section pieces, explicitly add these dependencies to InputSection::DependentSections. For example, if a function has .eh_frame pieces, its personality function is added to its DependentSection. You can initialize it at the beginning of markLive.

After doing the above two things, all dependencies are explicitly represented, and all you have to do is to traverse the graph, make .eh_frame pieces alive, and visit DependentSections as usual. I think this is a cleaner way of doing garbage-collecting.

ELF/InputSection.h
276 ↗(On Diff #116180)

If unsigned is not uint32_t, a padding would be created between Size and Live, so you should make Live uint32_t.

Also, sizeof(uint32_t) is 32, so 8 * sizeof(uint32_t) - 1 is always just 31.

uint32_t Size : 31;
uint32_t Live : 1;
ELF/MarkLive.cpp
115

We do not sprinkle const that much in lld if a local variable is obviously not mutated.

121

We usually call this It.

135

const typename ELFT::uint -> uint64_t

You can always use uint64_t to represent a value that can be either 32 or 64 bit.

138–140

This loop can be written as

for (unsigned I = Fde.FirstRelocation + 1, N = Rels.size(); I < N && Rels[I].r_offset < FdeEnd; ++I)
    resolveReloc<ELFT>(EH, Rels[I], Fn);
154

I wouldn't assign it to a temporary local variable.

155–156

I still doubt if this check makes sense. If we visit an invalid CIE that doesn't point to anything and mark it alive, then that means we are going to emit an invalid CIE to the output file, no? It seems it is an error condition.

225–227

Isn't this a premature optimization? I wouldn't add this unless it is proved to be actually effective.

ikudrin updated this revision to Diff 117099.Sep 29 2017, 1:31 AM

Addressed review comments.

ikudrin marked 6 inline comments as done.EditedSep 29 2017, 1:52 AM
In D37626#882892, @ruiu wrote:

How about this?

  1. Add a new member, EhSectionPieces vector, to InputSection. It holds the input section's FDEs and CIEs if exists. At the beginning of markLive, scan all .eh_frame sections to initialize the vectors.
  2. To represent edges between InputSections through .eh_frame section pieces, explicitly add these dependencies to InputSection::DependentSections. For example, if a function has .eh_frame pieces, its personality function is added to its DependentSection. You can initialize it at the beginning of markLive.

After doing the above two things, all dependencies are explicitly represented, and all you have to do is to traverse the graph, make .eh_frame pieces alive, and visit DependentSections as usual. I think this is a cleaner way of doing garbage-collecting.

I tried to implement this approach, see D38391. I hope I understood your suggestion right.

According to my tests, it is slower than the current version, approximately, by 10-20%. I checked it both with samples which use (like omnetpp) and do not use EH (like clang). I compared the execution time of markLive() alone, not the whole linker.

ELF/MarkLive.cpp
155–156

It's not a dangling or invalid CIE. It just doesn't refer to a personality function, which is not prohibited. For example, you can find such CIEs in any LLVM project on Linux, because they do not use exception handling.

225–227

I don't think so. It really helps to avoid the second scan of EH frames in most cases.

ruiu added a comment.Oct 1 2017, 7:15 PM

Yes, https://reviews.llvm.org/D38391 is what I was thinking about. Thank you for creating that patch. Do you know why that is slower than this patch? My wild guess is that it is because D38391 needs to add a lot of pointers to EhSectionPieces vectors, but I wonder if that's actually the case.

I guess the same. Moreover:

  • This patch process any CIE once while D38391 has to store a pointer to it for each corresponding input section.
  • D38391 also stores pointers to personality function sections and LSDA sections for each input section which use them.

This variant might be slower than D38391 only if it has to scan EH frames several times, which is quite a rare case.

ikudrin updated this revision to Diff 120207.Oct 25 2017, 1:43 AM

Ping. Is there anything I can do to improve this patch? I suppose I am hardly able to make D38391 quicker than this one.

  • Rebase on the current tip.
ruiu added a comment.Oct 25 2017, 10:51 AM

I still want to explore the idea of https://reviews.llvm.org/D38391 instead of this because that approach seems more "correct" to me, and this patch seems fairly complex. We may end up with this patch insetad of D38391, but it should be worth a try to reduce complexity.

ELF/MarkLive.cpp
50–52

We usually write public members first.

110

This needs a function comment. If I understand it correctly, this function takes an FDE and returns an CIE that the given FDE belongs to. Is this correct?

225–227

But how much time you can save? I feel like we shouldn't optimize it at the moment.

ikudrin updated this revision to Diff 120394.Oct 26 2017, 4:34 AM
  • Address review comments;
  • Rebase on the tip.
ikudrin marked 3 inline comments as done.Oct 26 2017, 5:02 AM

I believe that the complexity in both cases is similar. Here we have two kinds of edges with distinct traversal functions. There we have to convert these edges in advance so that we can use only one traversal function. And distinct traversal function is not that different from the conversion function. But conversion always requires additional space and time to store edges, while the traversal function might be called several times. But my opinion might be biased.

ELF/MarkLive.cpp
225–227

According to my tests, with this optimization the execution time of doGcSections() becomes about 1% longer if the linked program does not use EH (like clang) and about 2.5% shorter in the other case (like omnetpp). The effect is not significant, so I've removed it.

ruiu added a comment.Oct 26 2017, 2:04 PM

I started a discussion in the llvm-dev mailing list to eliminate the need to scan .eh_frame sections in a special way. Please send a comment there if you have one.

ikudrin marked an inline comment as done.EditedFeb 13 2018, 5:17 PM

Hello @ruiu,

In D37626#908500, @ruiu wrote:

I started a discussion in the llvm-dev mailing list to eliminate the need to scan .eh_frame sections in a special way. Please send a comment there if you have one.

Could you summarise the discussion here, please?

espindola edited reviewers, added: espindola; removed: rafael.Mar 15 2018, 9:12 AM
krisb commandeered this revision.Jan 17 2023, 11:22 PM
krisb added a reviewer: ikudrin.
krisb edited reviewers, added: MaskRay; removed: espindola.
krisb added a subscriber: krisb.

Let's give this another chance.

Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2023, 11:22 PM
krisb updated this revision to Diff 490051.Jan 17 2023, 11:23 PM

Rebase on ToT

Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2023, 11:23 PM

I am still on a trip and will take a few days more.

MaskRay added a comment.EditedJan 28 2023, 4:41 PM

Can you describe the motivation of this change? It has two parts:

  • Personality routine GC
  • LSDA GC

Personality routine GC is typically not useful at all. (Moreover, in many configurations (without -static-libstdc++) personality routines are in shared objects.)

If you specify -fbinutils-version=2.36 (or latest), .gcc_except_table.* sections can be GCed.
I currently lean toward the opinion that this patch is probably not useful.
There is some GC complexity which does not pay off.

About the implementation, I think there are two issues:

  • for (EhInputSection *eh : ctx.ehInputSections) { loop can just be executed once, not nested in a for (;;) loop.
  • Calling isFdeLive twice, once in MarkLive, once in EhFrameSection::addRecords is not great.
krisb added a comment.Mar 5 2023, 11:53 PM

Hello @MaskRay, thanks for the review! And sorry for the delay with response.
The patch is useful in a probably rare case, indeed. It reproduces with a statically linked application that does not use exceptions itself, but hooks the personality routine because of some indirect dependencies that are removed by GC. Unfortunately, we cannot share our sample, but we will try to find something meaningful in the public domain.
Note that GNU ld produces smaller binaries because it analyzes the liveness of sections/pieces more thoroughly.

Hello @MaskRay, thanks for the review! And sorry for the delay with response.
The patch is useful in a probably rare case, indeed. It reproduces with a statically linked application that does not use exceptions itself, but hooks the personality routine because of some indirect dependencies that are removed by GC. Unfortunately, we cannot share our sample, but we will try to find something meaningful in the public domain.
Note that GNU ld produces smaller binaries because it analyzes the liveness of sections/pieces more thoroughly.

Can you try -fbinutils-version=2.36 or latest?