This patch marks CIEs, LSDAs and personality routines with all their dependent sections
only if they are used in the main part of a program.
Details
Diff Detail
Event Timeline
ELF/EhFrame.cpp | ||
---|---|---|
210–211 | This is not new code, but does an FDE without any relocation make sense? If not, maybe we should report an error instead of marking them to be ignored by -1? | |
ELF/MarkLive.cpp | ||
173 | I do not fully understand this function probably because I do not fully understand the relationship between CIE, FDE, LSDA, personality functions, etc. Can you rewrite this comment so that this function is understandable for those who don't know the structure of .eh_frame? I don't think we need a lengthy comment here, but the current comment seems to require too much prior knowledge. | |
192 | I'm worried if the hash table is too slow. Can you avoid using it? | |
351–352 | EhInputSections are scanned multiple times? |
ELF/EhFrame.cpp | ||
---|---|---|
210–211 | The polite behavior was introduced by @rafael to handle gold's '-r' output, see https://reviews.llvm.org/rL295141. | |
ELF/MarkLive.cpp | ||
173 | I've tried to extend the comment. How do you find it now? | |
351–352 | A personality routine, which might be added in this scanning, depends on other functions which might not be added yet. These functions might also have EH information and so on. However, I suppose that it should converge quickly. |
ELF/MarkLive.cpp | ||
---|---|---|
186–189 | This function seems to get too long. Can you refactor? |
ELF/MarkLive.cpp | ||
---|---|---|
152–154 | Do you really need to care about that? It doesn't seem at least harmful to call Fn on the described function. | |
180–182 | Is this code reachable? The GC traverses a graph consists of nodes (sections, fdes, cies, etc.) and edges (relocations). I wonder who can point to this dangling cie. | |
204–205 | It is probably better to check if Piece->Live outside processFde and rename processFde scanFde. | |
212 | Likewise, I'd check Cie->Live before calling the function and rename the function scanCie. | |
348–364 | How many times does this loop usually repeat? Since this loop scans all EhInputSections on every iteration, I'm a bit worried about its performance. | |
ELF/SyntheticSections.cpp | ||
448–463 | It is odd that you had to check use Live and isFdeLive. If GC works correctly, all live sections are marked as such, so you don't need to use isFdeLive, no? I believe how this should work. |
Added llvm-commits. to subscribers because otherwise notifications about this patch were never shown in mailing lists..
ELF/MarkLive.cpp | ||
---|---|---|
180–182 | It's not a dangling CIE. It just doesn't refer to a personality function, which is not prohibited. | |
348–364 | In case of programs which do not use EH, like clang or lld, there will be only one iteration. | |
ELF/SyntheticSections.cpp | ||
448–463 | When --gc-sections is not specified, we mark all pieces alive by default. In that case markLive() is not called and we have to check the state here to discard FDEs for merged COMDAT sections, for example. |
Sorry for the belated response. This patch seemed a bit too complicated to me, and I was thinking if there's a better way of doing it.
So, there are two tricky things in garbage-collecting .eh_frame section contents:
- .eh_frame sections consists of multiple section pieces, and each piece can be alive or dead. This is different from a regular section because a regular section is alive or dead as a whole.
- .eh_frame sections have reverse dependency to .text sections. Usually, if a section A has a relocation pointing to B, then B is alive if A is alive. However, for .eh_frame section pieces, this liveness propagates backwards.
To deal with the above problems, you scan all .eh_frame section pieces in this patch. So, our algorithm with this patch is no longer a simple graph traversal algorithm. I think that's the reason why it feels a bit too complicated to me. But I think if you add data members to InputSection for reverse dependencies, you can keep it as a simple graph traversal.
How about this?
- Add a new member, EhSectionPieces vector, to InputSection. It holds the input section's FDEs and CIEs if exists. At the beginning of markLive, scan all .eh_frame sections to initialize the vectors.
- To represent edges between InputSections through .eh_frame section pieces, explicitly add these dependencies to InputSection::DependentSections. For example, if a function has .eh_frame pieces, its personality function is added to its DependentSection. You can initialize it at the beginning of markLive.
After doing the above two things, all dependencies are explicitly represented, and all you have to do is to traverse the graph, make .eh_frame pieces alive, and visit DependentSections as usual. I think this is a cleaner way of doing garbage-collecting.
ELF/InputSection.h | ||
---|---|---|
276 | If unsigned is not uint32_t, a padding would be created between Size and Live, so you should make Live uint32_t. Also, sizeof(uint32_t) is 32, so 8 * sizeof(uint32_t) - 1 is always just 31. uint32_t Size : 31; uint32_t Live : 1; | |
ELF/MarkLive.cpp | ||
125 | We do not sprinkle const that much in lld if a local variable is obviously not mutated. | |
131 | We usually call this It. | |
145 | const typename ELFT::uint -> uint64_t You can always use uint64_t to represent a value that can be either 32 or 64 bit. | |
148–150 | This loop can be written as for (unsigned I = Fde.FirstRelocation + 1, N = Rels.size(); I < N && Rels[I].r_offset < FdeEnd; ++I) resolveReloc<ELFT>(EH, Rels[I], Fn); | |
164 | I wouldn't assign it to a temporary local variable. | |
165–166 | I still doubt if this check makes sense. If we visit an invalid CIE that doesn't point to anything and mark it alive, then that means we are going to emit an invalid CIE to the output file, no? It seems it is an error condition. | |
269–271 | Isn't this a premature optimization? I wouldn't add this unless it is proved to be actually effective. |
I tried to implement this approach, see D38391. I hope I understood your suggestion right.
According to my tests, it is slower than the current version, approximately, by 10-20%. I checked it both with samples which use (like omnetpp) and do not use EH (like clang). I compared the execution time of markLive() alone, not the whole linker.
ELF/MarkLive.cpp | ||
---|---|---|
165–166 | It's not a dangling or invalid CIE. It just doesn't refer to a personality function, which is not prohibited. For example, you can find such CIEs in any LLVM project on Linux, because they do not use exception handling. | |
269–271 | I don't think so. It really helps to avoid the second scan of EH frames in most cases. |
Yes, https://reviews.llvm.org/D38391 is what I was thinking about. Thank you for creating that patch. Do you know why that is slower than this patch? My wild guess is that it is because D38391 needs to add a lot of pointers to EhSectionPieces vectors, but I wonder if that's actually the case.
I guess the same. Moreover:
- This patch process any CIE once while D38391 has to store a pointer to it for each corresponding input section.
- D38391 also stores pointers to personality function sections and LSDA sections for each input section which use them.
This variant might be slower than D38391 only if it has to scan EH frames several times, which is quite a rare case.
Ping. Is there anything I can do to improve this patch? I suppose I am hardly able to make D38391 quicker than this one.
- Rebase on the current tip.
I still want to explore the idea of https://reviews.llvm.org/D38391 instead of this because that approach seems more "correct" to me, and this patch seems fairly complex. We may end up with this patch insetad of D38391, but it should be worth a try to reduce complexity.
ELF/MarkLive.cpp | ||
---|---|---|
50–52 | We usually write public members first. | |
120 | This needs a function comment. If I understand it correctly, this function takes an FDE and returns an CIE that the given FDE belongs to. Is this correct? | |
269–271 | But how much time you can save? I feel like we shouldn't optimize it at the moment. |
I believe that the complexity in both cases is similar. Here we have two kinds of edges with distinct traversal functions. There we have to convert these edges in advance so that we can use only one traversal function. And distinct traversal function is not that different from the conversion function. But conversion always requires additional space and time to store edges, while the traversal function might be called several times. But my opinion might be biased.
ELF/MarkLive.cpp | ||
---|---|---|
269–271 | According to my tests, with this optimization the execution time of doGcSections() becomes about 1% longer if the linked program does not use EH (like clang) and about 2.5% shorter in the other case (like omnetpp). The effect is not significant, so I've removed it. |
I started a discussion in the llvm-dev mailing list to eliminate the need to scan .eh_frame sections in a special way. Please send a comment there if you have one.
Can you describe the motivation of this change? It has two parts:
- Personality routine GC
- LSDA GC
Personality routine GC is typically not useful at all. (Moreover, in many configurations (without -static-libstdc++) personality routines are in shared objects.)
If you specify -fbinutils-version=2.36 (or latest), .gcc_except_table.* sections can be GCed.
I currently lean toward the opinion that this patch is probably not useful.
There is some GC complexity which does not pay off.
About the implementation, I think there are two issues:
- for (EhInputSection *eh : ctx.ehInputSections) { loop can just be executed once, not nested in a for (;;) loop.
- Calling isFdeLive twice, once in MarkLive, once in EhFrameSection::addRecords is not great.
Hello @MaskRay, thanks for the review! And sorry for the delay with response.
The patch is useful in a probably rare case, indeed. It reproduces with a statically linked application that does not use exceptions itself, but hooks the personality routine because of some indirect dependencies that are removed by GC. Unfortunately, we cannot share our sample, but we will try to find something meaningful in the public domain.
Note that GNU ld produces smaller binaries because it analyzes the liveness of sections/pieces more thoroughly.
This is not new code, but does an FDE without any relocation make sense? If not, maybe we should report an error instead of marking them to be ignored by -1?