This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Introduce GdbIndexBuilderDWARFContent
AbandonedPublic

Authored by grimar on Mar 23 2017, 10:54 AM.

Details

Summary

This patch continues work on speed up --gdb-index LLD option started in D31136.

In this patch I suggest to create separate class GdbIndexBuilderDWARFContent for parsing content required
to build .gdb_index section.

That has next benefits:

  1. Applying relocations is slow. Building .gdb_index seems requires resolving relocations

only in .debug_info section (for futher collectAddressRanges() call). We dont have to proccess any
other relocations I think. It easy to filter and optimize code separatelly for this case without overcomplicating the logic.

  1. This patch implements next optimization: only relocatons against symbols in allocatable sections

are handled. That allows to properly collect address ranges and skip all useless relocations in a simple way.
At first I though about adding option to LoadedObjectInfo for that, but it looks very specific feature and easier
to implement it in GdbIndexBuilderDWARFContent instead.

  1. It also opens road to pass already decompressed sections to parser. Since LLD linker decompresses sections on his side,

it will be convinent to adapt implementation of GdbIndexBuilderDWARFContent to pass decomressed sections instead object::ObjectFile.
That looks not needed for generic parser and would overcomplicate it.

  1. Creating GdbIndexBuilderDWARFContent allows to move implementation of .gdb_index builder code from LLD to DWARF

parser side, what allows futher reuse.

Patch also implements optimization of relocations processing, like D31136 did.
It is simple now and does not touch main parsers logic.

Number I got when linked llc using LLD with --gdb-index binary are:

Without this patch (average, 5 runs):

25365,272594      task-clock (msec)         #    1,312 CPUs utilized            ( +-  0,82% )
            16 827      context-switches          #    0,663 K/sec                    ( +-  2,26% )
               893      cpu-migrations            #    0,035 K/sec                    ( +-  8,21% )
           695 748      page-faults               #    0,027 M/sec                    ( +-  0,03% )
      19,330233261 seconds time elapsed

With this patch (average, 5 runs):

15865,250347      task-clock (msec)         #    1,554 CPUs utilized            ( +-  0,06% )
      15 830      context-switches          #    0,998 K/sec                    ( +-  0,93% )
         963      cpu-migrations            #    0,061 K/sec                    ( +-  4,79% )
     696 143      page-faults               #    0,044 M/sec                    ( +-  0,05% )
10,209859902 seconds time elapsed                                          ( +-  0,21% )

That makes --gdb-index almost 2x faster for my case !

Diff Detail

Event Timeline

grimar created this revision.Mar 23 2017, 10:54 AM
grimar updated this revision to Diff 92835.Mar 23 2017, 10:55 AM
  • Added comment.

I had to move and split some code to getSymbolAddress().
I can split this as NFC change into separate patch to make review of this one simpler if whole idea is OK.

ruiu added a comment.Mar 23 2017, 11:07 AM

As a thought experiment, if you want to make 20x faster, would you use this existing DWARF parser? I'm wondering if we need a ground-up approach. 2x is an awesome speedup, but we are probably 2x slower than the gold, so that is not a very high bar.

In D31296#709045, @ruiu wrote:

As a thought experiment, if you want to make 20x faster, would you use this existing DWARF parser? I'm wondering if we need a ground-up approach. 2x is an awesome speedup, but we are probably 2x slower than the gold, so that is not a very high bar.

I am still looking at it, first impressions that is very generic and do too much error checking, even in hot places like relocations handling where it strange to see.
I hope it be more clear soon if we can speedup it 20x times and how.

I also going to try to parralelize our .gdb_index building code (not sure if gold do that or not for gdb_index.).
That should give more speedup probably. But I want to spend more time on single threaded case at first to understand weakness.

ruiu added a comment.Mar 23 2017, 11:39 AM

How much code do you expect you would have to write if you pull out only the information out of DWARF debug sections for the purpose of creating .gdB_index?

(By the way incremental changes are fine and welcome as long as they don't mess up the existing design of the DWARF parser. But if you are going to eventually decide not to use this class, you want to roll this back as no one else will be using this.)

In D31296#709099, @ruiu wrote:

How much code do you expect you would have to write if you pull out only the information out of DWARF debug sections for the purpose of creating .gdB_index?

I would answer "not much". If you remember my first version of .gdb_index implementation for LLD used hand written parser already.
(https://reviews.llvm.org/D24267). But there was a discussion in llvm-mails with strong suggestion to reuse existent parsers.

(By the way incremental changes are fine and welcome as long as they don't mess up the existing design of the DWARF parser. But if you are going to eventually decide not to use this class, you want to roll this back as no one else will be using this.)

ruiu added a comment.Mar 23 2017, 1:01 PM

I'm not sure what's the best way of doing this, but we shouldn't exclude the choice to write a special-purpose parser for .gdb_index.

By the way, I think we had a correctness issue with .gdb_index. Did you address that already? If not, you probably want to do that first, as correctness is more important than speed.

In D31296#709143, @ruiu wrote:

I'm not sure what's the best way of doing this, but we shouldn't exclude the choice to write a special-purpose parser for .gdb_index.

I can probably try to ressurect my old patch for example and check how much it is faster/slower. To have some numbers for reference.
(though to have generic fast LLVM DWARF parser is still better, right ?)

By the way, I think we had a correctness issue with .gdb_index. Did you address that already?

Not yet, will try to look tomorrow.

ruiu added a comment.Mar 23 2017, 1:18 PM

Regarding your old patch to parse DWARF, it seems you are reading and applying relocations in your code. Why do you need to do that? LLD applies relocations when creating debug sections. Do you compute the same values twice?

In D31296#709196, @ruiu wrote:

Regarding your old patch to parse DWARF, it seems you are reading and applying relocations in your code. Why do you need to do that? LLD applies relocations when creating debug sections. Do you compute the same values twice?

Yes, I thought about that today too. But problem is that we apply relocations very late, when doing WriteTo().
I need to calculate just some relocations to get address area entries. I need to do that early to find final .gdb_index size. I thought about next way:

  • Do not apply relocations and gather broken address area (just to find the amount of entries).
  • Write .gdb_index after all other debug sections (after LLD writes them and do relocations).
  • Use LoadedObjectInfo::getLoadedSectionContents() to give parsers relocated content and re-read address area (with correct entries).
  • Update .gdb_index content and write it.

That should work in theory, but I doubt that it is faster then resolve some alloc relocations, like I do now. And definetely not cleaner.
Without applying relocations at all (just commented them out), I had about 9 seconds link time I think (need to recheck that), that is close to 10.2 I have in last patch.
So I probably would try optimize somewhere else at first.

aprantl added a subscriber: beanz.Mar 23 2017, 1:31 PM
ruiu added a comment.Mar 23 2017, 2:07 PM

You can always put .gdb_index at end of file even if some layout is enforced by linker scripts. That shouldn't cause any trouble because .gdb_index is not an SHF_ALLOC section. So the fact that we don't know the size of .gdb_index during linking is not an problem.

My intuition is that doing everything after applying relocations is easier and faster than doing it in the current way. The .gdb_index is after all designed with gdb in mind so that gdb can create .gdb_index sections for existing executables and libraries by reading their debug sections. So, post-processing (i.e. doing everything after applying relocations) is definitely doable and probably straightforward.

In D31296#709243, @ruiu wrote:

You can always put .gdb_index at end of file even if some layout is enforced by linker scripts. That shouldn't cause any trouble because .gdb_index is not an SHF_ALLOC section. So the fact that we don't know the size of .gdb_index during linking is not an problem.

My intuition is that doing everything after applying relocations is easier and faster than doing it in the current way. The .gdb_index is after all designed with gdb in mind so that gdb can create .gdb_index sections for existing executables and libraries by reading their debug sections. So, post-processing (i.e. doing everything after applying relocations) is definitely doable and probably straightforward.

That probably worth to try, but again - that should will not give 10x/20x speed up you mentioned earlier.
I need to recheck numbers here, but it had about 10-15% of speedup (in compare with latest patch) when relocations just disabled I think.

grimar abandoned this revision.Mar 29 2017, 5:40 AM

D31424 makes this useless.