This is an archive of the discontinued LLVM Phabricator instance.

[DWARF][GDB INDEX] Fix to deal with constant pool de-dupliation Summary:
ClosedPublic

Authored by ayermolo on Mar 24 2023, 3:57 PM.

Details

Summary

GDB 11.2 generates V8 version of gdb-index where it de-duplicates entries in
constant pool based on cu indices. Changed how constant pool entries are counted
to account for this.

Diff Detail

Event Timeline

ayermolo created this revision.Mar 24 2023, 3:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2023, 3:57 PM
Herald added subscribers: hoy, modimo, wenlei and 2 others. · View Herald Transcript
ayermolo requested review of this revision.Mar 24 2023, 3:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2023, 3:57 PM

gdb generated one

.gdb_index contents:
  Version = 8

  CU list offset = 0x18, has 2 entries:
    0: Offset = 0x0, Length = 0x8a
    1: Offset = 0x8a, Length = 0x8e

  Types CU list offset = 0x38, has 0 entries:

  Address area offset = 0x38, has 2 entries:
    Low/High address = [0x201180, 0x20118f) (Size: 0xf), CU id = 0
    Low/High address = [0x201190, 0x20119d) (Size: 0xd), CU id = 1

  Symbol table offset = 0x60, size = 1024, filled slots:
    2: Name offset = 0x20, CU vector offset = 0x0
      String name: S, CU vector index: 0
    71: Name offset = 0x22, CU vector offset = 0x8
      String name: S2, CU vector index: 1
    489: Name offset = 0x25, CU vector offset = 0x10
      String name: main, CU vector index: 2
    661: Name offset = 0x2a, CU vector offset = 0x18
      String name: foo, CU vector index: 3
    732: Name offset = 0x2e, CU vector offset = 0x0
      String name: unsigned int, CU vector index: 0
    754: Name offset = 0x3b, CU vector offset = 0x0
      String name: int, CU vector index: 0

  Constant pool offset = 0x2060, has 4 CU vectors:
    0(0x0): 0x90000000
    1(0x8): 0x90000001
    2(0x10): 0x30000000
    3(0x18): 0x30000001

vs LLD generated one:

.gdb_index contents:
  Version = 7

  CU list offset = 0x18, has 2 entries:
    0: Offset = 0x0, Length = 0x8a
    1: Offset = 0x8a, Length = 0x8e

  Types CU list offset = 0x38, has 0 entries:

  Address area offset = 0x38, has 2 entries:
    Low/High address = [0x201180, 0x20118f) (Size: 0xf), CU id = 0
    Low/High address = [0x201190, 0x20119d) (Size: 0xd), CU id = 1

  Symbol table offset = 0x60, size = 1024, filled slots:
    2: Name offset = 0x38, CU vector offset = 0x0
      String name: S, CU vector index: 0
    71: Name offset = 0x3a, CU vector offset = 0x8
      String name: S2, CU vector index: 1
    489: Name offset = 0x4a, CU vector offset = 0x1c
      String name: main, CU vector index: 3
    661: Name offset = 0x53, CU vector offset = 0x30
      String name: foo, CU vector index: 5
    732: Name offset = 0x3d, CU vector offset = 0x10
      String name: unsigned int, CU vector index: 2
    754: Name offset = 0x4f, CU vector offset = 0x24
      String name: int, CU vector index: 4

  Constant pool offset = 0x2060, has 6 CU vectors:
    0(0x0): 0x90000000
    1(0x8): 0x90000001
    2(0x10): 0x90000000 0x90000001
    3(0x1c): 0x30000000
    4(0x24): 0x90000000 0x90000001
    5(0x30): 0x30000001

Not sure how to add a test that would use gdb and work with build bots.

Hmm, the way we're reading this doesn't seem like it /quite/ makes sense - we read the offsets that are encoded in the "symbol table" but then we read the constant pool CU vectors directly based on the number of unique offsets in the symbol table.

Perhaps we should be reading at the offsets specified by the symbol table instead of reading one after the other - maybe at some later date the format changes and other data is stored there, etc?

Seems like CUIndeces should more accurately be called be called CUOffsets? & perhaps we could sort that (or maybe use a sorted set?) and walk the offsets in order and read at those offset locations?

(as for testing - how's the exsiting functionality tested? I guess it's not tested by running lld (since llvm-dwarfdump testing shouldn't depend on lld) so there's probably a checked in binary with a gdb index, or maybe assembly file that can be assembled into an object file with a gdb index in it? But something like that should be done for this test too - if you wanted' you could also add a test in cross-project-tests that runs gdb itself to generate the index - but that might be a bit brittle depending on what gdb version is installed on the test machine, etc)

ayermolo updated this revision to Diff 508699.Mar 27 2023, 9:30 AM

updated to use offsets themselves

ayermolo updated this revision to Diff 508700.Mar 27 2023, 9:31 AM

Removed unused header.

ayermolo updated this revision to Diff 508712.Mar 27 2023, 9:52 AM

added binary test

Hmm, the way we're reading this doesn't seem like it /quite/ makes sense - we read the offsets that are encoded in the "symbol table" but then we read the constant pool CU vectors directly based on the number of unique offsets in the symbol table.

Perhaps we should be reading at the offsets specified by the symbol table instead of reading one after the other - maybe at some later date the format changes and other data is stored there, etc?

Seems like CUIndeces should more accurately be called be called CUOffsets? & perhaps we could sort that (or maybe use a sorted set?) and walk the offsets in order and read at those offset locations?

(as for testing - how's the exsiting functionality tested? I guess it's not tested by running lld (since llvm-dwarfdump testing shouldn't depend on lld) so there's probably a checked in binary with a gdb index, or maybe assembly file that can be assembled into an object file with a gdb index in it? But something like that should be done for this test too - if you wanted' you could also add a test in cross-project-tests that runs gdb itself to generate the index - but that might be a bit brittle depending on what gdb version is installed on the test machine, etc)

Originally I just found tests that used LLD, but found one that was using binary: dwarfdump-dump-gdbindex.test
So there is precedence.

Poking around previously I found lld/test/ELF/gdb-index.s which does use LLD.

ayermolo updated this revision to Diff 508716.Mar 27 2023, 9:56 AM

fixed typo

dblaikie accepted this revision.Mar 27 2023, 11:12 AM

Thanks!

llvm/test/DebugInfo/dwarfdump-dump-gdbindex-v8.test
65–69

If you like, perhaps in a separate patch, we could remove the numbering from this dump - since/if nothing refers to these entries by number/index, and only by offset - the index/numbering seems unneeded/misleading/confusing?

This revision is now accepted and ready to land.Mar 27 2023, 11:12 AM
ayermolo updated this revision to Diff 508767.Mar 27 2023, 12:59 PM

missed commiting actual check for version. :facepalm