Details

Reviewers

Commits

rG4d64fcdd43a1: [DWARF][GDB INDEX] Fix to deal with constant pool de-dupliation Summary:

Summary

GDB 11.2 generates version 8 of the .gdb_index section, in which it de-duplicates
constant pool entries based on CU indices. Changed how constant pool entries are
counted to account for this.
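
To make the counting problem concrete, here is a minimal sketch in plain C++ (not the LLVM code) that uses the six filled slots from the GDB-generated dump in this review. The `Slot` struct and the helper names are hypothetical and exist only for illustration. It contrasts counting one CU vector per filled slot with counting one CU vector per distinct offset.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for one symbol-table slot: two 32-bit offsets into
// the constant pool.
struct Slot {
  uint32_t NameOffset;
  uint32_t CuVecOffset;
};

// Earlier assumption: every filled slot owns its own CU vector.
inline uint32_t countFilledSlots(const std::vector<Slot> &Slots) {
  uint32_t N = 0;
  for (const Slot &S : Slots)
    if (S.NameOffset || S.CuVecOffset)
      ++N;
  return N;
}

// With de-duplication, several slots may share one CU vector, so collect the
// distinct CU vector offsets instead.
inline std::set<uint32_t> uniqueCuVecOffsets(const std::vector<Slot> &Slots) {
  std::set<uint32_t> Offsets;
  for (const Slot &S : Slots)
    if (S.NameOffset || S.CuVecOffset)
      Offsets.insert(S.CuVecOffset);
  return Offsets;
}

// Filled slots copied from the GDB 11.2 (version 8) dump in this review,
// plus one empty slot that both helpers skip.
inline std::vector<Slot> gdbV8Slots() {
  return {{0x20, 0x0},  {0x22, 0x8}, {0x25, 0x10}, {0x2a, 0x18},
          {0x2e, 0x0},  {0x3b, 0x0}, {0x0, 0x0}};
}
```

For this data the first helper reports six filled slots while the second finds four distinct offsets, which matches the "has 4 CU vectors" line in the GDB-generated dump.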

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ayermolo created this revision. Mar 24 2023, 3:57 PM

Herald added a project: Restricted Project. Mar 24 2023, 3:57 PM

Herald added subscribers: hoy, modimo, wenlei and 2 others.

ayermolo requested review of this revision. Mar 24 2023, 3:57 PM

Herald added a project: Restricted Project. Mar 24 2023, 3:57 PM

Herald added a subscriber: llvm-commits.

GDB-generated index:

.gdb_index contents:
  Version = 8

  CU list offset = 0x18, has 2 entries:
    0: Offset = 0x0, Length = 0x8a
    1: Offset = 0x8a, Length = 0x8e

  Types CU list offset = 0x38, has 0 entries:

  Address area offset = 0x38, has 2 entries:
    Low/High address = [0x201180, 0x20118f) (Size: 0xf), CU id = 0
    Low/High address = [0x201190, 0x20119d) (Size: 0xd), CU id = 1

  Symbol table offset = 0x60, size = 1024, filled slots:
    2: Name offset = 0x20, CU vector offset = 0x0
      String name: S, CU vector index: 0
    71: Name offset = 0x22, CU vector offset = 0x8
      String name: S2, CU vector index: 1
    489: Name offset = 0x25, CU vector offset = 0x10
      String name: main, CU vector index: 2
    661: Name offset = 0x2a, CU vector offset = 0x18
      String name: foo, CU vector index: 3
    732: Name offset = 0x2e, CU vector offset = 0x0
      String name: unsigned int, CU vector index: 0
    754: Name offset = 0x3b, CU vector offset = 0x0
      String name: int, CU vector index: 0

  Constant pool offset = 0x2060, has 4 CU vectors:
    0(0x0): 0x90000000
    1(0x8): 0x90000001
    2(0x10): 0x30000000
    3(0x18): 0x30000001

versus the LLD-generated index:

.gdb_index contents:
  Version = 7

  CU list offset = 0x18, has 2 entries:
    0: Offset = 0x0, Length = 0x8a
    1: Offset = 0x8a, Length = 0x8e

  Types CU list offset = 0x38, has 0 entries:

  Address area offset = 0x38, has 2 entries:
    Low/High address = [0x201180, 0x20118f) (Size: 0xf), CU id = 0
    Low/High address = [0x201190, 0x20119d) (Size: 0xd), CU id = 1

  Symbol table offset = 0x60, size = 1024, filled slots:
    2: Name offset = 0x38, CU vector offset = 0x0
      String name: S, CU vector index: 0
    71: Name offset = 0x3a, CU vector offset = 0x8
      String name: S2, CU vector index: 1
    489: Name offset = 0x4a, CU vector offset = 0x1c
      String name: main, CU vector index: 3
    661: Name offset = 0x53, CU vector offset = 0x30
      String name: foo, CU vector index: 5
    732: Name offset = 0x3d, CU vector offset = 0x10
      String name: unsigned int, CU vector index: 2
    754: Name offset = 0x4f, CU vector offset = 0x24
      String name: int, CU vector index: 4

  Constant pool offset = 0x2060, has 6 CU vectors:
    0(0x0): 0x90000000
    1(0x8): 0x90000001
    2(0x10): 0x90000000 0x90000001
    3(0x1c): 0x30000000
    4(0x24): 0x90000000 0x90000001
    5(0x30): 0x30000001
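
The 32-bit values in the CU vectors above pack several fields. A minimal sketch of decoding them follows, assuming the bit layout described in the GDB index format documentation as I read it: bits 0-23 hold the CU index, bits 28-30 the symbol kind, and bit 31 is set for static symbols. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of one 32-bit CU vector entry.
struct CuVectorEntry {
  uint32_t CuIndex; // bits 0-23: index into the CU list (TUs follow CUs)
  uint32_t Kind;    // bits 28-30: 1 = type, 2 = variable/enum, 3 = function
  bool IsStatic;    // bit 31: set for static symbols, clear for global ones
};

// Split a raw entry into its fields. Bits 24-27 are reserved and ignored.
inline CuVectorEntry decodeCuVectorEntry(uint32_t Raw) {
  CuVectorEntry E;
  E.CuIndex = Raw & 0x00ffffffu;
  E.Kind = (Raw >> 28) & 0x7u;
  E.IsStatic = (Raw >> 31) != 0;
  return E;
}
```

Under this reading, 0x90000000 is a static type in CU 0 (for example `S`) and 0x30000001 is a global function in CU 1 (`foo`).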

Not sure how to add a test that would use gdb and work with build bots.

Harbormaster completed remote builds in B221698: Diff 508238. Mar 24 2023, 4:39 PM

Hmm, the way we're reading this doesn't seem like it /quite/ makes sense - we read the offsets that are encoded in the "symbol table" but then we read the constant pool CU vectors directly based on the number of unique offsets in the symbol table.

Perhaps we should be reading at the offsets specified by the symbol table instead of reading one after the other - maybe at some later date the format changes and other data is stored there, etc?

Seems like CUIndeces should more accurately be called CUOffsets? And perhaps we could sort that (or maybe use a sorted set?) and walk the offsets in order and read at those offset locations?

(As for testing - how's the existing functionality tested? I guess it's not tested by running lld (since llvm-dwarfdump testing shouldn't depend on lld), so there's probably a checked-in binary with a gdb index, or maybe an assembly file that can be assembled into an object file with a gdb index in it? Something like that should be done for this test too. If you wanted, you could also add a test in cross-project-tests that runs gdb itself to generate the index, but that might be a bit brittle depending on what gdb version is installed on the test machine, etc.)
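
The offset-driven reading suggested above can be sketched roughly as follows. This is a standalone illustration over a plain byte buffer, not the LLVM `DataExtractor` code; `readU32`, `parseCuVectors`, `appendU32`, and `examplePool` are hypothetical helpers, and the explicit bounds checks are an addition for the sketch. It walks a sorted set of CU vector offsets and reads a count followed by that many 32-bit entries at each one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Read a little-endian 32-bit value at Pos; fail if it would run past the end.
inline bool readU32(const std::vector<uint8_t> &Buf, size_t Pos,
                    uint32_t &Out) {
  if (Pos > Buf.size() || Buf.size() - Pos < 4)
    return false;
  Out = uint32_t(Buf[Pos]) | uint32_t(Buf[Pos + 1]) << 8 |
        uint32_t(Buf[Pos + 2]) << 16 | uint32_t(Buf[Pos + 3]) << 24;
  return true;
}

// (offset within the constant pool, entries of the CU vector at that offset)
using CuVector = std::pair<uint32_t, std::vector<uint32_t>>;

// Parse one CU vector at each distinct offset. std::set iterates in ascending
// order, so the vectors come out sorted by offset.
inline bool parseCuVectors(const std::vector<uint8_t> &Pool,
                           const std::set<uint32_t> &CUOffsets,
                           std::vector<CuVector> &Out) {
  for (uint32_t CUOffset : CUOffsets) {
    size_t Pos = CUOffset;
    uint32_t Num = 0;
    if (!readU32(Pool, Pos, Num))
      return false;
    Pos += 4;
    CuVector Vec{CUOffset, {}};
    for (uint32_t J = 0; J < Num; ++J) {
      uint32_t Entry = 0;
      if (!readU32(Pool, Pos, Entry))
        return false;
      Vec.second.push_back(Entry);
      Pos += 4;
    }
    Out.push_back(std::move(Vec));
  }
  return true;
}

// Append V to Buf in little-endian byte order.
inline void appendU32(std::vector<uint8_t> &Buf, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Buf.push_back(uint8_t(V >> (8 * I)));
}

// Pool holding the four single-entry CU vectors from the GDB-generated dump.
inline std::vector<uint8_t> examplePool() {
  std::vector<uint8_t> Pool;
  for (uint32_t Entry : {0x90000000u, 0x90000001u, 0x30000000u, 0x30000001u}) {
    appendU32(Pool, 1); // number of entries in this vector
    appendU32(Pool, Entry);
  }
  return Pool;
}
```

Because each read starts at a recorded offset, slots that share a CU vector no longer cause extra vectors to be read, and any unrelated data stored between vectors would simply be skipped.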

updated to use offsets themselves

Removed unused header.

added binary test

In D146852#4221678, @dblaikie wrote:

Hmm, the way we're reading this doesn't seem like it /quite/ makes sense - we read the offsets that are encoded in the "symbol table" but then we read the constant pool CU vectors directly based on the number of unique offsets in the symbol table.

Perhaps we should be reading at the offsets specified by the symbol table instead of reading one after the other - maybe at some later date the format changes and other data is stored there, etc?

Seems like CUIndeces should more accurately be called CUOffsets? And perhaps we could sort that (or maybe use a sorted set?) and walk the offsets in order and read at those offset locations?

(As for testing - how's the existing functionality tested? I guess it's not tested by running lld (since llvm-dwarfdump testing shouldn't depend on lld), so there's probably a checked-in binary with a gdb index, or maybe an assembly file that can be assembled into an object file with a gdb index in it? Something like that should be done for this test too. If you wanted, you could also add a test in cross-project-tests that runs gdb itself to generate the index, but that might be a bit brittle depending on what gdb version is installed on the test machine, etc.)

Originally I only found tests that used LLD, but then found one that uses a checked-in binary: dwarfdump-dump-gdbindex.test
So there is precedent.

Poking around previously I found lld/test/ELF/gdb-index.s which does use LLD.

fixed typo

Thanks!

llvm/test/DebugInfo/dwarfdump-dump-gdbindex-v8.test
Lines 65–69: If you like, perhaps in a separate patch, we could remove the numbering from this dump. If nothing refers to these entries by number/index, only by offset, the numbering seems unneeded and potentially misleading.

This revision is now accepted and ready to land. Mar 27 2023, 11:12 AM

Harbormaster completed remote builds in B222060: Diff 508716. Mar 27 2023, 12:18 PM

ayermolo mentioned this in D146289: dwp section overflow checks. Mar 27 2023, 12:19 PM

Missed committing the actual check for the version. :facepalm:

Harbormaster completed remote builds in B222091: Diff 508767. Mar 27 2023, 1:50 PM

rebase

Harbormaster completed remote builds in B222107: Diff 508788. Mar 27 2023, 2:59 PM

Closed by commit rG4d64fcdd43a1: [DWARF][GDB INDEX] Fix to deal with constant pool de-dupliation Summary: (authored by ayermolo). Mar 27 2023, 3:34 PM

This revision was automatically updated to reflect the committed changes.

ayermolo added a commit: rG4d64fcdd43a1: [DWARF][GDB INDEX] Fix to deal with constant pool de-dupliation Summary:.

Diff 508819

llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp

[... 10 lines not shown ...]
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/DataExtractor.h"
 #include "llvm/Support/Format.h"
 #include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/raw_ostream.h"
 #include <cassert>
 #include <cinttypes>
 #include <cstdint>
+#include <set>
 #include <utility>

 using namespace llvm;

 // .gdb_index section format reference:
 // https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html

 void DWARFGdbIndex::dumpCUList(raw_ostream &OS) const {
[... 82 lines not shown ...]
   if (HasContent) {
     dumpSymbolTable(OS);
     dumpConstantPool(OS);
   }
 }

 bool DWARFGdbIndex::parseImpl(DataExtractor Data) {
   uint64_t Offset = 0;

-  // Only version 7 is supported at this moment.
+  // Only version 7 and 8 are supported at this moment.
   Version = Data.getU32(&Offset);
-  if (Version != 7)
+  if (Version != 7 && Version != 8)
     return false;

   CuListOffset = Data.getU32(&Offset);
   TuListOffset = Data.getU32(&Offset);
   AddressAreaOffset = Data.getU32(&Offset);
   SymbolTableOffset = Data.getU32(&Offset);
   ConstantPoolOffset = Data.getU32(&Offset);

[... 33 lines not shown, still inside DWARFGdbIndex::parseImpl ...]
   // Each slot in the hash table consists of a pair of offset_type values. The
   // first value is the offset of the symbol's name in the constant pool. The
   // second value is the offset of the CU vector in the constant pool.
   // If both values are 0, then this slot in the hash table is empty. This is ok
   // because while 0 is a valid constant pool index, it cannot be a valid index
   // for both a string and a CU vector.
   uint32_t SymTableSize = (ConstantPoolOffset - SymbolTableOffset) / 8;
   SymbolTable.reserve(SymTableSize);
-  uint32_t CuVectorsTotal = 0;
+  std::set<uint32_t> CUOffsets;
   for (uint32_t i = 0; i < SymTableSize; ++i) {
     uint32_t NameOffset = Data.getU32(&Offset);
     uint32_t CuVecOffset = Data.getU32(&Offset);
     SymbolTable.push_back({NameOffset, CuVecOffset});
     if (NameOffset || CuVecOffset)
-      ++CuVectorsTotal;
+      CUOffsets.insert(CuVecOffset);
   }

   // The constant pool. CU vectors are stored first, followed by strings.
   // The first value is the number of CU indices in the vector. Each subsequent
   // value is the index and symbol attributes of a CU in the CU list.
-  for (uint32_t i = 0; i < CuVectorsTotal; ++i) {
+  for (auto CUOffset : CUOffsets) {
+    Offset = ConstantPoolOffset + CUOffset;
     ConstantPoolVectors.emplace_back(0, SmallVector<uint32_t, 0>());
     auto &Vec = ConstantPoolVectors.back();
     Vec.first = Offset - ConstantPoolOffset;

     uint32_t Num = Data.getU32(&Offset);
-    for (uint32_t j = 0; j < Num; ++j)
+    for (uint32_t J = 0; J < Num; ++J)
       Vec.second.push_back(Data.getU32(&Offset));
   }

   ConstantPoolStrings = Data.getData().drop_front(Offset);
   StringPoolOffset = Offset;
   return true;
 }

 void DWARFGdbIndex::parse(DataExtractor Data) {
   HasContent = !Data.getData().empty();
   HasError = HasContent && !parseImpl(Data);
 }

llvm/test/DebugInfo/Inputs/dwarfdump-gdbindex-v8.elf-x86-64

This binary file was added.

Property	Old Value	New Value
File Mode	null	100755

llvm/test/DebugInfo/dwarfdump-dump-gdbindex-v8.test

This file was added.

RUN: llvm-dwarfdump -gdb-index %p/Inputs/dwarfdump-gdbindex-v8.elf-x86-64 | FileCheck %s

; main.cpp:
; typedef struct
; {
; unsigned a;
; unsigned b;
; } S;
;
; int main() {
; S s;
; s.a = 0x64A40101;
; }
; helper.cpp:
; typedef struct
; {
; unsigned a;
; unsigned b;
; } S2;
;
; int foo() {
; S2 s;
; s.a = 0x64A40101;
; }
; Compiled with:
; clang++ -ggnu-pubnames -g2 -gdwarf-4 -fdebug-types-section -c main.cpp helper.cpp
; ld.lld main.o helper.o -o dwarfdump-gdbindex-v8.elf-x86-64
; gdb-11/bin/gdb-add-index dwarfdump-gdbindex-v8.elf-x86-64
; clang version 17.0.0 (https://github.com/llvm/llvm-project.git 128b050d3c234c7238966349f8878884123a0030)
; GNU gdb (GDB) 11.2
; Info about gdb-index: https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html

; CHECK-LABEL: .gdb_index contents:
; CHECK: Version = 8

; CHECK: CU list offset = 0x18, has 2 entries:
; CHECK-NEXT: 0: Offset = 0x0, Length = 0x6e
; CHECK-NEXT: 1: Offset = 0x6e, Length = 0x72

; CHECK: Types CU list offset = 0x38, has 2 entries:
; CHECK-NEXT: 0: offset = 0x00000000, type_offset = 0x0000001e, type_signature = 0x418503b8111e9a7b
; CHECK-NEXT: 1: offset = 0x00000044, type_offset = 0x0000001e, type_signature = 0x00f6cca4e3a15118

; CHECK: Address area offset = 0x68, has 2 entries:
; CHECK-NEXT: Low/High address = [0x201180, 0x20118f) (Size: 0xf), CU id = 0
; CHECK-NEXT: Low/High address = [0x201190, 0x20119d) (Size: 0xd), CU id = 1

; CHECK: Symbol table offset = 0x90, size = 1024, filled slots:
; CHECK-NEXT: 2: Name offset = 0x28, CU vector offset = 0x0
; CHECK-NEXT: String name: S, CU vector index: 0
; CHECK-NEXT: 71: Name offset = 0x2a, CU vector offset = 0x8
; CHECK-NEXT: String name: S2, CU vector index: 1
; CHECK-NEXT: 489: Name offset = 0x2d, CU vector offset = 0x10
; CHECK-NEXT: String name: main, CU vector index: 2
; CHECK-NEXT: 661: Name offset = 0x32, CU vector offset = 0x18
; CHECK-NEXT: String name: foo, CU vector index: 3
; CHECK-NEXT: 732: Name offset = 0x36, CU vector offset = 0x20
; CHECK-NEXT: String name: unsigned int, CU vector index: 4
; CHECK-NEXT: 754: Name offset = 0x43, CU vector offset = 0x0
; CHECK-NEXT: String name: int, CU vector index: 0


; CHECK: Constant pool offset = 0x2090, has 5 CU vectors:
; CHECK-NEXT: 0(0x0): 0x90000000
; CHECK-NEXT: 1(0x8): 0x90000001
; CHECK-NEXT: 2(0x10): 0x30000000
; CHECK-NEXT: 3(0x18): 0x30000001
; CHECK-NEXT: 4(0x20): 0x90000002
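
The slot numbers in the symbol table checks above (2, 71, 489, 661, 732, 754) are consistent with the symbol hash given in the GDB index format documentation linked in the test. Here is a minimal sketch of that hash, assuming the documented recurrence `r = r * 67 + tolower(c) - 113` and a power-of-two table size. The function names are hypothetical, and collision handling (the secondary probe step) is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Symbol name hash as described in the GDB index format documentation.
// Arithmetic wraps modulo 2^32 because R is an unsigned 32-bit value.
inline uint32_t gdbIndexHash(const std::string &Name) {
  uint32_t R = 0;
  for (unsigned char C : Name)
    R = R * 67 + static_cast<uint32_t>(std::tolower(C)) - 113;
  return R;
}

// First slot probed for Name in a table with TableSize slots.
// TableSize must be a non-zero power of two.
inline uint32_t initialSlot(const std::string &Name, uint32_t TableSize) {
  assert(TableSize != 0 && (TableSize & (TableSize - 1)) == 0);
  return gdbIndexHash(Name) & (TableSize - 1);
}
```

With a table size of 1024, `initialSlot("S", 1024)` gives 2 and `initialSlot("main", 1024)` gives 489, matching the filled slots in the expected output.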

This is an archive of the discontinued LLVM Phabricator instance.

[DWARF][GDB INDEX] Fix to deal with constant pool de-dupliation Summary:
ClosedPublic
