This is an archive of the discontinued LLVM Phabricator instance.

[llvm-dwarfdump] - Change how dwarfdump dumps .debug_ranges
Abandoned · Public

Authored by grimar on Aug 3 2017, 8:37 AM.

Details

Reviewers

dblaikie, rafael

Summary

It was raised in the D36097 thread that we currently have no test cases
for the section index API in libDebugInfo.

The idea was to add some useful functionality to a tool such as
llvm-dwarfdump that would use that API and therefore require test cases.

This patch introduces the following changes:

- Teaches the .debug_ranges dumper to render the section index of each range.
- Teaches the dumper about base address selection entries and how to render entries properly if the section contains them. Previously the resulting range was not shown properly, because the base address was not added to it.
- Cosmetic: adds a header to the output.
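
To illustrate how a dumper might tell the entry kinds apart, here is a minimal C++ sketch. The names EntryKind, RawEntry, maxAddress and classify are hypothetical and not part of libDebugInfo. It assumes the DWARF v4 encoding, in which an end-of-list entry is a pair of zeros and a base address selection entry carries the largest representable address as its first value.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; this is not the LLVM API.
enum class EntryKind { EndOfList, BaseAddressSelection, Range };

struct RawEntry {
  uint64_t Start; // first address-sized value of the pair
  uint64_t End;   // second address-sized value of the pair
};

// Largest address representable in AddressSize bytes (4 or 8 assumed).
inline uint64_t maxAddress(unsigned AddressSize) {
  return AddressSize == 4 ? 0xffffffffULL : 0xffffffffffffffffULL;
}

// Classify one raw .debug_ranges entry without resolving it.
inline EntryKind classify(const RawEntry &E, unsigned AddressSize) {
  if (E.Start == 0 && E.End == 0)
    return EntryKind::EndOfList;
  if (E.Start == maxAddress(AddressSize))
    return EntryKind::BaseAddressSelection; // E.End holds the new base
  return EntryKind::Range;
}
```

Comparing against a size-dependent maximum matters for 4-byte addresses, where the marker value is 0xffffffff rather than a 64-bit -1.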

Event Timeline

grimar created this revision. Aug 3 2017, 8:37 AM

Herald added a subscriber: aprantl. Aug 3 2017, 8:37 AM

grimar mentioned this in D36097: Prototype fix for lld DWARF parsing of base address selection entries in range lists. Aug 3 2017, 8:39 AM

A couple of things

If possible, I think it'd be better if it printed the section name, rather than the section index

I'm not sure how practical it is to actually process base address selection entries when rendering debug_ranges. I was looking at some related behavior today: since parsing the debug_ranges section depends on knowing which CU refers to which range, it's hard to tell, short of parsing all the CUs to find all the references to ranges, which initial base address applies to which section. Note that what's currently dumped in debug_ranges looks quite different from what's shown when the range is dumped inside debug_info dumping. The debug_ranges dump includes the literal base address selection entries just as normal entries, not showing their effect, whereas the debug_info dump processes the list and shows the effect of the base address selection entries (& of the default base address).

How does your patch deal with handling the default base address? Given that it varies between each range list in debug_ranges and would only be known by determining which CU refers to that range list (which can only be known by walking all the DWARF DIEs to find the points that refer to range lists)? Does it not use a default base address? (I guess that's not the case) Does it use the default base address of the first CU?

I'd say probably leave the debug_ranges dumping as-is (I think there's a minor change that could be made to improve it a little*, but still leave it basically dumping the raw bytes, not the processed/semantic range list) but improve the range dumping in debug_info to include this info, perhaps?

For example: how does your patch dump... oh, that's a bug in debug_ranges emission (or a suboptimal feature). So in the example you're testing there shouldn't be any relocations or a base address selection entry - LLVM should produce DWARF that relies on the existing default base address, perhaps. Oh, maybe that's circular - so not a bug, but...

OK, so what we want to do is get LLVM to produce a situation where it does emit a range but the CU has a default base address. Let's see what I can conjure up...

OK, here we go. It's not the simplest reproduction (I don't know how to tickle LLVM to produce code like this naturally - but it's easy with a minor manual IR edit):

Take this source code:

void f1();
void f2() {
  f1();
  {
    int i;
    f1();
    f1();
  }
}

Compile to LLVM textual IR with debug info (clang++ test.cpp -g -c -S -emit-llvm).

Modify the resulting IR by swapping the order of the first two calls in f2. This creates a hole in the scope's range, forcing the use of DW_AT_ranges and the debug_ranges section, while the CU still has a low/high pc that provides a default base address for the range list.

Here's the modified IR:

define void @_Z2f2v() local_unnamed_addr !dbg !7 {
entry:
  tail call void @_Z2f1v(), !dbg !15
  tail call void @_Z2f1v(), !dbg !14
  tail call void @_Z2f1v(), !dbg !16
  ret void, !dbg !17
}
declare void @_Z2f1v() local_unnamed_addr
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4, !5}
!llvm.ident = !{!6}
!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 6.0.0 (trunk 309873) (llvm/trunk 309879)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
!1 = !DIFile(filename: "range.cpp", directory: "/usr/local/google/home/blaikie/dev/scratch")
!2 = !{}
!3 = !{i32 2, !"Dwarf Version", i32 4}
!4 = !{i32 2, !"Debug Info Version", i32 3}
!5 = !{i32 1, !"wchar_size", i32 4}
!6 = !{!"clang version 6.0.0 (trunk 309873) (llvm/trunk 309879)"}
!7 = distinct !DISubprogram(name: "f2", linkageName: "_Z2f2v", scope: !1, file: !1, line: 2, type: !8, isLocal: false, isDefinition: true, scopeLine: 2, flags: DIFlagPrototyped, isOptimized: true, unit: !0, variables: !10)
!8 = !DISubroutineType(types: !9)
!9 = !{null}
!10 = !{!11}
!11 = !DILocalVariable(name: "i", scope: !12, file: !1, line: 5, type: !13)
!12 = distinct !DILexicalBlock(scope: !7, file: !1, line: 4, column: 3)
!13 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!14 = !DILocation(line: 3, column: 3, scope: !7)
!15 = !DILocation(line: 6, column: 5, scope: !12)
!16 = !DILocation(line: 7, column: 5, scope: !12)
!17 = !DILocation(line: 9, column: 1, scope: !7)

Compiling this (clang ranges.ll) produces a ranges section that looks like this:

        .section        .debug_ranges,"",@progbits
.Ldebug_ranges0:
        .quad   .Lfunc_begin0-.Lfunc_begin0
        .quad   .Ltmp0-.Lfunc_begin0
        .quad   .Ltmp1-.Lfunc_begin0
        .quad   .Ltmp2-.Lfunc_begin0
        .quad   0
        .quad   0

So in this case you can only tell which section these refer to by knowing which CU is referring to this range list to find the default base address to parse the list with.

This mapping from generically parsed range lists to 'resolved' range lists happens in DWARFDebugRangeList::getAbsoluteRanges - so the computation/addition of the section name/index should probably happen here. (maybe this code maps to section index, and then the dumping code higher up maps the index to the name to print it out)
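
As a rough illustration of that resolution step (not the actual getAbsoluteRanges implementation), the sketch below applies a CU-provided default base address to raw entries and lets a base address selection entry override it. The names Range and resolveRanges are hypothetical, and 8-byte addresses are assumed.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical type for illustration: a half-open [low, high) pair.
using Range = std::pair<uint64_t, uint64_t>;

// Turn raw .debug_ranges entries into absolute ranges.
// DefaultBase stands for the base address of the referring CU, which
// can only be known once that CU has been found.
std::vector<Range> resolveRanges(const std::vector<Range> &Raw,
                                 uint64_t DefaultBase) {
  std::vector<Range> Out;
  uint64_t Base = DefaultBase;
  for (const Range &E : Raw) {
    if (E.first == 0 && E.second == 0)
      break; // end-of-list entry
    if (E.first == UINT64_MAX) {
      Base = E.second; // base address selection entry
      continue;
    }
    Out.emplace_back(Base + E.first, Base + E.second);
  }
  return Out;
}
```

For a list like the one above, which holds only offsets, the result depends entirely on the DefaultBase argument; a dumper that walks debug_ranges on its own has no meaningful value to pass there.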

Both debug_ranges and debug_loc need the address size to even parse/dump the raw contents (not accounting for more CU-specific things like default base addresses, etc.), but they take different strategies to get it. debug_ranges dumping only works if you also dump debug_info: it scrapes the address size while dumping debug_info and uses it when dumping debug_ranges. debug_loc, on the other hand, forces the parsing of the first CU to retrieve its pointer size, so debug_loc dumping works even if debug_info is not dumped (& only parses the first unit header to achieve this). I think that's probably the right solution, and debug_ranges should do the same thing there.
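
A minimal sketch of the "parse only the first unit header" idea follows. The function name firstUnitAddressSize is hypothetical and this is not how libDebugInfo implements it. It assumes little-endian, 32-bit DWARF, versions 2 to 4, where the header is unit_length (4 bytes), version (2), debug_abbrev_offset (4) and address_size (1).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: read address_size from the first unit header in
// the raw bytes of .debug_info. Returns false if the header is too
// short or is not 32-bit DWARF version 2..4.
bool firstUnitAddressSize(const std::vector<uint8_t> &DebugInfo,
                          uint8_t &AddressSize) {
  const size_t HeaderSize = 11; // 4 + 2 + 4 + 1 bytes
  if (DebugInfo.size() < HeaderSize)
    return false;
  uint32_t Length = static_cast<uint32_t>(DebugInfo[0]) |
                    static_cast<uint32_t>(DebugInfo[1]) << 8 |
                    static_cast<uint32_t>(DebugInfo[2]) << 16 |
                    static_cast<uint32_t>(DebugInfo[3]) << 24;
  if (Length >= 0xfffffff0u)
    return false; // 64-bit DWARF escape or reserved value
  uint16_t Version = static_cast<uint16_t>(
      DebugInfo[4] | static_cast<uint16_t>(DebugInfo[5]) << 8);
  if (Version < 2 || Version > 4)
    return false; // the DWARF v5 header layout differs
  AddressSize = DebugInfo[10];
  return true;
}
```

Fed the unit header from the test case in this revision (length 89, version 4, abbrev offset 0, address size 8), the helper reports an address size of 8 without touching any DIEs.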

Thanks for the explanations and the test case, David!
My comments below.

In D36270#830831, @dblaikie wrote:

A couple of things

If possible, I think it'd be better if it printed the section name, rather than the section index

Then I believe we want to print both name and index, because there can be multiple sections with the same name:
.section .foo, "aw", @progbits, unique, 1
.section .foo, "aw", @init_array, unique, 2
.section .foo, "aw", @preinit_array, unique, 3
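
A small sketch of what such a combined label could look like. The function sectionLabel and the "<no section>" text are hypothetical; the only detail taken from the patch is that an unset section index is represented as -1ULL.

```cpp
#include <cstdint>
#include <string>

// Hypothetical formatter: combine a section name with its index so that
// identically named sections remain distinguishable in the dump.
std::string sectionLabel(const std::string &Name, uint64_t Index) {
  if (Index == UINT64_MAX) // -1ULL: the address had no associated section
    return "<no section>";
  return Name + " [" + std::to_string(Index) + "]";
}
```

With this shape, the three .foo sections above would each get a distinct label even though their names are identical.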

How does your patch deal with handling the default base address? Given that it varies between each range list in debug_ranges and would only be known by determining which CU refers to that range list (which can only be known by walking all the DWARF DIEs to find the points that refer to range lists)? Does it not use a default base address? (I guess that's not the case) Does it use the default base address of the first CU?

I'd say probably leave the debug_ranges dumping as-is (I think there's a minor change that could be made to improve it a little*, but still leave it basically dumping the raw bytes, not the processed/semantic range list) but improve the range dumping in debug_info to include this info, perhaps?

You're right, this patch does not know about the default base address; it works only with base address selection entries, if there are any. I think I'll abandon it and try to improve .debug_info dumping, just as you suggested.

D36313 posted instead.

Revision Contents

Path                                                Size
lib/DebugInfo/DWARF/DWARFDebugRangeList.cpp         49 lines
test/DebugInfo/X86/dwarfdump-ranges-baseentry.s     135 lines
test/DebugInfo/X86/dwarfdump-ranges-unrelocated.s   10 lines
test/DebugInfo/dwarfdump-ranges.test                17 lines

Diff 109568

lib/DebugInfo/DWARF/DWARFDebugRangeList.cpp

Show All 27 Lines	bool DWARFDebugRangeList::extract(const DWARFDataExtractor &data,
clear();		clear();
if (!data.isValidOffset(*offset_ptr))		if (!data.isValidOffset(*offset_ptr))
return false;		return false;
AddressSize = data.getAddressSize();		AddressSize = data.getAddressSize();
if (AddressSize != 4 && AddressSize != 8)		if (AddressSize != 4 && AddressSize != 8)
return false;		return false;
Offset = *offset_ptr;		Offset = *offset_ptr;
while (true) {		while (true) {
RangeListEntry entry;		RangeListEntry Entry;
		Entry.SectionIndex = -1ULL;

uint32_t prev_offset = *offset_ptr;		uint32_t prev_offset = *offset_ptr;
entry.StartAddress =		Entry.StartAddress = data.getRelocatedAddress(offset_ptr);
data.getRelocatedAddress(offset_ptr, &entry.SectionIndex);		Entry.EndAddress =
entry.EndAddress = data.getRelocatedAddress(offset_ptr);		data.getRelocatedAddress(offset_ptr, &Entry.SectionIndex);

// Check that both values were extracted correctly.		// Check that both values were extracted correctly.
if (*offset_ptr != prev_offset + 2 * AddressSize) {		if (*offset_ptr != prev_offset + 2 * AddressSize) {
clear();		clear();
return false;		return false;
}		}
if (entry.isEndOfListEntry())		if (Entry.isEndOfListEntry())
break;		break;
Entries.push_back(entry);		Entries.push_back(Entry);
}		}
return true;		return true;
}		}

void DWARFDebugRangeList::dump(raw_ostream &OS) const {		void DWARFDebugRangeList::dump(raw_ostream &OS) const {
		if (Entries.empty())
		return;

		if (AddressSize == 4)
		OS << "Type Offset HighPC LowPC Section\n"
		<< "---- -------- -------- -------- --------\n";
		else
		OS << "Type Offset HighPC LowPC Section\n"
		<< "---- -------- ---------------- ---------------- --------\n";

		Optional<uint64_t> BaseAddr;
		uint64_t BaseSection;
for (const RangeListEntry &RLE : Entries) {		for (const RangeListEntry &RLE : Entries) {
const char *format_str = (AddressSize == 4		// Type can be either address selection entry or regular range.
? "%08x %08" PRIx64 " %08" PRIx64 "\n"		const char *Type = RLE.StartAddress == -1LL ? " AS " : " R ";
: "%08x %016" PRIx64 " %016" PRIx64 "\n");		const char *format_str =
OS << format(format_str, Offset, RLE.StartAddress, RLE.EndAddress);		(AddressSize == 4 ? "%s %08x %08" PRIx64 " %08" PRIx64 " %u\n"
		: "%s %08x %016" PRIx64 " %016" PRIx64 " %u\n");

		uint64_t LowPC = RLE.StartAddress;
		uint64_t HighPC = RLE.EndAddress;
		uint64_t SectionIndex = RLE.SectionIndex;
		if (RLE.StartAddress == -1LL) {
		BaseAddr = RLE.EndAddress;
		BaseSection = RLE.SectionIndex;
		} else if (BaseAddr) {
		LowPC += *BaseAddr;
		HighPC += *BaseAddr;
		SectionIndex = BaseSection;
		}

		OS << format(format_str, Type, Offset, LowPC, HighPC, SectionIndex);
}		}
OS << format("%08x <End of list>\n", Offset);		OS << format("%08x <End of list>\n", Offset);
}		}

DWARFAddressRangesVector		DWARFAddressRangesVector
DWARFDebugRangeList::getAbsoluteRanges(uint64_t BaseAddress) const {		DWARFDebugRangeList::getAbsoluteRanges(uint64_t BaseAddress) const {
DWARFAddressRangesVector Res;		DWARFAddressRangesVector Res;
for (const RangeListEntry &RLE : Entries) {		for (const RangeListEntry &RLE : Entries) {
Show All 9 Lines

test/DebugInfo/X86/dwarfdump-ranges-baseentry.s

				# RUN: llvm-mc -triple x86_64-pc-linux -filetype=obj %s -o %t
				# RUN: llvm-dwarfdump %t | FileCheck %s

				# CHECK: .debug_ranges contents:
				# CHECK-NEXT: Type Offset HighPC LowPC Section
				# CHECK-NEXT: ---- -------- ---------------- ---------------- --------
				# CHECK-NEXT: AS 00000000 ffffffffffffffff 0000000000000008 2
				# CHECK-NEXT: R 00000000 0000000000000008 0000000000000009 2
				# CHECK-NEXT: R 00000000 000000000000000a 000000000000000b 2
				# CHECK-NEXT: 00000000 <End of list>

				## Asm code for testcase is a reduced output from next invocation and source:
				# clang test.cpp -g -c -mllvm -use-dwarf-ranges-base-address-specifier
				# test.cpp:
				# void f2() {
				# }
				# __attribute__((nodebug)) void f1() {
				# }
				# void f3() {
				# }

				.text
				.quad 0x0

				.globl _Z2f2v
				.type _Z2f2v,@function
				_Z2f2v:
				.Lfunc_begin0:
				nop
				.Lfunc_end0:

				.globl _Z2f1v
				.type _Z2f1v,@function
				_Z2f1v:
				.Lfunc_begin1:
				nop
				.Lfunc_end1:


				.globl bar
				.type bar,@function
				bar:
				.Lfunc_begin2:
				nop
				.Lfunc_end2:

				.section .debug_abbrev,"",@progbits
				.byte 1 # Abbreviation Code
				.byte 17 # DW_TAG_compile_unit
				.byte 1 # DW_CHILDREN_yes
				.byte 37 # DW_AT_producer
				.byte 14 # DW_FORM_strp
				.byte 19 # DW_AT_language
				.byte 5 # DW_FORM_data2
				.byte 3 # DW_AT_name
				.byte 14 # DW_FORM_strp
				.byte 16 # DW_AT_stmt_list
				.byte 23 # DW_FORM_sec_offset
				.byte 27 # DW_AT_comp_dir
				.byte 14 # DW_FORM_strp
				.ascii "\264B" # DW_AT_GNU_pubnames
				.byte 25 # DW_FORM_flag_present
				.byte 17 # DW_AT_low_pc
				.byte 1 # DW_FORM_addr
				.byte 85 # DW_AT_ranges
				.byte 23 # DW_FORM_sec_offset
				.byte 0 # EOM(1)
				.byte 0 # EOM(2)
				.byte 2 # Abbreviation Code
				.byte 46 # DW_TAG_subprogram
				.byte 0 # DW_CHILDREN_no
				.byte 17 # DW_AT_low_pc
				.byte 1 # DW_FORM_addr
				.byte 18 # DW_AT_high_pc
				.byte 6 # DW_FORM_data4
				.byte 64 # DW_AT_frame_base
				.byte 24 # DW_FORM_exprloc
				.byte 110 # DW_AT_linkage_name
				.byte 14 # DW_FORM_strp
				.byte 3 # DW_AT_name
				.byte 14 # DW_FORM_strp
				.byte 58 # DW_AT_decl_file
				.byte 11 # DW_FORM_data1
				.byte 59 # DW_AT_decl_line
				.byte 11 # DW_FORM_data1
				.byte 63 # DW_AT_external
				.byte 25 # DW_FORM_flag_present
				.byte 0 # EOM(1)
				.byte 0 # EOM(2)
				.byte 0 # EOM(3)

				.section .debug_info,"",@progbits
				.Lcu_begin0:
				.long 89 # Length of Unit
				.short 4 # DWARF version number
				.long .debug_abbrev # Offset Into Abbrev. Section
				.byte 8 # Address Size (in bytes)
				.byte 1 # Abbrev [1] 0xb:0x52 DW_TAG_compile_unit
				.long 0 # DW_AT_producer
				.short 4 # DW_AT_language
				.long 0 # DW_AT_name
				.long 0 # DW_AT_stmt_list
				.long 0 # DW_AT_comp_dir
				.quad 0 # DW_AT_low_pc
				.long .Ldebug_ranges0 # DW_AT_ranges
				.byte 2 # Abbrev [2] 0x2a:0x19 DW_TAG_subprogram
				.quad .Lfunc_begin0 # DW_AT_low_pc
				.long .Lfunc_end0-.Lfunc_begin0 # DW_AT_high_pc
				.byte 1 # DW_AT_frame_base
				.byte 86
				.long 0 # DW_AT_linkage_name
				.long 0 # DW_AT_name
				.byte 1 # DW_AT_decl_file
				.byte 1 # DW_AT_decl_line
				.byte 2 # Abbrev [2] 0x43:0x19 DW_TAG_subprogram
				.quad 0 # DW_AT_low_pc
				.long .Lfunc_end1-.Lfunc_begin1 # DW_AT_high_pc
				.byte 1 # DW_AT_frame_base
				.byte 86
				.long 0 # DW_AT_linkage_name
				.long 0 # DW_AT_name
				.byte 1 # DW_AT_decl_file
				.byte 5 # DW_AT_decl_line
				.byte 0 # End Of Children Mark

				.section .debug_ranges,"",@progbits
				.Ldebug_ranges0:
				.quad -1
				.quad .Lfunc_begin0
				.quad .Lfunc_begin0-.Lfunc_begin0
				.quad .Lfunc_end0-.Lfunc_begin0
				.quad .Lfunc_begin2-.Lfunc_begin0
				.quad .Lfunc_end2-.Lfunc_begin0
				.quad 0
				.quad 0

test/DebugInfo/X86/dwarfdump-ranges-unrelocated.s

	# RUN: llvm-mc -triple x86_64-pc-linux -filetype=obj %s -o %t			# RUN: llvm-mc -triple x86_64-pc-linux -filetype=obj %s -o %t
	# RUN: llvm-dwarfdump %t | FileCheck %s			# RUN: llvm-dwarfdump %t | FileCheck %s

	# CHECK: .debug_ranges contents:			# CHECK: .debug_ranges contents:
	# CHECK: 00000000 0000000000000000 0000000000000001			# CHECK-NEXT: Type Offset HighPC LowPC Section
	# CHECK: 00000000 0000000000000000 0000000000000002			# CHECK-NEXT: ---- -------- ---------------- ---------------- --------
	# CHECK: 00000000 <End of list>			# CHECK-NEXT: R 00000000 0000000000000000 0000000000000001 3
				# CHECK-NEXT: R 00000000 0000000000000000 0000000000000002 4
				# CHECK-NEXT: 00000000 <End of list>

	## Asm code for testcase is a reduced output from next invocation and source:			## Asm code for testcase is a reduced output from next invocation and source:
	# clang test.cpp -S -o test.s -gmlt -ffunction-sections			# clang test.cpp -S -o test.s -gmlt -ffunction-sections
	# test.cpp:			# test.cpp:
	# void foo1() { }			# void foo1() { }
	# void foo2() { }			# void foo2() { }

	.section .text.foo1,"ax",@progbits			.section .text.foo1,"ax",@progbits
	Show All 55 Lines

test/DebugInfo/dwarfdump-ranges.test

	Show All 9 Lines
	CHECK: DW_TAG_compile_unit			CHECK: DW_TAG_compile_unit
	CHECK-NOT: TAG			CHECK-NOT: TAG
	CHECK: DW_AT_ranges [DW_FORM_data4] (0x00000030			CHECK: DW_AT_ranges [DW_FORM_data4] (0x00000030
	CHECK-NEXT: [0x0000000000000640 - 0x000000000000064b)			CHECK-NEXT: [0x0000000000000640 - 0x000000000000064b)
	CHECK-NEXT: [0x0000000000000637 - 0x000000000000063d))			CHECK-NEXT: [0x0000000000000637 - 0x000000000000063d))


	CHECK: .debug_ranges contents:			CHECK: .debug_ranges contents:
	CHECK-NEXT: 00000000 000000000000062c 0000000000000637			CHECK-NEXT: Type Offset HighPC LowPC Section
	CHECK-NEXT: 00000000 0000000000000637 000000000000063d			CHECK-NEXT: ---- -------- ---------------- ---------------- --------
				CHECK-NEXT: R 00000000 000000000000062c 0000000000000637 4294967295
				CHECK-NEXT: R 00000000 0000000000000637 000000000000063d 4294967295
	CHECK-NEXT: 00000000 <End of list>			CHECK-NEXT: 00000000 <End of list>
	CHECK-NEXT: 00000030 0000000000000640 000000000000064b			CHECK-NEXT: Type Offset HighPC LowPC Section
	CHECK-NEXT: 00000030 0000000000000637 000000000000063d			CHECK-NEXT: ---- -------- ---------------- ---------------- --------
				CHECK-NEXT: R 00000030 0000000000000640 000000000000064b 4294967295
				CHECK-NEXT: R 00000030 0000000000000637 000000000000063d 4294967295
	CHECK-NEXT: 00000030 <End of list>			CHECK-NEXT: 00000030 <End of list>