This is an archive of the discontinued LLVM Phabricator instance.

Differential D126257

Round up zero-sized symbols to 1 byte in `.debug_aranges`.
ClosedPublic

Authored by pcwalton on May 23 2022, 4:32 PM.

Details

Reviewers

dblaikie

Commits

rG256a52d9aac8: Round up zero-sized symbols to 1 byte in `.debug_aranges`.

Summary

This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table, by rounding their size up to 1. Entries with zero
length violate the DWARF 5 spec, which states:

Each descriptor is a triple consisting of a segment selector, the beginning
address within that segment of a range of text or data covered by some entry
owned by the corresponding compilation unit, followed by the non-zero length
of that range.

In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.

Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but the AsmPrinter does not do so comprehensively.
The AsmPrinter does avoid emitting such zero-sized symbols when labels
aren't involved, but not when the symbol to be emitted is a difference of two
labels; this patch extends that logic to handle the case in which the symbol is
defined via labels.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pcwalton created this revision. May 23 2022, 4:32 PM

Herald added a project: Restricted Project. May 23 2022, 4:32 PM

Herald added a subscriber: hiraditya.

pcwalton requested review of this revision. May 23 2022, 4:32 PM

Herald added a subscriber: llvm-commits.

@dblaikie Here's a new version of the patch that takes the alternate approach you suggested of rounding lengths up to 1 byte. Feel free to take either diff and I'll close the other.

pcwalton mentioned this in D126010: Make sure the AsmPrinter doesn't emit any zero-sized symbols to `.debug_aranges`. May 23 2022, 4:35 PM

Harbormaster completed remote builds in B165950: Diff 431521. May 23 2022, 5:19 PM

dblaikie added inline comments. May 23 2022, 5:23 PM

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3057–3063	Does this need to be an MCExpr? Rather than a hardcoded value of `1` (which would seem simpler)?

dblaikie added inline comments. May 23 2022, 5:32 PM

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3057–3063	Like what if this was just: `if (Span.End && Size != 0)` and let the existing `Size == 0 => Size = 1` handling below take it from there?

Emit a constant 1 instead of a more complicated MCExpr when emitting symbols of length 1.

How's this?

In D126257#3535829, @pcwalton wrote:

How's this?

If we're going in this direction, I think this is the way to implement it - though I'm still leaning towards GCC's choice of "all symbols (even data symbols) should be non-zero length" (at the object code level, they can be zero length at the source level) to avoid ambiguities/confusion.

This revision is now accepted and ready to land. May 24 2022, 5:03 PM

Thanks!

Wouldn't that mean that those zero-sized data symbols won't show up in the debugger anymore? That might be confusing for Rust users, who do use zero-sized globals reasonably commonly (to attach methods to).

Harbormaster completed remote builds in B166166: Diff 431834. May 24 2022, 6:04 PM

In D126257#3535908, @pcwalton wrote:

Thanks!

Wouldn't that mean that those zero-sized data symbols won't show up in the debugger anymore? That might be confusing for Rust users, who do use zero-sized globals reasonably commonly (to attach methods to).

Sorry, I'm not following you - I mean for zero-sized data symbols, we should emit a single byte anyway - so that the address of the symbol is unique/actually exists.

What situation are you considering where something would cause things not to show up in the debugger anymore?

Oh, what I mean is that our choices in the case in which zero-sized global symbols are forbidden at the IR level would be (1) don't emit source-level zero-sized symbols into LLVM IR (or don't emit DWARF metadata for them), or (2) round all zero-sized symbols up to one byte. If we pick (1), I don't think they would show up in the debug info, which could be confusing for programmers. If we pick (2), then certain abstractions which are zero-cost today in Rust become non-zero-cost.

Mind you, I don't know of any applications specifically for which adding one extra byte per zero-sized symbol would be a problem. But I'm worried that someone might be, for example, writing a macro which creates thousands of zero-sized globals, as part of some trick along the lines of traits types in C++, and then the runtime space cost of rounding all globals up to size 1 could become noticeable.

In D126257#3538331, @pcwalton wrote:

Oh, what I mean is that our choices in the case in which zero-sized global symbols are forbidden at the IR level would be (1) don't emit source-level zero-sized symbols into LLVM IR (or don't emit DWARF metadata for them), or (2) round all zero-sized symbols up to one byte. If we pick (1), I don't think they would show up in the debug info, which could be confusing for programmers. If we pick (2), then certain abstractions which are zero-cost today in Rust become non-zero-cost.

Ah, sorry, yes, I wasn't suggesting (1) - I was suggesting (2), like GCC appears to do: https://godbolt.org/z/E89eo15qd

Mind you, I don't know of any applications specifically for which adding one extra byte per zero-sized symbol would be a problem. But I'm worried that someone might be, for example, writing a macro which creates thousands of zero-sized globals, as part of some trick along the lines of traits types in C++, and then the runtime space cost of rounding all globals up to size 1 could become noticeable.

Yeah - OK, I can see the hypothetical argument. All the more reason, from my perspective, to avoid enabling these to be truly zero-size now only to realize we need to change this later.

The ability to talk about unique addresses for each entity seems important to me (important for C++ at the language level, I think - though admittedly zero-size entities are an extension, so C++ doesn't really say what it means to take their address, compare that address, etc) - but also for DWARF consumers to search for things. If they're really zero length, then searching for the address sort of /should/ result in no result (because the address you're searching for is not part of the object, even though it is the address of the object - because the object takes up no space) - or should/must result in all the zero-length objects at the address plus whatever object follows it - making any kind of identification (eg: debuggers print out what global a pointer points to - but now it points to more than one thing...) complicated/ambiguous/confusing for the consumer and the users.
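
The lookup ambiguity described above can be made concrete with a minimal sketch. This is not LLVM or debugger code: the `Object` struct and both lookup helpers are hypothetical names introduced only for illustration, and the sketch assumes each object is described by a half-open range [Address, Address + Size).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a global as a DWARF consumer might see it.
struct Object {
  std::string Name;
  std::uint64_t Address;
  std::uint64_t Size;
};

// Strict reading: an address belongs to an object only if it lies inside
// the half-open range [Address, Address + Size). A zero-sized object
// covers no address at all, so this lookup can never return it.
std::vector<std::string> objectsContaining(const std::vector<Object> &Objects,
                                           std::uint64_t Addr) {
  std::vector<std::string> Result;
  for (const Object &O : Objects)
    if (Addr >= O.Address && Addr - O.Address < O.Size)
      Result.push_back(O.Name);
  return Result;
}

// Lenient reading: additionally report every object that starts at the
// address, whatever its size. Zero-sized objects are found again, but one
// address can now name several objects.
std::vector<std::string> objectsAtOrContaining(const std::vector<Object> &Objects,
                                               std::uint64_t Addr) {
  std::vector<std::string> Result;
  for (const Object &O : Objects)
    if (Addr == O.Address || (Addr >= O.Address && Addr - O.Address < O.Size))
      Result.push_back(O.Name);
  return Result;
}
```

With two zero-sized globals and a 4-byte global all placed at the same address, the strict lookup reports only the 4-byte global and the lenient lookup reports all three, which are the two outcomes the comment above calls out. Rounding every object up to at least one byte would give each its own address and avoid both.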

This revision was landed with ongoing or failed builds. May 25 2022, 1:32 PM

Closed by commit rG256a52d9aac8: Round up zero-sized symbols to 1 byte in `.debug_aranges`. (authored by pcwalton, committed by ayermolo).

This revision was automatically updated to reflect the committed changes.

ayermolo added a commit: rG256a52d9aac8: Round up zero-sized symbols to 1 byte in `.debug_aranges`.

In https://reviews.llvm.org/rG38eb4fe74b38 I moved the generic test into the X86 folder as it uses an X86 triple. You could think of the folder names there having 2 meanings, 1: the tests look at things specific to that arch and 2: they use that arch's triple (I think X86 is usually the one people choose if they want to test something generic anyway).

dtolnay added a subscriber: dtolnay. May 26 2022, 8:54 AM

bjope added a subscriber: bjope. May 26 2022, 9:26 AM

bjope added inline comments.

llvm/test/CodeGen/Generic/dwarf-aranges-zero-size.ll
18	Notice that align here is specified in bits. I think it is a bit weird to have a 1 bit alignment? So why do I care? Our downstream fork is checking that alignment is a multiple of byte size since the getAlignInBytes methods in DebugInfoMetadata.h is dividing the alignment specified in bits by the byte size. This test case hit such assertions. Do you think it is ok to change this to "align: 8"?
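
The failure mode described in this comment can be sketched in isolation. This is a standalone model, not the code from DebugInfoMetadata.h or from the downstream fork: `BitsPerByte`, `alignInBytes`, `isWholeByteAlignment`, and `checkedAlignInBytes` are hypothetical names, and an 8-bit byte is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8-bit bytes. A downstream target may use a different value.
constexpr std::uint32_t BitsPerByte = 8;

// Models converting a debug-info alignment given in bits to bytes by
// integer division, as the comment above describes.
std::uint32_t alignInBytes(std::uint32_t AlignInBits) {
  return AlignInBits / BitsPerByte;
}

// True when the bit alignment corresponds to a whole number of bytes.
bool isWholeByteAlignment(std::uint32_t AlignInBits) {
  return AlignInBits % BitsPerByte == 0;
}

// Models the stricter downstream behaviour: refuse alignments that the
// division above would silently truncate.
std::uint32_t checkedAlignInBytes(std::uint32_t AlignInBits) {
  assert(isWholeByteAlignment(AlignInBits) &&
         "alignment in bits must be a multiple of the byte size");
  return AlignInBits / BitsPerByte;
}
```

Under these assumptions `align: 1` truncates to an alignment of 0 bytes in the unchecked helper and trips the assertion in the checked one, while `align: 8` yields 1 byte in both.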

In D126257#3539510, @DavidSpickett wrote:

In https://reviews.llvm.org/rG38eb4fe74b38 I moved the generic test into the X86 folder as it uses an X86 triple. You could think of the folder names there having 2 meanings, 1: the tests look at things specific to that arch and 2: they use that arch's triple (I think X86 is usually the one people choose if they want to test something generic anyway).

Thanks for the catch/cleanup!

llvm/test/CodeGen/Generic/dwarf-aranges-zero-size.ll
18	Guessing we could probably just remove the "align: " field entirely here. Usually I'd advocate for generating the example from some C++ source code fed into clang, since that's the most canonical IR we have - a small example with a zero-length array or the like. Though that might lead to more type information than is really helpful in a test like this. I guess this was generated by Rust? So might be worth seeing why/where this alignment was created, maybe there's some bugs there to be sorted out too. As for the alignment restrictions: Might be worth upstreaming your check into LLVM's debug info verifier to make it an explicit constraint of the IR - would make it easier to detect these sort of things sooner.

bjope added inline comments. May 26 2022, 12:02 PM

llvm/test/DebugInfo/X86/dwarf-aranges.ll
25	I do not fully understand what happened here. The old label range was not zero-sized, right? So this is not rounding up to 1 byte, it is truncating it down to 1 byte, right? Is that really the intention with the patch?

dblaikie added inline comments. May 26 2022, 1:20 PM

llvm/test/DebugInfo/X86/dwarf-aranges.ll
25	Yep. Looks buggy to me - I guess maybe functions don't have a known size so appear to have size zero (when it's really unknown size) & that ends up overriding the real size computation. @pcwalton might be worth reverting this to figure that out?
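
The regression discussed here can be reproduced with a small standalone model of the changed condition. This is a sketch, not the AsmPrinter code: `Span`, `Length`, `SizeTable`, and the helper functions are hypothetical names. It assumes, as guessed in the comment above, that the function start label has no recorded size, so its size reads back as 0.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for one arange span: a start label and a flag
// saying whether the span also has an end label.
struct Span {
  std::string Start;
  bool HasEnd;
};

// What ends up in the length field of the descriptor.
struct Length {
  bool IsLabelDifference; // true: emit End - Start for the assembler to resolve
  std::uint64_t Constant; // used only when IsLabelDifference is false
};

typedef std::map<std::string, std::uint64_t> SizeTable;

// Assumption: a label with no recorded size reads back as 0.
std::uint64_t recordedSize(const SizeTable &SymSize, const std::string &Label) {
  SizeTable::const_iterator It = SymSize.find(Label);
  return It == SymSize.end() ? 0 : It->second;
}

// Condition on the left-hand side of the diff: an end label always wins.
Length lengthBefore(const Span &S, const SizeTable &SymSize) {
  if (S.HasEnd)
    return {true, 0};
  std::uint64_t Size = recordedSize(SymSize, S.Start);
  return {false, Size == 0 ? 1 : Size};
}

// Condition on the right-hand side of the diff: a recorded size of 0 now
// bypasses the end label, even if the label difference is non-zero.
Length lengthAfter(const Span &S, const SizeTable &SymSize) {
  std::uint64_t Size = recordedSize(SymSize, S.Start);
  if (Size != 0 && S.HasEnd)
    return {true, 0};
  return {false, Size == 0 ? 1 : Size};
}
```

In this model a span starting at `.Lfunc_begin0` with an end label is emitted as a label difference by `lengthBefore` but as the constant 1 by `lengthAfter`, matching the change from `.quad .Lsec_end2-.Lfunc_begin0` to `.quad 1` in dwarf-aranges.ll. A symbol with a recorded size and no end label, such as `some_bss`, is unaffected.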

uabelho added a subscriber: uabelho. May 29 2022, 10:11 PM

bjope added a subscriber: ayermolo. May 31 2022, 1:23 AM

bjope added inline comments.

llvm/test/DebugInfo/X86/dwarf-aranges.ll
25	Still no comments from @pcwalton (nor @ayermolo who committed this patch). I guess I can perform the revert while waiting for feedback. But then I also need to revert the follow-up commit by @DavidSpickett. Later when/if re-applying one would need to also re-apply that commit (and preferably also fix the problem with the align field in the dwarf-aranges-zero-size.ll test case discussed above).

Sorry for the delay. I was looking at this on Friday but didn't get around to finishing it. Feel free to revert in the meantime.

bjope added a reverting change: rG86caa0371859: Revert "Round up zero-sized symbols to 1 byte in `.debug_aranges`.". May 31 2022, 2:04 AM

In D126257#3546857, @pcwalton wrote:

Sorry for the delay. I was looking at this on Friday but didn't get around to finishing it. Feel free to revert in the meantime.

Well, this has already only caused me trouble. And since it isn't as trivial as just reverting a single commit, I'm not saying that I really look forward to cleaning up the mess (I probably won't spend much time on explaining the reverts in the commit messages etc., but I guess I still need to follow up to make sure I haven't broken something that makes build bots go crazy due to the reverts, so even more of my time is being invested in this). If you are aware of the problem and don't have a quick solution, I think you should react quicker and make the reverts yourself next time.

Anyway, I've landed the revert now!

I see. I don't have commit access unfortunately.

Hmm, for some reason I wasn't getting any notifications on this diff. My apologies.

pcwalton mentioned this in D126835: Round up zero-sized symbols to 1 byte in `.debug_aranges` (without breaking other logic). Jun 1 2022, 3:45 PM

Abandoned in favor of D126835.

As another data point, in Chromium we lost line numbers in backtraces after this, see https://bugs.chromium.org/p/chromium/issues/detail?id=1335630

egarcia added a subscriber: egarcia. Jun 16 2022, 11:31 AM

ayermolo mentioned this in rGbecbbb7e3c81: Round up zero-sized symbols to 1 byte in `.debug_aranges` (without breaking…. Jun 27 2022, 10:02 AM

Revision Contents

Path                                                    Size
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp              10 lines
llvm/test/CodeGen/Generic/dwarf-aranges-zero-size.ll    23 lines
llvm/test/DebugInfo/MSP430/dwarf-basics-v5.ll           2 lines
llvm/test/DebugInfo/X86/dwarf-aranges.ll                2 lines

Diff 432108

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp

@@ for (DwarfCompileUnit *CU : CUs) {
     Asm->OutStreamer->AddComment("Segment Size (in bytes)");
     Asm->emitInt8(0);

     Asm->OutStreamer->emitFill(Padding, 0xff);

     for (const ArangeSpan &Span : List) {
       Asm->emitLabelReference(Span.Start, PtrSize);

-      // Calculate the size as being from the span start to it's end.
-      if (Span.End) {
+      // Calculate the size as being from the span start to its end.
+      //
+      // If the size is zero, then round it up to one byte. The DWARF
+      // specification requires that entries in this table have nonzero
+      // lengths.
+      uint64_t Size = SymSize[Span.Start];
+      if (Size != 0 && Span.End) {
         Asm->emitLabelDifference(Span.End, Span.Start, PtrSize);
       } else {
         // For symbols without an end marker (e.g. common), we
         // write a single arange entry containing just that one symbol.
-        uint64_t Size = SymSize[Span.Start];
         if (Size == 0)
           Size = 1;

         Asm->OutStreamer->emitIntValue(Size, PtrSize);
       }
     }

     Asm->OutStreamer->AddComment("ARange terminator");
     Asm->OutStreamer->emitIntValue(0, PtrSize);
     Asm->OutStreamer->emitIntValue(0, PtrSize);
   }
 }

 /// Emit a single range list. We handle both DWARF v5 and earlier.
 static void emitRangeList(DwarfDebug &DD, AsmPrinter *Asm,
                           const RangeSpanList &List) {

llvm/test/CodeGen/Generic/dwarf-aranges-zero-size.ll

This file was added.

; Ensures that the AsmPrinter doesn't emit zero-sized symbols into `.debug_aranges`.
;
; RUN: llc --generate-arange-section < %s | FileCheck %s
; CHECK: .section .debug_aranges
; CHECK: .quad EXAMPLE
; CHECK-NEXT: .quad 1
; CHECK: .section

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@EXAMPLE = constant <{ [0 x i8] }> zeroinitializer, align 1, !dbg !0

!llvm.module.flags = !{!3}
!llvm.dbg.cu = !{!4}

!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(name: "EXAMPLE", linkageName: "EXAMPLE", scope: null, file: null, line: 161, type: !2, isLocal: false, isDefinition: true, align: 1)
!2 = !DIBasicType(name: "()", encoding: DW_ATE_unsigned)
!3 = !{i32 2, !"Debug Info Version", i32 3}
!4 = distinct !DICompileUnit(language: DW_LANG_Rust, file: !5, producer: "rustc", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: null, globals: !6)
!5 = !DIFile(filename: "foo", directory: "")
!6 = !{!0}

llvm/test/DebugInfo/MSP430/dwarf-basics-v5.ll

 ; CHECK: NULL

 ; CHECK: DW_TAG_pointer_type

 ; CHECK: NULL

 ; CHECK: .debug_aranges contents:
 ; CHECK-NEXT: Address Range Header: length = 0x{{.*}}, format = DWARF32, version = 0x0002, cu_offset = 0x00000000, addr_size = 0x02, seg_size = 0x00
-; CHECK-NEXT: [0x0000, 0x0006)
+; CHECK-NEXT: [0x0000, 0x0001)

 ; CHECK: .debug_addr contents:
 ; CHECK-NEXT: Address table header: length = 0x{{.*}}, format = DWARF32, version = 0x0005, addr_size = 0x02, seg_size = 0x00
 ; CHECK-NEXT: Addrs: [
 ; CHECK-NEXT: 0x0000
 ; CHECK-NEXT: ]

 ; ModuleID = 'dwarf-basics-v5.c'

llvm/test/DebugInfo/X86/dwarf-aranges.ll

 ; CHECK-NEXT: .quad .Lsec_end1-some_other

 ; <common symbols> - it should have made one span for each symbol.
 ; CHECK-NEXT: .quad some_bss
 ; CHECK-NEXT: .quad 4

 ; <text section> - it should have made one span covering all functions in this CU.
 ; CHECK-NEXT: .quad .Lfunc_begin0
-; CHECK-NEXT: .quad .Lsec_end2-.Lfunc_begin0
+; CHECK-NEXT: .quad 1

 ; -- finish --
 ; CHECK-NEXT: # ARange terminator

 ; -- source code --
 ; Generated from: "clang -c -g -emit-llvm"
 ;
 ; int some_data = 4;