This is an archive of the discontinued LLVM Phabricator instance.

Differential D126010

Make sure the AsmPrinter doesn't emit any zero-sized symbols to `.debug_aranges`.
Abandoned, Public

Authored by pcwalton on May 19 2022, 1:06 PM.

Details

Reviewers

dblaikie
aprantl
rnk
probinson

Summary

This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table. Such symbols violate the DWARF 5 spec, which
states:

Each descriptor is a triple consisting of a segment selector, the beginning
address within that segment of a range of text or data covered by some entry
owned by the corresponding compilation unit, followed by the non-zero length
of that range.

In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.
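
To illustrate one way a zero-length entry can truncate a table, here is a minimal sketch of a reader that walks address/length pairs and stops at the all-zero terminator. The `Entry` type and `readUntilTerminator` function are hypothetical and are not taken from lld or binutils. The sketch assumes 64-bit pairs with no segment selector, and it assumes an unrelocated object in which a zero-sized symbol sits at offset 0, so that its entry looks exactly like the terminator.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of one .debug_aranges descriptor
// (address and length only; the segment selector is omitted).
struct Entry {
  uint64_t Address;
  uint64_t Length;
};

// Collect entries until the (0, 0) terminator pair is seen.
std::vector<Entry> readUntilTerminator(const std::vector<Entry> &Raw) {
  std::vector<Entry> Out;
  for (const Entry &E : Raw) {
    if (E.Address == 0 && E.Length == 0)
      break; // Terminator: stop reading.
    Out.push_back(E);
  }
  return Out;
}
```

Under these assumptions, feeding the reader the sequence (0, 0), (0x10, 16), (0, 0) yields no entries at all: the zero-sized symbol at offset 0 is taken for the terminator and the real 16-byte range after it is never seen.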

Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but the AsmPrinter does not do so comprehensively.
The AsmPrinter does avoid emitting such zero-sized symbols when labels aren't
involved, but not when the symbol to be emitted is a difference of two labels;
this patch extends that logic to handle the case in which the symbol is
defined via labels.
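
As a rough illustration of the filtering idea described above, the sketch below drops zero-sized symbols before any ranges are built from them. `SizedSymbol` and `dropZeroSized` are hypothetical names invented for this example; the actual patch works on `SymbolCU` and `SymSize` inside `DwarfDebug::emitDebugARanges`, as shown in the diff further down.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a symbol and its known size in bytes.
struct SizedSymbol {
  std::string Name;
  uint64_t Size;
};

// Keep only symbols that can produce a non-zero-length range.
std::vector<SizedSymbol> dropZeroSized(const std::vector<SizedSymbol> &Syms) {
  std::vector<SizedSymbol> Kept;
  for (const SizedSymbol &S : Syms)
    if (S.Size != 0)
      Kept.push_back(S);
  return Kept;
}
```

The trade-off of this approach, raised later in the review, is that a filtered symbol can no longer be found through the table at all.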

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pcwalton created this revision. May 19 2022, 1:06 PM

Herald added a project: Restricted Project. May 19 2022, 1:06 PM

Herald added a subscriber: hiraditya.

pcwalton requested review of this revision. May 19 2022, 1:06 PM

Herald added a project: Restricted Project. May 19 2022, 1:06 PM

Herald added a subscriber: llvm-commits.

pcwalton added a reviewer: dblaikie. May 19 2022, 1:07 PM

Harbormaster completed remote builds in B165396: Diff 430785. May 19 2022, 2:06 PM

MatzeB added reviewers: aprantl, rnk, probinson. May 19 2022, 3:22 PM

MatzeB added subscribers: MatzeB, junfd. May 19 2022, 3:34 PM

I think the right solution to this is not to have zero sized symbols at all (they break C++ function pointer uniqueness, make symbolising "weird"/ambiguous, break aranges here, and if they are functions and those functions are called they result in some pretty arbitrary badness as execution falls off - and we should probably fix that in general so you can't just fall off the end of a function, the code size wins aren't enough to justify it I believe/think/would be good to have some data to back that up)

I think I had a thread on llvm-dev a while back about this and how we should fix it. There are already some options in LLVM that address overlapping sets of issues like this, but I'm not sure there's exactly "trap at end of any function that doesn't end in a branch instruction of some kind".

Well, named zero-sized values are a first-class feature of Rust. You can write code like this:

pub static FOO: () = ();

Or:

struct ZeroSized;
pub static BAR: ZeroSized = ZeroSized;

and calling mem::size_of::<ZeroSized>() is guaranteed by the language semantics to evaluate to zero. We want to be able to encode such values in DWARF as well so that printing them in the debugger does something sensible.

Forbidding zero-sized functions makes sense to me, but I'm not sure how we would proceed forbidding zero-sized globals without regressing working functionality.

That being said, forbidding global zero-sized symbols might work for us, because I think those are pretty rare. They do appear in practice sometimes, but not commonly.

However, Rust programmers do use the pub static BAR: ZeroSized = ZeroSized pattern above in a similar way to traits types in C++. Without zero-sized global symbols, this would no longer be a zero-cost abstraction. Of course, 1 byte per symbol is a pretty low cost, though I wonder if there are macro crates out there that generate a whole bunch of them.

I agree with David, I would like to see LLVM move in the direction of never emitting empty functions. These are just labels that snap to the next function in the same section, and that's silly. I'm not sure what happens if you use function sections. We should just emit some trap instruction, and let the linker do identical code folding (ICF) to merge them back together. This will regress code size, but I doubt our users will complain, and ICF will recover most of the size regression.

Regarding global symbols, I don't know about Rust, but I believe it is possible to emit empty global variables in LLVM IR with zero-sized arrays. So, I think this change probably has merit on its own, without getting into the handling of empty function bodies.

Lastly, this code change requires a test.

Added a test.

I like the idea of moving in the direction of forbidding zero-sized functions (pretty sure Rust never emits these) but supporting zero-sized globals.

I added a test.

Harbormaster completed remote builds in B165577: Diff 431041. May 20 2022, 2:24 PM

ayermolo added a subscriber: ayermolo. May 20 2022, 3:43 PM

In D126010#3528129, @rnk wrote:

I agree with David, I would like to see LLVM move in the direction of never emitting empty functions. These are just labels that snap to the next function in the same section, and that's silly. I'm not sure what happens if you use function sections. We should just emit some trap instruction, and let the linker do identical code folding (ICF) to merge them back together. This will regress code size, but I doubt our users will complain, and ICF will recover most of the size regression.

Regarding global symbols, I don't know about Rust, but I believe it is possible to emit empty global variables in LLVM IR with zero-sized arrays. So, I think this change probably has merit on its own, without getting into the handling of empty function bodies.

LLVM IR/C++ does support zero-sized arrays, though I think even in that case it might be better to make them non-zero symbols for symbolizing purposes, etc - GCC seems to make them non-zero size: https://godbolt.org/z/vh4z3G4fh

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054	This might actually be a more suitable direction for zero-length entries. Otherwise: What's a consumer going to do if they query for the address and it's not in aranges? (They then need to scan all the DWARF to find the zero-length entry at that address, losing the benefit of aranges?) (also: what are you using aranges for? They've been off-by-default in Clang for many years now & I don't know of any particular value they have compared to using the ranges on the CUs in .debug_info directly (well, LLVM's aranges include data/non-code symbols, but GCC's don't, so it's hard for a consumer to rely on that extra data anyway) - I hope one day we can remove the support entirely)

dtolnay added a subscriber: dtolnay. May 23 2022, 3:34 AM

pcwalton added inline comments. May 23 2022, 10:49 AM

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054	> This might actually be a more suitable direction for zero-length entries. Otherwise: What's a consumer going to do if they query for the address and it's not in aranges? (They then need to scan all the DWARF to find the zero-length entry at that address, losing the benefit of aranges?) (also: what are you using aranges for? They've been off-by-default in Clang for many years now & I don't know of any particular value they have compared to using the ranges on the CUs in .debug_info directly (well, LLVM's aranges include data/non-code symbols, but GCC's don't, so it's hard for a consumer to rely on that extra data anyway) - I hope one day we can remove the support entirely)

We actually aren't using them for anything. The problem is that some tools don't cope well with invalid `.debug_aranges` tables with premature terminators, so we have to emit something valid. It doesn't matter what it is.

wenlei added a subscriber: wenlei. May 23 2022, 11:16 AM

dblaikie added inline comments. May 23 2022, 4:03 PM

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054	Sorry, I meant: how did you come across this bug? Aranges are disabled by default - so even if they're broken, I wouldn't expect that to be a problem, because they're not turned on anyway.

pcwalton added inline comments. May 23 2022, 4:22 PM

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054	rustc generates `.debug_aranges`, and we're compiling Rust code. The relevant Rust issue explains the rationale as far as I'm aware (and it looks like you commented there).

pcwalton added inline comments. May 23 2022, 4:35 PM

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054	Here's a new version of the patch that takes the approach you suggested of rounding sizes up to 1. https://reviews.llvm.org/D126257

ayermolo added inline comments. May 23 2022, 4:40 PM

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054	Doesn't lldb use .debug_aranges to speed up loading of debug info?

In D126010#3529743, @dblaikie wrote:

In D126010#3528129, @rnk wrote:

I agree with David, I would like to see LLVM move in the direction of never emitting empty functions. These are just labels that snap to the next function in the same section, and that's silly. I'm not sure what happens if you use function sections. We should just emit some trap instruction, and let the linker do identical code folding (ICF) to merge them back together. This will regress code size, but I doubt our users will complain, and ICF will recover most of the size regression.

Regarding global symbols, I don't know about Rust, but I believe it is possible to emit empty global variables in LLVM IR with zero-sized arrays. So, I think this change probably has merit on its own, without getting into the handling of empty function bodies.

LLVM IR/C++ does support zero-sized arrays, though I think even in that case it might be better to make them non-zero symbols for symbolizing purposes, etc - GCC seems to make them non-zero size: https://godbolt.org/z/vh4z3G4fh

I'm still inclined towards GCC's direction here - making everything non-zero length. It means querying for an address makes sense, for instance. Otherwise what does it mean to search the DWARF for an address? If there is nothing /at/ that address (or the next thing is technically at that address, because the thing you're looking for is zero length) - like if you want to print out the symbol name of a zero-length symbol, the search should technically come back empty, because there's nothing at the address.
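
The lookup argument above can be made concrete with a small sketch. It assumes a table of half-open ranges `[Start, Start + Length)`; the `Range` type and the `findOwner` and `roundUpLength` helpers are hypothetical names for illustration only and are not LLVM or GCC APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical range record: [Start, Start + Length) owned by Name.
struct Range {
  uint64_t Start;
  uint64_t Length;
  std::string Name;
};

// Return the first range containing Addr, or nullptr if none does.
const Range *findOwner(const std::vector<Range> &Table, uint64_t Addr) {
  for (const Range &R : Table)
    if (Addr >= R.Start && Addr - R.Start < R.Length)
      return &R;
  return nullptr;
}

// Round a zero length up to one byte so the range contains its own start.
uint64_t roundUpLength(uint64_t Length) { return Length == 0 ? 1 : Length; }
```

With a zero-length range at 0x1000, `findOwner` returns `nullptr` for 0x1000 because an empty half-open range contains no address. After the length is passed through `roundUpLength`, the same query finds the range, while 0x1001 still does not match.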

Closing in favor of D126257.

Revision Contents

Path	Size
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp	8 lines
llvm/test/CodeGen/Generic/dwarf-aranges-zero-size.ll	23 lines

Diff 431041

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp

[2,908 lines not shown]
 // Emit a debug aranges section, containing a CU lookup for any
 // address we can tie back to a CU.
 void DwarfDebug::emitDebugARanges() {
   // Provides a unique id per text section.
   MapVector<MCSection *, SmallVector<SymbolCU, 8>> SectionMap;

   // Filter labels by section.
   for (const SymbolCU &SCU : ArangeLabels) {
+    // Ignore zero-sized symbols so that we don't end up emitting any aranges
+    // that have zero length, which would violate the DWARF spec.
+    if (SymSize[SCU.Sym] == 0)
+      continue;
+
     if (SCU.Sym->isInSection()) {
       // Make a note of this symbol and it's section.
       MCSection *Section = &SCU.Sym->getSection();
       if (!Section->getKind().isMetadata())
         SectionMap[Section].push_back(SCU);
     } else {
       // Some symbols (e.g. common/bss on mach-o) can have no section but still
       // appear in the output. This sucks as we rely on sections to build
[119 lines not shown; enclosing context: for (const ArangeSpan &Span : List) {]

       // Calculate the size as being from the span start to it's end.
       if (Span.End) {
         Asm->emitLabelDifference(Span.End, Span.Start, PtrSize);
       } else {
         // For symbols without an end marker (e.g. common), we
         // write a single arange entry containing just that one symbol.
         uint64_t Size = SymSize[Span.Start];
-        if (Size == 0)
-          Size = 1;

         Asm->OutStreamer->emitIntValue(Size, PtrSize);
       }
     }

     Asm->OutStreamer->AddComment("ARange terminator");
     Asm->OutStreamer->emitIntValue(0, PtrSize);
     Asm->OutStreamer->emitIntValue(0, PtrSize);
   }
[479 lines not shown]

llvm/test/CodeGen/Generic/dwarf-aranges-zero-size.ll

This file was added.

; Ensures that the AsmPrinter doesn't emit zero-sized symbols into `.debug_aranges`.
;
; RUN: llc --generate-arange-section < %s | FileCheck %s
; CHECK: .section .debug_aranges
; CHECK-NOT: .quad EXAMPLE
; CHECK: .section

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@EXAMPLE = constant <{ [0 x i8] }> zeroinitializer, align 1, !dbg !0

!llvm.module.flags = !{!3}
!llvm.dbg.cu = !{!4}

!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(name: "EXAMPLE", linkageName: "EXAMPLE", scope: null, file: null, line: 161, type: !2, isLocal: false, isDefinition: true, align: 1)
!2 = !DIBasicType(name: "()", encoding: DW_ATE_unsigned)
!3 = !{i32 2, !"Debug Info Version", i32 3}
!4 = distinct !DICompileUnit(language: DW_LANG_Rust, file: !5, producer: "rustc", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: null, globals: !6)
!5 = !DIFile(filename: "foo", directory: "")
!6 = !{!0}