This is an archive of the discontinued LLVM Phabricator instance.

Make sure the AsmPrinter doesn't emit any zero-sized symbols to `.debug_aranges`.
AbandonedPublic

Authored by pcwalton on May 19 2022, 1:06 PM.

Details

Summary

This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table. Such symbols violate the DWARF 5 spec, which
states:

Each descriptor is a triple consisting of a segment selector, the beginning
address within that segment of a range of text or data covered by some entry
owned by the corresponding compilation unit, followed by the non-zero length
of that range.

In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.

Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In
fact, the AsmPrinter does try to avoid emitting such zero-sized symbols when
labels aren't involved, but doesn't when the symbol to emitted is a difference
of two labels; this patch extends that logic to handle the case in which the
symbol is defined via labels.

Diff Detail

Event Timeline

pcwalton created this revision.May 19 2022, 1:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 1:06 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
pcwalton requested review of this revision.May 19 2022, 1:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 1:06 PM

I think the right solution to this is not to have zero sized symbols at all (they break C++ function pointer uniqueness, make symbolising "weird"/ambiguous, break aranges here, and if they are functions and those functions are called they result in some pretty arbitrary badness as execution falls off - and we should probably fix that in general so you can't just fall off the end of a function, the code size wins aren't enough to justify it I believe/think/would be good to have some data to back that up)

I think I had a thread on llvm-dev a while back about this and how we should fix this. There are some options in llvm already to addressing overlapping sets of issues like this but not sure there's exactly "trap at end of any function that doesn't end in a branch instruction of some kind"

Well, named zero-sized values are a first-class feature of Rust. You can write code like this:

pub static FOO: () = ();

Or:

struct ZeroSized;
pub static BAR: ZeroSized = ZeroSized;

and calling mem::size_of::<ZeroSized>() is guaranteed by the language semantics to evaluate to zero. We want to be able to encode such values in DWARF as well so that printing them in the debugger does something sensible.

Forbidding zero-sized functions makes sense to me, but I'm not sure how we would proceed forbidding zero-sized globals without regressing working functionality.

pcwalton added a comment.EditedMay 19 2022, 4:14 PM

That being said, forbidding global zero-sized symbols might work for us, because I think those are pretty rare. They do appear in practice sometimes, but not commonly.

However, Rust programmers do use the pub static BAR: ZeroSized = ZeroSized pattern above in a similar way to traits types in C++. Without zero-sized global symbols, this would no longer be a zero-cost abstraction. Of course, 1 byte per symbol is a pretty low cost, though I wonder if there are macro crates out there that generate a whole bunch of them.

rnk added a comment.May 20 2022, 11:06 AM

I agree with David, I would like to see LLVM move in the direction of never emitting empty functions. These are just labels that snap to the next function in the same section, and that's silly. I'm not sure what happens if you use function sections. We should just emit some trap instruction, and let the linker do identical code folding (ICF) to merge them back together. This will regress code size, but I doubt out users will complain, and ICF will recover most of the size regression.

Regarding global symbols, I don't know about Rust, but I believe it is possible to emit empty global variables in LLVM IR with zero-sized arrays. So, I think this change probably has merit on its own, without getting into the handling of empty function bodies.

Lastly, this code change requires a test.

pcwalton updated this revision to Diff 431041.May 20 2022, 1:38 PM

Added a test.

I like the idea of moving in the direction of forbidding zero-sized functions (pretty sure Rust never emits these) but supporting zero-sized globals.

I added a test.

I agree with David, I would like to see LLVM move in the direction of never emitting empty functions. These are just labels that snap to the next function in the same section, and that's silly. I'm not sure what happens if you use function sections. We should just emit some trap instruction, and let the linker do identical code folding (ICF) to merge them back together. This will regress code size, but I doubt out users will complain, and ICF will recover most of the size regression.

Regarding global symbols, I don't know about Rust, but I believe it is possible to emit empty global variables in LLVM IR with zero-sized arrays. So, I think this change probably has merit on its own, without getting into the handling of empty function bodies.

LLVM IR/C++ does support zero-sized arrays, though I think even in that case it might be better to make them non-zero symbols for symbolizing purposes, etc - GCC seems to make them non-zero size: https://godbolt.org/z/vh4z3G4fh

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054

This might actually be a more suitable direction for zero-length entries.

Otherwise: What's a consumer going to do if they query for the address and it's not in aranges? (They then need to scan all the DWARF to find the zero-length entry at that address, losing the benefit of aranges?)

(also: what are you using aranges for? They've been off-by-default in Clang for many years now & I don't know of any particular value they have compared to using the ranges on the CUs in .debug_info directly (well, LLVM's aranges include data/non-code symbols, but GCC's don't, so it's hard for a consumer to rely on that extra data anyway) - I hope one day we can remove the support entirely)

pcwalton added inline comments.May 23 2022, 10:49 AM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054

This might actually be a more suitable direction for zero-length entries.

Otherwise: What's a consumer going to do if they query for the address and it's not in aranges? (They then need to scan all the DWARF to find the zero-length entry at that address, losing the benefit of aranges?)

(also: what are you using aranges for? They've been off-by-default in Clang for many years now & I don't know of any particular value they have compared to using the ranges on the CUs in .debug_info directly (well, LLVM's aranges include data/non-code symbols, but GCC's don't, so it's hard for a consumer to rely on that extra data anyway) - I hope one day we can remove the support entirely)

We actually aren't using them for anything. The problem is that some tools don't cope well with invalid .debug_aranges tables with premature terminators, so we have to emit *something* valid. It doesn't matter what it is.

wenlei added a subscriber: wenlei.May 23 2022, 11:16 AM
dblaikie added inline comments.May 23 2022, 4:03 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054

Sorry, I meant: how did you come across this bug? Aranges are disabled by default - so even if they're broken, I wouldn't expect that to be a problem, because they're not turned on anyway.

pcwalton added inline comments.May 23 2022, 4:22 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054

rustc generates .debug_aranges, and we're compiling Rust code. The relevant Rust issue explains the rationale as far as I'm aware (and it looks like you commented there).

pcwalton added inline comments.May 23 2022, 4:35 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054

Here's a new version of the patch that takes the approach you suggested of rounding sizes up to 1. https://reviews.llvm.org/D126257

ayermolo added inline comments.May 23 2022, 4:40 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3052–3054

Doesn't lldb use .debug_aranges to speed up loading of debug info?

I agree with David, I would like to see LLVM move in the direction of never emitting empty functions. These are just labels that snap to the next function in the same section, and that's silly. I'm not sure what happens if you use function sections. We should just emit some trap instruction, and let the linker do identical code folding (ICF) to merge them back together. This will regress code size, but I doubt out users will complain, and ICF will recover most of the size regression.

Regarding global symbols, I don't know about Rust, but I believe it is possible to emit empty global variables in LLVM IR with zero-sized arrays. So, I think this change probably has merit on its own, without getting into the handling of empty function bodies.

LLVM IR/C++ does support zero-sized arrays, though I think even in that case it might be better to make them non-zero symbols for symbolizing purposes, etc - GCC seems to make them non-zero size: https://godbolt.org/z/vh4z3G4fh

I'm still inclined towards GCC's direction here - making everything non-zero length. It means querying for an address makes sense, for instance. Otherwise what does it mean to search the DWARF for an address? If there is nothing /at/ that address (or the next thing is technically at that address, because the thing you're looking for is zero length) - like if you want to print out the symbol name of a zero-length symbol, the search should technically come back empty, because there's nothing at the address.

pcwalton abandoned this revision.May 25 2022, 1:17 PM

Closing in favor of D126257.