This is an archive of the discontinued LLVM Phabricator instance.

Round up zero-sized symbols to 1 byte in `.debug_aranges` (without breaking other logic).
ClosedPublic

Authored by pcwalton on Jun 1 2022, 3:45 PM.

Details

Summary

This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table, by rounding their size up to 1. Entries with zero
length violate the DWARF 5 spec, which states:

Each descriptor is a triple consisting of a segment selector, the beginning
address within that segment of a range of text or data covered by some entry
owned by the corresponding compilation unit, followed by the non-zero length
of that range.

In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.

Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact,
the AsmPrinter does try to avoid emitting such zero-sized symbols when labels
aren't involved, but doesn't when the symbol to emitted is a difference of two
labels; this patch extends that logic to handle the case in which the symbol is
defined via labels.

Furthermore, this patch fixes a bug in which available_externally symbols
would cause unpredictable values to be emitted into the .debug_aranges table
under certain circumstances. In practice I don't believe that this caused
issues up until now, but the root cause of this bug--an invalid DenseMap
lookup--triggered failures in Chromium when combined with an earlier version of
this patch. Therefore, this patch fixes that bug too.

This is a revised version of diff D126257, which was reverted due to breaking
tests. The now-reverted version of this patch didn't distinguish between
symbols that didn't have their size reported to the DwarfDebug handler and
those that had their size reported to be zero. This new version of the patch
instead restricts the special handling only to the symbols whose size is
definitively known to be zero.

Diff Detail

Event Timeline

pcwalton created this revision.Jun 1 2022, 3:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2022, 3:45 PM
pcwalton requested review of this revision.Jun 1 2022, 3:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2022, 3:45 PM
dtolnay added a subscriber: dtolnay.Jun 1 2022, 3:53 PM

Test failures look unrelated?

ayermolo added inline comments.Jun 2 2022, 10:42 AM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3056

Can you change back to unit64_t. I believe LLVM coding standard errs on using specific types when it's not obvious what type it is.

ayermolo added inline comments.Jun 2 2022, 10:43 AM
llvm/test/CodeGen/X86/dwarf-aranges-zero-size.ll
9

Would it be possible to add source code + compiler command that generated this?

bjope added inline comments.Jun 2 2022, 12:51 PM
llvm/test/CodeGen/X86/dwarf-aranges-zero-size.ll
19

As mentioned in comments in the earlier review, "align" is specified in bits. And aligning to 1 bit seem a bit weird (downstream this trigger and assertion for us, because we assert that align is a multiple of the char size since llvm will divide this by the char size to get alignment in bytes).
Is rust really using alignments smaller than the char size?

pcwalton added inline comments.Jun 7 2022, 2:44 PM
llvm/test/CodeGen/X86/dwarf-aranges-zero-size.ll
19

Good catch, I submitted a fix to upstream rustc: https://github.com/rust-lang/rust/pull/97846

pcwalton updated this revision to Diff 435277.Jun 8 2022, 11:50 AM

Updated the diff to address comments.

pcwalton marked 3 inline comments as done.Jun 8 2022, 11:52 AM
ormris removed a subscriber: ormris.Jun 8 2022, 3:06 PM

Looks ok to me.
@dblaikie thoughts?

hans added a subscriber: hans.Jun 13 2022, 7:05 AM

When trying this on Chromium (in the context of https://crbug.com/1335630) we hit a densemap assert in llvm::DwarfDebug::emitDebugARanges()
See the attached reproducer.

dblaikie added a subscriber: hansw.Jun 13 2022, 2:32 PM

When trying this on Chromium (in the context of https://crbug.com/1335630) we hit a densemap assert in llvm::DwarfDebug::emitDebugARanges()
See the attached reproducer.

Thanks for the reproducer, @hansw - sounds worth addressing before this moves forward.

(I'm still sort of feeling like we should fix the underlying entities to be non-zero length always (even if they're zero length at the language level - they can still be non-zero length as an implementation detail providing unique addresses))

When trying this on Chromium (in the context of https://crbug.com/1335630) we hit a densemap assert in llvm::DwarfDebug::emitDebugARanges()
See the attached reproducer.

Thanks for the reproducer, @hansw - sounds worth addressing before this moves forward.

(I'm still sort of feeling like we should fix the underlying entities to be non-zero length always (even if they're zero length at the language level - they can still be non-zero length as an implementation detail providing unique addresses))

Doesn't it rely on various feeders into llvm doing the right thing, or can this be done in some kind of language agnostic way?

When trying this on Chromium (in the context of https://crbug.com/1335630) we hit a densemap assert in llvm::DwarfDebug::emitDebugARanges()
See the attached reproducer.

Thanks for the reproducer, @hansw - sounds worth addressing before this moves forward.

(I'm still sort of feeling like we should fix the underlying entities to be non-zero length always (even if they're zero length at the language level - they can still be non-zero length as an implementation detail providing unique addresses))

Doesn't it rely on various feeders into llvm doing the right thing, or can this be done in some kind of language agnostic way?

Possible it could require changes to producers if we wanted to make it invalid at the LLVM IR level to have a zero-length global variable (we already can't have a zero-length IR function - it has to have a terminator, even if it's "unreachable").

But I don't think that's necessary/required - this could be implemented in the LLVM backend(s) (ideally as target-neutrally as possible) to emit a trap for any functions that end up not ending in a terminator instruction (including truly zero-length functions) (various slight variations of this support already exist in LLVM - but I wasn't able to stare at them long enough to get the particular variant I really wanted - but maybe that was a "perfect being the enemy of the good" situation and using one of the existing features (either the non-zero-length thing implemented for Windows (I think that just puts a zero rather than trap) or the "trap on unreachable" (that puts traps on even non-final unreachables, which is more pessimism than I was hoping for - but maybe low cost enough) that Sony and Apple use), extending it for global variables as well as for functions)

Minimal reproduction:

template <typename T>
struct Baz {
    static constexpr int boo = int(-1);
};

extern template struct Baz<char>;

void bar(const int& a);

void foo() {
  bar(Baz<char>::boo);
}

I have a fix; it seems to be a subtle pre-existing bug in LLVM actually that happened to be masked until now.

pcwalton updated this revision to Diff 438080.Jun 17 2022, 5:54 PM

So this was a fun one. The previous version of this patch accidentally caused a
pre-existing bug to surface and break Chromium. In certain circumstances
including the Chromium test case, available_externally symbols would cause
invalid lookups in the SymSize table. Before this patch, the lookups were
done with the [] operator, which would succeed but cause an unspecified value
to be emitted in the .debug_aranges table. But this patch moves those lookups
to be done by dereferencing the result of the .find() method, which causes
asserts if the key is not found.

This patch fixes this bug and adds a test. It took quite a while to reduce all
150kloc+ of code...

This should be ready for review now.

pcwalton edited the summary of this revision. (Show Details)Jun 17 2022, 5:57 PM
dblaikie accepted this revision.Jun 21 2022, 4:49 PM

When trying this on Chromium (in the context of https://crbug.com/1335630) we hit a densemap assert in llvm::DwarfDebug::emitDebugARanges()
See the attached reproducer.

Thanks for the reproducer, @hansw - sounds worth addressing before this moves forward.

(I'm still sort of feeling like we should fix the underlying entities to be non-zero length always (even if they're zero length at the language level - they can still be non-zero length as an implementation detail providing unique addresses))

Doesn't it rely on various feeders into llvm doing the right thing, or can this be done in some kind of language agnostic way?

Possible it could require changes to producers if we wanted to make it invalid at the LLVM IR level to have a zero-length global variable (we already can't have a zero-length IR function - it has to have a terminator, even if it's "unreachable").

But I don't think that's necessary/required - this could be implemented in the LLVM backend(s) (ideally as target-neutrally as possible) to emit a trap for any functions that end up not ending in a terminator instruction (including truly zero-length functions) (various slight variations of this support already exist in LLVM - but I wasn't able to stare at them long enough to get the particular variant I really wanted - but maybe that was a "perfect being the enemy of the good" situation and using one of the existing features (either the non-zero-length thing implemented for Windows (I think that just puts a zero rather than trap) or the "trap on unreachable" (that puts traps on even non-final unreachables, which is more pessimism than I was hoping for - but maybe low cost enough) that Sony and Apple use), extending it for global variables as well as for functions)

I'd still be highly encouraging of going somewhere in this ^ direction (& in a direction away from using .debug_aranges as well), but the patch looks OK for what it is.

This revision is now accepted and ready to land.Jun 21 2022, 4:49 PM

Looks like this would be another issue with zero-length symbols: https://reviews.llvm.org/D127897 (just cross-referencing/collating)

Looks like this would be another issue with zero-length symbols: https://reviews.llvm.org/D127897 (just cross-referencing/collating)

Well it's an issue right now even without this change. Talking with @clayborg pointing to end-list in ranges is a good work around (along with lldb change).
Long term @zr33 (our summer intern) is working on mechanism to write out .debug_Info/.debug_abbrev using the same mechanics as in codegen/dwarflinker. Once that is enabled we can leave this pattern as is without converting to ranges.

Looks like this would be another issue with zero-length symbols: https://reviews.llvm.org/D127897 (just cross-referencing/collating)

Well it's an issue right now even without this change.

Sorry, what I meant is: This issue (D126835) and the other (D127897) would probably both be better-off solved by changing the compiler/LLVM to never produce zero-length entries.

Long term @zr33 (our summer intern) is working on mechanism to write out .debug_Info/.debug_abbrev using the same mechanics as in codegen/dwarflinker.
Once that is enabled we can leave this pattern as is without converting to ranges.

Still not really following - "converting to ranges" as in "changing consumers to use ranges instead of aranges"? sure - though I think that change would be a good thing in general, since aranges take up a bunch of object space.

Looks like this would be another issue with zero-length symbols: https://reviews.llvm.org/D127897 (just cross-referencing/collating)

Well it's an issue right now even without this change.

Sorry, what I meant is: This issue (D126835) and the other (D127897) would probably both be better-off solved by changing the compiler/LLVM to never produce zero-length entries.

Oh I see.
Some more context. Debug info that D127897 is handling is coming from GCC.

Long term @zr33 (our summer intern) is working on mechanism to write out .debug_Info/.debug_abbrev using the same mechanics as in codegen/dwarflinker.
Once that is enabled we can leave this pattern as is without converting to ranges.

Still not really following - "converting to ranges" as in "changing consumers to use ranges instead of aranges"? sure - though I think that change would be a good thing in general, since aranges take up a bunch of object space.

My bad I should have been more clear.. This is just what BOLT does. We convert DW_AT_low_pc/DW_AT_high_pc to DW_AT_low_pc/DW_AT_ranges.. This is because BOLT can break up functions.
Due to the limits of our current implementation if DIEs share a an abbrev table we must convert all of them.

Looks like this would be another issue with zero-length symbols: https://reviews.llvm.org/D127897 (just cross-referencing/collating)

Well it's an issue right now even without this change.

Sorry, what I meant is: This issue (D126835) and the other (D127897) would probably both be better-off solved by changing the compiler/LLVM to never produce zero-length entries.

Oh I see.
Some more context. Debug info that D127897 is handling is coming from GCC.

Oh, that surprises me - could you provide repro steps?

Looks like this would be another issue with zero-length symbols: https://reviews.llvm.org/D127897 (just cross-referencing/collating)

Well it's an issue right now even without this change.

Sorry, what I meant is: This issue (D126835) and the other (D127897) would probably both be better-off solved by changing the compiler/LLVM to never produce zero-length entries.

Oh I see.
Some more context. Debug info that D127897 is handling is coming from GCC.

Oh, that surprises me - could you provide repro steps?

int conv32(const void* ptr) { return *(const int *) ptr; }

int foo(const void* ptr) {
  return conv32(ptr) + 2;
}

gcc -g -gstrict-dwarf -gz=none -O3 helper.cpp -c
Version 11.2.1

Looks like this would be another issue with zero-length symbols: https://reviews.llvm.org/D127897 (just cross-referencing/collating)

Well it's an issue right now even without this change.

Sorry, what I meant is: This issue (D126835) and the other (D127897) would probably both be better-off solved by changing the compiler/LLVM to never produce zero-length entries.

Oh I see.
Some more context. Debug info that D127897 is handling is coming from GCC.

Oh, that surprises me - could you provide repro steps?

int conv32(const void* ptr) { return *(const int *) ptr; }

int foo(const void* ptr) {
  return conv32(ptr) + 2;
}

gcc -g -gstrict-dwarf -gz=none -O3 helper.cpp -c
Version 11.2.1

Huh - I'd argue that's /probably/ a bug in GCC? Describing a zero-length inlined scope seems of minimal value. I guess some very late optimization probably causes this in GCC. Might be worth filing a bug on GCC or otherwise asking them about that situation (if you do, please CC me - I'd be curious to follow along).

The test failures in libomp don't look related to me.

Looks like this would be another issue with zero-length symbols: https://reviews.llvm.org/D127897 (just cross-referencing/collating)

Well it's an issue right now even without this change.

Sorry, what I meant is: This issue (D126835) and the other (D127897) would probably both be better-off solved by changing the compiler/LLVM to never produce zero-length entries.

Oh I see.
Some more context. Debug info that D127897 is handling is coming from GCC.

Oh, that surprises me - could you provide repro steps?

int conv32(const void* ptr) { return *(const int *) ptr; }

int foo(const void* ptr) {
  return conv32(ptr) + 2;
}

gcc -g -gstrict-dwarf -gz=none -O3 helper.cpp -c
Version 11.2.1

Huh - I'd argue that's /probably/ a bug in GCC? Describing a zero-length inlined scope seems of minimal value. I guess some very late optimization probably causes this in GCC. Might be worth filing a bug on GCC or otherwise asking them about that situation (if you do, please CC me - I'd be curious to follow along).

Let me put it on my TODO list.

The test failures in libomp don't look related to me.

OK, will commit on Monday. Thanks

I verified that the failing tests don't break locally.

This revision was landed with ongoing or failed builds.Jun 27 2022, 10:02 AM
This revision was automatically updated to reflect the committed changes.