This is an archive of the discontinued LLVM Phabricator instance.

[DWARFLibrary] Add support to re-construct cu-index
ClosedPublic

Authored by ayermolo on Nov 11 2022, 6:24 PM.

Details

Summary

According to DWARF5 specification and gnu specification for DWARF4 the offset
entry in the CU/TU Index is 32 bits. This presents a problem when
.debug_info.dwo in DWP file grows beyond 4GB. The CU Index becomes partially
corrupted.

This diff adds manual parsing of .debug_info.dwo/.debug_abbrev.dwo to
reconstruct CU index in general, and TU index for DWARF5. This is a work around
until DWARF6 spec is finalized.

Next patch will change internal CU/TU struct to 64 bit, and change uses as
necessary. The plan is to land all the patches in one go after all are approved.

This patch originates from the discussion in: https://discourse.llvm.org/t/dwarf-dwp-4gb-limit/63902

Diff Detail

Event Timeline

ayermolo created this revision.Nov 11 2022, 6:24 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added subscribers: hoy, modimo, wenlei and 3 others. · View Herald Transcript
ayermolo requested review of this revision.Nov 11 2022, 6:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2022, 6:24 PM
dblaikie added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
802

Is this how error handling's generally done here? I think maybe the DWARFContext has error handling callbacks that are meant to be used? (& should probably propagate up a failure result through all of this rather than continuing with corrupt data?)

809

Could you rely on the version of the index? (version 2, I think, for pre-standard index, version 5 for the DWARFv5 standard index) rather than having to wait to parse a unit to see what version it has.

There's currently no way to mix pre-standard and standard indexes, I think (owing to the valid columns accepted in each)? So that should be adequate.

830

Rather than exposing mutability in the index interface, could this whole function (fixupIndex) be moved into the index & performed there as part of parsing?

840

any reason to believe the lengths would be incorrect? Perhaps we can limit the scope a bit by not touching those?

llvm/lib/DebugInfo/DWARF/DWARFUnitIndex.cpp
272–275

Should this return by reference?

llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
264

Maybe, to avoid the "CU/TU/etc" Could use "Unit" consistently in both flag name and opt variable, etc. (so there's no confusion that maybe it's specifically only for the CUIndex and not the TUIndex)

265–266

maybe ".debug_info" rather than "debug info"?

267

Maybe DW_SECT_INFO rather than "DEBUG_INFO"?

ayermolo updated this revision to Diff 475244.Nov 14 2022, 12:32 PM
ayermolo marked an inline comment as done.

Addressing comments

ayermolo updated this revision to Diff 475249.Nov 14 2022, 12:52 PM

missed llvm-dwarfdump comments

ayermolo marked 3 inline comments as done.Nov 14 2022, 12:53 PM
ayermolo added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
802

Changed it to logAllUnhandledErrors(createError()) for now
To propagate up can chagne DWARFContext::get(CU,TU}Index() to return Expectec<DWARFUnitIndex>?

I was going with something more localized to minimize the impact. I guess it comes down to philosophical question of whether if CU/TU index is partially corrupted we want to consider whole thing corrupted, or keep the current behavior of at least being able to access debug info below 4GB.

830

Wasn't part of the feedback from other diff is to minimize impact and not modify cu/tu index parsing, or did I miss understand? We then will need to modify parse to pass in context, and if we are parsing CU or TU.

840

I didn't want to assume anything about the producer. If Offset is corrupt, depending how how length is calculated at least one might be corrupt also. Also we are overriding all the offsets, if we mess up on that, doesn't really matter if new length is correct or not.

llvm/lib/DebugInfo/DWARF/DWARFUnitIndex.cpp
272–275

I was trying to keep same return type as the const version. Changed to reference.

ayermolo updated this revision to Diff 475543.Nov 15 2022, 11:43 AM

fixed tests

Yeah, I'm mostly OK with this direction. It's pretty isolated, maybe easy enough to explain if needed, etc.

llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
782–784

These could probably all sink into fixupIndex as locals?

784

Maybe DenseMap rather than unordered_map?

830

Fair enough - mixed feelings, but I'll rescind this piece at least.

840

I'd prefer to be a bit less permissive, really - to not end up creating more weird cases that systems might come to depend on.

Maybe fail if the length doesn't match, until we know of any particular case with mismatched lengths that we understand enough to want to/figure out how to support?

2049

looks like this unrelated change snuck in?

llvm/lib/DebugInfo/DWARF/DWARFUnitIndex.cpp
272–275

oh, yeah, the other should probably change too... ifyou could do that in a separate patch, that'd be great

ayermolo marked 6 inline comments as done.Nov 23 2022, 1:47 PM
ayermolo added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
2049

Ah yeah clang-format change.

llvm/lib/DebugInfo/DWARF/DWARFUnitIndex.cpp
272–275

Sounds good.

ayermolo updated this revision to Diff 477596.Nov 23 2022, 1:47 PM
ayermolo marked an inline comment as done.

addressed comments

ayermolo marked 4 inline comments as done.Nov 23 2022, 1:48 PM

LLDB changes to enable 64bit support there. https://reviews.llvm.org/D138618

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

ayermolo added a comment.EditedNov 24 2022, 7:31 AM

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?
For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

*To add/clarify those tests do exercise common code paths. I think adding "corrupt" index test would make sense to test some of the error conditions.

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

So...
Good news is that I got inspired and created a hand written assembly with invalid index.
Bad news is that this version of the patch relies on overflow behavior. See uint32_t TruncOffset. So just having one incorrect offset doesn't do anything.
I tried to play with length in the header the Header.extract fails.

tschuett added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
59

Is this include useless?

ayermolo updated this revision to Diff 479349.Dec 1 2022, 10:22 AM

removed unordered_map

ayermolo marked an inline comment as done.Dec 1 2022, 10:22 AM
ayermolo added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
59

Thanks removed. Leftover from switch to DenseMap.

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

So...
Good news is that I got inspired and created a hand written assembly with invalid index.
Bad news is that this version of the patch relies on overflow behavior. See uint32_t TruncOffset.

UB when converting a uint64_t into a uint32_t? Hmm, I didn't think that was undefined - figured that'd wrap around with well defined behavior. Maybe I'm misunderstanding - what's the particular UB you're referring to?

So just having one incorrect offset doesn't do anything.

Yeah, I'm still missing a step here. But I guess since we're only walking the offsets and looking for wraparound, this only has an effect if there's a genuine overflow, which would require a genuinely very large dwp file, which is infeasible to checkin as a test?

ayermolo marked an inline comment as done.Dec 1 2022, 4:10 PM

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

So...
Good news is that I got inspired and created a hand written assembly with invalid index.
Bad news is that this version of the patch relies on overflow behavior. See uint32_t TruncOffset.

UB when converting a uint64_t into a uint32_t? Hmm, I didn't think that was undefined - figured that'd wrap around with well defined behavior. Maybe I'm misunderstanding - what's the particular UB you're referring to?

Sorry was a bit vague with the terms. When I said "overflow behavior" I didn't mean it's UB, but was referring to "wrap around behavior" which is defined.

So just having one incorrect offset doesn't do anything.

Yeah, I'm still missing a step here. But I guess since we're only walking the offsets and looking for wraparound, this only has an effect if there's a genuine overflow, which would require a genuinely very large dwp file, which is infeasible to checkin as a test?

Right exactly. We are iterating over headers and computing TruncOffset and don't hit the bad case until we wrap around (.debug_info.dwo over 4GB). So I don't see a way to create a test to trigger this unless we have a very large dwp file.

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

So...
Good news is that I got inspired and created a hand written assembly with invalid index.
Bad news is that this version of the patch relies on overflow behavior. See uint32_t TruncOffset.

UB when converting a uint64_t into a uint32_t? Hmm, I didn't think that was undefined - figured that'd wrap around with well defined behavior. Maybe I'm misunderstanding - what's the particular UB you're referring to?

Sorry was a bit vague with the terms. When I said "overflow behavior" I didn't mean it's UB, but was referring to "wrap around behavior" which is defined.

So just having one incorrect offset doesn't do anything.

Yeah, I'm still missing a step here. But I guess since we're only walking the offsets and looking for wraparound, this only has an effect if there's a genuine overflow, which would require a genuinely very large dwp file, which is infeasible to checkin as a test?

Right exactly. We are iterating over headers and computing TruncOffset and don't hit the bad case until we wrap around (.debug_info.dwo over 4GB). So I don't see a way to create a test to trigger this unless we have a very large dwp file.

*nod* fair enough. Any chance of hand writing something that uses small assembly, but produces large enough output that it hits the overflow (using .zero or the like) - we wouldn't want to run it (don't want to produce multi-GB output files in test execution) but maybe comment it as a manual test, or at least have mentioned it/shown it in this review.

If you could write up something like that, copy/paste manually running it to show that the patch makes a difference and include the assembly in this review, I guess that'll have to be adequate?

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

So...
Good news is that I got inspired and created a hand written assembly with invalid index.
Bad news is that this version of the patch relies on overflow behavior. See uint32_t TruncOffset.

UB when converting a uint64_t into a uint32_t? Hmm, I didn't think that was undefined - figured that'd wrap around with well defined behavior. Maybe I'm misunderstanding - what's the particular UB you're referring to?

Sorry was a bit vague with the terms. When I said "overflow behavior" I didn't mean it's UB, but was referring to "wrap around behavior" which is defined.

So just having one incorrect offset doesn't do anything.

Yeah, I'm still missing a step here. But I guess since we're only walking the offsets and looking for wraparound, this only has an effect if there's a genuine overflow, which would require a genuinely very large dwp file, which is infeasible to checkin as a test?

Right exactly. We are iterating over headers and computing TruncOffset and don't hit the bad case until we wrap around (.debug_info.dwo over 4GB). So I don't see a way to create a test to trigger this unless we have a very large dwp file.

*nod* fair enough. Any chance of hand writing something that uses small assembly, but produces large enough output that it hits the overflow (using .zero or the like) - we wouldn't want to run it (don't want to produce multi-GB output files in test execution) but maybe comment it as a manual test, or at least have mentioned it/shown it in this review.

If you could write up something like that, copy/paste manually running it to show that the patch makes a difference and include the assembly in this review, I guess that'll have to be adequate?

Oh I think I see.
Let me look into it.

bin/llvm-lit -a /data/users/ayermolo/server-llvm/llvm-project/llvm/test/tools/llvm-dwp/X86/invalid-cu-index.s

bin/llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp


bin/llvm-dwarfdump --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp

# This test checks that with invalid offset in the cu index
# we can reconstruct it manually.

# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

# CHECK:        0x970c277d61e66cb3
# CHECK-SAME: [0x00000000, 0xfffffff0)
# CHECK:      0xd725a83371e7e913
# CHECK-SAME: [0xfffffff0, 0x0000001b)
# CHECK:      0x93f541184fb98d75
# CHECK-SAME: [0x0000001b, 0x00000046)

        .section        .debug_abbrev.dwo,"e",@progbits
.Labbrev1:
        .byte   1                       # Abbreviation Code
        .byte   17                      # DW_TAG_compile_unit
        .byte   0                       # DW_CHILDREN_no
        .byte   37                      # DW_AT_producer
        .byte   8                       # DW_FORM_string
        .byte   3                       # DW_AT_name
        .byte   8                       # DW_FORM_string
        .ascii  "\261B"                 # DW_AT_GNU_dwo_id
        .byte   7                       # DW_FORM_data8
        .byte   0                       # EOM(1)
        .byte   0                       # EOM(2)
        .byte   0                       # EOM(3)
.Labbrev_end1:

        .section        .debug_info.dwo,"e",@progbits
# DWO CU1
.Lcu_begin1:
        .long   .Ldebug_info_end1-.Ldebug_info_start1 # Length of Unit 0x2b
.Ldebug_info_start1:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '1', '.', 'c', 0        # DW_AT_name
        .quad   0x970c277d61e66cb3      # DW_AT_GNU_dwo_id
        .zero   0xfffffff0 - 0x2b       # 0xfffffff0 is mimimum reserved length
.Ldebug_info_end1:

# DWO CU2
.Lcu_begin2:
        .long   .Ldebug_info_end2-.Ldebug_info_start2 # Length of Unit 0x2b
.Ldebug_info_start2:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '2', '.', 'c', 0        # DW_AT_name
        .quad   0xd725a83371e7e913      # DW_AT_GNU_dwo_id
.Ldebug_info_end2:

# DWO CU3
.Lcu_begin3:
        .long   .Ldebug_info_end3-.Ldebug_info_start3 # Length of Unit 0x2b
.Ldebug_info_start3:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '3', '.', 'c', 0        # DW_AT_name
        .quad   0x93f541184fb98d75      # DW_AT_GNU_dwo_id
.Ldebug_info_end3:

        .section        .debug_cu_index,"",@progbits
        .long   2                       # DWARF version number
        .long   2                       # Section count
        .long   3                       # Unit count
        .long   8                       # Slot count

        .quad   0x970c277d61e66cb3, 0, 0, 0xd725a83371e7e913, 0, 0x93f541184fb98d75, 0, 0  # Hash table
        .long   1, 0, 0, 2, 0, 3, 0, 0  # Index table

        .long   1                       # DW_SECT_INFO
        .long   3                       # DW_SECT_ABBREV
# Offsets
        # row 0
        .long  .Lcu_begin1-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 1
        .long  .Lcu_begin2-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 2
        .long  0x1b # setting this manually, otherwise llvm-mc crashes
        .long  .Labbrev1-.debug_abbrev.dwo
# Lengths
        # row 0
        .long .Ldebug_info_end1-.Lcu_begin1
        .long .Labbrev_end1-.Labbrev1
        # row 1
        .long .Ldebug_info_end2-.Lcu_begin2
        .long .Labbrev_end1-.Labbrev1
        # row 2
        .long .Ldebug_info_end3-.Lcu_begin3
        .long .Labbrev_end1-.Labbrev1

bin/llvm-lit -a /data/users/ayermolo/server-llvm/llvm-project/llvm/test/tools/llvm-dwp/X86/invalid-cu-index.s

bin/llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp


bin/llvm-dwarfdump --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp

These don't quite look like how I'd expect. I'd have thought the manually-generate-unit-index would be different/correct, showing the value in the manual generation over the pre-built but buggy/overflowed index?

# This test checks that with invalid offset in the cu index
# we can reconstruct it manually.

# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

Sorry, I'm not following here ^ could you explain in more detail/rephrase?

ayermolo added a comment.EditedDec 2 2022, 3:42 PM

bin/llvm-lit -a /data/users/ayermolo/server-llvm/llvm-project/llvm/test/tools/llvm-dwp/X86/invalid-cu-index.s

bin/llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp


bin/llvm-dwarfdump --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp

These don't quite look like how I'd expect. I'd have thought the manually-generate-unit-index would be different/correct, showing the value in the manual generation over the pre-built but buggy/overflowed index?

# This test checks that with invalid offset in the cu index
# we can reconstruct it manually.

# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

Sorry, I'm not following here ^ could you explain in more detail/rephrase?

Well internal data structure for SectionContribution is still 32bit. So right now even with parsing we are still restricted by that. So output won't change.

struct SectionContribution {
      uint32_t Offset;
      uint32_t Length;
    };

My next patch will be changing it to 64bit and add accessors, but I am waiting for https://reviews.llvm.org/D138618 to go through review and land before posting it.
Maybe it's wrong order? Put up second patch changing data structure to 64bit, and in lldb code return 32 bit until that review goes through?

Does this clarify it for your second question, or should I try to rephrase?

bin/llvm-lit -a /data/users/ayermolo/server-llvm/llvm-project/llvm/test/tools/llvm-dwp/X86/invalid-cu-index.s

bin/llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp


bin/llvm-dwarfdump --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp

These don't quite look like how I'd expect. I'd have thought the manually-generate-unit-index would be different/correct, showing the value in the manual generation over the pre-built but buggy/overflowed index?

# This test checks that with invalid offset in the cu index
# we can reconstruct it manually.

# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

Sorry, I'm not following here ^ could you explain in more detail/rephrase?

Well internal data structure for SectionContribution is still 32bit. So right now even with parsing we are still restricted by that. So output won't change.

struct SectionContribution {
      uint32_t Offset;
      uint32_t Length;
    };

My next patch will be changing it to 64bit and add accessors, but I am waiting for https://reviews.llvm.org/D138618 to go through review and land before posting it.
Maybe it's wrong order? Put up second patch changing data structure to 64bit, and in lldb code return 32 bit until that review goes through?

Does this clarify it for your second question, or should I try to rephrase?

Ah, that makes this patch a bit untestable, though - perhaps this patch should wait for the 64bit underlying change, then this patch will have a real purpose/make a difference, where today it doesn't change anything. (committing things that don't change anything, then committing some otherwise-cleanup-ish refactoring that enables the previous change is tricky, because then it's hard to keep track of things being tested, since they couldn't be tested when they were committed owing to the missing underlying refactoring/changes - though it can be a chicke-and-egg situation, in this case I think it's probably clear enough that changing 32 bit offsets to 64 bit ones is pretty benign/mechanical & could be done/reviewed relatively easily (relatively, still lots of changes) & then this change would go on top making the behavior change/tested (albeit manually), etc)

bin/llvm-lit -a /data/users/ayermolo/server-llvm/llvm-project/llvm/test/tools/llvm-dwp/X86/invalid-cu-index.s

bin/llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp


bin/llvm-dwarfdump --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp

These don't quite look like how I'd expect. I'd have thought the manually-generate-unit-index would be different/correct, showing the value in the manual generation over the pre-built but buggy/overflowed index?

# This test checks that with invalid offset in the cu index
# we can reconstruct it manually.

# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

Sorry, I'm not following here ^ could you explain in more detail/rephrase?

Well internal data structure for SectionContribution is still 32bit. So right now even with parsing we are still restricted by that. So output won't change.

struct SectionContribution {
      uint32_t Offset;
      uint32_t Length;
    };

My next patch will be changing it to 64bit and add accessors, but I am waiting for https://reviews.llvm.org/D138618 to go through review and land before posting it.
Maybe it's wrong order? Put up second patch changing data structure to 64bit, and in lldb code return 32 bit until that review goes through?

Does this clarify it for your second question, or should I try to rephrase?

Ah, that makes this patch a bit untestable, though - perhaps this patch should wait for the 64bit underlying change, then this patch will have a real purpose/make a difference, where today it doesn't change anything. (committing things that don't change anything, then committing some otherwise-cleanup-ish refactoring that enables the previous change is tricky, because then it's hard to keep track of things being tested, since they couldn't be tested when they were committed owing to the missing underlying refactoring/changes - though it can be a chicke-and-egg situation, in this case I think it's probably clear enough that changing 32 bit offsets to 64 bit ones is pretty benign/mechanical & could be done/reviewed relatively easily (relatively, still lots of changes) & then this change would go on top making the behavior change/tested (albeit manually), etc)

I wasn't really planning on committing it without the other patch.
Let me do this. I'll remove dependency of the 32bit to 64bit patch on lldb being 64 bit, post it "on top" of this patch, and in that patch test it as a whole. Sounds good?

I wasn't really planning on committing it without the other patch.

Oh, OK - then if you could run the manual test/experiment with that patch applied, happy to review it on that basis and approve this so long as it goes after that work?

Let me do this. I'll remove dependency of the 32bit to 64bit patch on lldb being 64 bit, post it "on top" of this patch, and in that patch test it as a whole. Sounds good?

Not sure I follow - I think that's what I'd rather avoid committing this code (D137882) without a test. That it should wait for whatever refactoring's required before it's committed with some verification, even if it's manual. I don't want this patch to be committed in a state where it's a no-op/unobservable & rely on testing it in a subsequent patch.

Updated test with 64bit printout.


# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

# CHECK:        0x970c277d61e66cb3
# CHECK-SAME: [0x0000000000000000, 0x00000000fffffff0)
# CHECK:      0xd725a83371e7e913
# CHECK-SAME: [0x00000000fffffff0, 0x000000010000001b)
# CHECK:      0x93f541184fb98d75
# CHECK-SAME: [0x000000010000001b, 0x0000000100000046)

        .section        .debug_abbrev.dwo,"e",@progbits
.Labbrev1:
        .byte   1                       # Abbreviation Code
        .byte   17                      # DW_TAG_compile_unit
        .byte   0                       # DW_CHILDREN_no
        .byte   37                      # DW_AT_producer
        .byte   8                       # DW_FORM_string
        .byte   3                       # DW_AT_name
        .byte   8                       # DW_FORM_string
        .ascii  "\261B"                 # DW_AT_GNU_dwo_id
        .byte   7                       # DW_FORM_data8
        .byte   0                       # EOM(1)
        .byte   0                       # EOM(2)
        .byte   0                       # EOM(3)
.Labbrev_end1:

        .section        .debug_info.dwo,"e",@progbits
# DWO CU1
.Lcu_begin1:
        .long   .Ldebug_info_end1-.Ldebug_info_start1 # Length of Unit 0x2b
.Ldebug_info_start1:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '1', '.', 'c', 0        # DW_AT_name
        .quad   0x970c277d61e66cb3      # DW_AT_GNU_dwo_id
        .zero   0xfffffff0 - 0x2b       # 0xfffffff0 is mimimum reserved length
.Ldebug_info_end1:

# DWO CU2
.Lcu_begin2:
        .long   .Ldebug_info_end2-.Ldebug_info_start2 # Length of Unit 0x2b
.Ldebug_info_start2:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '2', '.', 'c', 0        # DW_AT_name
        .quad   0xd725a83371e7e913      # DW_AT_GNU_dwo_id
.Ldebug_info_end2:

# DWO CU3
.Lcu_begin3:
        .long   .Ldebug_info_end3-.Ldebug_info_start3 # Length of Unit 0x2b
.Ldebug_info_start3:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '3', '.', 'c', 0        # DW_AT_name
        .quad   0x93f541184fb98d75      # DW_AT_GNU_dwo_id
.Ldebug_info_end3:

        .section        .debug_cu_index,"",@progbits
        .long   2                       # DWARF version number
        .long   2                       # Section count
        .long   3                       # Unit count
        .long   8                       # Slot count

        .quad   0x970c277d61e66cb3, 0, 0, 0xd725a83371e7e913, 0, 0x93f541184fb98d75, 0, 0  # Hash table
        .long   1, 0, 0, 2, 0, 3, 0, 0  # Index table

        .long   1                       # DW_SECT_INFO
        .long   3                       # DW_SECT_ABBREV
# Offsets
        # row 0
        .long  .Lcu_begin1-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 1
        .long  .Lcu_begin2-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 2
        .long  0x1b # setting this manually, otherwis llvm-mc crashes
        .long  .Labbrev1-.debug_abbrev.dwo
# Lengths
        # row 0
        .long .Ldebug_info_end1-.Lcu_begin1
        .long .Labbrev_end1-.Labbrev1
        # row 1
        .long .Ldebug_info_end2-.Lcu_begin2
        .long .Labbrev_end1-.Labbrev1
        # row 2
        .long .Ldebug_info_end3-.Lcu_begin3
        .long .Labbrev_end1-.Labbrev1
dblaikie accepted this revision.Dec 5 2022, 5:07 PM

Sounds good to me. Thanks for your patience/please only commit once the 64 bit changes to LLVM's index data structures are committed first.

Updated test with 64bit printout.


# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

I guess this comment needs to be updated ^

# CHECK:        0x970c277d61e66cb3
# CHECK-SAME: [0x0000000000000000, 0x00000000fffffff0)
# CHECK:      0xd725a83371e7e913
# CHECK-SAME: [0x00000000fffffff0, 0x000000010000001b)
# CHECK:      0x93f541184fb98d75
# CHECK-SAME: [0x000000010000001b, 0x0000000100000046)

        .section        .debug_abbrev.dwo,"e",@progbits
.Labbrev1:
        .byte   1                       # Abbreviation Code
        .byte   17                      # DW_TAG_compile_unit
        .byte   0                       # DW_CHILDREN_no
        .byte   37                      # DW_AT_producer
        .byte   8                       # DW_FORM_string
        .byte   3                       # DW_AT_name
        .byte   8                       # DW_FORM_string
        .ascii  "\261B"                 # DW_AT_GNU_dwo_id
        .byte   7                       # DW_FORM_data8
        .byte   0                       # EOM(1)
        .byte   0                       # EOM(2)
        .byte   0                       # EOM(3)
.Labbrev_end1:

        .section        .debug_info.dwo,"e",@progbits
# DWO CU1
.Lcu_begin1:
        .long   .Ldebug_info_end1-.Ldebug_info_start1 # Length of Unit 0x2b
.Ldebug_info_start1:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '1', '.', 'c', 0        # DW_AT_name
        .quad   0x970c277d61e66cb3      # DW_AT_GNU_dwo_id
        .zero   0xfffffff0 - 0x2b       # 0xfffffff0 is mimimum reserved length
.Ldebug_info_end1:

# DWO CU2
.Lcu_begin2:
        .long   .Ldebug_info_end2-.Ldebug_info_start2 # Length of Unit 0x2b
.Ldebug_info_start2:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '2', '.', 'c', 0        # DW_AT_name
        .quad   0xd725a83371e7e913      # DW_AT_GNU_dwo_id
.Ldebug_info_end2:

# DWO CU3
.Lcu_begin3:
        .long   .Ldebug_info_end3-.Ldebug_info_start3 # Length of Unit 0x2b
.Ldebug_info_start3:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '3', '.', 'c', 0        # DW_AT_name
        .quad   0x93f541184fb98d75      # DW_AT_GNU_dwo_id
.Ldebug_info_end3:

        .section        .debug_cu_index,"",@progbits
        .long   2                       # DWARF version number
        .long   2                       # Section count
        .long   3                       # Unit count
        .long   8                       # Slot count

        .quad   0x970c277d61e66cb3, 0, 0, 0xd725a83371e7e913, 0, 0x93f541184fb98d75, 0, 0  # Hash table
        .long   1, 0, 0, 2, 0, 3, 0, 0  # Index table

        .long   1                       # DW_SECT_INFO
        .long   3                       # DW_SECT_ABBREV
# Offsets
        # row 0
        .long  .Lcu_begin1-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 1
        .long  .Lcu_begin2-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 2
        .long  0x1b # setting this manually, otherwis llvm-mc crashes
        .long  .Labbrev1-.debug_abbrev.dwo
# Lengths
        # row 0
        .long .Ldebug_info_end1-.Lcu_begin1
        .long .Labbrev_end1-.Labbrev1
        # row 1
        .long .Ldebug_info_end2-.Lcu_begin2
        .long .Labbrev_end1-.Labbrev1
        # row 2
        .long .Ldebug_info_end3-.Lcu_begin3
        .long .Labbrev_end1-.Labbrev1
llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
95 ↗(On Diff #480276)

This change is unneeded/can be removed, yeah?

This revision is now accepted and ready to land.Dec 5 2022, 5:07 PM
ayermolo updated this revision to Diff 480293.Dec 5 2022, 5:33 PM

rebase, addressed comment

Thanks for reviewing. :)

ayermolo updated this revision to Diff 480620.Dec 6 2022, 2:22 PM

fixed some tests

ayermolo updated this revision to Diff 480682.Dec 6 2022, 4:29 PM

Reduced the scope a bit. Removed some v5 tests and added one new tests.

This revision was landed with ongoing or failed builds.Dec 7 2022, 1:09 PM
This revision was automatically updated to reflect the committed changes.

Accidentally pushed when pushing bolt changes. Reverted.

ayermolo reopened this revision.Dec 7 2022, 3:37 PM
This revision is now accepted and ready to land.Dec 7 2022, 3:37 PM
This revision was landed with ongoing or failed builds.Jan 10 2023, 3:16 PM
This revision was automatically updated to reflect the committed changes.

Sorry, this change causes a MemorySanitizer error in LLVM testsuite. I'm going to revert it. Please investigate, fix, and feel free to re-land.

Here's a sample error from llvm/test/CodeGen/X86/dwarf-split-line-1.ll:

==6051==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55fcfe0c7a00 in llvm::DWARFContext::getTUIndex() llvm-project/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:870:7
    #1 0x55fcfe17365f in operator() llvm-project/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:87:39
    #2 0x55fcfe17365f in __invoke<(lambda at llvm-project/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:74:14) &, unsigned long, llvm::DWARFSectionKind, const llvm::DWARFSection *, const llvm::DWARFUnitIndex::Entry *> [...]/c++/v1/__functional/invoke.h:394:23
    #3 0x55fcfe17365f in __call<(lambda at llvm-project/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:74:14) &, unsigned long, llvm::DWARFSectionKind, const llvm::DWARFSection *, const llvm::DWARFUnitIndex::Entry *> [...]/c++/v1/__functional/invoke.h:478:16
    #4 0x55fcfe17365f in operator() [...]/c++/v1/__functional/function.h:232:12
    #5 0x55fcfe17365f in std::__msan::unique_ptr<llvm::DWARFUnit, std::__msan::default_delete<llvm::DWARFUnit>> std::__msan::__function::__policy_invoker<std::__msan::unique_ptr<llvm::DWARFUnit, std::__msan::default_delete<llvm::DWARFUnit>> (unsigned long, llvm::DWARFSectionKind, llvm::DWARFSection const*, llvm::DWARFUnitIndex::Entry const*)>::__call_impl<std::__msan::__function::__default_alloc_func<llvm::DWARFUnitVector::addUnitsImpl(llvm::DWARFContext&, llvm::DWARFObject const&, llvm::DWARFSection const&, llvm::DWARFDebugAbbrev const*, llvm::DWARFSection const*, llvm::DWARFSection const*, llvm::StringRef, llvm::DWARFSection const&, llvm::DWARFSection const*, llvm::DWARFSection const&, bool, bool, bool, llvm::DWARFSectionKind)::$_0, std::__msan::unique_ptr<llvm::DWARFUnit, std::__msan::default_delete<llvm::DWARFUnit>> (unsigned long, llvm::DWARFSectionKind, llvm::DWARFSection const*, llvm::DWARFUnitIndex::Entry const*)>>(std::__msan::__function::__policy_storage const*, unsigned long, llvm::DWARFSectionKind, llvm::DWARFSection const*, llvm::DWARFUnitIndex::Entry const*) v1/__functional/function.h:711:16
    #6 0x55fcfe166ef6 in operator() [...]/c++/v1/__functional/function.h:842:16
    #7 0x55fcfe166ef6 in operator() [...]/c++/v1/__functional/function.h:1152:12
    #8 0x55fcfe166ef6 in llvm::DWARFUnitVector::addUnitsImpl(llvm::DWARFContext&, llvm::DWARFObject const&, llvm::DWARFSection const&, llvm::DWARFDebugAbbrev const*, llvm::DWARFSection const*, llvm::DWARFSection const*, llvm::StringRef, llvm::DWARFSection const&, llvm::DWARFSection const*, llvm::DWARFSection const&, bool, bool, bool, llvm::DWARFSectionKind) llvm-project/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:127:14
    #9 0x55fcfe1671cd in llvm::DWARFUnitVector::addUnitsForDWOSection(llvm::DWARFContext&, llvm::DWARFSection const&, llvm::DWARFSectionKind, bool) llvm-project/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:58:3
    #10 0x55fcfe0dee91 in operator() llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #11 0x55fcfe0dee91 in (anonymous namespace)::DWARFObjInMemory::forEachInfoDWOSections(llvm::function_ref<void (llvm::DWARFSection const&)>) const llvm-project/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:1999:7
    #12 0x55fcfe0cabdf in llvm::DWARFContext::parseDWOUnits(bool) llvm-project/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:1106:9
    #13 0x55fcfe0bc1f5 in getNumDWOCompileUnits llvm-project/llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h:229:5
    #14 0x55fcfe0bc1f5 in llvm::DWARFContext::dump(llvm::raw_ostream&, llvm::DIDumpOptions, std::__msan::array<std::__msan::optional<unsigned long>, 28ul>) llvm-project/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:398:24
    #15 0x55fcfcdc085a in dumpObjectFile(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine const&, llvm::raw_ostream&) llvm-project/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:631:9
    #16 0x55fcfcdc298d in operator() [...]/c++/v1/__functional/function.h:842:16
    #17 0x55fcfcdc298d in operator() [...]/c++/v1/__functional/function.h:1152:12
    #18 0x55fcfcdc298d in handleBuffer(llvm::StringRef, llvm::MemoryBufferRef, std::__msan::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine const&, llvm::raw_ostream&)>, llvm::raw_ostream&) llvm-project/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:686:12
    #19 0x55fcfcdbc35d in handleFile(llvm::StringRef, std::__msan::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine const&, llvm::raw_ostream&)>, llvm::raw_ostream&) llvm-project/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:724:10
    #20 0x55fcfcdbb99a in main llvm-project/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:820:18
    #21 0x7faa82510632 in __libc_start_main
    #22 0x55fcfcd15169 in _start

SUMMARY: MemorySanitizer: use-of-uninitialized-value llvm-project/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:870:7 in llvm::DWARFContext::getTUIndex()

Full list of affected tests, the immediate cause looks similar (use-of-uninitialized-value in llvm::DWARFContext::getTUIndex()):

llvm/test/CodeGen/X86/dwarf-headers.ll
llvm/test/CodeGen/X86/dwarf-split-line-1.ll
llvm/test/CodeGen/X86/dwarf-split-line-2.ll
llvm/test/DebugInfo/WebAssembly/dwarf-headers.ll
llvm/test/DebugInfo/X86/dwarfdump-header.s
llvm/test/DebugInfo/X86/dwarfdump-str-offsets-v4-dwarf64-dwo.s
llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s
llvm/test/DebugInfo/X86/generate-odr-hash.ll
llvm/test/DebugInfo/X86/gnu-public-names-tu.ll
llvm/test/DebugInfo/X86/string-offsets-multiple-cus.ll
llvm/test/DebugInfo/X86/tu-to-non-named-type.ll
llvm/test/DebugInfo/X86/tu-to-non-tu.ll
llvm/test/DebugInfo/X86/type_units_with_addresses.ll
ayermolo reopened this revision.Jan 11 2023, 3:13 PM

Sorry for break, let me take a look.

This revision is now accepted and ready to land.Jan 11 2023, 3:13 PM
ayermolo updated this revision to Diff 488453.Jan 11 2023, 7:07 PM

fixed ub, mem santiziers

ayermolo updated this revision to Diff 488644.Jan 12 2023, 7:17 AM

fixed one more mem sanitizer bug

dblaikie accepted this revision.Jan 12 2023, 9:44 AM

Looks OK, if it clears up the failures.

Looks OK, if it clears up the failures.

thanks. Re-ran ninja check all locally with memsan build, and came back clean.

This revision was automatically updated to reflect the committed changes.

@dblaikie looks like no more build bot breaks. :D Thank you again for reviewing.

dblaikie added inline comments.Mar 16 2023, 5:56 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
868–871

Any idea what this comment/code were about checking the type index version? I don't see any reason this fixup wouldn't be relevant to DWARFv4 type unit indexes - though the fixup code would have to take the section to parse as a parameter, since it's currently hardcoded to the debug_info sections, not the debug_types sections.

ayermolo added inline comments.Mar 16 2023, 6:26 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
868–871

I don't quite remember why. I think it was to narrow the scope.
Also don't we also need to call forEachTypesDWOSections if we want to handle .debug_types section?
Why do you see overflows in that section also?

dblaikie added inline comments.Mar 17 2023, 3:20 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
868–871

We aren't having an issue here - since it's only DWARFv4+type units, and we're using DWARFv5 these days.

I've been looking at adding DWARFv5 overflow recovery (which can be more robust than DWARFv4, since DWARFv5 can get the DWOID/type signature without needing the abbrev section) and just looking more closely at this code/seeing these quirks & wanted to understand it better.

But, yeah, maybe just not worth supporting *shrug* no worries, thanks!

ayermolo added inline comments.Mar 19 2023, 3:39 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
868–871

Hopefully we will be moving to DWARF5 his year also. :D
The code should work with DWARF5 + debug types, but you are right it could be made more robust by specializing and taking advantage of DWO ID being in a header. Not sure it's worth it either.

Also I am not sure how often debug-types are used with DWARF5 since one has to choose between them and .debug_names. Although looking at https://reviews.llvm.org/D49420 should be easy to turn on now that llvm generates DWARF5 type units.

dblaikie added inline comments.Mar 20 2023, 5:45 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
868–871

Hopefully we will be moving to DWARF5 his year also. :D

*fingers crossed*

The code should work with DWARF5 + debug types, but you are right it could be made more robust by specializing and taking advantage of DWO ID being in a header. Not sure it's worth it either.

You mean DWARF4+debug types? It does already work with DWARF5 and debug types.

Also I am not sure how often debug-types are used with DWARF5 since one has to choose between them and .debug_names. Although looking at https://reviews.llvm.org/D49420 should be easy to turn on now that llvm generates DWARF5 type units.

It's what we're doing at Google, at least - we're still using gdb_index, not lldb - and when we use lldb it's without any index solution, so, yeah, slow...

Hadn't realized there was that outstanding patch for debug_names for type units. I'll have to take a look...

dblaikie added inline comments.Mar 20 2023, 5:53 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
868–871

oh, looked at D49420, thought that was an outstanding patch for DWARFv5 debug_names support for type units, but it's the patch that enabled debug_names but not when type units are enabled.

I think I looked at it recently and it's not quite as simple as removing the opt-out - there's some implementation complexity to address in building debug_names, also in terms of what we put in them... but yeah, needs some looking at/implementing for sure.

ayermolo added inline comments.Mar 20 2023, 6:01 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
868–871

err sorry didn't phrase it correctly. Just affirming this will support DWARF5 + debug types.

The patch is original one that went in a while back. Reading description acceleration table was disabled because at that time DWARF4 debug types were created by clang.

Ah I see. How do you generate gdb_index with gdb-add-index?

dblaikie added inline comments.Mar 21 2023, 10:20 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
868–871

Ah I see. How do you generate gdb_index with gdb-add-index?

-ggnu-pubnames and -Wl,--gdb-index - so name lists in the object files then the linker generates the efficient index

ayermolo added inline comments.Mar 21 2023, 10:34 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
868–871

ah gotcha version 7 of gdb-index that lld produces. thanks.