Page MenuHomePhabricator

[DWARFLibrary] Add support to re-construct cu-index
ClosedPublic

Authored by ayermolo on Nov 11 2022, 6:24 PM.

Details

Summary

According to DWARF5 specification and gnu specification for DWARF4 the offset
entry in the CU/TU Index is 32 bits. This presents a problem when
.debug_info.dwo in DWP file grows beyond 4GB. The CU Index becomes partially
corrupted.

This diff adds manual parsing of .debug_info.dwo/.debug_abbrev.dwo to
reconstruct CU index in general, and TU index for DWARF5. This is a work around
until DWARF6 spec is finalized.

Next patch will change internal CU/TU struct to 64 bit, and change uses as
necessary. The plan is to land all the patches in one go after all are approved.

This patch originates from the discussion in: https://discourse.llvm.org/t/dwarf-dwp-4gb-limit/63902

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ayermolo updated this revision to Diff 475543.Nov 15 2022, 11:43 AM

fixed tests

Yeah, I'm mostly OK with this direction. It's pretty isolated, maybe easy enough to explain if needed, etc.

llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
782–784

These could probably all sink into fixupIndex as locals?

784

Maybe DenseMap rather than unordered_map?

830

Fair enough - mixed feelings, but I'll rescind this piece at least.

840

I'd prefer to be a bit less permissive, really - to not end up creating more weird cases that systems might come to depend on.

Maybe fail if the length doesn't match, until we know of any particular case with mismatched lengths that we understand enough to want to/figure out how to support?

2049

looks like this unrelated change snuck in?

llvm/lib/DebugInfo/DWARF/DWARFUnitIndex.cpp
272–275

oh, yeah, the other should probably change too... ifyou could do that in a separate patch, that'd be great

ayermolo marked 6 inline comments as done.Nov 23 2022, 1:47 PM
ayermolo added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
2049

Ah yeah clang-format change.

llvm/lib/DebugInfo/DWARF/DWARFUnitIndex.cpp
272–275

Sounds good.

ayermolo updated this revision to Diff 477596.Nov 23 2022, 1:47 PM
ayermolo marked an inline comment as done.

addressed comments

ayermolo marked 4 inline comments as done.Nov 23 2022, 1:48 PM

LLDB changes to enable 64bit support there. https://reviews.llvm.org/D138618

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

ayermolo added a comment.EditedNov 24 2022, 7:31 AM

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?
For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

*To add/clarify those tests do exercise common code paths. I think adding "corrupt" index test would make sense to test some of the error conditions.

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

So...
Good news is that I got inspired and created a hand written assembly with invalid index.
Bad news is that this version of the patch relies on overflow behavior. See uint32_t TruncOffset. So just having one incorrect offset doesn't do anything.
I tried to play with length in the header the Header.extract fails.

tschuett added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
59

Is this include useless?

ayermolo updated this revision to Diff 479349.Dec 1 2022, 10:22 AM

removed unordered_map

ayermolo marked an inline comment as done.Dec 1 2022, 10:22 AM
ayermolo added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
59

Thanks removed. Leftover from switch to DenseMap.

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

So...
Good news is that I got inspired and created a hand written assembly with invalid index.
Bad news is that this version of the patch relies on overflow behavior. See uint32_t TruncOffset.

UB when converting a uint64_t into a uint32_t? Hmm, I didn't think that was undefined - figured that'd wrap around with well defined behavior. Maybe I'm misunderstanding - what's the particular UB you're referring to?

So just having one incorrect offset doesn't do anything.

Yeah, I'm still missing a step here. But I guess since we're only walking the offsets and looking for wraparound, this only has an effect if there's a genuine overflow, which would require a genuinely very large dwp file, which is infeasible to checkin as a test?

ayermolo marked an inline comment as done.Dec 1 2022, 4:10 PM

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

So...
Good news is that I got inspired and created a hand written assembly with invalid index.
Bad news is that this version of the patch relies on overflow behavior. See uint32_t TruncOffset.

UB when converting a uint64_t into a uint32_t? Hmm, I didn't think that was undefined - figured that'd wrap around with well defined behavior. Maybe I'm misunderstanding - what's the particular UB you're referring to?

Sorry was a bit vague with the terms. When I said "overflow behavior" I didn't mean it's UB, but was referring to "wrap around behavior" which is defined.

So just having one incorrect offset doesn't do anything.

Yeah, I'm still missing a step here. But I guess since we're only walking the offsets and looking for wraparound, this only has an effect if there's a genuine overflow, which would require a genuinely very large dwp file, which is infeasible to checkin as a test?

Right exactly. We are iterating over headers and computing TruncOffset and don't hit the bad case until we wrap around (.debug_info.dwo over 4GB). So I don't see a way to create a test to trigger this unless we have a very large dwp file.

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

So...
Good news is that I got inspired and created a hand written assembly with invalid index.
Bad news is that this version of the patch relies on overflow behavior. See uint32_t TruncOffset.

UB when converting a uint64_t into a uint32_t? Hmm, I didn't think that was undefined - figured that'd wrap around with well defined behavior. Maybe I'm misunderstanding - what's the particular UB you're referring to?

Sorry was a bit vague with the terms. When I said "overflow behavior" I didn't mean it's UB, but was referring to "wrap around behavior" which is defined.

So just having one incorrect offset doesn't do anything.

Yeah, I'm still missing a step here. But I guess since we're only walking the offsets and looking for wraparound, this only has an effect if there's a genuine overflow, which would require a genuinely very large dwp file, which is infeasible to checkin as a test?

Right exactly. We are iterating over headers and computing TruncOffset and don't hit the bad case until we wrap around (.debug_info.dwo over 4GB). So I don't see a way to create a test to trigger this unless we have a very large dwp file.

*nod* fair enough. Any chance of hand writing something that uses small assembly, but produces large enough output that it hits the overflow (using .zero or the like) - we wouldn't want to run it (don't want to produce multi-GB output files in test execution) but maybe comment it as a manual test, or at least have mentioned it/shown it in this review.

If you could write up something like that, copy/paste manually running it to show that the patch makes a difference and include the assembly in this review, I guess that'll have to be adequate?

I think test coverage might be more suitable if it were a single dedicated test that has a corrupted index, to demonstrate that rebuilding the index comes up with a different (& correct) answer - rather than adding what looks sort of like redundant testing to existing test cases?

I am not sure how to construct corrupted CU Index manually. Would Yaml2Obj be able to do it?

I'm not sure - I don't think so. Might just have to be raw assembly - check the other symbolizer dwp testing? I think it's probably hand-crafted assembly? (maybe llvm-dwp testing that merges existing dwp files has some hand crafted dwp files too you could be inspired by)

But maybe Yaml2Obj could help - I'm just not very familiar with this.

For other tests I think it's ads testing coverage to make sure we are parsing cu/tu indexes correctly in various DWARF versions and debug types enablements combinations.

I'd rather keep things a bit narrower - demonstrate that the index is computed correctly, and assume that everything else works correctly once the index is correct.

So...
Good news is that I got inspired and created a hand written assembly with invalid index.
Bad news is that this version of the patch relies on overflow behavior. See uint32_t TruncOffset.

UB when converting a uint64_t into a uint32_t? Hmm, I didn't think that was undefined - figured that'd wrap around with well defined behavior. Maybe I'm misunderstanding - what's the particular UB you're referring to?

Sorry was a bit vague with the terms. When I said "overflow behavior" I didn't mean it's UB, but was referring to "wrap around behavior" which is defined.

So just having one incorrect offset doesn't do anything.

Yeah, I'm still missing a step here. But I guess since we're only walking the offsets and looking for wraparound, this only has an effect if there's a genuine overflow, which would require a genuinely very large dwp file, which is infeasible to checkin as a test?

Right exactly. We are iterating over headers and computing TruncOffset and don't hit the bad case until we wrap around (.debug_info.dwo over 4GB). So I don't see a way to create a test to trigger this unless we have a very large dwp file.

*nod* fair enough. Any chance of hand writing something that uses small assembly, but produces large enough output that it hits the overflow (using .zero or the like) - we wouldn't want to run it (don't want to produce multi-GB output files in test execution) but maybe comment it as a manual test, or at least have mentioned it/shown it in this review.

If you could write up something like that, copy/paste manually running it to show that the patch makes a difference and include the assembly in this review, I guess that'll have to be adequate?

Oh I think I see.
Let me look into it.

bin/llvm-lit -a /data/users/ayermolo/server-llvm/llvm-project/llvm/test/tools/llvm-dwp/X86/invalid-cu-index.s

bin/llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp


bin/llvm-dwarfdump --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp

# This test checks that with invalid offset in the cu index
# we can reconstruct it manually.

# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

# CHECK:        0x970c277d61e66cb3
# CHECK-SAME: [0x00000000, 0xfffffff0)
# CHECK:      0xd725a83371e7e913
# CHECK-SAME: [0xfffffff0, 0x0000001b)
# CHECK:      0x93f541184fb98d75
# CHECK-SAME: [0x0000001b, 0x00000046)

        .section        .debug_abbrev.dwo,"e",@progbits
.Labbrev1:
        .byte   1                       # Abbreviation Code
        .byte   17                      # DW_TAG_compile_unit
        .byte   0                       # DW_CHILDREN_no
        .byte   37                      # DW_AT_producer
        .byte   8                       # DW_FORM_string
        .byte   3                       # DW_AT_name
        .byte   8                       # DW_FORM_string
        .ascii  "\261B"                 # DW_AT_GNU_dwo_id
        .byte   7                       # DW_FORM_data8
        .byte   0                       # EOM(1)
        .byte   0                       # EOM(2)
        .byte   0                       # EOM(3)
.Labbrev_end1:

        .section        .debug_info.dwo,"e",@progbits
# DWO CU1
.Lcu_begin1:
        .long   .Ldebug_info_end1-.Ldebug_info_start1 # Length of Unit 0x2b
.Ldebug_info_start1:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '1', '.', 'c', 0        # DW_AT_name
        .quad   0x970c277d61e66cb3      # DW_AT_GNU_dwo_id
        .zero   0xfffffff0 - 0x2b       # 0xfffffff0 is mimimum reserved length
.Ldebug_info_end1:

# DWO CU2
.Lcu_begin2:
        .long   .Ldebug_info_end2-.Ldebug_info_start2 # Length of Unit 0x2b
.Ldebug_info_start2:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '2', '.', 'c', 0        # DW_AT_name
        .quad   0xd725a83371e7e913      # DW_AT_GNU_dwo_id
.Ldebug_info_end2:

# DWO CU3
.Lcu_begin3:
        .long   .Ldebug_info_end3-.Ldebug_info_start3 # Length of Unit 0x2b
.Ldebug_info_start3:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '3', '.', 'c', 0        # DW_AT_name
        .quad   0x93f541184fb98d75      # DW_AT_GNU_dwo_id
.Ldebug_info_end3:

        .section        .debug_cu_index,"",@progbits
        .long   2                       # DWARF version number
        .long   2                       # Section count
        .long   3                       # Unit count
        .long   8                       # Slot count

        .quad   0x970c277d61e66cb3, 0, 0, 0xd725a83371e7e913, 0, 0x93f541184fb98d75, 0, 0  # Hash table
        .long   1, 0, 0, 2, 0, 3, 0, 0  # Index table

        .long   1                       # DW_SECT_INFO
        .long   3                       # DW_SECT_ABBREV
# Offsets
        # row 0
        .long  .Lcu_begin1-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 1
        .long  .Lcu_begin2-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 2
        .long  0x1b # setting this manually, otherwise llvm-mc crashes
        .long  .Labbrev1-.debug_abbrev.dwo
# Lengths
        # row 0
        .long .Ldebug_info_end1-.Lcu_begin1
        .long .Labbrev_end1-.Labbrev1
        # row 1
        .long .Ldebug_info_end2-.Lcu_begin2
        .long .Labbrev_end1-.Labbrev1
        # row 2
        .long .Ldebug_info_end3-.Lcu_begin3
        .long .Labbrev_end1-.Labbrev1

bin/llvm-lit -a /data/users/ayermolo/server-llvm/llvm-project/llvm/test/tools/llvm-dwp/X86/invalid-cu-index.s

bin/llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp


bin/llvm-dwarfdump --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp

These don't quite look like how I'd expect. I'd have thought the manually-generate-unit-index would be different/correct, showing the value in the manual generation over the pre-built but buggy/overflowed index?

# This test checks that with invalid offset in the cu index
# we can reconstruct it manually.

# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

Sorry, I'm not following here ^ could you explain in more detail/rephrase?

ayermolo added a comment.EditedDec 2 2022, 3:42 PM

bin/llvm-lit -a /data/users/ayermolo/server-llvm/llvm-project/llvm/test/tools/llvm-dwp/X86/invalid-cu-index.s

bin/llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp


bin/llvm-dwarfdump --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp

These don't quite look like how I'd expect. I'd have thought the manually-generate-unit-index would be different/correct, showing the value in the manual generation over the pre-built but buggy/overflowed index?

# This test checks that with invalid offset in the cu index
# we can reconstruct it manually.

# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

Sorry, I'm not following here ^ could you explain in more detail/rephrase?

Well internal data structure for SectionContribution is still 32bit. So right now even with parsing we are still restricted by that. So output won't change.

struct SectionContribution {
      uint32_t Offset;
      uint32_t Length;
    };

My next patch will be changing it to 64bit and add accessors, but I am waiting for https://reviews.llvm.org/D138618 to go through review and land before posting it.
Maybe it's wrong order? Put up second patch changing data structure to 64bit, and in lldb code return 32 bit until that review goes through?

Does this clarify it for your second question, or should I try to rephrase?

bin/llvm-lit -a /data/users/ayermolo/server-llvm/llvm-project/llvm/test/tools/llvm-dwp/X86/invalid-cu-index.s

bin/llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp


bin/llvm-dwarfdump --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp

These don't quite look like how I'd expect. I'd have thought the manually-generate-unit-index would be different/correct, showing the value in the manual generation over the pre-built but buggy/overflowed index?

# This test checks that with invalid offset in the cu index
# we can reconstruct it manually.

# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

Sorry, I'm not following here ^ could you explain in more detail/rephrase?

Well internal data structure for SectionContribution is still 32bit. So right now even with parsing we are still restricted by that. So output won't change.

struct SectionContribution {
      uint32_t Offset;
      uint32_t Length;
    };

My next patch will be changing it to 64bit and add accessors, but I am waiting for https://reviews.llvm.org/D138618 to go through review and land before posting it.
Maybe it's wrong order? Put up second patch changing data structure to 64bit, and in lldb code return 32 bit until that review goes through?

Does this clarify it for your second question, or should I try to rephrase?

Ah, that makes this patch a bit untestable, though - perhaps this patch should wait for the 64bit underlying change, then this patch will have a real purpose/make a difference, where today it doesn't change anything. (committing things that don't change anything, then committing some otherwise-cleanup-ish refactoring that enables the previous change is tricky, because then it's hard to keep track of things being tested, since they couldn't be tested when they were committed owing to the missing underlying refactoring/changes - though it can be a chicke-and-egg situation, in this case I think it's probably clear enough that changing 32 bit offsets to 64 bit ones is pretty benign/mechanical & could be done/reviewed relatively easily (relatively, still lots of changes) & then this change would go on top making the behavior change/tested (albeit manually), etc)

bin/llvm-lit -a /data/users/ayermolo/server-llvm/llvm-project/llvm/test/tools/llvm-dwp/X86/invalid-cu-index.s

bin/llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp


bin/llvm-dwarfdump --debug-cu-index /data/users/ayermolo/llvm-build-release/test/tools/llvm-dwp/X86/Output/invalid-cu-index.s.tmp.dwp

These don't quite look like how I'd expect. I'd have thought the manually-generate-unit-index would be different/correct, showing the value in the manual generation over the pre-built but buggy/overflowed index?

# This test checks that with invalid offset in the cu index
# we can reconstruct it manually.

# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

Sorry, I'm not following here ^ could you explain in more detail/rephrase?

Well internal data structure for SectionContribution is still 32bit. So right now even with parsing we are still restricted by that. So output won't change.

struct SectionContribution {
      uint32_t Offset;
      uint32_t Length;
    };

My next patch will be changing it to 64bit and add accessors, but I am waiting for https://reviews.llvm.org/D138618 to go through review and land before posting it.
Maybe it's wrong order? Put up second patch changing data structure to 64bit, and in lldb code return 32 bit until that review goes through?

Does this clarify it for your second question, or should I try to rephrase?

Ah, that makes this patch a bit untestable, though - perhaps this patch should wait for the 64bit underlying change, then this patch will have a real purpose/make a difference, where today it doesn't change anything. (committing things that don't change anything, then committing some otherwise-cleanup-ish refactoring that enables the previous change is tricky, because then it's hard to keep track of things being tested, since they couldn't be tested when they were committed owing to the missing underlying refactoring/changes - though it can be a chicke-and-egg situation, in this case I think it's probably clear enough that changing 32 bit offsets to 64 bit ones is pretty benign/mechanical & could be done/reviewed relatively easily (relatively, still lots of changes) & then this change would go on top making the behavior change/tested (albeit manually), etc)

I wasn't really planning on committing it without the other patch.
Let me do this. I'll remove dependency of the 32bit to 64bit patch on lldb being 64 bit, post it "on top" of this patch, and in that patch test it as a whole. Sounds good?

I wasn't really planning on committing it without the other patch.

Oh, OK - then if you could run the manual test/experiment with that patch applied, happy to review it on that basis and approve this so long as it goes after that work?

Let me do this. I'll remove dependency of the 32bit to 64bit patch on lldb being 64 bit, post it "on top" of this patch, and in that patch test it as a whole. Sounds good?

Not sure I follow - I think that's what I'd rather avoid committing this code (D137882) without a test. That it should wait for whatever refactoring's required before it's committed with some verification, even if it's manual. I don't want this patch to be committed in a state where it's a no-op/unobservable & rely on testing it in a subsequent patch.

Updated test with 64bit printout.


# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

# CHECK:        0x970c277d61e66cb3
# CHECK-SAME: [0x0000000000000000, 0x00000000fffffff0)
# CHECK:      0xd725a83371e7e913
# CHECK-SAME: [0x00000000fffffff0, 0x000000010000001b)
# CHECK:      0x93f541184fb98d75
# CHECK-SAME: [0x000000010000001b, 0x0000000100000046)

        .section        .debug_abbrev.dwo,"e",@progbits
.Labbrev1:
        .byte   1                       # Abbreviation Code
        .byte   17                      # DW_TAG_compile_unit
        .byte   0                       # DW_CHILDREN_no
        .byte   37                      # DW_AT_producer
        .byte   8                       # DW_FORM_string
        .byte   3                       # DW_AT_name
        .byte   8                       # DW_FORM_string
        .ascii  "\261B"                 # DW_AT_GNU_dwo_id
        .byte   7                       # DW_FORM_data8
        .byte   0                       # EOM(1)
        .byte   0                       # EOM(2)
        .byte   0                       # EOM(3)
.Labbrev_end1:

        .section        .debug_info.dwo,"e",@progbits
# DWO CU1
.Lcu_begin1:
        .long   .Ldebug_info_end1-.Ldebug_info_start1 # Length of Unit 0x2b
.Ldebug_info_start1:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '1', '.', 'c', 0        # DW_AT_name
        .quad   0x970c277d61e66cb3      # DW_AT_GNU_dwo_id
        .zero   0xfffffff0 - 0x2b       # 0xfffffff0 is mimimum reserved length
.Ldebug_info_end1:

# DWO CU2
.Lcu_begin2:
        .long   .Ldebug_info_end2-.Ldebug_info_start2 # Length of Unit 0x2b
.Ldebug_info_start2:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '2', '.', 'c', 0        # DW_AT_name
        .quad   0xd725a83371e7e913      # DW_AT_GNU_dwo_id
.Ldebug_info_end2:

# DWO CU3
.Lcu_begin3:
        .long   .Ldebug_info_end3-.Ldebug_info_start3 # Length of Unit 0x2b
.Ldebug_info_start3:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '3', '.', 'c', 0        # DW_AT_name
        .quad   0x93f541184fb98d75      # DW_AT_GNU_dwo_id
.Ldebug_info_end3:

        .section        .debug_cu_index,"",@progbits
        .long   2                       # DWARF version number
        .long   2                       # Section count
        .long   3                       # Unit count
        .long   8                       # Slot count

        .quad   0x970c277d61e66cb3, 0, 0, 0xd725a83371e7e913, 0, 0x93f541184fb98d75, 0, 0  # Hash table
        .long   1, 0, 0, 2, 0, 3, 0, 0  # Index table

        .long   1                       # DW_SECT_INFO
        .long   3                       # DW_SECT_ABBREV
# Offsets
        # row 0
        .long  .Lcu_begin1-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 1
        .long  .Lcu_begin2-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 2
        .long  0x1b # setting this manually, otherwis llvm-mc crashes
        .long  .Labbrev1-.debug_abbrev.dwo
# Lengths
        # row 0
        .long .Ldebug_info_end1-.Lcu_begin1
        .long .Labbrev_end1-.Labbrev1
        # row 1
        .long .Ldebug_info_end2-.Lcu_begin2
        .long .Labbrev_end1-.Labbrev1
        # row 2
        .long .Ldebug_info_end3-.Lcu_begin3
        .long .Labbrev_end1-.Labbrev1
dblaikie accepted this revision.Dec 5 2022, 5:07 PM

Sounds good to me. Thanks for your patience/please only commit once the 64 bit changes to LLVM's index data structures are committed first.

Updated test with 64bit printout.


# RUN: llvm-mc --filetype=obj --triple x86_64 %s -o %t.dwp
# RUN: llvm-dwarfdump --manaully-generate-unit-index --debug-cu-index %t.dwp | FileCheck %s

# This test checks that we parse correctly cu-index that has entries over 4GB.
# It is setup to work with current llvm implementation where cu-index is 32bit.
# Once we move to 64bit internal representation, it will need to be modified.

I guess this comment needs to be updated ^

# CHECK:        0x970c277d61e66cb3
# CHECK-SAME: [0x0000000000000000, 0x00000000fffffff0)
# CHECK:      0xd725a83371e7e913
# CHECK-SAME: [0x00000000fffffff0, 0x000000010000001b)
# CHECK:      0x93f541184fb98d75
# CHECK-SAME: [0x000000010000001b, 0x0000000100000046)

        .section        .debug_abbrev.dwo,"e",@progbits
.Labbrev1:
        .byte   1                       # Abbreviation Code
        .byte   17                      # DW_TAG_compile_unit
        .byte   0                       # DW_CHILDREN_no
        .byte   37                      # DW_AT_producer
        .byte   8                       # DW_FORM_string
        .byte   3                       # DW_AT_name
        .byte   8                       # DW_FORM_string
        .ascii  "\261B"                 # DW_AT_GNU_dwo_id
        .byte   7                       # DW_FORM_data8
        .byte   0                       # EOM(1)
        .byte   0                       # EOM(2)
        .byte   0                       # EOM(3)
.Labbrev_end1:

        .section        .debug_info.dwo,"e",@progbits
# DWO CU1
.Lcu_begin1:
        .long   .Ldebug_info_end1-.Ldebug_info_start1 # Length of Unit 0x2b
.Ldebug_info_start1:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '1', '.', 'c', 0        # DW_AT_name
        .quad   0x970c277d61e66cb3      # DW_AT_GNU_dwo_id
        .zero   0xfffffff0 - 0x2b       # 0xfffffff0 is mimimum reserved length
.Ldebug_info_end1:

# DWO CU2
.Lcu_begin2:
        .long   .Ldebug_info_end2-.Ldebug_info_start2 # Length of Unit 0x2b
.Ldebug_info_start2:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '2', '.', 'c', 0        # DW_AT_name
        .quad   0xd725a83371e7e913      # DW_AT_GNU_dwo_id
.Ldebug_info_end2:

# DWO CU3
.Lcu_begin3:
        .long   .Ldebug_info_end3-.Ldebug_info_start3 # Length of Unit 0x2b
.Ldebug_info_start3:
        .short  4                       # DWARF version number
        .long   0                       # Offset Into Abbrev. Section
        .byte   8                       # Address Size (in bytes)
        .byte   1                       # Abbrev DW_TAG_compile_unit
        .asciz  "Hand-written DWARF"    # DW_AT_producer
        .byte   '3', '.', 'c', 0        # DW_AT_name
        .quad   0x93f541184fb98d75      # DW_AT_GNU_dwo_id
.Ldebug_info_end3:

        .section        .debug_cu_index,"",@progbits
        .long   2                       # DWARF version number
        .long   2                       # Section count
        .long   3                       # Unit count
        .long   8                       # Slot count

        .quad   0x970c277d61e66cb3, 0, 0, 0xd725a83371e7e913, 0, 0x93f541184fb98d75, 0, 0  # Hash table
        .long   1, 0, 0, 2, 0, 3, 0, 0  # Index table

        .long   1                       # DW_SECT_INFO
        .long   3                       # DW_SECT_ABBREV
# Offsets
        # row 0
        .long  .Lcu_begin1-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 1
        .long  .Lcu_begin2-.debug_info.dwo
        .long  .Labbrev1-.debug_abbrev.dwo
        # row 2
        .long  0x1b # setting this manually, otherwis llvm-mc crashes
        .long  .Labbrev1-.debug_abbrev.dwo
# Lengths
        # row 0
        .long .Ldebug_info_end1-.Lcu_begin1
        .long .Labbrev_end1-.Labbrev1
        # row 1
        .long .Ldebug_info_end2-.Lcu_begin2
        .long .Labbrev_end1-.Labbrev1
        # row 2
        .long .Ldebug_info_end3-.Lcu_begin3
        .long .Labbrev_end1-.Labbrev1
llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
95 ↗(On Diff #480276)

This change is unneeded/can be removed, yeah?

This revision is now accepted and ready to land.Dec 5 2022, 5:07 PM
ayermolo updated this revision to Diff 480293.Dec 5 2022, 5:33 PM

rebase, addressed comment

Thanks for reviewing. :)

ayermolo updated this revision to Diff 480620.Dec 6 2022, 2:22 PM

fixed some tests

ayermolo updated this revision to Diff 480682.Dec 6 2022, 4:29 PM

Reduced the scope a bit. Removed some v5 tests and added one new tests.

This revision was landed with ongoing or failed builds.Dec 7 2022, 1:09 PM
This revision was automatically updated to reflect the committed changes.

Accidentally pushed when pushing bolt changes. Reverted.

ayermolo reopened this revision.Dec 7 2022, 3:37 PM
This revision is now accepted and ready to land.Dec 7 2022, 3:37 PM
This revision was landed with ongoing or failed builds.Tue, Jan 10, 3:16 PM
This revision was automatically updated to reflect the committed changes.

Sorry, this change causes a MemorySanitizer error in LLVM testsuite. I'm going to revert it. Please investigate, fix, and feel free to re-land.

Here's a sample error from llvm/test/CodeGen/X86/dwarf-split-line-1.ll:

==6051==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55fcfe0c7a00 in llvm::DWARFContext::getTUIndex() llvm-project/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:870:7
    #1 0x55fcfe17365f in operator() llvm-project/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:87:39
    #2 0x55fcfe17365f in __invoke<(lambda at llvm-project/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:74:14) &, unsigned long, llvm::DWARFSectionKind, const llvm::DWARFSection *, const llvm::DWARFUnitIndex::Entry *> [...]/c++/v1/__functional/invoke.h:394:23
    #3 0x55fcfe17365f in __call<(lambda at llvm-project/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:74:14) &, unsigned long, llvm::DWARFSectionKind, const llvm::DWARFSection *, const llvm::DWARFUnitIndex::Entry *> [...]/c++/v1/__functional/invoke.h:478:16
    #4 0x55fcfe17365f in operator() [...]/c++/v1/__functional/function.h:232:12
    #5 0x55fcfe17365f in std::__msan::unique_ptr<llvm::DWARFUnit, std::__msan::default_delete<llvm::DWARFUnit>> std::__msan::__function::__policy_invoker<std::__msan::unique_ptr<llvm::DWARFUnit, std::__msan::default_delete<llvm::DWARFUnit>> (unsigned long, llvm::DWARFSectionKind, llvm::DWARFSection const*, llvm::DWARFUnitIndex::Entry const*)>::__call_impl<std::__msan::__function::__default_alloc_func<llvm::DWARFUnitVector::addUnitsImpl(llvm::DWARFContext&, llvm::DWARFObject const&, llvm::DWARFSection const&, llvm::DWARFDebugAbbrev const*, llvm::DWARFSection const*, llvm::DWARFSection const*, llvm::StringRef, llvm::DWARFSection const&, llvm::DWARFSection const*, llvm::DWARFSection const&, bool, bool, bool, llvm::DWARFSectionKind)::$_0, std::__msan::unique_ptr<llvm::DWARFUnit, std::__msan::default_delete<llvm::DWARFUnit>> (unsigned long, llvm::DWARFSectionKind, llvm::DWARFSection const*, llvm::DWARFUnitIndex::Entry const*)>>(std::__msan::__function::__policy_storage const*, unsigned long, llvm::DWARFSectionKind, llvm::DWARFSection const*, llvm::DWARFUnitIndex::Entry const*) v1/__functional/function.h:711:16
    #6 0x55fcfe166ef6 in operator() [...]/c++/v1/__functional/function.h:842:16
    #7 0x55fcfe166ef6 in operator() [...]/c++/v1/__functional/function.h:1152:12
    #8 0x55fcfe166ef6 in llvm::DWARFUnitVector::addUnitsImpl(llvm::DWARFContext&, llvm::DWARFObject const&, llvm::DWARFSection const&, llvm::DWARFDebugAbbrev const*, llvm::DWARFSection const*, llvm::DWARFSection const*, llvm::StringRef, llvm::DWARFSection const&, llvm::DWARFSection const*, llvm::DWARFSection const&, bool, bool, bool, llvm::DWARFSectionKind) llvm-project/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:127:14
    #9 0x55fcfe1671cd in llvm::DWARFUnitVector::addUnitsForDWOSection(llvm::DWARFContext&, llvm::DWARFSection const&, llvm::DWARFSectionKind, bool) llvm-project/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:58:3
    #10 0x55fcfe0dee91 in operator() llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #11 0x55fcfe0dee91 in (anonymous namespace)::DWARFObjInMemory::forEachInfoDWOSections(llvm::function_ref<void (llvm::DWARFSection const&)>) const llvm-project/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:1999:7
    #12 0x55fcfe0cabdf in llvm::DWARFContext::parseDWOUnits(bool) llvm-project/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:1106:9
    #13 0x55fcfe0bc1f5 in getNumDWOCompileUnits llvm-project/llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h:229:5
    #14 0x55fcfe0bc1f5 in llvm::DWARFContext::dump(llvm::raw_ostream&, llvm::DIDumpOptions, std::__msan::array<std::__msan::optional<unsigned long>, 28ul>) llvm-project/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:398:24
    #15 0x55fcfcdc085a in dumpObjectFile(llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine const&, llvm::raw_ostream&) llvm-project/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:631:9
    #16 0x55fcfcdc298d in operator() [...]/c++/v1/__functional/function.h:842:16
    #17 0x55fcfcdc298d in operator() [...]/c++/v1/__functional/function.h:1152:12
    #18 0x55fcfcdc298d in handleBuffer(llvm::StringRef, llvm::MemoryBufferRef, std::__msan::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine const&, llvm::raw_ostream&)>, llvm::raw_ostream&) llvm-project/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:686:12
    #19 0x55fcfcdbc35d in handleFile(llvm::StringRef, std::__msan::function<bool (llvm::object::ObjectFile&, llvm::DWARFContext&, llvm::Twine const&, llvm::raw_ostream&)>, llvm::raw_ostream&) llvm-project/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:724:10
    #20 0x55fcfcdbb99a in main llvm-project/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:820:18
    #21 0x7faa82510632 in __libc_start_main
    #22 0x55fcfcd15169 in _start

SUMMARY: MemorySanitizer: use-of-uninitialized-value llvm-project/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:870:7 in llvm::DWARFContext::getTUIndex()

Full list of affected tests, the immediate cause looks similar (use-of-uninitialized-value in llvm::DWARFContext::getTUIndex()):

llvm/test/CodeGen/X86/dwarf-headers.ll
llvm/test/CodeGen/X86/dwarf-split-line-1.ll
llvm/test/CodeGen/X86/dwarf-split-line-2.ll
llvm/test/DebugInfo/WebAssembly/dwarf-headers.ll
llvm/test/DebugInfo/X86/dwarfdump-header.s
llvm/test/DebugInfo/X86/dwarfdump-str-offsets-v4-dwarf64-dwo.s
llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s
llvm/test/DebugInfo/X86/generate-odr-hash.ll
llvm/test/DebugInfo/X86/gnu-public-names-tu.ll
llvm/test/DebugInfo/X86/string-offsets-multiple-cus.ll
llvm/test/DebugInfo/X86/tu-to-non-named-type.ll
llvm/test/DebugInfo/X86/tu-to-non-tu.ll
llvm/test/DebugInfo/X86/type_units_with_addresses.ll
ayermolo reopened this revision.Wed, Jan 11, 3:13 PM

Sorry for break, let me take a look.

This revision is now accepted and ready to land.Wed, Jan 11, 3:13 PM
ayermolo updated this revision to Diff 488453.Wed, Jan 11, 7:07 PM

fixed ub, mem santiziers

ayermolo updated this revision to Diff 488644.Thu, Jan 12, 7:17 AM

fixed one more mem sanitizer bug

dblaikie accepted this revision.Thu, Jan 12, 9:44 AM

Looks OK, if it clears up the failures.

Looks OK, if it clears up the failures.

thanks. Re-ran ninja check all locally with memsan build, and came back clean.

This revision was automatically updated to reflect the committed changes.

@dblaikie looks like no more build bot breaks. :D Thank you again for reviewing.