This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Make llvm-dwp handle DWARF v5 string offsets tables and indexed strings.
Abandoned (Public)

Authored by wolfgangp on Feb 5 2018, 4:37 PM.


Details

Reviewers

Summary

llvm-dwp needs to remap string offsets from input .dwo or .dwp files and therefore needs to be made aware of DWARF v5 string offsets tables now that llvm generates them. Unlike the pre-v5 GNU-style string offsets tables, DWARF v5 string offsets tables have a header describing length, format and version of the table. This patch preserves the header and performs some simple validation when it rewrites the string offsets table contributions.
It also makes llvm-dwp handle v5 compile unit headers.

One thing to note is that the existing implementation recreates the string offsets table in a single loop over the entire string offsets section. Since we want to support a mixture of v5 and pre-v5 CUs, we can't parse the section without knowing where the pre-v5 units' contributions are. llvm-dwarfdump faces the same problem, so, as it does, we rewrite the table unit by unit.
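To make "preserve the header and rewrite the offsets" concrete, here is a minimal C++ sketch of remapping one DWARF32 v5 contribution. It is not llvm-dwp's code: `remapV5Contribution`, `readU32`/`writeU32` and the old-to-new offset map `StrMap` (assumed to come from merging the input `.debug_str.dwo` strings into the output) are hypothetical. It assumes little-endian section data and the DWARF v5 rule that `unit_length` counts the version and padding fields but not itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Read a 32-bit value stored little-endian (e.g. x86-64 object files).
static uint32_t readU32(const std::vector<uint8_t> &Buf, size_t Off) {
  return uint32_t(Buf[Off]) | uint32_t(Buf[Off + 1]) << 8 |
         uint32_t(Buf[Off + 2]) << 16 | uint32_t(Buf[Off + 3]) << 24;
}

// Append a 32-bit value in little-endian byte order.
static void writeU32(std::vector<uint8_t> &Out, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(uint8_t(V >> (8 * I)));
}

// Hypothetical: copy the DWARF32 v5 contribution that starts at Off, keeping
// its 8-byte header (unit_length, version, padding) as-is and rewriting each
// string offset through StrMap (input .debug_str.dwo offset -> output offset).
// Returns std::nullopt if the header is malformed or an offset is unknown.
std::optional<std::vector<uint8_t>>
remapV5Contribution(const std::vector<uint8_t> &Sec, size_t Off,
                    const std::unordered_map<uint32_t, uint32_t> &StrMap) {
  if (Off > Sec.size() || Sec.size() - Off < 8)
    return std::nullopt; // no room for the header
  uint64_t Length = readU32(Sec, Off); // excludes the length field itself
  // Must cover version + padding, hold whole 4-byte entries, and fit.
  if (Length < 4 || Length % 4 != 0 || Length > Sec.size() - Off - 4)
    return std::nullopt;
  std::vector<uint8_t> Out(Sec.begin() + Off, Sec.begin() + Off + 8);
  for (size_t P = Off + 8; P < Off + 4 + Length; P += 4) {
    auto It = StrMap.find(readU32(Sec, P));
    if (It == StrMap.end())
      return std::nullopt;
    writeU32(Out, It->second);
  }
  return Out;
}
```

A pre-v5 (GNU-style) contribution has no header at all, which is why the owning unit's version has to be known before deciding whether there are 8 header bytes to copy or none.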

Diff Detail

Event Timeline

wolfgangp created this revision. Feb 5 2018, 4:37 PM

Herald added subscribers: mgrang, JDevlieghere, aprantl. Feb 5 2018, 4:37 PM

dblaikie added inline comments. Feb 7 2018, 6:43 PM

test/tools/llvm-dwp/X86/invalid_string_form.test
3 ↗	(On Diff #132902)	Should the error message specify the form it was encoded with? (& maybe mention that it's unsupported in dwo/dwp files, to highlight that for the reader so they're less confused and not led into thinking this form is just unsupported by LLVM tools more generally)
test/tools/llvm-dwp/X86/string_offsets.test
11 ↗	(On Diff #132902)	Yeah, I forget whether I already mentioned this in other tests. The reason is that writing an ELF file requires seeking through the file (or doing a lot of precomputation about the size of things) to write the section table offset in the ELF header (basically: write the ELF header but leave a gap, write all the sections (thus discovering their sizes and offsets), then write the section tables, etc - then seek back to the start of the file to fill in the offset to the section tables). I'm probably not using the right terminology. Other tools, like LLVM itself - write the file to a temp location, then copy from that I think. Be nice to fix this one way or another, but I'm not sure of the best approach. Maybe just detecting if the output file is seekable, and if it isn't, using a temporary file. That way we don't incur the temporary file overhead when the output location /is/ seekable.
13–15 ↗	(On Diff #132902)	Could you enumerate what those codepaths are? & it might be better to test them separately if there are interesting things at each DWP step? I guess what I'm trying to say/understand is why a test would involve two invocations of the dwp tool, rather than one. You can specify more than 2 inputs to DWP together (either multiple DWOs, DWPs, or any combination) so maybe you only need one invocation, with 3 DWO inputs? Not sure.
tools/llvm-dwp/llvm-dwp.cpp
728 ↗	(On Diff #132902)	I'm not quite following this change (from 'continue' to 'else') could you help me understand it?
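On the seekability question raised in the string_offsets.test comment, a minimal POSIX sketch of "detect whether the output is seekable, and otherwise use a temporary file" could look like this. `isSeekable` and `scratchFileFor` are hypothetical names, not llvm-dwp or LLVM Support APIs; the sketch relies only on `lseek` failing for pipes, FIFOs and sockets, and on `std::tmpfile`.

```cpp
#include <cstdio>
#include <unistd.h>

// Hypothetical: true if FD supports seeking. lseek() fails (errno ESPIPE)
// on pipes, FIFOs and sockets, and (errno EBADF) on an invalid descriptor.
bool isSeekable(int FD) {
  return ::lseek(FD, 0, SEEK_CUR) != static_cast<off_t>(-1);
}

// Hypothetical: if FD cannot be seeked, return an anonymous temporary file to
// build the ELF output in (the caller would copy it to FD when finished);
// return nullptr when FD can be written directly, avoiding the extra copy.
std::FILE *scratchFileFor(int FD) {
  return isSeekable(FD) ? nullptr : std::tmpfile();
}
```

This matches the trade-off described in the comment: the temporary-file overhead is paid only when the output location is not seekable.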

Addressed review comments, added a test for detecting DW_FORM_strp which is not supported in dwp files.
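A sketch of the kind of form check these tests exercise, reusing the error strings from this diff's CHECK lines. `checkStringForm` is hypothetical; the form codes are the standard DWARF values plus DW_FORM_GNU_str_index from the GNU split-DWARF extension.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// DWARF form codes (DWARF v4/v5 plus the GNU split-DWARF extension).
enum : uint16_t {
  DW_FORM_string = 0x08,
  DW_FORM_strp = 0x0e,
  DW_FORM_strx = 0x1a,
  DW_FORM_strx1 = 0x25,
  DW_FORM_strx2 = 0x26,
  DW_FORM_strx3 = 0x27,
  DW_FORM_strx4 = 0x28,
  DW_FORM_GNU_str_index = 0x1f02,
};

// Hypothetical: returns an error message for a string attribute form that
// cannot appear in a unit being packed into a dwp file, or std::nullopt if
// the form is acceptable.
std::optional<std::string> checkStringForm(uint16_t Form) {
  switch (Form) {
  case DW_FORM_string:
  case DW_FORM_strx:
  case DW_FORM_strx1:
  case DW_FORM_strx2:
  case DW_FORM_strx3:
  case DW_FORM_strx4:
  case DW_FORM_GNU_str_index:
    return std::nullopt;
  case DW_FORM_strp:
    // Allowed in a dwo file, but presumably a direct .debug_str offset would
    // go stale once string sections are merged into a dwp.
    return std::string("DW_FORM_strp is not supported in dwp files");
  default:
    return std::string("string field encoded with unsupported form");
  }
}
```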

test/tools/llvm-dwp/X86/invalid_string_form.test
3 ↗	(On Diff #132902)	I guess it would make sense to single out DW_FORM_strp, since it's supported in a dwo file, but not in a dwp file. Other than that, any other string form wouldn't be supported anywhere. I made that change, let me know if that's better.
test/tools/llvm-dwp/X86/string_offsets.test
13–15 ↗	(On Diff #132902)	There are 2 relevant code paths in write(). One is processing a dwo input file (which doesn't have an index), the other a dwp file (which does). Both file types can be combined in any order and any number, but it's really important to test the dwp input for the following reason: Since we have to go unit by unit when rewriting the string offsets contributions, we need to collect all the individual contributions to .debug_str_offsets.dwo first. This happens during the initial scanning of the CU index table and so occurs in the order in which the CUs appear in the index table. This may not be the same order as their contributions are laid out in .debug_str_offsets.dwo, so before we write the contributions back to the output dwp file, we need to sort them according to their offsets in .debug_str_offsets.dwo. This happens in writeStringOffsetsDWP(). It's important to preserve the order of string offsets table contributions in the output because the TU index table refers to the exact same contributions. So I was thinking that a good way to test this would be to first verify the dwo codepath, and then the dwp codepath, assuming that the dwp that was generated in the first step is correct. I added some tests to verify the first step. If you're uncomfortable with basing the second test on the results of the first, I'll figure something else out.
tools/llvm-dwp/llvm-dwp.cpp
728 ↗	(On Diff #132902)	No reason. I think I had some code at the end in a previous attempt and didn't revert back to the continue - approach. I removed the change.
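The ordering requirement described in the reply about string_offsets.test can be sketched as follows: contributions gathered in CU-index order are sorted back into input-section order before being written. `StrOffsetsContribution` and `sortBySectionOffset` are hypothetical stand-ins, not the code in `writeStringOffsetsDWP()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one unit's contribution to .debug_str_offsets.dwo,
// collected while scanning the CU index (so initially in index order).
struct StrOffsetsContribution {
  uint64_t Offset;  // start of the contribution in the input section
  uint64_t Length;  // size in bytes, including a v5 header if present
  uint16_t Version; // DWARF version of the owning unit
};

// Put contributions back into input-section order, so they are written out in
// the same layout the TU index refers to. Duplicate entries (e.g. a CU and a
// TU sharing one contribution) are kept once. Returns false on overlap.
bool sortBySectionOffset(std::vector<StrOffsetsContribution> &Contribs) {
  std::sort(Contribs.begin(), Contribs.end(),
            [](const StrOffsetsContribution &A,
               const StrOffsetsContribution &B) { return A.Offset < B.Offset; });
  Contribs.erase(std::unique(Contribs.begin(), Contribs.end(),
                             [](const StrOffsetsContribution &A,
                                const StrOffsetsContribution &B) {
                               return A.Offset == B.Offset &&
                                      A.Length == B.Length;
                             }),
                 Contribs.end());
  for (size_t I = 1; I < Contribs.size(); ++I)
    if (Contribs[I - 1].Offset + Contribs[I - 1].Length > Contribs[I].Offset)
      return false;
  return true;
}
```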

Curious combination of tests - some assembly, some IR. I think I made the existing test cases all checked in binaries.

Perhaps we could come up with a uniform approach/strategy here? (& yeah, I'd probably separate the "testing DWP input" from "testing DWP output" rather than weaving one into the other)

In D42937#1003956, @dblaikie wrote:

Curious combination of tests - some assembly, some IR. I think I made the existing test cases all checked in binaries.

Perhaps we could come up with a uniform approach/strategy here? (& yeah, I'd probably separate the "testing DWP input" from "testing DWP output" rather than weaving one into the other)

I admit I have a preference for assembly files over checked-in binaries. With checked-in binaries you'd probably have to check in the assembly source as well for reference, and then you have to keep them consistent.

In D42937#1004102, @wolfgangp wrote:

I admit I have a preference for assembly files over checked-in binaries. With checked-in binaries you'd probably have to check in the assembly source as well for reference, and then you have to keep them consistent.

Same here, which is why I keep doing it.
Hard to code-review or edit a binary file. I don't think it costs that much to run llvm-mc on an assembler source, and the benefit in ease of understanding the test is huge.

Changed the test (string_offsets.test) to perform the 2 tests independently (instead of the second test depending on the result of the first). Added a hand-constructed dwp file to serve as input to test case 2. It has a mix of v5 and v4 units, includes type units to make sure they reference strings correctly, and lays out the string offsets table contributions in a different order than the order in which the CUs appear in the index table.

I'm still producing the dwo files from IR, I could turn those into assembly files as well, if that's preferable.

In D42937#1008337, @wolfgangp wrote:

I'm still producing the dwo files from IR, I could turn those into assembly files as well, if that's preferable.

I think it'd probably be better - could you explain what the motivation is for them being IR tests? (what are the different needs there compared to the assembly test cases?)

test/tools/llvm-dwp/Inputs/string_offsets/mixed_dwp.s
1–4 ↗	(On Diff #134350)	Worth having repro steps here? (original source, commands used to produce it, including describing the manual steps used to reorder sections) Should this comment explain why the string offset contributions are reordered? Might help - though probably want to explain it in the test file too. Also perhaps it'd be easier if the string_offsets.test file were split (there aren't that many "BOTH" lines - might be easier to read as two separate tests rather than trying to have pieces in common?) & then this mixed_dwp.s could be its own test with the RUN/CHECK lines written in the file directly, rather than split over two files?

In D42937#1009329, @dblaikie wrote:

In D42937#1008337, @wolfgangp wrote:

I'm still producing the dwo files from IR, I could turn those into assembly files as well, if that's preferable.

I think it'd probably be better - could you explain what the motivation is for them being IR tests? (what are the different needs there compared to the assembly test cases?)

Can't really say there was any particular motivation other than the fact that producing these files with v5 string offsets tables via llc/objcopy is supported now and so the test case provides a small measure of integration testing. But no matter, I'll hand-code them in assembly, then.

Removed the test for dwo-only input since it's redundant. The 'mixed' test of dwo + dwp file covers all cases.

Made the mixed test an assembly file, which does assemble another unit (c.s) in order to perform the test. The auxiliary units (in IR) a.ll b.ll and c.ll have been removed. c.ll has been replaced by the aforementioned c.s.

Added some more commentary to explain the rationale for the difference between the order of the string offsets table contributions and the order of the corresponding CUs in the index table.

wolfgangp added inline comments. Feb 21 2018, 11:06 AM

test/tools/llvm-dwp/Inputs/string_offsets/mixed_dwp.s
1–4 ↗	(On Diff #134350)	The test is now an assembly file, though it still needs to assemble a third unit, which lives in the Inputs directory. There isn't much to reproduce for the hand-coded dwp file other than the units themselves; the index sections are hand-constructed.

(this patch doesn't have any changes to llvm-dwp - are those missing? Did they already get reviewed/submitted? something else?)

string_offsets_mixed.s
40–46	Oh... I hadn't realized/understood/thought about this. That's kind of awkward - mixing blobs with headers and blobs without headers in the same section (str_offsets) & then having to use the CU/TUs to disambiguate/dictate how to parse those chunks. Can we avoid that? Could we just say v4 and v5 are incompatible? Have a flag or something that checks. Then we could always walk the str_offsets alone, either expecting header'd sections (which I hope are self descriptive - once you know you're reading a v5 str_offsets, you don't need to consult the CU for anything to do that?) or non-header'd sections (where you just treat every word as a string offset without consideration for how those are divided into contributions) (Ah, I see you mentioned/highlighted that in the patch description too)
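If the section is known to hold only v5 contributions, it can indeed be walked from the headers alone, as suggested above. A minimal C++ sketch under that assumption (hypothetical names, little-endian data, DWARF32 and DWARF64 headers per the v5 spec); its error paths roughly correspond to the invalid-length, reserved-length and too-short inputs in the dwp-string-offsets-invalid-*.s tests.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One header'd contribution found in the section.
struct Contribution {
  uint64_t Offset; // offset of its unit_length field
  uint64_t Size;   // total size, including the unit_length field
  bool Is64;       // DWARF64 format
  uint16_t Version;
};

// Read an N-byte little-endian value.
static uint64_t readLE(const std::vector<uint8_t> &B, size_t Off, int N) {
  uint64_t V = 0;
  for (int I = 0; I < N; ++I)
    V |= uint64_t(B[Off + I]) << (8 * I);
  return V;
}

// Hypothetical: walk a .debug_str_offsets.dwo section assumed to hold only
// DWARF v5 contributions, using nothing but their headers. On the first
// malformed contribution, sets Err and returns false.
bool walkV5StrOffsets(const std::vector<uint8_t> &Sec,
                      std::vector<Contribution> &Out, std::string &Err) {
  size_t Off = 0;
  while (Off < Sec.size()) {
    if (Sec.size() - Off < 8) {
      Err = "truncated header"; // no room for length + version + padding
      return false;
    }
    uint64_t Len = readLE(Sec, Off, 4);
    size_t LenField = 4;
    bool Is64 = false;
    if (Len == 0xffffffff) { // DWARF64: a 64-bit length follows
      if (Sec.size() - Off < 16) {
        Err = "truncated header";
        return false;
      }
      Len = readLE(Sec, Off + 4, 8);
      LenField = 12;
      Is64 = true;
    } else if (Len >= 0xfffffff0) {
      Err = "reserved unit_length value";
      return false;
    }
    uint64_t EntrySize = Is64 ? 8 : 4;
    if (Len < 4 || Len > Sec.size() - Off - LenField ||
        (Len - 4) % EntrySize != 0) {
      Err = "invalid length";
      return false;
    }
    uint16_t Version = uint16_t(readLE(Sec, Off + LenField, 2));
    if (Version != 5) {
      Err = "unexpected version";
      return false;
    }
    Out.push_back({Off, LenField + Len, Is64, Version});
    Off += LenField + Len;
  }
  return true;
}
```

A pre-v5 contribution would fail this walk, which is exactly why a mixed section forces the unit-by-unit approach taken in this patch.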

In D42937#1016110, @dblaikie wrote:

(this patch doesn't have any changes to llvm-dwp - are those missing? Did they already get reviewed/submitted? something else?)

There were no changes to llvm-dwp.cpp since the previous diff; I'm not clear why Phabricator doesn't display it. It certainly hasn't been submitted yet. I can see it by clicking on "Show older changes".

wolfgangp added inline comments. Feb 22 2018, 5:02 PM

string_offsets_mixed.s
40–46	> That's kind of awkward - mixing blobs with headers and blobs without headers in the same section (str_offsets) & then having to use the CU/TUs to disambiguate/dictate how to parse those chunks. Can we avoid that? Could we just say v4 and v5 are incompatible? Have a flag or something that checks.
That would have been the simpler solution, but llvm-dwarfdump is already handling the same thing, so I thought it would be inconsistent if llvm-dwp didn't handle it too. We also would have to reject any dwp input files with mixed units that were created by non-llvm tools. It would certainly be easy to reject any mixing of v5 and v4 units. Pity, though, since it's already working...

dblaikie added inline comments. Feb 26 2018, 4:47 PM

string_offsets_mixed.s
40–46	I'd be happy to hear some other people's opinions on this (Adrian & Paul?). It seems pretty unfortunate to have this split between versions making a bunch of complexity to support like this.

probinson added inline comments. Feb 26 2018, 5:51 PM

string_offsets_mixed.s
40–46	The sticking point for me is whether there are objects in the wild that use GNU-style .debug_str_offsets, and might get linked with proper v5 .debug_str_offsets. The linker will combine them without a second thought, and it's unreasonable to ask a linker to verify. I guess... given that you need to enable this stuff explicitly for gcc with DWARF v4, and the point is to reduce relocations which is an irrelevant consideration for fission, it's maybe not so likely that we have objects like this in the wild. That makes it reasonable to put our collective foot down and say llvm-dwp won't mix-n-match. I do think it's worthwhile for llvm-dwarfdump to handle the mixed case, partly because it's already done and partly because it's a different kind of tool (diagnostic/analysis rather than production).

Ok, sounds like the best thing to do would be to abandon this approach and start over with a v5-only approach (in addition to the existing one, of course).
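The v5-only direction amounts to refusing to pack v5 and pre-v5 units into the same dwp, as probinson suggests. A minimal sketch of such a guard; the function name and error text are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical guard for a v5-only string offsets implementation: units whose
// string offsets tables have headers (v5) and units whose tables do not
// (GNU-style, pre-v5) must not be packed into the same dwp.
bool checkNoVersionMix(const std::vector<uint16_t> &UnitVersions,
                       std::string &Err) {
  bool SawV5 = false, SawPreV5 = false;
  for (uint16_t V : UnitVersions) {
    if (V >= 5)
      SawV5 = true;
    else
      SawPreV5 = true;
  }
  if (SawV5 && SawPreV5) {
    Err = "cannot mix DWARF v5 and pre-v5 units in one dwp file";
    return false;
  }
  return true;
}
```

With this check in place, a section can always be parsed either entirely by headers or entirely as a flat array of offsets, with no need to consult the units.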

wolfgangp abandoned this revision.Feb 28 2018, 10:24 AM

Revision Contents

Path                              Size
dwp-string-offsets-invalid-1.s    62 lines
dwp-string-offsets-invalid-2.s    62 lines
dwp-string-offsets-invalid-3.s    57 lines
dwp-string-offsets-invalid-4.s    38 lines
invalid_string_form.test          2 lines
string_offsets_mixed.s            373 lines

Diff 135299

dwp-string-offsets-invalid-1.s

				# RUN: llvm-mc -triple x86_64-unknown-linux %s -filetype=obj -o %t.dwo
				# RUN: not llvm-dwp %t.dwo -o %t.dwp |& FileCheck %s

				# Test object to verify that dwp handles invalid DWARF v5 contributions
				# to the string offsets table. We have one simple compile unit.
				#
				.section .debug_str.dwo,"MS",@progbits,1
				str_producer:
				.asciz "Handmade DWARF producer"
				str_CU1:
				.asciz "Compile_Unit_1"
				str_CU1_dir:
				.asciz "/home/test/CU1"

				.section .debug_str_offsets.dwo,"",@progbits
				# An invalid DWARF v5 contribution to the .debug_str_offsets.dwo section.
				.debug_str_offsets_object_file1_start:
				.long 500 # Invalid length
				.short 5 # DWARF version
				.short 0 # Padding
				.debug_str_offsets_base_1:
				.long str_producer-.debug_str.dwo
				.long str_CU1-.debug_str.dwo
				.long str_CU1_dir-.debug_str.dwo
				.debug_str_offsets_object_file1_end:

				# A simple abbrev section.
				.section .debug_abbrev.dwo,"",@progbits
				.byte 0x01 # Abbrev code
				.byte 0x11 # DW_TAG_compile_unit
				.byte 0x00 # DW_CHILDREN_no
				.byte 0x25 # DW_AT_producer
				.byte 0x1a # DW_FORM_strx
				.byte 0x03 # DW_AT_name
				.byte 0x1a # DW_FORM_strx
				.byte 0x1b # DW_AT_comp_dir
				.byte 0x1a # DW_FORM_strx
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x00 # EOM(3)
				abbrev_end:

				.section .debug_info.dwo,"",@progbits

				# DWARF v5 CU header.
				CU1_5_start:
				.long CU1_5_end-CU1_5_version # Length of Unit
				CU1_5_version:
				.short 5 # DWARF version number
				.byte 1 # DWARF Unit Type
				.byte 8 # Address Size (in bytes)
				.long .debug_abbrev.dwo # Offset Into Abbrev. Section
				# The compile-unit DIE, which has a DW_AT_producer, DW_AT_name
				# and DW_AT_comp_dir.
				.byte 1 # Abbreviation code
				.byte 0 # The index of the producer string
				.byte 1 # The index of the CU name string
				.byte 2 # The index of the comp dir string
				.byte 0 # NULL
				CU1_5_end:

				# CHECK: String offsets table contribution has invalid length

dwp-string-offsets-invalid-2.s

				# RUN: llvm-mc -triple x86_64-unknown-linux %s -filetype=obj -o %t.dwo
				# RUN: not llvm-dwp %t.dwo -o %t.dwp |& FileCheck %s

				# Test object to verify that dwp handles invalid DWARF v5 contributions
				# to the string offsets table. We have one simple compile unit.
				#
				.section .debug_str.dwo,"MS",@progbits,1
				str_producer:
				.asciz "Handmade DWARF producer"
				str_CU1:
				.asciz "Compile_Unit_1"
				str_CU1_dir:
				.asciz "/home/test/CU1"

				.section .debug_str_offsets.dwo,"",@progbits
				# An invalid DWARF v5 contribution to the .debug_str_offsets.dwo section.
				.debug_str_offsets_object_file1_start:
				.long 0xfffffff4 # Invalid length (reserved value)
				.short 5 # DWARF version
				.short 0 # Padding
				.debug_str_offsets_base_1:
				.long str_producer-.debug_str.dwo
				.long str_CU1-.debug_str.dwo
				.long str_CU1_dir-.debug_str.dwo
				.debug_str_offsets_object_file1_end:

				# A simple abbrev section.
				.section .debug_abbrev.dwo,"",@progbits
				.byte 0x01 # Abbrev code
				.byte 0x11 # DW_TAG_compile_unit
				.byte 0x00 # DW_CHILDREN_no
				.byte 0x25 # DW_AT_producer
				.byte 0x1a # DW_FORM_strx
				.byte 0x03 # DW_AT_name
				.byte 0x1a # DW_FORM_strx
				.byte 0x1b # DW_AT_comp_dir
				.byte 0x1a # DW_FORM_strx
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x00 # EOM(3)
				abbrev_end:

				.section .debug_info.dwo,"",@progbits

				# DWARF v5 CU header.
				CU1_5_start:
				.long CU1_5_end-CU1_5_version # Length of Unit
				CU1_5_version:
				.short 5 # DWARF version number
				.byte 1 # DWARF Unit Type
				.byte 8 # Address Size (in bytes)
				.long .debug_abbrev.dwo # Offset Into Abbrev. Section
				# The compile-unit DIE, which has a DW_AT_producer, DW_AT_name
				# and DW_AT_comp_dir.
				.byte 1 # Abbreviation code
				.byte 0 # The index of the producer string
				.byte 1 # The index of the CU name string
				.byte 2 # The index of the comp dir string
				.byte 0 # NULL
				CU1_5_end:

				# CHECK: Invalid string offsets table contribution

dwp-string-offsets-invalid-3.s

				# RUN: llvm-mc -triple x86_64-unknown-linux %s -filetype=obj -o %t.dwo
				# RUN: not llvm-dwp %t.dwo -o %t.dwp |& FileCheck %s

				# Test object to verify that dwp handles invalid DWARF v5 contributions
				# to the string offsets table. We have one simple compile unit.
				#
				.section .debug_str.dwo,"MS",@progbits,1
				str_producer:
				.asciz "Handmade DWARF producer"
				str_CU1:
				.asciz "Compile_Unit_1"
				str_CU1_dir:
				.asciz "/home/test/CU1"

				.section .debug_str_offsets.dwo,"",@progbits
				# An invalid DWARF v5 contribution to the .debug_str_offsets.dwo section.
				# The section is too short to contain a valid header.
				.debug_str_offsets_object_file1_start:
				.long 0
				.debug_str_offsets_object_file1_end:

				# A simple abbrev section.
				.section .debug_abbrev.dwo,"",@progbits
				.byte 0x01 # Abbrev code
				.byte 0x11 # DW_TAG_compile_unit
				.byte 0x00 # DW_CHILDREN_no
				.byte 0x25 # DW_AT_producer
				.byte 0x1a # DW_FORM_strx
				.byte 0x03 # DW_AT_name
				.byte 0x1a # DW_FORM_strx
				.byte 0x1b # DW_AT_comp_dir
				.byte 0x1a # DW_FORM_strx
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x00 # EOM(3)
				abbrev_end:

				.section .debug_info.dwo,"",@progbits

				# DWARF v5 CU header.
				CU1_5_start:
				.long CU1_5_end-CU1_5_version # Length of Unit
				CU1_5_version:
				.short 5 # DWARF version number
				.byte 1 # DWARF Unit Type
				.byte 8 # Address Size (in bytes)
				.long .debug_abbrev.dwo # Offset Into Abbrev. Section
				# The compile-unit DIE, which has a DW_AT_producer, DW_AT_name
				# and DW_AT_comp_dir.
				.byte 1 # Abbreviation code
				.byte 0 # The index of the producer string
				.byte 1 # The index of the CU name string
				.byte 2 # The index of the comp dir string
				.byte 0 # NULL
				CU1_5_end:

				# CHECK: Invalid string offsets table contribution

dwp-string-offsets-invalid-4.s

				# RUN: llvm-mc -triple x86_64-unknown-linux %s -filetype=obj -o %t.dwo
				# RUN: not llvm-dwp %t.dwo -o %t.dwp |& FileCheck %s

				# Test object to verify that dwp rejects input files that use DW_FORM_strp.
				#
				.section .debug_str.dwo,"MS",@progbits,1
				str_name:
				.asciz "CU1"

				# A simple abbrev section.
				.section .debug_abbrev.dwo,"",@progbits
				.byte 0x01 # Abbrev code
				.byte 0x11 # DW_TAG_compile_unit
				.byte 0x00 # DW_CHILDREN_no
				.byte 0x03 # DW_AT_name
				.byte 0x0e # DW_FORM_strp
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x00 # EOM(3)
				abbrev_end:

				.section .debug_info.dwo,"",@progbits

				# DWARF v5 CU header.
				CU1_5_start:
				.long CU1_5_end-CU1_5_version # Length of Unit
				CU1_5_version:
				.short 5 # DWARF version number
				.byte 1 # DWARF Unit Type
				.byte 8 # Address Size (in bytes)
				.long .debug_abbrev.dwo # Offset Into Abbrev. Section
				# The rudimentary compile-unit DIE, which has a DW_AT_name.
				.byte 1 # Abbreviation code
				.long str_name-.debug_str.dwo # The offset of the name string
				.byte 0 # NULL
				CU1_5_end:

				# CHECK: DW_FORM_strp is not supported in dwp files

invalid_string_form.test

	 RUN: not llvm-dwp %p/../Inputs/invalid_string_form.dwo -o %t 2>&1 | FileCheck %s

	-CHECK: error: string field encoded without DW_FORM_string or DW_FORM_GNU_str_index
	+CHECK: error: string field encoded with unsupported form

string_offsets_mixed.s

				# Produce a dwp file from a dwo and a dwp file. This ensures that both relevant
				# code paths in llvm-dwp's write() are exercised.

				# RUN: llvm-mc -filetype=obj %p/../Inputs/string_offsets/c.s -o %tc.dwo
				# RUN: llvm-mc %s -filetype=obj -o %tmixed.dwp
				# RUN: llvm-dwp %tmixed.dwp %tc.dwo -o %t2.dwp
				# RUN: llvm-dwarfdump -v %t2.dwp | FileCheck %s

				# The dwp input file (this file) has been hand constructed and contains one v5
				# compile unit, one v4 compile unit, one v5 type unit and one v4 type unit.
				#
				# The compile and type units have been generated from the following 2 source files by
				# compiling them with clang -S -gsplit-dwarf -gdwarf-5 -fdebug-types-section for file 1
				# and clang -S -gsplit-dwarf -gdwarf-4 -fdebug-types-section for file 2.
				#
				# file1:
				# enum E1 {a};
				# E1 glob1;
				#
				# file2:
				# enum E2 {d};
				# E2 glob2;
				#
				# The following sections were extracted and the compile unit and type unit DIEs were
				# reduced to a minimum:
				# .debug_str.dwo
				# .debug_abbrev.dwo
				# .debug_str_offsets.dwo
				# .debug_info.dwo
				# .debug_types.dwo
				#
				# The order of contributions to the string offsets table is different from the order in
				# which the compile units appear in the CU index table. The second compile unit's
				# contribution precedes the first unit's contribution.
				#
				# The rationale for this is the following: The initial parsing of CUs in the input file
				# occurs in the order in which the CUs appear in the index table. It is possible that
				# this order does not correspond to the order in which contributions to the string
				# offsets table are laid out.
				# With DWARF v5 we need to match up CUs with their corresponding string offsets table
				# contributions (in order to properly process the contribution headers) and hence need
				# to remap and write them unit-by-unit to the output file. The pre-v5 implementation
				# did not need to worry about this because it could just rewrite the input file's string
				# offsets table in one chunk. The test therefore ensures that llvm-dwp rewrites the
				# string offsets table contributions in the order they appear in the input file section,
				# and not in the order in which their corresponding CUs appear in the index table.
				#
				# The index sections were constructed by hand according to the DWARF v5 standard.

				# We check that the final DWP contains 2 v5 CUs with a v4 CU sandwiched between
				# them. We make sure that at least one string from each CU and TU is displayed correctly
				# and that the string offsets table looks correct.

				# CHECK: .debug_info.dwo contents:
				# CHECK-NEXT: Compile Unit:{{.*}}version = 0x0005
				# CHECK-NOT: Compile Unit
				# CHECK: DW_AT_name [DW_FORM_strx1] ( indexed{{.*}}string = "a.cpp")
				#
				# The second compile unit.
				# CHECK: Compile Unit:{{.*}}version = 0x0004
				# CHECK-NOT: Compile Unit
				# CHECK: DW_AT_name [DW_FORM_GNU_str_index] ( indexed{{.*}}string = "b.cpp")
				#
				# The third compile unit.
				# CHECK: Compile Unit:{{.*}}version = 0x0005
				# CHECK-NOT: Compile Unit
				# CHECK: DW_AT_name [DW_FORM_strx1] ( indexed{{.*}}string = "E3")
				#
				# The first type unit.
				# CHECK: .debug_types.dwo contents:
				# CHECK: Type Unit:{{.*}}version = 0x0005
				# CHECK-NOT: Type Unit
				# CHECK: DW_AT_name [DW_FORM_strx1] ( indexed{{.*}}string = "a")
				#
				# The second type unit.
				# CHECK: Type Unit:{{.*}}version = 0x0004
				# CHECK-NOT: Type Unit
				# CHECK: DW_AT_name [DW_FORM_GNU_str_index] ( indexed{{.*}}string = "d")
				#
				# We expect the first contribution to the string offsets section to be from the
				# second compile unit (and hence of version 4).
				# CHECK: .debug_str_offsets.dwo contents:
				# CHECK-NEXT: 0x00000000: Contribution size = 20, Format = DWARF32, Version = 4
				# CHECK-NEXT: 0x00000000:{{.*}}b.dwo"
				#
				# We expect the second contribution to the string offsets section to be from the
				# first compile unit, which has version 5.
				# CHECK: 0x[[SECONDCONTRIBOFFSET:[0-9a-f]*]]: Contribution size = 20, Format = DWARF32, Version = 5
				# CHECK-NEXT: 0x{{.*}}a.dwo"
				#
				# Check that the third contribution is of version 5 and contains the string "c.dwo"
				# CHECK: Contribution size = 16, Format = DWARF32, Version = 5
				# CHECK-NEXT: {{.*}}c.dwo"

				.section .debug_str.dwo,"MS",@progbits,1
				str_dwo_name1:
				.asciz "/test/a.dwo"
				str_dwo_name2:
				.asciz "/test/b.dwo"
				str_CU1:
				.asciz "a.cpp"
				str_CU2:
				.asciz "b.cpp"
				str_TU1:
				.asciz "Type_Unit_1"
				str_TU2:
				.asciz "Type_Unit_2"
				str_enum1:
				.asciz "E1"
				str_enum2:
				.asciz "E2"
				str_enumerator1:
				.asciz "a"
				str_enumerator2:
				.asciz "d"

				.section .debug_str_offsets.dwo,"",@progbits
				# Object file 2's portion of the .debug_str_offsets.dwo section.
				# This is a pre-DWARF v5 string offsets table contribution (i.e. no header).
				.debug_str_offsets_object_file2_start:
				.debug_str_offsets_base_2:
				.long str_dwo_name2-.debug_str.dwo
				.long str_CU2-.debug_str.dwo
				.long str_TU2-.debug_str.dwo
				.long str_enum2-.debug_str.dwo
				.long str_enumerator2-.debug_str.dwo
				.debug_str_offsets_object_file2_end:

				# Object file 1's portion of the .debug_str_offsets.dwo section.
				# CU1 and TU1 share a contribution to the string offsets table.
				.debug_str_offsets_object_file1_start:
				.long .debug_str_offsets_object_file1_end-.debug_str_offsets_base_1
				.short 5 # DWARF version
				.short 0 # Padding
				.debug_str_offsets_base_1:
				.long str_dwo_name1-.debug_str.dwo
				.long str_CU1-.debug_str.dwo
				.long str_TU1-.debug_str.dwo
				.long str_enum1-.debug_str.dwo
				.long str_enumerator1-.debug_str.dwo
				.debug_str_offsets_object_file1_end:

				# Abbrevs are shared for all compile and type units of the same version.
				.section .debug_abbrev.dwo,"",@progbits
				V5_abbrev_start:
				.byte 0x01 # Abbrev code
				.byte 0x11 # DW_TAG_compile_unit
				.byte 0x00 # DW_CHILDREN_no
				.short 0x42b0 # DW_AT_GNU_dwo_name
				.byte 0x25 # DW_FORM_strx1
				.byte 0x03 # DW_AT_name
				.byte 0x25 # DW_FORM_strx1
				.short 0x42b1 # DW_AT_GNU_dwo_id
				.byte 0x07 # DW_FORM_data8
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x02 # Abbrev code
				.byte 0x41 # DW_TAG_type_unit
				.byte 0x01 # DW_CHILDREN_yes
				.byte 0x03 # DW_AT_name
				.byte 0x25 # DW_FORM_strx1
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x03 # Abbrev code
				.byte 0x04 # DW_TAG_enumeration_type
				.byte 0x01 # DW_CHILDREN_yes
				.byte 0x03 # DW_AT_name
				.byte 0x25 # DW_FORM_strx1
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x04 # Abbrev code
				.byte 0x28 # DW_TAG_enumerator
				.byte 0x00 # DW_CHILDREN_no
				.byte 0x03 # DW_AT_name
				.byte 0x25 # DW_FORM_strx1
				.byte 0x1c # DW_AT_const_value
				.byte 0x0d # DW_FORM_sdata
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x00 # EOM(3)
				V5_abbrev_end:
				V4_abbrev_start:
				.byte 0x01 # Abbrev code
				.byte 0x11 # DW_TAG_compile_unit
				.byte 0x00 # DW_CHILDREN_no
				.short 0x42b0 # DW_AT_GNU_dwo_name
				.short 0x3e82 # DW_FORM_GNU_str_index
				.byte 0x03 # DW_AT_name
				.short 0x3e82 # DW_FORM_GNU_str_index
				.short 0x42b1 # DW_AT_GNU_dwo_id
				.byte 0x07 # DW_FORM_data8
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x02 # Abbrev code
				.byte 0x41 # DW_TAG_type_unit
				.byte 0x01 # DW_CHILDREN_yes
				.byte 0x03 # DW_AT_name
				.short 0x3e82 # DW_FORM_GNU_str_index
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x03 # Abbrev code
				.byte 0x04 # DW_TAG_enumeration_type
				.byte 0x01 # DW_CHILDREN_yes
				.byte 0x03 # DW_AT_name
				.short 0x3e82 # DW_FORM_GNU_str_index
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x04 # Abbrev code
				.byte 0x28 # DW_TAG_enumerator
				.byte 0x00 # DW_CHILDREN_no
				.byte 0x03 # DW_AT_name
				.short 0x3e82 # DW_FORM_GNU_str_index
				.byte 0x1c # DW_AT_const_value
				.byte 0x0d # DW_FORM_sdata
				.byte 0x00 # EOM(1)
				.byte 0x00 # EOM(2)
				.byte 0x00 # EOM(3)
				V4_abbrev_end:
				abbrev_end:

				.section .debug_info.dwo,"",@progbits
				# DWARF v5 CU header.
				CU1_5_start:
				.long CU1_5_end-CU1_5_version # Length of Unit
				CU1_5_version:
				.short 5 # DWARF version number
				.byte 1 # DWARF Unit Type
				.byte 8 # Address Size (in bytes)
				.long 0 # Offset Into Abbrev. Contribution
				# The compile-unit DIE, which has a DW_AT_GNU_dwo_name, DW_AT_name
				# and DW_AT_GNU_dwo_id.
				.byte 1 # Abbreviation code
				.byte 0 # The index of the dwo name string
				.byte 1 # The index of the CU name string
				.quad 0xaa00bb00cc00dd00 # dwo id
				.byte 0 # NULL
				CU1_5_end:

				# DWARF v4 CU header.
				CU2_4_start:
				.long CU2_4_end-CU2_4_version # Length of Unit
				CU2_4_version:
				.short 4 # DWARF version number
				.long 0 # Offset Into Abbrev. Contribution
				.byte 8 # Address Size (in bytes)
				# The compile-unit DIE, which has a DW_AT_GNU_dwo_name, DW_AT_name
				# and DW_AT_GNU_dwo_id.
				.byte 1 # Abbreviation code
				.byte 0 # The index of the dwo name string
				.byte 1 # The index of the CU name string
				.quad 0xcc00dd00ee00ff00 # dwo id
				.byte 0 # NULL
				CU2_4_end:

				.section .debug_types.dwo,"",@progbits
				# DWARF v5 Type unit header.
				TU1_5_start:
				.long TU1_5_end-TU1_5_version # Length of Unit
				TU1_5_version:
				.short 5 # DWARF version number
				.byte 2 # DWARF Unit Type
				.byte 8 # Address Size (in bytes)
				.long 0 # Offset Into Abbrev. Contribution
				.quad 0x0011223344556677 # Type Signature
				.long TU1_5_type-TU1_5_start # Type offset
				# The type-unit DIE, which has a name.
				.byte 2 # Abbreviation code
				.byte 2 # Index of the type unit name string
				# The enumeration type DIE, which has a name.
				TU1_5_type:
				.byte 3 # Abbreviation code
				.byte 3 # Index of the enumeration type name string
				# One enumerator, which has a name and a value.
				.byte 4 # Abbreviation code
				.byte 4 # Index of the enumerator string
				.byte 0 # Enumerator value (DW_AT_const_value, DW_FORM_sdata)
				.byte 0 # NULL
				.byte 0 # NULL
				TU1_5_end:

				# DWARF v4 Type unit header.
				TU2_4_start:
				.long TU2_4_end-TU2_4_version # Length of Unit
				TU2_4_version:
				.short 4 # DWARF version number
				.long 0 # Offset Into Abbrev. Contribution
				.byte 8 # Address Size (in bytes)
				.quad 0x00aabbccddeeff99 # Type Signature
				.long TU2_4_type-TU2_4_start # Type offset
				# The type-unit DIE, which has a name.
				.byte 2 # Abbreviation code
				.byte 2 # Index of the type unit name string
				# The enumeration type DIE, which has a name.
				TU2_4_type:
				.byte 3 # Abbreviation code
				.byte 3 # Index of the enumeration type name string
				# One enumerator, which has a name and a value.
				.byte 4 # Abbreviation code
				.byte 4 # Index of the enumerator string
				.byte 0 # Enumerator value (DW_AT_const_value, DW_FORM_sdata)
				.byte 0 # NULL
				.byte 0 # NULL
				TU2_4_end:

				.section .debug_cu_index,"",@progbits
				# The index header
				.long 2 # Version
				.long 3 # Columns of contribution matrix
				.long 2 # number of units
				.long 2 # number of hash buckets in table

				# The signatures for both CUs.
				.quad 0xddeeaaddbbaabbee # signature 1
				.quad 0xff00ffeeffaaff00 # signature 2
				# The indexes for both CUs.
				.long 1 # index 1
				.long 2 # index 2
				# The sections to which all CUs contribute.
				.long 1 # DW_SECT_INFO
				.long 3 # DW_SECT_ABBREV
				.long 6 # DW_SECT_STR_OFFSETS

				# The starting offsets of all CUs' contributions to the info,
				# abbrev and string offsets sections.
				.long CU1_5_start-.debug_info.dwo
				.long V5_abbrev_start-.debug_abbrev.dwo
				.long .debug_str_offsets_object_file1_start-.debug_str_offsets.dwo
				.long CU2_4_start-.debug_info.dwo
				.long V4_abbrev_start-.debug_abbrev.dwo
				.long .debug_str_offsets_object_file2_start-.debug_str_offsets.dwo

				# The lengths of all CUs' contributions to the info, abbrev and
				# string offsets sections.
				.long CU1_5_end-CU1_5_start
				.long V5_abbrev_end-V5_abbrev_start
				.long .debug_str_offsets_object_file1_end-.debug_str_offsets_object_file1_start
				.long CU2_4_end-CU2_4_start
				.long V4_abbrev_end-V4_abbrev_start
				.long .debug_str_offsets_object_file2_end-.debug_str_offsets_object_file2_start

				.section .debug_tu_index,"",@progbits
				# The index header
				.long 2 # Version
				.long 3 # Columns of contribution matrix
				.long 2 # number of units
				.long 2 # number of hash buckets in table

				# The signatures for both TUs.
				.quad 0xeeaaddbbaabbeedd # signature 1
				.quad 0x00ffeeffaaff00ff # signature 2
				# The indexes for both TUs.
				.long 1 # index 1
				.long 2 # index 2
				# The sections to which both TUs contribute.
				.long 2 # DW_SECT_TYPES
				.long 3 # DW_SECT_ABBREV
				.long 6 # DW_SECT_STR_OFFSETS

				# The starting offsets of both TUs' contributions to the types,
				# abbrev and string offsets sections.
				.long TU1_5_start-.debug_types.dwo
				.long V5_abbrev_start-.debug_abbrev.dwo
				.long .debug_str_offsets_object_file1_start-.debug_str_offsets.dwo
				.long TU2_4_start-.debug_types.dwo
				.long V4_abbrev_start-.debug_abbrev.dwo
				.long .debug_str_offsets_object_file2_start-.debug_str_offsets.dwo

				# The lengths of both TUs' contributions to the types, abbrev and
				# string offsets sections.
				.long TU1_5_end-TU1_5_start
				.long V5_abbrev_end-V5_abbrev_start
				.long .debug_str_offsets_object_file1_end-.debug_str_offsets_object_file1_start
				.long TU2_4_end-TU2_4_start
				.long V4_abbrev_end-V4_abbrev_start
				.long .debug_str_offsets_object_file2_end-.debug_str_offsets_object_file2_start
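
The `.debug_cu_index` and `.debug_tu_index` sections above each declare two hash buckets and list the unit signatures in slot order. As a rough sketch (not llvm-dwp's code), the open-addressing scheme described for DWARF v5 package-file indexes uses a primary hash `S & mask` and an odd secondary step `((S >> 32) & mask) | 1`, with signature 0 marking an empty slot. The names `findSlot` and `insertSig` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers sketching the signature hash table of a unit index.
// A slot holding signature 0 is treated as empty.

// Returns the slot index holding Sig, or -1 if Sig is not present.
int findSlot(const std::vector<uint64_t> &Slots, uint64_t Sig) {
  // The slot count must be a nonzero power of two for the mask to work.
  assert(!Slots.empty() && (Slots.size() & (Slots.size() - 1)) == 0);
  if (Sig == 0) // 0 is reserved for empty slots
    return -1;
  const uint64_t Mask = Slots.size() - 1;
  uint64_t H = Sig & Mask;                        // primary hash
  const uint64_t Step = ((Sig >> 32) & Mask) | 1; // odd secondary hash
  for (std::size_t Probe = 0; Probe < Slots.size(); ++Probe) {
    if (Slots[H] == Sig)
      return static_cast<int>(H);
    if (Slots[H] == 0) // empty slot reached: Sig was never inserted
      return -1;
    H = (H + Step) & Mask; // an odd step visits every slot of a 2^k table
  }
  return -1;
}

// Places Sig in the first free slot on its probe sequence; returns false if
// the table is full. Duplicate signatures are not detected.
bool insertSig(std::vector<uint64_t> &Slots, uint64_t Sig) {
  assert(!Slots.empty() && (Slots.size() & (Slots.size() - 1)) == 0);
  assert(Sig != 0 && "0 marks an empty slot");
  const uint64_t Mask = Slots.size() - 1;
  uint64_t H = Sig & Mask;
  const uint64_t Step = ((Sig >> 32) & Mask) | 1;
  for (std::size_t Probe = 0; Probe < Slots.size(); ++Probe) {
    if (Slots[H] == 0) {
      Slots[H] = Sig;
      return true;
    }
    H = (H + Step) & Mask;
  }
  return false;
}
```

With the two CU signatures from `.debug_cu_index`, inserting them in order leaves `0xddeeaaddbbaabbee` in slot 0 (primary hash 0) and `0xff00ffeeffaaff00` in slot 1 (it also hashes to 0, and its secondary step of 1 moves it to slot 1), which is the order the test lists them in.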

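The revision summary says llvm-dwp preserves each DWARF v5 string offsets header and rewrites the contribution's entries. The `DW_FORM_strx1` indices in the units above select entries of that table and stay unchanged; only the entry values, which are offsets into the input `.debug_str.dwo`, must be remapped into the merged string section of the package. The following is a minimal sketch of that remapping step under stated assumptions: `MergedStrPool` and `remapEntries` are hypothetical names, strings are deduplicated by content, and header validation and copying are assumed to happen elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the merged .debug_str.dwo of the output package:
// each distinct string is stored once, NUL-terminated, and keeps the offset
// at which it was first added.
class MergedStrPool {
public:
  uint32_t add(const std::string &S) {
    auto It = Offsets.find(S);
    if (It != Offsets.end())
      return It->second; // already present: reuse its offset
    const uint32_t Off = static_cast<uint32_t>(Data.size());
    Data.insert(Data.end(), S.begin(), S.end());
    Data.push_back('\0');
    Offsets.emplace(S, Off);
    return Off;
  }
  const std::vector<char> &bytes() const { return Data; }

private:
  std::vector<char> Data;
  std::unordered_map<std::string, uint32_t> Offsets;
};

// Rewrites the entries of one string offsets contribution (header excluded).
// Each input entry is an offset into the input string section InStr; each
// output entry is the offset of the same string in Pool.
std::vector<uint32_t> remapEntries(const std::vector<char> &InStr,
                                   const std::vector<uint32_t> &InEntries,
                                   MergedStrPool &Pool) {
  std::vector<uint32_t> Out;
  Out.reserve(InEntries.size());
  for (uint32_t Off : InEntries) {
    assert(Off < InStr.size() && "string offset out of range");
    // Read up to the terminating NUL (or the end of the section).
    auto Begin = InStr.begin() + Off;
    auto End = std::find(Begin, InStr.end(), '\0');
    Out.push_back(Pool.add(std::string(Begin, End)));
  }
  return Out;
}
```

Deduplicating by content is one simple way to keep the merged string section small when several input files share strings; a string seen in an earlier contribution keeps its first offset.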
Revision Contents

Diff 135299

dwp-string-offsets-invalid-1.s

dwp-string-offsets-invalid-2.s

dwp-string-offsets-invalid-3.s

dwp-string-offsets-invalid-4.s

invalid_string_form.test

string_offsets_mixed.s