[DebugInfo] Support DWARFv5 source code embedding extension

Authored by scott.linder on Jan 31 2018, 1:55 PM.



In DWARFv5 the Line Number Program Header is extensible, allowing values
with new content types. In this extension two new content types are
added: DW_LNCT_has_source indicates whether embedded source is available
for the file, and DW_LNCT_source contains the embedded source itself.

Add new optional attribute for !DIFile IR metadata called source which
contains source text. Use this to output the source to the DWARF line
table of code objects. Analogously extend METADATA_FILE in Bitcode and
.file directive in ASM to support optional source.

Teach llvm-dwarfdump and llvm-objdump about the new values. Output
the source below file_names entries in llvm-dwarfdump, and support
embedded sources for the -source option in llvm-objdump.

Diff Detail

scott.linder created this revision.Jan 31 2018, 1:55 PM
aprantl added inline comments.Jan 31 2018, 2:04 PM
750 ↗(On Diff #132268)

How did you come up with the number?
Are you reasonably sure that no one else is using it?
Perhaps add a prefix like LLVM_ or AMDGPU_ to the constants?

16 ↗(On Diff #132268)

The test with the .bc file in test/Bitcode should be to make sure we correctly upgrade the old format.

To test the new format, there should be a round-trip test in test/IR that does llvm-as | llvm-dis | llvm-as | llvm-dis.

aprantl added inline comments.Jan 31 2018, 2:08 PM
512 ↗(On Diff #132268)

It is probably better to make DIFile variable length so we don't have to pay for the extra pointer when we don't need it.

JDevlieghere added inline comments.Feb 1 2018, 2:33 AM
858 ↗(On Diff #132268)

I learned yesterday that we stylize this as DWARF v5

40 ↗(On Diff #132268)

Alignment-wise it might be better to swap these two.

76 ↗(On Diff #132268)

Maybe add a comment like the one above. Now these two look grouped.

151 ↗(On Diff #132268)

I'm not a fan of these two things not being consistent. Either both should be options or both should be empty StringRefs? This seems to be recurring throughout this patch. I think Optional better conveys the idea, but would require more API changes.

923 ↗(On Diff #132268)

else can be omitted here.

3363 ↗(On Diff #132268)

else can be omitted here too.

Thank you both for the comments, I am switching dev machines so I will be a bit slow to implement the necessary changes.

750 ↗(On Diff #132268)

I am in the process of proposing this extension for inclusion in the next DWARF standard, but for now I will add a prefix.

I have not done much research on what values (if any) in the user-range are already in use, so I will try to find some references on that.

151 ↗(On Diff #132268)

I agree, but would like feedback on the approach to fixing it.

The new code uses an Optional to explicitly support an empty source string, without it being assumed to simply be absent, so that None != "". While it may not have many practical uses, it seems better to be explicit.

The old code is also explicit, but the paired ChecksumKind is what encodes whether a checksum is available. I do not want to duplicate this, as it leads to inconsistencies when CSKind != CSK_None, but Checksum == None, or vice-versa.

The best option I see is to wrap the pair in an optional (Optional<pair<CSKind, Checksum>>), remove the CSK_None variant, and maybe make the pair an explicit struct with members for the kind and checksum. Would this be an acceptable fix?

JDevlieghere added inline comments.Feb 2 2018, 1:26 AM
151 ↗(On Diff #132268)

I like the idea of the explicit struct (DIFile::Checksum?) wrapped in an optional. If it's not too big of a change we might as well move the ChecksumKind enum in there too?

scott.linder marked 4 inline comments as done.Feb 5 2018, 12:28 PM
scott.linder added inline comments.
151 ↗(On Diff #132268)

I have started an implementation of this change, but am unsure how to deal with backwards compatibility.

The ChecksumKind values appear to map directly to the constants used in the Bitcode, and more critically to the constants used in CodeView debug info; I feel removing the CSK_None variant will obfuscate the relationship, prevent moving these constants to a .def file (a TODO in the code), and force some extra translation to retain compatibility.

If we are OK with this, I am still not certain where the relationship is actually defined; the current DIFile::ChecksumKind simply gets cast to unsigned before being passed to MCStreamer::EmitCVFileDirective. There is no mention of codeview::FileChecksumKind, which does not explicitly define what unsigned integer values map to each variant anyway. The current solution seems very implicit, but I am not confident enough to say it should be changed.

The issue also comes up with IR, as CSK_None will need to be accepted as a valid token, and will need to override any checksum string present.

Assuming these inconsistencies are OK, I don't know how to represent this explicitly in the code; where are "format upgrades" handled? Or should I just accept the old format when parsing (translating things where necessary) and only output the new format (essentially never outputting CSK_None).

The alternative is to retain the CSK_None variant, and wrap it in the Optional, which would allow something of the form of Some(None, "") to be represented, which I would like to avoid. Additionally, when encoding None the actual encoding would need to be defined for CodeView, and I have not yet found where that is explicitly stated.

512 ↗(On Diff #132268)

I'm not sure I understand what you mean by variable length here; can you explain, or give me an example to reference?

3363 ↗(On Diff #132268)

The else here is required; the intent is to support any number of named keywords, while still rejecting unknown keywords. The alternative to the else is to continue at the end of each other branch, which I think is more confusing.

I addressed the API concerns in D43043 and it's matching clang patch.

The patch breaks textual IR compatibility, but keeps the Bitcode identical. The CSK_None variant is gone, so there can be no ambiguity with the new Optional.

I also took the opportunity to make the relationship between DIFile::ChecksumKind and codeview::FileChecksumKind explicit.

I opened D43623 to update the C-API with checksum support, as this patch adds source support to the same API.

The latest diff has a few changes:

  • Add support for writing the source text to the .debug_line_str section, rather than inline in the .debug_line (using the recent work done by Paul Robinson).
  • Require source text be present in all entries of a line table, or none of them, the same as with MD5. This removes the need for the DW_LNCT_has_source flag, which is removed.
  • Add an LLVM_ prefix to the new content type, and confirm the encoding is not in use by GCC.
  • Modify the format of file_names in llvm-dwarfdump to accommodate this patch and future changes. The new format places each column on a new line, so it can naturally accommodate new content types, even when they are variable length. Along with this, I updated the display of the .debug_line_str section to quote and escape its contents to prevent interpreting characters like newlines. Affected tests are updated.
  • Remove C-API update.
  • Update metadata unit test.

There is still an outstanding comment about making a struct variable-length, but I am still not clear what that entails. I believe all other feedback from the previous version of the patch has now been addressed.

JDevlieghere accepted this revision.Feb 23 2018, 8:41 AM

Other than the one small nit about the \brief I have no complaints. So LGTM, but maybe give the others a chance to review this before landing.

60 ↗(On Diff #135640)

We omit the brief for new/updated code as autobrief is now enabled.

755 ↗(On Diff #135640)

Same here

This revision is now accepted and ready to land.Feb 23 2018, 8:41 AM

It seems a little counter-intuitive for embedded source to have a filename, but I guess that's so the debugger has a familiar-looking handle to refer to it?

22 ↗(On Diff #135640)

If you have Optional you can remove None.h.

20 ↗(On Diff #135640)

Optional requires None so you shouldn't need to include None.h explicitly.

1353 ↗(On Diff #135640)

This used to error out if size == 4, now it does not?

12 ↗(On Diff #135640)

Optional requires None so you shouldn't need to include None.h explicitly.

19 ↗(On Diff #135640)

This is the only change to DebugInfo.cpp? You shouldn't have to do that.

10 ↗(On Diff #135640)

Optional requires None so I think you don't need to include None.h explicitly.

14 ↗(On Diff #135640)

Adding Optional means you can remove None.h.

128 ↗(On Diff #135640)

you don't need the trailing {{.*}} on these lines. Also, use SINGLE-NEXT and CHECK-NEXT to check the filenames?

141 ↗(On Diff #135640)

Don't need the trailing {{.*}} on these checks. Also, use FISSION-NEXT to check the filenames?

162 ↗(On Diff #135640)

Don't need the trailing {{.*}} on all the above. Also use -NEXT to check the filenames?

This revision addresses the latest round of feedback.

Currently the filename is still included in the line table because the embedded source is acting like a cache, and just makes source available even when the original filesystem is not. For the current implementations we have in mind with the extension, there is already a valid filename to populate here.

scott.linder marked 13 inline comments as done.Feb 23 2018, 12:25 PM
This revision was automatically updated to reflect the committed changes.