This is an archive of the discontinued LLVM Phabricator instance.

[MC] Generate .debug_line in the 64-bit DWARF format [2/7]
ClosedPublic

Authored by ikudrin on Jun 4 2020, 4:55 AM.

Diff Detail

Event Timeline

ikudrin created this revision.Jun 4 2020, 4:55 AM
jhenderson added inline comments.Jun 4 2020, 5:41 AM
llvm/lib/MC/MCDwarf.cpp
487

Aside: I think this function may be misnamed - shouldn't it be MakeEndMinusStartExpr?

llvm/test/MC/ELF/gen-dwarf64.s
21

FileCheck has relatively recently been extended to improve numeric variable capturing. You can change this line to something like:

name: .debug_line_str[0x00000000[[#FILEOFF:]]] = "[[FILE:.+]]"

and the usage to: 0x[[#FILEOFF]].

Beware that there are some issues relating to leading zeros that I can't quite remember the details of.

ikudrin marked 2 inline comments as done.Jun 4 2020, 8:06 AM
ikudrin added inline comments.
llvm/lib/MC/MCDwarf.cpp
487

Looks like you are right. Would you like to prepare the patch?

llvm/test/MC/ELF/gen-dwarf64.s
21

I believe the feature would be great if there was a need for an expression, but here we can just compare exact strings.

With numeric substitutions, the first match would be something like 0x[[#%x,FILEOFF:]], and the usage would look like 0x{{0*}}[[#FILEOFF]]. To my taste, the current variant is much simpler. Moreover, the current variant shows the exact number of hex digits we expect in the dump, which is, AFAIK, impossible with numeric substitutions.

The test is verifying that llvm-mc can emit a DWARF-64 line table describing assembler source that does not have any .loc directives in it. That's not really the interesting case; the interesting case is assembler source that *does* have .loc directives in it, because that's the path that most compilations will actually use. Describing assembler source really doesn't need to support DWARF-64.

Note that within the MCDwarf arena, "GenDwarf" generally refers to DWARF generated to describe assembler source, rather than DWARF describing other source that simply happens to be passing through the assembler phase. Therefore I think a test named "gen-dwarf64.s" would be somewhat misleading to people familiar with this area. Please use a different name for your test.

The test is verifying that llvm-mc can emit a DWARF-64 line table describing assembler source that does not have any .loc directives in it. That's not really the interesting case; the interesting case is assembler source that *does* have .loc directives in it, because that's the path that most compilations will actually use. Describing assembler source really doesn't need to support DWARF-64.

I think it's a useful place to start for consistency & testing - I get where you're coming from with regards to the features of MC DWARF support that are only needed for assembly DWARF not needed for higher level source DWARF - but they seem to me small enough to implement & nice to have for consistency. (I guess some people might turn on DWARF64 for their whole build if they really want it - and who knows, maybe they are generating weirdly massive assembly files that justify/need DWARF64)

Note that within the MCDwarf arena, "GenDwarf" generally refers to DWARF generated to describe assembler source, rather than DWARF describing other source that simply happens to be passing through the assembler phase. Therefore I think a test named "gen-dwarf64.s" would be somewhat misleading to people familiar with this area. Please use a different name for your test.

This is the foundational test that other patches that flesh out the MC DWARF support are adding to, so I think the name fits.

(not meaning to cutoff discussion - just my 2c)

The test is verifying that llvm-mc can emit a DWARF-64 line table describing assembler source that does not have any .loc directives in it. That's not really the interesting case; the interesting case is assembler source that *does* have .loc directives in it, because that's the path that most compilations will actually use. Describing assembler source really doesn't need to support DWARF-64.

I think it's a useful place to start for consistency & testing - I get where you're coming from with regards to the features of MC DWARF support that are only needed for assembly DWARF not needed for higher level source DWARF - but they seem to me small enough to implement & nice to have for consistency. (I guess some people might turn on DWARF64 for their whole build if they really want it - and who knows, maybe they are generating weirdly massive assembly files that justify/need DWARF64)

Note that within the MCDwarf arena, "GenDwarf" generally refers to DWARF generated to describe assembler source, rather than DWARF describing other source that simply happens to be passing through the assembler phase. Therefore I think a test named "gen-dwarf64.s" would be somewhat misleading to people familiar with this area. Please use a different name for your test.

This is the foundational test that other patches that flesh out the MC DWARF support are adding to, so I think the name fits.

(not meaning to cutoff discussion - just my 2c)

With these small additions we can be more comfortable verifying that DWARF64 support is complete. clang -g a.s will trigger the MCDwarf code path. (In a build system with -S and -g you may easily get DWARF32 from assembly)

I can agree that DWARF64 is not needed for individual files, especially assembly files. They hardly can exceed DWARF32 limitations. But object files are just intermediate, later they are combined to a larger executable binary, and, if that binary is large enough, we can hit the DWARF32 limits. Converting debugging info at the linker stage does not seem to be an easy task, thus, the info should be prepared in an appropriate form from the beginning.

I can agree that DWARF64 is not needed for individual files, especially assembly files. They hardly can exceed DWARF32 limitations. But object files are just intermediate, later they are combined to a larger executable binary, and, if that binary is large enough, we can hit the DWARF32 limits. Converting debugging info at the linker stage does not seem to be an easy task, thus, the info should be prepared in an appropriate form from the beginning.

Yeah, though you don't have to have all your DWARF in DWARF64 - you could have the assembly CU in DWARF32 and other CUs/files in DWARF64 - but I still think it's nice to implement it everywhere for consistency/easier to test, support flags (if you turn on DWARF64 for your build it'd be weird/annoying if it worked for some things and not others), etc. Maybe someone creates a huge machine-generated assembly file that would still benefit from assembly-level debug info, etc.

The thing is, where the generated debugging info will be eventually placed by a linker. They usually tend to place sections in a similar order as input files, so that small object file, compiled from an assembly, with 32-bit debug info, may easily come after other files with huge debugging info, resulting in impossibility to relocate 32-bit offsets in that small file correctly. The linker should be smart enough to adjust the order of debug info sections in order to avoid that. Not sure if there is any linker that does that.

The thing is, where the generated debugging info will be eventually placed by a linker. They usually tend to place sections in a similar order as input files, so that small object file, compiled from an assembly, with 32-bit debug info, may easily come after other files with huge debugging info, resulting in impossibility to relocate 32-bit offsets in that small file correctly. The linker should be smart enough to adjust the order of debug info sections in order to avoid that. Not sure if there is any linker that does that.

If the linker places huge_c.o:.text before asm.o:.text, it would be weird if the linker places asm.o:.debug_info before huge_c.o:.debug_info
A linker can be smarter, but the output will be unexpected.

The thing is, where the generated debugging info will be eventually placed by a linker. They usually tend to place sections in a similar order as input files, so that small object file, compiled from an assembly, with 32-bit debug info, may easily come after other files with huge debugging info, resulting in impossibility to relocate 32-bit offsets in that small file correctly. The linker should be smart enough to adjust the order of debug info sections in order to avoid that. Not sure if there is any linker that does that.

Oh, good point - you're talking about the sec_offsets - like the sec_offset to the line table, or abbrev, or the debug_str section... yeah, fair point, even if a single .o files contributions are small, those contributions may still need large encodings for the sec_offsets used between DWARF sections - and thus needing DWARF64.

If the linker places huge_c.o:.text before asm.o:.text, it would be weird if the linker places asm.o:.debug_info before huge_c.o:.debug_info
A linker can be smarter, but the output will be unexpected.

The relative order between the .text and the .debug_info shouldn't be an issue - I think it's more the absolute size of any debug info section that needs a sec_offset reference to it. (debug_info can use sec_offset refer to debug_info (though our assembly support never needs this), debug_info uses sec_offsets to refer to all the other sections - abbrev, str, str_offsets, loc/loclists, range/rnglists, etc... (except aranges - which refers to the CU instead - so it might need 64 bit encodings for that reference)).

If the linker places huge_c.o:.text before asm.o:.text, it would be weird if the linker places asm.o:.debug_info before huge_c.o:.debug_info
A linker can be smarter, but the output will be unexpected.

That might be unexpected, but in some cases that might be the only way to make the output at all. I have not dug that deep, but probably the order of some debug info sections might be adjusted without violating the correctness of the information, no?

If the linker places huge_c.o:.text before asm.o:.text, it would be weird if the linker places asm.o:.debug_info before huge_c.o:.debug_info
A linker can be smarter, but the output will be unexpected.

That might be unexpected, but in some cases that might be the only way to make the output at all. I have not dug that deep, but probably the order of some debug info sections might be adjusted without violating the correctness of the information, no?

Yep. Though I don't think we need to invoke that possibility to justify the support for DWARF64 for assembler source - as you point out, sec offsets may need to be 64 bit even if all contributions are small - if the overall section is large.

Supporting the asm code path sounds necessary to me too. Once one object starts using DWARF64, I'm pretty sure all objects should switch for a reliable build system, although it's probably not strictly required. As discussed by others, if huge.o comes before asm.o in the link, asm.o's section offsets will need to be 64-bit, even if its length fields don't need to be. Also, beware that some people do crazy things and auto-generate source, usually C/C++ source, but I don't see why they couldn't do it for assembly too, and they do actually want debug information for that quite often too. This auto-generated source can often end up massive, with corresponding large debug data sizes.

llvm/lib/MC/MCDwarf.cpp
487

I would, but I already have too many patches in progress to keep track of, in relation to my debug line work at the moment, so if somebody else is able to fix it, that would be great.

llvm/test/MC/ELF/gen-dwarf64.s
21

Wanting to check the exact number of digits sounds reasonable to me. Possibly deserves an extension to the FileCheck expression parser at some point too. @thopre - thoughts on that? It seems related to our previous discussion about leading zeros too.

Therefore I think a test named "gen-dwarf64.s" would be somewhat misleading to people familiar with this area. Please use a different name for your test.

The test was named after gen-dwarf.s, which generally checks generating (32-bit) debug info in llvm-mc. In this patch, it is mostly a placeholder, which is gradually extended in the subsequent patches. Anyway, if you have a better suggestion for the naming, please share.

thopre added inline comments.Jun 5 2020, 4:02 AM
llvm/test/MC/ELF/gen-dwarf64.s
21

Something like # %#.8x, FILEOFF: (to follow printf format) then? Looks easy enough of a feature to add.

thopre added inline comments.Jun 5 2020, 4:06 AM
llvm/test/MC/ELF/gen-dwarf64.s
21

I meant [[# %#.8x, FILEOFF:]] of course

jhenderson added inline comments.Jun 5 2020, 4:39 AM
llvm/test/MC/ELF/gen-dwarf64.s
21

Yeah, that sort of thing, exactly.

Supporting the asm code path sounds necessary to me too.

Does this imply that something other than generating .loc/.file to the assembler? Quite a bit of the code is in MCObjectStreamer and not in lib/MC/MCDwarf.cpp.

Supporting the asm code path sounds necessary to me too.

Does this imply that something other than generating .loc/.file to the assembler? Quite a bit of the code is in MCObjectStreamer and not in lib/MC/MCDwarf.cpp.

The goal of this particular set of patches is to add generating of the 64-bit debug info sections to llvm-mc, so that all the debug info the tool generates can be in the DWARF64 format. If you know any particular code paths I missed regarding this goal, please let me know.

ikudrin marked an inline comment as done.Jun 5 2020, 7:50 AM
ikudrin added inline comments.
llvm/lib/MC/MCDwarf.cpp
487

Well, I would also postpone changing that, in particular, to avoid interfering with this set of patches.

ikudrin updated this revision to Diff 268808.Jun 5 2020, 8:22 AM
  • Removed a check for Relocations [ in the test.

I accept that generating DWARF64 format for hand-written assembler source has use-cases.

I want to be VERY CLEAR that this patch series is specifically for the hand-written assembler source case ONLY. I have not seen changes in this series that will permit generating DWARF64 format for C/C++ source. As long as everyone understands that, I have no further objections to make. Specifically, because the test is about hand-written assembler, the name gen-dwarf64.s is fine.

(By "hand-written assembler" I mean assembler source without .file/.loc directives. llvm-mc is perfectly capable of assembling a file with .file/.loc directives but that will take different paths not covered by this patch. It was not at all clear from the patch description that only the former case was intended.)

I want to be VERY CLEAR that this patch series is specifically for the hand-written assembler source case ONLY. I have not seen changes in this series that will permit generating DWARF64 format for C/C++ source. As long as everyone understands that, I have no further objections to make. Specifically, because the test is about hand-written assembler, the name gen-dwarf64.s is fine.

This series of patches enables producing DWARF64 debug info in llvm-mc only. While the new switch, -dwarf64, is visible in other tools, for now, it has no effect there, because MCContext::setDwarfFormat(DWARF64) is called only in main() in llvm-mc.cpp.

(By "hand-written assembler" I mean assembler source without .file/.loc directives. llvm-mc is perfectly capable of assembling a file with .file/.loc directives but that will take different paths not covered by this patch. It was not at all clear from the patch description that only the former case was intended.)

If an assembler source contains .file/.loc directives, and -dwarf64 is given, llvm-mc will still correctly produce a DWARF64 .debug_line section. That is because the only differences between DWARF32 and DWARF64 ones are unit_length and header_length fields, and references to .debug_line_str, and all these are covered by this patch.

Do you have any specific example which results in an incorrect output with these patches?

If an assembler source contains .file/.loc directives, and -dwarf64 is given, llvm-mc will still correctly produce a DWARF64 .debug_line section. That is because the only differences between DWARF32 and DWARF64 ones are unit_length and header_length fields, and references to .debug_line_str, and all these are covered by this patch.

Do you have any specific example which results in an incorrect output with these patches?

Right, both the GenDwarf and .file/.loc paths use the same infrastructure. But if an assembler source contains .file/.loc directives, it likely also has compiler-generated .debug_info/etc sections, so the consistency of the output sections is not assured. The DWARF spec (section 7.4) states "The 32-bit and 64-bit DWARF format conventions must _not_ be intermixed within a single compilation unit."

Thinking about this more, as a practical matter, we probably don't want to make llvm-mc detect the formats of the .debug_info/etc sections and assure that all the formats are consistent. I suppose have to make it the responsibility of the user to invoke llvm-mc with --dwarf-64 only for hand-written asm or asm with other sections emitted in DWARF-64 format?

I suppose the DWARF verifier should be the one to check the format consistency of all the .debug_* (sub-)sections. Does it do that?

If an assembler source contains .file/.loc directives, and -dwarf64 is given, llvm-mc will still correctly produce a DWARF64 .debug_line section. That is because the only differences between DWARF32 and DWARF64 ones are unit_length and header_length fields, and references to .debug_line_str, and all these are covered by this patch.

Do you have any specific example which results in an incorrect output with these patches?

Right, both the GenDwarf and .file/.loc paths use the same infrastructure. But if an assembler source contains .file/.loc directives, it likely also has compiler-generated .debug_info/etc sections, so the consistency of the output sections is not assured. The DWARF spec (section 7.4) states "The 32-bit and 64-bit DWARF format conventions must _not_ be intermixed within a single compilation unit."

Thinking about this more, as a practical matter, we probably don't want to make llvm-mc detect the formats of the .debug_info/etc sections and assure that all the formats are consistent. I suppose have to make it the responsibility of the user to invoke llvm-mc with --dwarf-64 only for hand-written asm or asm with other sections emitted in DWARF-64 format?

Yep, same way you have to (unfortunately) pass -gdwarf-5 when assembling a DWARF-having assembly file if you want a DWARFv5 line table (& in fact since DWARFv5 uses file zero, you have to do it if you want your assembler not to error-out). More of an issue for the actual clang integrated assembler than for llvm-mc, the latter is more of a tool for our LLVM developer convenience than an actual production assembler (so quirky interface issues are less of an issue here - but, yeah if there was some better solution for the production/user side, probably would consider using it for llvm-mc too).

If an assembler source contains .file/.loc directives, and -dwarf64 is given, llvm-mc will still correctly produce a DWARF64 .debug_line section. That is because the only differences between DWARF32 and DWARF64 ones are unit_length and header_length fields, and references to .debug_line_str, and all these are covered by this patch.

Do you have any specific example which results in an incorrect output with these patches?

Right, both the GenDwarf and .file/.loc paths use the same infrastructure. But if an assembler source contains .file/.loc directives, it likely also has compiler-generated .debug_info/etc sections, so the consistency of the output sections is not assured. The DWARF spec (section 7.4) states "The 32-bit and 64-bit DWARF format conventions must _not_ be intermixed within a single compilation unit."

Thinking about this more, as a practical matter, we probably don't want to make llvm-mc detect the formats of the .debug_info/etc sections and assure that all the formats are consistent. I suppose have to make it the responsibility of the user to invoke llvm-mc with --dwarf-64 only for hand-written asm or asm with other sections emitted in DWARF-64 format?

To enforce the rule we probably could devise an assembler directive like .dwarf32/.dwarf64. If a compiler generates an assembler source with some debug info sections pre-filled, it might add that directive to ensure that the result will be consistent.

I suppose the DWARF verifier should be the one to check the format consistency of all the .debug_* (sub-)sections. Does it do that?

I can't remember anything like that.

Ping. Are you OK with this patch and the whole series or there is something I should improve?

probinson accepted this revision.Jun 15 2020, 11:54 AM

Ping. Are you OK with this patch and the whole series or there is something I should improve?

This patch LGTM. I don't think I have looked at all the others.

This revision is now accepted and ready to land.Jun 15 2020, 11:54 AM
MaskRay accepted this revision.Jun 15 2020, 4:17 PM

When the FileCheck features are ready, we can migrate to it.

llvm/test/MC/ELF/gen-dwarf64.s
21

[[ # %#.8x, FILEOFF:]] looks good to me.

Sometimes the variable is not referenced. Is it worthwhile adding another syntax omitting the variable?

jhenderson added inline comments.Jun 16 2020, 12:15 AM
llvm/test/MC/ELF/gen-dwarf64.s
21

If I'm not mistaken (@thopre can confirm), it's already possible to do that. For example [[#]] just matches a number, with no specific constraint. I haven't actually tried, but I'd expect [[# %x]] to match a hex number and so on. For reference, some basic precision support is being added at D81667. Feedback is actively being sought there.

This revision was automatically updated to reflect the committed changes.
thopre added inline comments.Jun 16 2020, 7:30 AM
llvm/test/MC/ELF/gen-dwarf64.s
21

Indeed, #%x should work and with the precision support we'll be able to do #%.8x.