This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Allow GNU macro extension to be emitted
ClosedPublic

Authored by dstenb on Jul 1 2020, 8:55 AM.

Details

Summary

Allow the GNU .debug_macro extension to be emitted for DWARF versions
earlier than 5. The extension is basically what became DWARF 5's format,
except that a DW_AT_GNU_macros attribute is emitted, and some entries
like the strx entries are missing. In this patch I emit GNU's indirect
entries, which are the same as DWARF 5's strp entries.

This patch adds the extension behind a hidden LLVM flag,
-use-gnu-debug-macro. If this can be landed, I would later want to
enable it by default when tuning for GDB and targeting DWARF versions
earlier than 5.

The size of a Clang 8.0 binary built with RelWithDebInfo and the flags
"-gdwarf-4 -fdebug-macro" reduces from 1533 MB to 1349 MB with
.debug_macro (compared to 1296 MB without -fdebug-macro).

One thing that I have not looked into yet is what to do in the DWO case
(when using -gsplit-dwarf). We can't use relocations in the
.debug_macro.dwo section, so the indirect strp entries are not viable
there.

Diff Detail

Event Timeline

dstenb created this revision.Jul 1 2020, 8:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2020, 8:55 AM

This patch adds the extension behind a hidden LLVM flag, -use-gnu-debug-macro. If this can be landed, I would later want to enable it by default when tuning for GDB and targeting DWARF versions earlier than 5.

When you say 'by default' - do you mean by default when the user requests macro debug info (via -fdebug-macro) or by default without any extra flag?
& what does GCC do? Does it have a way to emit the standard debug_macinfo in v4 and below? Or does it always emit the debug_macro GNU extension?

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
2980

Is this correct/necessary? Does GCC produce a version 4 debug_macro section when emitting DWARFv3?

3088

Use default ref capture [&] when a lambda doesn't escape its scope like this. (possibly inline the call to getDwarfVersion() even (& let default capture capture 'this'). I'd probably even inline the lambda into the call expression - but feel free to leave that aspect as-is, sinec it's already written that way.

When you say 'by default' - do you mean by default when the user requests macro debug info (via -fdebug-macro) or by default without any extra flag?
& what does GCC do? Does it have a way to emit the standard debug_macinfo in v4 and below? Or does it always emit the debug_macro GNU extension?

I'm not particularly sure of this(introduction of GNU encodings). Behavior of GCC trunk(11.0.0) is as follows:

gcc -g3 test.c -c, after dumping using objdump(2.32), GCC will create .debug_macro(sort of it's default, until you specify -gstrict-dwarf in which case GCC will generate .debug_macinfo).

The only difference b/w -g3 and -gdwarf-5 -g3 GCC generated .debug_macro section is the version information in macro header, 4 and 5 respectively. However there's no difference in encoding used i.e both uses (DWARFv5 encodings) there is no DW_MACRO_GNU* -- observed using binutils objdump version info mentioned above.

And lastly The reason why current llvm-dwarfdump is not able to dump/parse GCC generated(gcc -g3 test.c -c) .debug_macro section is because it uses version information in the header to parse it correctly(In this case it is 4). However if you generate the macro info as(specifying version) gcc -gdwarf-5 -g3 test.c -c llvm-dwarfdump can parse/dump it correctly.

I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the .debug_macro section with version 4 and teach the llvm-dwarfdump to parse it correctly.

The size of a Clang 8.0 binary built with RelWithDebInfo and the flags "-gdwarf-4 -fdebug-macro" reduces from 1533 MB to 1349 MB with .debug_macro (compared to 1296 MB without -fdebug-macro).

Yes that's good motivation for this patch, I also observed this in our initial findings, but I was occupied with current v5 split macro stuff.
CLANG/llvm AFAIK doesn't have -gstrict-dwarf. So if you want analogous behavior like GCC(have .debug_macro section even at version 4) you may need to introduce extra flag/switch. So that if end-user still .debug_macinfo for whatever reasons CLANG/llvm should generate it.
I'm not the right person these sort of decision, I'll leave it to @dblaikie and @probinson .

One thing that I have not looked into yet is what to do in the DWO case (when using -gsplit-dwarf). We can't use relocations in the .debug_macro.dwo section, so the indirect strp entries are not viable there.

Yes, that's broken in GCC too, sort of it produces encodings that not even objdump can understand.
objdump produces following Error comes when you dump .dwo file:

.debug_macro section not zero terminated
objdump: Error:  Unknown macro opcode 23 seen
objdump: Error:  Unknown macro opcode de seen

I've done some initial work(in llvm) around that D78866 and related. This is still broken from emission perspective(Fix in progress). llvm-dwarfdump works great.

Anyway Thanks for the patch :)

dstenb marked an inline comment as done.Jul 2 2020, 6:32 AM
dstenb added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
2980

Yes, GCC emits the debug_macro section with version 4 even with -gdwarf-[23].

dstenb added a comment.Jul 2 2020, 8:30 AM

When you say 'by default' - do you mean by default when the user requests macro debug info (via -fdebug-macro) or by default without any extra flag?
& what does GCC do? Does it have a way to emit the standard debug_macinfo in v4 and below? Or does it always emit the debug_macro GNU extension?

I'm not particularly sure of this(introduction of GNU encodings). Behavior of GCC trunk(11.0.0) is as follows:

gcc -g3 test.c -c, after dumping using objdump(2.32), GCC will create .debug_macro(sort of it's default, until you specify -gstrict-dwarf in which case GCC will generate .debug_macinfo).

As Sourabh says this is default when not emitting strict DWARF in GCC. For Clang, my intention was for it to be enabled by default for -fdebug-macro when tuning for GDB. Maybe it would also be interesting when tuning for LLDB?

The only difference b/w -g3 and -gdwarf-5 -g3 GCC generated .debug_macro section is the version information in macro header, 4 and 5 respectively. However there's no difference in encoding used i.e both uses (DWARFv5 encodings) there is no DW_MACRO_GNU* -- observed using binutils objdump version info mentioned above.

I personally think that the binutils tools printing the GNU extension using DWARF 5 entry names is confusing, but if people prefer to have it like that to avoid the larger code changes that this patch series introduce, I can align with that.

And lastly The reason why current llvm-dwarfdump is not able to dump/parse GCC generated(gcc -g3 test.c -c) .debug_macro section is because it uses version information in the header to parse it correctly(In this case it is 4). However if you generate the macro info as(specifying version) gcc -gdwarf-5 -g3 test.c -c llvm-dwarfdump can parse/dump it correctly.

I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the .debug_macro section with version 4 and teach the llvm-dwarfdump to parse it correctly.

One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?

CLANG/llvm AFAIK doesn't have -gstrict-dwarf. So if you want analogous behavior like GCC(have .debug_macro section even at version 4) you may need to introduce extra flag/switch. So that if end-user still .debug_macinfo for whatever reasons CLANG/llvm should generate it.
I'm not the right person these sort of decision, I'll leave it to @dblaikie and @probinson .

I just want to add that one downside with emitting .debug_macro that we have noticed downstream is that size of archives can grow quite a bit, since you then both pay for the uncoalesced strings in the different object files (same cost as for .debug_macinfo), plus all of the relocations.

Other than that I am personally not aware of any other major reasons for wanting to use .debug_macinfo over .debug_macro, given that the rest of the toolchain supports the latter format, of course.

I've done some initial work(in llvm) around that D78866 and related. This is still broken from emission perspective(Fix in progress). llvm-dwarfdump works great.

Okay, thanks for the information!

dstenb updated this revision to Diff 275752.Jul 6 2020, 9:59 AM
dstenb marked an inline comment as done.

Rebase and address comment.

When you say 'by default' - do you mean by default when the user requests macro debug info (via -fdebug-macro) or by default without any extra flag?
& what does GCC do? Does it have a way to emit the standard debug_macinfo in v4 and below? Or does it always emit the debug_macro GNU extension?

I'm not particularly sure of this(introduction of GNU encodings). Behavior of GCC trunk(11.0.0) is as follows:

gcc -g3 test.c -c, after dumping using objdump(2.32), GCC will create .debug_macro(sort of it's default, until you specify -gstrict-dwarf in which case GCC will generate .debug_macinfo).

As Sourabh says this is default when not emitting strict DWARF in GCC. For Clang, my intention was for it to be enabled by default for -fdebug-macro when tuning for GDB. Maybe it would also be interesting when tuning for LLDB?

Sounds alright. Not sure if the LLDB folks (@aprantl @JDevlieghere @labath) would be interested in that - a separate patch in any case.

The only difference b/w -g3 and -gdwarf-5 -g3 GCC generated .debug_macro section is the version information in macro header, 4 and 5 respectively. However there's no difference in encoding used i.e both uses (DWARFv5 encodings) there is no DW_MACRO_GNU* -- observed using binutils objdump version info mentioned above.

I personally think that the binutils tools printing the GNU extension using DWARF 5 entry names is confusing, but if people prefer to have it like that to avoid the larger code changes that this patch series introduce, I can align with that.

Agreed.

And lastly The reason why current llvm-dwarfdump is not able to dump/parse GCC generated(gcc -g3 test.c -c) .debug_macro section is because it uses version information in the header to parse it correctly(In this case it is 4). However if you generate the macro info as(specifying version) gcc -gdwarf-5 -g3 test.c -c llvm-dwarfdump can parse/dump it correctly.

I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the .debug_macro section with version 4 and teach the llvm-dwarfdump to parse it correctly.

One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?

Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc.

CLANG/llvm AFAIK doesn't have -gstrict-dwarf. So if you want analogous behavior like GCC(have .debug_macro section even at version 4) you may need to introduce extra flag/switch. So that if end-user still .debug_macinfo for whatever reasons CLANG/llvm should generate it.
I'm not the right person these sort of decision, I'll leave it to @dblaikie and @probinson .

I just want to add that one downside with emitting .debug_macro that we have noticed downstream is that size of archives can grow quite a bit, since you then both pay for the uncoalesced strings in the different object files (same cost as for .debug_macinfo), plus all of the relocations.

Got a rough %? Is it easy to disable this functionality if someone were trying to optimize for object size? (is there an easy way to disable gdb tuning on platforms that default to it, for instance?)

Other than that I am personally not aware of any other major reasons for wanting to use .debug_macinfo over .debug_macro, given that the rest of the toolchain supports the latter format, of course.

I've done some initial work(in llvm) around that D78866 and related. This is still broken from emission perspective(Fix in progress). llvm-dwarfdump works great.

Okay, thanks for the information!

I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the .debug_macro section with version 4 and teach the llvm-dwarfdump to parse it correctly.

One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?

Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc.

My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's -fdebug-macro with v4 is emitting a .debug_str_offsets section.

LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either.

I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the .debug_macro section with version 4 and teach the llvm-dwarfdump to parse it correctly.

One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?

Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc.

My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's -fdebug-macro with v4 is emitting a .debug_str_offsets section.

LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either.

GCC certainly seems to produce some kind of debug_macro.dwo section (& binutils dwp supports it in the index, if I recall correctly) using some form llvm-dwarfdump currently doesn't understand:

$ g++-tot -g3 main.cpp -c -gsplit-dwarf && llvm-objdump -h main.dwo | grep " \.debug"
  1 .debug_info.dwo        0000003c 0000000000000000 
  2 .debug_abbrev.dwo      0000003e 0000000000000000 
  3 .debug_macro.dwo       0000001e 0000000000000000 
  4 .debug_macro.dwo       00000364 0000000000000000 
  5 .debug_macro.dwo       00000013 0000000000000000 
  6 .debug_line.dwo        00000048 0000000000000000 
  7 .debug_str_offsets.dwo 000002d5 0000000000000000 
  8 .debug_str.dwo         00000e05 0000000000000000 
$ llvm-dwarfdump-tot main.dwo -debug-macro
main.dwo:       file format elf64-x86-64

.debug_macro.dwo contents:
0x00000000:
 - lineno: 19 macro: 
DW_MACINFO_invalid

I mean, I don't have strong feelings about supporting macro debug info in general, but if someone feels strongly about debug_macro GNU extension DWARFv4 support, there's certainly some GCC behavior that one could use to model the Split DWARF support for that off.

I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the .debug_macro section with version 4 and teach the llvm-dwarfdump to parse it correctly.

One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?

Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc.

My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's -fdebug-macro with v4 is emitting a .debug_str_offsets section.

LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either.

GCC certainly seems to produce some kind of debug_macro.dwo section (& binutils dwp supports it in the index, if I recall correctly) using some form llvm-dwarfdump currently doesn't understand:

$ g++-tot -g3 main.cpp -c -gsplit-dwarf && llvm-objdump -h main.dwo | grep " \.debug"
  1 .debug_info.dwo        0000003c 0000000000000000 
  2 .debug_abbrev.dwo      0000003e 0000000000000000 
  3 .debug_macro.dwo       0000001e 0000000000000000 
  4 .debug_macro.dwo       00000364 0000000000000000 
  5 .debug_macro.dwo       00000013 0000000000000000 
  6 .debug_line.dwo        00000048 0000000000000000 
  7 .debug_str_offsets.dwo 000002d5 0000000000000000 
  8 .debug_str.dwo         00000e05 0000000000000000 
$ llvm-dwarfdump-tot main.dwo -debug-macro
main.dwo:       file format elf64-x86-64

.debug_macro.dwo contents:
0x00000000:
 - lineno: 19 macro: 
DW_MACINFO_invalid

I mean, I don't have strong feelings about supporting macro debug info in general, but if someone feels strongly about debug_macro GNU extension DWARFv4 support, there's certainly some GCC behavior that one could use to model the Split DWARF support for that off.

One more deciding factor to considered here(previously missed) is that: GDB(trunk) also doesn't understand GNU macro extensions(if you wish to call it) in split case.
i.e
gcc -g3 -gsplit-dwarf test.c
test.dwo contains .debug_macro.dwo forms which no tool(as of now can dump).
if you load a.out in GDB and try expanding macro(defined in source).
GDB will report

(gdb) info macro FOO
The symbol `FOO' has no definition as a C/C++ preprocessor macro
at <user-defined>:-1

on the other hand, if you try with -gstrict-dwarf -gsplit-dwarf. GDB is happy.
So at the end of the day, even if we allow GNU macro extension, things will still be broken for -gsplit-dwarf case.
Or we have to teach the debugger to understand this ?, this also hinges on the fact, what kinda form GCC uses in split-case in .debug_macro.dwo section.
That it self is unclear right ?

I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the .debug_macro section with version 4 and teach the llvm-dwarfdump to parse it correctly.

One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?

Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc.

My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's -fdebug-macro with v4 is emitting a .debug_str_offsets section.

LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either.

GCC certainly seems to produce some kind of debug_macro.dwo section (& binutils dwp supports it in the index, if I recall correctly) using some form llvm-dwarfdump currently doesn't understand:

$ g++-tot -g3 main.cpp -c -gsplit-dwarf && llvm-objdump -h main.dwo | grep " \.debug"
  1 .debug_info.dwo        0000003c 0000000000000000 
  2 .debug_abbrev.dwo      0000003e 0000000000000000 
  3 .debug_macro.dwo       0000001e 0000000000000000 
  4 .debug_macro.dwo       00000364 0000000000000000 
  5 .debug_macro.dwo       00000013 0000000000000000 
  6 .debug_line.dwo        00000048 0000000000000000 
  7 .debug_str_offsets.dwo 000002d5 0000000000000000 
  8 .debug_str.dwo         00000e05 0000000000000000 
$ llvm-dwarfdump-tot main.dwo -debug-macro
main.dwo:       file format elf64-x86-64

.debug_macro.dwo contents:
0x00000000:
 - lineno: 19 macro: 
DW_MACINFO_invalid

I mean, I don't have strong feelings about supporting macro debug info in general, but if someone feels strongly about debug_macro GNU extension DWARFv4 support, there's certainly some GCC behavior that one could use to model the Split DWARF support for that off.

One more deciding factor to considered here(previously missed) is that: GDB(trunk) also doesn't understand GNU macro extensions(if you wish to call it) in split case.
i.e
gcc -g3 -gsplit-dwarf test.c
test.dwo contains .debug_macro.dwo forms which no tool(as of now can dump).
if you load a.out in GDB and try expanding macro(defined in source).
GDB will report

(gdb) info macro FOO
The symbol `FOO' has no definition as a C/C++ preprocessor macro
at <user-defined>:-1

on the other hand, if you try with -gstrict-dwarf -gsplit-dwarf. GDB is happy.
So at the end of the day, even if we allow GNU macro extension, things will still be broken for -gsplit-dwarf case.
Or we have to teach the debugger to understand this ?, this also hinges on the fact, what kinda form GCC uses in split-case in .debug_macro.dwo section.
That it self is unclear right ?

Sure, but easy enough to find the answer to by looking at GCC's output or implementation.

But, yeah, hardly high-value if gdb doesn't support it anyway, unless someone wants to add support there, or has some other DWARF Consumer that can handle it.

Don't mind -gsplit-dwarf -gdwarf-4 -fdebug-macro -ggdb to use debug_macinfo.dwo while non-split uses GNU extension debug_macro. Don't mind hugely in any case.

dstenb added a comment.Jul 8 2020, 5:18 AM

When you say 'by default' - do you mean by default when the user requests macro debug info (via -fdebug-macro) or by default without any extra flag?
& what does GCC do? Does it have a way to emit the standard debug_macinfo in v4 and below? Or does it always emit the debug_macro GNU extension?

I'm not particularly sure of this(introduction of GNU encodings). Behavior of GCC trunk(11.0.0) is as follows:

gcc -g3 test.c -c, after dumping using objdump(2.32), GCC will create .debug_macro(sort of it's default, until you specify -gstrict-dwarf in which case GCC will generate .debug_macinfo).

As Sourabh says this is default when not emitting strict DWARF in GCC. For Clang, my intention was for it to be enabled by default for -fdebug-macro when tuning for GDB. Maybe it would also be interesting when tuning for LLDB?

Sounds alright. Not sure if the LLDB folks (@aprantl @JDevlieghere @labath) would be interested in that - a separate patch in any case.

Yes, let's take that in another patch.

I just want to add that one downside with emitting .debug_macro that we have noticed downstream is that size of archives can grow quite a bit, since you then both pay for the uncoalesced strings in the different object files (same cost as for .debug_macinfo), plus all of the relocations.

Got a rough %? Is it easy to disable this functionality if someone were trying to optimize for object size? (is there an easy way to disable gdb tuning on platforms that default to it, for instance?)

I wrote that because I have in some downstream cases seen the size of large archives more than double. I don't know if there is something special about those cases, or the cost of relocation on that target, though.

When building a Clang 8.0 RelWithDebInfo binary on x86-64 the size of the archives under lib/ increased from 5224M to 5341M (a 2.4% increase), so that's not too bad.

$ du -h -s build-lib-with-macinfo/*.a | sort -h | tail -10
152M	build-lib-with-macinfo/libclangARCMigrate.a
194M	build-lib-with-macinfo/libclangStaticAnalyzerCore.a
200M	build-lib-with-macinfo/libLLVMX86CodeGen.a
230M	build-lib-with-macinfo/libLLVMAnalysis.a
245M	build-lib-with-macinfo/libLLVMScalarOpts.a
289M	build-lib-with-macinfo/libclangAST.a
457M	build-lib-with-macinfo/libclangSema.a
481M	build-lib-with-macinfo/libclangCodeGen.a
535M	build-lib-with-macinfo/libLLVMCodeGen.a
573M	build-lib-with-macinfo/libclangStaticAnalyzerCheckers.a

$ du -h -s build-lib-with-macro/*.a | sort -h | tail -10
154M	build-lib-with-macro/libclangARCMigrate.a
197M	build-lib-with-macro/libclangStaticAnalyzerCore.a
204M	build-lib-with-macro/libLLVMX86CodeGen.a
237M	build-lib-with-macro/libLLVMAnalysis.a
250M	build-lib-with-macro/libLLVMScalarOpts.a
295M	build-lib-with-macro/libclangAST.a
460M	build-lib-with-macro/libclangSema.a
487M	build-lib-with-macro/libclangCodeGen.a
548M	build-lib-with-macro/libLLVMCodeGen.a
581M	build-lib-with-macro/libclangStaticAnalyzerCheckers.a

Regarding overriding the default, I think the only way is to explicitly pass another tuning option, e.g. -glldb?

dstenb added a comment.Jul 8 2020, 6:11 AM

I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the .debug_macro section with version 4 and teach the llvm-dwarfdump to parse it correctly.

One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?

Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc.

My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's -fdebug-macro with v4 is emitting a .debug_str_offsets section.

LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either.

GCC certainly seems to produce some kind of debug_macro.dwo section (& binutils dwp supports it in the index, if I recall correctly) using some form llvm-dwarfdump currently doesn't understand:

$ g++-tot -g3 main.cpp -c -gsplit-dwarf && llvm-objdump -h main.dwo | grep " \.debug"
  1 .debug_info.dwo        0000003c 0000000000000000 
  2 .debug_abbrev.dwo      0000003e 0000000000000000 
  3 .debug_macro.dwo       0000001e 0000000000000000 
  4 .debug_macro.dwo       00000364 0000000000000000 
  5 .debug_macro.dwo       00000013 0000000000000000 
  6 .debug_line.dwo        00000048 0000000000000000 
  7 .debug_str_offsets.dwo 000002d5 0000000000000000 
  8 .debug_str.dwo         00000e05 0000000000000000 
$ llvm-dwarfdump-tot main.dwo -debug-macro
main.dwo:       file format elf64-x86-64

.debug_macro.dwo contents:
0x00000000:
 - lineno: 19 macro: 
DW_MACINFO_invalid

I mean, I don't have strong feelings about supporting macro debug info in general, but if someone feels strongly about debug_macro GNU extension DWARFv4 support, there's certainly some GCC behavior that one could use to model the Split DWARF support for that off.

One more deciding factor to considered here(previously missed) is that: GDB(trunk) also doesn't understand GNU macro extensions(if you wish to call it) in split case.
i.e
gcc -g3 -gsplit-dwarf test.c
test.dwo contains .debug_macro.dwo forms which no tool(as of now can dump).
if you load a.out in GDB and try expanding macro(defined in source).
GDB will report

(gdb) info macro FOO
The symbol `FOO' has no definition as a C/C++ preprocessor macro
at <user-defined>:-1

on the other hand, if you try with -gstrict-dwarf -gsplit-dwarf. GDB is happy.
So at the end of the day, even if we allow GNU macro extension, things will still be broken for -gsplit-dwarf case.
Or we have to teach the debugger to understand this ?, this also hinges on the fact, what kinda form GCC uses in split-case in .debug_macro.dwo section.
That it self is unclear right ?

(Sorry, I don't have a GCC trunk build readily available, so I used GCC 9.3.0 here.)

When using those flags, GCC seems to emit DW_MACRO_define_strp (DW_MACRO_GNU_define_indirect) entries, but with indexed strings as operands. Neither binutils nor GDB does consider that such entries may hold indexed strings, and just treats those operands as indirect strings, which is why they are not properly handled. "Overloading" those indirect operands with indexed strings seems very weird to me. Perhaps that is just a bug in GCC, rather than a limitation in the consumers?

I think if it's about compatibility(analogous behavior with GCC), existing infra is Okay/Fine(Since same encodings are used). We just need to emit the .debug_macro section with version 4 and teach the llvm-dwarfdump to parse it correctly.

One difference though is that the GNU extension does not have anything like the strx entries that LLVM currently emits: https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.h#L425, so I assume we still need code to emit the strp entries when targeting DWARF 4?

Likely - but might want to check what GCC does - maybe it uses some kind of strx encoding that's not documented, etc.

My recollection is that .debug_macro was invented independently of the strx forms so the prototype probably wouldn't have used them. Easy enough to check whether GCC's -fdebug-macro with v4 is emitting a .debug_str_offsets section.

LLVM wouldn't be using strx forms from .debug_info for v4, and would have no other reason to emit .debug_str_offsets, so I wouldn't want LLVM to use them in a v4 compatibility mode .debug_macro section either.

GCC certainly seems to produce some kind of debug_macro.dwo section (& binutils dwp supports it in the index, if I recall correctly) using some form llvm-dwarfdump currently doesn't understand:

$ g++-tot -g3 main.cpp -c -gsplit-dwarf && llvm-objdump -h main.dwo | grep " \.debug"
  1 .debug_info.dwo        0000003c 0000000000000000 
  2 .debug_abbrev.dwo      0000003e 0000000000000000 
  3 .debug_macro.dwo       0000001e 0000000000000000 
  4 .debug_macro.dwo       00000364 0000000000000000 
  5 .debug_macro.dwo       00000013 0000000000000000 
  6 .debug_line.dwo        00000048 0000000000000000 
  7 .debug_str_offsets.dwo 000002d5 0000000000000000 
  8 .debug_str.dwo         00000e05 0000000000000000 
$ llvm-dwarfdump-tot main.dwo -debug-macro
main.dwo:       file format elf64-x86-64

.debug_macro.dwo contents:
0x00000000:
 - lineno: 19 macro: 
DW_MACINFO_invalid

I mean, I don't have strong feelings about supporting macro debug info in general, but if someone feels strongly about debug_macro GNU extension DWARFv4 support, there's certainly some GCC behavior that one could use to model the Split DWARF support for that off.

One more deciding factor to considered here(previously missed) is that: GDB(trunk) also doesn't understand GNU macro extensions(if you wish to call it) in split case.
i.e
gcc -g3 -gsplit-dwarf test.c
test.dwo contains .debug_macro.dwo forms which no tool(as of now can dump).
if you load a.out in GDB and try expanding macro(defined in source).
GDB will report

(gdb) info macro FOO
The symbol `FOO' has no definition as a C/C++ preprocessor macro
at <user-defined>:-1

on the other hand, if you try with -gstrict-dwarf -gsplit-dwarf. GDB is happy.
So at the end of the day, even if we allow GNU macro extension, things will still be broken for -gsplit-dwarf case.
Or we have to teach the debugger to understand this ?, this also hinges on the fact, what kinda form GCC uses in split-case in .debug_macro.dwo section.
That it self is unclear right ?

(Sorry, I don't have a GCC trunk build readily available, so I used GCC 9.3.0 here.)

When using those flags, GCC seems to emit DW_MACRO_define_strp (DW_MACRO_GNU_define_indirect) entries, but with indexed strings as operands. Neither binutils nor GDB does consider that such entries may hold indexed strings, and just treats those operands as indirect strings, which is why they are not properly handled. "Overloading" those indirect operands with indexed strings seems very weird to me. Perhaps that is just a bug in GCC, rather than a limitation in the consumers?

Perhaps - though there was some thought put into supporting GNU debug_macro in v4/pre-standard Fission, given the DWP format had columns for both debug_macro and debug_macinfo ( https://gcc.gnu.org/wiki/DebugFissionDWP ). Don't think it's a big deal either way - if someone comes along wanting to add debug_macro support for pre-standard Fission, we can discuss what that format looks like at that point - happy enough for it to be unimplemented (& as I said before, have "-ggdb -gdwarf-4 -fdebug-macro -> debug_macro" and "-ggdb -gdwarf-4 -fdebug-macro -gsplit-dwarf -> debug_macinfo.dwo").

(Sorry, I don't have a GCC trunk build readily available, so I used GCC 9.3.0 here.)

When using those flags, GCC seems to emit DW_MACRO_define_strp (DW_MACRO_GNU_define_indirect) entries, but with indexed strings as operands. Neither binutils nor GDB does consider that such entries may hold indexed strings, and just treats those operands as indirect strings, which is why they are not properly handled. "Overloading" those indirect operands with indexed strings seems very weird to me. Perhaps that is just a bug in GCC, rather than a limitation in the consumers?

Perhaps - though there was some thought put into supporting GNU debug_macro in v4/pre-standard Fission, given the DWP format had columns for both debug_macro and debug_macinfo ( https://gcc.gnu.org/wiki/DebugFissionDWP ). Don't think it's a big deal either way - if someone comes along wanting to add debug_macro support for pre-standard Fission, we can discuss what that format looks like at that point - happy enough for it to be unimplemented (& as I said before, have "-ggdb -gdwarf-4 -fdebug-macro -> debug_macro" and "-ggdb -gdwarf-4 -fdebug-macro -gsplit-dwarf -> debug_macinfo.dwo").

I'll leave the DWO parts out of this patch then, and later if we get to that, emitting macinfo in the ggdb -gdwarf-4 -fdebug-macro -gsplit-dwarf case seems reasonable to me.

dblaikie added inline comments.Jul 9 2020, 3:31 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1359–1368

Looks like this might be wrong for v4 + split DWARF + using macro? Or perhaps this code isn't reachable by that combination?

Might be more clear, then, to sink the MacrosAttr choice down into the "else" clause here, and assert in the split DWARF case that the version >= 5? (possibly including a note about how the pre-v5, GCC debug_macro extension isn't supported with Split DWARF)

3032–3035

/might/ be worth pulling these 4 lines out as a lambda to use from the if/else branches, but probably not...

3036–3048

Might be nice to refactor this in both the original codepath and the new codepath you're adding (either before or after this commit) to compute the string once & share the rest of this expression..

std::string Str = Value.empty() ? Name.str() : (Name + ' ' + Value).str();
Asm->OutStreamer->emitSymbol(this->InfoHolder.getStringPool().getEntry(*Asm, Str).getSymbol(), 4);
3091–3093

Looks like maybe this could skip the std::function_ref, and do this:

emitMacroFileImpl(F, U, dwarf::DW_MACRO_start_file,
                      dwarf::DW_MACRO_end_file, 
                        (getDwarfVersion() >= 5)
                                   ? dwarf::MacroString
                                   : dwarf::GnuMacroString);
dstenb updated this revision to Diff 277030.Jul 10 2020, 7:19 AM
dstenb marked 2 inline comments as done.

Rebase and address review comments.

dstenb marked 2 inline comments as done.Jul 10 2020, 7:21 AM
dstenb added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
3036–3048

Yes, good idea! I split that out to the preparatory patch D83557.

3091–3093

Oh, thanks, of course!

dstenb marked 3 inline comments as done.Jul 10 2020, 8:12 AM
dstenb added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1359–1368

Sorry, in what way does this look wrong? If I am not overlooking something, this look the same as what GCC emits for the attribute in the -g3 -gdwarf-4 -gsplit-dwarf case.

Regardless of the above, doing like you suggest and adding an assert seems like a good idea.

3032–3035

Although there is some code duplication, I think I prefer to keep it as is.

dblaikie added inline comments.Jul 10 2020, 11:55 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1359–1368

Sorry, in what way does this look wrong? If I am not overlooking something, this look the same as what GCC emits for the attribute in the -g3 -gdwarf-4 -gsplit-dwarf case.

I think that's what we were discussing at length previously in this review, that GNU debug_macro + v4 + split DWARF seems a bit ill specified, and it's probably best to have -fdebug-macro + v4 + -gsplit-dwarf continue to use debug_macinfo rather than the ill-specified and not-implemented-by-any-consumer v4 Split DWARF .debug_macro?

https://reviews.llvm.org/D82975#2142264

dstenb updated this revision to Diff 277451.Jul 13 2020, 9:12 AM

Assert on split DWARF.

dstenb marked an inline comment as done.Jul 13 2020, 9:16 AM
dstenb added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1359–1368

Okay, yes, I guess it's bad to assume that the attribute part is valid if the whole GNU debug_macro + v4 + split DWARF format seems ill specified. I have added an assert as you suggested.

dblaikie added inline comments.Jul 13 2020, 10:04 AM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1359–1368

This code would hit the assertion with -gsplit-dwarf -fdebug-macro -use-gnu-debug-macro, yes?

While the -use-gnu-debug-macro isn't a driver option (ie: not a public feature), probably best to have UseDebugMacroSection set to false when using v4 split DWARF?

UseDebugMacroSection = DwarfVersion >= 5 || (UseGNUDebugMacro && !HasSplitDwarf);

(& a test that this combination behaves as expected)

dstenb marked an inline comment as done.Jul 14 2020, 9:12 AM
dstenb added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1359–1368

Yes, I'll do that. I thought that this could be done later on when this is promoted from a hidden LLVM flag to being enabled by the tuning, but perhaps it's better to just do it from the start.

dstenb updated this revision to Diff 277865.Jul 14 2020, 9:13 AM

Fall back to the macinfo format for split DWARF, and add a test case for that.

dblaikie accepted this revision.Jul 14 2020, 1:25 PM

Looks good (one small nit/change to a comment in the new test case) - thanks!

llvm/test/DebugInfo/X86/debug-macro-gnu-dwo.ll
1–2 ↗(On Diff #277865)

well-specified for split DWARFv4, specifically, I think?

This revision is now accepted and ready to land.Jul 14 2020, 1:25 PM

Thanks a lot for the perseverance reviewing this, @dblaikie!

What do you think about the patch series as it stands now, @SouraVX?

dstenb added a comment.Aug 9 2020, 9:57 PM

What do you think about the patch series as it stands now, @SouraVX?

Would you be fine with me landing this now, @SouraVX?

SouraVX accepted this revision.EditedAug 10 2020, 7:43 AM

Apologies @dstenb for too much delayed response. Some how this revision slipped from my worklist.
LGTM, Thanks for your patience, please go ahead with pushing/merging.