This is an archive of the discontinued LLVM Phabricator instance.

[WIP] [DebugInfoMetadata] Introduce the "lambda" DISubprogram field
Abandoned · Public

Authored by djtodoro on Apr 9 2020, 6:53 AM.

Details

Summary

Introduce the DW_AT_LLVM_lambda_name attribute.

The proposed attribute points to the name of the lambda expression from C++ source code.

Let's compile a basic C++ program that uses a lambda expression with Clang:

$ cat basic.cpp
#include <iostream>
int main()
{
 auto printNum = [](int num) {
   std::cout << num << '\n';
 };
 for (int i = 0; i < 5; ++i)
   printNum(i);
 return 0;
}
$ clang++ -g -O2  basic.cpp
 

Before this set of patches, a debug session on this simple example looks like this:

$ gdb a.out
...
Breakpoint 1 at 0x401195: file basic.cpp, line 6.
(gdb) r
Starting program: a.out
Breakpoint 1, main::$_0::operator() (this=<optimized out>, num=0) at basic.cpp:6
6 std::cout << num << '\n';
(gdb) bt
#0 main::$_0::operator() (this=<optimized out>, num=0) at basic.cpp:6
#1 main () at basic.cpp:10

LLDB does not do much better, since the compiler does not provide the necessary information:

$ lldb a.out
...
(lldb) bt
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
* frame #0: 0x0000000000401195 a.out`main [inlined] main::$_0::operator(this=<unavailable>, num=0)(int) const at basic.cpp:6:15
frame #1: 0x0000000000401190 a.out`main at basic.cpp:10
frame #2: 0x00007ffff7a961e3 libc.so.6`__libc_start_main + 243
frame #3: 0x00000000004010ce a.out`_start + 46

Currently, debuggers print the mangled name (main::$_0::operator()), which is not that useful in this backtrace.
After applying this set of patches, an LLDB debugging session on the same test case looks like this:

$ clang++ -glldb -O2 basic.cpp
$ lldb a.out
...
(lldb) bt
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
* frame #0: 0x0000000000401195 a.out`main [inlined] [lambda] printNum(this=<unavailable>, num=0) at basic.cpp:6:15 [opt]  //<== by adding this feature, I was able to add the [lambda] attribute and the name of lambda 'printNum'//
frame #1: 0x0000000000401190 a.out`main at basic.cpp:10 [opt]
frame #2: 0x00007ffff7a961e3 libc.so.6`__libc_start_main + 243
frame #3: 0x00000000004010ce a.out`_start + 46

The GCC + GDB combination does a somewhat better job with lambdas. GCC adds a hardcoded DW_AT_name "<lambda>" to the corresponding DWARF tag, referring to the type created for the lambda expression. This is not specified by the DWARF standard, but GDB knows how to parse it and prints it in the backtrace.

Event Timeline

djtodoro created this revision. Apr 9 2020, 6:53 AM

I should have introduced this as an RFC, but I have changed my email address and it looks like the @llvm-dev admins have not approved my request yet.
If this makes sense, I could write the RFC. We may want to start the discussion on the @dwarf mailing list as well, since something like this could be standardized and used by various tools that rely on DWARF (compilers, debuggers, etc.).

djtodoro edited the summary of this revision. Apr 9 2020, 6:58 AM

I noticed this while debugging LLVM with debuggers (and also noticed that GCC does a somewhat better job). I wrote this code in a hurry, so I have not cleaned up or refactored it, and there are some (I think 6) failing test cases.

Yeah, please hold on all these patches & wait until you're approved to post to llvm-dev and have a discussion there. I'd worry that the design discussion might get fragmented amongst these threads (& lost amongst the noise of llvm-commits in general).

Some starter questions, though - at least:

  1. why does this need different support (in IR, in LLVM, etc) rather than using the existing "Name" field, if there's a more suitable name than the current one
  2. could you show the GCC+GDB behavior & contrast it with LLVM's (Clang+GDB/Clang+LLDB, etc) - what I could see didn't look too bad/different between Clang and GCC to me
  3. you mention "the lambda name" a few times, but lambdas don't have names - it looks like the name of the enclosing function is being used in some way? From what I can see from GCC and Clang's output (using Clang ToT, and GCC built from source/ToT at 2020/1/11) neither give the lambda type (DW_TAG_structure_type) a DW_AT_name at all, and LLVM/Clang doesn't name the ctor/dtor (whereas GCC names them "<lambda>" and "~<lambda>") and Clang and GCC both name the actual operator() "operator()" - so it's not looking all that different?

@dblaikie Thanks a lot for the comments!

Yeah, please hold on all these patches & wait until you're approved to post to llvm-dev and have a discussion there. I'd worry that the design discussion might get fragmented amongst these threads (& lost amongst the noise of llvm-commits in general).

I strongly agree. I am now subscribed to @llvm-dev with the new email address, but I think I'll be away from the keyboard for a few days. When I am back, I'll write the RFC for sure.

First of all, to summarize lambdas:

  1. a lambda is an expression
  2. a closure is the (runtime) object created by a lambda
  3. a closure is an instance of a class called the closure class (for each lambda, the compiler generates a unique closure class)
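
The three points above can be checked with a small sketch, assuming a C++17 compiler. The function name closureFactsHold is made up for illustration and is not part of the patch:

```cpp
#include <type_traits>

// Returns true when the observations about lambdas, closures and
// closure classes hold for two textually identical lambda expressions.
bool closureFactsHold() {
    // Two lambda expressions with the same source text...
    auto a = [](int n) { return n + 1; };
    auto b = [](int n) { return n + 1; };

    // ...still get two distinct, unnamed closure classes.
    constexpr bool distinctClasses =
        !std::is_same_v<decltype(a), decltype(b)>;

    // A closure is an ordinary object; copying it yields another
    // object of the same closure class.
    auto c = a;
    constexpr bool copySharesClass =
        std::is_same_v<decltype(a), decltype(c)>;

    // Calling a closure runs the closure class's operator(), which is
    // the function that appears as a frame in a backtrace.
    return distinctClasses && copySharesClass &&
           a(1) == 2 && b(2) == 3 && c(3) == 4;
}
```

Because the closure class has no source-level name, the only names a debugger can show for such a frame are ones the compiler chooses to emit.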

Sorry for the confusion. Naming is hard, and I was mistaken in calling it "lambda_name". Lambdas do not have names, so it should be something like "enclosing_function/method_name".

Some starter questions, though - at least:

  1. why does this need different support (in IR, in LLVM, etc) rather than using the existing "Name" field, if there's a more suitable name than the current one

I thought adding something (lambda) specific to this DISubprogram representing the enclosing function can help debuggers to print some additional info in the final backtrace (such as [inlined] [lambda] from the example I shared).

  2. could you show the GCC+GDB behavior & contrast it with LLVM's (Clang+GDB/Clang+LLDB, etc) - what I could see didn't look too bad/different between Clang and GCC to me

I use gcc --version == g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
For a (closure class) type representing a lambda, GCC generates a DW_TAG_structure_type with DW_AT_name <lambda(int)>, whereas LLVM/Clang does not generate anything lambda-specific within the corresponding DW_TAG_class_type (Clang generates that tag for the closure class). As a consequence, for the GCC-compiled example, we end up with a backtrace like:

...
(gdb) bt
#0  <lambda(int)>::operator()
...

but for LLVM/Clang generated exe:

...
#0  main::$_0::operator()
...

For example, if we have two closures within main():

--using gcc:
   -I was not able to print info about the frame that comes from the second closure (it prints nothing)
--using clang:
   -It produces a subprogram with a mangled name (such as $_2, $_3, etc.):
      ...
      #0  main::$_2::operator()
      ...

I think that these names are not that useful to end users of debuggers. The output that GCC+GDB produces is more useful, but I think a standardized way of representing this would be desirable. With enclosing_function/method_name, after this patch set, LLDB was able to print the name of the enclosing function in the backtrace. The alternative is to use the regular DW_AT_name and introduce a flag within the closure class indicating that it comes from a lambda (i.e. DW_AT_lambda), instead of hard-coding the name "<lambda>" within the tag representing the type.

  3. you mention "the lambda name" a few times, but lambdas don't have names - it looks like the name of the enclosing function is being used in some way?

I wrote it in a hurry, sorry, I was mistaken. As I said above, that should be something like "enclosing_function/method_name".

From what I can see from GCC and Clang's output (using Clang ToT, and GCC built from source/ToT at 2020/1/11) neither give the lambda type (DW_TAG_structure_type) a DW_AT_name at all

Actually it does. I use GCC 9.2, and it adds a name within the tag representing the type.

, and LLVM/Clang doesn't name the ctor/dtor (whereas GCC names them "<lambda>" and "~<lambda>") and Clang and GCC both name the actual operator() "operator()" - so it's not looking all that different?

That is very similar, except LLVM/Clang does not name the ctor/dtor, but I think it is not that important.

Some starter questions, though - at least:

  1. why does this need different support (in IR, in LLVM, etc) rather than using the existing "Name" field, if there's a more suitable name than the current one

I thought adding something (lambda) specific to this DISubprogram representing the enclosing function can help debuggers to print some additional info in the final backtrace (such as [inlined] [lambda] from the example I shared).

  2. could you show the GCC+GDB behavior & contrast it with LLVM's (Clang+GDB/Clang+LLDB, etc) - what I could see didn't look too bad/different between Clang and GCC to me

I use gcc --version == g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
For a (closure class) type representing a lambda, GCC generates a DW_TAG_structure_type with DW_AT_name <lambda(int)>, whereas LLVM/Clang does not generate anything lambda-specific within the corresponding DW_TAG_class_type (Clang generates that tag for the closure class). As a consequence, for the GCC-compiled example, we end up with a backtrace like:

...
(gdb) bt
#0  <lambda(int)>::operator()

Ah, seems that's changed upstream since 9.2 - in current ToT GCC no name is produced so far as I've been able to reproduce:

0x0000004f: DW_TAG_structure_type

DW_AT_byte_size       (0x01)
DW_AT_decl_file       ("/usr/local/google/home/blaikie/dev/scratch/lambda.cpp")
DW_AT_decl_line       (3)
DW_AT_decl_column     (0x14)
DW_AT_sibling (0x0000010d)

The ctor/dtor still use the "<lambda>" name in them, FWIW.

The stack frame is "operator()" with no scope/context.

...

but for LLVM/Clang generated exe:

...
#0  main::$_0::operator()
...

For example, if we have two closures within main():

--using gcc:
   -I was not able to print info about the frame that comes from the second closure (it prints nothing)

Can you provide complete reproduction steps to clarify the scenarios in each case (in this one, what exactly you mean by "prints nothing" and which frames in which places, etc) - source listings, GDB/LLDB command line logs, etc. Minimal examples as much as possible.

--using clang:
  -It produces a subprogram with mangled name such as ($_2, $_3, etc.):
 	...
	#0  main::$_2::operator()
     ...

I think that these names are not that useful to end users when using debuggers. The output that GCC+GDB produces is more useful, but I think that having a standardized way of representing this would be desirable (with enclosing_function/method_name after this patch set, LLDB was able to print the name of enclosing function in the backtrace; the alternative is using regular DW_AT_name, but to introduce a flag within the closure class indicating that it comes from a lambda, i.e. DW_AT_lambda, instead of hard coding the name "lambda<>" within the tag representing the type).

What about the line and file number associated with these functions/function calls? Is that not useful/adequate to disambiguate the lambda, etc?
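
As a rough illustration of that question, the declaration coordinates from the DWARF dump above (DW_AT_decl_line 3, DW_AT_decl_column 0x14, i.e. 20) are already enough to build a distinct label for each closure class. This is only a sketch: the DeclLocation struct and the describeClosure helper are hypothetical and do not correspond to any GDB or LLDB API, and the file name is shortened to its basename.

```cpp
#include <string>

// Hypothetical record of the declaration coordinates a debugger could
// read from DW_AT_decl_file, DW_AT_decl_line and DW_AT_decl_column.
struct DeclLocation {
    std::string file;
    unsigned line;
    unsigned column;
};

// Builds a label such as "lambda at lambda.cpp:3:20" for an unnamed
// closure class, using only its declaration coordinates.
std::string describeClosure(const DeclLocation &loc) {
    return "lambda at " + loc.file + ":" + std::to_string(loc.line) +
           ":" + std::to_string(loc.column);
}
```

Two lambdas in the same function necessarily differ in line or column, so labels built this way stay distinct without inventing a name for the lambda.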

  3. you mention "the lambda name" a few times, but lambdas don't have names - it looks like the name of the enclosing function is being used in some way?

I wrote it in a hurry, sorry, I was mistaken. As I said above, that should be something like "enclosing_function/method_name".

From what I can see from GCC and Clang's output (using Clang ToT, and GCC built from source/ToT at 2020/1/11) neither give the lambda type (DW_TAG_structure_type) a DW_AT_name at all

Actually it does. I use GCC 9.2, and it adds a name within the tag representing the type.

, and LLVM/Clang doesn't name the ctor/dtor (whereas GCC names them "<lambda>" and "~<lambda>") and Clang and GCC both name the actual operator() "operator()" - so it's not looking all that different?

That is very similar, except LLVM/Clang does not name the ctor/dtor, but I think it is not that important.

@dblaikie Thanks for your comments! I've been away for a few days, sorry for the late response!

Some starter questions, though - at least:

  1. why does this need different support (in IR, in LLVM, etc) rather than using the existing "Name" field, if there's a more suitable name than the current one

I thought adding something (lambda) specific to this DISubprogram representing the enclosing function can help debuggers to print some additional info in the final backtrace (such as [inlined] [lambda] from the example I shared).

  2. could you show the GCC+GDB behavior & contrast it with LLVM's (Clang+GDB/Clang+LLDB, etc) - what I could see didn't look too bad/different between Clang and GCC to me

I use gcc --version == g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
For a (closure class) type representing a lambda, GCC generates a DW_TAG_structure_type with DW_AT_name <lambda(int)>, whereas LLVM/Clang does not generate anything lambda-specific within the corresponding DW_TAG_class_type (Clang generates that tag for the closure class). As a consequence, for the GCC-compiled example, we end up with a backtrace like:

...
(gdb) bt
#0  <lambda(int)>::operator()

Ah, seems that's changed upstream since 9.2 - in current ToT GCC no name is produced so far as I've been able to reproduce:

0x0000004f: DW_TAG_structure_type

DW_AT_byte_size       (0x01)
DW_AT_decl_file       ("/usr/local/google/home/blaikie/dev/scratch/lambda.cpp")
DW_AT_decl_line       (3)
DW_AT_decl_column     (0x14)
DW_AT_sibling (0x0000010d)

The ctor/dtor still use the "<lambda>" name in them, FWIW.

The stack frame is "operator()" with no scope/context.

Yes, that is the case. I am using a newer version that comes along with the Ubuntu 19.10:

$ g++ --version
g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
...

but for LLVM/Clang generated exe:

...
#0  main::$_0::operator()
...

For example, if we have two closures within main():

--using gcc:
   -I was not able to print info about the frame that comes from the second closure (it prints nothing)

Can you provide complete reproduction steps to clarify the scenarios in each case (in this one, what exactly you mean by "prints nothing" and which frames in which places, etc) - source listings, GDB/LLDB command line logs, etc. Minimal examples as much as possible.

Sure. I'll try to illustrate that in the next comment.

--using clang:
  -It produces a subprogram with mangled name such as ($_2, $_3, etc.):
 	...
	#0  main::$_2::operator()
     ...

I think that these names are not that useful to end users when using debuggers. The output that GCC+GDB produces is more useful, but I think that having a standardized way of representing this would be desirable (with enclosing_function/method_name after this patch set, LLDB was able to print the name of enclosing function in the backtrace; the alternative is using regular DW_AT_name, but to introduce a flag within the closure class indicating that it comes from a lambda, i.e. DW_AT_lambda, instead of hard coding the name "lambda<>" within the tag representing the type).

What about the line and file number associated with these functions/function calls? Is that not useful/adequate to disambiguate the lambda, etc?

Hmm, good idea. I have not tried something like that, but I feel that a lambda-specific flag would still be missing for some verbose output.

  3. you mention "the lambda name" a few times, but lambdas don't have names - it looks like the name of the enclosing function is being used in some way?

I wrote it in a hurry, sorry, I was mistaken. As I said above, that should be something like "enclosing_function/method_name".

From what I can see from GCC and Clang's output (using Clang ToT, and GCC built from source/ToT at 2020/1/11) neither give the lambda type (DW_TAG_structure_type) a DW_AT_name at all

Actually it does. I use GCC 9.2, and it adds a name within the tag representing the type.

, and LLVM/Clang doesn't name the ctor/dtor (whereas GCC names them "<lambda>" and "~<lambda>") and Clang and GCC both name the actual operator() "operator()" - so it's not looking all that different?

That is very similar, except LLVM/Clang does not name the ctor/dtor, but I think it is not that important.

I put together some basic cases showing the current state of lambda debugging.

Let's consider two simple C++ cases using lambdas.
The files are attached.

Using GCC

$ g++ --version
g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

single_lambda.cpp:

$ g++ single_lambda.cpp -g -O2 -o gcc-single-lambda-call
$ gdb gcc-single-lambda-call
...
#0  <lambda(int)>::operator()
...

lldb does not parse the <lambda(int)> part:

$ lldb gcc-single-lambda-call
...
* frame #0: 0x00005555555550e7 gcc-single-lambda-call`main [inlined] operator(...
...

multiple_lambdas_call.cpp:

$ g++ multiple_lambdas_call.cpp -g -O2 -o gcc-multiple-lambdas-call
$ gdb gcc-multiple-lambdas-call
...
(gdb) b 6
(gdb) r
...
(gdb) bt
#0  <lambda(int)>::operator() ...
...

but stopping in the second lambda ('printnum2()') gives us this backtrace (it contains no frame for the second lambda):

...
(gdb) b 10
(gdb) r
...
(gdb) bt
#0  main () at multiple_lambdas_call.cpp:13
...

lldb outputs a similar backtrace.

Using CLANG

Using the Trunk version of LLVM/Clang.

single_lambda.cpp:

$ clang++ -g -O2 single_lambda.cpp -o clang-single-lambda-call
$ gdb clang-single-lambda-call
...
(gdb) b 6
(gdb) r
...
(gdb) bt
#0  main::$_0::operator()
...

$ lldb clang-single-lambda-call
...
(lldb) bt
...
  * frame #0: 0x0000000000401195 clang-single-lambda-call`main [inlined] main::$_0::operator(...

multiple_lambdas_call.cpp:

$ clang++ -g -O2 multiple_lambdas_call.cpp -o clang-multiple-lambdas-calls
...
(gdb) b 10
...
(gdb) r
...
(gdb) bt
#0  main::$_1::operator()
...

lldb outputs a similar backtrace.

With LLVM/Clang and LLDB patched with this set of patches, we have:

single_lambda.cpp:

$ clang++ -glldb -O2 single_lambda.cpp -o clang-patched-single-lambda
$ lldb ./clang-patched-single-lambda
...
(lldb) b 6
...
(lldb) r
...
(lldb) bt
...
  * frame #0: 0x0000000000401195 clang-patched-single-lambda`main [inlined] [lambda] printNum(this=<unavailable>, num=0)
...

In general I don't think this is a good idea - not all lambdas end up with variables that refer to them (and the name of that variable isn't the name of the lambda, in any case - that's the thing, the lambda is nameless). (e.g. this code crashes on "int main() { [] { }(); }" and produces the wrong answer for, say, "int main() { int narf; []{}(); }", and clang doesn't keep track of reverse edges (there's no list of users of a type, I think?), so it's not possible in general to answer the question of "what, if any, name was used to refer to this lambda?")
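
A few common patterns make that point concrete. This is a sketch only, and the function names are invented for illustration: in the first two cases no variable refers to the closure at all, and in the third, two variables refer to the same closure class, so there is no single "name" to pick.

```cpp
#include <algorithm>
#include <vector>

// Immediately invoked: the closure is a temporary and no variable names it.
int immediatelyInvoked() {
    return [] { return 42; }();
}

// Passed directly as an argument: again no variable refers to the closure.
int countEven(const std::vector<int> &values) {
    return static_cast<int>(
        std::count_if(values.begin(), values.end(),
                      [](int v) { return v % 2 == 0; }));
}

// One closure class reachable through two differently named variables.
int twoNamesOneLambda() {
    auto first = [](int v) { return v * 2; };
    auto second = first;  // same closure class, different variable name
    return first(1) + second(2);
}
```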

But if you really want to pursue this, please start a thread on llvm-dev (including the usual suspects - myself, echristo, aprantl, probinson) to discuss how lambda debugging might be improved.

@dblaikie Thanks a lot for your feedback!

In general I don't think this is a good idea - not all lambdas end up with variables that refer to them (and the name of that variable isn't the name of the lambda, in any case - that's the thing, the lambda is nameless). (eg: this code crashes on "int main() { [] { }(); }" and produces the wrong answer, for, say "int main() { int narf; []{}(); }" and clang doesn't keep track of reverse edges (there's no list of users of a type, I think?) so it's not possible in general to answer the question of "what, if any, name was used to refer to this lambda?")

Basically, this code was written in a couple of days, so I hardcoded it to work only for the simple case (single_lambda.cpp); it uses the firstDecl from the AST, which is not the right solution. In addition, I was mistaken in naming it lambda_name: since lambdas are nameless, a better name would be name_of_enclosing_func (or something similar). Anyhow, after seeing this in front of me, I also realized that name_of_enclosing_func is not that useful for all lambdas, so I think I'll change the approach.
IMPO, clang should handle lambdas (at least) the same way the newest GCC does (I've shared above how it looks; the debugging user experience is obviously better). It could even be improved; I have some ideas, and I'll make a prototype for that as soon as I find some time.
Please let me know what you think.

But if you really want to pursue this, please start a thread on llvm-dev (including the usual suspects - myself, echristo, aprantl, probinson) to discuss how lambda debugging might be improved.

Sure.

Herald added a project: Restricted Project. Apr 27 2020, 1:17 AM

@dblaikie Thanks a lot for your feedback!

In general I don't think this is a good idea - not all lambdas end up with variables that refer to them (and the name of that variable isn't the name of the lambda, in any case - that's the thing, the lambda is nameless). (eg: this code crashes on "int main() { [] { }(); }" and produces the wrong answer, for, say "int main() { int narf; []{}(); }" and clang doesn't keep track of reverse edges (there's no list of users of a type, I think?) so it's not possible in general to answer the question of "what, if any, name was used to refer to this lambda?")

Basically, this code was written in a couple of days, so I hardcoded it to work only for the simple case (single_lambda.cpp); it uses the firstDecl from the AST, which is not the right solution. In addition, I was mistaken in naming it lambda_name: since lambdas are nameless, a better name would be name_of_enclosing_func (or something similar). Anyhow, after seeing this in front of me, I also realized that name_of_enclosing_func is not that useful for all lambdas, so I think I'll change the approach.
IMPO, clang should handle lambdas (at least) the same way the newest GCC does (I've shared above how it looks; the debugging user experience is obviously better). It could even be improved; I have some ideas, and I'll make a prototype for that as soon as I find some time.
Please let me know what you think.

Have you tested against ToT GCC? It looked to me that ToT GCC was closer to/the same as Clang's output & I'm not sure it's better. (GCC prints "<lambda(int)>::operator()" where Clang prints "main::$_0::operator()" - and honestly, the latter seems more informative - about where the lambda comes from)

But if you really want to pursue this, please start a thread on llvm-dev (including the usual suspects - myself, echristo, aprantl, probinson) to discuss how lambda debugging might be improved.

Sure.

@dblaikie Thanks a lot for your feedback!

In general I don't think this is a good idea - not all lambdas end up with variables that refer to them (and the name of that variable isn't the name of the lambda, in any case - that's the thing, the lambda is nameless). (eg: this code crashes on "int main() { [] { }(); }" and produces the wrong answer, for, say "int main() { int narf; []{}(); }" and clang doesn't keep track of reverse edges (there's no list of users of a type, I think?) so it's not possible in general to answer the question of "what, if any, name was used to refer to this lambda?")

Basically, this code was written in a couple of days, so I hardcoded it to work only for the simple case (single_lambda.cpp); it uses the firstDecl from the AST, which is not the right solution. In addition, I was mistaken in naming it lambda_name: since lambdas are nameless, a better name would be name_of_enclosing_func (or something similar). Anyhow, after seeing this in front of me, I also realized that name_of_enclosing_func is not that useful for all lambdas, so I think I'll change the approach.
IMPO, clang should handle lambdas (at least) the same way the newest GCC does (I've shared above how it looks; the debugging user experience is obviously better). It could even be improved; I have some ideas, and I'll make a prototype for that as soon as I find some time.
Please let me know what you think.

Have you tested against ToT GCC? It looked to me that ToT GCC was closer to/the same as Clang's output & I'm not sure it's better. (GCC prints "<lambda(int)>::operator()" where Clang prints "main::$_0::operator()" - and honestly, the latter seems more informative - about where the lambda comes from)

Hmm, yes, I see. Something in between these two would be better (if possible), e.g.:

main::<lambda(int)>::operator()

or at least we could provide a lambda-specific flag in order to add additional info to the frame, like so:

[inlined] [lambda] main::operator()

but I think this is more appropriate for an RFC.
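
To make the two suggested spellings easier to compare, here is a small sketch of how a frame name could be assembled in each style. Everything in it is hypothetical: the LambdaFrame struct and both helper functions exist only for illustration and are not LLDB code or part of this patch set.

```cpp
#include <string>

// Hypothetical description of a frame that belongs to a lambda's operator().
struct LambdaFrame {
    std::string enclosingFunction;  // e.g. "main"
    std::string parameterTypes;     // e.g. "int"
    bool inlined;
};

// First suggestion: scope a GCC-style closure name by the enclosing function.
std::string scopedName(const LambdaFrame &f) {
    return f.enclosingFunction + "::<lambda(" + f.parameterTypes +
           ")>::operator()";
}

// Second suggestion: keep a plain name and prepend descriptive flags.
std::string flaggedName(const LambdaFrame &f) {
    std::string prefix = f.inlined ? "[inlined] " : "";
    return prefix + "[lambda] " + f.enclosingFunction + "::operator()";
}
```

The first form keeps the origin of the lambda in the name itself, while the second relies on a separate lambda flag that the debug info would have to carry.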

But if you really want to pursue this, please start a thread on llvm-dev (including the usual suspects - myself, echristo, aprantl, probinson) to discuss how lambda debugging might be improved.

Sure.

djtodoro planned changes to this revision. Jul 9 2020, 7:28 AM
djtodoro abandoned this revision. Jan 27 2022, 1:17 AM