This is an archive of the discontinued LLVM Phabricator instance.

CPlusPlusNameParser does not handles templated operator< properly
ClosedPublic

Authored by shafik on Mar 13 2020, 4:54 PM.

Details

Summary

CPlusPlusNameParser is used in several places on of them is during IR execution and setting breakpoints to pull information C++ like the basename, the context and arguments.

Currently it does not handle templated operator< properly. It used clang::Lexer which will tokenize operator<<A::B> into:

tok::kw_operator
tok::lessless
tok::raw_identifier

Later on the parser in ConsumeOperator() does not handle this case properly and we end up failing to parse.

Diff Detail

Event Timeline

shafik created this revision.Mar 13 2020, 4:54 PM
shafik added a comment.EditedMar 13 2020, 4:59 PM

My fix in ConsumeOperator() is not proper but if everyone feels this is correct approach I will create member functions to deal with this cleanly.

Other approaches could be modifying ExtractTokens() to detect this case and generate two tok::lessin place of tok::lessless but this feels like the wrong place to fix this. I tried to understand how clang handles this case since but it was not obvious. AFAICT they have to deal with this case too.

The approach seems reasonable to me.

lldb/unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp
153–155

You could add these to the "EXPECT_FALSE" list below

I am not sure what the usage scenario is that this is meant to support. Is it user input that tries to name a specialization of a template operator< without separation to prevent tokenization as operator<<? I think that case should qualify as user error.

Both C++17 subclause 17.2 [temp.names] and N4849 subclause 13.3 [temp.names] imbue the "angle-bracket" treatment of < to after identification of a name (not to the formation of the name itself) and does involve special treatment of <<. Special treatment in tokenization (C++17 subclause 5.4 [lex.pptoken]; N4849 subclause 5.4 [lex.pptoken]) for angle brackets extends to <: and not to <<.

All of Clang, GCC, ICC, and MSVC do not compile the following:

enum E { E0 };
template <typename T> void operator<(E, T);

void g() {
  operator<<int>(E0, 0);  // does not compile
}

Compiler Explorer (godbolt.org) >>link<<.

In other words, where is the input coming from? Should the producer of the input be corrected instead?

I am not sure what the usage scenario is that this is meant to support. Is it user input that tries to name a specialization of a template operator< without separation to prevent tokenization as operator<<? I think that case should qualify as user error.

Both C++17 subclause 17.2 [temp.names] and N4849 subclause 13.3 [temp.names] imbue the "angle-bracket" treatment of < to after identification of a name (not to the formation of the name itself) and does involve special treatment of <<. Special treatment in tokenization (C++17 subclause 5.4 [lex.pptoken]; N4849 subclause 5.4 [lex.pptoken]) for angle brackets extends to <: and not to <<.

All of Clang, GCC, ICC, and MSVC do not compile the following:

enum E { E0 };
template <typename T> void operator<(E, T);

void g() {
  operator<<int>(E0, 0);  // does not compile
}

Compiler Explorer (godbolt.org) >>link<<.

In other words, where is the input coming from? Should the producer of the input be corrected instead?

Hubert, you are indeed correct. I hit this realization when I was going to sleep yesterday.

This is an unfortunate oddity of debug info. When clang generates DW_AT_Names for templates it appends the template parameters to them, gcc also does this BTW.

Long-term I would like to modify clang to stop doing that for LLDB, but LLDB will still have to support older compilers for a while. So I think this fix is still needed.

Long-term I would like to modify clang to stop doing that for LLDB, but LLDB will still have to support older compilers for a while. So I think this fix is still needed.

So is this some alternative to D75761 (where we'd use CPlusPlusNameParser to decode DW_AT_names of templates)? If so, I think that is an interesting direction, but beware that that class is kind of meant for processing the demangler output. The contents of DW_AT_name looks a bit like a demangled name, but in reality there are some deviations from that format (which is why I did not recommend this direction initially -- but I am not against it either).

Also it looks like llvm and gnu demanglers disagree on the exact formatting of demangled operator names:

$ c++filt _ZlsI1AEvT_S1_
void operator<< <A>(A, A)
$ llvm-cxxfilt _ZlsI1AEvT_S1_
void operator<<<A>(A, A)

I think the gnu format is superior (and unambiguous) so we could change llvm to match that -- and this change probably won't require any kind of compatibility hacks.

Long-term I would like to modify clang to stop doing that for LLDB, but LLDB will still have to support older compilers for a while. So I think this fix is still needed.

So is this some alternative to D75761 (where we'd use CPlusPlusNameParser to decode DW_AT_names of templates)? If so, I think that is an interesting direction, but beware that that class is kind of meant for processing the demangler output. The contents of DW_AT_name looks a bit like a demangled name, but in reality there are some deviations from that format (which is why I did not recommend this direction initially -- but I am not against it either).

Also it looks like llvm and gnu demanglers disagree on the exact formatting of demangled operator names:

$ c++filt _ZlsI1AEvT_S1_
void operator<< <A>(A, A)
$ llvm-cxxfilt _ZlsI1AEvT_S1_
void operator<<<A>(A, A)

I think the gnu format is superior (and unambiguous) so we could change llvm to match that -- and this change probably won't require any kind of compatibility hacks.

This is not an alternative, this is a complement to that fix. So even with D75761 we still fail in expressions and setting breakpoints for symbols of that type. So both fixes are needed.

I personally would like to stop emitting template parameters for DW_AT_names but I know this will break some stuff and I have to see if it is workable or not, I think it is.

Lastly, we will still need to support older compilers regardless.

Long-term I would like to modify clang to stop doing that for LLDB, but LLDB will still have to support older compilers for a while. So I think this fix is still needed.

So is this some alternative to D75761 (where we'd use CPlusPlusNameParser to decode DW_AT_names of templates)? If so, I think that is an interesting direction, but beware that that class is kind of meant for processing the demangler output. The contents of DW_AT_name looks a bit like a demangled name, but in reality there are some deviations from that format (which is why I did not recommend this direction initially -- but I am not against it either).

Also it looks like llvm and gnu demanglers disagree on the exact formatting of demangled operator names:

$ c++filt _ZlsI1AEvT_S1_
void operator<< <A>(A, A)
$ llvm-cxxfilt _ZlsI1AEvT_S1_
void operator<<<A>(A, A)

I think the gnu format is superior (and unambiguous) so we could change llvm to match that -- and this change probably won't require any kind of compatibility hacks.

This is not an alternative, this is a complement to that fix. So even with D75761 we still fail in expressions and setting breakpoints for symbols of that type. So both fixes are needed.

I personally would like to stop emitting template parameters for DW_AT_names but I know this will break some stuff and I have to see if it is workable or not, I think it is.

Lastly, we will still need to support older compilers regardless.

sounds good to me.

aprantl added inline comments.Mar 20 2020, 9:32 AM
lldb/source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp
333

Add a comment what this block is doing?

334

Thinking out loud: This is looking ahead two tokens.

342

here we are pushing an artificial less token, but it's not obvious why what works, so if you could add a comment explaining what is going on here, that would help.

shafik updated this revision to Diff 252941.Mar 26 2020, 12:21 PM
shafik marked 4 inline comments as done.

Addressing comments:

  • Adding more detailed comments
  • Adding test for cases that currently fail b/c we don't enable C++20
aprantl added inline comments.Mar 26 2020, 6:42 PM
lldb/source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp
333

"we" is super ambiguous. This is LLDB, so you may want to say Clang and/or GCC does this.

339

we with then

343

.

345

.

347

.

lldb/unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp
147

... how Clang generates debug info?

shafik updated this revision to Diff 253142.Mar 27 2020, 9:24 AM
shafik marked 6 inline comments as done.

Fixing up loose language in the comments and adding periods.

aprantl accepted this revision.Mar 27 2020, 1:34 PM
This revision is now accepted and ready to land.Mar 27 2020, 1:34 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2020, 2:54 PM