This is an archive of the discontinued LLVM Phabricator instance.

[demangler] Fix new/delete demangling
ClosedPublic

Authored by urnathan on Jan 28 2022, 7:53 AM.

Details

Summary

I discovered some demangler problems:

a) parsing of new expressions was broken, ignoring any 'gs' prefix
b) (when #a is fixed) badly formatted global new expressions
c) formatting of new and delete failed to correctly add whitespace

(a) happens as parseExpr swallows the 'gs' prefix but doesn't pass it to 'parseNewExpr'. It seems simpler to me to just code the new expression parsing directly in parseExpr, as is done for delete expressions.

(b) global new should be rendered something like '::new T' not '::operator new T'

(c) is resolved by being a bit more careful with whitespace.

Best shown with some examples (don't worry that these symbols are for impossible instantiations, that's not the point):

Old behaviour:
build/bin/llvm-cxxfilt _ZN2FnIXgsnw_iEEXna_ipiLi4EEEEEvv _ZN2FnIXnwLj4E_iEEXgsnaLj4E_ipiLi4EEEEEvv _ZN2FnIXgsdlLi4EEXdaLi4EEEEvv _ZN2FnIXdlLj4EEXgsdaLj4EEEEvv
void Fn<new int, new[] int(4)>() # No ::new
void Fn<new (4u)int, new[] (4u)int(4)>() # No ::new, poor whitespace
void Fn<::delete4, delete[] 4>() # missing necessary space
void Fn<delete4u, ::delete[] 4u>() # missing necessary space

New behaviour:
build/bin/llvm-cxxfilt _ZN2FnIXgsnw_iEEXna_ipiLi4EEEEEvv _ZN2FnIXnwLj4E_iEEXgsnaLj4E_ipiLi4EEEEEvv _ZN2FnIXgsdlLi4EEXdaLi4EEEEvv _ZN2FnIXdlLj4EEXgsdaLj4EEEEvv
void Fn<::new int, new[] int(4)>()
void Fn<new(4u) int, ::new[](4u) int(4)>()
void Fn<::delete 4, delete[] 4>()
void Fn<delete 4u, ::delete[] 4u>()

Binutils' behaviour:
c++filt _ZN2FnIXgsnw_iEEXna_ipiLi4EEEEEvv _ZN2FnIXnwLj4E_iEEXgsnaLj4E_ipiLi4EEEEEvv _ZN2FnIXgsdlLi4EEXdaLi4EEEEvv _ZN2FnIXdlLj4EEXgsdaLj4EEEEvv
void Fn<::new int, new int(4)>()
void Fn<new (4u) int, ::new (4u) int(4)>()
void Fn<::delete (4), delete[] (4)>()
void Fn<delete (4u), ::delete[] (4u)>()

The fixed and binutils demanglings are the same modulo some whitespace and optional parens.

Diff Detail

Event Timeline

urnathan requested review of this revision.Jan 28 2022, 7:53 AM
urnathan created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2022, 7:53 AM
urnathan edited the summary of this revision. (Show Details)
ChuanqiXu added inline comments.Feb 6 2022, 7:24 PM
libcxxabi/src/demangle/ItaniumDemangle.h
4802–4803

HaveInits is loop invariant, could we hoist the check?

4810

In case HaveInits is false and consumeIf('E') return true, the original behavior would be return make<NewExpr>(ExprList, Ty, NodeArray(), Global, IsArray);. Is the current behavior expected?

urnathan marked 2 inline comments as done.Feb 7 2022, 8:37 AM
urnathan added inline comments.
libcxxabi/src/demangle/ItaniumDemangle.h
4802–4803

no, see below

4810

You're misreading the old code -- there's a negation on the non-looping consume( 'E'). There's always an 'E', it's just that without 'pi' that 'E'must be immediate (there are no expressions).

Here's the grammar -- you always reach an 'E'

[gs] nw <expression>* _ <type> E # new (expr-list) type
[gs] nw <expression>* _ <type> <initializer> # new (expr-list) type (init)
[gs] na <expression>* _ <type> E # new[] (expr-list) type
[gs] na <expression>* _ <type> <initializer> # new[] (expr-list) type (init)
// <initializer> ::= pi <expression>* E # parenthesized initialization

(Not sure why the 'pi' initializer introducer is needed, probably history.)

OLD:
if ('pi') { while not 'E' {read expr; } build newexpr (exprs)
else if not 'E'; fail
else build newexpr ();

NEW:
haveinits = 'pi';
while not 'E' { if !haveinits; fail, else read expr }
build newexpr (exprs)

urnathan marked 2 inline comments as done.Feb 7 2022, 9:00 AM
urnathan added inline comments.
libcxxabi/src/demangle/ItaniumDemangle.h
4810

oh, of course 'pi' is needed as 'new int' and 'new int ()' are different. Not sure why 'new int {}' is not representable, perhaps no one's got there yet?

ChuanqiXu added inline comments.Feb 7 2022, 6:50 PM
libcxxabi/src/demangle/ItaniumDemangle.h
4810

Yeah, the point here is if 'pi' is not true, then there must be an immediate 'E'. Then the code makes sense. It is a little surprising to me that the parser of demangler wouldn't handle the wrong case. Maybe the basic assumption is that the input is valid.

For the case of 'new int {}', I guess it is represented as there are multiple <expression> between pi and 'E'.

urnathan added inline comments.Feb 9 2022, 8:40 AM
libcxxabi/src/demangle/ItaniumDemangle.h
4810

I'm not sure I follow about 'wouldn't handle the wrong case'. The parser should definitely not assume the input is valid. What invalid input have I missed?

ChuanqiXu added inline comments.Feb 9 2022, 5:52 PM
libcxxabi/src/demangle/ItaniumDemangle.h
4810

I mean the assumption: "There's always an 'E'. And without 'pi' that 'E' must be immediate". I think we get the conclusion from the grammar. But my question is "what if the input doesn't follow the grammar". Like there is not an immediate 'E' if 'p' is not presented.

urnathan added inline comments.Feb 10 2022, 3:37 AM
libcxxabi/src/demangle/ItaniumDemangle.h
4810

Yes, the grammar at this point permits either 'E' or 'pi' <expressions>* 'E' to appear. If we don't get 'pi' then HaveInits will be false. If we then don't get 'E' we'll enter the loop, and that will return nullptr as HaveInits is false. I don;t see a problem.

ChuanqiXu accepted this revision.Feb 10 2022, 3:51 AM
ChuanqiXu added inline comments.
libcxxabi/src/demangle/ItaniumDemangle.h
4810

Yeah, this is what I say about handling invalid input. If this is the policy or what demangler always tries to do, I have no problem.

This revision is now accepted and ready to land.Feb 10 2022, 3:51 AM
This revision was landed with ongoing or failed builds.Feb 10 2022, 4:33 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2022, 4:33 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript