This is an archive of the discontinued LLVM Phabricator instance.

[demangler] Add operator precedence
ClosedPublic

Authored by urnathan on Mar 3 2022, 6:46 AM.

Details

Summary

The demangler had no concept of operator precendence, and would parenthesize many more subexpressions than necessary. In particular it would parenthesize primary-expressions, such as '4', which just looks strange. It would also parenthesize '>' expressions, just in case they were inside a template parameter list.

This patch fixes both issues.

  • Add operator precedence to the OpInfo structure, and add a subexpression helper that will parenthesize a lower precedence subexpression.
  • Add a 'greater-than is greater-than' indicator to the output buffer, so the expression printer knows whether it is immediately inside a template parameter list (and must therefore parenthesize 'expr > expr'). This is a counter, so that ...
  • Add open and close printers to the output buffer, that increment and decrement the gt-is-gt indicator.
  • Parenthesize comma operators inside comma-separated lists. (probably a rare case, but still).

This dramatically reduces the extraneous parentheses being printed.

Diff Detail

Event Timeline

urnathan created this revision.Mar 3 2022, 6:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 6:46 AM
urnathan requested review of this revision.Mar 3 2022, 6:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 6:46 AM
bruno requested changes to this revision.Mar 3 2022, 3:32 PM

Nice cleanup, fixes and testcases, just pending test failures and this looks good.

libcxxabi/src/demangle/ItaniumDemangle.h
283

Nice!

This revision now requires changes to proceed.Mar 3 2022, 3:32 PM
urnathan updated this revision to Diff 412984.Mar 4 2022, 4:46 AM

remove C++14 auto use -- thought I'd already done that :(

urnathan updated this revision to Diff 413082.Mar 4 2022, 11:23 AM
urnathan updated this revision to Diff 413443.Mar 7 2022, 6:21 AM

sigh, a test case change had got attached to the wrong local commit. this time for sure!

urnathan updated this revision to Diff 417563.EditedMar 23 2022, 4:16 AM
urnathan edited reviewers, added: erichkeane, dblaikie; removed: ChuanqiXu.

Rebased. I believe the bruno-requested changes are done.

dblaikie accepted this revision.Mar 23 2022, 11:22 AM

Looks like the buildbot failures include two timeouts, and a format check that's already failing, perhaps? If that's it, then seems OK to me.

Looks like the buildbot failures include two timeouts, and a format check that's already failing, perhaps? If that's it, then seems OK to me.

Yes, the format failure is because the entire test program is badly formatted, and mostly contained in one many-thousand-line array, which any new mangling test adds elements too. And hence the formatter reformats the entire array. (and reformatting the array is (a) unwanted and (b) blows up the diff limit of phabricator)

Looks like the buildbot failures include two timeouts, and a format check that's already failing, perhaps? If that's it, then seems OK to me.

Yes, the format failure is because the entire test program is badly formatted, and mostly contained in one many-thousand-line array, which any new mangling test adds elements too. And hence the formatter reformats the entire array. (and reformatting the array is (a) unwanted and (b) blows up the diff limit of phabricator)

Yeah - if you like, it's probably fine to reformat that and commit it directly without review before or after this patch. But no big deal either way.

rjmccall added inline comments.Mar 24 2022, 1:21 PM
libcxxabi/src/demangle/ItaniumDemangle.h
1752–1760

Does this not need to apply to >> for some reason, or does demangling just have a longstanding bug?

urnathan added inline comments.Mar 25 2022, 4:22 AM
libcxxabi/src/demangle/ItaniumDemangle.h
1752–1760

you remind me that I puzzled about this, it is indeed a long-standing bug that we can deal with in a followup?

pre-patch we special cased ">" in an equivalent manner to here, but not ">>":

// might be a template argument expression, then we need to disambiguate
// with parens.
if (InfixOperator == ">")
  OB += "(";

I guess that predates C++'s splitting of >> in template arg parsing?

bruno accepted this revision.Mar 25 2022, 11:26 AM

LGTM

This revision is now accepted and ready to land.Mar 25 2022, 11:26 AM
This revision was landed with ongoing or failed builds.Mar 28 2022, 6:18 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2022, 6:18 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
goncharov added inline comments.
llvm/include/llvm/Demangle/ItaniumDemangle.h
1749

I am not familiar with the code at all, but it seems to be an issue. We have downstream code that relies on the ability to match mode from ctor, but this "match" and other definitions have not been updated along with ctors. Do you mind reverting the patch for now?

ldionne added inline comments.
llvm/include/llvm/Demangle/ItaniumDemangle.h
1749

Can you give a bit more context regarding what issue you're encountering? Since this was landed some time ago, I think it would be best to fix-forward if we can.

frgossen added inline comments.
llvm/include/llvm/Demangle/ItaniumDemangle.h
1749

We have a use of match that expects to invoke the constructor with the arguments it's being passed. That use is missing the precedence argument now, which is not being passed here but it is required for the constructor above.

I am not sure if this is a false use of match or if the match function should deal with the prec argument also.

Ah, thanks for posting - I committed an incomplete fix (didn't touch all the other ctors) in 1d1cf9b6c42c820f38eb776cb7504564441a23ee - I think the right thing is probably to do-nothing in print(Node::Prec prec) - since I assume/my understanding is that the printing already accounts for precedence by adding parentheses etc where needed?

llvm/include/llvm/Demangle/ItaniumDemangle.h
1749

4 hours isn't an especially large window for a revert, but yeah - some more context would be good. I'm working on providing that now/too.

But, yeah, it looks like, judging by other "match" functions it should have the same arg list as the ctor.

Ah, thanks for posting - I committed an incomplete fix (didn't touch all the other ctors) in 1d1cf9b6c42c820f38eb776cb7504564441a23ee - I think the right thing is probably to do-nothing in print(Node::Prec prec) - since I assume/my understanding is that the printing already accounts for precedence by adding parentheses etc where needed?

In any case, @urnathan - could you go ahead and commit the rest of the match fixes (& maybe for now leave my no-op version of print(Prec) in place - happy to discuss further what that should/shouldn't do here before we make further changes)? Or I can if you like.

Ah, thanks for posting - I committed an incomplete fix (didn't touch all the other ctors) in 1d1cf9b6c42c820f38eb776cb7504564441a23ee - I think the right thing is probably to do-nothing in print(Node::Prec prec) - since I assume/my understanding is that the printing already accounts for precedence by adding parentheses etc where needed?

In any case, @urnathan - could you go ahead and commit the rest of the match fixes (& maybe for now leave my no-op version of print(Prec) in place - happy to discuss further what that should/shouldn't do here before we make further changes)? Or I can if you like.

I have pushed:

c204cee642ee 2022-03-29 | [demangler] Update node match calls

to address the missing ctor args in the match functions. The Visitor print call you mention, and a variant in cxa_demangle.cpp's NDEBUG code (it is sad there's more duplication), should print the name of the enumerator -- these are visitors for dumping a textual representation of the node graph.

I'll post a patch to do that, along with a testcase exercising this piece of machinery, for review.

I see parenthesis around simple types like int. Not sure if this is intended.

I see parenthesis around simple types like int. Not sure if this is intended.

do you mean things like '(int)1' --that's a cast (and I remember being tripped up by that. If it's something else can you provide a testcase?

frgossen added a comment.EditedMar 29 2022, 10:49 AM

It's wrapping the function args. I think it's just the space that confused me. Here's the example for completeness.

std::enable_if<sizeof (int) == 4, int>::type func<int>

It's wrapping the function args. I think it's just the space that confused me. Here's the example for completeness.

std::enable_if<sizeof (int) == 4, int>::type func<int>

erm, there are no function args in that example? But the enable_if template arg list looks syntactically correct to me. I've not changed the space in 'sizeof[SPC]([expr|type]) (or typeid/noexcep/alignof)

Ah, thanks for posting - I committed an incomplete fix (didn't touch all the other ctors) in 1d1cf9b6c42c820f38eb776cb7504564441a23ee - I think the right thing is probably to do-nothing in print(Node::Prec prec) - since I assume/my understanding is that the printing already accounts for precedence by adding parentheses etc where needed?

In any case, @urnathan - could you go ahead and commit the rest of the match fixes (& maybe for now leave my no-op version of print(Prec) in place - happy to discuss further what that should/shouldn't do here before we make further changes)? Or I can if you like.

I have pushed:

c204cee642ee 2022-03-29 | [demangler] Update node match calls

to address the missing ctor args in the match functions. The Visitor print call you mention, and a variant in cxa_demangle.cpp's NDEBUG code (it is sad there's more duplication), should print the name of the enumerator -- these are visitors for dumping a textual representation of the node graph.

I'll post a patch to do that, along with a testcase exercising this piece of machinery, for review.

awesome, thanks!