This is an archive of the discontinued LLVM Phabricator instance.

[demangler] Support the new Rust mangling scheme (v0)
AbandonedPublic

Authored by tmiasko on Apr 6 2021, 11:33 AM.

Details

Reviewers
jhenderson
Summary

This patch implements a demangling support for symbols using new Rust
mangling scheme (v0).

The Rust is currently migrating to this new scheme. The progress is
tracked in https://github.com/rust-lang/rust/issues/60705.

  • The v0 mangling scheme uses "_R" as a prefix, which makes it easy to distinguish it from other mangling schemes.
  • The public API is modeled after __cxa_demangle / llvm::itaniumDemangle, since potential candidates for further integration use those.
  • The llvm-cxxfilt is extended to support Rust mangling and used to test the implementation.

References (specification and relevant parts of implementation in Rust):

Diff Detail

Event Timeline

tmiasko created this revision.Apr 6 2021, 11:33 AM
tmiasko updated this revision to Diff 335675.Apr 6 2021, 3:50 PM
tmiasko edited the summary of this revision. (Show Details)

...

tmiasko published this revision for review.Apr 6 2021, 3:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2021, 3:52 PM

If you are looking for a full review of this, try adding specific people as reviewers who have worked on the demangling library. It would also make sense to split this change up into lots of small parts. For example, you could implement a small subset of the Rust demangling scheme, in a first patch, and expand on it in a series of subsequent patches. That'll make each patch easier to review. Connecting it up to llvm-cxxfilt could also be a separate patch from implementing the demangling code.

llvm/test/tools/llvm-cxxfilt/rust.test
1 ↗(On Diff #335675)

llvm-cxxfilt tool tests are not the place to test the full library functionality. - that should be tested in the unittests. This test should contain just enough to cover the behaviour implemented in llvm-cxxfilt itself.

If you are looking for a full review of this, try adding specific people as reviewers who have worked on the demangling library. It would also make sense to split this change up into lots of small parts. For example, you could implement a small subset of the Rust demangling scheme, in a first patch, and expand on it in a series of subsequent patches. That'll make each patch easier to review. Connecting it up to llvm-cxxfilt could also be a separate patch from implementing the demangling code.

+1 to all that - incremental patches would probably help here

llvm/test/tools/llvm-cxxfilt/rust.test
1 ↗(On Diff #335675)

I'd disagree here - generally if there's a fairly simple/thin wrapper around a library, we do test it via lit/command line tests rather than unit tests. And that seems to be how llvm-cxxfilt/Demangle is tested currently (most cases seem to be tested in llvm/test/tools/llvm-cxxfilt, some in llvm/test/Demangle (using llvm-cxxfilt), but not much is tested in llvm/unittests/Demangle.

jhenderson added inline comments.Apr 22 2021, 1:14 AM
llvm/test/tools/llvm-cxxfilt/rust.test
1 ↗(On Diff #335675)

Not sure I agree with that characterisation. With the exception of the abitag.test test in tools/llvm-cxxfilt, the tests are all primarily testing llvm-cxxfilt level functionality (indirectly some exercise parts of the demangling code too, but I think that's not their purpose).

I don't mind if it makes sense to test things at a lit level, but they should at least be in llvm/test/Demangle. That being said, a unit test invocation for this should simpler than writing a lit test. See for example DemangleTest.cpp, where each case is a single line - a lit test case would require at least two process executions (llvm-cxxfilt and FileCheck), and a FileCheck CHECK pattern, so will inevitably longer. Anecdotally, I also feel like it would be easier to make mistakes in the lit test, although that could be resolved relatively easily by using the right FileCheck flags.

dblaikie added inline comments.Apr 22 2021, 10:30 AM
llvm/test/tools/llvm-cxxfilt/rust.test
1 ↗(On Diff #335675)

Taking a more detailed look - only looking at itanium manglings since I know how to grep for them easily enough (_Z):

unittests/Demangle/DemangleTest.cpp: 5 mangled names/demanglings
unittests/Demangle/ItaniumDemangleTest.cpp: 1 (presumably a test of API syntax/ergonomics, not of the demangling algorithm)
unittests/Demangle/PartialDemangleTest.cpp: 38 - this is an API designed for novel programmatic use (we use it inside Google for rewriting certain mangled names efficiently for profile matching when core library symbol names change for whatever reason)
test/Demangle: Only contain Microsoft demangling test cases (using llvm-undname, the MS equivalent of llvm-cxxfilt)
test/tools/llvm-cxxfilt: 56 (well, more, since several tests have more than one mangled name per line) mangled names

And especially llvm/test/tools/llvm-cxxfilt/delimiters.test seems to be the most comprehensive piece of demangler algorithm testing, for instance.

The first testing when the library was introduced was done with llvm-cxxfilt ( rGb940b66c6032096d40dfd1859e2598749a30378c ), not a unit test (& that test was initially in llvm/test/Demangle, though it got moved to llvm/tools/llvm-cxxfilt in rG7091820a969ed3503806e28dae371d1b11a9da6d )

This is generally true of LLVM testing - if there's a convenient command line wrapper around the functionality, then lit testing tends to be the focus. (see symbolizer, dwarfdump, objdump, etc)

jhenderson added inline comments.Apr 23 2021, 1:13 AM
llvm/test/tools/llvm-cxxfilt/rust.test
1 ↗(On Diff #335675)

We could go back and forth on this a while, I feel.

delimiters.test may have multiple manglings, but the demangling behaviour isn't what's being tested in that test. The purpose of that test is to ensure that the right non-alphanumeric characters act as a delimiter between two character sequences for llvm-cxxfilt's demangling process (see rG5cd5f8f2563395f8767f94604eb4c4bea8dcbea0 where it was introduced). Note that actually it's the same very simple mangled name that is demangled in every case. This behaviour is entirely within the llvm-cxxfilt tool code, not the library code.

There are similar stories for most of the other tests in tools/llvm-cxxfilt too. In general, it looks like there is very little dedicated testing of the Itanium demangler code, which is probably part of the problem here.

In other GNU binutils replacements, we try to ensure tests that are testing the underlying library code are in test/<library name> rather than test/tool/<tool name>. It's not perfect, but it has been a goal. The idea is that developers of a library that is used by multliple tools can find the tests for that library without having to go through potentially multiple different test/tool/ directories. It also helps the test directory structure reflect the llvm code structure better. Example: if I made a change to a Support class, I'd usually add my test to test/Support, if unit tests aren't viable.

This is generally true of LLVM testing - if there's a convenient command line wrapper around the functionality, then lit testing tends to be the focus. (see symbolizer, dwarfdump, objdump, etc)

If I understand it correctly, this is partly for legacy reasons - unit tests haven't been around in LLVM for all that long, so people use what they are used to. It's also because it's often easier to write the lit test than it is to get the setup right for a unit test, which is perfectly reasonable to me. That being said, I'm not opposed to using lit tests instead of unit tests, if the developer prefers (as long as the lit tests are in the right place), although it's worth pointing out that if this patch were to be split up, llvm-cxxfilt support would probably belong in a separate patch, and therefore testing would HAVE to be put into the unit tests.

Maybe I'm not reflecting the wider LLVM view? Should this be brought up on llvm-dev?

Figured I'd move this out of line (so it isn't narrowed to half the screen width):

We could go back and forth on this a while, I feel.

delimiters.test may have multiple manglings, but the demangling behaviour isn't what's being tested in that test. The purpose of that test is to ensure that the right non-alphanumeric characters act as a delimiter between two character sequences for llvm-cxxfilt's demangling process (see rG5cd5f8f2563395f8767f94604eb4c4bea8dcbea0 where it was introduced). Note that actually it's the same very simple mangled name that is demangled in every case. This behaviour is entirely within the llvm-cxxfilt tool code, not the library code.

Fair point - sorry I missed that detail.

There are similar stories for most of the other tests in tools/llvm-cxxfilt too. In general, it looks like there is very little dedicated testing of the Itanium demangler code, which is probably part of the problem here.

Yep

In other GNU binutils replacements, we try to ensure tests that are testing the underlying library code are in test/<library name> rather than test/tool/<tool name>. It's not perfect, but it has been a goal. The idea is that developers of a library that is used by multliple tools can find the tests for that library without having to go through potentially multiple different test/tool/ directories. It also helps the test directory structure reflect the llvm code structure better. Example: if I made a change to a Support class, I'd usually add my test to test/Support, if unit tests aren't viable.

Yep, with you there with regard to putting tests in the right directory. (except the last bit, as I'm more "unit tests when lit tests aren't viable (including significant ambiguity - if the functionality is used in a bunch of different places/ways)")

This is generally true of LLVM testing - if there's a convenient command line wrapper around the functionality, then lit testing tends to be the focus. (see symbolizer, dwarfdump, objdump, etc)

If I understand it correctly, this is partly for legacy reasons - unit tests haven't been around in LLVM for all that long, so people use what they are used to.

It was/is a fairly intentional choice (in part Chris Lattner was not a fan of unit tests in general (one of the funnier examples: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130819/185033.html )) - unit tests for code that has a lot of out-of-tree API usage, or just a lot of varied usage internally (like ADTs, in both counts) is worth unit testing. But something with a fairly specific use case/pretty thin mapping to a command line tool (like optimizations - the opt tool testing of middle end optimizations, symbolizing - llvm-symbolizer's such a basic wrapper around that functionality that we test the interesting parts of the algorithm that way and only use unit tests for the interesting parts of the API usage)

I mean, the gunit code & has been in-tree since 2008, so it's not exactly new.

It's also because it's often easier to write the lit test than it is to get the setup right for a unit test, which is perfectly reasonable to me. That being said, I'm not opposed to using lit tests instead of unit tests, if the developer prefers (as long as the lit tests are in the right place),

I have slightly stronger feelings that this is a community norm that I think is good to reinforce/encourage as a matter of uniformity across the project.

Fair point about where the tests go - they do get a bit chaotic - llvm-symbolizer/libDebugInfoDWARF symbolizing functionality tests are split pretty well between llvm/test/DebugInfo and llvm/test/tools/llvm-symbolizer. Not the end of the world, but good to try to improve consistency in this regard.

although it's worth pointing out that if this patch were to be split up, llvm-cxxfilt support would probably belong in a separate patch, and therefore testing would HAVE to be put into the unit tests.

Actually I'd favor splitting it the other way - adding the llvm-cxxfilt support first (with a no-op or trivial example working) then adding new demangling functionality in incremental patches adding more lit tests to it. It's not uncommon to add features to command line tools (like llvm-symbolizer) partly to exercise an otherwise API-only feature

Maybe I'm not reflecting the wider LLVM view?

That's my take on it/why I chimed in.

Should this be brought up on llvm-dev?

If you like - I think the source tree and kinds of test we mostly write on a regular basis shows the norms here.

tmiasko updated this revision to Diff 340292.Apr 24 2021, 9:26 AM
tmiasko edited the summary of this revision. (Show Details)

Reduce scope of changes

Remove punycode support.
Remove fuzzer.
Remove llvm::Demangle integration.
Move tests to llvm/test/Demangle.

Yep, with you there with regard to putting tests in the right directory. (except the last bit, as I'm more "unit tests when lit tests aren't viable (including significant ambiguity - if the functionality is used in a bunch of different places/ways)")

Fair enough.

If I understand it correctly, this is partly for legacy reasons - unit tests haven't been around in LLVM for all that long, so people use what they are used to.

It was/is a fairly intentional choice (in part Chris Lattner was not a fan of unit tests in general (one of the funnier examples: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130819/185033.html )) - unit tests for code that has a lot of out-of-tree API usage, or just a lot of varied usage internally (like ADTs, in both counts) is worth unit testing. But something with a fairly specific use case/pretty thin mapping to a command line tool (like optimizations - the opt tool testing of middle end optimizations, symbolizing - llvm-symbolizer's such a basic wrapper around that functionality that we test the interesting parts of the algorithm that way and only use unit tests for the interesting parts of the API usage)

I mean, the gunit code & has been in-tree since 2008, so it's not exactly new.

Guess I didn't understand correctly! Thanks.

Actually I'd favor splitting it the other way - adding the llvm-cxxfilt support first (with a no-op or trivial example working) then adding new demangling functionality in incremental patches adding more lit tests to it. It's not uncommon to add features to command line tools (like llvm-symbolizer) partly to exercise an otherwise API-only feature

Fine by me.

@tmiasko, I think the minimal functionality @dblaikie is suggesting would be something like implementing the bare minimal support for demangling something like _RC3foo (assuming that's a valid mangling). That allows you to have the llvm-cxxfilt code and a test file in place. Then in multiple subsequent patches, you gradually expand the functionality to cover other manglings.

llvm/lib/Demangle/RustDemangle.cpp
25–29

FWIW, I don't think there's a way to test this block without unit-testing, but I'm fine if the bulk wants to be in the lit tests. Same goes with a number of the other potential failure paths.

Yep, with you there with regard to putting tests in the right directory. (except the last bit, as I'm more "unit tests when lit tests aren't viable (including significant ambiguity - if the functionality is used in a bunch of different places/ways)")

Fair enough.

If I understand it correctly, this is partly for legacy reasons - unit tests haven't been around in LLVM for all that long, so people use what they are used to.

It was/is a fairly intentional choice (in part Chris Lattner was not a fan of unit tests in general (one of the funnier examples: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130819/185033.html )) - unit tests for code that has a lot of out-of-tree API usage, or just a lot of varied usage internally (like ADTs, in both counts) is worth unit testing. But something with a fairly specific use case/pretty thin mapping to a command line tool (like optimizations - the opt tool testing of middle end optimizations, symbolizing - llvm-symbolizer's such a basic wrapper around that functionality that we test the interesting parts of the algorithm that way and only use unit tests for the interesting parts of the API usage)

I mean, the gunit code & has been in-tree since 2008, so it's not exactly new.

Guess I didn't understand correctly! Thanks.

Actually I'd favor splitting it the other way - adding the llvm-cxxfilt support first (with a no-op or trivial example working) then adding new demangling functionality in incremental patches adding more lit tests to it. It's not uncommon to add features to command line tools (like llvm-symbolizer) partly to exercise an otherwise API-only feature

Fine by me.

@tmiasko, I think the minimal functionality @dblaikie is suggesting would be something like implementing the bare minimal support for demangling something like _RC3foo (assuming that's a valid mangling). That allows you to have the llvm-cxxfilt code and a test file in place. Then in multiple subsequent patches, you gradually expand the functionality to cover other manglings.

Yep, something along those lines.

llvm/lib/Demangle/RustDemangle.cpp
25–29

Yeah, API usage issues like this should be exercised by a unit test.

Though I wonder if it might be simpler to not support this and instead assert that the parameters are non-null as needed, rather than that the caller would get some error handling result.

Thanks for suggestions @dblaikie, @jhenderson. I will create a separate preparatory patch that should be easier to a review.

llvm/lib/Demangle/RustDemangle.cpp
25–29

The main motivation behind the current API choice was to facilitate further integration. Places where I would like to eventually support Rust demangling use itaniumDemangle for the most part, so I settled on the same API. The microsoftDemangle has similar API as well.

This API does unfortunately require a bit of extra code and unit tests to cover it (tests are in RustDemangleTest.cpp).

dblaikie added inline comments.Apr 29 2021, 2:26 PM
llvm/lib/Demangle/RustDemangle.cpp
25–29

Ah, fair enough.

The existing API also has some quirky memory management which is unfortunate (passing in malloc'd memory and reallocing, etc, internally) - but such is life.

pirama added a subscriber: pirama.Apr 30 2021, 1:00 PM
tmiasko abandoned this revision.Jun 23 2021, 3:07 PM

For the record, support for Rust mangling scheme landed in:

With the only remaining part being a support for non-ASCII identifiers in D104366.

Thanks for review David!

For the record, support for Rust mangling scheme landed in:

With the only remaining part being a support for non-ASCII identifiers in D104366.

Thanks for review David!

Awesome - thanks for your patience/all the patches & tests. I do have D104366 in my queue, it's just taking me a while longer to get the time to focus on it and understand punycode encoding enough to check over the implementation, etc.