This is an archive of the discontinued LLVM Phabricator instance.

[Demangle][Rust] Parse integer constants
ClosedPublic

Authored by tmiasko on May 10 2021, 10:00 AM.

Diff Detail

Event Timeline

tmiasko created this revision.May 10 2021, 10:00 AM
tmiasko requested review of this revision.May 10 2021, 10:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2021, 10:00 AM
dblaikie added inline comments.May 10 2021, 1:34 PM
llvm/lib/Demangle/RustDemangle.cpp
207–212

Previous code didn't have an error path for this missing piece - any particular reason to add it now (& thus have to test it) compared to not supporting this & having const or type as the only options (& if it's not const, parse it as a type - as the previous code did, parsing everything as a type regardless)

427

Somewhat surprising that though this production uses <basic-type> but the implementation doesn't call parseBasicType - any chance of sharing that implementation detail, trivial as it is?

458

grammar production comment for this?

466–471

switching between int and decimal for the demangling seems like it could be problematic for readers/parsers/etc, maybe? (what does llvm-cxxfilt do for equivalent C++ numbered values (like integer template parameters))

Maybe APSInt could be used which supports bigger (though not arbitrarily big, I don't think... - I think you have to pick the size, though I guess you could pick it dynamically based on the number of hex digits) integers.

llvm/test/Demangle/rust.test
151–185

Do these benefit from the second parameter that shows the type? Or is the type testing redundant with the existing type testing added in the previous patch?

@tmiasko Thanks for adding this support to llvm. We're already using your work in Android. Can you comment on the status of this support? For e.g., I see that the following name is not demangled:

_RNvYNtNtNtCs7WMZDM8jTtQ_10env_logger3fmt6writer6TargetNtNtCsfOHkQPwunBC_4core3cmp9PartialEq2neB8_.

This symbol is generated with rust 1.51 with -Z symbol-mangling-version=v0

tmiasko updated this revision to Diff 344381.May 11 2021, 6:09 AM
tmiasko marked 3 inline comments as done.

Address review comments

  • Add an basic type enum & reuse basic type parsing.
  • Use APInt to format all const integers as decimals.
llvm/lib/Demangle/RustDemangle.cpp
207–212

No particular reason. I reverted the changes so that code falls back to the type production as before.

427

I introduced an enum for a basic types and reused the parsing code. What do you think?

466–471

I used APInt to consistently format values as decimals. This matches rustc pretty printing which consistently uses decimal as well.

In itanium this is non-issue since values have decimal mangling.

llvm/test/Demangle/rust.test
151–185

I included a type argument in addition to a const integer value, so that demangled from clearly indicates what type is being tested, and to visually distinguish checks from each other (in case one of them fails for example).

@pirama path productions which are necessary to demangle most symbols are not implemented yet. I can let you know the implementation is complete. At that point any reports about not demangled symbols would be quite valuable.

dblaikie added inline comments.May 11 2021, 3:04 PM
llvm/lib/Demangle/RustDemangle.cpp
427

Looks pretty good - if you like, you could commit that as a preliminary refactoring-only commit. (might be a good point to follow the steps to get commit access for changes like this)

438–440

Hmm - I'm just realizing that it looks like the demangle library has no dependencies on the rest of LLVM (the StringView type was a hint I saw earlier - but now looking more closely).

I'm guessing that's intentional & I'm surprised this code links successfully - I'd expect some part of SmallString or APInt would have some link-time dependence on the rest of LLVM, which I think (looking at the CMake file at least) there's no dependency on from the demangle library? Hmm, maybe llvm-cxxfilt has such a dependency and it's being picked up by accident (are you using lld or a windows linker, by chance? Those will be a bit more flexible about fulfilling library dependencies, perhaps).

So maybe we'll have to stick with the previous implementation you had here :/ (if so, might be worth a comment explaining why - though I don't immediately know where this constraint for libDemangle is documented/comes from - could you check around a bit in the library/version control history, maybe?)

llvm/test/Demangle/rust.test
151–185

perhaps rather than relying on the type encoding to test the integer encoding, the type could be rendered in the name (replacing the "integer" string) instead? (though that may be harder to read in the mangling - maybe not though)

tmiasko added inline comments.May 12 2021, 5:35 AM
llvm/lib/Demangle/RustDemangle.cpp
438–440

The code of Itanium demangler (ItaniumDemangle.h, DemangleConfig.h, StringView.h, and Utility.h) is shared with libcxxabi. The initial commit that introduced demangler to LLVM, comments:

The code also has no dependencies on anything else in LLVM. To enforce
that I added it as another library. That way a BUILD_SHARED_LIBS will
fail if anyone adds an use of StringRef for example.

The build with BUILD_SHARED_LIBS does indeed fail.

The implementation of the Itanium demangler itself cannot any dependencies, but it seems to me that we could add them to the demangler component. I don't know if we want to do that. Functionally wise, the difference is rather minor. Apart from APInt used here, I was potentially looking into using unicode::isPrintable when demangling char constants.

tmiasko updated this revision to Diff 345229.May 13 2021, 11:15 AM
tmiasko marked 3 inline comments as done.
  • Remove dependencies from implementation.
  • Show integer constants wider than 64-bits as hexadecimal numbers.
  • Include integer types in non-mangled form in tests
dblaikie accepted this revision.May 13 2021, 12:36 PM

Seems good, thanks!

llvm/lib/Demangle/RustDemangle.cpp
553–555

Might be worth a more formal description here, something like:

The return value is unspecified if HexDigits.size() > 16.

(so you can't even rely on the return value being zero, maximum, or some other default value - it's arbitrary/meaningless)

This revision is now accepted and ready to land.May 13 2021, 12:36 PM
This revision was automatically updated to reflect the committed changes.
tmiasko marked 2 inline comments as done.