This is an archive of the discontinued LLVM Phabricator instance.

[Demangle] convert itaniumDemangle and nonMicrosoftDemangle to use std::string_view
ClosedPublic

Authored by nickdesaulniers on May 2 2023, 11:53 AM.

Details

Summary

D149104 converted llvm::demangle to use std::string_view. Enabling
"expensive checks" (via -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON) causes
lld/test/wasm/why-extract.s to fail. The reason for this is obscure:

Reason #10007 why std::string_view is dangerous:
Consider the following pattern:

std::string_view s = ...;
const char *c = s.data();
std::strlen(c);

Is c a NUL-terminated C style string? It depends; but if it's not then
it's not safe to call std::strlen on the std::string_view::data().
std::string_view::length() should be used instead.

Fixing this fixes the one lone test that caught this.

microsoftDemangle, rustDemangle, and dlangDemangle should get this same
treatment, too. I will do that next.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2023, 11:53 AM
nickdesaulniers requested review of this revision.May 2 2023, 11:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2023, 11:53 AM

This doesn't make any sense. If the argument "name" is live on entry to the function, it'll remain alive until the function exit. Converting it to a string_view should maintain the same lifetime. Converting it to an std::string actually makes the lifetime shorter.

This doesn't make any sense. If the argument "name" is live on entry to the function, it'll remain alive until the function exit. Converting it to a string_view should maintain the same lifetime. Converting it to an std::string actually makes the lifetime shorter.

I agree with the analysis. It's more likely a caller of maybeDemangleSymbol passes a temporary whose lifetime has ended.

Switching to string_view may expose (due to expensive checks setting _GLIBCXX_DEBUG) latent bugs.

nickdesaulniers planned changes to this revision.May 2 2023, 12:50 PM

Looks like the problematic call site is writeWhyExtract() in lld/wasm/Driver.cpp. Trying to figure out why we're not demangling the symbol reference though still...

The call to std::strlen(MangledName) in llvm::itaniumDemangle looks suspicious and I'll bet we're having trouble demangling non-null terminated string_views.

The call to std::strlen(MangledName) in llvm::itaniumDemangle looks suspicious and I'll bet we're having trouble demangling non-null terminated string_views.

Yeah, something funny going on where the MangledName.length() != std::strlen(MangledName.data()) for this test case. Digging...

Testing a more complete fix...

sorry, the point I sync'ed to doesn't test cleanly with expensive checks enabled, so it's hard to know if my change is regressing anything new...(and this takes FOREVER to build and run tests with expensive checks enabled)...

sorry, the point I sync'ed to doesn't test cleanly with expensive checks enabled, so it's hard to know if my change is regressing anything new...(and this takes FOREVER to build and run tests with expensive checks enabled)...

You may just run ninja check-lld-wasm. The change does not affect other testsuites..

nickdesaulniers retitled this revision from [LLD] fix llvm::demangle lifetime issue to [Demangle] convert itaniumDemangle and nonMicrosoftDemangle to use std::string_view.
nickdesaulniers edited the summary of this revision. (Show Details)
  • redo patch completely
nickdesaulniers edited the summary of this revision. (Show Details)May 2 2023, 3:28 PM
nickdesaulniers planned changes to this revision.May 2 2023, 3:31 PM
nickdesaulniers added inline comments.
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
91–94

This needs a bounds check; otherwise a few lllvm-cxxfilt tests fail an assertion.

nickdesaulniers edited the summary of this revision. (Show Details)
  • use starts_with for bounds check, rewrite part of commit message

Maybe it makes sense to revert D149104 while you're fixing the various demanglers to correctly respect the input length?

llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
91–94

DecoratedStr.substr(6) == "__imp_" looks wrong. Can we not just use DecoratedStr.starts_with("__imp_")?

nickdesaulniers added inline comments.May 2 2023, 3:42 PM
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
94

Oh, look, another interface where nullptr is passed excessively by EVERY call site. Smells like another cleanup similar to D148940. Will do so in a follow up patch.

Is c_str2 NUL-terminated? Good question. It depends on whether c_str was NUL-terminated. When s in the above example is a std::string, then it doesn't matter whether c_str is NUL-terminated or not, c_str2 ALWAYS will be.

I am still confused... Is the bug in lld/wasm or llvm/lib/Demangle?

If the former it seems that we need a patch fixing just that part. Changing the signature of llvm::itaniumDemangle is useful but can be a separate patch.

If the latter, which const char * variable points a non-NUL-terminated string that leads to the lld/test/wasm failure?

nickdesaulniers marked an inline comment as done.May 2 2023, 3:44 PM

Maybe it makes sense to revert D149104 while you're fixing the various demanglers to correctly respect the input length?

This should fix-forwards the test. Sorry it took me so long to root cause.

llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
91–94

mid-air collision on your comment, PTAL. also phab is being slow for me today.

nickdesaulniers marked an inline comment as done.May 2 2023, 3:51 PM

Is c_str2 NUL-terminated? Good question. It depends on whether c_str was NUL-terminated. When s in the above example is a std::string, then it doesn't matter whether c_str is NUL-terminated or not, c_str2 ALWAYS will be.

I am still confused... Is the bug in lld/wasm or llvm/lib/Demangle?

llvm/lib/Demangle

If the former it seems that we need a patch fixing just that part. Changing the signature of llvm::itaniumDemangle is useful but can be a separate patch.

False. Passing a c_str that may not be null terminated without passing the length then calling std::strlen on it gives bonkers results. The interface of llvm::itaniumDemangle must be changed to pass the size, such as by passing a std::string_view.

If the latter, which const char * variable points a non-NUL-terminated string that leads to the lld/test/wasm failure?

The std::string_view's data() in llvm::demangle is passed to llvm::itaniumDemangle which calls std::strlen.

efriedma accepted this revision.May 2 2023, 3:51 PM

LGTM

Maybe it makes sense to revert D149104 while you're fixing the various demanglers to correctly respect the input length?

This should fix-forwards the test. Sorry it took me so long to root cause.

If I understand the situation correctly, all the other demanglers read past the end of the buffer too; we just get lucky that none of the tests on the buildbots use the API in a way which exposes that. I'd prefer to revert to known working, then fix each demangler, then fix the generic demangle() API.

This revision is now accepted and ready to land.May 2 2023, 3:51 PM

LGTM

Maybe it makes sense to revert D149104 while you're fixing the various demanglers to correctly respect the input length?

This should fix-forwards the test. Sorry it took me so long to root cause.

If I understand the situation correctly, all the other demanglers read past the end of the buffer too; we just get lucky that none of the tests on the buildbots use the API in a way which exposes that. I'd prefer to revert to known working, then fix each demangler, then fix the generic demangle() API.

Sure, I've backed out the offending commit in 3e3c6f24ff85ea52ed67d4c26f1d3d0eacd1ad1b.

LGTM

Maybe it makes sense to revert D149104 while you're fixing the various demanglers to correctly respect the input length?

This should fix-forwards the test. Sorry it took me so long to root cause.

If I understand the situation correctly, all the other demanglers read past the end of the buffer too; we just get lucky that none of the tests on the buildbots use the API in a way which exposes that. I'd prefer to revert to known working, then fix each demangler, then fix the generic demangle() API.

Sounds good. So we need two commits

  • Revert D149104. The description of this patch can be used by the revert commit.
  • Combine D149104 and this patch

Both commits need very sophisticated testing (including check-lldb check-bolt)...

LGTM

Maybe it makes sense to revert D149104 while you're fixing the various demanglers to correctly respect the input length?

This should fix-forwards the test. Sorry it took me so long to root cause.

If I understand the situation correctly, all the other demanglers read past the end of the buffer too; we just get lucky that none of the tests on the buildbots use the API in a way which exposes that. I'd prefer to revert to known working, then fix each demangler, then fix the generic demangle() API.

Sounds good. So we need two commits

  • Revert D149104. The description of this patch can be used by the revert commit.

Done https://reviews.llvm.org/rG3e3c6f24ff85ea52ed67d4c26f1d3d0eacd1ad1b.

I don't plan to combine them and would prefer to keep them distinct patches to minimize the amount of code that may get reverted (if at all). D149104 was a "top down" approach to the interface. The necessity of the revert shown it should be done bottom up instead.

This patch, as well as D149784 and D151003 are bottom up. I also need to write a patch for the M$ demangler. Then I will revisit the capstone D149104.

Both commits need very sophisticated testing (including check-lldb check-bolt)...

Done (for this patch).

nickdesaulniers planned changes to this revision.Jun 1 2023, 10:51 AM
nickdesaulniers added inline comments.
llvm/lib/Demangle/Demangle.cpp
31

I guess S is now unnecessary and can be removed.

nickdesaulniers accepted this revision.Jun 1 2023, 10:54 AM
nickdesaulniers added inline comments.
llvm/lib/Demangle/Demangle.cpp
31

Nvm/not yet. microsoftDemangle needs to be converted to use std::string_view first.

  • rebase for presubmits
This revision is now accepted and ready to land.Jun 2 2023, 10:08 AM
MaskRay accepted this revision.Jun 2 2023, 2:39 PM
MaskRay added inline comments.
llvm/lib/Demangle/Demangle.cpp
50

I think it's best for the if (!MangledName) return false; change to be made in this patch, not in the precursor patch.

isItaniumEncoding/etc still take const char * arguments and it makes sense for them to keep strict by disallowing nullptr arguments.

nickdesaulniers marked an inline comment as done.Jun 2 2023, 2:53 PM
nickdesaulniers added inline comments.
llvm/lib/Demangle/Demangle.cpp
50

I suspect you are referring to https://reviews.llvm.org/D151003#4391457.

In that case, I will drop the nullptr check from D151003. I will land this (D149675) as is.

nickdesaulniers marked an inline comment as done.Jun 2 2023, 2:56 PM
nickdesaulniers added inline comments.Jun 2 2023, 3:01 PM
llvm/lib/Demangle/Demangle.cpp
50

isItaniumEncoding/etc still take const char * arguments and it makes sense for them to keep strict by disallowing nullptr arguments.

I thought I had fixed that already. Ah! Yeah, but it was backed out as part of:

https://reviews.llvm.org/D149104#change-M6oofoKhLOim

I'll break that off into a separate patch since it's a good change to make separate from landing the top down refactoring.