Page MenuHomePhabricator

Use LLVM's new ItaniumPartialDemangler in LLDB
ClosedPublic

Authored by sgraenitz on Jul 20 2018, 11:49 AM.

Details

Summary

Simple replacement of the builtin demangler with LLVM's new ItaniumPartialDemangler. Kept the old code ifdef'ed out for benchmarking in one of the next steps. Before that I want to wire it up further and
make use of the new structured demangling capabilities.

Diff Detail

Repository
rL LLVM

Event Timeline

sgraenitz created this revision.Jul 20 2018, 11:49 AM

Hi Stefan, thanks for working on this!

source/Core/Mangled.cpp
310 ↗(On Diff #156559)

I think this is going to really tank performance: ItaniumPartialDemangler dynamically allocates a pretty big buffer on construction that it uses to store the AST (and reuse for subsequent calls to partialDemangle). Is there somewhere that you can put IPD it so that it stays alive between demangles?

An alternative, if its more convenient, would be to just put the buffer inline into ItaniumPartialDemangler, and = delete the move operations.

315 ↗(On Diff #156559)

This should really be malloc, finishDemangle may realloc the buffer, and it gets freeed below.

sgraenitz added inline comments.Jul 20 2018, 1:41 PM
source/Core/Mangled.cpp
310 ↗(On Diff #156559)

Thanks for the remark, I didn't dig deep enough to see what's going on in the BumpPointerAllocator class. I guess there is a reason for having ASTAllocator in the Db struct as an instance and thus allocating upfront, instead of having a pointer there and postponing the instantiation to Db::reset()?

I am not familiar enough with the code yet to know how it will look exactly, but having a heavy demangler in every Mangled per se sounds unreasonable. There's just too many of them that don't require demangling at all. For each successfully initiated partialDemangle() I will need to keep one of course.

I will have a closer look on Monday. So far thanks for mentioning that!

315 ↗(On Diff #156559)

Ah of course, will fix that. Thanks

Stefan, you should always add lldb-commits to the subscribers of your phab reviews. I added it now, but IIRC there are issues with adding subscribers after the fact.

sgraenitz added inline comments.Jul 20 2018, 2:04 PM
source/Core/Mangled.cpp
310 ↗(On Diff #156559)

Well, right the pointer to BumpPointerAllocator won't solve anything. Ok will have a look.

davide added a subscriber: davide.Jul 20 2018, 2:59 PM

Thanks for doing this.
We may consider doing some A-B testing between the two demanglers.
A strategy that worked very well for similar purposes was that of running nm on a large executable (e.g. clang or lldb itself) and see whether we demangle in the same exact way and measure the time needed for demangling.

We do need to test the performance of this as demangling is a hot spot when we parse any object file. If it is faster, then we should just replace it and not add any options to be able to switch. Also we should compare demangled names between the two to ensure there are no differences as we are doing this.

labath added a subscriber: labath.Jul 23 2018, 3:04 AM
labath added inline comments.
cmake/modules/LLDBConfig.cmake
417–423 ↗(On Diff #156559)

Since these options are mutually exclusive it might be better to make this a single multi-valued setting. Also, this new demangler should also be available in the MSVC case, should it not? Yhe reason we have if(MSVC) here is because the "system" demangler is not available there, but that should not be an issue here.

Hi all, thanks for your feedback. I will update the review with code fixes in a bit.

Also we should compare demangled names between the two to ensure there are no differences as we are doing this.

Yes I can definitely do it manually. When it comes to writing tests, I would consider it more like testing the demangler itself than its integration with LLDB right? Shall we add tests in LLVM that compare to __cxa_demangle?
https://github.com/llvm-mirror/llvm/blob/master/unittests/Demangle/PartialDemangleTest.cpp

running nm on a large executable (e.g. clang or lldb itself) and see whether we demangle in the same exact way and measure the time needed for demangling

Good idea, will try that.

this new demangler should also be available in the MSVC case, should it not?

I don't think the Itanium mangler supports MSVC mangling.

always add lldb-commits to the subscribers of your phab reviews

Aye aye

this new demangler should also be available in the MSVC case, should it not?

I don't think the Itanium mangler supports MSVC mangling.

That's fine. It just needs to be able to demangle itanium names when running on an MSVC platform.

That's fine. It just needs to be able to demangle itanium names when running on an MSVC platform.

IIUC cstring_mangling_scheme() should return eManglingSchemeItanium in this case and switch to the case label without the #if defined(_MSC_VER).

That's fine. It just needs to be able to demangle itanium names when running on an MSVC platform.

IIUC cstring_mangling_scheme() should return eManglingSchemeItanium in this case and switch to the case label without the #if defined(_MSC_VER).

Yes, and that's correct.

But I am talking about cmake code here. option(LLDB_USE_LLVM_DEMANGLER "Use llvm's new partial demangler" ON) is in an !MSVC block, so the user will not be able to choose which demangler to use there. My point was simply to move that option out of that block so that it is available on msvc builds as well. (That is if the new demangler works with msvc, but I don't see a reason why it shouldn't.)

Combined with the other suggestion of making this a tri-valued setting, you could have a single setting with values "system", "legacy", "new" or whatever you think best describes them, and then do something like if (MSVC AND ${setting} == "system") message(ERROR "System itanium demangler not available on MSVC").

sgraenitz updated this revision to Diff 156743.Jul 23 2018, 4:12 AM

Fix: Use malloc instead of new for allocating the demangled_name buffer.

sgraenitz marked an inline comment as done.Jul 23 2018, 4:15 AM
sgraenitz updated this revision to Diff 156745.Jul 23 2018, 4:22 AM

Enable LLDB_USE_LLVM_DEMANGLER on MSVC platforms.

sgraenitz marked an inline comment as done.Jul 23 2018, 4:26 AM
sgraenitz added inline comments.
cmake/modules/LLDBConfig.cmake
417–423 ↗(On Diff #156559)

I am talking about cmake code here.

Sorry, just saw that too.

Since these options are mutually exclusive it might be better to make this a single multi-valued setting.

The old LLDB_USE_BUILTIN_DEMANGLER might be removed entirely once we have benchmark results. Otherwise I will make it multi-valued.

sgraenitz marked an inline comment as done.Jul 23 2018, 5:51 AM

Quick local performance check doing target create clang in current review version vs. master HEAD version (both loading the exact same release build of clang) looks promising. It's faster already now, so I would remove the option for the builtin demangling.

review version = LLDB_USE_LLVM_DEMANGLER:
TargetList::CreateTarget (file = 'clang', arch = 'x86_64') = 0.762155337 sec

master HEAD version = LLDB_USE_BUILTIN_DEMANGLER:
TargetList::CreateTarget (file = 'clang', arch = 'x86_64') = 1.010040359 sec
sgraenitz added inline comments.Jul 23 2018, 6:15 AM
source/Core/Mangled.cpp
310 ↗(On Diff #156559)

ItaniumPartialDemangler dynamically allocates a pretty big buffer on construction

I think in the end each Mangled instance will have a pointer to IPD field for lazy access to rich demangling info. This piece of code may become something like:

m_IPD = new ItaniumPartialDemangler();
if (bool err = m_IPD->partialDemangle(mangled_name)) {
  delete m_IPD; m_IPD = nullptr;
}

In order to avoid unnecessary instantiations, we can consider to filter symbols upfront that we know can't be demangled. E.g. we could duplicate the simple checks from Db::parse() before creating a ItaniumPartialDemangler instance.

Even the simple switch with the above code in place shows performance improvements. So for now I would like to leave it this way and review the issue after having the bigger patch, which will actually make use of the rich demangle info.

What do you think?

sgraenitz updated this revision to Diff 156778.Jul 23 2018, 7:11 AM

Remove CMake options LLDB_USE_BUILTIN_DEMANGLER and LLDB_USE_LLVM_DEMANGLER.

Quick local performance check doing target create clang in current review version vs. master HEAD version (both loading the exact same release build of clang) looks promising. It's faster already now, so I would remove the option for the builtin demangling.

review version = LLDB_USE_LLVM_DEMANGLER:
TargetList::CreateTarget (file = 'clang', arch = 'x86_64') = 0.762155337 sec

master HEAD version = LLDB_USE_BUILTIN_DEMANGLER:
TargetList::CreateTarget (file = 'clang', arch = 'x86_64') = 1.010040359 sec

Oh, nice! I was expecting FastDemangle to still beat the partial demangler. If FastDemangle is now slower than maybe it should be removed (or at least renamed!).

source/Core/Mangled.cpp
310 ↗(On Diff #156559)

Sure, if this is a performance win then I can't think of any reason not to do it.

In the future though, I don't think that having copies of IPD in each Mangled is a good idea, even if they are lazily initialized. The partially demangled state held in IPD is significantly larger than the demangled string, so I think it would lead to a lot more memory usage. Do you think it is possible to have just one instance of IPD that you could use to demangle all the symbols to their chopped-up state, and just store that instead?

Well when repeating this test, the values are not always that far apart from each other, but on average the old USE_BUILTIN_DEMANGLER path the slower one. Maybe FastDemangle is still faster than IPD in success case, but the overhead from the fallback cases is adding up. (The USE_BUILTIN_DEMANGLER code path is also more noisy in terms of performance, probably same issue here.)

source/Core/Mangled.cpp
310 ↗(On Diff #156559)

Yes if that will be a bit more work, but also a possibility. I did a small experiment making the IPD in line 288 static, to reuse a single instance. That didn't affect runtimes much. I tried it several times and got the same results again, but have no explanation.

You would expect a speedup from this right? Is there maybe cleanup work that eats up time when reusing an instance? Maybe I have to revisit that.

Sorry if the review is a little bumpy, it's my first one. Added all subscribers as reviewers now. Hope that's ok.

The current version is a rather simple change, that slightly reduces complexity and slightly improves performance. IMHO having it committed would make it easier to discuss next steps on the list and work on improvements step by step. Thus I would prefer to keep this review small here and instead open a new one once I have a proposal. Happy to hear what you think and answer questions.

source/Core/Mangled.cpp
310 ↗(On Diff #156559)

Weird, I would have expected a speedup for making it static. My only guess is that malloc is just quickly handing back the buffer it used for the last IPD or something. I think the cleanup work IPD does should be about the same as the cost of construction.

The patch makes sense to me. It's nice to get a performance boost, while reducing the number of demanglers at the same time.

source/Core/Mangled.cpp
310 ↗(On Diff #156559)

If reusing the same IPD object can bring significant benefit, I think the right approach would be to change/extend/add the API (not in this patch) to make it possible to use it naturally. There aren't many places that do batch demangling of a large amount of symbols (Symtab::InitNameIndexes is one I recall), so it shouldn't be too hard to modify them to do something smarter.

JDevlieghere added inline comments.Jul 24 2018, 3:20 AM
source/Core/Mangled.cpp
30 ↗(On Diff #156778)

While you're here I'd remove these redundant comments so this block looks more consistent.

292 ↗(On Diff #156778)

Nit: I'd call this demangled_size because finishDemangle will update the value.

sgraenitz updated this revision to Diff 157001.Jul 24 2018, 4:20 AM

Minor improvements on naming and comments

sgraenitz marked 2 inline comments as done.Jul 24 2018, 4:24 AM
sgraenitz added inline comments.
source/Core/Mangled.cpp
310 ↗(On Diff #156559)

Yes. I will reach out to the list for discussion of options, when I am done with a few experiments in that direction.

sgraenitz updated this revision to Diff 157061.Jul 24 2018, 9:42 AM

Change EXPECTED_EQ to EXPECTED_STREQ, because ConstString::operator==() compares identity of pointers, not equality of strings. While "they must come from the same pool in order to be equal" (as stated in the description), this is verifying string equality not the internals of ConstString.

teemperor added inline comments.
unittests/Core/CMakeLists.txt
4 ↗(On Diff #157061)

This should be after ListenerTest.cpp to keep this file in alphabetical order.

sgraenitz updated this revision to Diff 157062.Jul 24 2018, 9:48 AM

Keep alphabetical order of files in CMakeLists.txt

sgraenitz marked an inline comment as done.Jul 24 2018, 9:49 AM
sgraenitz added inline comments.
unittests/Core/CMakeLists.txt
4 ↗(On Diff #157061)

Hehe good catch!

sgraenitz marked an inline comment as done.Jul 25 2018, 1:42 AM

I did a little investigation of the performance impact of not reusing the IPD. For this I used the unit test PartialDemangleTest.TestNameChopping, where I added an outer loop:

TEST(PartialDemangleTest, TestNameChopping) {
  ...
  llvm::ItaniumPartialDemangler IPD;
  for (int i = 0; i < (1 << 20); ++i) { // 1M repetitions
    for (ChoppedName &N : NamesToTest) {
      EXPECT_FALSE(IPD.partialDemangle(N.Mangled));
      EXPECT_TRUE(IPD.isFunction());
      EXPECT_FALSE(IPD.isData());
      ...

Then I compared runtimes when having IPD outside vs. inside the two for loops. Over a series of test runs, the average inside-the-loops overhead is around 10% here. Just for interest I then raised the initial buffer size from 4k to 1M. This obviously has a much bigger impact on the runtimes:

        IPD outside     IPD inside      IPD inside (1MB)
real	0m10.626s       0m11.769s       0m32.064s
user	0m10.474s       0m11.601s       0m27.526s
sys	0m0.082s        0m0.091s        0m4.394s

I think today's heap implementations are pretty good in allocating 4k chunks of memory. Nevertheless, in the next step we should try and avoid this overhead. This may require conceptual changes though. I send a comprehensive proposal to the dev list for discussion.

JDevlieghere accepted this revision.Jul 25 2018, 6:57 AM

Given these numbers I think we can land this patch as is and figure out the IDP stuff on the mailing list and/or a future patch. LGTM. Thanks Stefan!

This revision is now accepted and ready to land.Jul 25 2018, 6:57 AM
Closed by commit rL337931: Use LLVM's new ItaniumPartialDemangler in LLDB (authored by stefan.graenitz, committed by ). · Explain WhyJul 25 2018, 8:19 AM
This revision was automatically updated to reflect the committed changes.