Simple replacement of the builtin demangler with LLVM's new ItaniumPartialDemangler. Kept the old code ifdef'ed out for benchmarking in one of the next steps. Before that I want to wire it up further and
make use of the new structured demangling capabilities.
Details
Diff Detail
- Build Status
Buildable 20649 Build 20649: arc lint + arc unit
Event Timeline
Hi Stefan, thanks for working on this!
source/Core/Mangled.cpp | ||
---|---|---|
289 | I think this is going to really tank performance: ItaniumPartialDemangler dynamically allocates a pretty big buffer on construction that it uses to store the AST (and reuse for subsequent calls to partialDemangle). Is there somewhere that you can put IPD it so that it stays alive between demangles? An alternative, if its more convenient, would be to just put the buffer inline into ItaniumPartialDemangler, and = delete the move operations. | |
294 | This should really be malloc, finishDemangle may realloc the buffer, and it gets freeed below. |
source/Core/Mangled.cpp | ||
---|---|---|
289 | Thanks for the remark, I didn't dig deep enough to see what's going on in the BumpPointerAllocator class. I guess there is a reason for having ASTAllocator in the Db struct as an instance and thus allocating upfront, instead of having a pointer there and postponing the instantiation to Db::reset()? I am not familiar enough with the code yet to know how it will look exactly, but having a heavy demangler in every Mangled per se sounds unreasonable. There's just too many of them that don't require demangling at all. For each successfully initiated partialDemangle() I will need to keep one of course. I will have a closer look on Monday. So far thanks for mentioning that! | |
294 | Ah of course, will fix that. Thanks |
Stefan, you should always add lldb-commits to the subscribers of your phab reviews. I added it now, but IIRC there are issues with adding subscribers after the fact.
source/Core/Mangled.cpp | ||
---|---|---|
289 | Well, right the pointer to BumpPointerAllocator won't solve anything. Ok will have a look. |
Thanks for doing this.
We may consider doing some A-B testing between the two demanglers.
A strategy that worked very well for similar purposes was that of running nm on a large executable (e.g. clang or lldb itself) and see whether we demangle in the same exact way and measure the time needed for demangling.
We do need to test the performance of this as demangling is a hot spot when we parse any object file. If it is faster, then we should just replace it and not add any options to be able to switch. Also we should compare demangled names between the two to ensure there are no differences as we are doing this.
cmake/modules/LLDBConfig.cmake | ||
---|---|---|
413โ414 | Since these options are mutually exclusive it might be better to make this a single multi-valued setting. Also, this new demangler should also be available in the MSVC case, should it not? Yhe reason we have if(MSVC) here is because the "system" demangler is not available there, but that should not be an issue here. |
Hi all, thanks for your feedback. I will update the review with code fixes in a bit.
Also we should compare demangled names between the two to ensure there are no differences as we are doing this.
Yes I can definitely do it manually. When it comes to writing tests, I would consider it more like testing the demangler itself than its integration with LLDB right? Shall we add tests in LLVM that compare to __cxa_demangle?
https://github.com/llvm-mirror/llvm/blob/master/unittests/Demangle/PartialDemangleTest.cpp
running nm on a large executable (e.g. clang or lldb itself) and see whether we demangle in the same exact way and measure the time needed for demangling
Good idea, will try that.
this new demangler should also be available in the MSVC case, should it not?
I don't think the Itanium mangler supports MSVC mangling.
always add lldb-commits to the subscribers of your phab reviews
Aye aye
That's fine. It just needs to be able to demangle itanium names when running on an MSVC platform.
That's fine. It just needs to be able to demangle itanium names when running on an MSVC platform.
IIUC cstring_mangling_scheme() should return eManglingSchemeItanium in this case and switch to the case label without the #if defined(_MSC_VER).
Yes, and that's correct.
But I am talking about cmake code here. option(LLDB_USE_LLVM_DEMANGLER "Use llvm's new partial demangler" ON) is in an !MSVC block, so the user will not be able to choose which demangler to use there. My point was simply to move that option out of that block so that it is available on msvc builds as well. (That is if the new demangler works with msvc, but I don't see a reason why it shouldn't.)
Combined with the other suggestion of making this a tri-valued setting, you could have a single setting with values "system", "legacy", "new" or whatever you think best describes them, and then do something like if (MSVC AND ${setting} == "system") message(ERROR "System itanium demangler not available on MSVC").
cmake/modules/LLDBConfig.cmake | ||
---|---|---|
413โ414 |
Sorry, just saw that too.
The old LLDB_USE_BUILTIN_DEMANGLER might be removed entirely once we have benchmark results. Otherwise I will make it multi-valued. |
Quick local performance check doing target create clang in current review version vs. master HEAD version (both loading the exact same release build of clang) looks promising. It's faster already now, so I would remove the option for the builtin demangling.
review version = LLDB_USE_LLVM_DEMANGLER: TargetList::CreateTarget (file = 'clang', arch = 'x86_64') = 0.762155337 sec master HEAD version = LLDB_USE_BUILTIN_DEMANGLER: TargetList::CreateTarget (file = 'clang', arch = 'x86_64') = 1.010040359 sec
source/Core/Mangled.cpp | ||
---|---|---|
289 |
I think in the end each Mangled instance will have a pointer to IPD field for lazy access to rich demangling info. This piece of code may become something like: m_IPD = new ItaniumPartialDemangler(); if (bool err = m_IPD->partialDemangle(mangled_name)) { delete m_IPD; m_IPD = nullptr; } In order to avoid unnecessary instantiations, we can consider to filter symbols upfront that we know can't be demangled. E.g. we could duplicate the simple checks from Db::parse() before creating a ItaniumPartialDemangler instance. Even the simple switch with the above code in place shows performance improvements. So for now I would like to leave it this way and review the issue after having the bigger patch, which will actually make use of the rich demangle info. What do you think? |
Oh, nice! I was expecting FastDemangle to still beat the partial demangler. If FastDemangle is now slower than maybe it should be removed (or at least renamed!).
source/Core/Mangled.cpp | ||
---|---|---|
289 | Sure, if this is a performance win then I can't think of any reason not to do it. In the future though, I don't think that having copies of IPD in each Mangled is a good idea, even if they are lazily initialized. The partially demangled state held in IPD is significantly larger than the demangled string, so I think it would lead to a lot more memory usage. Do you think it is possible to have just one instance of IPD that you could use to demangle all the symbols to their chopped-up state, and just store that instead? |
Well when repeating this test, the values are not always that far apart from each other, but on average the old USE_BUILTIN_DEMANGLER path the slower one. Maybe FastDemangle is still faster than IPD in success case, but the overhead from the fallback cases is adding up. (The USE_BUILTIN_DEMANGLER code path is also more noisy in terms of performance, probably same issue here.)
source/Core/Mangled.cpp | ||
---|---|---|
289 | Yes if that will be a bit more work, but also a possibility. I did a small experiment making the IPD in line 288 static, to reuse a single instance. That didn't affect runtimes much. I tried it several times and got the same results again, but have no explanation. You would expect a speedup from this right? Is there maybe cleanup work that eats up time when reusing an instance? Maybe I have to revisit that. |
Sorry if the review is a little bumpy, it's my first one. Added all subscribers as reviewers now. Hope that's ok.
The current version is a rather simple change, that slightly reduces complexity and slightly improves performance. IMHO having it committed would make it easier to discuss next steps on the list and work on improvements step by step. Thus I would prefer to keep this review small here and instead open a new one once I have a proposal. Happy to hear what you think and answer questions.
source/Core/Mangled.cpp | ||
---|---|---|
289 | Weird, I would have expected a speedup for making it static. My only guess is that malloc is just quickly handing back the buffer it used for the last IPD or something. I think the cleanup work IPD does should be about the same as the cost of construction. |
The patch makes sense to me. It's nice to get a performance boost, while reducing the number of demanglers at the same time.
source/Core/Mangled.cpp | ||
---|---|---|
289 | If reusing the same IPD object can bring significant benefit, I think the right approach would be to change/extend/add the API (not in this patch) to make it possible to use it naturally. There aren't many places that do batch demangling of a large amount of symbols (Symtab::InitNameIndexes is one I recall), so it shouldn't be too hard to modify them to do something smarter. |
source/Core/Mangled.cpp | ||
---|---|---|
289 | Yes. I will reach out to the list for discussion of options, when I am done with a few experiments in that direction. |
Change EXPECTED_EQ to EXPECTED_STREQ, because ConstString::operator==() compares identity of pointers, not equality of strings. While "they must come from the same pool in order to be equal" (as stated in the description), this is verifying string equality not the internals of ConstString.
unittests/Core/CMakeLists.txt | ||
---|---|---|
4 | This should be after ListenerTest.cpp to keep this file in alphabetical order. |
unittests/Core/CMakeLists.txt | ||
---|---|---|
4 | Hehe good catch! |
I did a little investigation of the performance impact of not reusing the IPD. For this I used the unit test PartialDemangleTest.TestNameChopping, where I added an outer loop:
TEST(PartialDemangleTest, TestNameChopping) { ... llvm::ItaniumPartialDemangler IPD; for (int i = 0; i < (1 << 20); ++i) { // 1M repetitions for (ChoppedName &N : NamesToTest) { EXPECT_FALSE(IPD.partialDemangle(N.Mangled)); EXPECT_TRUE(IPD.isFunction()); EXPECT_FALSE(IPD.isData()); ...
Then I compared runtimes when having IPD outside vs. inside the two for loops. Over a series of test runs, the average inside-the-loops overhead is around 10% here. Just for interest I then raised the initial buffer size from 4k to 1M. This obviously has a much bigger impact on the runtimes:
IPD outside IPD inside IPD inside (1MB) real 0m10.626s 0m11.769s 0m32.064s user 0m10.474s 0m11.601s 0m27.526s sys 0m0.082s 0m0.091s 0m4.394s
I think today's heap implementations are pretty good in allocating 4k chunks of memory. Nevertheless, in the next step we should try and avoid this overhead. This may require conceptual changes though. I send a comprehensive proposal to the dev list for discussion.
Given these numbers I think we can land this patch as is and figure out the IDP stuff on the mailing list and/or a future patch. LGTM. Thanks Stefan!
Since these options are mutually exclusive it might be better to make this a single multi-valued setting. Also, this new demangler should also be available in the MSVC case, should it not? Yhe reason we have if(MSVC) here is because the "system" demangler is not available there, but that should not be an issue here.