This patch adds minimal support for D programming language demangling on LLVM core based on the D name mangling spec. This will allow easier integration on a future LLDB plugin for D either in the upstream tree or outside of it. Minimal support includes recognizing D demangling encoding and at least one mangling name, which in this case is `_Dmain` mangle.
Details
Diff Detail
Event Timeline
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | Any chance of using an existing stream type (like llvm::itanium_demangle::OutputStream which RustDemangle also uses?)? Otherwise might be worth a bunch of separate testing of this class - or incrementally adding functionality to it that functionality is used in the patch series, otherwise it's hard to tell that everything's tested if it's added in one go here & significant parts are currently unused. | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
20 | I think most of the mangling testing is done via llvm-cxxfilt - see llvm/test/Demangle/rust.test - so probably best this testing be done that way too. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | This type differs from a stream because D demangler needs the ability to prepend to the output string due to how D demangling is designed. Because of that, a stream is not a good fit here and plus adding methods like prepend will make it conceptually not a stream at all. I will incrementally add parts of it when necessary and add tests for it. Although, if you find any other data structure that might be suitable here, please let me know. | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
20 | I have a point on this that would like to take into consideration. Since these are unitary tests, I guess this has a better fit here rather than on the cxxfilt integration tests, even though majorly all the demangling tests are written there. I don't know if there is any rationale to test the behaviour on the integrations tests, but since these can all be isolated and therefore tested in a more pure way I think we should move them here, and only test if the path to this demangler is correct on cxxfilt. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | Ah, fair enough. Might want to check the naming conventions (I'd expect "need" to be called "reserve" in C++ API parlance, though perhaps in LLVM's APIs that's called "grow"?)/API design (do the other similar data structures have a "free()" function, or do they rely on the dtor to cleanup?) line up with the existing OutputStream/other data structures in LLVM for consistency. & maybe worth pulling it out into a separate header - given how big this whole file is likely to get? | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
20 | Generally LLVM has small enough tools/binaries that can test functionality in sufficient isolation that most testing is done via these tools (general data structures like those in llvm/ADT are tested with unit tests, for instance - and a few other places where API usage is broad/varied and testing quirks (like different kinds of error handling/corner cases/etc) is valuable). In some ways this can be simpler to work with (eg: nothing needs to be recompiled when the test is modified, extracting a command line to run under a debugger is clear) & partly it's what LLVM developers/development are used to. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
2 | The "C++" bit is specifically for header files only, as it's to help some editors identify the file type for .h files. | |
34 | Nit: here and in other comments, we end our comments with a . Similarly, comments should always start with a capital letter (already done here, but not done further down). | |
37 | I wonder if we could model this as some kind of deque<std::string> - basically buffer things by adding strings to the start or end of the queue (prepend and append), before finalizing it into a single string at a later point. It might help avoid a lot of manual memory management/copying etc. | |
198 | I don't think LLVM includes these sorts of comments (with the exception of the special C++ annotation in the first line of the license header in header files. | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
20 | In my personal opinion gtest testing makes more sense for testing demangling than lit tests. If nothing else, it's probably significantly more efficient, especially on Windows, where the process launching overhead of spawning llvm-cxxfilt and FileCheck is going to be far greater than the time spent running the actual testing itself. | |
26 | gtest has specific support for parameterised tests (see the TEST_P macro), so it probably makes more sense to leverage that rather than hand-roll your own looping logic. This is particularly important, because it impacts how test failures are reported. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | Probably would be a deque<char>? Though that'd then involve copying the result into the output buffer - so I think the current approach is probably a/the good one, and consistent with the other demanglers that write directly into their output buffer. Just with the added constraint of needing to be able to prepend info. (another alternative would be to generalize the existing itanium OutputStream to support prepending too) | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
20 | Generally this'd only be a single test file with one llvm-cxxfilt invocation and one FileCheck invocation so the overhead shouldn't be too great (yeah, I'd push back against splitting it into a bunch of run lines and separate lit tests). Part of my motivation here is consistency, among other reasons, for test discovery - having tests in the same place across the different demanglers, and across LLVM generally, hopefully gives us some chance of being able to maintain tests - add new test cases to the same place, rather than creating another test file because we can't find a good home to add things to. |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
---|---|---|
20 | If we choose to move to llvm-cxxfilt tests, I will still maintain minimal testing here, mimicking what Rust is currently doing, otherwise I will need to squash https://reviews.llvm.org/D110577 . I agree with @jhenderson for performance reasons although I don't know if it is a point strong enough for changing the testing workflow. Is that spawn process overhead considerable changing regarding the recompilation times? If the case is something like x10 slower per test, I would say that isn't worth changing to cxxfilt test suite. |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
---|---|---|
20 | Yeah, I think we'll want both - at least a small unit test to demonstrate the API usage (as the other demanglers have, I think?) and the llvm-cxxfilt test to demonstrate that's wired up correctly too. Then the question is where we put all the functional test cases - since we'll end up with both API and command line tests. And I'd prefer those tests live in a similar place/way to the other demanglers (& the way LLVM tends to test more broadly), which I think is in llvm-cxxfilt |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
2 | Done | |
34 | Done | |
37 | I think a deque is not a good fit for this for several reasons. I think you were referring to some sort of deque<char*>, otherwise it would be very memory inefficient -- the deque node structs are way heavier than a byte. Even with a string deque memory not being contiguous across nodes can also hurt performance, when, e.g. calculating the length of a given deque. Another problem with that is the fact that memory allocation is done in chunks and gown twice when needed. Also, assuming that a string is appended/prepended to the deque on every OutputString::append or OutputString::prepend, allocating small chunks individually is also incredibly slow -- if that is not the case I'm not seeing a way where it is applicable at all. Talking about the OutputStream, as I mentioned above, prepending is not part of a Stream concept (correct me if I'm wrong), since a stream is designed to only grow at the end. I'm not quite sure if something is relying on specific characteristics of a stream but, e.g. because a stream only grows at the end, adding a prepend would mess up with indices. If that is not a problem, I would consider that option but also renaming it. | |
198 | Sorry and thanks for noticing, I probably added this because my editor wasn't smart enough while porting | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
26 | Done |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
---|---|---|
26 | I'm not too familiar with gtest, so extra guidance is appreciated. I'm not sure if an empty fixture is the best fit here, tho |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
---|---|---|
20 | How should I proceed here, then? |
llvm/include/llvm/Demangle/Demangle.h | ||
---|---|---|
63 | ||
llvm/lib/Demangle/DLangDemangle.cpp | ||
36 | ||
37 | Yeah, char* or string is what I was thinking, since if you don't need to go back and modify any of the already added string, you can just naturally add substrings at either end. You could also try a simple std::list, since that would avoid the overhead of reserving space completely. I've not really studied the performance implications of the various std containers though, so I can't really advise what's the most appropriate overall. I'm more interested in the practicality of writing and maintaining the code (don't reinvent the wheel and all that, which I feel this class more-or-less is doing). I'm not sure when you need to calculate the overall length, so I'm not sure that aspect is relevant? | |
47 | More comments missing full stop throughout this class (but need to settle on the appropriate type before bothering too much). | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
26 | If we end up moving to llvm-cxxfilt testing (and I don't really disagree with @dblaikie's points, so we probably should), this point is moot, as for a single test case, the parameterised testing is overkill. However, for future reference, I'd put the dlangDemangle and free calls in the SetUp and TearDown overrides in the fixture class, so that your test case detail becomes literally the EXPECT_STREQ line. Heck, you could probably get away with putting ALL the logic in the SetUp function, if I'm not mistaken and the TEST_P body then would be completely empty! P.S. Don't forget to run clang-format on your newly added code. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | If we were going to move away from a custom data structure, to something that'll require copying at the end of the demangling (to get it into a standalone malloc'd buffer to return from the API) I'd probably suggest just using std::string - it's got the same general performance characteristics (eg: O(N) on prepend) as the current code, and is simple/direct, doesn't introduce potential other issues like memory fragmentation, etc. | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
20 | eh, I'm coming around to it - if we've got to have both FileCheck+unit tests anyway, maybe here's a chance to experiment with/try to see if it's something better - maybe I'm just reacting to recent negative experiences struggling with FileCheck, but something more intentional like gtest's expects seem like they can produce better error messages/recovery/etc, than FileCheck. So, sure, let's go with this - figure out the data expansion issues @jhenderson has raised to keep the testing as simple/streamlined as possible. Maybe, if someone feels like it, this could be backed up with some examples of different failure experiences between the two possible approaches (especially multiple failures - showing how FileCheck probably can't recover after the first failure (without some extra output to bind to with -DAG, for instance - and thus doubling the number of CHECK lines, etc) whereas the gtest should be able to report several failures quite clearly - I wonder if gtest does a better job of highlighting differences in long similar strings? At least I think it aligns them and prints them on one line after another, unlike FileCheck) (perhaps FileCheck could have some features that would make this sort of test situation smoother, like a "continue after CHECK-NEXT failure, skipping the failed line" and maybe in that mode, or as a separate mode, a "no, I really know this line should look like this, so if it doesn't, don't go searching for another possibly intended match, give me an exact A/B comparison with this line" (or maybe even the fuzzy line matching suggestion should be printed side-by-side with the CHECK expression, especially if the CHECK doesn't contain regex - maybe with some fancy logic to try to find where the pattern starts if it doesn't match the start of the line) Guess that's another aspect: Do any of the existing mangler tests use any of the FileCheck features such as regex or partial line matching? If they do, then that's maybe a sign that gtest exact matching isn't a perfect fit - but I guess they probably don't, or that if they do it's not enough to matter/really sway the decision. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | I thought wrongly about deque. Probably deque<char> is desirable compared to deque<char*> or deque<string> since it has the same behaviour of std::vector but with amortized constant insertions at the beginning or end of the nodes. I still don't see good fit because of what I will describe bellow: About the length, getLength() is being used on some parts of the code. The length on a deque is O(N), being N the number of nodes, so it is certainly less performant, unless I rely on a manual counter. The same applies with std::list, with the risk of having even more nodes on std::list, since it is internally implemented with a double linked list instead of individually allocated arrays. The only benefit std::list brings is the constant insertion in the middle. Also copying continuous memory in chunks can be way faster with optimized SIMD memcpy. I read the code of OutputStream and it seems reasonable to me to use it, as it has a very similar approach to this implementation. The only major thing I really see is prepending. I can add the prepend method but my question remains the same, is it really a stream or should we consider renaming it? Is there any implication renaming it, if that is the case? If that is not possible, I thought about extending OutputString from OutputStream making that possible. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | Re: OutputStream: Yeah, I think it's probably fine to rename it (OutputBuffer, StringBuffer, OutputStringBuffer? It's already in a (perhaps overly specific (itanium, even though it's already used for Rust and now potentially for D) namespace, so doesn't have to include a "demangled" or something else to disambiguate it) - I'm OK with any of the suggested names, including OutputString. I guess OutputBuffer is closer to the original name, while removing the "only adds to the end" semantics carried by the word "stream". Probably rename it in one patch, add the needed prepend in whatever patch is using it shortly thereafter. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | The problem I mentioned earlier about std::string, and addressed by OutputStream and this OutputString implementation, is the fact that the grow()/need() expands twice the current size + requested size as opposed to the reserve() method in std::string (See here for more context: https://github.com/llvm/llvm-project/blob/main/libcxx/include/string#L3323). This is important because appending to OutputStream/OutputString is more efficient with significant less allocations. | |
37 | Ok, I'm going to write a patch to add prepending and rename it accordingly to OutputBuffer. We can discuss further the implementation/approach there. | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
20 |
About D demangling in specific, no. I don't know much about the other demanglers in the LLVM tree. |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
---|---|---|
20 | Yeah, I meant the existing tests - like you could go check the existing llvm-cxxfilt tests to see how they work/if they do anything interesting. |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
---|---|---|
20 | I took a look at the existing tests. I didn't find anything fancy with regex or partial line matching, although I did it without much detail. I found out that tests like this one (https://github.com/llvm/llvm-project/blob/main/llvm/test/Demangle/invalid-manglings.test), can be easily shortened by avoiding repetitiveness using some sort of fixture in gtest. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | As a thought before getting too far with the reimplmentation of OutputStream, I wonder if you could model the following function: void OutputStream::prepend(const std::string &Input); as: OutputStream prependToStream(const std::string &Input, OutputStream &&stream); In other words, create a new stream that just has the input contents, and then move the contents of the old stream (including any memory allocation etc) into the newly created stream somehow. I've not thought about performance implications too much, so this may be a terrible idea due to additional copying involved somewhere along the lines. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | I think this could probably be done efficiently/without any overhead - probably the missing piece would be the ability to pass in a malloc'd buffer to OutputStream's ctor. Then you could take it from one OutputStream, do whatever you want with it (prepend, etc) then construct a new OutputStream with it. But I'd lean towards considering that not better than adding the functionality directly to OutputStream (possibly with the rename). | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
20 | *thumbs up* Sounds good, thanks for checking! |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
37 | Yeah, there is nothing less efficient on that approach, although I think it could lead to confusion. If it is not appropriate to have it inside, having it outside is questionable for other people that might use it. This can be solved by documenting the behaviour properly, but I'm also inclined to renaming. |
llvm/include/llvm/Demangle/Demangle.h | ||
---|---|---|
63 | Done. I already change this locally on other patches, as much as I noticed. I can update the patches if I find anything more to change. | |
llvm/lib/Demangle/DLangDemangle.cpp | ||
37 | Please see https://reviews.llvm.org/D111947 and https://reviews.llvm.org/D111948 . |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
10 | ||
16 | ||
26 | That using namespace shows that OutputString needs moving out of the itanium_demangle namespace, since D isn't Itanium! | |
30 | Move this down to close to its first usage. | |
32 | If I follow it correctly, the second half of this if is unnecessary, as the strncmp will fail instead. | |
38 | Just a thought, but would it make more sense to have this return the OutputString rather than modify it in-place? Probably rename it to createOutputString in that case (this would be a separate patch of course). | |
40–42 | LLVM doesn't use braces on single-line ifs. | |
44 | Perhaps add a comment or probably put this in a separate function, as it feels like a different level of abstraction to the rest of the code in this fucntion, and I don't really understand why it's necessary (NB: I haven't looked into the OutputString interface, but ideally, I shouldn't have to). | |
llvm/lib/Demangle/Demangle.cpp | ||
24 | I'm wondering if we could reuse this function at the start of dlangDemangle instead of the second if. | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
9 | This comment is probably going to end up out of place as we go forward, as I expect we'll find other test cases that need checking that aren't part fo the libiberty library. | |
20–22 | I'd probably just ditch the protected, and make this a struct. | |
25–27 | ||
30 | We need test cases for nullptr as the input string, and probably something that fails the strncmp, e.g. _ and _Z or similar. | |
llvm/unittests/Demangle/DemangleTest.cpp | ||
25 | Probably move this above or below the two Rust mangling cases. |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
10 | Done | |
16 | Done | |
26 | This is the same on Rust demangler, that is primarily why I didn't write a patch. I can do the renaming on another separate patch. | |
30 | Removed. | |
32 | True. This can also be joined into the next if. Done | |
38 | I didn't write this API. I can change it on another patch if you wish. | |
40–42 | Ok, noted. I will try to change this on the stacked patches. | |
44 | OutputString/OutputBuffer is not null terminated therefore this is necessary. This is also being done on other demanglers to make the API conformat with null terminated strings. I don't belive such tiny thing need a separate function, but I can generalize this if it is really needed. | |
llvm/lib/Demangle/Demangle.cpp | ||
24 | This can be done already since dlangDemangle performs that check too, I just followed what the other demanglers were doing. I think if this behaviour is documented I don't see the problem, although I also believe that the compiler optimized is smart enough to optimize this pre check. |
@jhenderson @dblaikie can you re-review?
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
---|---|---|
25–27 | Done |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
10 | Looks like this got mangled - ended up as "the for D programming language"" rather than "for the D programming language". But I'll probably leave the review for @jhenderson to come back around and check their feedback's been addressed. |
Just a few nits left from me.
A note on the commit message: we don't usually add Signed off by tags to the commit messages in LLVM.
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
26 | Sounds reasonable, thanks. | |
33 | Decl is a bit of a weird name to me. I'm assuming it's meant to stand for something like Declaration? I feel like Demangled or even DemangledName (for symmetry with MangledName) would be clearer. | |
38 | I guess it'll depend on how easy it is to change the API. If there is good reason for its current design, it's fine to remain as-is. Either way, not a blocker for this patch. | |
44 | The separate function thought was mostly a way of describing what this code is doing, as it's not immediately clear if you aren't familiar with the OutputBuffer interface. A simple comment at the start of the if is probably sufficient, if you prefer that though. | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
31 | Just a thought: will brace initializers work here (i.e. {"_Dmain", "D main"} instead of std::make_pair(...) etc)? |
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
10 | Oh, thanks for noticing! I amended in the right pos. | |
33 | I can change to Demangled, I don't really know the full rationale behind it but I believe that is a better name indeed. | |
38 | The current design relies on the common C error handling way. I think the rationale here is because this function can set multiple variables by reference, and returning a reference could be seen as inconsistent. I think returning the reference would be fine if only one value was set. | |
llvm/unittests/Demangle/DLangDemangleTest.cpp | ||
31 | That doesn't work, unfortunately. |
LGTM, modulo the previously-discussed licensing question - have you heard anything from the LLVM foundation board yet?
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
40 |
Updated the last comment. About licensing, I wrote to llvm-dev about that (here: https://lists.llvm.org/pipermail/llvm-dev/2021-October/153267.html ). Iain, the primary author of the code also wrote to the first patch. More context here https://reviews.llvm.org/D110576 . I don't know what is the next step in that regard if anything more is needed.
llvm/lib/Demangle/DLangDemangle.cpp | ||
---|---|---|
40 | Patch updated. |
I think I'd only be comfortable if we get something official from the foundation saying they are happy with Iain's statement, since I'm not a lawyer. I don't know if there's any prior art in this regard. @lattner?
Looks like this patch https://lab.llvm.org/buildbot/#/builders/168/builds/3260/steps/10/logs/stdio
==40944==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1024 byte(s) in 1 object(s) allocated from: #0 0x387698 in malloc /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3 #1 0x4f47d8 in initializeOutputBuffer /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Demangle/Utility.h:199:31 #2 0x4f47d8 in llvm::dlangDemangle(char const*) /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/Demangle/DLangDemangle.cpp:29:8 #3 0x3c9455 in DLangDemangleTestFixture::SetUp() /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/unittests/Demangle/DLangDemangleTest.cpp:20:39 #4 0x5f710e in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc #5 0x5f710e in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2503:3 #6 0x5fbcdb in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2684:11 #7 0x5fd14f in testing::TestSuite::Run() /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2816:28 #8 0x62dacd in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:5338:44 #9 0x62bc2c in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc #10 0x62bc2c in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4925:10 #11 0x5d95b0 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:2473:46 #12 0x5d95b0 in main /b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50:10 #13 0x7f12de89a09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) SUMMARY: AddressSanitizer: 1024 byte(s) leaked in 1 allocation(s).
I don't have a sanitizer build setup currently, and couldn't reproduce this with valgrind (maybe used it incorrectly) - but hopefully 1bed03b5e3812dc6c659d285d77a2a577a2d113d addresses this issue.