Depends on D102581
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Yeah, I think there's a bug in the bound count handling (off by one) due to the quirk in the parseOptionalBase62 parsing?
llvm/lib/Demangle/RustDemangle.cpp | ||
---|---|---|
471–476 | Might be nice to have this written once/reused (rather than duplicated here between 'R' and 'Q' for instance) - not sure if it can be generalized over the use in demangleGenericArg too - all of them use the <lifetime> Production, but there's no common implementation of/function for that production - maybe there could be? | |
549 | Judging by the phrasing, was this meant to say "Inputs exceeding the bound are /valid/" rather than invalid? Maybe it'd be more suitable to have a fixed upper bound on the size of a demangled name in general instead of these special cases? I guess that could cover the recursion and this bound on bound lifetimes... That might be more predictable for users and for testing, etc. | |
707 | This seems like a really subtle and confusing API, especially given the existing "+ 1" behavior of the Base 62 encoding. I've been staring at the code in demangleFnSig+demangleBinder to figure out the first binder test case for a while before I realized that. Any chance this could use uintmax for the invalid value, rather than offsetting the result in this way? | |
llvm/test/Demangle/rust.test | ||
237–238 | I'm having a hard time understanding this test in the context of the spec. The spec says that _ encodes zero. So the G_ should be zero, right? Is there somewhere in the spec that says that <binder> mangled value is then 1 larger than that? (so a binder value 0 (_) is actually for 1 binder)? |
- Reuse the implementation for R and Q productions.
- Reword the comment regarding invalid binders.
llvm/lib/Demangle/RustDemangle.cpp | ||
---|---|---|
471–476 | I reused the implementation between R and Q productions. | |
549 | I reworded the comment and added an explanation why those inputs are invalid. I think we need at least two different limitation mechanisms. A recursion limit to avoid unbounded recursion once we start parsing backreferences, especially those that don't produce any output. And another limit to handle large output. I decided to check for invalid binders specifically since they are uniquely problematic; short strings like _RIC5crateFGAAAAA_EuE already generate gigabytes of output. The proposed bound rejects only invalid inputs, so it shouldn't be surprising for users in practice. A general limit on the output size would be alternative option. Although it requires answering the hard question what the limit should be. | |
707 | There are two layers of +1 which makes it quite confusing, but this is intended encoding of optional base62 numbers. | |
llvm/test/Demangle/rust.test | ||
237–238 | This looks like an omission in the specification The binder production should be marked as optional, and then the presence of G participates in the encoding of the value in analogous way to s in disambiguators:
Does this clarify the situation? I will look into updating the specification. |
llvm/lib/Demangle/RustDemangle.cpp | ||
---|---|---|
471–476 | any chance of having a "demangleLifetime" that can be used in "demangleGenericArg" as well as here? (nice to have a function for each production, so it's easier to find the implementation of that production in one place) | |
549 | Fair enough - thanks for the details! | |
707 | Fair enough - maybe some more quotations from the spec to explain this would be useful? | |
llvm/test/Demangle/rust.test | ||
237–238 | Ah, OK - yeah, spec update and some more words in the implementation (see other comment) would probably be handy. Thanks! |
Extend comment about +1 behaviour when parsing optional base-62 numbers.
llvm/lib/Demangle/RustDemangle.cpp | ||
---|---|---|
471–476 | Yeah it would be nice to unify those, but there is slight difference in a treatment of erased lifetimes:
| |
707 | I extended the comment and added a few concrete examples. | |
llvm/test/Demangle/rust.test | ||
237–238 | The proposed update to the specification https://github.com/rust-lang/rfcs/pull/3130. |
Sounds good, thanks!
llvm/lib/Demangle/RustDemangle.cpp | ||
---|---|---|
471–476 | having parameters to the demangle function to support these different behaviors doesn't seem totally unreasonable, while still having central handling? (or having variants "demangleLifetimeForGenericArg", etc) |
Might be nice to have this written once/reused (rather than duplicated here between 'R' and 'Q' for instance) - not sure if it can be generalized over the use in demangleGenericArg too - all of them use the <lifetime> Production, but there's no common implementation of/function for that production - maybe there could be?