If one reverts D66398 then the TestDataFormatterStdList does fail - as the C++ formatters are initialized in the opposite order. But the current state of trunk does not mind the order for C++ formatters.
Details
Diff Detail
- Repository
- rLLDB LLDB
Event Timeline
I understand one could pass some better abstracted error-reporting interface instead of ValueObject itself but that would require some more code (interface definition, inheritance etc.).
How much work would it be to return an llvm::Expected<bool> from Get_Impl? I'd really prefer that over the current approach.
lldb/include/lldb/Core/ValueObject.h | ||
---|---|---|
459 | This allows overwriting the error without checking if it's already set. Maybe it's better to use GetError, check that it's not set, and then either set it or append to it? | |
lldb/include/lldb/DataFormatters/FormattersContainer.h | ||
298–299 | If returning an expected isn't feasible, we should pass at least a Status* instead of setting the error in the ValueObject directly. | |
lldb/source/DataFormatters/TypeCategory.cpp | ||
350 | It's really unfortunate that so many call sites now require a null pointer. Can we make it a default argument? |
Done. Although now if (Get(...)) became confusing which I have commented in lldb_private::FormatDropExpected().
BTW I haven't found how to format a table for Doxygen. Also by searching around I have found other tables in LLDB/LLVM are also not recognized by Doxygen as tables.
lldb/include/lldb/DataFormatters/FormattersContainer.h | ||
---|---|---|
298–299 | With Expected<bool> this is no longer applicable. | |
lldb/source/DataFormatters/TypeCategory.cpp | ||
350 | With Expected<bool> this is no longer applicable. |
Testcase for the regex sanity check is failing - ValueObject::m_error remains set even after type summary clear. ValueObject::UpdateValueIfNeeded will find did_change_formats==true but it does not help it. I think there is some bug in ValueObject updating but maybe that setting of ValueObject::m_error is a wrong idea.
runCmd: type summary clear output: runCmd: type summary add --summary-string "${var[0-1]}" -x "int \[[0-9]\]" output: runCmd: frame variable int_array output: (int [5]) int_array = [-1,2] Expecting sub string: 1,2 Matched runCmd: type summary add --summary-string "${var[0-1]}" "int []" output: runCmd: frame variable int_array output: (int [5]) int_array = <Two regexes ("int \[[0-9]+\]" and "int \[[0-9]\]") ambiguously match the same string "int [5]".> Expecting pattern: int_array = <Two regexes \(("int\ \\\[\[0\-9\]\\\]" and "int\ \\\[\[0\-9\]\+\\\]"|"int\ \\\[\[0\-9\]\+\\\]" and "int\ \\\[\[0\-9\]\\\]")\) ambiguously match the same string "int\ \[5\]".> Matched runCmd: type summary clear output: runCmd: type summary add --summary-string "${var[0-1]}" "int []" output: runCmd: frame variable int_array output: (int [5]) int_array = <Two regexes ("int \[[0-9]+\]" and "int \[[0-9]\]") ambiguously match the same string "int [5]".> Expecting sub string: 1,2 Not matched FAIL
+ @jingham for value object stuff
Though I agree with the general principle of this change, the new api which implements it seems extremely confusing. Unfortunately, it's not really clear to me how to improve that. Maybe it would be better if the Get methods returned Expected<lldb::TypeFormatImplSP>? Then an "error" would mean ambiguous match, valid pointer would be "ok", and a null pointer would be "no match"? Or maybe both "no match" and "ambiguous match" should be errors? Or maybe, since these objects claim to be containers, they shouldn't even treat multiple matches as an error, but instead provide an interface to get *all* matches, and then this can be converted to an error higher up?
lldb/include/lldb/Core/ValueObject.h | ||
---|---|---|
460 | I didn't mention this in the previous patches, but I think you're overusing the rvalue references here. It's generally better to use regular value objects in cases like these because this enforces proper contracts between functions. Using rvalue references leaves open the possibility for a bug (which you've actually made here) that the TakeError function will do nothing to the passed-in object (on some path of execution), which means that the object will continue to exist in the "untaken" state in the caller, and then probably assert some distance away from where it was meant to be used. Passing the argument as a plain object means that this function *must* do something with it (really *take* it) or *it* will assert. The cost of that is a single object move, but that should generally be trivial for objects that implement moving properly. | |
lldb/include/lldb/DataFormatters/FormatClasses.h | ||
176 | std::string, please. :) ConstString is severely overused in lldb.. |
What about having the first one that matched being the one that is used and avoid errors all together? We do this with the regex alias stuff already. With regular expressions people are sure to make ones that conflict with each other.
This is how it worked so far before the fix D66398 (which made the regexes unambiguous - always matching at most 1 regex for any string). The previous version of the patch just sorted the regexes alphabetically because original LLDB did not sort them at all printing random results.
@labath did not like the alphabetical sorting as (IIUC) adding formatters from a new library can affect formatting for a different library.
We do this with the regex alias stuff already.
I haven't found the work alias in either grep -ri alias $(find -name DataFormatters) or in Variable Formatting doc.
With regular expressions people are sure to make ones that conflict with each other.
So which order to choose? It should not have different output during different runs of LLDB.
Kind of, yes. My concern was a bit more general, in that the alphabetical sorting just seems arbitrary and I don't believe anybody would expect that. It happens to do what we want here, but I can imagine that in some cases it might do the exact opposite, and then we'd have to massage the regular expressions to reverse their sort order, but retain meaning...
Personally, I'd go for the registration order, or possibly the reverse registration order, depending on what we want to give precedence to. It seems to me that this would give the most predictable results, and would make it easy to swap the precedence of two formatters if needed. But I don't feel strongly about that, so if you want to go with alphabetical, then I guess that's fine with me (it's definitely better than sorting on pointer values :P).
lldb/include/lldb/Core/ValueObject.h | ||
---|---|---|
460 | Thanks for this notification, I see I was automatically using the && parameter specifier excessively. rvalues still work properly with std::move() even without using this parameter specifier. |
Then you cannot override any existing formatter. Then it would be better to rather assert/error in such case which I tried to do in this patch but nobody likes this patch much.
or possibly the reverse registration order,
That is what this new patch implements.
Summarizing our discussion on IRC:
While I think that this version of the patch is definitely better than the previous ones, I still feel that the two container solution is unnecessarily complicated. I think using llvm::MapVector or even std::vector would be sufficient. While this may be optimal in terms of performance (not memory), I don't think that is really important in this case. The only place where the std::list solution is better than a std::vector is the removal of an arbitrary element in the list. However, I would stipulate that this is a very uncommon operation. The most common operations are lookup (which are linear in any case), and insertion (constant).
Therefore, I think we should optimize for readability/bug-freeness and choose the simplest solution.
Hm... "summarizing" might not have been the best word here. I have "summarized" what *I* said over IRC, but I did not mean to imply that Jan agreed with my position (and he didn't, not yet anyway..)
There are also lookups by name (GetExact) which are logarithmic now by that std::map.
I can try the MapVector. I just tried not to regress performance by using optimal data structures. If one was concerned about performance there can be done many other easy improvements of the existing code.
I have tried the MapVector with RegularExpressionSP as its key as suggested by @labath but that would be even more code than the std::list I provided here, both for its custom shared-pointer sorting and for its EmptyKey+TombstoneKey if we should not put std::map there.
So I returned back to a single std::vector as originally suggested by @labath and after fixing missing Delete in the Add implementation for vector it I think that may be a good compromise from all the variants.
I think this looks great. Some operations may get less efficient with this, but OTOH, some will get actually faster. And it is definitely the most understandable solution out of everything we had so far.
lldb/include/lldb/DataFormatters/FormattersContainer.h | ||
---|---|---|
83 | m_map.emplace_back, maybe? | |
111 | range-based for | |
130–134 | lol | |
177 | Is this still needed, or is that a leftover from the MapVector attempt? | |
307–308 | consider: for(const auto &entry: llvm::reverse(m_format_map)) |
This allows overwriting the error without checking if it's already set. Maybe it's better to use GetError, check that it's not set, and then either set it or append to it?