This is an archive of the discontinued LLVM Phabricator instance.

2/2: Process formatters in reverse-chronological order
ClosedPublic

Authored by jankratochvil on Aug 23 2019, 7:56 AM.

Details

Summary

If one reverts D66398 then the TestDataFormatterStdList does fail - as the C++ formatters are initialized in the opposite order. But the current state of trunk does not mind the order for C++ formatters.

Diff Detail

Repository
rLLDB LLDB

Event Timeline

jankratochvil created this revision.Aug 23 2019, 7:56 AM
JDevlieghere requested changes to this revision.Aug 23 2019, 9:09 AM

I understand one could pass some better abstracted error-reporting interface instead of ValueObject itself but that would require some more code (interface definition, inheritance etc.).

How much work would it be to return an llvm::Expected<bool> from Get_Impl? I'd really prefer that over the current approach.

lldb/include/lldb/Core/ValueObject.h
459

This allows overwriting the error without checking if it's already set. Maybe it's better to use GetError, check that it's not set, and then either set it or append to it?

lldb/include/lldb/DataFormatters/FormattersContainer.h
298–299

If returning an expected isn't feasible, we should pass at least a Status* instead of setting the error in the ValueObject directly.

lldb/source/DataFormatters/TypeCategory.cpp
350

It's really unfortunate that so many call sites now require a null pointer. Can we make it a default argument?

This revision now requires changes to proceed.Aug 23 2019, 9:09 AM
jankratochvil planned changes to this revision.Aug 23 2019, 10:13 AM
granata.enrico resigned from this revision.Aug 23 2019, 10:42 AM

I'm not working on LLDB anymore. Sorry about that.

jankratochvil marked 5 inline comments as done.Aug 25 2019, 10:16 AM

How much work would it be to return an llvm::Expected<bool> from Get_Impl? I'd really prefer that over the current approach.

Done. Although now if (Get(...)) became confusing which I have commented in lldb_private::FormatDropExpected().

BTW I haven't found how to format a table for Doxygen. Also by searching around I have found other tables in LLDB/LLVM are also not recognized by Doxygen as tables.

lldb/include/lldb/DataFormatters/FormattersContainer.h
298–299

With Expected<bool> this is no longer applicable.

lldb/source/DataFormatters/TypeCategory.cpp
350

With Expected<bool> this is no longer applicable.

jankratochvil marked 2 inline comments as done.
jankratochvil edited the summary of this revision. (Show Details)
jankratochvil planned changes to this revision.Aug 25 2019, 1:41 PM

Testcase for the regex sanity check is failing - ValueObject::m_error remains set even after type summary clear. ValueObject::UpdateValueIfNeeded will find did_change_formats==true but it does not help it. I think there is some bug in ValueObject updating but maybe that setting of ValueObject::m_error is a wrong idea.

runCmd: type summary clear
output:

runCmd: type summary add --summary-string "${var[0-1]}" -x "int \[[0-9]\]"
output:

runCmd: frame variable int_array
output: (int [5]) int_array = [-1,2]


Expecting sub string: 1,2
Matched

runCmd: type summary add --summary-string "${var[0-1]}" "int []"
output:

runCmd: frame variable int_array
output: (int [5]) int_array = <Two regexes ("int \[[0-9]+\]" and "int \[[0-9]\]") ambiguously match the same string "int [5]".>

Expecting pattern: int_array = <Two regexes \(("int\ \\\[\[0\-9\]\\\]" and "int\ \\\[\[0\-9\]\+\\\]"|"int\ \\\[\[0\-9\]\+\\\]" and "int\ \\\[\[0\-9\]\\\]")\) ambiguously match the same string "int\ \[5\]".>
Matched

runCmd: type summary clear
output:

runCmd: type summary add --summary-string "${var[0-1]}" "int []"
output:

runCmd: frame variable int_array
output: (int [5]) int_array = <Two regexes ("int \[[0-9]+\]" and "int \[[0-9]\]") ambiguously match the same string "int [5]".>

Expecting sub string: 1,2
Not matched

FAIL
labath added a subscriber: jingham.

+ @jingham for value object stuff

Though I agree with the general principle of this change, the new api which implements it seems extremely confusing. Unfortunately, it's not really clear to me how to improve that. Maybe it would be better if the Get methods returned Expected<lldb::TypeFormatImplSP>? Then an "error" would mean ambiguous match, valid pointer would be "ok", and a null pointer would be "no match"? Or maybe both "no match" and "ambiguous match" should be errors? Or maybe, since these objects claim to be containers, they shouldn't even treat multiple matches as an error, but instead provide an interface to get *all* matches, and then this can be converted to an error higher up?

lldb/include/lldb/Core/ValueObject.h
460

I didn't mention this in the previous patches, but I think you're overusing the rvalue references here. It's generally better to use regular value objects in cases like these because this enforces proper contracts between functions.

Using rvalue references leaves open the possibility for a bug (which you've actually made here) that the TakeError function will do nothing to the passed-in object (on some path of execution), which means that the object will continue to exist in the "untaken" state in the caller, and then probably assert some distance away from where it was meant to be used.

Passing the argument as a plain object means that this function *must* do something with it (really *take* it) or *it* will assert. The cost of that is a single object move, but that should generally be trivial for objects that implement moving properly.

lldb/include/lldb/DataFormatters/FormatClasses.h
176

std::string, please. :) ConstString is severely overused in lldb..

What about having the first one that matched being the one that is used and avoid errors all together? We do this with the regex alias stuff already. With regular expressions people are sure to make ones that conflict with each other.

What about having the first one that matched being the one that is used and avoid errors all together?

This is how it worked so far before the fix D66398 (which made the regexes unambiguous - always matching at most 1 regex for any string). The previous version of the patch just sorted the regexes alphabetically because original LLDB did not sort them at all printing random results.

@labath did not like the alphabetical sorting as (IIUC) adding formatters from a new library can affect formatting for a different library.

We do this with the regex alias stuff already.

I haven't found the work alias in either grep -ri alias $(find -name DataFormatters) or in Variable Formatting doc.

With regular expressions people are sure to make ones that conflict with each other.

So which order to choose? It should not have different output during different runs of LLDB.

What about having the first one that matched being the one that is used and avoid errors all together?

This is how it worked so far before the fix D66398 (which made the regexes unambiguous - always matching at most 1 regex for any string). The previous version of the patch just sorted the regexes alphabetically because original LLDB did not sort them at all printing random results.

@labath did not like the alphabetical sorting as (IIUC) adding formatters from a new library can affect formatting for a different library.

Kind of, yes. My concern was a bit more general, in that the alphabetical sorting just seems arbitrary and I don't believe anybody would expect that. It happens to do what we want here, but I can imagine that in some cases it might do the exact opposite, and then we'd have to massage the regular expressions to reverse their sort order, but retain meaning...

Personally, I'd go for the registration order, or possibly the reverse registration order, depending on what we want to give precedence to. It seems to me that this would give the most predictable results, and would make it easy to swap the precedence of two formatters if needed. But I don't feel strongly about that, so if you want to go with alphabetical, then I guess that's fine with me (it's definitely better than sorting on pointer values :P).

jankratochvil planned changes to this revision.Aug 27 2019, 5:10 AM
jankratochvil marked an inline comment as done.
jankratochvil added inline comments.
lldb/include/lldb/Core/ValueObject.h
460

Thanks for this notification, I see I was automatically using the && parameter specifier excessively. rvalues still work properly with std::move() even without using this parameter specifier.
I also agree there was the bug with possibly untaken Error asserting later (I find that as an orthogonal bug to the && one).

jankratochvil retitled this revision from Prevent D66398 `TestDataFormatterStdList`-like regressions in the future to 2/2: Process formatters in reverse-chronological order.
jankratochvil edited the summary of this revision. (Show Details)

Personally, I'd go for the registration order,

Then you cannot override any existing formatter. Then it would be better to rather assert/error in such case which I tried to do in this patch but nobody likes this patch much.

or possibly the reverse registration order,

That is what this new patch implements.

labath added a comment.Sep 2 2019, 6:38 AM

Summarizing our discussion on IRC:
While I think that this version of the patch is definitely better than the previous ones, I still feel that the two container solution is unnecessarily complicated. I think using llvm::MapVector or even std::vector would be sufficient. While this may be optimal in terms of performance (not memory), I don't think that is really important in this case. The only place where the std::list solution is better than a std::vector is the removal of an arbitrary element in the list. However, I would stipulate that this is a very uncommon operation. The most common operations are lookup (which are linear in any case), and insertion (constant).

Therefore, I think we should optimize for readability/bug-freeness and choose the simplest solution.

labath added a comment.Sep 2 2019, 6:42 AM

Hm... "summarizing" might not have been the best word here. I have "summarized" what *I* said over IRC, but I did not mean to imply that Jan agreed with my position (and he didn't, not yet anyway..)

The most common operations are lookup (which are linear in any case),

There are also lookups by name (GetExact) which are logarithmic now by that std::map.

I can try the MapVector. I just tried not to regress performance by using optimal data structures. If one was concerned about performance there can be done many other easy improvements of the existing code.

I have tried the MapVector with RegularExpressionSP as its key as suggested by @labath but that would be even more code than the std::list I provided here, both for its custom shared-pointer sorting and for its EmptyKey+TombstoneKey if we should not put std::map there.
So I returned back to a single std::vector as originally suggested by @labath and after fixing missing Delete in the Add implementation for vector it I think that may be a good compromise from all the variants.

labath accepted this revision.Sep 11 2019, 1:08 AM

I think this looks great. Some operations may get less efficient with this, but OTOH, some will get actually faster. And it is definitely the most understandable solution out of everything we had so far.

lldb/include/lldb/DataFormatters/FormattersContainer.h
83

m_map.emplace_back, maybe?

111

range-based for

130–134

lol

177

Is this still needed, or is that a leftover from the MapVector attempt?

307–308

consider: for(const auto &entry: llvm::reverse(m_format_map))

jankratochvil marked 9 inline comments as done.Sep 14 2019, 8:48 AM
jankratochvil added inline comments.
lldb/include/lldb/DataFormatters/FormattersContainer.h
83

Yes, not sure why I forgot here.

111

OK, other existing loops updated in: rL371922

177

Still needed.

307–308

Nice, thanks, used.

jankratochvil marked 4 inline comments as done.
labath accepted this revision.Sep 16 2019, 1:32 AM

lgtm

This revision was not accepted when it landed; it landed in state Needs Review.Sep 20 2019, 1:18 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2019, 1:18 PM