This is an archive of the discontinued LLVM Phabricator instance.

2/2: Fix `TestDataFormatterStdList` regression
ClosedPublic

Authored by jankratochvil on Aug 18 2019, 1:20 PM.

Details

Summary

Since D66174 I see failures of TestDataFormatterStdList in about 50% of runs on Fedora 30 x86_64 libstdc++.
I have found out that LLDB internally expects these RegularExpressions to be matched in their alphabetical order:

^std::(__cxx11::)?list<.+>(( )?&)?$
^std::__[[:alnum:]]+::list<.+>(( )?&)?$

But since D66174 they are sometimes matched in reverse order. A simple ugly workaround is to just sort it for ite ration. The overlapping matches one can see by another debug patch on top of it.
In fact it was only some luck it worked before as there is internally std::map<lldb::RegularExpressionSP, FormatterImpl> (FormattersContainer).

Diff Detail

Repository
rLLDB LLDB

Event Timeline

jankratochvil created this revision.Aug 18 2019, 1:20 PM
jankratochvil added a comment.EditedAug 19 2019, 7:56 AM

From IRC:
In fact the whole patch is a oneliner (+operator< for RegularExpression.h):

-  typedef FormattersContainer<lldb::RegularExpressionSP, FormatterImpl> RegexMatchContainer;
+  typedef FormattersContainer<RegularExpression, FormatterImpl> RegexMatchContainer;

The rest of the patch is just a mechanical update to make it compile again.

It was my mistake when trying to fix the TestDataFormatterStdList regression that it looked to me the new RegularExpression must be broken (its lifetime). In the end it was not broken. But along the way I have removed its IMO-excessive Compile() method as I am trying to push this API change as a cleanup in D66392. If there is disagreement I will rework this patch D66398 to use the former RegularExpression API using Compile(). If there is disagreement even with that I can drop it all and just make a new comparison function for lldb::RegularExpressionSP.

I find the change from std::shared_ptr to rvalues is a nice cleanup+optimization anyway. I do not like much std::shared_ptr myself as due to its lock instructions it kills performance of manycore machines.

labath accepted this revision.Aug 19 2019, 8:20 AM

I very much support getting rid of shared pointers, and for this is I am inclined to accept this. However, I don't think it was ever really intended to use the sorting order of the regex text as a disambiguating criterion for data formatters, and I don't think we should make a feature of it now -- it's just too weird and I doubt anybody will expect that.

Since it seems that there is a need for disambiguation (libc++ can be configured to have any name for the inline namespace, so there's no way to make its regex not match the libstdc++ one), I think we should just use the registration order to disambiguate. Then we just need to make sure the libstdc++ formatter is registered before the libc++ one, which should be both easy to achieve and understandable.

Since you've recently touched this code, how hard do you think it would be to exchange the std::map in FormattersContainer for a std::vector? Would you be willing to give that a try?

This revision is now accepted and ready to land.Aug 19 2019, 8:20 AM
JDevlieghere accepted this revision.Aug 20 2019, 8:38 AM
JDevlieghere added inline comments.
lldb/source/DataFormatters/FormatManager.cpp
950

I'd just remove this

Since it seems that there is a need for disambiguation (libc++ can be configured to have any name for the inline namespace, so there's no way to make its regex not match the libstdc++ one),

This is regression from D57466. If you configure libc++ to have the same name ("__cxx11") as libstdc++ then LLDB really cannot recognize it is libc++ behind.

I think we should just use the registration order to disambiguate. Then we just need to make sure the libstdc++ formatter is registered before the libc++ one, which should be both easy to achieve and understandable.

Since you've recently touched this code, how hard do you think it would be to exchange the std::map in FormattersContainer for a std::vector? Would you be willing to give that a try?

From IRC:
That std::vector there is not easy. First some methods were already using .find() on it so I had to change them to iterate through that vector.
Also some iterating in FormattersContainer (Delete_Impl, GetExact_Impl) were now iterating for comparison despite they could use .find() so by moving to std::vector we lose this improvement possibility.
labath: if replace that by a single entry then the nondeterminism goes away
labath: std::.*::list -> { if (inline_namespace == "__cxx11") return "libstdc++ formatter" else return "libc++ formatter" }

I have somehow find that dispatching implemented C++ not so simple, why not just to remove the ambiguity from the regexes?

jankratochvil edited the summary of this revision. (Show Details)

It just makes now the regexes unambiguous.

jankratochvil requested review of this revision.Aug 22 2019, 4:50 AM
labath accepted this revision.Aug 22 2019, 5:13 AM

It just makes now the regexes unambiguous.

I think this is fine too, but I would also only consider it a stop-gap, because these kinds of regular are horrid.

What was the problem with the dispatching solution? If it is hard to programatically access the inline namespace name (though I expect you should at least be able to to something like type.GetName().startswith("std::__cxx11"), then we can dispatch based on some other property. The presence of some member variable for instance? This would technically be even more correct as it would also work in the case when someone sadistically configures libc++ to use the __cxx11 namespace. And since the formatters need have knowledge of the member variables then I think dispatching based on them is fair game too...

lldb/include/lldb/DataFormatters/FormattersContainer.h
298–315

Let's make a separate patch for this. I think it would be better to print this error through proper channels, but that might require some plumbing.

This revision is now accepted and ready to land.Aug 22 2019, 5:13 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2019, 7:30 AM
jankratochvil added a comment.EditedAug 22 2019, 7:41 AM

because these kinds of regular are horrid.

Yes, they are. So let's link with PCRE (BSD licensed).

What was the problem with the dispatching solution?

I do not find it so easy as libc++ formatter is using static function LibcxxStdListSyntheticFrontEndCreator while libstdc++ is using new object instance ScriptedSyntheticChildren. The dispatching should be in LibcxxStdListSyntheticFrontEndCreator so should it store new ScriptedSyntheticChildren as a static variable?

And then I do not like that solution anyway. Currently the dispatching is based on type name strings so why not to just do that? KISS

we can dispatch based on some other property. The presence of some member variable for instance?

So let's drop the type name completely and make the matching duck typed. Why to divert from how it is currently done?

Anyway sure feel free to rewrite it but the regression is fixed now.

"jankratochvil added a reverting change" - that is not true, Differential got somehow confused by the commit text there talking about this patch.

"jankratochvil added a reverting change" - that is not true, Differential got somehow confused by the commit text there talking about this patch.

Good to know, but assuming that the above patch sticks, the plan *is* to revert this, right?

Good to know, but assuming that the above patch sticks, the plan *is* to revert this, right?

D66654 is already checked-in, I have considered it as done as is. If you want to revert this patch then:

  • One needs to reverse the registration order as a simple revert fails for TestDataFormatterStdList.py
  • One should put a big fat warning there.

Should I do that?