This is an archive of the discontinued LLVM Phabricator instance.

Data formatters: Look through array element typedefs
ClosedPublic

Authored by jarin on Jan 3 2020, 2:33 AM.

Details

Summary

Motivation: When formatting an array of typedefed chars, we would like to display the array as a string.

The string formatter currently does not trigger because the formatter lookup does not resolve typedefs for array elements (this behavior is inconsistent with pointers, for those we do look through pointee typedefs). This patch tries to make the array formatter lookup somewhat consistent with the pointer formatter lookup.

Diff Detail

Event Timeline

jarin created this revision.Jan 3 2020, 2:33 AM
teemperor accepted this revision.Jan 3 2020, 4:09 AM

I have some comments but otherwise this patch looks good to me. Thanks!

lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/array_typedef/TestArrayTypedef.py
3

I rather have no comment than a generic one (it's kinda obvious what is tested here from the name).

14

I think we can mark this as NO_DEBUG_INFO_TESTCASE = True to only run this once.

lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/array_typedef/main.cpp
2

We usually don't add license headers to test files.

lldb/source/API/SBType.cpp
218

I get that this is to preserve the previous SB API behavior but I think it's better if we keep this method a simple wrapper without extra functionality. That we return the canonical type seems like a bug for me, so it's IMHO fine to change this behavior.

lldb/source/DataFormatters/FormatManager.cpp
263

Maybe add a comment that we strip here typedefs of array element types.

267

Maybe add a comment that this is the original array type with the element type typedef desugared.

273

I know this is copied from a above but I think if this is more readable:

// this is not exactly the usual meaning of stripping typedefs.
const bool stripped_typedef = true;
GetPossibleMatches(
....
  stripped_typedef);
This revision is now accepted and ready to land.Jan 3 2020, 4:09 AM
jarin updated this revision to Diff 236021.Jan 3 2020, 4:41 AM

Just FYI, I meant that lldb/source/API/SBType.cpp shouldn't be changed with this patch. I think it's fine that this patch will change the behavior (as the old behavior seems broken).

Also do you need someone to commit this for you?

jarin marked 7 inline comments as done.Jan 3 2020, 4:57 AM

I have addressed the comments, thanks for the quick review.

lldb/source/API/SBType.cpp
218

Unfortunately, returning the non-canonicalized type breaks bunch of test (see below). In any case, such change should be done in a separate patch.

lldb-api :: functionalities/breakpoint/breakpoint_conditions/TestBreakpointConditions.py
lldb-api :: functionalities/data-formatter/data-formatter-stl/libcxx/unordered/TestDataFormatterUnordered.py
lldb-api :: functionalities/data-formatter/data-formatter-stl/libstdcpp/list/TestDataFormatterStdList.py
lldb-api :: functionalities/data-formatter/refpointer-recursion/TestDataFormatterRefPtrRecursion.py
lldb-api :: functionalities/data-formatter/varscript_formatting/TestDataFormatterVarScriptFormatting.py
jarin marked an inline comment as not done.Jan 3 2020, 4:59 AM

Also do you need someone to commit this for you?

Yes, please (once we agree what to do with SBType.cpp).

Interesting, I actually can't reproduce those failures on any of my machines. Can you post the error output you get?

But I agree that we can figure this out in a separate patch, so I'll give this another test run and then land it. Thanks for the patch!

teemperor accepted this revision.Jan 3 2020, 7:36 AM
jarin added a comment.Jan 3 2020, 8:16 AM

Actually, there is something unusually flaky in my current checkout, so you might be right that not canonicalizing is harmless. It still makes more sense to land that separately.

clayborg requested changes to this revision.Jan 6 2020, 1:49 PM
clayborg added a subscriber: clayborg.
clayborg added inline comments.
lldb/source/API/SBType.cpp
215–219

Why does getting the canonical type of an array type change the array element type? I would think it wouldn't matter but it obviously must?

lldb/source/DataFormatters/FormatManager.cpp
245–261

Wouldn't we want to add one for each typedef that has a formatter? Lets say we had:

typedef char MCHAR;
typedef MCHAR MMCHAR;

int main() {
  MMCHAR str[5] = "abcd";
  return 0;
}

if we had a special formatter for MCHAR that would maybe display MCHAR somehow special, would we be ok with always getting the standard C string formatter in this case? Seems like we should add one for each typedef that has a formatter and possibly also the base type?

lldb/source/Symbol/ClangASTContext.cpp
3935

If I ask an array type for its element type, I wouldn't expect all typedefs to be removed. This seems wrong, or it can become an option to this function as an extra param:

CompilerType
ClangASTContext::GetArrayElementType(lldb::opaque_compiler_type_t type, uint64_t *stride, bool get_canonical_element_type);
This revision now requires changes to proceed.Jan 6 2020, 1:49 PM

So as long as the following are true from this patch I am ok:

  • if I ask for the array element type of "str" in the test that was added, it should return "MCHAR". We shouldn't be removing any typedefs from the type. The user can call SBType::GetCanonicalType() if they need to (or equivalent with internal APIs)
  • If there are no formatters for "MCHAR[]" we can fall back to "char[]".
  • If there are no formatters for "MMCHAR[]" from my example we fall back to the _first_ array formatter that supports the array of typedefs, or back to the the array of canonical types. Only the first array based formatter should be returned as the desired formatter

This patch seems to redefine what getting the element type of an array is which seems wrong to me. We just need to make the code that uses these APIs smarter.

jarin marked 3 inline comments as done.Jan 7 2020, 1:42 AM
jarin added inline comments.
lldb/source/API/SBType.cpp
215–219

Removal of typedef for element type is the existing behavior that I have been trying preserve. Raphael already suggested we should not preserve it.

For example, I get the following output for my program (both without and with my patch):

Type:    MCHAR [10]
El type: char

Here is the script

import lldb, os

debugger = lldb.SBDebugger.Create()
debugger.SetAsync(False)
target = debugger.CreateTargetWithFileAndTargetTriple("df.out", \
  "x86_64-pc-linux")
main_bp = target.BreakpointCreateByLocation("df.cc", 11)
process = target.LaunchSimple(None, None, os.getcwd())
frame = process.selected_thread.GetSelectedFrame()
var_a = process.selected_thread.GetSelectedFrame().GetValueForVariablePath("a")
print("Type:    " + var_a.GetType().GetName())
print("El type: " + var_a.GetType().GetArrayElementType().GetName())

If we remove the canonicalization here, we will indeed get the behavior that you want:

Type:    MCHAR [10]
El type: MCHAR
lldb/source/DataFormatters/FormatManager.cpp
245–261

This does not work as expected with my patch, let me investigate why.

lldb/source/Symbol/ClangASTContext.cpp
3935

The original code for ClangASTContext::GetArrayElementType did remove the typedefs. With my change ClangASTContext::GetArrayElementType does not remove typedefs anymore.

So as long as the following are true from this patch I am ok:

  • if I ask for the array element type of "str" in the test that was added, it should return "MCHAR". We shouldn't be removing any typedefs from the type. The user can call SBType::GetCanonicalType() if they need to (or equivalent with internal APIs)
  • If there are no formatters for "MCHAR[]" we can fall back to "char[]".
  • If there are no formatters for "MMCHAR[]" from my example we fall back to the _first_ array formatter that supports the array of typedefs, or back to the the array of canonical types. Only the first array based formatter should be returned as the desired formatter

This patch seems to redefine what getting the element type of an array is which seems wrong to me. We just need to make the code that uses these APIs smarter.

This patch is moving the API closer to what you describe. The current behavior is that we strip the typedefs. The behavior we expect to get is that we don't strip the typedefs (which is what you describe). This patch moves the internal API closer to the expected behavior. The public SBAPI will be changed in a follow-up patch to be closer to the expected behavior. So no part of this patch is about adding any implicit stripping of typedefs which I think we all agree is unexpected and bad behavior. The only reason why we add the canonical type conversion to the SBAPI is to keep the bad SBAPI behavior for now (but this doesn't add any new typedef stripping, it just keeps the old one around for the SBAPI).

So from what I can tell this patch is improving the behavior by making it closer to what Greg describes (and what is correct), so it should land IMHO. The remaining existing typedef stripping can be fixed later (unless Greg feels strongly about fixing all typedef stripping it in this patch which is fine by me).

lldb/source/API/SBType.cpp
215–219

Why does getting the canonical type of an array type change the array element type? I would think it wouldn't matter but it obviously must?

It does matter because that's how canonical types work, see the Clang documentation in ASTContext.h:

/// The non-canonical version of a type may have many "decorated" versions of
/// types.  Decorators can include typedefs, 'typeof' operators, etc. The
/// returned type is guaranteed to be free of any of these, allowing two
/// canonical types to be compared for exact equality with a simple pointer
/// comparison.
lldb/source/DataFormatters/FormatManager.cpp
245–261

I tested this case when I reviewed this patch and IIRC this was caused by yet another call in GetArrayType (not sure though). But I don't think that's really blocking this patch as the old behaviour had the same flaw from what I can remember.

jarin added a comment.Jan 7 2020, 4:22 AM

So as long as the following are true from this patch I am ok:

  • if I ask for the array element type of "str" in the test that was added, it should return "MCHAR". We shouldn't be removing any typedefs from the type. The user can call SBType::GetCanonicalType() if they need to (or equivalent with internal APIs)

As discussed before, this works exactly the same way without my patch.

  • If there are no formatters for "MCHAR[]" we can fall back to "char[]".

This works. If there is formatter for MCHAR[], we use that formatter.

Note that if there is formatter for the element type (MCHAR), the command interpreter will not use that one because of the logic in ValueObjectPrinter::ShouldPrintChildren (which only expands children if the summary formatter says it has children or if there is no summary formatter). This might be a CLI usability regression.

  • If there are no formatters for "MMCHAR[]" from my example we fall back to the _first_ array formatter that supports the array of typedefs, or back to the the array of canonical types. Only the first array based formatter should be returned as the desired formatter

This also works. Here is an example:

(lldb) b df.cc:7
Breakpoint 1: where = df.out`main + 61 at df.cc:7:3, address = 0x000000000040114d
(lldb) r
...
(lldb) p m
(MCHAR [10]) $0 = "m1"
(lldb) p mm
(MMCHAR [10]) $1 = "m2"
(lldb) type summary add --summary-string "-- mchar --" -x "MCHAR \[[0-9]+]"
(lldb) p m
(MCHAR [10]) $2 = -- mchar --
(lldb) p mm
(MMCHAR [10]) $3 = -- mchar --
(lldb) type summary add --summary-string "-- mmchar --" -x "MMCHAR \[[0-9]+]"
(lldb) p m
(MCHAR [10]) $6 = -- mchar --
(lldb) p mm
(MMCHAR [10]) $7 = -- mmchar --

Here is the program:

typedef char MCHAR;
typedef MCHAR MMCHAR;

int main() {
  MMCHAR mm[10] = "m2";
  MCHAR m[10] = "m1";
  return 0;  // break here
}
clayborg accepted this revision.Jan 8 2020, 3:41 PM

Ok thanks for clarifying all my points! Looks good. And it would be nice to fix any locations that were canonicalizing types when they shouldn't be in a future patch.

This revision is now accepted and ready to land.Jan 8 2020, 3:41 PM
jarin added a comment.Jan 10 2020, 1:51 AM

Raphael, could you possibly land the patch for me? Thanks!

Raphael, could you possibly land the patch for me? Thanks!

It's in my queue, but the green dragon bot (and my own CI) are currently dead due to a libc++ change that breaks the data formatters. I'll get that running and then I'll land this.

Actually the failures are gmodules only so I'll land this now.

This revision was automatically updated to reflect the committed changes.