This is an archive of the discontinued LLVM Phabricator instance.

[lldb/Dataformatter] Add support for CoreFoundation Dictionaries and Sets
ClosedPublic

Authored by mib on Apr 17 2020, 12:46 PM.

Details

Summary

[lldb/Dataformatter] Add support for CoreFoundation Dictionaries and Sets

This patch improves data formatting for CoreFoundation containers:
CFDictionary and CFSet.

These data formatters make the containers and their children appear in Xcode's
variables view (and on the command line) without having to expand the
data structure.

Previous implementation (D48450) only supported showing the container's element count.

(lldb) frame var dict
(__NSCFDictionary *) dict = 0x00000001004062b0 2 key/value pairs

(lldb) frame var set
(__NSCFSet *) set = 0x0000000100406330 2 elements

Now the variable can be dereferenced to dispaly the container's elements:

(lldb) frame var *dict
(__NSCFDictionary) *dict = {
  [0] = {
    key = 0x0000000100004050 @"123"
    value = 0x0000000100004090 @"456"
  }
  [1] = {
    key = 0x0000000100004030 @"abc"
    value = 0x0000000100004070 @"def"
  }
}

(lldb) frame var *set
(__NSCFSet) *set = {
  [0] = 0x0000000100004050 @"123"
  [1] = 0x0000000100004030 @"abc"
}

rdar://39882287

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

Diff Detail

Event Timeline

mib created this revision.Apr 17 2020, 12:46 PM
mib updated this revision to Diff 258402.Apr 17 2020, 12:49 PM
mib edited the summary of this revision. (Show Details)

Reformat patch.

Thanks for taking care of this. It's a lot of work. First round of comments.

lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp
13

Why do you need this? Can't you just use the default destructor?

19–20

I'm under the impression the second return is never hit.

24–25

Can this ever happen? What if addr is 0 instead?

29–30

Thanks for doing this, as it will work on remote devices. We really need to check the test passes on arm64 before committing.

36–72

These two pieces of code, m_ptr_size == 4 and m_ptr_size == 8 are surprisingly similar. I'm really worried we might have a bug in one of them and miss the other. Is there anything we can do to share the code? [e.g. templatize].

140–141

This could be an lldb_assert or unreachable. We really shouldn't be here if ptrsize != 8 or ptrsize != 4, unless there's an egregious bug.

lldb/source/Plugins/Language/ObjC/NSDictionary.cpp
101

Why?

144

Again, why?

177–187

Maybe a reference to the foundation header where these are defined, if public.

mib marked 11 inline comments as done.Apr 17 2020, 2:44 PM
mib added inline comments.
lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp
36–72

Indeed, they're very similar and I already tried using templates (and SFINAE) to make it more generic, however I couldn't achieve that.

Since the remote architecture might be different from lldb's, we can't use macros to generate the underlying struct with the right size. So, I decided to template the structure used by CF, and have one of each architecture as a class attribute (look at CFBasicHash.h:114).

Basically it's a tradeoff I chose voluntarily: I preferred having the CFBasicHash class handle the architecture to only expose one CFBasicHash object in the CFDictionary and CFSet data-formatter, rather than having two CFBasicHash objects templated for each ptr_size and have all the logic duplicated for each different architecture AND each data formatters.

If you can see a better way to do it, please let me know :)

lldb/source/Plugins/Language/ObjC/NSDictionary.cpp
177–187

It is not in a public header, that's why I copied the explanation.

mib updated this revision to Diff 258431.Apr 17 2020, 2:49 PM
mib marked 2 inline comments as done.

Address Davide's comments.

labath added a subscriber: labath.Apr 20 2020, 1:10 AM
labath added inline comments.
lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp
36–72
template<typename T> updateFor(std::unique_ptr<__CFBasicHash<T>> &m_ht, ...)

if (m_ptr_size == 4)
  updateFor<uint32_t>(m_ht_4, ...);
else if (m_ptr_size == 8)
  updateFor<uint64_t>(m_ht_8, ...)

?

Or the entire class could be a template, inheriting from a common (non-templatized) interface...

103–108

No manual memory management, please.

lldb/source/Plugins/Language/ObjC/NSDictionary.cpp
177–187

Are these actually used anywhere?

mib marked 5 inline comments as done.Apr 20 2020, 5:14 AM
mib added inline comments.
lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp
36–72

I didn't think about that --' ... Thanks for the suggestion ^^

lldb/source/Plugins/Language/ObjC/NSDictionary.cpp
177–187

This is a left-over used in the multi variant implement that I'm working on ...

mib updated this revision to Diff 258710.Apr 20 2020, 5:22 AM
mib marked 2 inline comments as done.

Address Pavel's comments.

This is almost ready. After we're done with this round of cosmetics, I'll take another look a the algorithm and sign off.

lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp
19–20

else after return

35–39

else after return. no need for {}

lldb/source/Plugins/Language/ObjC/NSDictionary.cpp
753–776

Can you add a comment explaining what this loop does?

lldb/source/Plugins/Language/ObjC/NSSet.cpp
655

lldbassert. Also no need for the second part of the comment.

lldb/test/API/functionalities/data-formatter/data-formatter-objc/TestDataFormatterObjCNSContainer.py
23

stray line

shafik added a subscriber: shafik.Apr 20 2020, 11:54 AM
shafik added inline comments.
lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp
9

Use in class member initializers please then you can use =default for the constructor this is more idiomatic modern C++

lldb/source/Plugins/Language/ObjC/CFBasicHash.h
65

uint32_t m_ptr_size = UINT32_MAX;

66

lldb::ByteOrder m_byte_order = eByteOrderInvalid;

68

Address m_address = LLDB_INVALID_ADDRESS;

75

These bool should have default values?

I don't see them initialized.

shafik added inline comments.Apr 20 2020, 12:27 PM
lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp
50

Shouldn't we check that m_ht is actually managing an object before attempting to access it's value?

51

These sizeof calls feel like the should just be consolidated into the initialization of size.

lldb/source/Plugins/Language/ObjC/NSDictionary.cpp
698

const

738

const

lldb/source/Plugins/Language/ObjC/NSSet.cpp
603

const if a value is not supposed to change make it const always. This prevents bugs where a "const" is modified by mistake.

649

reinterpret_cast and static_cast respectively. Same below.

mib updated this revision to Diff 258826.Apr 20 2020, 1:22 PM
mib marked 14 inline comments as done.

Address Davide's & Shafik's comments.

mib marked 2 inline comments as done.Apr 20 2020, 1:22 PM
mib updated this revision to Diff 258827.Apr 20 2020, 1:32 PM

Fix typo.

This looks fine to me but please give Pavel another chance to look before committing.

mib added a comment.Apr 20 2020, 3:13 PM

This looks fine to me but please give Pavel another chance to look before committing.

Will wait for @labath approval :) !

I don't have a vested interested here (the main reason I looked at this was the template discussion), but I have made a cursory pass over the patch. I didn't check the high level functionality, but it seems like the implementation details can be improved upon.

lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp
45

I am not sure, but I think this still may technically be undefined behavior if m_ht is null even though the application of sizeof means the dereference will not normally take place.

47–62

Why don't you just pass m_ht.get() to ReadMemory ?

71–82

ReadMemory(ptr_offset, m_ht->pointers, ...) ?

lldb/source/Plugins/Language/ObjC/NSDictionary.cpp
692–693

= default? Maybe even implicitly default by not providing a destructor?

lldb/source/Plugins/Language/ObjC/NSSet.cpp
666–667

You've cargo-culted this code, but you've changed DataBufferSP to a plain DataBuffer. If the created valueobject continues to reference this data (I haven't checked, but I think it does), this will be a dangling reference.

mib marked 7 inline comments as done.Apr 21 2020, 12:47 PM
mib added inline comments.
lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp
47–62

IIUC, ReadMemory will read memory matching the inferior endianness. That's why, I'm using a DataExtractor, to translate, the copied bytes to the host endianness.

71–82

Same answer.

mib updated this revision to Diff 259084.EditedApr 21 2020, 12:49 PM
mib marked 2 inline comments as done.

Address Pavel's comments:

  • Fix eventual UB by using the struct type in sizeof instead of dereferencing the pointer.
  • Remove empty destructors.
  • Fix dangling reference.
labath added inline comments.Apr 22 2020, 12:25 AM
lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp
47–62

Well, I am glad that you are thinking about non-native endianness, but I am afraid things just don't work that way. There's no way an API like CopyData(void *, size_t) can automatically fix endianness problems. Why? Because the fix depends on the structure of the data.

Imagine this sequence of bytes: 00 01 02 03 04 05 06 07. If this sequence really represents a sequence of (little-endian) bytes (uint8_ts), then the corresponding "big-endian" sequence would be identical. If, however, it represents 2-byte words (uint16_t), then the big-endian representation would be 01 00 03 02 05 04 07 06. For 4-byte words, it would be 03 02 01 00 07 06 05 04, etc.

In short, you need to know the structure of the data to translate endiannes, and CopyData does not know that. Indeed, if you look at the implementation of that function you'll see that it just does a memcpy.

My suggestion for reading directly into the object was based on the assumption that are fine with things working only if everything is of the same endianness. I wouldn't have allowed that in more cross-platform code, but I didn't feel like policing all corners of lldb.

However, if you do want to make this endian-correct (which I do encourage), then you have a couple of options:

  • ditch the structs and use data extractor functions to read (GetU32, GetAddress, etc) the individual fields -- these know the size of the object they are accessing, and can adjust endianness appropriately
  • compare host and target endianness and call llvm::sys::swapByteOrder on the individual fields if they differ
  • use llvm's packed_endian_specific_integral instead of native types in the struct definitions. This would require passing the endiannes as a template parameter and then dispatching to the appropriate struct based on the runtime value similar to how you've already done for byte sizes.

Each of these options has its drawbacks, but I've seen them all places throughout llvm. In this particular case, I have a feeling the simplest option would be to go the DataExtractor route...

mib updated this revision to Diff 260012.EditedApr 24 2020, 3:58 PM

After playing with llvm::sys::swapByteOrder and the DataExtractor getters (GetU8, GetU16, GetU32 ....), it looks like neither of these supports bitfields.

Since the CFBasicHash struct relies heavily on bitfields, it won't support mixed endianness.

I changed the patch so if the host and the target have different byte orders, lldb will abort the data formatting.
I also removed all the DataBuffer and DataExtractor logic since it's not needed anymore.

That seems acceptable (though definitely not ideal), given that this is objc code, and probably no big endian machine will ever be running or debugging objc code...

davide accepted this revision.Apr 27 2020, 11:28 AM
This revision is now accepted and ready to land.Apr 27 2020, 11:28 AM
This revision was automatically updated to reflect the committed changes.