
[lldb/Dataformatter] Add support for CoreFoundation Dictionaries and Sets

Authored by mib on Apr 17 2020, 12:46 PM.



This patch improves data formatting for CoreFoundation containers:
CFDictionary and CFSet.

These data formatters make the containers and their children appear in Xcode's
variables view (and on the command line) without having to expand the
data structure.

Previous implementation (D48450) only supported showing the container's element count.

(lldb) frame var dict
(__NSCFDictionary *) dict = 0x00000001004062b0 2 key/value pairs

(lldb) frame var set
(__NSCFSet *) set = 0x0000000100406330 2 elements

Now the variable can be dereferenced to display the container's elements:

(lldb) frame var *dict
(__NSCFDictionary) *dict = {
  [0] = {
    key = 0x0000000100004050 @"123"
    value = 0x0000000100004090 @"456"
  }
  [1] = {
    key = 0x0000000100004030 @"abc"
    value = 0x0000000100004070 @"def"
  }
}

(lldb) frame var *set
(__NSCFSet) *set = {
  [0] = 0x0000000100004050 @"123"
  [1] = 0x0000000100004030 @"abc"
}


Signed-off-by: Med Ismail Bennani <>

Event Timeline

mib created this revision.Apr 17 2020, 12:46 PM
mib updated this revision to Diff 258402.Apr 17 2020, 12:49 PM
mib edited the summary of this revision.

Reformat patch.

Thanks for taking care of this. It's a lot of work. First round of comments.


Why do you need this? Can't you just use the default destructor?


I'm under the impression the second return is never hit.


Can this ever happen? What if addr is 0 instead?


Thanks for doing this, as it will work on remote devices. We really need to check the test passes on arm64 before committing.


These two pieces of code, m_ptr_size == 4 and m_ptr_size == 8 are surprisingly similar. I'm really worried we might have a bug in one of them and miss the other. Is there anything we can do to share the code? [e.g. templatize].


This could be an lldbassert or unreachable. We really shouldn't be here if ptrsize is neither 8 nor 4, unless there's an egregious bug.




Again, why?


Maybe a reference to the foundation header where these are defined, if public.

mib marked 11 inline comments as done.Apr 17 2020, 2:44 PM
mib added inline comments.

Indeed, they're very similar and I already tried using templates (and SFINAE) to make it more generic, however I couldn't achieve that.

Since the remote architecture might be different from lldb's, we can't use macros to generate the underlying struct with the right size. So, I decided to template the structure used by CF, and have one for each architecture as a class attribute (look at CFBasicHash.h:114).

Basically, it's a tradeoff I chose deliberately: I preferred having the CFBasicHash class handle the architecture, so that only one CFBasicHash object is exposed in the CFDictionary and CFSet data formatters, rather than having two CFBasicHash objects templated on ptr_size with all the logic duplicated for each architecture AND each data formatter.

If you can see a better way to do it, please let me know :)


It is not in a public header, that's why I copied the explanation.

mib updated this revision to Diff 258431.Apr 17 2020, 2:49 PM
mib marked 2 inline comments as done.

Address Davide's comments.

labath added a subscriber: labath.Apr 20 2020, 1:10 AM
labath added inline comments.
template <typename T> void updateFor(std::unique_ptr<__CFBasicHash<T>> &m_ht, ...);

if (m_ptr_size == 4)
  updateFor<uint32_t>(m_ht_4, ...);
else if (m_ptr_size == 8)
  updateFor<uint64_t>(m_ht_8, ...);


Or the entire class could be a template, inheriting from a common (non-templatized) interface...
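
The second option (a class template behind a common non-templated interface) could look roughly like the sketch below. This is only an illustration, not the code in the patch: HashHeader, HashReader, HashReaderImpl and MakeHashReader are hypothetical names, and the two-field header is a placeholder rather than the real CFBasicHash layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the target-side table header. The real
// CFBasicHash layout is NOT reproduced here.
template <typename PtrT> struct HashHeader {
  PtrT isa = 0;
  PtrT count = 0;
};

// Non-templated interface that the dictionary and set formatters would use.
class HashReader {
public:
  virtual ~HashReader() = default;
  virtual std::size_t HeaderSize() const = 0;
};

// One instantiation per target pointer width; the logic is written once.
template <typename PtrT> class HashReaderImpl final : public HashReader {
public:
  std::size_t HeaderSize() const override { return sizeof(HashHeader<PtrT>); }
};

// Runtime dispatch on the target pointer size. Returns nullptr for an
// unsupported size so the caller can bail out.
std::unique_ptr<HashReader> MakeHashReader(std::uint32_t ptr_size) {
  if (ptr_size == 4)
    return std::make_unique<HashReaderImpl<std::uint32_t>>();
  if (ptr_size == 8)
    return std::make_unique<HashReaderImpl<std::uint64_t>>();
  return nullptr;
}
```

With this shape, the pointer-size check happens once at construction, and the formatters only ever see a HashReader.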


No manual memory management, please.


Are these actually used anywhere?

mib marked 5 inline comments as done.Apr 20 2020, 5:14 AM
mib added inline comments.

I didn't think about that --' ... Thanks for the suggestion ^^


This is a leftover used in the multi-variant implementation that I'm working on ...

mib updated this revision to Diff 258710.Apr 20 2020, 5:22 AM
mib marked 2 inline comments as done.

Address Pavel's comments.

This is almost ready. After we're done with this round of cosmetics, I'll take another look at the algorithm and sign off.


else after return


else after return. no need for {}


Can you add a comment explaining what this loop does?


lldbassert. Also no need for the second part of the comment.


stray line

shafik added a subscriber: shafik.Apr 20 2020, 11:54 AM
shafik added inline comments.

Please use in-class member initializers; then you can use = default for the constructor. This is more idiomatic modern C++.


uint32_t m_ptr_size = UINT32_MAX;


lldb::ByteOrder m_byte_order = eByteOrderInvalid;


Address m_address = LLDB_INVALID_ADDRESS;


Shouldn't these bools have default values?

I don't see them initialized.
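
Taken together, these suggestions amount to something like the sketch below. It is a minimal illustration under assumptions: BasicHashState, its Configure and IsValid methods, and the two bool members are hypothetical, and ByteOrder is a local stand-in for lldb::ByteOrder so that the example compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for lldb::ByteOrder (assumption, for self-containment).
enum class ByteOrder { Invalid, Little, Big };

class BasicHashState {
public:
  // Every member below has an in-class initializer, so the constructor
  // can simply be defaulted.
  BasicHashState() = default;

  void Configure(std::uint32_t ptr_size, ByteOrder order) {
    m_ptr_size = ptr_size;
    m_byte_order = order;
  }

  bool IsValid() const {
    return m_ptr_size != UINT32_MAX && m_byte_order != ByteOrder::Invalid;
  }

private:
  std::uint32_t m_ptr_size = UINT32_MAX;
  ByteOrder m_byte_order = ByteOrder::Invalid;
  // Hypothetical flags: bools get explicit defaults too, so they are never
  // read uninitialized.
  bool m_mutable = false;
  bool m_multi = false;
};
```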

shafik added inline comments.Apr 20 2020, 12:27 PM

Shouldn't we check that m_ht is actually managing an object before attempting to access its value?


These sizeof calls feel like they should just be consolidated into the initialization of size.






const: if a value is not supposed to change, always make it const. This prevents bugs where a "const" is modified by mistake.


reinterpret_cast and static_cast respectively. Same below.

mib updated this revision to Diff 258826.Apr 20 2020, 1:22 PM
mib marked 14 inline comments as done.

Address Davide's & Shafik's comments.

mib marked 2 inline comments as done.Apr 20 2020, 1:22 PM
mib updated this revision to Diff 258827.Apr 20 2020, 1:32 PM

Fix typo.

This looks fine to me but please give Pavel another chance to look before committing.

mib added a comment.Apr 20 2020, 3:13 PM

This looks fine to me but please give Pavel another chance to look before committing.

Will wait for @labath's approval :)

I don't have a vested interest here (the main reason I looked at this was the template discussion), but I have made a cursory pass over the patch. I didn't check the high-level functionality, but it seems like the implementation details can be improved upon.


I am not sure, but I think this still may technically be undefined behavior if m_ht is null even though the application of sizeof means the dereference will not normally take place.


Why don't you just pass m_ht.get() to ReadMemory ?


ReadMemory(ptr_offset, m_ht->pointers, ...) ?


= default? Maybe even implicitly default by not providing a destructor?


You've cargo-culted this code, but you've changed DataBufferSP to a plain DataBuffer. If the created valueobject continues to reference this data (I haven't checked, but I think it does), this will be a dangling reference.

mib marked 7 inline comments as done.Apr 21 2020, 12:47 PM
mib added inline comments.

IIUC, ReadMemory will read memory matching the inferior endianness. That's why I'm using a DataExtractor to translate the copied bytes to the host endianness.


Same answer.

mib updated this revision to Diff 259084.Apr 21 2020, 12:49 PM
mib marked 2 inline comments as done.

Address Pavel's comments:

  • Fix potential UB by using the struct type in sizeof instead of dereferencing the pointer.
  • Remove empty destructors.
  • Fix dangling reference.
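
A minimal sketch of the first fix, under assumptions: TableHeader and ReadHeader are hypothetical names, and a plain memcpy from a byte buffer stands in for lldb's memory read. Naming the type in sizeof sidesteps the null-pointer question altogether, and the explicit null check answers the earlier comment about m_ht not managing an object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical header struct; not the real CFBasicHash layout.
struct TableHeader {
  std::uint32_t bits;
  std::uint32_t count;
};

// Copies a header out of raw bytes. Returns false, leaving *out untouched,
// if either pointer is null or the buffer is too small.
bool ReadHeader(const std::uint8_t *src, std::size_t len, TableHeader *out) {
  // The size comes from the type, so no pointer expression is involved.
  if (!src || !out || len < sizeof(TableHeader))
    return false;
  std::memcpy(out, src, sizeof(TableHeader));
  return true;
}
```

A caller holding a std::unique_ptr<TableHeader> would pass ptr.get() as out; an empty unique_ptr is then rejected by the null check instead of being dereferenced.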
labath added inline comments.Apr 22 2020, 12:25 AM

Well, I am glad that you are thinking about non-native endianness, but I am afraid things just don't work that way. There's no way an API like CopyData(void *, size_t) can automatically fix endianness problems. Why? Because the fix depends on the structure of the data.

Imagine this sequence of bytes: 00 01 02 03 04 05 06 07. If this sequence really represents a sequence of (little-endian) bytes (uint8_ts), then the corresponding "big-endian" sequence would be identical. If, however, it represents 2-byte words (uint16_t), then the big-endian representation would be 01 00 03 02 05 04 07 06. For 4-byte words, it would be 03 02 01 00 07 06 05 04, etc.

In short, you need to know the structure of the data to translate endianness, and CopyData does not know that. Indeed, if you look at the implementation of that function you'll see that it just does a memcpy.
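
The byte sequences above can be reproduced with a small standalone sketch. SwapWords is a hypothetical helper written for this illustration; it is not an lldb or llvm API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reverses the bytes inside each word_size-byte word of the buffer.
// word_size must be non-zero and must divide the buffer length.
std::vector<std::uint8_t> SwapWords(const std::vector<std::uint8_t> &bytes,
                                    std::size_t word_size) {
  assert(word_size != 0 && bytes.size() % word_size == 0);
  std::vector<std::uint8_t> out(bytes.size());
  for (std::size_t base = 0; base < bytes.size(); base += word_size)
    for (std::size_t i = 0; i < word_size; ++i)
      out[base + i] = bytes[base + word_size - 1 - i];
  return out;
}
```

The same eight input bytes give three different results for word sizes 1, 2 and 4, which is why a function that only receives a pointer and a length cannot pick the right one.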

My suggestion for reading directly into the object was based on the assumption that you are fine with things working only if everything is of the same endianness. I wouldn't have allowed that in more cross-platform code, but I didn't feel like policing all corners of lldb.

However, if you do want to make this endian-correct (which I do encourage), then you have a couple of options:

  • ditch the structs and use data extractor functions to read (GetU32, GetAddress, etc) the individual fields -- these know the size of the object they are accessing, and can adjust endianness appropriately
  • compare host and target endianness and call llvm::sys::swapByteOrder on the individual fields if they differ
  • use llvm's packed_endian_specific_integral instead of native types in the struct definitions. This would require passing the endianness as a template parameter and then dispatching to the appropriate struct based on the runtime value, similar to how you've already done for byte sizes.

Each of these options has its drawbacks, but I've seen them all in places throughout llvm. In this particular case, I have a feeling the simplest option would be to go the DataExtractor route...
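
The idea behind the field-by-field route can be sketched without lldb at all. ReadUnsigned and the local ByteOrder enum below are hypothetical stand-ins, not the DataExtractor API; the point is only that a reader which knows each field's size and the target byte order can decode the field correctly on any host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local stand-in for the target byte order (assumption).
enum class ByteOrder { Little, Big };

// Decodes an unsigned integer of `size` bytes (1 to 8) stored at `data` in
// the given byte order. The result does not depend on the host byte order.
std::uint64_t ReadUnsigned(const std::uint8_t *data, std::size_t size,
                           ByteOrder order) {
  assert(data && size >= 1 && size <= 8);
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < size; ++i) {
    // Visit bytes from most significant to least significant.
    const std::size_t idx = (order == ByteOrder::Little) ? size - 1 - i : i;
    value = (value << 8) | data[idx];
  }
  return value;
}
```

Each struct field would be read with one such call at its own offset and size, instead of copying the whole struct in one go.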

mib updated this revision to Diff 260012.Apr 24 2020, 3:58 PM

After playing with llvm::sys::swapByteOrder and the DataExtractor getters (GetU8, GetU16, GetU32, ...), it looks like neither of these supports bitfields.

Since the CFBasicHash struct relies heavily on bitfields, it won't support mixed endianness.

I changed the patch so if the host and the target have different byte orders, lldb will abort the data formatting.
I also removed all the DataBuffer and DataExtractor logic since it's not needed anymore.
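
The bail-out described here could be sketched as follows. This is an illustration under assumptions, not the patch's code: HostByteOrder and CanFormat are hypothetical helpers, and ByteOrder is a local stand-in for lldb::ByteOrder.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Local stand-in for lldb::ByteOrder (assumption).
enum class ByteOrder { Little, Big };

// Detects the host byte order by inspecting the first byte of a known
// 16-bit value.
ByteOrder HostByteOrder() {
  const std::uint16_t probe = 1;
  std::uint8_t first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1 ? ByteOrder::Little : ByteOrder::Big;
}

// The formatter gives up when the target byte order differs from the
// host's, because the bitfield-heavy struct is copied verbatim.
bool CanFormat(ByteOrder target_order) {
  return target_order == HostByteOrder();
}
```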

That seems acceptable (though definitely not ideal), given that this is objc code, and probably no big endian machine will ever be running or debugging objc code...

davide accepted this revision.Apr 27 2020, 11:28 AM
This revision is now accepted and ready to land.Apr 27 2020, 11:28 AM
This revision was automatically updated to reflect the committed changes.