This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Support Compact C Type Format (CTF)
ClosedPublic

Authored by JDevlieghere on Jul 10 2023, 9:43 AM.

Details

Summary

Add support for the Compact C Type Format (CTF) in LLDB. The format describes the layout and sizes of C types. It is most commonly consumed by dtrace. We generate CTF for the XNU kernel and want to be able to use this in LLDB to debug kernels for which we don't have dSYMs (anymore). CTF is a much more limited debug format than DWARF which allows is to be an order of magnitude smaller: a 1GB dSYM can be converted to a handful of megabytes of CTF. For XNU, the goal is not to replace DWARF, but rather to have CTF serve as a "better than nothing" debug info format when DWARF is not available.

It's worth noting that the LLVM toolchain does not support emitting CTF and I don't intend to change that. XNU uses ctfconvert to generate CTF from DWARF and that's also what's used by the test.

Diff Detail

Event Timeline

JDevlieghere created this revision.Jul 10 2023, 9:43 AM
JDevlieghere requested review of this revision.Jul 10 2023, 9:43 AM

This looks good to me, but I have very little experience with the type system; it def needs someone's input who is more familiar with that part of lldb.

lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp
104

the ctf header is a uint16+uint8+uint8 and then a bunch of uint32's. DataExtractor::ExtractBytes will only correctly extract 1-32 byte items when buffer endianness doesn't match host endian. If we really want to support a big endian CTF correctly, these fields need to be extracted individually. (even if the overall size is less than 32 bytes, this method would byteswap all of them as if they were a single value)

170

I'm not sure I understand this. I assume stroff is the offset of a string table which is strlen bytes long. parlabel/parname come after them? Should this be stroff+strlen + parlabel/parname? More generally, is there an expected/guarantee order of all of these so we could check the highest offset one against the m_data size instead of checking them individually?

533

Probably my ignorance here, but is this missing an argument? Or does the type_or_error end up as one of the arguments fed to the format string. formatv's are always a little mysterious to me.

JDevlieghere marked 3 inline comments as done.Jul 10 2023, 7:28 PM
JDevlieghere added inline comments.
lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp
104

I don't really care about big endian but it's easy enough to support it, so I might as well make it work.

170

No: the parlabel and the parname are offsets into he string table. The sections are ordered so we could check header_offset + ctf_header.stroff + ctf_header.strlen. I did it like this to improve error reporting. I could wrap this in an if(log) and have a single check if logging is off?

533

This is correct: LLDB_LOG_ERROR passes the error as {0}. We had a bunch of places in that were using the LOG macro incorrectly, which I fixed in e0e36e3725b50ac690d1839f0e9476e93ff7988d.

jasonmolenda added inline comments.Jul 10 2023, 7:47 PM
lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp
104

Yeah, fair enough. We could just memcpy if we're not going to handle BE, too (that's what the data extractor does for a same-endian setup).

170

OK, I see. We could have just checked that parlabel and parname were less than stroff+strlen too at this point. I don't mind all the checks, not worried about the execution time, it was a lot of checking tho and I was wondering if there wasn't one value that's guaranteed to be the highest offset that we could check to reduce how many things we check.

JDevlieghere marked 4 inline comments as done.Jul 10 2023, 8:00 PM
JDevlieghere added inline comments.
lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp
170

Good point, I think that better conveys what this is doing. Thanks, I'll update the code.

davide added a subscriber: davide.Jul 10 2023, 9:52 PM
JDevlieghere marked an inline comment as done.

Address @jasonmolenda's feedback

mib accepted this revision.Jul 10 2023, 11:00 PM
mib added a subscriber: bulbazord.

LGTM with comments.

lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp
117

may be the magic can be a constexpr value.

123

ditto for the version.

234–242

Can the hard coded hexadecimals be turned to constexpr or may be we should add a link to the spec that has these values ?

542

Shouldn't we increment type_uid only the type was created successfully ?

542–546
560

I'd expect to call this only if we can parse the type.

800–806

Todo ?

870

I'm sure @bulbazord might have some opinion on this ConstString :p

895

Same here, I guess we could change name to be a StringRef in a followup

This revision is now accepted and ready to land.Jul 10 2023, 11:00 PM
Michael137 added inline comments.Jul 11 2023, 4:03 AM
lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp
260

Why do we do this special handling for void?

265

Should we also log what the type we encountered was?

324

Here we assume the typedef's decl context is always the top-level TU? I would expect this to be problematic for something like:

using Int = double; 
                    
struct S {          
    using Int = int;
};

The AST we build with the dwarf parser respects the scopes:

TranslationUnitDecl 0x10a019608 <<invalid sloc>> <invalid sloc> <undeserialized declarations>              
|-TypedefDecl 0x12c8d8008 <<invalid sloc>> <invalid sloc> Int 'double'                                     
| `-BuiltinType 0x10a019830 'double'                                                                       
`-CXXRecordDecl 0x12c8d8120 <<invalid sloc>> <invalid sloc> struct S definition                            
  |- ...                                 
  `-TypedefDecl 0x12c8d8260 <<invalid sloc>> <invalid sloc> Int 'int'                                      
    `-BuiltinType 0x10a019710 'int'

The PDB implementation looks for the parent scope and passes that to CreateTypedef. The DWARF implementation is a bit more roundabout in that it calls MakeType with an invalid CompilerType but sets the encoding type to eEncodingIsTypedefUID and ResolveState to Unresolved (which Type::ResolveCompilerType knows how to handle later on)

Michael137 added inline comments.Jul 11 2023, 4:05 AM
lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp
324

(Sorry meant to comment in ParseTypedef below.)

All my comments seem to have been submitted on a stale revision. So they shifted up a couple of lines. But looks like Phabricator doesn't allow me to delete/move them

JDevlieghere marked 6 inline comments as done.Jul 11 2023, 8:54 AM
JDevlieghere added inline comments.
lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp
260

CTF encodes it as an integer, but clang doesn't consider it an integer type.

324

Is that allowed in C? I took this shortcut because CTF only supports C and I didn't think there were cases in C where that mattered.

542

No: CTF uses the type uid to refer to other types. If we don't increment it, every subsequent type ID will be off-by-one. We rely on this property by accessing the m_types vector by index (== uid).

560

See my comment above: we insert a nullptr to keep the index in the m_types vector equal to the type_uid. The other alternative is to use a map from UID to Type, but then you end up with what is essentially a vector with a small hole here and there, which is just a less efficient way of doing the same.

800–806

No: CTF does not provide symbols. If you look at ParseObjects you can see we rely on the existing symbol table to map symbols to type.

895

Variable (here) and Function (above) both store their name as a ConstString so the ConstString comparison is somewhat fast. To meaningfully change the signature we'll need to change the storage type too.

bulbazord added inline comments.Jul 11 2023, 9:40 AM
lldb/source/Plugins/SymbolFile/CTF/SymbolFileCTF.cpp
800–806

The explanation makes sense. I think it would be worth adding a comment w/ that explanation.

802–803

Perhaps you could flesh out the FIXME with exactly what kind of work needs to be done for this function to be considered "implemented"?

870

Module::LookupInfo uses ConstString right now. If we remove ConstString usage there, we can remove it here at that time.
Same for the thing below.

JDevlieghere marked 16 inline comments as done.

Address review feedback

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2023, 11:31 AM