This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Remove broken shared cache duplicate class handling
Needs ReviewPublic

Authored by kastiglione on Jun 23 2023, 10:47 AM.

Details

Summary

The shared cache list of ObjC classes contains classes that are duplicate in name (which
may or may not be duplicate in definition).

Currently, lldb does not handle this list of duplicates correctly. The bugs in the
handling of duplicate classes are in the for loop over the duplicates, which has the
following issues:

  1. Indexing into the unique class list (classOffsets), not the duplicate list (duplicateClassOffsets)
  2. Not incrementing idx

The result is that the first N unique classes are wastefully rehashed, where N is
duplicates_count (which can be large).

Note that this change removes the loop over the duplicates altogether, to avoid paying
the cost of ultimately doing nothing.

Ideally, the above bugs could be addressed, however lldb doesn't know which of the
duplicates the ObjC runtime has chosen. When the ObjC runtime registers duplicate
classes, it emits the following error:

Class <class> is implemented in both libA.dylib (0x1048800b8) and libB.dylib
(0x1048700b8). One of the two will be used. Which one is undefined.

In lldb, the code that uses results of this scan does class lookup by name, and doesn't
handle the case where there are multiple classes for a given name. Also, lldb doesn't
know which of the duplicates the ObjC runtime has chosen.

The ultimate fix is to determine which duplicate the ObjC runtime has chosen, and have
lldb use that too.

Diff Detail

Event Timeline

kastiglione created this revision.Jun 23 2023, 10:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 10:47 AM
kastiglione requested review of this revision.Jun 23 2023, 10:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 10:47 AM

Both issues are not present in the code that does the equivalent parsing for Objective-C versions 12 through 15. Most likely I accidentally introduced these bugs when copy/pasting this for the changes in 16. I would prefer to use that rather than diverging our handling of duplicates based on the Objective-C runtime version.

jingham added a comment.EditedJun 23 2023, 11:30 AM

There's similar code for the runtime versions 12-15 which actually looks correct. It seems to claim that the duplicate classes are all stuffed after the capacity in the classOffsets array and that the one that won will indeed be in that list but will not be marked as a duplicate. Did it not work to fix what look like transcription errors going from the 12-15 version of this code to the 16 version?

and that the one that won will indeed be in that list but will not be marked as a duplicate.

All of the duplicate classes are in the duplicate list, not a single class that won. For example, a duplicated class, say ABCSomeController, will be in the duplicate list 2 or more times. The result is that there's redundant hashing, since each of the duplicate names occurs 2 or more times. Which is maybe fine, or maybe causes a perf penalty. Another result is that ObjCLanguageRuntime::m_hash_to_isa_map (a multimap) ends up storing every duplicate, however the readers of m_hash_to_isa_map always take the first match found (see ObjCLanguageRuntime::GetDescriptorIterator). In other words, the redundant classes are never accessed, and the one lldb does return could differ from the class the objc runtime has picked. A "fix" to the surface level bugs identified here has the result of exposing a bunch of non-determinism.

My thought process was: "This is broken to the point that duplicate classes are currently being dropped altogether, and nobody has identified this as a problem. Maybe delete the code until we can figure out a proper fix."

I am fine with the "fix", but as you can see I am very skeptical in it being helpful.