This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Add support for objc_msgSend stubs
ClosedPublic

Authored by keith on Jun 17 2022, 4:32 PM.

Details

Reviewers
int3
ributzka
Group Reviewers
Restricted Project
Commits
rG3c24fae3986a: [lld-macho] Add support for objc_msgSend stubs
Summary

[lld-macho] Add support for objc_msgSend stubs

Apple Clang in Xcode 14 introduced a new feature for reducing the
overhead of objc_msgSend calls by deduplicating the setup calls for each
individual selector. This works by clang adding undefined symbols for
each selector called in a translation unit, such as _objc_msgSend$foo
for calling the foo method on any NSObject. There are 2
different modes for this behavior, the default directly does the setup
for _objc_msgSend and calls it, and the smaller option does the
selector setup, and then calls the standard _objc_msgSend stub
function.

The general overview of how this works is:

  • Undefined symbols with the given prefix are collected
  • The suffix of each matching undefined symbol is added as a string to __objc_methname
  • A pointer is added for every method name in the __objc_selrefs section
  • A got entry is emitted for _objc_msgSend
  • Stubs are emitting pointing to the synthesized locations

Notes:

  • Both __objc_methname and __objc_selrefs can also exist from object files, so their contents are merged with our synthesized contents
  • The compiler emits method names for defined methods, but not for undefined symbols you call, but stubs are used for both
  • This only implements the default "fast" mode currently just to reduce the diff, I also doubt many folks will care to swap modes
  • This only implements this for arm64 and x86_64, we don't need to implement this for 32 bit iOS archs, but we should implement it for watchOS archs in a later diff

Diff Detail

Event Timeline

keith created this revision.Jun 17 2022, 4:32 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 17 2022, 4:32 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
keith requested review of this revision.Jun 17 2022, 4:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2022, 4:32 PM
keith added a comment.Jun 17 2022, 4:35 PM

This is a WIP version of https://github.com/llvm/llvm-project/issues/56034 I wanted to submit it early to hopefully get some initial reactions from folks in case I'm going down a bad path here! FYI I'm out through 7/5th, so I won't get back to this until sometime after then.

There are quite a few things left to do mostly outlined by TODOs here. Right now this doesn't work with non-trivial programs because some (not sure the conditions here) object files contain a __objc_selrefs section that conflicts with the one I'm creating. I also have only been testing this on my M1 mac, which is why I've only done the arm64 support so far, and clearly I haven't added any tests yet.

But I'd love to get feedback on the overall approach here before I flesh out these TODOs!

rmaz added a subscriber: rmaz.Jun 24 2022, 7:15 AM
int3 added a subscriber: int3.Jul 5 2022, 12:03 PM

This looks like a reasonable approach. Re the section name conflict, do we just need to apply relocations + concatenate the output (i.e. what ConcatOutputSection is doing), or do we have to do something more complicated? Whether to make SyntheticSections subclasses of InputSections vs OutputSections has been something we've debated for a while... LLD-ELF makes them subclass InputSections but we don't, and maybe this use case should make us rethink that.

lld/MachO/SyntheticSections.cpp
653–655

maybe do something similar to what we have for StubHelperSection::setup()

678

I would just do an if check for 32/64 bit here, I don't think we should having ObjC implementation details exposed on the Target class

int3 added inline comments.Jul 5 2022, 12:11 PM
lld/MachO/Writer.cpp
698–700

I think this should be part of treatSpecialUndefineds, which runs before scanSymbols -- I expect that we'd raise undef symbol errors otherwise

int3 added inline comments.Jul 5 2022, 12:32 PM
lld/MachO/SyntheticSections.cpp
678

nvm, I realize this is in line with what we're already doing for the regular stubs code + arch dependency means that we'd need to check for more than word size

int3 added inline comments.Jul 5 2022, 12:40 PM
lld/MachO/Writer.cpp
698–700

oh wait no we do the actual undef check in scanRelocations, so this is fine.

Sorry -- I've been away from the code for a while, should have paged in a bit more context before reviewing :)

keith added a comment.Jul 5 2022, 1:46 PM

Thanks for doing a first pass!

Re the section name conflict, do we just need to apply relocations + concatenate the output (i.e. what ConcatOutputSection is doing), or do we have to do something more complicated?

This is a good question that still requires more investigation from me. I have found so far that these sections already exist in object files only in _some_ cases and I haven't yet figured out why that is, or how that needs to be handled. I didn't hit that case until I started testing on real world codebases, and even then only a small subset of object files have them.

keith updated this revision to Diff 447818.Jul 26 2022, 1:51 PM
keith marked 3 inline comments as done.

Implmenet much more robust version

keith updated this revision to Diff 447830.Jul 26 2022, 2:13 PM

Minor todos

keith updated this revision to Diff 447844.Jul 26 2022, 2:58 PM

Add more test coverage

keith updated this revision to Diff 447851.Jul 26 2022, 3:25 PM

rebase to resolve conflicts

keith added a comment.Jul 26 2022, 3:26 PM

@int3 ok can you take another look here? I haven't implemented non arm64, or the "small" mode yet, but otherwise this is more complete than before.

int3 added inline comments.Jul 26 2022, 8:15 PM
lld/MachO/SyntheticSections.cpp
655

isn't the string already null-terminated? why do we have to add another \0?

704–705

I'm not sure I understand how InputSection ordering matters... also, which offset are you referring to here?

Plus inserting like this means that we are copying the entire vector, which is pretty large... would prefer if we didn't have to do this

1573

Why can't we just use stringMapOffset itself? Is it because it isn't available in the non-deduplicated CStringSection?

If that's the case, I wonder if we should just always have __objc_methname be deduplicated...

keith updated this revision to Diff 447921.Jul 26 2022, 8:55 PM
keith marked an inline comment as done.

Add selrefs to end of inputSections

lld/MachO/SyntheticSections.cpp
655

I didn't dig into this, but this is how I got here:

In CStringSection:

StringRef string = isec->getStringRef(i);
llvm::outs() << "'" << string << "'" << " len: " << string.size() << "\n";

This prints: 'foo' size 4.

In ObjCStubsSection::addEntry

llvm::outs() << "'" << sym->getName() << "' size " << sym->getName().size() << "\n";
std::string methname = sym->getName().drop_front(symbolPrefixLength).str();

This prints 'foo' size 3.

704–705

Sorry I didn't go back to check this. I originally added it before I started using getVA for the selrefs section, with that change this isn't required. Removed.

1573

Why can't we just use stringMapOffset itself? Is it because it isn't available in the non-deduplicated CStringSection?

Correct

If that's the case, I wonder if we should just always have __objc_methname be deduplicated...

Yea we could definitely do that if you think the performance downside wouldn't be a big deal. I didn't feel great adding this here either, but figured the time savings on larger codebases might be significant enough to warrant the complexity.

keith updated this revision to Diff 447922.Jul 26 2022, 8:57 PM

Minor format

int3 added inline comments.Jul 27 2022, 9:47 AM
lld/MachO/SyntheticSections.cpp
1573

yeah initially I wanted to keep the default output of LLD 100% unoptimized but I'm having second thoughts... firstly the behavior difference wrt ld64 is not ideal, secondly we haven't parallelized this yet, thirdly the extra data we have to write for __cstring probably offsets the perf savings we get from not dedup'ing (just as I've found some builds to be faster with -dead_strip), and fourthly, the code complexity...

would be nice to pick an objC-intensive app and run a benchmark here. but I don't think a minor regression here should block this, we can work on parallelizing this later instead

keith updated this revision to Diff 449771.Aug 3 2022, 1:29 PM
keith marked 2 inline comments as done.

Always deduplicate methnames

keith updated this revision to Diff 449777.Aug 3 2022, 1:44 PM

Remove leftover members

keith updated this revision to Diff 449821.Aug 3 2022, 4:18 PM

Handle arguments

keith updated this revision to Diff 449839.Aug 3 2022, 5:18 PM

Add x86_64 support

keith updated this revision to Diff 450156.Aug 4 2022, 3:47 PM

Small fixups

keith updated this revision to Diff 450170.Aug 4 2022, 4:12 PM

Resolve more todos

keith updated this revision to Diff 450173.Aug 4 2022, 4:17 PM

Minor cleanup

keith retitled this revision from [WIP][lld-macho] Add support for objc_msgSend stubs to [lld-macho] Add support for objc_msgSend stubs.Aug 4 2022, 4:18 PM
keith edited the summary of this revision. (Show Details)
keith edited the summary of this revision. (Show Details)Aug 4 2022, 4:23 PM

@int3 ok I've updated this with all the major outstanding issues and updated the description with the current state. I believe this is ready for more serious review. Let me know if you think I should implement either of things I'm planning to defer here, the small mode and watchOS support, or if you're also ok with us doing those separately.

int3 added a comment.Aug 8 2022, 8:34 AM

file naming consistency nits:

x86_64-objcstubs.s -> x86-64-objc-stubs.s
arm64-objcstubs.s -> arm64-objc-stubs.s
methname.s -> objc-methname.s
selrefs.s -> objc-selrefs.s

lld/MachO/Arch/ARM64.cpp
119–121

What's the purpose of these brk #0 instructions? Also, any idea why they are rendered as brk #0x1 in the arm64-objcstubs.s test?

169

are you planning to implement this in the near future? otherwise I think it's cleaner to drop it for now

lld/MachO/Arch/X86_64.cpp
197

consistency nit

lld/MachO/Driver.cpp
792–801 ↗(On Diff #450173)

the switch statement seems a bit overkill given that we know arg can only take two values... how about just if (arg->getOption().getID() == OPT_objc_stubs_small) { ... }?

1183 ↗(On Diff #450173)

is this necessary? isn't marking the pieces as live above sufficient?

lld/MachO/InputSection.h
191

use member initialization list

lld/MachO/SyntheticSections.cpp
655

could you dig into this? I would really prefer we avoid copying the string if possible

lld/MachO/SyntheticSections.h
315

Could we have a block comment here explaining how ObjC stubs are laid out? (Similar to what we have for StubsSection.)

325

could make this a StringLiteral (then we wouldn't need a separate symbolPrefixLength either)

584

we should be returning a const ref instead of a copy... but I think it would be even cleaner to just expose a getStringOffset(StringRef) method instead of the whole map interface. I think having the 31-bit hashing logic encapsulated within DeduplicatedCStringSection is also cleaner

lld/MachO/Writer.cpp
1200

nit: braces unnecessary

lld/test/MachO/arm64-objcstubs.s
36 ↗(On Diff #450173)

Should we have a CHECK-EMPTY here just to verify that _objc_msgSend$bar isn't generated? Otherwise I'm not sure what the purposes of selref2 is :)

lld/test/MachO/methname.s
30–34 ↗(On Diff #450173)

can we spell out "in" -> "internal"? I spent a while trying to figure out what "inc str" meant heh

also I don't think the _some_cstr and _some_methname symbols are needed

lld/test/MachO/x86_64-objcstubs.s
18 ↗(On Diff #450173)

just curious -- does clang -S normally generate regular symbols for strings in __objc_methname? AFAIK it usually uses private LFoo symbols for __cstring

keith updated this revision to Diff 451012.Aug 8 2022, 5:51 PM
keith marked 11 inline comments as done.
keith edited the summary of this revision. (Show Details)

Address feedback

lld/MachO/Arch/ARM64.cpp
119–121

The comment here was outdated, fixed to show 0x1. I'm not sure why there are extras here, maybe some alignment concern? I wasn't really sure if the right path is to copy ld64, or just ignore unless we hit an issue. I haven't tested without them.

lld/MachO/Driver.cpp
1183 ↗(On Diff #450173)

Without it I hit this assert https://github.com/llvm/llvm-project/blob/1b349bdaaa540e97d90318c3706527f6ca084987/lld/MachO/InputSection.cpp#L221 when using -dead_strip, the entire section being alive or not doesn't include the strings being alive

lld/MachO/SyntheticSections.cpp
655

I pushed a new option here which I believe avoids the copies. The core issue is that here https://github.com/llvm/llvm-project/blob/1b349bdaaa540e97d90318c3706527f6ca084987/lld/MachO/InputSection.h#L206-L212 we manually construct a StringRef with the actual data, and with the length of the actual data + 1, since pieces[i + 1].inSecOff is after the null terminator. But any other StringRef initialization, like this one https://github.com/llvm/llvm-project/blob/1b349bdaaa540e97d90318c3706527f6ca084987/lld/MachO/InputFiles.cpp#L943 goes through an initializer that computes the length, which naturally excludes the null terminator https://github.com/llvm/llvm-project/blob/1b349bdaaa540e97d90318c3706527f6ca084987/llvm/include/llvm/ADT/StringRef.h#L83-L85

I'm now mirroring this StringRef initialization, but maybe we should correct that string length off by 1 instead?

lld/test/MachO/x86_64-objcstubs.s
18 ↗(On Diff #450173)

Here's a snippet:

	.section	__TEXT,__objc_methname,cstring_literals
l_OBJC_METH_VAR_NAME_:                  ; @OBJC_METH_VAR_NAME_
	.asciz	"foo"

Then they reference this from __objc_const:

	.section	__DATA,__objc_const
	.p2align	3                               ; @"_OBJC_$_INSTANCE_METHODS_Foo"
__OBJC_$_INSTANCE_METHODS_Foo:
	.long	24                              ; 0x18
	.long	3                               ; 0x3
	.quad	l_OBJC_METH_VAR_NAME_
	.quad	l_OBJC_METH_VAR_TYPE_
	.quad	"-[Foo foo]"

I added a prefix to these to be more clear

int3 accepted this revision.Aug 10 2022, 3:04 AM

lgtm, thanks!

lld/MachO/Arch/ARM64.cpp
119–121

ah gotcha. yeah let's just emulate ld64

lld/MachO/SyntheticSections.cpp
649

how about asserting that the symbol name startswith() the symbol prefix here, just for clarity?

655

ooh. Yeah we should fix that off-by-one. I'm fine if you want to do it in a separate commit though

1593
lld/test/MachO/arm64-objc-stubs.s
40–43 ↗(On Diff #451012)

make these symbols private lselref*s too?

This revision is now accepted and ready to land.Aug 10 2022, 3:04 AM
keith updated this revision to Diff 451664.Aug 10 2022, 3:50 PM
keith marked 8 inline comments as done.

Small feedback improvements

This revision was landed with ongoing or failed builds.Aug 10 2022, 5:17 PM
This revision was automatically updated to reflect the committed changes.
int3 added inline comments.Sep 12 2022, 2:43 PM
int3 added inline comments.Sep 13 2022, 8:44 AM
lld/test/MachO/objc-selrefs.s
17 ↗(On Diff #451690)

@keith I tried running this test using ld64 (ld64-764 from XCode 14) instead of LLD and it complained about _objc_msgSend$foo being undefined + it did not recognize -objc_stubs_fast. Am I using the wrong version or something?

The version 764 is from Xcode 13:

% xcodebuild -version
Xcode 14.0
Build version 14A309
% ld -v
@(#)PROGRAM:ld  PROJECT:ld64-819.6
BUILD 14:58:44 Aug  5 2022
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
LTO support using: LLVM version 14.0.0, (clang-1400.0.29.102) (static support for 29, runtime is 29)
TAPI support using: Apple TAPI version 14.0.0 (tapi-1400.0.11)
% DEVELOPER_DIR=/Applications/Xcode-13.4.0-RC1.app ld -v
@(#)PROGRAM:ld  PROJECT:ld64-764