This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Sort data-in-code entries
ClosedPublic

Authored by BertalanD on Sep 9 2022, 8:30 AM.

Details

Reviewers
int3
ributzka
thakis
Group Reviewers
Restricted Project
Commits
rG025a5b22c848: [lld-macho] Sort data-in-code entries
Summary

Previously, we would add entries to DataInCodeSection in the order they
appeared in input files. Because of this, entries would not be sorted if
sections were reordered due to e.g. -order_file or call graph profile
sorting.

This commit also fixes an incorrect assertion. The original assertion
from D103006 used to check that data-in-code entries are sorted in the
input objects -- likely because we use binary search on that data. In
D115556, the assertion was moved into collectDataInCodeEntries, but
the checked variable's name was not changed, so it ended up checking the
final contents of the DataInCodeSection. While this section is
guaranteed to be sorted as of this commit, we should still check that
inputs are not malformed.

This fixes a crash when building LLVM with PGO using an asserts build of
LLD as the linker.

Numbers for linking the Chromium Framework reproducer from #48001, which
has 6829 data-in-code entries:

x before
+ after
    N           Min           Max        Median           Avg        Stddev
x  20     2.1076453     2.3059683     2.1132485     2.1350302   0.049905767
+  20     2.1069031     2.3915262       2.14465     2.1728429   0.084065898
No difference proven at 95.0% confidence

Diff Detail

Event Timeline

BertalanD created this revision.Sep 9 2022, 8:30 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
Herald added a subscriber: wenlei. · View Herald Transcript
BertalanD requested review of this revision.Sep 9 2022, 8:30 AM
thakis added a subscriber: thakis.Sep 9 2022, 12:24 PM

Thanks for looking into this!

Is just removing the assert the right thing? It kind of makes sense to require the data in code table in the output to be sorted too, so that tools can binary sort in _it_. With ld, it looks like it is sorted (with first.o, second.o, third.o, order.txt as in your test, except compiled like clang -c first.s and hence on my machine arm64):

% out/gn/bin/clang first.o second.o third.o -Wl,-order_file,order.txt -isysroot $(xcrun -show-sdk-path) -shared 
ld: warning: arm64 function not 4-byte aligned: _second from second.o
ld: warning: arm64 function not 4-byte aligned: _first from first.o
ld: warning: arm64 function not 4-byte aligned: _third from third.o
% otool -G a.out                                                                                                
a.out:
Data in code table (3 entries)
offset     length kind
0x00003fac      4 0x0004
0x00003fb0      4 0x0004
0x00003fb4      4 0x0004

These are sorted!

Now, lld with this patch:

% out/gn/bin/clang first.o second.o third.o -Wl,-order_file,order.txt -isysroot $(xcrun -show-sdk-path) -shared -fuse-ld=lld
% otool -G a.out                                                                                                            
a.out:
Data in code table (3 entries)
offset     length kind
0x0000027c      4 0x0004
0x00000278      4 0x0004
0x00000280      4 0x0004

These are not sorted!

So maybe the problem isn't the assert, but that our construction of the DIC table doesn't always fulfill the assert?

It kind of makes sense to require the data in code table in the output to be sorted too, so that tools can binary sort in _it_.

Right, it looks like ld64 explicitly sorts its contents (https://github.com/apple-oss-distributions/ld64/blob/dbf8f7feb5579761f1623b004bd468bdea7c6225/src/ld/LinkEdit.hpp#L2112), so we should do the same. At the same time, I do not have any information about what tools use data-in-code tables and in what way, so sorting might end up being unnecessary work.

I'll update this diff to sort the entries alongside fixing the faulty assertion.

BertalanD updated this revision to Diff 459289.Sep 10 2022, 7:24 AM
BertalanD retitled this revision from [lld-macho] Fix an incorrect assertion in DataInCodeSection to [lld-macho] Sort data-in-code entries.
BertalanD edited the summary of this revision. (Show Details)

Sort DataInCodeSection's contents as suggested by thakis.

int3 accepted this revision.Sep 12 2022, 10:00 AM

Thanks!

lld/MachO/SyntheticSections.cpp
966

add a comment mentioning that ld64 emits the table in sorted order too? could even link directly to https://reviews.llvm.org/D133581#3781121

This revision is now accepted and ready to land.Sep 12 2022, 10:00 AM
thakis added a comment.EditedSep 13 2022, 6:41 AM

(The below is mostly note-to-self and maybe interesting to others -- nothing actionable. I'd like to look into how to produce data-in-code tables on x86_64 some more. The very last line has a question for you, though.)

Right, it looks like ld64 explicitly sorts its contents (https://github.com/apple-oss-distributions/ld64/blob/dbf8f7feb5579761f1623b004bd468bdea7c6225/src/ld/LinkEdit.hpp#L2112), so we should do the same. At the same time, I do not have any information about what tools use data-in-code tables and in what way

That's a great question.

otool -tV (and llvm-otool -tV and the corresponding llvm-objdump invocation) do make use of it:

 % otool -tV llvm/test/Object/Inputs/macho-data-in-code.macho-thumbv7
...
00000012	    bd80	pop	{r7, pc}
00000014	00000038	.long 56	@ KIND_DATA

For llvm-objdump, this was added in 273ae01b0353 (https://reviews.llvm.org/rG273ae01b0353). Since vanilla otool does it too, I suppose there's code in cctools and maybe something else in cctools might use it? Huh, looks like it's in arm_disasm.c in cctools (print_data_in_code). For llvm-objdump, the code for this is _in_ the llvm-objdump tool code, so programs built on top of LLVM's libraries won't pick this up for free. The llvm-objdump code does explicitly sort the table.

But as far as I can tell, nothing else looks at this table. These tools all don't look at data-in-code as far as I can tell:

  • lldb (tried it, see below)
  • radare2 (only looked through source code)
  • Ghidra (only looked through source code)
  • Hopper (tried it with bar.o below; no source code for this, though)

It is hard to produce data-in-code tables on arm64!

On arm32, it's fairly easy with --target=arm-apple-ios but not with --target=armv7-apple-ios (and lld doesn't support 32-bit links anyways) -- compiling a file containing e.g. g(0x10001); or a simple switch statement produces data-in-code entries.

On arm64, I haven't managed to do it without inline asm. With inline asm, asm("ldr x0,=0x10001"); does cause creation of data-in-code.

As mentioned above, otool only interprets data-in-code for 32-bit arm, but llvm-otool can handle it in 64-bit arm too:

 % cat bar.cc
void as() { asm("ldr x0,=0x10001"); }

% out/gn/bin/llvm-otool -tV bar.o 
bar.o:
Contents of (__TEXT,__text) section
__Z2asv:
0000000000000000	ldr	x0, #0x8
0000000000000004	ret
0000000000000008	.long 65537	@ KIND_DATA
000000000000000c	.long 0	@ KIND_DATA

% otool -tV bar.o 
bar.o:
(__TEXT,__text) section
__Z2asv:
0000000000000000	ldr	x0, #0x8
0000000000000004	ret
0000000000000008	.long	0x00010001
000000000000000c	udf	#0x0

% lldb bar.o
(lldb) target create "bar.o"
Current executable set to '/Users/thakis/src/llvm-project/bar.o' (arm64).
(lldb) disas -n _Z2asv
bar.o`as:
bar.o[0x0] <+0>:  ldr    x0, #0x8                  ; <+8>
bar.o[0x4] <+4>:  ret    
bar.o[0x8] <+8>:  .long  0x00010001                ; unknown opcode
bar.o[0xc] <+12>: udf    #0x0

I didn't manage to trigger this at all on x86_64 code. But given we hit this assert in lld when doing a bootstrap build of LLVM on x86_64, it has to be possible to get this for fairly normal code on x86_64 :) % otool -G out/gn/bin/* | grep -v offset | grep -v out | grep -v 0 on my build dir on my arm mac doesn't find anything. I don't have my intel machine with me at the moment.

so sorting might end up being unnecessary work.

Does it take noticeable amounts of time?

thakis added a comment.EditedSep 13 2022, 7:12 AM

Here's one way to trigger data-in-code on x86_64:

% cat bar.cc
__attribute__((noinline)) inline int sw(int i) {
  switch (i) {
  case 0: return 3443;
  case 1: return 3453;
  case 2: return 3353;
  case 3: return 4353;
  default: return -1;
  }
}

void ref() {
  sw(2);
}

% clang -c bar.cc  --target=x86_64-apple-darwin

% otool -G bar.o                                
bar.o:
Data in code table (1 entries)
offset     length kind
0x00000078     16 0x0004

% out/gn/bin/llvm-otool -tV bar.o
...
0000000000000078	.long 4294967236	@ KIND_JUMP_TABLE32

(cctools otool again doesn't handle this.)

Need to carefully pick number of cases in switch and whatnot to make http://llvm-cs.pcc.me.uk/include/llvm/CodeGen/TargetLowering.h#1032 (isSuitableForJumpTable impl) and http://llvm-cs.pcc.me.uk/lib/CodeGen/SwitchLoweringUtils.cpp#64 (MinJumpTableEntries comparison in SwitchCG::SwitchLowering::findJumpTables) happy – then http://llvm-cs.pcc.me.uk/lib/CodeGen/SwitchLoweringUtils.cpp#85 (isSuitableForJumpTable check in SwitchCG::SwitchLowering::findJumpTables) will trigger and we'll end up with a jump table.

Also needs to be in an inline function, else the data just goes in __TEXT,__const due to http://llvm-cs.pcc.me.uk/lib/Target/TargetLoweringObjectFile.cpp#291 (F.isWeakForLinker check in TargetLoweringObjectFile::shouldPutJumpTableInFunctionSection)

That explains why we see the assert on x86_64.

thakis added a comment.EditedSep 13 2022, 7:21 AM

As for why we don't hit this on AArch64: http://llvm-cs.pcc.me.uk/lib/Target/AArch64/AArch64AsmPrinter.cpp#686 AArch64AsmPrinter::emitJumpTableInfo overrides AsmPrinter::emitJumpTableInfo, and the override doesn't call OutStreamer->emitDataRegion(MCDR_DataRegionJT32); for whatever reason. It probably should?

But that explains why we _don't_ currently see the assert on arm64.

With that, I think everything's explained :)

Also, this fixes the last known assert in https://bugs.chromium.org/p/chromium/issues/detail?id=1265937 (using lld as host linker when making chromium's clang package), probably good to reference this from here.

thakis accepted this revision.Sep 13 2022, 7:33 AM

Thank you for the write-up, it was very interesting!

Does it take noticeable amounts of time?

See the benchmark posted in the summary. Sorting 6000 8-byte data_in_code_entry structs seems to have a negligible overhead.

I'll commit this with the requested comment change in a moment. I will de-templatize this code collectDataInCodeEntries, and move it into DataInCodeSection::finalizeContents() in a follow-up commit.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 13 2022, 10:10 AM

As for why we don't hit this on AArch64: http://llvm-cs.pcc.me.uk/lib/Target/AArch64/AArch64AsmPrinter.cpp#686 AArch64AsmPrinter::emitJumpTableInfo overrides AsmPrinter::emitJumpTableInfo, and the override doesn't call OutStreamer->emitDataRegion(MCDR_DataRegionJT32); for whatever reason. It probably should?

But that explains why we _don't_ currently see the assert on arm64.

With that, I think everything's explained :)

Follow-up to this: After D113576, llvm _always_ puts jump tables in __TEXT,__const on aarch64, so it never shows up in disassembly -- even for inline functions. (Things then rely on -dead_strip to remove the unused data from the jump tables references from non-prevailing inlines, instead of them not being linked in to the binary in the first place.) clang coming with the version of Xcode I'm using doesn't have that change yet as far as I can tell, so I didn't notice this immediately.