Page MenuHomePhabricator

[BOLT][AArch64] ST_Function symbols start a CODE interval
Needs RevisionPublic

Authored by sebpop on Feb 23 2023, 9:23 AM.



With the patch in perf2bolt continues and then crashes on an assert:

#9 0x0000ffff940ccaec llvm::bolt::BinaryFunction::addCFIInstruction(unsigned long, llvm::MCCFIInstruction&&) /home/admin/llvm-project/bolt/include/bolt/Core/BinaryFunction.h:1654:45
 #10 0x0000ffff940c7f10 llvm::bolt::CFIReaderWriter::fillCFIInfoFor(llvm::bolt::BinaryFunction&) const::'lambda'(llvm::dwarf::CFIProgram::Instruction const&)::operator()(llvm::dwarf::CFIProgram::Instruction const&) const /home/admin/llvm-project/bolt/lib/Core/Exceptions.cpp:571:38
 #11 0x0000ffff940c863c llvm::bolt::CFIReaderWriter::fillCFIInfoFor(llvm::bolt::BinaryFunction&) const /home/admin/llvm-project/bolt/lib/Core/Exceptions.cpp:650:32
 #12 0x0000ffff9aa48074 llvm::bolt::RewriteInstance::disassembleFunctions() /home/admin/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:3135:62
 #13 0x0000ffff9aa3bed4 llvm::bolt::RewriteInstance::run() /home/admin/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:771:27
 #14 0x0000aaaad74258b0 main /home/admin/llvm-project/bolt/tools/driver/llvm-bolt.cpp:243:29

Diff Detail

Event Timeline

sebpop created this revision.Feb 23 2023, 9:23 AM
Herald added a reviewer: Amir. · View Herald Transcript
Herald added a reviewer: maksfb. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
sebpop requested review of this revision.Feb 23 2023, 9:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2023, 9:23 AM
sebpop updated this revision to Diff 499904.Feb 23 2023, 9:54 AM
sebpop retitled this revision from remove assert in addCFIInstruction to do not collect CFI info on empty functions.

Instead of removing the assert, the patch avoids collecting CFI info on functions without instructions.

I'm trying to understand what is going on with the assertion. Does the input binary have an empty function with CFIs or something else is going on? Could you please add a test case?

I will reduce a yaml object testcase for this fail.
Do you have a specialized reducer for yaml object files?
Otherwise I'm going to use a line by line reducer:

I will reduce a yaml object testcase for this fail.
Do you have a specialized reducer for yaml object files?
Otherwise I'm going to use a line by line reducer:

Can you create a test case in assembly?

Yes, I will provide a testcase in assembly.

Thanks! We typically fallback to YAML only when it's not possible to create a test case by other means. It's good to know about the reducer tool.

I let delta reduce the yaml representation of the binary, however it did not go very far over night...
Maybe delta could work on smaller inputs (this current one is way too large: 4Gb.)

The assert is in BinaryFunction.h:

  void addCFIInstruction(uint64_t Offset, MCCFIInstruction &&Inst) {                                                                                                                                               
=>  assert(!Instructions.empty()); 

(gdb) p this->dump()
Binary Function "lj_vm_ffi_call/1(*2)"  {
  All names   : lj_vm_ffi_call/1
  Number      : 17598
  State       : disassembled
  Address     : 0x18c9890
  Size        : 0x80
  MaxSize     : 0x80
  Offset      : 0x18b9890
  Section     : .text
  Orc Section : .local.text.lj_vm_ffi_call/1
  LSDA        : 0x0
  IsSimple    : 1
  IsMultiEntry: 0
  IsSplit     : 0
  BB Count    : 0
DWARF CFI Instructions:
End of Function "lj_vm_ffi_call/1(*2)"

It seems to me like llvm-objdump is not able to decode the instructions in that function:

$ llvm-objdump -d envoy

00000000018c9890 <lj_vm_ffi_call>:
 18c9890: fd 7b be a9   .word   0xa9be7bfd
 18c9894: fd 03 00 91   .word   0x910003fd
 18c9898: f3 0b 00 f9   .word   0xf9000bf3
 18c989c: f3 03 00 aa   .word   0xaa0003f3
 18c98a0: 08 08 40 b9   .word   0xb9400808
 18c98a4: 69 32 40 39   .word   0x39403269
 18c98a8: 6a 62 02 91   .word   0x9102626a
 18c98ac: 29 05 00 f1   .word   0xf1000529
 18c98b0: 6b 02 40 f9   .word   0xf940026b
 18c98b4: bf 63 28 cb   .word   0xcb2863bf
 18c98b8: a4 00 00 54   .word   0x540000a4
 18c98bc: 48 79 69 f8   .word   0xf8697948
 18c98c0: e8 7b 29 f8   .word   0xf8297be8
 18c98c4: 29 05 00 f1   .word   0xf1000529
 18c98c8: a5 ff ff 54   .word   0x54ffffa5
 18c98cc: 60 86 45 a9   .word   0xa9458660
 18c98d0: 60 86 41 6d   .word   0x6d418660
 18c98d4: 62 8e 46 a9   .word   0xa9468e62
 18c98d8: 62 8e 42 6d   .word   0x6d428e62
 18c98dc: 64 96 47 a9   .word   0xa9479664
 18c98e0: 64 96 43 6d   .word   0x6d439664
 18c98e4: 66 9e 48 a9   .word   0xa9489e66
 18c98e8: 66 9e 44 6d   .word   0x6d449e66
 18c98ec: 68 0a 40 f9   .word   0xf9400a68
 18c98f0: 60 01 3f d6   .word   0xd63f0160
 18c98f4: bf 03 00 91   .word   0x910003bf
 18c98f8: 60 86 05 a9   .word   0xa9058660
 18c98fc: 60 86 01 6d   .word   0x6d018660
 18c9900: 62 8e 02 6d   .word   0x6d028e62
 18c9904: f3 0b 40 f9   .word   0xf9400bf3
 18c9908: fd 7b c2 a8   .word   0xa8c27bfd
 18c990c: c0 03 5f d6   .word   0xd65f03c0

Same function disassembled by the GNU objdump:

$ objdump -d envoy

00000000018c9890 <lj_vm_ffi_call>:
 18c9890:       a9be7bfd        stp     x29, x30, [sp, #-32]!
 18c9894:       910003fd        mov     x29, sp
 18c9898:       f9000bf3        str     x19, [sp, #16]
 18c989c:       aa0003f3        mov     x19, x0
 18c98a0:       b9400808        ldr     w8, [x0, #8]
 18c98a4:       39403269        ldrb    w9, [x19, #12]
 18c98a8:       9102626a        add     x10, x19, #0x98
 18c98ac:       f1000529        subs    x9, x9, #0x1
 18c98b0:       f940026b        ldr     x11, [x19]
 18c98b4:       cb2863bf        sub     sp, x29, x8
 18c98b8:       540000a4        b.mi    18c98cc <lj_vm_ffi_call+0x3c>  // b.first
 18c98bc:       f8697948        ldr     x8, [x10, x9, lsl #3]
 18c98c0:       f8297be8        str     x8, [sp, x9, lsl #3]
 18c98c4:       f1000529        subs    x9, x9, #0x1
 18c98c8:       54ffffa5    18c98bc <lj_vm_ffi_call+0x2c>  // b.nfrst
 18c98cc:       a9458660        ldp     x0, x1, [x19, #88]
 18c98d0:       6d418660        ldp     d0, d1, [x19, #24]
 18c98d4:       a9468e62        ldp     x2, x3, [x19, #104]
 18c98d8:       6d428e62        ldp     d2, d3, [x19, #40]
 18c98dc:       a9479664        ldp     x4, x5, [x19, #120]
 18c98e0:       6d439664        ldp     d4, d5, [x19, #56]
 18c98e4:       a9489e66        ldp     x6, x7, [x19, #136]
 18c98e8:       6d449e66        ldp     d6, d7, [x19, #72]
 18c98ec:       f9400a68        ldr     x8, [x19, #16]
 18c98f0:       d63f0160        blr     x11
 18c98f4:       910003bf        mov     sp, x29
 18c98f8:       a9058660        stp     x0, x1, [x19, #88]
 18c98fc:       6d018660        stp     d0, d1, [x19, #24]
 18c9900:       6d028e62        stp     d2, d3, [x19, #40]
 18c9904:       f9400bf3        ldr     x19, [sp, #16]
 18c9908:       a8c27bfd        ldp     x29, x30, [sp], #32
 18c990c:       d65f03c0        ret

Maybe the function body is marked as data leading to both llvm-objdump and BOLT to skipping instructions disassembly? Aarch64 has a special ABI that I'm not very familiar with. @yota9, do you know what's happening? Perhaps we should print constant islands in the function dump as well (unless we already do and I missed the code).

yota9 added a comment.Feb 27 2023, 6:19 AM

@maksfb I think you're right, probably mappings symbols has something to do here. Currently islands are not printed, but I think it is a good idea to add such functionality.
@sebpop I was unable to briefly find the sources of "lj_vm_ffi_call" function, probably it is auto generated, maybe it is using the same hack then openssl using .long directive to emit instruction using opcode.. Anyway please try to find the closest $d or $x symbol to the lj_vm_ffi_call function with symbol adress <= lj_vm_ffi_call address. If it is $d it means that the instruction would be interpreted like a data rather then code and we need to find the source of the problem somewhere where lj_vm_ffi_call function is created..

Thanks @yota9 for the hints.
The code preceding the function is indeed marked with $d:

disassembled by binutils' objdump:

00000000018c986c <lj_cont_ffi_callback>:
 18c986c:       f940c2d5        ldr     x21, [x22, #384]
 18c9870:       a9020ef3        stp     x19, x3, [x23, #32]
 18c9874:       f9000ab7        str     x23, [x21, #16]
 18c9878:       aa1503e0        mov     x0, x21
 18c987c:       aa1b03e1        mov     x1, x27
 18c9880:       94012fab        bl      191572c <lj_ccallback_leave>
 18c9884:       a94706a0        .word   0xa94706a0
 18c9888:       6d4306a0        .word   0x6d4306a0
 18c988c:       17fff91d        .word   0x17fff91d

00000000018c9890 <lj_vm_ffi_call>:
 18c9890:       a9be7bfd        stp     x29, x30, [sp, #-32]!
 18c9894:       910003fd        mov     x29, sp
 18c9898:       f9000bf3        str     x19, [sp, #16]

llvm-objdump output for the same addresses is:

00000000018c986c <lj_cont_ffi_callback>:
 18c986c: d5 c2 40 f9   .word   0xf940c2d5
 18c9870: f3 0e 02 a9   .word   0xa9020ef3
 18c9874: b7 0a 00 f9   .word   0xf9000ab7
 18c9878: e0 03 15 aa   .word   0xaa1503e0
 18c987c: e1 03 1b aa   .word   0xaa1b03e1

00000000018c9880 <$x.137>:
 18c9880: 94012fab      bl      0x191572c <lj_ccallback_leave>

00000000018c9884 <$d.138>:
 18c9884: a0 06 47 a9   .word   0xa94706a0
 18c9888: a0 06 43 6d   .word   0x6d4306a0
 18c988c: 1d f9 ff 17   .word   0x17fff91d

00000000018c9890 <lj_vm_ffi_call>:
 18c9890: fd 7b be a9   .word   0xa9be7bfd
 18c9894: fd 03 00 91   .word   0x910003fd
 18c9898: f3 0b 00 f9   .word   0xf9000bf3

Well it looks to me that for some reason mapping symbols are corrupted. Basically it is not bolt problem, the only way you can overcome this issue without binary fixing is ignoring these functions manually with skip-funcs option. Even with this patch bolt won't produce working binary anyway.

The following patch fixes the issue by checking whether the Address is at the beginning of the new symbol.
If the Address is at the beginning of a new symbol, the decode type should not inherit the decode type of the previous symbol:
it should revert back to the default disassembly mode.

--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -1006,19 +1006,21 @@ static bool shouldAdjustVA(const SectionRef &Section) {
 typedef std::pair<uint64_t, char> MappingSymbolPair;
 static char getMappingSymbolKind(ArrayRef<MappingSymbolPair> MappingSymbols,
-                                 uint64_t Address) {
+                                 uint64_t Address, uint64_t Size) {
   auto It =
       partition_point(MappingSymbols, [Address](const MappingSymbolPair &Val) {
         return Val.first <= Address;
   // Return zero for any address before the first mapping symbol; this means
   // we should use the default disassembly mode, depending on the target.
   if (It == MappingSymbols.begin())
     return '\x00';
+  if ((It - 1)->first + Size == Address)
+    return '\x00';
   return (It - 1)->second;
 static uint64_t dumpARMELFData(uint64_t SectionAddr, uint64_t Index,
                                uint64_t End, const ObjectFile &Obj,
@@ -1751,11 +1753,11 @@ static void disassembleObject(const Target *TheTarget, ObjectFile &Obj,
         // ARM and AArch64 ELF binaries can interleave data and text in the
         // same section. We rely on the markers introduced to understand what
         // we need to dump. If the data marker is within a function, it is
         // denoted as a word/short etc.
         if (!MappingSymbols.empty()) {
-          char Kind = getMappingSymbolKind(MappingSymbols, Index);
+          char Kind = getMappingSymbolKind(MappingSymbols, Index, Size);
           DumpARMELFData = Kind == 'd';
           if (SecondarySTI) {
             if (Kind == 'a') {
               STI = PrimaryIsThumb ? SecondarySTI : PrimarySTI;
               DisAsm = PrimaryIsThumb ? SecondaryDisAsm : PrimaryDisAsm;

@sebpop But it is wrong behaviour. All the symbols after mmaping symbol are considered to be that type before new symbols is located. It is not per-symbol thing

sebpop updated this revision to Diff 503995.Mar 9 2023, 5:52 PM
sebpop retitled this revision from do not collect CFI info on empty functions to [BOLT][AArch64] ST_Function symbols start a CODE interval.

Updated patch to mark the code of functions as executable on a new mapping symbol.

sebpop updated this revision to Diff 506675.Mar 20 2023, 12:00 PM

Updated patch to pass clang-format check.

yota9 requested changes to this revision.EditedMar 20 2023, 12:05 PM

Sorry @sebpop but I don't think it is part of ARM mapping symbols ELF standard. What type of compiler are you using when observing this? Is there any way to create test to reproduce it?

This revision now requires changes to proceed.Mar 20 2023, 12:05 PM

The binary was created with clang-14 from Ubuntu 20.04.

I am using the following instructions to build envoy proxy on arm64:

git clone
cd envoy
./ci/ 'BAZEL_BUILD_EXTRA_OPTIONS="--copt=-gdwarf-4 --linkopt=-Wl,--emit-relocs" ./ci/ bazel.release.server_only'
cd /tmp/envoy-docker-build/envoy/source/exe/envoy
perf2bolt -p ~/ -o ~/perf.fdata --nl envoy

I will try to reduce a smaller test that can be included in BOLT's testsuite.

My colleague Jonathan Swinney helped reduce a testcase:

$ cat 
        bl bar
        mov     w0, 0
        .string "a b c d"
        .word 10

$ gcc h.s
$ objdump -d a.out
0000000000400564 <main>:
  400564:	94000006 	bl	40057c <bar>
  400568:	52800000 	mov	w0, #0x0                   	// #0
  40056c:	d65f03c0 	ret
  400570:	20622061 	.word	0x20622061
  400574:	00642063 	.word	0x00642063
  400578:	0000000a 	.word	0x0000000a

000000000040057c <bar>:
  40057c:	d65f03c0 	ret

The transition from data section back to executable code happens on new symbol as described in IHI0056B
I will add a testcase to the patch based on this.

maksfb added a comment.Mon, Jun 5, 3:04 PM

@sebpop According to ABI's (IHI0056B) "5.4.5 Mapping symbols":

Mapping symbols defined in a section (relocatable view) or segment (executable view) define a sequence of halfopen intervals that cover the address range of the section or segment. Each interval starts at the address defined
by the mapping symbol, and continues up to, but not including, the address defined by the next (in address order) mapping symbol or the end of the section or segment.

I.e. the code should resume with $x/$x.<any> mapping symbol in addition to ST_Function, but it's not.

Is the problematic code coming from lua compiler/assembler? If it's not possible to fix lua, I think it's okay to introduce the check for ST_Function and issue a warning. Unless o/c there are legitimate cases where ST_Function could be data (?). @yota9, does this make sense to you?

yota9 added a comment.Mon, Jun 5, 3:11 PM

Yes, totally right, that why I was referring ARM elf standard above. This is definitely caused by compiler bug which generates wrong maping symbols and such a fix is not a proper way to deal with it :(