The intent is clearly to push the tail rather than current BB.
Details
Diff Detail
Event Timeline
Hello, everyone! I encountered an deadloop problem due to llvm-bolt optimaization of tail-duplication pass. I used an exsiting test case to repeat the problem.
Look at the following simple example (from the bolt/test/X86/tail-duplication-pass.s)
# REQUIRES: system-linux # RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown \ # RUN: %s -o %t.o # RUN: link_fdata %s %t.o %t.fdata # RUN: %clang %cflags %t.o -o %t.exe -Wl,-q # RUN: llvm-bolt %t.exe --data %t.fdata --reorder-blocks=ext-tsp \ # RUN: --print-finalized --tail-duplication=moderate \ # RUN: --tail-duplication-minimum-offset=1 -o %t.out | FileCheck %s # FDATA: 1 main 2 1 main #.BB2# 0 10 # FDATA: 1 main 4 1 main #.BB2# 0 20 # CHECK: BOLT-INFO: tail duplication modified 1 ({{.*}}%) functions; duplicated 1 blocks (1 bytes) responsible for {{.*}} dynamic executions ({{.*}}% of all block executions) # CHECK: BB Layout : .LBB00, .Ltail-dup0, .Ltmp0, .Ltmp1 .text .globl main .type main, %function .size main, .Lend-main main: xor %eax, %eax jmp .BB2 .BB1: inc %rax .BB2: retq # For relocations against .text call exit .Lend:
When changing --tail-duplication mode from moderate to aggressive,it will produce a strange deadloop duplication code.( moderate mode hasnot this problem.)
$ ~/llvm-upstream/build_debug/bin/llvm-bolt --data hello-pass.fdata --reorder-blocks=ext-tsp --print-finalized --tail-duplication=agg ressive --tail-duplication-minimum-offset=1 hello-pass.exe -o hello-pass.out BOLT-INFO: shared object or position-independent executable detected BOLT-INFO: Target architecture: x86_64 BOLT-INFO: BOLT version: 6dbd27b7ab98841370c61e919f42ed589c2f9bc7 BOLT-INFO: first alloc address is 0x0 BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000 BOLT-INFO: enabling relocation mode BOLT-INFO: enabling lite mode BOLT-INFO: pre-processing profile using branch profile reader BOLT-INFO: 1 out of 11 functions in the binary (9.1%) have non-empty execution profile BOLT-INFO: basic block reordering modified layout of 0 functions (0.00% of profiled, 0.00% of total) BOLT-INFO: UCE removed 1 blocks and 5 bytes of code BOLT-INFO: 0 Functions were reordered by LoopInversionPass BOLT-INFO: tail duplication modified 1 (6.67%) functions; duplicated 1 blocks (4 bytes) responsible for 10 dynamic executions (16.67% of all block executions) Binary Function "main" after finalize-functions { Number : 6 State : CFG finalized Address : 0x1778 Size : 0xd MaxSize : 0x18 Offset : 0x778 Section : .text Orc Section : .local.text.main LSDA : 0x0 IsSimple : 1 IsMultiEntry: 1 IsSplit : 0 BB Count : 4 Hash : 3c02e78938c1b5c7 Secondary Entry Points : .BB2/1, .BB1/1 BB Layout : .LBB00, .Ltail-dup0, .Ltmp0, .Ltmp1 Exec Count : 0 Branch Count: 30 Profile Acc : 100.0% } .LBB00 (1 instructions, align : 1) Entry Point Exec Count : 6 CFI State : 0 00000000: xorl %eax, %eax Successors: .Ltail-dup0 (mispreds: 0, count: 6) .Ltail-dup0 (2 instructions, align : 1) Exec Count : 3 CFI State : 0 Predecessors: .LBB00, .Ltail-dup0 00000002: xorl %eax, %eax 00000004: jmp ".Ltail-dup0" Successors: .Ltail-dup0 (mispreds: 0, count: 3) .Ltmp0 (1 instructions, align : 1) Secondary Entry Point: .BB1/1 Exec Count : 20 CFI State : 0 00000006: incq %rax Successors: .Ltmp1 (mispreds: 0, count: 20) .Ltmp1 (1 instructions, align : 1) Secondary Entry Point: .BB2/1 Exec Count : 30 CFI State : 0 Predecessors: .Ltmp0 00000009: retq DWARF CFI Instructions: <empty> End of Function "main"
the successor of Ltail-dup0 is itself.
.Ltail-dup0 (2 instructions, align : 1) Exec Count : 3 CFI State : 0 Predecessors: .LBB00, .Ltail-dup0 00000002: xorl %eax, %eax 00000004: jmp ".Ltail-dup0" Successors: .Ltail-dup0 (mispreds: 0, count: 3)
I disassemble the result optimized binary executable as follows.
0000000000400000 <main>: 400000: 31 c0 xor %eax,%eax 400002: 31 c0 xor %eax,%eax 400004: eb fc jmp 400002 <main+0x2> 400006: 48 ff c0 inc %rax 400009: c3 retq
This is obviously optimized into a deadloop!
Then I have read up the related codes of tail duplication on llvm-bolt. I find the BinaryBasicBlock *CurrBB = &BB; maybe change to BinaryBasicBlock *CurrBB = &Tail;
I changed it and recompiled, it worked well!
Hi @zj, thank you for the test case and for the fix! It looks good to me. Could you please merge the new test case into bolt/test/X86/tail-duplication-pass.s?
Thanks! Do you have the commit access? Before committing, please re-title using all-caps "[BOLT]" and follow the "50-72" rule (50-character wide title, 72 for the summary).