This is an archive of the discontinued LLVM Phabricator instance.

[BOLT] Fix deadloop bug in taildup
ClosedPublic

Authored by zj on Aug 31 2023, 8:34 AM.

Details

Summary

The intent is clearly to push the tail rather than current BB.

Diff Detail

Event Timeline

zj created this revision.Aug 31 2023, 8:34 AM
Herald added a reviewer: Amir. · View Herald Transcript
Herald added a reviewer: maksfb. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
zj requested review of this revision.Aug 31 2023, 8:34 AM
zj added a comment.Aug 31 2023, 8:44 AM

Hello, everyone! I encountered an deadloop problem due to llvm-bolt optimaization of tail-duplication pass. I used an exsiting test case to repeat the problem.
Look at the following simple example (from the bolt/test/X86/tail-duplication-pass.s)

# REQUIRES: system-linux

# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown \
# RUN:   %s -o %t.o
# RUN: link_fdata %s %t.o %t.fdata
# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q
# RUN: llvm-bolt %t.exe --data %t.fdata --reorder-blocks=ext-tsp \
# RUN:    --print-finalized --tail-duplication=moderate \
# RUN:    --tail-duplication-minimum-offset=1 -o %t.out | FileCheck %s

# FDATA: 1 main 2 1 main #.BB2# 0 10
# FDATA: 1 main 4 1 main #.BB2# 0 20
# CHECK: BOLT-INFO: tail duplication modified 1 ({{.*}}%) functions; duplicated 1 blocks (1 bytes) responsible for {{.*}} dynamic executions ({{.*}}% of all block executions)
# CHECK: BB Layout   : .LBB00, .Ltail-dup0, .Ltmp0, .Ltmp1

    .text
    .globl main
    .type main, %function
    .size main, .Lend-main
main:
    xor %eax, %eax
    jmp .BB2
.BB1:
    inc %rax
.BB2:
    retq
# For relocations against .text
    call exit
.Lend:

When changing --tail-duplication mode from moderate to aggressive,it will produce a strange deadloop duplication code.( moderate mode hasnot this problem.)

$ ~/llvm-upstream/build_debug/bin/llvm-bolt --data hello-pass.fdata --reorder-blocks=ext-tsp --print-finalized --tail-duplication=agg                                     ressive --tail-duplication-minimum-offset=1 hello-pass.exe  -o hello-pass.out
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 6dbd27b7ab98841370c61e919f42ed589c2f9bc7
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling lite mode
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: 1 out of 11 functions in the binary (9.1%) have non-empty execution profile
BOLT-INFO: basic block reordering modified layout of 0 functions (0.00% of profiled, 0.00% of total)
BOLT-INFO: UCE removed 1 blocks and 5 bytes of code
BOLT-INFO: 0 Functions were reordered by LoopInversionPass
BOLT-INFO: tail duplication modified 1 (6.67%) functions; duplicated 1 blocks (4 bytes) responsible for 10 dynamic executions (16.67%                                      of all block executions)
Binary Function "main" after finalize-functions {
  Number      : 6
  State       : CFG finalized
  Address     : 0x1778
  Size        : 0xd
  MaxSize     : 0x18
  Offset      : 0x778
  Section     : .text
  Orc Section : .local.text.main
  LSDA        : 0x0
  IsSimple    : 1
  IsMultiEntry: 1
  IsSplit     : 0
  BB Count    : 4
  Hash        : 3c02e78938c1b5c7
  Secondary Entry Points : .BB2/1, .BB1/1
  BB Layout   : .LBB00, .Ltail-dup0, .Ltmp0, .Ltmp1
  Exec Count  : 0
  Branch Count: 30
  Profile Acc : 100.0%
}
.LBB00 (1 instructions, align : 1)
  Entry Point
  Exec Count : 6
  CFI State : 0
    00000000:   xorl    %eax, %eax
  Successors: .Ltail-dup0 (mispreds: 0, count: 6)

.Ltail-dup0 (2 instructions, align : 1)
  Exec Count : 3
  CFI State : 0
  Predecessors: .LBB00, .Ltail-dup0
    00000002:   xorl    %eax, %eax
    00000004:   jmp     ".Ltail-dup0"
  Successors: .Ltail-dup0 (mispreds: 0, count: 3)

.Ltmp0 (1 instructions, align : 1)
  Secondary Entry Point: .BB1/1
  Exec Count : 20
  CFI State : 0
    00000006:   incq    %rax
  Successors: .Ltmp1 (mispreds: 0, count: 20)

.Ltmp1 (1 instructions, align : 1)
  Secondary Entry Point: .BB2/1
  Exec Count : 30
  CFI State : 0
  Predecessors: .Ltmp0
    00000009:   retq

DWARF CFI Instructions:
    <empty>
End of Function "main"

the successor of Ltail-dup0 is itself.

.Ltail-dup0 (2 instructions, align : 1)
  Exec Count : 3
  CFI State : 0
  Predecessors: .LBB00, .Ltail-dup0
    00000002:   xorl    %eax, %eax
    00000004:   jmp     ".Ltail-dup0"
  Successors: .Ltail-dup0 (mispreds: 0, count: 3)

I disassemble the result optimized binary executable as follows.

0000000000400000 <main>:
  400000:       31 c0                   xor    %eax,%eax
  400002:       31 c0                   xor    %eax,%eax
  400004:       eb fc                   jmp    400002 <main+0x2>
  400006:       48 ff c0                inc    %rax
  400009:       c3                      retq

This is obviously optimized into a deadloop!
Then I have read up the related codes of tail duplication on llvm-bolt. I find the BinaryBasicBlock *CurrBB = &BB; maybe change to BinaryBasicBlock *CurrBB = &Tail;
I changed it and recompiled, it worked well!

hzq added a subscriber: hzq.Sep 4 2023, 5:06 AM

Hi @zj, thank you for the test case and for the fix! It looks good to me. Could you please merge the new test case into bolt/test/X86/tail-duplication-pass.s?

zj updated this revision to Diff 556545.Sep 12 2023, 3:52 AM

Merge the testcase.

zj added a comment.Sep 12 2023, 3:55 AM

Hi @zj, thank you for the test case and for the fix! It looks good to me. Could you please merge the new test case into bolt/test/X86/tail-duplication-pass.s?

Thanks for your advice, I have merged it already. :)

maksfb accepted this revision.Sep 14 2023, 11:49 AM

Thanks! Do you have the commit access? Before committing, please re-title using all-caps "[BOLT]" and follow the "50-72" rule (50-character wide title, 72 for the summary).

This revision is now accepted and ready to land.Sep 14 2023, 11:49 AM
zj retitled this revision from [Bolt]Tail Duplication: fix deadloop bug due to using the --tail-duplication=aggressive option of llvm-bolt to [BOLT] Fix deadloop bug in taildup.Sep 14 2023, 7:01 PM
zj added a comment.Sep 14 2023, 7:04 PM

Thanks! Do you have the commit access? Before committing, please re-title using all-caps "[BOLT]" and follow the "50-72" rule (50-character wide title, 72 for the summary).

OK, thx. I have modified it and my colleague will assist with the submission late.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2023, 7:13 AM