This is an archive of the discontinued LLVM Phabricator instance.

[BOLT] Remove peephole useless-branches pass
AbandonedPublic

Authored by yota9 on Jan 27 2022, 6:43 AM.

Details

Summary

Currently this pass is useless since the cfg-normalization pass contains
this functionality.
Also this patch refactors peephole options enum by placing it Peepholes class.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Diff Detail

Event Timeline

yota9 created this revision.Jan 27 2022, 6:43 AM
yota9 requested review of this revision.Jan 27 2022, 6:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2022, 6:43 AM
yota9 updated this revision to Diff 403640.Jan 27 2022, 6:49 AM

Remove enum name for in-class usage

yota9 updated this revision to Diff 403642.Jan 27 2022, 6:57 AM

Clang format

Amir added a comment.EditedJan 27 2022, 2:27 PM

Hi Vladislav,

I've added the clang-icp.test to binary tests repo (https://github.com/rafaelauler/bolt-tests) and this test breaks with this diff. Please let me know if you can reproduce it.

Amir added a comment.Jan 28 2022, 12:39 PM

Hi Vladislav,

I've added the clang-icp.test to binary tests repo (https://github.com/rafaelauler/bolt-tests) and this test breaks with this diff. Please let me know if you can reproduce it.

Please use bolt/utils/llvm-bolt-wrapper.py script (rGc2a961e414e0) to run binary matching.

  1. Build BOLT without this diff, rename the binary as llvm-bolt.old
  2. Build BOLT with this diff, rename the binary as llvm-bolt.new
  3. Run the script to produce llvm-bolt-wrapper.ini:

/path/to/llvm-bolt-wrapper.py /path/to/llvm-bolt.old /path/to/llvm-bolt.new > bin/llvm-bolt-wrapper.ini

  1. Symlink the script in place of llvm-bolt:

ln -sf /path/to/llvm-bolt-wrapper.py /path/to/llvm-bolt

  1. Run binary test:

bin/llvm-lit -a tools/bolttests/clang-icp.test

The mismatch in text section should be reported.

yota9 updated this revision to Diff 404171.Jan 28 2022, 2:49 PM

Call fixBranches in fixDoubleJumps.
@Amir could you please check the test? I think it shold be identical now :)

Amir added a comment.Jan 28 2022, 9:36 PM

Call fixBranches in fixDoubleJumps.
@Amir could you please check the test? I think it shold be identical now :)

No, it's still not matching.
I've used the script in D118516 to test it (cherry-pick it and move under this diff):

  • in repo: checkout this diff (D118364)
  • in BOLT build directory: invoke python3 <llvm-project>/bolt/utils/nfc-check-setup.py --run_sequentially
  • run the test: bin/llvm-lit -a tools/bolttests/X86/clang.test (no need to run the binary)
yota9 added a comment.EditedJan 29 2022, 1:50 PM

@Amir Hello! Yes, you're right, there are 256 bytes mismatch in .text.cold between binaries (the new binary becomes smaller). But now we not only remove the useless condition branches, we are also removing the useless unconditional branches. The example you can see in small function "_ZN4llvm11InstVisitorINS_26PGOIndirectCallSiteVisitorEvE16delegateCallInstERNS_8CallInstE"

Previously after-bolt we had:

38bd9e5:	eb f3                	jmp    38bd9da <_ZN4llvm11InstVisitorINS_26PGOIndirectCallSiteVisitorEvE16delegateCallInstERNS_8CallInstE+0x16>
38bd9e7:	eb f1                	jmp    38bd9da <_ZN4llvm11InstVisitorINS_26PGOIndirectCallSiteVisitorEvE16delegateCallInstERNS_8CallInstE+0x16>

And now:

38bd9e5:	eb f3                	jmp    38bd9da <_ZN4llvm11InstVisitorINS_26PGOIndirectCallSiteVisitorEvE16delegateCallInstERNS_8CallInstE+0x16>
38bd9e7:	90                   	nop

The second jump was emitted since previously we didn't run eraseInvalidBBs() after peephole pass, although in fact this instruction (which is in separate BB) becomes unreachable.

Also fixBranches() makes some work. E.g. in "_ZN4llvm13ScheduleDAGMID1Ev"

Previously we had jump on the next instruction:

47b1c51:	e8 da 35 cb fe       	callq  3465230 <_ZN12_GLOBAL__N_113CopyConstrainD0Ev>
47b1c56:	eb 07                	jmp    47b1c5f <_ZN4llvm13ScheduleDAGMID1Ev+0x51>
47b1c58:	eb 05                	jmp    47b1c5f <_ZN4llvm13ScheduleDAGMID1Ev+0x51>
47b1c5a:	ff 50 10             	callq  *0x10(%rax)
47b1c5d:	eb 00                	jmp    47b1c5f <_ZN4llvm13ScheduleDAGMID1Ev+0x51>
47b1c5f:	48 8d 53 08          	lea    0x8(%rbx),%rdx

And now no extra jump on 47b1c5d:

47b1c11:	e8 1a 36 cb fe       	callq  3465230 <_ZN12_GLOBAL__N_113CopyConstrainD0Ev>
 47b1c16:	eb 03                	jmp    47b1c1b <_ZN4llvm13ScheduleDAGMID1Ev+0x4d>
 47b1c18:	ff 50 10             	callq  *0x10(%rax)
 47b1c1b:	48 8d 53 08          	lea    0x8(%rbx),%rdx

This example also contains one unreachable jump removed by eraseInvalidBBs right in one function :)
Although the binaries are not identical, but it seems to be reasonable change :) But I think we need to decide how to run the fixBranches() in more proper way I made now. Since for example after the SimplifyConditionalTailCalls we might have situations where the fixBranches() is needed to call, so I'm not sure how to fix it better, maybe you can suggest me. In my mind the easiest and the bests solution is to run the FixupBranches pass twice, the second time after peephole with the comment that no further pass should change the branches. This will help us to avoid such problems in future, but will cost some time of course..

Amir added a comment.Jan 31 2022, 11:32 AM

Thanks for investigating! I'm checking it with more tests and will return to you shortly

yota9 abandoned this revision.Feb 1 2022, 12:36 PM

The SCTC pass breaks CFG and we are unable to call fixBranches at that point anymore, closing review for now, thank you @Amir for your time :)