Page MenuHomePhabricator

JumpThreading: enhance JT to handle BB with no successor and address comparison
Needs ReviewPublic

Authored by davidxl on Oct 11 2019, 4:25 PM.

Details

Reviewers
efriedma
wmi
Summary

Current JT only process (clone) BBs with multiple successors in JT with the aim to thread the predecessor with a successor BB. This misses opportunities to to handle return BB where the return value can be simplified with threading (cloning).

Example:

#include <array>
#include <algorithm>

constexpr std::array<int, 3> x = {1, 7, 17};

bool Contains(int i) {

return std::find(x.begin(), x.end(), i) != x.end();

}

Clang produces inefficient code:

_Z8Containsi: # @_Z8Containsi

.cfi_startproc
  1. %bb.0: cmpl $1, %edi je .LBB0_1
  2. %bb.2: cmpl $7, %edi jne .LBB0_3
  3. %bb.4: movl $_ZL1x+4, %eax jmp .LBB0_5

.LBB0_1:

movl    $_ZL1x, %eax
jmp     .LBB0_5

.LBB0_3:

cmpl    $17, %edi
movl    $_ZL1x+8, %ecx
movl    $_ZL1x+12, %eax
cmoveq  %rcx, %rax

.LBB0_5:

movl    $_ZL1x+12, %ecx
cmpq    %rcx, %rax
setne   %al
retq

While GCC produces:

_Z8Containsi:
.LFB1534:

.cfi_startproc
movl    $1, %eax
cmpl    $1, %edi
je      .L1
cmpl    $7, %edi
je      .L1
cmpl    $17, %edi
sete    %al

.L1:

ret

This patch address the issue. After the fix, the generated code looks like:

_Z8Containsi: # @_Z8Containsi

.cfi_startproc
addl    $-1, %edi
cmpl    $16, %edi
ja      .LBB0_2
movl    $65601, %eax            # imm = 0x10041
movl    %edi, %ecx
shrl    %cl, %eax
andb    $1, %al
retq

.LBB0_2: # %_ZSt4findIPKiiET_S2_S2_RKT0_.exit.thread

xorl    %eax, %eax
retq

Diff Detail

Event Timeline

davidxl created this revision.Oct 11 2019, 4:25 PM

If the terminator is a "ret", or some arbitrary terminator that doesn't simplify, it's not really "threading"; it's just tail duplication. That's likely profitable in some cases, but using ThreadEdge to perform the transform seems confusing.

JumpThreading is basically basic block cloning followed by control flow simplification. This is just a special case where the second part is missing.

There is already another special case in JT -- if all the Pred's target successor is the same, there is no threading either -- basically there is only control flow simplification part without the basic cloning. These two cases are just at two different ends of the spectrum.

wmi added a comment.Oct 12 2019, 9:58 PM

I change the testcase a little so the terminator won't be ret, but the generated code pattern is the same. Should it be handled as well?

------------------------------------
#include <array>
#include <algorithm>

constexpr std::array<int, 3> x = {1, 7, 17};
bool global, cond;

void Contains(int i) {
  global = std::find(x.begin(), x.end(), i) != x.end();
  if (cond)
    __builtin_printf("hello\n");
}
------------------------------------

Handling what Wei's case will be a nice thing to have, but it may require more significant change in JT. Currently the JT candidate BB selection is based on checking the conditional value used by branch or return value of ret instr (with this patch).

To handle this case, it requires checking use values of arbitrary instructions (value of store in the example). Another thing to consider is the cost model difference. In Wei's case, cloning really becomes tail dup with increased complexity of control flow (handling Ret instruction on the other hand does not have the issue).

Handling what Wei's case will be a nice thing to have, but it may require more significant change in JT.

I think we need to have a plan for what this is going to look like, so the new code here doesn't immediately become obsolete.

Another thing to consider is the cost model difference.

I think the necessary cost model is effectively the same for ret vs. other instructions. In particular, the "ret" might go away after after inlining.

lib/Transforms/Scalar/JumpThreading.cpp
2005 ↗(On Diff #224704)

I'm surprised threadEdge still works. Probably want to update the documentation for that API if we really want to allow SuccBB to be null.

wmi added inline comments.Oct 31 2019, 4:53 PM
lib/Transforms/Scalar/JumpThreading.cpp
1076–1081 ↗(On Diff #224704)

Nit: if (!RV || !(Condition = dyn_cast<CmpInst>(RV))) return false;

1726 ↗(On Diff #224704)

For BB being handled in this patch (containing Ret instruction), OnlyDest is always nullptr so this block will not be executed for it. Is it for other purpose?

1769–1780 ↗(On Diff #224704)

For BB with Ret instruction, we don't factor predecessors with the same PredVal here. If we factor predecessors with the same PredVal, we may have less clones?

1946–1948 ↗(On Diff #224704)

It will be helpful to add some comment for the case that SuccBB is nullptr and maybe a TODO for the extension and refactoring needed.

test/Transforms/JumpThreading/addr.ll
15–16 ↗(On Diff #224704)

Tests need to be processed with opt -instnamer.

davidxl marked 6 inline comments as done.Nov 12 2019, 8:16 PM
davidxl added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
1076–1081 ↗(On Diff #224704)

ok

1726 ↗(On Diff #224704)

This simplifies the ret instruction after the condition is removed.

1769–1780 ↗(On Diff #224704)

Right. This can be a TODO.

1946–1948 ↗(On Diff #224704)

Ok.

2005 ↗(On Diff #224704)

ok. Will document as a follow up.

test/Transforms/JumpThreading/addr.ll
15–16 ↗(On Diff #224704)

ok

davidxl updated this revision to Diff 229606.Nov 15 2019, 11:31 AM

rebased and addressed review feedbacks.

Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2019, 11:31 AM
Herald added a subscriber: hiraditya. · View Herald Transcript