This is an archive of the discontinued LLVM Phabricator instance.

[X86] Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
ClosedPublic

Authored by congh on Jul 21 2015, 11:27 AM.

Details

Summary

Currently, AnalyzeBranch() fails non-equality comparison between floating points on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing unconditional jump if there is a proper fall-through. However, in the case of non-equality comparison between floating points, this can turn the branch "unanalyzable". Consider the following case:

jne .BB1
jp .BB1
jmp .BB2
.BB1:
...
.BB2:
...

AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed:

jne .BB1
jnp .BB2
.BB1:
...
.BB2:
...

However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal.

Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined negation conditions for them.

In order to reverse them, this patch defines two new CondCode X86::COND_NEG_NE_OR_P and X86::COND_NEG_NP_OR_E. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization.

The test cases haven't been updated accordingly. If this design is OK I will do it later.

Diff Detail

Event Timeline

congh updated this revision to Diff 30275.Jul 21 2015, 11:27 AM
congh retitled this revision from to Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed..
congh updated this object.
congh added reviewers: dexonsmith, davidxl.
congh added a subscriber: llvm-commits.
congh updated this revision to Diff 30378.Jul 22 2015, 11:43 AM

Handle the case that the false body is null when building instructions for COND_NEG_NP_OR_E and COND_NEG_NE_OR_P.

congh updated this revision to Diff 30385.Jul 22 2015, 12:41 PM

Corrected all failed test cases.

congh retitled this revision from Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. to [X86] Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed..Jul 22 2015, 1:10 PM
congh added a reviewer: Gerolf.
congh updated this revision to Diff 30540.Jul 23 2015, 4:33 PM

Update the patch by renaming COND_NEG_NE_OR_P/COND_NEG_NP_OR_E to COND_E_AND_NP/COND_P_AND_NE.

Gerolf edited edge metadata.Jul 24 2015, 11:27 AM

"Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch()." Does have AnalyzeBranch() have more clients or just block placement? In that case moving the code may impact generated code.

lib/Target/X86/X86InstrInfo.cpp
4056

It would be nice if you added a FIXME even though this is not part of your code. Any assumption about the IS patterns should be made explicit and checked with an assertion.

4061

Perhaps I'm only iterating what Duncan said. What I'm confused by is that the previous pattern are symmetrical: For example the first case is NP && E or E && NP while the new cases are asymmetrical like here NE && NP or P && E (as opposed to NE && NP or NP && NE, which is what I would expect from the handling of the previous pattern). At least there need to be a good explanation (comment) for this.

4089

I would need a picture and examples to understand for which conditions chains the TBB condition is relevant.

davidxl added inline comments.Jul 24 2015, 1:18 PM
lib/Target/X86/X86InstrInfo.cpp
4065

I might have missed other discussions (so that I completely missed with COND_P_AND_NE means), but should (NE || NP) be equivalent to !(E && P) which means the branch code should be the negate of COND_P_AND_E?
Similarly, (P||E) should be negate of NE_AND_NP?

davidxl edited edge metadata.Jul 24 2015, 1:42 PM

Just a meta comment: It is good to avoid doing any transformations in an analysis function.

congh added inline comments.Jul 24 2015, 1:56 PM
lib/Target/X86/X86InstrInfo.cpp
4056

Thanks for your review! I have added assertion checking if two destinations are identical for X86::COND_NP_OR_E and X86::COND_NE_OR_P. For X86::COND_P_AND_NE and X86::COND_E_AND_NP, however, I am not sure we should assert that they have different destinations. This is because it is still OK even when they have the same destination.

And do still need a FIXME?

4061

This condition has two instructions with two different destinations, and the second destination is the true BB. Therefore if we have NE then NP, then the true body can only be reached with !NE && NP; that is E && NP. If we have P then E, the true body can only be reached with !P && E; that is NP && E. And then we got two equivalent conditions. I have added a comment explaining it.

4089

I have added a comment showing examples of X86::COND_P_AND_NE in which two branch destinations are different.

congh updated this revision to Diff 30603.Jul 24 2015, 1:57 PM
congh edited edge metadata.

Update the patch according to Gerolf's comments.

congh added a comment.Jul 29 2015, 1:14 PM

Ping on this patch?

In addition to the comments what about ReverseBranchCondition? Shouldn't the new opcodes be handled there, too?
Perhaps it would be best if you worked directly with the code owner and get his consent.

lib/Target/X86/X86InstrInfo.cpp
4056–4057

I guess my comment was a bit too out the box. What I had in mind is not related to your review. So let's table this.

4057

Now that the NewTBB check has been removed potentially the assertions below can fire.

4092

The problem I have with this review is partially historical. COND_NE_OR_P etc doesn't make sense to me without explanation. OR does not seem to be used in a logical sense since both branch conditions should be present. But also your COND names obfuscate the picture (perhaps just for me though) even more: previously the branch targets had to be identical, now they can be different. This is a new concept and pressing it into the existing AnalyzeBranch routine makes the code harder too maintain. From this angle to have functions that handle the new functionality.

congh added a comment.Aug 21 2015, 5:21 PM

In addition to the comments what about ReverseBranchCondition? Shouldn't the new opcodes be handled there, too?

In ReverseBranchCondition(), GetOppositeBranchCondition() is called to get the reverse condition's opcode, and this function is updated in this patch to return the correct reverse condition for COND_NE_OR_P/COND_NP_OR_E/COND_E_AND_NP/COND_P_AND_NE.

Perhaps it would be best if you worked directly with the code owner and get his consent.

OK. I found Nadav Rotem is the code owner of X86 backend. I will add him as a reviewer.

lib/Target/X86/X86InstrInfo.cpp
4092

This is because on X86 the equality/non-equality comparison between floating points is translated into two instructions, and the conditions of those two instructions represent a logical OR instead of AND, as they jump to the same destination. Normally the negation of OR is AND, and that is why the reverse condition is named with AND. I agree that it is not straightforward to understand that it has two different branch targets, but that is the correct way to reverse it.

We need to revisit this patch ...

congh added a comment.Nov 19 2015, 5:28 PM

We need to revisit this patch ...

Have you seen any example that needs this patch?

Restart discussion on this thread:

For the motivating example where two conditional branches have different targets,

jne .BB1
jnp .BB2
.BB1:
...
.BB2:
...

Is it possible to teach AnalyzeBranch to recognize the pattern -- with opcode COND_NE_OR_P ?

congh added a comment.Jan 11 2016, 4:59 PM

Restart discussion on this thread:

For the motivating example where two conditional branches have different targets,

jne .BB1
jnp .BB2
.BB1:
...
.BB2:
...

Is it possible to teach AnalyzeBranch to recognize the pattern -- with opcode COND_NE_OR_P ?

Yes, I think this is done in this patch. Or do I misunderstand what you mean?

congh updated this revision to Diff 44664.Jan 12 2016, 1:22 PM

Update the patch by bringing back the condition reversal optimization in AnalyzeBranch().

davidxl added inline comments.Jan 13 2016, 2:11 PM
lib/Target/X86/X86InstrInfo.cpp
4060–4086

There is no need to change anything between line 4028 and 4037 (to simplify the patch). Just add a combined assertion after the pattern recognition:

assert( (BranchCode != cond_np_or_e || BranchCode != cond_ne_or_p || NewTB == TBB) && "Identical target BB expected");

Actually since the previous early exit has been removed, the assert can fire off, so the right thing to do is add the TBB == newTBB condition

4088

Should condition NewTBB != TBB be added here too?

4105

It is confusing here. The comment says the condition to B2 is NP_AND_E, but the branch code is P_AND_NE ..

Also the condition to B1 is NE_OR_P, so why not using COND_NE_OR_P?

4108

should NewTBB == TBB be added here?

4114

Why not unconditionally return true in else {} ?

lib/Target/X86/X86InstrInfo.h
64

add a comment after the enum:

COND_E_AND_NP, // negate of COND_NE_OR_P

'AND' does not directly map to any branch patterns, so add the comment help understanding the semantics.

congh updated this revision to Diff 44802.Jan 13 2016, 3:37 PM
congh marked an inline comment as done.

Update the patch according to David's comments.

lib/Target/X86/X86InstrInfo.cpp
4060–4086

You are right. But in practice I think when there is COND_NP and COND_E, their target is always the same (that is why I added assertion in this patch). But to be safer, I have replaced the assertion with a check.

4088

They can be the same target. This means we don't have to check if they are the same or different targets.

4105

I found the comment is incorrect. It is for COND_E_AND_NP not COND_P_AND_NE. I have updated the comment.

4108

You mean NewTBB != TBB? See my comment above: they are be the same target.

4114

I think it is OK to unconditionally return true here.

davidxl added inline comments.Jan 13 2016, 3:52 PM
lib/Target/X86/X86InstrInfo.cpp
4080

For B1, the condition is NP_OR_E, so why not using COND_NP_OR_E as the branch code? Is the new code needed? (after swapping TBB and FBB)

4084

To make sure the pattern is fully checked, I think NewTBB != TBB is also needed.

congh added inline comments.Jan 13 2016, 4:07 PM
lib/Target/X86/X86InstrInfo.cpp
4080

This is actually why this patch is created: COND_NP_OR_E is equivalent to COND_P_AND_NE, but the latter has the second condition reversed based on the former. COND_NP_OR_E has a JNP and JE, while COND_P_AND_NE has a JNP and JNE, or JE and JP. COND_NP_OR_E always has two identical targets, but COND_P_AND_NE doesn't. So they are different patterns. We use different names so that in X86::GetOppositeBranchCondition() we can get the reverse condition code for each other.

4084

But there is nothing wrong when NewTBB == TBB. This pattern doesn't care if they are the same or different basic blocks.

davidxl added inline comments.Jan 14 2016, 11:38 AM
lib/Target/X86/X86InstrInfo.cpp
4080

I am not sure if we need to introduce the Opposite branch code for E_OR_NP.

Example:

JE B1
JP B2

B1 (fall through):

B2:

For this case, the Analyze branch can return success with the following tuple:
{BranchCode = E_OR_NP, TBB = B1, FBB = B2}

Later when insertBranch is called: there are two scenarios:

  1. same as above where B1 is the fall through. The inserted code will be the same as above
  2. B2 is the layout successor. In this second case, the generated code sequence should look like: JNP B1 JE B1

B2 (fall through):
..
B1:

or

JE B1
JNP B1

B2:

..

B1:

Does it make sense?

congh added inline comments.Jan 14 2016, 12:32 PM
lib/Target/X86/X86InstrInfo.cpp
4080

In some places GetOppositeBranchCondition() is used to get the opposite branch code in order to generate an opposite branch. If we don't have an opposite branch code for COND_E_OR_NP, then how to reverse it?

davidxl added inline comments.Jan 14 2016, 12:44 PM
lib/Target/X86/X86InstrInfo.cpp
4080

GetOppositteBranchCondition for some reason is never called with COND_E_OR_NP before, do you know why this path was not triggered ?

congh added inline comments.Jan 14 2016, 1:17 PM
lib/Target/X86/X86InstrInfo.cpp
4080

There are two places to reverse branches:

  1. AnalyzeBranch(), in which GetOppositteBranchCondition() is called when there is a unconditional jump in the end of a MBB. If this is the case of a je+jnp branch, then a je+jp branch will be generated only based on jnp (GetOppositteBranchCondition returns jnp for a jp), as at this moment the COND_E_OR_NP pattern hasn't been recognized. If there is no jmp in the end, GetOppositteBranchCondition() won't be called. A je+jp pattern is handled similarly. When there is no jmp following je+jnp, a COND_E_OR_NP pattern will be recognized.
  1. Block placement: in this pass COND_E_OR_NP won't be handled with a check.
congh added inline comments.Jan 20 2016, 1:47 PM
lib/Target/X86/X86InstrInfo.cpp
4080

One reason we need new condition codes here is that we should make sure for every condition code there is an opposite condition code for it. This makes it easier to reverse branches without checking what the branch code is.

congh updated this revision to Diff 45444.Jan 20 2016, 1:59 PM
congh added a comment.Jan 20 2016, 1:59 PM

Detect COND_P_AND_NE and COND_E_AND_NP with the constraint that the two jumping targets are different.

Can we also add a test case that test this:

jcc1 BB1
jncc2 BB2

BB1: // cold
...
BB2:
..

can be ordered
correctly into
jxxx ...
jxxx ...
BB2:
...
BB1:

lib/Target/X86/X86InstrInfo.cpp
4226

Fix comment. It is not 'implied' -- this is guaranteed by AnalyzeBranch

4228

Have a little wrapper 'getFallthroughBlock'

test/CodeGen/X86/block-placement.ll
466

Can you make the ordering check more strict here?

495

to make the test more robust, perhaps annotate this branch with weights { 1, 1000} where from 'exit' block it is not likely to branch into if.then.

668

comment that the right order is : entry -> bar -> exit -> foo

672

use check-next for more strict checking.

congh marked 5 inline comments as done.Jan 22 2016, 11:26 AM

Can we also add a test case that test this:

jcc1 BB1
jncc2 BB2

BB1: // cold
...
BB2:
..

can be ordered
correctly into
jxxx ...
jxxx ...
BB2:
...
BB1:

Can we write LLVM IR to express this test case?

test/CodeGen/X86/block-placement.ll
672

However, we cannot use CHECK-NEXT here. The output is:

unanalyzable_branch_to_best_succ:       # @unanalyzable_branch_to_best_succ
	.cfi_startproc
# BB#0:                                 # %entry
	subl	$12, %esp
.Ltmp39:
	.cfi_def_cfa_offset 16
	testb	$1, 16(%esp)
	je	.LBB16_1
.LBB16_2:                               # %bar
	calll	f
.LBB16_3:                               # %exit
	addl	$12, %esp
	retl
.LBB16_1:                               # %foo
	fldz
	fucomp	%st(0)
	fnstsw	%ax
	sahf
	jne	.LBB16_2
	jnp	.LBB16_3
	jmp	.LBB16_2
congh added inline comments.Jan 22 2016, 11:36 AM
lib/Target/X86/X86InstrInfo.cpp
4088

I found that we could not add NewTBB != TBB here. I found such a test case, in which the target of JNE and JNP and the fall-through block are the same block.

# Machine code for function func: Post SSA
Frame Objects:
  fi#-2: size=4, align=4, fixed, at location [SP+8]
  fi#-1: size=4, align=16, fixed, at location [SP+4]
  fi#0: size=4, align=4, at location [SP-4]
Constant Pool:
  cp#0: -1.000000e+00, align=8

BB#0: derived from LLVM BB %entry
	PUSH32r %EAX<undef>, %ESP<imp-def>, %ESP<imp-use>; flags: FrameSetup
	%XMM1<def> = CVTSS2SDrm %ESP, 1, %noreg, 8, %noreg; mem:LD4[FixedStack-1](align=16)
	%XMM0<def> = CVTSS2SDrm %ESP, 1, %noreg, 12, %noreg; mem:LD4[FixedStack-2]
	%XMM0<def,tied1> = MULSDrr %XMM0<kill,tied0>, %XMM1<kill>
	%XMM1<def> = FsFLD0SD
	UCOMISDrr %XMM0, %XMM1<kill>, %EFLAGS<imp-def>
	JNE_1 <BB#2>, %EFLAGS<imp-use>
	JNP_1 <BB#2>, %EFLAGS<imp-use>
    Successors according to CFG: BB#3(0x50000000 / 0x80000000 = 62.50%) BB#2(0x30000000 / 0x80000000 = 37.50%)

BB#2: derived from LLVM BB %bb1
    Live Ins: %XMM0
    Predecessors according to CFG: BB#0
	%XMM0<def,tied1> = ADDSDrm %XMM0<kill,tied0>, %noreg, 1, %noreg, <cp#0>, %noreg; mem:LD8[ConstantPool]
    Successors according to CFG: BB#3(?%)

BB#3: derived from LLVM BB %bb2
    Live Ins: %XMM0
    Predecessors according to CFG: BB#2 BB#0
	%XMM0<def> = CVTSD2SSrr %XMM0<kill>
	MOVSSmr %ESP, 1, %noreg, 0, %noreg, %XMM0<kill>; mem:ST4[FixedStack0]
	LD_F32m %ESP, 1, %noreg, 0, %noreg, %FPSW<imp-def,dead>; mem:LD4[FixedStack0]
	%EAX<def> = POP32r %ESP<imp-def>, %ESP<imp-use>; flags: FrameDestroy
	RETL

# End machine code for function func.
congh updated this revision to Diff 45721.Jan 22 2016, 11:38 AM

Update the patch according to David's comment.

For the new suggested test case, I am thinking a code with

%cmp = fcmp une float %f, 0.000000e+00
br i1 %cmp, label %if.then, label %if.end

With the branch probably annotated with profile data to enable the reordering we want.

test/CodeGen/X86/block-placement.ll
672

Can you use the following to enforce the order..

; CHECK-DAG: #entry
; CHECK-NOT: <something>
; CHECK-DAG: #bar

congh marked an inline comment as done.Jan 22 2016, 3:26 PM

For the new suggested test case, I am thinking a code with

%cmp = fcmp une float %f, 0.000000e+00
br i1 %cmp, label %if.then, label %if.end

With the branch probably annotated with profile data to enable the reordering we want.

OK, I have added such a test case.

test/CodeGen/X86/block-placement.ll
672

I don't get it. What is the problem of the current CHECKs? I think the order of those four blocks is already enforced.

congh updated this revision to Diff 45761.Jan 22 2016, 3:26 PM
congh marked an inline comment as done.

Update the patch according to David's comments.

davidxl added inline comments.Jan 22 2016, 3:57 PM
lib/Target/X86/X86InstrInfo.cpp
4088

Should these two JCCs be eliminated?

test/CodeGen/X86/block-placement.ll
672

Ok -- you are right.

test/CodeGen/X86/fp-une-cmp.ll
55

The branch sequence does imply BB2 is reordered before BB1. Is it better to explicit test the label order?

Also why is the jump sequence not the optimal one:

jne bb2
jnp bb1
bb2:
bb1:

congh updated this revision to Diff 45764.Jan 22 2016, 4:10 PM

Update a test case.

lib/Target/X86/X86InstrInfo.cpp
4088

This is just an intermediate state, when AnalyzeBranch() is called. I believe they will be eliminated later.

test/CodeGen/X86/fp-une-cmp.ll
55

OK, I will test the BB orders.

The generated assembly is shown below, which should be optimal.

func2:                                  # @func2
# BB#0:                                 # %entry
	pushl	%eax
	cvtss2sd	8(%esp), %xmm1
	cvtss2sd	12(%esp), %xmm0
	mulsd	%xmm1, %xmm0
	xorpd	%xmm1, %xmm1
	ucomisd	%xmm1, %xmm0
	jne	.LBB1_1
	jp	.LBB1_1
.LBB1_2:                                # %bb2
	cvtsd2ss	%xmm0, %xmm0
	movss	%xmm0, (%esp)
	flds	(%esp)
	popl	%eax
	retl
.LBB1_1:                                # %bb1
	addsd	.LCPI1_0, %xmm0
	jmp	.LBB1_2
.Lfunc_end1:
	.size	func2, .Lfunc_end1-func2
davidxl accepted this revision.Jan 22 2016, 4:17 PM
davidxl edited edge metadata.

LGTM -- watch out for test failures.

This revision is now accepted and ready to land.Jan 22 2016, 4:17 PM
This revision was automatically updated to reflect the committed changes.

LGTM -- watch out for test failures.

David, Gerolf suggested getting one of the x86 maintainers to look at this, and Nadav was CC-ed, but no one with deep knowledge of the x86 backend ever really commented on the patch. =/ Seems somewhat bad form to LGTM without getting one of the long standing maintainers to chime in here.

(And there does appear to be a problem with it, see the comment from James on the commit thread...)

Ayal Zaks asked me to review this patch after he was asked by Nadav.

The stability failures caused by this patch are most likely caused by one or both of the issues in X86InstrInfo::AnalyzeBranchImpl. Cleaning up the unnecessary conditions would be a nice bonus. The inefficiency in fp-une-cmp.ll might be a separate issue.

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
4033 ↗(On Diff #46028)

Don't you need to check TBB == NewTBB here also?

4071 ↗(On Diff #46028)

I think you need to verify here that NewTBB == FBB before making this transformation. If FBB is nullptr, you'll need to compute the fallthrough block and check that it matches NewTBB.

llvm/trunk/lib/Target/X86/X86InstrInfo.h
57 ↗(On Diff #46028)

The COND_NP_OR_E condition is rather pointless. Based on the comment, it is pretty clear that the intent was for this to be an artificial condition for FCMP_OEQ, but that should be COND_NP_AND_E. (AND not OR)

In other words, I'd recommend deleting the existing COND_NP_OR_E condition and the COND_P_AND_NE one that you added. COND_NE_OR_P and its inverse COND_E_AND_NP are sufficient.

[FWIW, COND_NP_OR_E would always evaluate to true assuming the CC's originated from an FP compare instruction like COMISS.]

llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll
51 ↗(On Diff #46028)

If we invert the compound branch at the end of the entry block and place bb1 before bb2, we can eliminate the jmp at the end of bb1. Do you know why that isn't happening?

davidxl added inline comments.Jan 27 2016, 3:55 PM
llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll
51 ↗(On Diff #46028)

The test case is explicitly added to test the ability for MachineBlockPlacement to break the topological order and reorder BB2 ahead of BB1 (BB1 is ahead of BB2 in source order) -- look at the profile annotation that BB1 is a really cold block -- this reordering is not possible without this patch.

See also discussions in http://reviews.llvm.org/D11393?vs=on&id=45764&whitespace=ignore-most#toc

DavidKreitzer added inline comments.Jan 27 2016, 5:09 PM
llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll
51 ↗(On Diff #46028)

Thanks for the explanation! I missed the profile annotation - Please ignore my comment on this one.

np.

Cong, can you add a comment for the new test case?

David

congh updated this revision to Diff 46299.Jan 28 2016, 11:32 AM
congh edited edge metadata.
congh removed rL LLVM as the repository for this revision.

Update the patch according to David's comments.

Thanks for the review, David!

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
4033 ↗(On Diff #46028)

Yes, you are right.

4071 ↗(On Diff #46028)

When FBB is nullptr, we could not safely let the fallthrough block be FBB. This is because there is a use case of AnalyzeBranch in block-placement where MBBs are reordered before this function is called, in which case the fallthrough MBB may have nothing to do with the branch.

But we can do this check if FBB is not null.

llvm/trunk/lib/Target/X86/X86InstrInfo.h
57 ↗(On Diff #46028)

This makes sense. Done.

llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll
51 ↗(On Diff #46028)

This is because bb2 is hotter than bb1 (note that there is a branch_weights profile data), and the BlockPlacement pass will place the hotter one as a fall-through.

Thanks for the fixes and for cleaning up the unnecessary conditions. I still think the FBB==nullptr case is broken in the COND_E_AND_NP detection code, but otherwise, this looks good.

Did you get reproducers for the failures that people were seeing with this patch? Do these fixes solve them?

Thanks,
Dave

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
4071 ↗(On Diff #46028)

To be clear, I wasn't suggesting that you actually compute & set FBB here in the case when it was initially nullptr. Rather, I am saying that you cannot legally use COND_E_AND_NP here without proving that the target of the first branch is the same as the fall-through target.

What is to prevent this code from analyzing the sequence:

JNE B1
JNP B2
... fallthrough to some mystery block B3 (FBB == nullptr) ...

and returning this?

BranchCond = COND_E_AND_NP
TBB = B2
FBB = nullptr

The branch to B1 is lost.

llvm/trunk/lib/Target/X86/X86InstrInfo.h
57 ↗(On Diff #46028)

I would recommend combining these two comments. As written, they are a little misleading, because they suggest that COND_NE_OR_P is used to implement both FCMP_OEQ & FCMP_UNE while COND_E_AND_NP is used just to negate COND_NE_OR_P. In fact, COND_NE_OR_P is the natural implementation of FCMP_UNE while COND_E_AND_NP is the natural implementation of FCMP_OEQ. How about something like this?

// Artificial condition codes. These are used by AnalyzeBranch
// to indicate a block terminated with two conditional branches that together
// form a compound condition. They occur in code using FCMP_OEQ or FCMP_UNE,
// which can't be represented on x86 with a single condition. These
// are never used in MachineInstrs and are inverses of one another.
COND_NE_OR_P,
COND_E_AND_NP,
llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll
51 ↗(On Diff #46028)

Understood about the branch weights, thanks, and thanks for adding the comment to make that clearer.

It's worth noting that bb2 is placed ahead of bb1 even under minsize. That indicates to me that something may need to be tweaked in block placement, but I think that's beyond the scope of this change set.

congh added a comment.Jan 29 2016, 4:36 PM

Thanks for the fixes and for cleaning up the unnecessary conditions. I still think the FBB==nullptr case is broken in the COND_E_AND_NP detection code, but otherwise, this looks good.

Did you get reproducers for the failures that people were seeing with this patch? Do these fixes solve them?

Yes. The issue is that I get the incorrect FBB in InsertBranch() when the passed-in FBB is null. What I did before is using the layout fallthrough BB as the FBB, but this may be incorrect because at that moment the user may have already modifed the layout, making the fallthrough BB not the FBB. Instead, I searched the successor list of MBB to find the FBB (the successor that is not the TBB or EHPad).

Thanks,
Dave

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
4071 ↗(On Diff #46028)

You are right! However, even we have

JNE B1
JNP B2
(fallthough to B1)

we are not 100% certain that B1 will be FBB. The user of AnalyzeBranch() can do anything before calling it, making the fallthrough block not the actual FBB. This happens in the block-placement pass. A possible solution is iterating the successor list of the MBB and finding the correct FBB, which is done in the updated patch. In the function InsertBranch() I did the same thing: when the passed-in FBB is false, I try to find it by checking the successor list of the MBB. But there is an another potential issue: the user passed-in TBB/FBB may not be the successors of the MBB. This happens in the tail-duplication pass. So I think it is very easy to misuse InsertBranch().

congh updated this revision to Diff 46452.Jan 29 2016, 4:37 PM

Update the patch according to David's comment.

congh added inline comments.Jan 29 2016, 4:39 PM
llvm/trunk/lib/Target/X86/X86InstrInfo.h
57 ↗(On Diff #46028)

Your suggested comment looks great! I have updated the patch accordingly. Thanks!

Thanks for the fixes! Just a few more minor issues.

Regarding my suggestion to modify the comments for InsertBranch/AnalyzeBranch: I would recommend that after doing so you run the change by the current CodeGen code owner for approval, since you are adding a dependence that doesn't currently exist, namely that these routines expect the CFG links in MachineBasicBlock to be up-to-date.

lib/CodeGen/TailDuplication.cpp
759

So, I understand why you needed to make this change. But it suggests that you might want to update the comment for InsertBranch in TargetInstrInfo.h to say that the CFG information must be valid before calling the routine. Similarly for AnalyzeBranch.

lib/Target/X86/X86InstrInfo.cpp
3927

Did you intentionally leave this debugging code here?

3932

This assertion seems dangerous to me given that you are calling this routine from within AnalyzeBranch. Theoretically, you could be in the middle of analyzing an unsupported block like this:

JA B1
JNP B2
JNE B3
.... fallthrough to B2 ...

That would trigger a call to getFallThroughMBB with TBB == B3, and this assertion would fail when it sees the two other successors B1 & B2. I don't know whether it is even possible to get IR like this, but it seems like this code ought to tolerate it.

I'd recommend simply returning nullptr if you find multiple non-EH, non-TBB successors.

congh added a comment.Mar 4 2016, 3:53 PM

Thanks for the fixes! Just a few more minor issues.

Regarding my suggestion to modify the comments for InsertBranch/AnalyzeBranch: I would recommend that after doing so you run the change by the current CodeGen code owner for approval, since you are adding a dependence that doesn't currently exist, namely that these routines expect the CFG links in MachineBasicBlock to be up-to-date.

I am really sorry for replying your comments so late (I was in vacation in Feb). This sounds good to me. Whom do you recommend as the CodeGen code owner to review this patch? Thanks!

lib/CodeGen/TailDuplication.cpp
759

OK, I have added the comments suggested by you to those routines.

lib/Target/X86/X86InstrInfo.cpp
3927

No.. my bad. I have removed them.

3932

OK. This makes sense.

congh updated this revision to Diff 49860.Mar 4 2016, 4:00 PM

Update the patch according to David's comments.

Thanks for following up on this. I just have a couple minor additional commenting suggestions.

To be clear, I was only suggesting that you get another reviewer for the change in include/llvm/Target/TargetInstrInfo.h. I would like someone to confirm that we can reasonably expect the MBB CFG information to be valid when AnalyzeBranch and InsertBranch are called. Aside from that, I am comfortable approving the rest of the patch myself. As for who should review the TargetInstrInfo.h change, maybe Sanjay or Nadav can do that? Also, CODE_OWNERS.TXT lists Evan Cheng as the CodeGen owner, though I don't know how up-to-date that is.

Thanks!
-Dave

include/llvm/Target/TargetInstrInfo.h
455

Both here and at 526, I would recommend saying explicitly, "The CFG information in MBB.Predecessors and MBB.Successors must be valid before calling this function."

lib/Target/X86/X86InstrInfo.cpp
3920

Maybe add another sentence to this comment: "Return nullptr if the fallthough MBB cannot be identified."

spatel edited edge metadata.Mar 7 2016, 3:45 PM

To be clear, I was only suggesting that you get another reviewer for the change in include/llvm/Target/TargetInstrInfo.h. I would like someone to confirm that we can reasonably expect the MBB CFG information to be valid when AnalyzeBranch and InsertBranch are called. Aside from that, I am comfortable approving the rest of the patch myself. As for who should review the TargetInstrInfo.h change, maybe Sanjay or Nadav can do that? Also, CODE_OWNERS.TXT lists Evan Cheng as the CodeGen owner, though I don't know how up-to-date that is.

Evan's info is out-of-date. I don't know enough about this to approve, but I tried to understand the patch via the testcases:

  1. I updated test/CodeGen/X86/fp-une-cmp.ll so it would be easier to see the change. Cong, please update this patch after r262875.
  2. I don't understand the wiggle in test/CodeGen/X86/x86-analyze-branch-jne-jp.ll. From what I see, the label order is the only thing that changes. Is that the expected difference? If so, the CHECK lines are not adequate; the test already passes without this patch. I recommend putting that test into the existing test/CodeGen/X86/fp-une-cmp.ll and using utils/update_llc_test_checks.py so we're sure we're getting the change that you expect.
congh added a comment.Mar 7 2016, 5:35 PM

Thanks for following up on this. I just have a couple minor additional commenting suggestions.

To be clear, I was only suggesting that you get another reviewer for the change in include/llvm/Target/TargetInstrInfo.h. I would like someone to confirm that we can reasonably expect the MBB CFG information to be valid when AnalyzeBranch and InsertBranch are called. Aside from that, I am comfortable approving the rest of the patch myself. As for who should review the TargetInstrInfo.h change, maybe Sanjay or Nadav can do that? Also, CODE_OWNERS.TXT lists Evan Cheng as the CodeGen owner, though I don't know how up-to-date that is.

Thank you a lot on reviewing this patch, David! Sanjay is now looking at this patch.

Thanks!
-Dave

include/llvm/Target/TargetInstrInfo.h
455

Done.

lib/Target/X86/X86InstrInfo.cpp
3920

Done.

congh added a comment.Mar 7 2016, 5:47 PM

To be clear, I was only suggesting that you get another reviewer for the change in include/llvm/Target/TargetInstrInfo.h. I would like someone to confirm that we can reasonably expect the MBB CFG information to be valid when AnalyzeBranch and InsertBranch are called. Aside from that, I am comfortable approving the rest of the patch myself. As for who should review the TargetInstrInfo.h change, maybe Sanjay or Nadav can do that? Also, CODE_OWNERS.TXT lists Evan Cheng as the CodeGen owner, though I don't know how up-to-date that is.

Evan's info is out-of-date. I don't know enough about this to approve, but I tried to understand the patch via the testcases:

  1. I updated test/CodeGen/X86/fp-une-cmp.ll so it would be easier to see the change. Cong, please update this patch after r262875.

This is done now.

  1. I don't understand the wiggle in test/CodeGen/X86/x86-analyze-branch-jne-jp.ll. From what I see, the label order is the only thing that changes. Is that the expected difference? If so, the CHECK lines are not adequate; the test already passes without this patch. I recommend putting that test into the existing test/CodeGen/X86/fp-une-cmp.ll and using utils/update_llc_test_checks.py so we're sure we're getting the change that you expect.

I tested this file on master and this patch, and the difference is not the order of two instructions but the labels after them. I have put this test to fp-une-cmp.ll and updated it with update_llc_test_checks.py. PTAL.

congh updated this revision to Diff 50014.Mar 7 2016, 5:48 PM
congh edited edge metadata.

Update the patch according to David and Sanjay's comments.

I'm happy to say that the successor/predecessor stuff should be correct. The last pass I'm aware of calling these is MachineBlockPlacement which has pretty clear reliance on the CFG being accurate.

If you want to double check, I actually think Quentin or Matze probably have the most context on the machine CFG at this point (more than I do honestly).

I think the only thing I'd really suggest here is to really tighten up the testing. The code and logic looks really fantastic.

test/CodeGen/X86/block-placement.ll
465–466

Here and elsewhere, when updating a test, it would be good to convert it to use CHECK-LABEL at least, and then in the specific test cases, actually write narrow checks. For cases where we have very microscopic functions testing single instruction sequences, using update_llc_test_checks.py is incredibly useful. You can also look at the style of tests in generates and generate comparably structured checks for more complex test cases.

468

Especially, the checks just on these %foo terms makes it hard to at a glance see what is being checked. Having a bit more syntax, or locating the checks inside the code itself, i think will amke it much more clear.

test/CodeGen/X86/fp-une-cmp.ll
79–107

These checks don't really make sense to me. Why are they above the function, but the function has a CHECK-LABEL adn seemingly similar checks?

spatel added a comment.Mar 8 2016, 7:45 AM

I'm happy to say that the successor/predecessor stuff should be correct. The last pass I'm aware of calling these is MachineBlockPlacement which has pretty clear reliance on the CFG being accurate.

If you want to double check, I actually think Quentin or Matze probably have the most context on the machine CFG at this point (more than I do honestly).

I think the only thing I'd really suggest here is to really tighten up the testing. The code and logic looks really fantastic.

I was working my way back up through the tests and was going to suggest similar. Given that the patch has 2.5 LGTMs, I have nothing more to add. :)

test/CodeGen/X86/fp-une-cmp.ll
79–107

Those checks are the old lines from before the update check script was run. They should be deleted.

test/CodeGen/X86/x86-analyze-branch-jne-jp.s
2–23

This file doesn't belong in the patch; it's a leftover from the earlier revision.

Thanks for the additional fixes. LGTM modulo Chandler's suggested test improvements.

-Dave

congh added a comment.Mar 10 2016, 1:36 PM

Thanks for the review, Chandler!

test/CodeGen/X86/block-placement.ll
465–466

OK. I have update this test according to the results from running update_llc_test_checks.py.

test/CodeGen/X86/fp-une-cmp.ll
79–107

Yes. I have deleted them.

congh updated this revision to Diff 50343.Mar 10 2016, 1:36 PM

Update the patch according to Chandler's comments.