This is an archive of the discontinued LLVM Phabricator instance.

[X86] Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
ClosedPublic

Authored by congh on Jul 21 2015, 11:27 AM.

Details

Reviewers

spatel
nadav
davidxl
Gerolf
dexonsmith

Commits

rG94710840fb2e: Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
rG551a57f79796: Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
rL264199: Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
rL258847: Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.

Summary

Currently, AnalyzeBranch() fails to analyze the branches generated for a non-equality comparison between floating-point values on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing the unconditional jump if there is a proper fall-through. However, in the case of a non-equality comparison between floating-point values, this can turn the branch "unanalyzable". Consider the following case:

jne .BB1
jp .BB1
jmp .BB2
.BB1:
...
.BB2:
...

AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed:

jne .BB1
jnp .BB2
.BB1:
...
.BB2:
...

However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal.
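The equivalence of the two jump sequences above can be checked by modeling them on the ZF and PF flags. This is only a minimal sketch: the names `Dest`, `originalSequence`, `reversedSequence`, and `sequencesAgree` are invented for illustration and do not exist in LLVM.

```cpp
#include <cassert>

// Hypothetical destinations for the example above.
enum class Dest { BB1, BB2 };

// Models: jne .BB1 ; jp .BB1 ; jmp .BB2
Dest originalSequence(bool zf, bool pf) {
  if (!zf) return Dest::BB1;  // jne is taken when ZF == 0
  if (pf)  return Dest::BB1;  // jp is taken when PF == 1
  return Dest::BB2;           // unconditional jmp
}

// Models: jne .BB1 ; jnp .BB2 ; fall through to .BB1
Dest reversedSequence(bool zf, bool pf) {
  if (!zf) return Dest::BB1;  // jne is taken when ZF == 0
  if (!pf) return Dest::BB2;  // jnp is taken when PF == 0
  return Dest::BB1;           // fall-through block
}

// Returns true when both sequences reach the same block for every
// combination of ZF and PF.
bool sequencesAgree() {
  for (int zf = 0; zf < 2; ++zf)
    for (int pf = 0; pf < 2; ++pf)
      if (originalSequence(zf != 0, pf != 0) != reversedSequence(zf != 0, pf != 0))
        return false;
  return true;
}
```

With this model, `assert(sequencesAgree());` holds, so the rewrite itself is sound; the problem described here is only that the result has two conditional jumps with different targets.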

Actually this optimization is also done in the block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there are no negated conditions defined for them.

In order to reverse them, this patch defines two new CondCodes, X86::COND_NEG_NE_OR_P and X86::COND_NEG_NP_OR_E. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization.

The test cases haven't been updated accordingly. If this design is OK I will do it later.

Event Timeline

congh updated this revision to Diff 30275.Jul 21 2015, 11:27 AM

congh retitled this revision from to Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed..

congh updated this object.

congh added reviewers: dexonsmith, davidxl.

congh added a subscriber: llvm-commits.

Handle the case that the false body is null when building instructions for COND_NEG_NP_OR_E and COND_NEG_NE_OR_P.

Corrected all failed test cases.

congh retitled this revision from Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. to [X86] Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed..Jul 22 2015, 1:10 PM

congh added a reviewer: Gerolf.

congh added a reviewer: nadav.Jul 23 2015, 11:22 AM

Update the patch by renaming COND_NEG_NE_OR_P/COND_NEG_NP_OR_E to COND_E_AND_NP/COND_P_AND_NE.

congh added a reviewer: spatel.Jul 24 2015, 10:27 AM

"Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch()." Does AnalyzeBranch() have more clients, or just block placement? If it has more, moving the code may impact generated code.

lib/Target/X86/X86InstrInfo.cpp
4056	It would be nice if you added a FIXME even though this is not part of your code. Any assumption about the IS patterns should be made explicit and checked with an assertion.
4061	Perhaps I'm only reiterating what Duncan said. What I'm confused by is that the previous patterns are symmetrical: for example, the first case is NP && E or E && NP, while the new cases are asymmetrical, like here NE && NP or P && E (as opposed to NE && NP or NP && NE, which is what I would expect from the handling of the previous pattern). At least there needs to be a good explanation (comment) for this.
4089	I would need a picture and examples to understand for which condition chains the TBB condition is relevant.

davidxl added inline comments.Jul 24 2015, 1:18 PM

lib/Target/X86/X86InstrInfo.cpp
4065	I might have missed other discussions (so that I completely missed what COND_P_AND_NE means), but should (NE || NP) be equivalent to !(E && P), which means the branch code should be the negation of COND_P_AND_E? Similarly, (P || E) should be the negation of NE_AND_NP?

Just a meta comment: It is good to avoid doing any transformations in an analysis function.

congh added inline comments.Jul 24 2015, 1:56 PM

lib/Target/X86/X86InstrInfo.cpp
4056	Thanks for your review! I have added an assertion checking that the two destinations are identical for X86::COND_NP_OR_E and X86::COND_NE_OR_P. For X86::COND_P_AND_NE and X86::COND_E_AND_NP, however, I am not sure we should assert that they have different destinations. This is because it is still OK even when they have the same destination. And do we still need a FIXME?
4061	This condition has two instructions with two different destinations, and the second destination is the true BB. Therefore if we have NE then NP, then the true body can only be reached with !NE && NP; that is E && NP. If we have P then E, the true body can only be reached with !P && E; that is NP && E. And then we got two equivalent conditions. I have added a comment explaining it.
4089	I have added a comment showing examples of X86::COND_P_AND_NE in which two branch destinations are different.
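The reasoning in the reply to line 4061 can be written out as a small model. This is only a sketch under the assumption that the first jump leaves for some other block and the second jump targets the true block; the function names are hypothetical and are not taken from the patch.

```cpp
#include <cassert>

// Form 1: jne Other ; jnp True
// The true block is reached only when the first jump is not taken (ZF == 1)
// and the second jump is taken (PF == 0), that is, E && NP.
bool reachesTrueViaJneJnp(bool zf, bool pf) {
  if (!zf) return false;  // jne taken: control leaves for the other block
  return !pf;             // jnp taken only when PF == 0
}

// Form 2: jp Other ; je True
// The true block is reached only when PF == 0 and then ZF == 1, that is, NP && E.
bool reachesTrueViaJpJe(bool zf, bool pf) {
  if (pf) return false;   // jp taken: control leaves for the other block
  return zf;              // je taken only when ZF == 1
}

// The compound condition both forms are meant to express.
bool condEAndNP(bool zf, bool pf) { return zf && !pf; }

// Returns true when both forms match E && NP for every ZF/PF combination.
bool formsMatch() {
  for (int zf = 0; zf < 2; ++zf)
    for (int pf = 0; pf < 2; ++pf) {
      bool expected = condEAndNP(zf != 0, pf != 0);
      if (reachesTrueViaJneJnp(zf != 0, pf != 0) != expected) return false;
      if (reachesTrueViaJpJe(zf != 0, pf != 0) != expected) return false;
    }
  return true;
}
```

Here `assert(formsMatch());` holds, which is the sense in which the two asymmetrical-looking patterns are equivalent.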

Update the patch according to Gerolf's comments.

Ping on this patch?

Ping?

In addition to the comments what about ReverseBranchCondition? Shouldn't the new opcodes be handled there, too?
Perhaps it would be best if you worked directly with the code owner and get his consent.

lib/Target/X86/X86InstrInfo.cpp
4056–4057	I guess my comment was a bit too far out of the box. What I had in mind is not related to your review. So let's table this.
4057	Now that the NewTBB check has been removed potentially the assertions below can fire.
4092	The problem I have with this review is partially historical. COND_NE_OR_P etc. doesn't make sense to me without explanation. OR does not seem to be used in a logical sense, since both branch conditions should be present. But your COND names also obfuscate the picture (perhaps just for me though) even more: previously the branch targets had to be identical, now they can be different. This is a new concept, and pressing it into the existing AnalyzeBranch routine makes the code harder to maintain. From this angle, it would be better to have functions that handle the new functionality.

In D11393#220180, @Gerolf wrote:

In addition to the comments what about ReverseBranchCondition? Shouldn't the new opcodes be handled there, too?

In ReverseBranchCondition(), GetOppositeBranchCondition() is called to get the reverse condition's opcode, and this function is updated in this patch to return the correct reverse condition for COND_NE_OR_P/COND_NP_OR_E/COND_E_AND_NP/COND_P_AND_NE.

Perhaps it would be best if you worked directly with the code owner and get his consent.

OK. I found Nadav Rotem is the code owner of X86 backend. I will add him as a reviewer.

lib/Target/X86/X86InstrInfo.cpp
4092	This is because on X86 the equality/non-equality comparison between floating points is translated into two instructions, and the conditions of those two instructions represent a logical OR instead of AND, as they jump to the same destination. Normally the negation of OR is AND, and that is why the reverse condition is named with AND. I agree that it is not straightforward to understand that it has two different branch targets, but that is the correct way to reverse it.
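The "negation of OR is AND" argument can be illustrated with a toy version of the opposite-condition mapping. This is a sketch only: the enum `Cond` and the functions `evaluate` and `opposite` are stand-ins written for this example, they treat the codes as purely logical conditions on ZF and PF, and they are not the actual X86::CondCode or X86::GetOppositeBranchCondition() code.

```cpp
#include <cassert>

// Toy stand-ins for the compound condition codes discussed in this thread.
enum class Cond { NE_OR_P, E_AND_NP, NP_OR_E, P_AND_NE };

// Evaluate a compound condition on the ZF and PF flags.
bool evaluate(Cond c, bool zf, bool pf) {
  switch (c) {
  case Cond::NE_OR_P:  return !zf || pf;
  case Cond::E_AND_NP: return zf && !pf;
  case Cond::NP_OR_E:  return !pf || zf;
  case Cond::P_AND_NE: return pf && !zf;
  }
  return false;
}

// Map each code to its negation (De Morgan: !(A || B) == !A && !B).
Cond opposite(Cond c) {
  switch (c) {
  case Cond::NE_OR_P:  return Cond::E_AND_NP;
  case Cond::E_AND_NP: return Cond::NE_OR_P;
  case Cond::NP_OR_E:  return Cond::P_AND_NE;
  case Cond::P_AND_NE: return Cond::NP_OR_E;
  }
  return c;
}

// Returns true when opposite(c) is the logical negation of c for all flags.
bool oppositesAreNegations() {
  const Cond all[] = {Cond::NE_OR_P, Cond::E_AND_NP, Cond::NP_OR_E, Cond::P_AND_NE};
  for (Cond c : all)
    for (int zf = 0; zf < 2; ++zf)
      for (int pf = 0; pf < 2; ++pf)
        if (evaluate(opposite(c), zf != 0, pf != 0) == evaluate(c, zf != 0, pf != 0))
          return false;
  return true;
}
```

In this model `assert(oppositesAreNegations());` holds, and every code has an opposite, which is the property the reply argues for.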

Ping?

We need to revisit this patch ...

In D11393#293393, @davidxl wrote:

We need to revisit this patch ...

Have you seen any example that needs this patch?

Restart discussion on this thread:

For the motivating example where two conditional branches have different targets,

jne .BB1
jnp .BB2
.BB1:
...
.BB2:
...

Is it possible to teach AnalyzeBranch to recognize the pattern -- with opcode COND_NE_OR_P ?

In D11393#323944, @davidxl wrote:

Restart discussion on this thread:

For the motivating example where two conditional branches have different targets,

jne .BB1
jnp .BB2
.BB1:
...
.BB2:
...

Is it possible to teach AnalyzeBranch to recognize the pattern -- with opcode COND_NE_OR_P ?

Yes, I think this is done in this patch. Or do I misunderstand what you mean?

Update the patch by bringing back the condition reversal optimization in AnalyzeBranch().

davidxl added inline comments.Jan 13 2016, 2:11 PM

lib/Target/X86/X86InstrInfo.cpp
4060–4086	There is no need to change anything between line 4028 and 4037 (to simplify the patch). Just add a combined assertion after the pattern recognition: assert((BranchCode != cond_np_or_e || BranchCode != cond_ne_or_p || NewTBB == TBB) && "Identical target BB expected"); Actually, since the previous early exit has been removed, the assert can fire, so the right thing to do is to add the TBB == NewTBB condition.
4088	Should condition NewTBB != TBB be added here too?
4105	It is confusing here. The comment says the condition to B2 is NP_AND_E, but the branch code is P_AND_NE .. Also the condition to B1 is NE_OR_P, so why not using COND_NE_OR_P?
4108	should NewTBB == TBB be added here?
4114	Why not unconditionally return true in else {} ?
lib/Target/X86/X86InstrInfo.h
64	Add a comment after the enum: COND_E_AND_NP, // negate of COND_NE_OR_P. 'AND' does not directly map to any branch patterns, so adding the comment helps in understanding the semantics.

Update the patch according to David's comments.

lib/Target/X86/X86InstrInfo.cpp
4060–4086	You are right. But in practice I think when there is COND_NP and COND_E, their target is always the same (that is why I added assertion in this patch). But to be safer, I have replaced the assertion with a check.
4088	They can be the same target. This means we don't have to check if they are the same or different targets.
4105	I found the comment is incorrect. It is for COND_E_AND_NP not COND_P_AND_NE. I have updated the comment.
4108	You mean NewTBB != TBB? See my comment above: they can be the same target.
4114	I think it is OK to unconditionally return true here.

davidxl added inline comments.Jan 13 2016, 3:52 PM

lib/Target/X86/X86InstrInfo.cpp
4080	For B1, the condition is NP_OR_E, so why not use COND_NP_OR_E as the branch code? Is the new code needed (after swapping TBB and FBB)?
4084	To make sure the pattern is fully checked, I think NewTBB != TBB is also needed.

congh added inline comments.Jan 13 2016, 4:07 PM

lib/Target/X86/X86InstrInfo.cpp
4080	This is actually why this patch is created: COND_NP_OR_E is equivalent to COND_P_AND_NE, but the latter has the second condition reversed based on the former. COND_NP_OR_E has a JNP and JE, while COND_P_AND_NE has a JNP and JNE, or JE and JP. COND_NP_OR_E always has two identical targets, but COND_P_AND_NE doesn't. So they are different patterns. We use different names so that in X86::GetOppositeBranchCondition() we can get the reverse condition code for each other.
4084	But there is nothing wrong when NewTBB == TBB. This pattern doesn't care if they are the same or different basic blocks.

davidxl added inline comments.Jan 14 2016, 11:38 AM

lib/Target/X86/X86InstrInfo.cpp
4080	I am not sure if we need to introduce the opposite branch code for E_OR_NP. Example:

JE B1
JP B2
B1 (fall through):
B2:

For this case, AnalyzeBranch can return success with the following tuple: {BranchCode = E_OR_NP, TBB = B1, FBB = B2}. Later, when insertBranch is called, there are two scenarios.

Same as above, where B1 is the fall-through. The inserted code will be the same as above.

B2 is the layout successor. In this second case, the generated code sequence should look like:

JNP B1
JE B1
B2 (fall through):
..
B1:

or

JE B1
JNP B1
B2:
..
B1:

Does it make sense?

congh added inline comments.Jan 14 2016, 12:32 PM

lib/Target/X86/X86InstrInfo.cpp
4080	In some places GetOppositeBranchCondition() is used to get the opposite branch code in order to generate an opposite branch. If we don't have an opposite branch code for COND_E_OR_NP, then how to reverse it?

davidxl added inline comments.Jan 14 2016, 12:44 PM

lib/Target/X86/X86InstrInfo.cpp
4080	For some reason GetOppositeBranchCondition was never called with COND_E_OR_NP before. Do you know why this path was not triggered?

congh added inline comments.Jan 14 2016, 1:17 PM

lib/Target/X86/X86InstrInfo.cpp
4080	There are two places that reverse branches.

AnalyzeBranch(): here GetOppositeBranchCondition() is called when there is an unconditional jump at the end of an MBB. In the case of a je+jnp branch, a je+jp branch will be generated based only on the jnp (GetOppositeBranchCondition() returns jnp for a jp), as at this moment the COND_E_OR_NP pattern hasn't been recognized. If there is no jmp at the end, GetOppositeBranchCondition() won't be called. A je+jp pattern is handled similarly. When there is no jmp following je+jnp, a COND_E_OR_NP pattern will be recognized.

Block placement: in this pass COND_E_OR_NP won't be handled with a check.

congh added inline comments.Jan 20 2016, 1:47 PM

lib/Target/X86/X86InstrInfo.cpp
4080	One reason we need new condition codes here is that we should make sure for every condition code there is an opposite condition code for it. This makes it easier to reverse branches without checking what the branch code is.

congh updated this revision to Diff 45444.Jan 20 2016, 1:59 PM

Detect COND_P_AND_NE and COND_E_AND_NP with the constraint that the two jumping targets are different.

Can we also add a test case that tests this:

jcc1 BB1
jncc2 BB2

BB1: // cold
...
BB2:
..

can be ordered
correctly into
jxxx ...
jxxx ...
BB2:
...
BB1:

lib/Target/X86/X86InstrInfo.cpp
4226	Fix comment. It is not 'implied' -- this is guaranteed by AnalyzeBranch
4228	Have a little wrapper 'getFallthroughBlock'
test/CodeGen/X86/block-placement.ll
466	Can you make the ordering check more strict here?
495	to make the test more robust, perhaps annotate this branch with weights { 1, 1000} where from 'exit' block it is not likely to branch into if.then.
668	comment that the right order is : entry -> bar -> exit -> foo
672	use check-next for more strict checking.

In D11393#332842, @davidxl wrote:

Can we also add a test case that tests this:

jcc1 BB1
jncc2 BB2

BB1: // cold
...
BB2:
..

can be ordered
correctly into
jxxx ...
jxxx ...
BB2:
...
BB1:

Can we write LLVM IR to express this test case?

test/CodeGen/X86/block-placement.ll

672

However, we cannot use CHECK-NEXT here. The output is:

unanalyzable_branch_to_best_succ:       # @unanalyzable_branch_to_best_succ
	.cfi_startproc
# BB#0:                                 # %entry
	subl	$12, %esp
.Ltmp39:
	.cfi_def_cfa_offset 16
	testb	$1, 16(%esp)
	je	.LBB16_1
.LBB16_2:                               # %bar
	calll	f
.LBB16_3:                               # %exit
	addl	$12, %esp
	retl
.LBB16_1:                               # %foo
	fldz
	fucomp	%st(0)
	fnstsw	%ax
	sahf
	jne	.LBB16_2
	jnp	.LBB16_3
	jmp	.LBB16_2

congh added inline comments.Jan 22 2016, 11:36 AM

lib/Target/X86/X86InstrInfo.cpp

4088

I found that we could not add NewTBB != TBB here. I found such a test case, in which the target of JNE and JNP and the fall-through block are the same block.

# Machine code for function func: Post SSA
Frame Objects:
  fi#-2: size=4, align=4, fixed, at location [SP+8]
  fi#-1: size=4, align=16, fixed, at location [SP+4]
  fi#0: size=4, align=4, at location [SP-4]
Constant Pool:
  cp#0: -1.000000e+00, align=8

BB#0: derived from LLVM BB %entry
	PUSH32r %EAX<undef>, %ESP<imp-def>, %ESP<imp-use>; flags: FrameSetup
	%XMM1<def> = CVTSS2SDrm %ESP, 1, %noreg, 8, %noreg; mem:LD4[FixedStack-1](align=16)
	%XMM0<def> = CVTSS2SDrm %ESP, 1, %noreg, 12, %noreg; mem:LD4[FixedStack-2]
	%XMM0<def,tied1> = MULSDrr %XMM0<kill,tied0>, %XMM1<kill>
	%XMM1<def> = FsFLD0SD
	UCOMISDrr %XMM0, %XMM1<kill>, %EFLAGS<imp-def>
	JNE_1 <BB#2>, %EFLAGS<imp-use>
	JNP_1 <BB#2>, %EFLAGS<imp-use>
    Successors according to CFG: BB#3(0x50000000 / 0x80000000 = 62.50%) BB#2(0x30000000 / 0x80000000 = 37.50%)

BB#2: derived from LLVM BB %bb1
    Live Ins: %XMM0
    Predecessors according to CFG: BB#0
	%XMM0<def,tied1> = ADDSDrm %XMM0<kill,tied0>, %noreg, 1, %noreg, <cp#0>, %noreg; mem:LD8[ConstantPool]
    Successors according to CFG: BB#3(?%)

BB#3: derived from LLVM BB %bb2
    Live Ins: %XMM0
    Predecessors according to CFG: BB#2 BB#0
	%XMM0<def> = CVTSD2SSrr %XMM0<kill>
	MOVSSmr %ESP, 1, %noreg, 0, %noreg, %XMM0<kill>; mem:ST4[FixedStack0]
	LD_F32m %ESP, 1, %noreg, 0, %noreg, %FPSW<imp-def,dead>; mem:LD4[FixedStack0]
	%EAX<def> = POP32r %ESP<imp-def>, %ESP<imp-use>; flags: FrameDestroy
	RETL

# End machine code for function func.

Update the patch according to David's comment.

For the newly suggested test case, I am thinking of code like

%cmp = fcmp une float %f, 0.000000e+00
br i1 %cmp, label %if.then, label %if.end

With the branch probably annotated with profile data to enable the reordering we want.

test/CodeGen/X86/block-placement.ll
672	Can you use the following to enforce the order.. ; CHECK-DAG: #entry ; CHECK-NOT: <something> ; CHECK-DAG: #bar

In D11393#333833, @davidxl wrote:

For the newly suggested test case, I am thinking of code like

%cmp = fcmp une float %f, 0.000000e+00
br i1 %cmp, label %if.then, label %if.end

With the branch probably annotated with profile data to enable the reordering we want.

OK, I have added such a test case.

test/CodeGen/X86/block-placement.ll
672	I don't get it. What is the problem of the current CHECKs? I think the order of those four blocks is already enforced.

Update the patch according to David's comments.

davidxl added inline comments.Jan 22 2016, 3:57 PM

lib/Target/X86/X86InstrInfo.cpp
4088	Should these two JCCs be eliminated?
test/CodeGen/X86/block-placement.ll
672	Ok -- you are right.
test/CodeGen/X86/fp-une-cmp.ll
55	The branch sequence does imply BB2 is reordered before BB1. Is it better to explicitly test the label order? Also, why is the jump sequence not the optimal one: jne bb2 jnp bb1 bb2: bb1:

Update a test case.

lib/Target/X86/X86InstrInfo.cpp

4088

This is just an intermediate state, when AnalyzeBranch() is called. I believe they will be eliminated later.

test/CodeGen/X86/fp-une-cmp.ll

OK, I will test the BB orders.

The generated assembly is shown below, which should be optimal.

func2:                                  # @func2
# BB#0:                                 # %entry
	pushl	%eax
	cvtss2sd	8(%esp), %xmm1
	cvtss2sd	12(%esp), %xmm0
	mulsd	%xmm1, %xmm0
	xorpd	%xmm1, %xmm1
	ucomisd	%xmm1, %xmm0
	jne	.LBB1_1
	jp	.LBB1_1
.LBB1_2:                                # %bb2
	cvtsd2ss	%xmm0, %xmm0
	movss	%xmm0, (%esp)
	flds	(%esp)
	popl	%eax
	retl
.LBB1_1:                                # %bb1
	addsd	.LCPI1_0, %xmm0
	jmp	.LBB1_2
.Lfunc_end1:
	.size	func2, .Lfunc_end1-func2

LGTM -- watch out for test failures.

This revision is now accepted and ready to land.Jan 22 2016, 4:17 PM

Closed by commit rL258847: Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. (authored by conghou). Jan 26 2016, 12:12 PM

This revision was automatically updated to reflect the committed changes.

In D11393#334183, @davidxl wrote:

LGTM -- watch out for test failures.

David, Gerolf suggested getting one of the x86 maintainers to look at this, and Nadav was CC-ed, but no one with deep knowledge of the x86 backend ever really commented on the patch. =/ Seems somewhat bad form to LGTM without getting one of the long standing maintainers to chime in here.

(And there does appear to be a problem with it, see the comment from James on the commit thread...)

DavidKreitzer added a subscriber: DavidKreitzer.Jan 27 2016, 2:11 PM

Ayal Zaks asked me to review this patch after he was asked by Nadav.

The stability failures caused by this patch are most likely caused by one or both of the issues in X86InstrInfo::AnalyzeBranchImpl. Cleaning up the unnecessary conditions would be a nice bonus. The inefficiency in fp-une-cmp.ll might be a separate issue.

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
4033 ↗	(On Diff #46028)	Don't you need to check TBB == NewTBB here also?
4071 ↗	(On Diff #46028)	I think you need to verify here that NewTBB == FBB before making this transformation. If FBB is nullptr, you'll need to compute the fallthrough block and check that it matches NewTBB.
llvm/trunk/lib/Target/X86/X86InstrInfo.h
57 ↗	(On Diff #46028)	The COND_NP_OR_E condition is rather pointless. Based on the comment, it is pretty clear that the intent was for this to be an artificial condition for FCMP_OEQ, but that should be COND_NP_AND_E. (AND not OR) In other words, I'd recommend deleting the existing COND_NP_OR_E condition and the COND_P_AND_NE one that you added. COND_NE_OR_P and its inverse COND_E_AND_NP are sufficient. [FWIW, COND_NP_OR_E would always evaluate to true assuming the CC's originated from an FP compare instruction like COMISS.]
llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll
51 ↗	(On Diff #46028)	If we invert the compound branch at the end of the entry block and place bb1 before bb2, we can eliminate the jmp at the end of bb1. Do you know why that isn't happening?
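The remark on X86InstrInfo.h line 57 that COND_NP_OR_E always evaluates to true after an FP compare can be checked by enumerating the four flag outcomes of such a compare. This is a sketch; the `Flags` struct and the constant and function names are assumptions made for this example, and the flag values follow the documented behavior of UCOMISS/COMISS (unordered sets ZF, PF, and CF).

```cpp
#include <cassert>

// ZF/PF/CF as set by an x86 floating-point compare such as UCOMISS.
struct Flags { bool zf, pf, cf; };

// The four possible outcomes of such a compare.
constexpr Flags kGreater{false, false, false};
constexpr Flags kLess{false, false, true};
constexpr Flags kEqual{true, false, false};
constexpr Flags kUnordered{true, true, true};

// The compound conditions discussed in the comment, as logical predicates.
bool npOrE(Flags f)  { return !f.pf || f.zf; }   // COND_NP_OR_E
bool npAndE(Flags f) { return !f.pf && f.zf; }   // ordered and equal (FCMP_OEQ)
bool neOrP(Flags f)  { return !f.zf || f.pf; }   // unordered or not equal (FCMP_UNE)

// Returns true when NP || E holds for every outcome of the compare.
bool npOrEIsAlwaysTrue() {
  return npOrE(kGreater) && npOrE(kLess) && npOrE(kEqual) && npOrE(kUnordered);
}
```

In this model `npOrEIsAlwaysTrue()` returns true, while `npAndE` is true only for `kEqual` and `neOrP` is its exact inverse, which matches the recommendation to keep only COND_NE_OR_P and COND_E_AND_NP.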

davidxl added inline comments.Jan 27 2016, 3:55 PM

llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll
51 ↗	(On Diff #46028)	The test case is explicitly added to test the ability for MachineBlockPlacement to break the topological order and reorder BB2 ahead of BB1 (BB1 is ahead of BB2 in source order) -- look at the profile annotation that BB1 is a really cold block -- this reordering is not possible without this patch. See also discussions in http://reviews.llvm.org/D11393?vs=on&id=45764&whitespace=ignore-most#toc

DavidKreitzer added inline comments.Jan 27 2016, 5:09 PM

llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll
51 ↗	(On Diff #46028)	Thanks for the explanation! I missed the profile annotation - Please ignore my comment on this one.

np.

Cong, can you add a comment for the new test case?

David

Update the patch according to David's comments.

Thanks for the review, David!

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
4033 ↗	(On Diff #46028)	Yes, you are right.
4071 ↗	(On Diff #46028)	When FBB is nullptr, we could not safely let the fallthrough block be FBB. This is because there is a use case of AnalyzeBranch in block-placement where MBBs are reordered before this function is called, in which case the fallthrough MBB may have nothing to do with the branch. But we can do this check if FBB is not null.
llvm/trunk/lib/Target/X86/X86InstrInfo.h
57 ↗	(On Diff #46028)	This makes sense. Done.
llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll
51 ↗	(On Diff #46028)	This is because bb2 is hotter than bb1 (note that there is a branch_weights profile data), and the BlockPlacement pass will place the hotter one as a fall-through.

Thanks for the fixes and for cleaning up the unnecessary conditions. I still think the FBB==nullptr case is broken in the COND_E_AND_NP detection code, but otherwise, this looks good.

Did you get reproducers for the failures that people were seeing with this patch? Do these fixes solve them?

Thanks,
Dave

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
4071 ↗	(On Diff #46028)	To be clear, I wasn't suggesting that you actually compute & set FBB here in the case when it was initially nullptr. Rather, I am saying that you cannot legally use COND_E_AND_NP here without proving that the target of the first branch is the same as the fall-through target. What is to prevent this code from analyzing the sequence:

JNE B1
JNP B2
... fallthrough to some mystery block B3 (FBB == nullptr) ...

and returning this?

BranchCond = COND_E_AND_NP
TBB = B2
FBB = nullptr

The branch to B1 is lost.
llvm/trunk/lib/Target/X86/X86InstrInfo.h
57 ↗	(On Diff #46028)	I would recommend combining these two comments. As written, they are a little misleading, because they suggest that COND_NE_OR_P is used to implement both FCMP_OEQ & FCMP_UNE while COND_E_AND_NP is used just to negate COND_NE_OR_P. In fact, COND_NE_OR_P is the natural implementation of FCMP_UNE while COND_E_AND_NP is the natural implementation of FCMP_OEQ. How about something like this?

// Artificial condition codes. These are used by AnalyzeBranch
// to indicate a block terminated with two conditional branches that together
// form a compound condition. They occur in code using FCMP_OEQ or FCMP_UNE,
// which can't be represented on x86 with a single condition. These
// are never used in MachineInstrs and are inverses of one another.
COND_NE_OR_P,
COND_E_AND_NP,
llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll
51 ↗	(On Diff #46028)	Understood about the branch weights, thanks, and thanks for adding the comment to make that clearer. It's worth noting that bb2 is placed ahead of bb1 even under minsize. That indicates to me that something may need to be tweaked in block placement, but I think that's beyond the scope of this change set.
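The "lost branch" concern on line 4071 can be made concrete by comparing where the real jump sequence goes with where a {COND_E_AND_NP, TBB, FBB} summary says it goes. This is a minimal sketch: blocks are represented by plain integers, the null FBB is modeled as the layout fall-through block, and all function names are hypothetical.

```cpp
#include <cassert>

// Destination of the real sequence: jne <jneTarget> ; jnp <jnpTarget> ; fall through.
int actualDest(bool zf, bool pf, int jneTarget, int jnpTarget, int fallthrough) {
  if (!zf) return jneTarget;  // jne taken when ZF == 0
  if (!pf) return jnpTarget;  // jnp taken when PF == 0
  return fallthrough;
}

// Destination implied by the summary {COND_E_AND_NP, tbb, fbb}:
// go to tbb when E && NP holds, otherwise go to fbb.
int summarizedDest(bool zf, bool pf, int tbb, int fbb) {
  return (zf && !pf) ? tbb : fbb;
}

// Returns true when the summary with TBB = jnpTarget and FBB = fallthrough
// describes the real sequence for every ZF/PF combination.
bool summaryIsSound(int jneTarget, int jnpTarget, int fallthrough) {
  for (int zf = 0; zf < 2; ++zf)
    for (int pf = 0; pf < 2; ++pf)
      if (actualDest(zf != 0, pf != 0, jneTarget, jnpTarget, fallthrough) !=
          summarizedDest(zf != 0, pf != 0, jnpTarget, fallthrough))
        return false;
  return true;
}
```

With B1, B2, B3 numbered 1, 2, 3, `summaryIsSound(1, 2, 3)` is false because the jump to B1 has no place in the summary, while `summaryIsSound(1, 2, 1)` is true, which is why the first branch target has to be proven equal to the fall-through block.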

In D11393#338709, @DavidKreitzer wrote:

Thanks for the fixes and for cleaning up the unnecessary conditions. I still think the FBB==nullptr case is broken in the COND_E_AND_NP detection code, but otherwise, this looks good.

Did you get reproducers for the failures that people were seeing with this patch? Do these fixes solve them?

Yes. The issue is that I get the incorrect FBB in InsertBranch() when the passed-in FBB is null. What I did before was to use the layout fallthrough BB as the FBB, but this may be incorrect because at that moment the user may have already modified the layout, making the fallthrough BB not the FBB. Instead, I now search the successor list of the MBB to find the FBB (the successor that is not the TBB or an EHPad).

Thanks,
Dave

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
4071 ↗	(On Diff #46028)	You are right! However, even if we have JNE B1 JNP B2 (fallthrough to B1), we are not 100% certain that B1 will be the FBB. The user of AnalyzeBranch() can do anything before calling it, making the fallthrough block not the actual FBB. This happens in the block-placement pass. A possible solution is iterating over the successor list of the MBB and finding the correct FBB, which is done in the updated patch. In the function InsertBranch() I did the same thing: when the passed-in FBB is null, I try to find it by checking the successor list of the MBB. But there is another potential issue: the TBB/FBB passed in by the user may not be successors of the MBB. This happens in the tail-duplication pass. So I think it is very easy to misuse InsertBranch().

Update the patch according to David's comment.

congh added inline comments.Jan 29 2016, 4:39 PM

llvm/trunk/lib/Target/X86/X86InstrInfo.h
57 ↗	(On Diff #46028)	Your suggested comment looks great! I have updated the patch accordingly. Thanks!

Thanks for the fixes! Just a few more minor issues.

Regarding my suggestion to modify the comments for InsertBranch/AnalyzeBranch: I would recommend that after doing so you run the change by the current CodeGen code owner for approval, since you are adding a dependence that doesn't currently exist, namely that these routines expect the CFG links in MachineBasicBlock to be up-to-date.

lib/CodeGen/TailDuplication.cpp
759	So, I understand why you needed to make this change. But it suggests that you might want to update the comment for InsertBranch in TargetInstrInfo.h to say that the CFG information must be valid before calling the routine. Similarly for AnalyzeBranch.
lib/Target/X86/X86InstrInfo.cpp
3927	Did you intentionally leave this debugging code here?
3932	This assertion seems dangerous to me given that you are calling this routine from within AnalyzeBranch. Theoretically, you could be in the middle of analyzing an unsupported block like this:

JA B1
JNP B2
JNE B3
.... fallthrough to B2 ...

That would trigger a call to getFallThroughMBB with TBB == B3, and this assertion would fail when it sees the two other successors B1 & B2. I don't know whether it is even possible to get IR like this, but it seems like this code ought to tolerate it. I'd recommend simply returning nullptr if you find multiple non-EH, non-TBB successors.
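The successor search described in this thread, including the suggested nullptr result for the ambiguous case, might look roughly like the following. This is a sketch on a toy `Block` type, not the patch's getFallThroughMBB or LLVM's MachineBasicBlock; the struct, its fields, and `findFallThrough` are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a machine basic block: an EH-pad flag and a successor list.
struct Block {
  bool isEHPad = false;
  std::vector<Block *> successors;
};

// Return the single successor of mbb that is neither tbb nor an EH pad.
// Return nullptr if there is no such successor or if more than one distinct
// candidate exists, so that callers can give up instead of asserting.
Block *findFallThrough(const Block &mbb, const Block *tbb) {
  Block *candidate = nullptr;
  for (Block *succ : mbb.successors) {
    if (succ == tbb || succ->isEHPad)
      continue;
    if (candidate && candidate != succ)
      return nullptr;  // multiple non-EH, non-TBB successors: ambiguous
    candidate = succ;
  }
  return candidate;
}
```

For the JA B1 / JNP B2 / JNE B3 example above, a block with successors B1, B2, and B3 and TBB == B3 yields nullptr rather than a failed assertion.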

In D11393#341155, @DavidKreitzer wrote:

Thanks for the fixes! Just a few more minor issues.

Regarding my suggestion to modify the comments for InsertBranch/AnalyzeBranch: I would recommend that after doing so you run the change by the current CodeGen code owner for approval, since you are adding a dependence that doesn't currently exist, namely that these routines expect the CFG links in MachineBasicBlock to be up-to-date.

I am really sorry for replying to your comments so late (I was on vacation in February). This sounds good to me. Whom do you recommend as the CodeGen code owner to review this patch? Thanks!

lib/CodeGen/TailDuplication.cpp
759	OK, I have added the comments suggested by you to those routines.
lib/Target/X86/X86InstrInfo.cpp
3927	No.. my bad. I have removed them.
3932	OK. This makes sense.

Update the patch according to David's comments.

Thanks for following up on this. I just have a couple minor additional commenting suggestions.

To be clear, I was only suggesting that you get another reviewer for the change in include/llvm/Target/TargetInstrInfo.h. I would like someone to confirm that we can reasonably expect the MBB CFG information to be valid when AnalyzeBranch and InsertBranch are called. Aside from that, I am comfortable approving the rest of the patch myself. As for who should review the TargetInstrInfo.h change, maybe Sanjay or Nadav can do that? Also, CODE_OWNERS.TXT lists Evan Cheng as the CodeGen owner, though I don't know how up-to-date that is.

Thanks!
-Dave

include/llvm/Target/TargetInstrInfo.h
455	Both here and at 526, I would recommend saying explicitly, "The CFG information in MBB.Predecessors and MBB.Successors must be valid before calling this function."
lib/Target/X86/X86InstrInfo.cpp
3920	Maybe add another sentence to this comment: "Return nullptr if the fallthrough MBB cannot be identified."

In D11393#369253, @DavidKreitzer wrote:

To be clear, I was only suggesting that you get another reviewer for the change in include/llvm/Target/TargetInstrInfo.h. I would like someone to confirm that we can reasonably expect the MBB CFG information to be valid when AnalyzeBranch and InsertBranch are called. Aside from that, I am comfortable approving the rest of the patch myself. As for who should review the TargetInstrInfo.h change, maybe Sanjay or Nadav can do that? Also, CODE_OWNERS.TXT lists Evan Cheng as the CodeGen owner, though I don't know how up-to-date that is.

Evan's info is out-of-date. I don't know enough about this to approve, but I tried to understand the patch via the testcases:

I updated test/CodeGen/X86/fp-une-cmp.ll so it would be easier to see the change. Cong, please update this patch after r262875.
I don't understand the wiggle in test/CodeGen/X86/x86-analyze-branch-jne-jp.ll. From what I see, the label order is the only thing that changes. Is that the expected difference? If so, the CHECK lines are not adequate; the test already passes without this patch. I recommend putting that test into the existing test/CodeGen/X86/fp-une-cmp.ll and using utils/update_llc_test_checks.py so we're sure we're getting the change that you expect.

In D11393#369253, @DavidKreitzer wrote:

Thanks for following up on this. I just have a couple minor additional commenting suggestions.

To be clear, I was only suggesting that you get another reviewer for the change in include/llvm/Target/TargetInstrInfo.h. I would like someone to confirm that we can reasonably expect the MBB CFG information to be valid when AnalyzeBranch and InsertBranch are called. Aside from that, I am comfortable approving the rest of the patch myself. As for who should review the TargetInstrInfo.h change, maybe Sanjay or Nadav can do that? Also, CODE_OWNERS.TXT lists Evan Cheng as the CodeGen owner, though I don't know how up-to-date that is.

Thanks a lot for reviewing this patch, David! Sanjay is now looking at this patch.

Thanks!
-Dave

include/llvm/Target/TargetInstrInfo.h
455	Done.
lib/Target/X86/X86InstrInfo.cpp
3920	Done.

In D11393#369350, @spatel wrote:

In D11393#369253, @DavidKreitzer wrote:

To be clear, I was only suggesting that you get another reviewer for the change in include/llvm/Target/TargetInstrInfo.h. I would like someone to confirm that we can reasonably expect the MBB CFG information to be valid when AnalyzeBranch and InsertBranch are called. Aside from that, I am comfortable approving the rest of the patch myself. As for who should review the TargetInstrInfo.h change, maybe Sanjay or Nadav can do that? Also, CODE_OWNERS.TXT lists Evan Cheng as the CodeGen owner, though I don't know how up-to-date that is.

Evan's info is out-of-date. I don't know enough about this to approve, but I tried to understand the patch via the testcases:

I updated test/CodeGen/X86/fp-une-cmp.ll so it would be easier to see the change. Cong, please update this patch after r262875.

This is done now.

I don't understand the wiggle in test/CodeGen/X86/x86-analyze-branch-jne-jp.ll. From what I see, the label order is the only thing that changes. Is that the expected difference? If so, the CHECK lines are not adequate; the test already passes without this patch. I recommend putting that test into the existing test/CodeGen/X86/fp-une-cmp.ll and using utils/update_llc_test_checks.py so we're sure we're getting the change that you expect.

I tested this file on master and with this patch, and the difference is not the order of the two instructions but the labels after them. I have moved this test into fp-une-cmp.ll and regenerated its checks with update_llc_test_checks.py. PTAL.

Update the patch according to David and Sanjay's comments.

I'm happy to say that the successor/predecessor stuff should be correct. The last pass I'm aware of calling these is MachineBlockPlacement which has pretty clear reliance on the CFG being accurate.

If you want to double check, I actually think Quentin or Matze probably have the most context on the machine CFG at this point (more than I do honestly).

I think the only thing I'd really suggest here is to really tighten up the testing. The code and logic looks really fantastic.

test/CodeGen/X86/block-placement.ll
465–466	Here and elsewhere, when updating a test, it would be good to convert it to use CHECK-LABEL at least, and then, in the specific test cases, actually write narrow checks. For cases where we have very microscopic functions testing single instruction sequences, using update_llc_test_checks.py is incredibly useful. You can also look at the style of the tests it generates and write comparably structured checks for more complex test cases.
468	Especially, the checks on just these %foo terms make it hard to see at a glance what is being checked. Having a bit more syntax, or locating the checks inside the code itself, I think will make it much clearer.
test/CodeGen/X86/fp-une-cmp.ll
79–107	These checks don't really make sense to me. Why are they above the function, when the function has a CHECK-LABEL and seemingly similar checks?

In D11393#369670, @chandlerc wrote:

I'm happy to say that the successor/predecessor stuff should be correct. The last pass I'm aware of calling these is MachineBlockPlacement which has pretty clear reliance on the CFG being accurate.

If you want to double check, I actually think Quentin or Matze probably have the most context on the machine CFG at this point (more than I do honestly).

I think the only thing I'd really suggest here is to really tighten up the testing. The code and logic looks really fantastic.

I was working my way back up through the tests and was going to suggest similar. Given that the patch has 2.5 LGTMs, I have nothing more to add. :)

test/CodeGen/X86/fp-une-cmp.ll
79–107	Those checks are the old lines from before the update check script was run. They should be deleted.
test/CodeGen/X86/x86-analyze-branch-jne-jp.s
2–23	This file doesn't belong in the patch; it's a leftover from the earlier revision.

Thanks for the additional fixes. LGTM modulo Chandler's suggested test improvements.

-Dave

Thanks for the review, Chandler!

test/CodeGen/X86/block-placement.ll
465–466	OK. I have updated this test according to the results of running update_llc_test_checks.py.
test/CodeGen/X86/fp-une-cmp.ll
79–107	Yes. I have deleted them.

Update the patch according to Chandler's comments.

Revision Contents

Path	Size
include/llvm/Target/TargetInstrInfo.h	5 lines
lib/CodeGen/TailDuplication.cpp	6 lines
lib/Target/X86/X86InstrInfo.h	96 lines
lib/Target/X86/X86InstrInfo.cpp	93 lines
test/CodeGen/X86/block-placement.ll	38 lines
test/CodeGen/X86/fast-isel-cmp-branch2.ll	5 lines
test/CodeGen/X86/fast-isel-cmp-branch3.ll	5 lines
test/CodeGen/X86/fp-une-cmp.ll	38 lines
test/CodeGen/X86/x86-analyze-branch-jne-jp.s	22 lines

Diff 50343

include/llvm/Target/TargetInstrInfo.h

[...]	public:
/// methods to create new branches.		/// methods to create new branches.
///		///
/// Note that RemoveBranch and InsertBranch must be implemented to support		/// Note that RemoveBranch and InsertBranch must be implemented to support
/// cases where this method returns success.		/// cases where this method returns success.
///		///
/// If AllowModify is true, then this routine is allowed to modify the basic		/// If AllowModify is true, then this routine is allowed to modify the basic
/// block (e.g. delete instructions after the unconditional branch).		/// block (e.g. delete instructions after the unconditional branch).
///		///
		/// The CFG information in MBB.Predecessors and MBB.Successors must be valid
		DavidKreitzer: Both here and at 526, I would recommend saying explicitly, "The CFG information in MBB.Predecessors and MBB.Successors must be valid before calling this function."
		congh: Done.
		/// before calling this function.
virtual bool AnalyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,		virtual bool AnalyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,		MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,		SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify = false) const {		bool AllowModify = false) const {
return true;		return true;
}		}

/// Represents a predicate at the MachineFunction level. The control flow a		/// Represents a predicate at the MachineFunction level. The control flow a
[...]	public:
/// returned by AnalyzeBranch. This is only invoked in cases where		/// returned by AnalyzeBranch. This is only invoked in cases where
/// AnalyzeBranch returns success. It returns the number of instructions		/// AnalyzeBranch returns success. It returns the number of instructions
/// inserted.		/// inserted.
///		///
/// It is also invoked by tail merging to add unconditional branches in		/// It is also invoked by tail merging to add unconditional branches in
/// cases where AnalyzeBranch doesn't apply because there was no original		/// cases where AnalyzeBranch doesn't apply because there was no original
/// branch to analyze. At least this much must be implemented, else tail		/// branch to analyze. At least this much must be implemented, else tail
/// merging needs to be disabled.		/// merging needs to be disabled.
		///
		/// The CFG information in MBB.Predecessors and MBB.Successors must be valid
		/// before calling this function.
virtual unsigned InsertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,		virtual unsigned InsertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
MachineBasicBlock *FBB,		MachineBasicBlock *FBB,
ArrayRef<MachineOperand> Cond,		ArrayRef<MachineOperand> Cond,
DebugLoc DL) const {		DebugLoc DL) const {
llvm_unreachable("Target didn't implement TargetInstrInfo::InsertBranch!");		llvm_unreachable("Target didn't implement TargetInstrInfo::InsertBranch!");
}		}

/// Delete the instruction OldInst and everything after it, replacing it with		/// Delete the instruction OldInst and everything after it, replacing it with
[...]

lib/CodeGen/TailDuplication.cpp

[...]	for (SmallSetVector<MachineBasicBlock *, 8>::iterator PI = Preds.begin(),
// Avoid adding fall through branches.		// Avoid adding fall through branches.
if (PredFBB == NextBB)		if (PredFBB == NextBB)
PredFBB = nullptr;		PredFBB = nullptr;
if (PredTBB == NextBB && PredFBB == nullptr)		if (PredTBB == NextBB && PredFBB == nullptr)
PredTBB = nullptr;		PredTBB = nullptr;

TII->RemoveBranch(*PredBB);		TII->RemoveBranch(*PredBB);

if (PredTBB)
TII->InsertBranch(*PredBB, PredTBB, PredFBB, PredCond, DebugLoc());

if (!PredBB->isSuccessor(NewTarget))		if (!PredBB->isSuccessor(NewTarget))
PredBB->replaceSuccessor(TailBB, NewTarget);		PredBB->replaceSuccessor(TailBB, NewTarget);
else {		else {
PredBB->removeSuccessor(TailBB, true);		PredBB->removeSuccessor(TailBB, true);
assert(PredBB->succ_size() <= 1);		assert(PredBB->succ_size() <= 1);
}		}

		if (PredTBB)
		DavidKreitzer: So, I understand why you needed to make this change. But it suggests that you might want to update the comment for InsertBranch in TargetInstrInfo.h to say that the CFG information must be valid before calling the routine. Similarly for AnalyzeBranch.
		congh: OK, I have added the comments you suggested to those routines.
		TII->InsertBranch(*PredBB, PredTBB, PredFBB, PredCond, DebugLoc());

TDBBs.push_back(PredBB);		TDBBs.push_back(PredBB);
}		}
return Changed;		return Changed;
}		}

/// If it is profitable, duplicate TailBB's contents in each		/// If it is profitable, duplicate TailBB's contents in each
/// of its predecessors.		/// of its predecessors.
bool		bool
[...]

lib/Target/X86/X86InstrInfo.h

	[...]

	namespace llvm {			namespace llvm {
	class X86RegisterInfo;			class X86RegisterInfo;
	class X86Subtarget;			class X86Subtarget;

	namespace X86 {			namespace X86 {
	// X86 specific condition code. These correspond to X86_*_COND in			// X86 specific condition code. These correspond to X86_*_COND in
	// X86InstrInfo.td. They must be kept in synch.			// X86InstrInfo.td. They must be kept in synch.
	enum CondCode {			enum CondCode {
	COND_A = 0,			COND_A = 0,
	COND_AE = 1,			COND_AE = 1,
	COND_B = 2,			COND_B = 2,
	COND_BE = 3,			COND_BE = 3,
	COND_E = 4,			COND_E = 4,
	COND_G = 5,			COND_G = 5,
	COND_GE = 6,			COND_GE = 6,
	COND_L = 7,			COND_L = 7,
	COND_LE = 8,			COND_LE = 8,
	COND_NE = 9,			COND_NE = 9,
	COND_NO = 10,			COND_NO = 10,
	COND_NP = 11,			COND_NP = 11,
	COND_NS = 12,			COND_NS = 12,
	COND_O = 13,			COND_O = 13,
	COND_P = 14,			COND_P = 14,
	COND_S = 15,			COND_S = 15,
	LAST_VALID_COND = COND_S,			LAST_VALID_COND = COND_S,

	// Artificial condition codes. These are used by AnalyzeBranch			// Artificial condition codes. These are used by AnalyzeBranch
	// to indicate a block terminated with two conditional branches to			// to indicate a block terminated with two conditional branches that together
	// the same location. This occurs in code using FCMP_OEQ or FCMP_UNE,			// form a compound condition. They occur in code using FCMP_OEQ or FCMP_UNE,
	// which can't be represented on x86 with a single condition. These			// which can't be represented on x86 with a single condition. These
	// are never used in MachineInstrs.			// are never used in MachineInstrs and are inverses of one another.
	COND_NE_OR_P,			COND_NE_OR_P,
	COND_NP_OR_E,			COND_E_AND_NP,

	COND_INVALID			COND_INVALID
	};			};

	// Turn condition code into conditional branch opcode.			// Turn condition code into conditional branch opcode.
	unsigned GetCondBranchFromCond(CondCode CC);			unsigned GetCondBranchFromCond(CondCode CC);

				davidxl: Add a comment after the enumerator: "COND_E_AND_NP, // negate of COND_NE_OR_P". 'AND' does not directly map to any branch pattern, so the comment helps in understanding the semantics.
	/// \brief Return a set opcode for the given condition and whether it has			/// \brief Return a set opcode for the given condition and whether it has
	/// a memory operand.			/// a memory operand.
	unsigned getSETFromCond(CondCode CC, bool HasMemoryOperand = false);			unsigned getSETFromCond(CondCode CC, bool HasMemoryOperand = false);

	/// \brief Return a cmov opcode for the given condition, register size in			/// \brief Return a cmov opcode for the given condition, register size in
	/// bytes, and operand type.			/// bytes, and operand type.
	unsigned getCMovFromCond(CondCode CC, unsigned RegBytes,			unsigned getCMovFromCond(CondCode CC, unsigned RegBytes,
	bool HasMemoryOperand = false);			bool HasMemoryOperand = false);

	// Turn CMov opcode into condition code.			// Turn CMov opcode into condition code.
	CondCode getCondFromCMovOpc(unsigned Opc);			CondCode getCondFromCMovOpc(unsigned Opc);

	/// GetOppositeBranchCondition - Return the inverse of the specified cond,			/// GetOppositeBranchCondition - Return the inverse of the specified cond,
	/// e.g. turning COND_E to COND_NE.			/// e.g. turning COND_E to COND_NE.
	CondCode GetOppositeBranchCondition(CondCode CC);			CondCode GetOppositeBranchCondition(CondCode CC);
	} // end namespace X86;			} // end namespace X86;


	/// isGlobalStubReference - Return true if the specified TargetFlag operand is			/// isGlobalStubReference - Return true if the specified TargetFlag operand is
	/// a reference to a stub for a global, not the global itself.			/// a reference to a stub for a global, not the global itself.
	inline static bool isGlobalStubReference(unsigned char TargetFlag) {			inline static bool isGlobalStubReference(unsigned char TargetFlag) {
	switch (TargetFlag) {			switch (TargetFlag) {
	case X86II::MO_DLLIMPORT: // dllimport stub.			case X86II::MO_DLLIMPORT: // dllimport stub.
	[...]

lib/Target/X86/X86InstrInfo.cpp


[...]	X86::CondCode X86::GetOppositeBranchCondition(X86::CondCode CC) {
case X86::COND_A: return X86::COND_BE;		case X86::COND_A: return X86::COND_BE;
case X86::COND_AE: return X86::COND_B;		case X86::COND_AE: return X86::COND_B;
case X86::COND_S: return X86::COND_NS;		case X86::COND_S: return X86::COND_NS;
case X86::COND_NS: return X86::COND_S;		case X86::COND_NS: return X86::COND_S;
case X86::COND_P: return X86::COND_NP;		case X86::COND_P: return X86::COND_NP;
case X86::COND_NP: return X86::COND_P;		case X86::COND_NP: return X86::COND_P;
case X86::COND_O: return X86::COND_NO;		case X86::COND_O: return X86::COND_NO;
case X86::COND_NO: return X86::COND_O;		case X86::COND_NO: return X86::COND_O;
		case X86::COND_NE_OR_P: return X86::COND_E_AND_NP;
		case X86::COND_E_AND_NP: return X86::COND_NE_OR_P;
}		}
}		}

/// Assuming the flags are set by MI(a,b), return the condition code if we		/// Assuming the flags are set by MI(a,b), return the condition code if we
/// modify the instructions such that flags are set by MI(b,a).		/// modify the instructions such that flags are set by MI(b,a).
static X86::CondCode getSwappedCondition(X86::CondCode CC) {		static X86::CondCode getSwappedCondition(X86::CondCode CC) {
switch (CC) {		switch (CC) {
default: return X86::COND_INVALID;		default: return X86::COND_INVALID;
[...]	bool X86InstrInfo::isUnpredicatedTerminator(const MachineInstr &MI) const {
// Conditional branch is a special case.		// Conditional branch is a special case.
if (MI.isBranch() && !MI.isBarrier())		if (MI.isBranch() && !MI.isBarrier())
return true;		return true;
if (!MI.isPredicable())		if (!MI.isPredicable())
return true;		return true;
return !isPredicated(MI);		return !isPredicated(MI);
}		}

		// Given a MBB and its TBB, find the FBB which was a fallthrough MBB (it may not
		// be a fallthrough MBB now due to layout changes). Return nullptr if the
		DavidKreitzer: Maybe add another sentence to this comment: "Return nullptr if the fallthrough MBB cannot be identified."
		congh: Done.
		// fallthrough MBB cannot be identified.
		static MachineBasicBlock *getFallThroughMBB(MachineBasicBlock *MBB,
		MachineBasicBlock *TBB) {
		MachineBasicBlock *FallthroughBB = nullptr;
		for (auto SI = MBB->succ_begin(), SE = MBB->succ_end(); SI != SE; ++SI) {
		if ((*SI)->isEHPad() || *SI == TBB)
		continue;
		DavidKreitzer: Did you intentionally leave this debugging code here?
		congh: No, my bad. I have removed it.
		// Return a nullptr if we found more than one fallthrough successor.
		if (FallthroughBB)
		return nullptr;
		FallthroughBB = *SI;
		}
		DavidKreitzer: This assertion seems dangerous to me given that you are calling this routine from within AnalyzeBranch. Theoretically, you could be in the middle of analyzing an unsupported block like this: "JA B1; JNP B2; JNE B3; ... fallthrough to B2 ...". That would trigger a call to getFallThroughMBB with TBB == B3, and this assertion would fail when it sees the two other successors B1 & B2. I don't know whether it is even possible to get IR like this, but it seems like this code ought to tolerate it. I'd recommend simply returning nullptr if you find multiple non-EH, non-TBB successors.
		congh: OK. This makes sense.
		return FallthroughBB;
		}
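The behavior agreed on in the comments above (return nullptr rather than assert when the fall-through successor is ambiguous) can be sketched outside LLVM as follows. This is only an illustrative model: `Block`, `findFallThrough`, and `demoAmbiguousSuccessors` are hypothetical names, not LLVM APIs, and the real routine operates on MachineBasicBlock successors.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a machine basic block: only the
// EH-pad flag and the successor list matter for this sketch.
struct Block {
  bool IsEHPad = false;
  std::vector<Block *> Succs;
};

// Returns the unique successor of MBB that is neither an EH pad nor TBB.
// Returns nullptr if there is no such successor or more than one.
inline Block *findFallThrough(const Block &MBB, const Block *TBB) {
  Block *Fallthrough = nullptr;
  for (Block *Succ : MBB.Succs) {
    if (Succ->IsEHPad || Succ == TBB)
      continue;
    if (Fallthrough)
      return nullptr; // Second candidate found: ambiguous, so give up.
    Fallthrough = Succ;
  }
  return Fallthrough;
}

// Models the "JA B1; JNP B2; JNE B3" block from the review comment: with
// TBB == B3 there are two other successors, so the lookup must not succeed.
inline void demoAmbiguousSuccessors() {
  Block B1, B2, B3;
  Block MBB;
  MBB.Succs = {&B1, &B2, &B3};
  assert(findFallThrough(MBB, &B3) == nullptr);
}
```

Returning nullptr lets the caller treat the block as unanalyzable, which is the conservative outcome the reviewers asked for.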

bool X86InstrInfo::AnalyzeBranchImpl(		bool X86InstrInfo::AnalyzeBranchImpl(
MachineBasicBlock &MBB, MachineBasicBlock *&TBB, MachineBasicBlock *&FBB,		MachineBasicBlock &MBB, MachineBasicBlock *&TBB, MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,		SmallVectorImpl<MachineOperand> &Cond,
SmallVectorImpl<MachineInstr *> &CondBranches, bool AllowModify) const {		SmallVectorImpl<MachineInstr *> &CondBranches, bool AllowModify) const {

// Start from the bottom of the block and work up, examining the		// Start from the bottom of the block and work up, examining the
// terminator instructions.		// terminator instructions.
MachineBasicBlock::iterator I = MBB.end();		MachineBasicBlock::iterator I = MBB.end();
[...]	while (I != MBB.begin()) {
}		}

// Handle subsequent conditional branches. Only handle the case where all		// Handle subsequent conditional branches. Only handle the case where all
// conditional branches branch to the same destination and their condition		// conditional branches branch to the same destination and their condition
// opcodes fit one of the special multi-branch idioms.		// opcodes fit one of the special multi-branch idioms.
assert(Cond.size() == 1);		assert(Cond.size() == 1);
assert(TBB);		assert(TBB);

// Only handle the case where all conditional branches branch to the same
// destination.
if (TBB != I->getOperand(0).getMBB())
return true;

// If the conditions are the same, we can leave them alone.		// If the conditions are the same, we can leave them alone.
X86::CondCode OldBranchCode = (X86::CondCode)Cond[0].getImm();		X86::CondCode OldBranchCode = (X86::CondCode)Cond[0].getImm();
if (OldBranchCode == BranchCode)		auto NewTBB = I->getOperand(0).getMBB();
		if (OldBranchCode == BranchCode && TBB == NewTBB)
continue;		continue;

// If they differ, see if they fit one of the known patterns. Theoretically,		// If they differ, see if they fit one of the known patterns. Theoretically,
// we could handle more patterns here, but we shouldn't expect to see them		// we could handle more patterns here, but we shouldn't expect to see them
// if instruction selection has done a reasonable job.		// if instruction selection has done a reasonable job.
		Gerolf: It would be nice if you added a FIXME even though this is not part of your code. Any assumption about the IS patterns should be made explicit and checked with an assertion.
		congh: Thanks for your review! I have added assertions checking that the two destinations are identical for X86::COND_NP_OR_E and X86::COND_NE_OR_P. For X86::COND_P_AND_NE and X86::COND_E_AND_NP, however, I am not sure we should assert that they have different destinations, because it is still OK when they have the same destination. Do we still need a FIXME?
if ((OldBranchCode == X86::COND_NP &&		if (TBB == NewTBB &&
		Gerolf: I guess my comment was a bit too out of the box. What I had in mind is not related to your review. So let's table this.
		Gerolf: Now that the NewTBB check has been removed, the assertions below can potentially fire.
BranchCode == X86::COND_E) ||		((OldBranchCode == X86::COND_P && BranchCode == X86::COND_NE) ||
(OldBranchCode == X86::COND_E &&		(OldBranchCode == X86::COND_NE && BranchCode == X86::COND_P))) {
BranchCode == X86::COND_NP))
BranchCode = X86::COND_NP_OR_E;
else if ((OldBranchCode == X86::COND_P &&
BranchCode == X86::COND_NE) ||
(OldBranchCode == X86::COND_NE &&
BranchCode == X86::COND_P))
BranchCode = X86::COND_NE_OR_P;		BranchCode = X86::COND_NE_OR_P;
else		} else if ((OldBranchCode == X86::COND_NP && BranchCode == X86::COND_NE) ||
		Gerolf: Perhaps I'm only reiterating what Duncan said. What confuses me is that the previous patterns are symmetrical: for example, the first case is NP && E or E && NP, while the new cases are asymmetrical, like here NE && NP or P && E (as opposed to NE && NP or NP && NE, which is what I would expect from the handling of the previous pattern). At the least there needs to be a good explanation (comment) for this.
		congh: This condition has two instructions with two different destinations, and the second destination is the true BB. Therefore, if we have NE then NP, the true body can only be reached with !NE && NP, that is, E && NP. If we have P then E, the true body can only be reached with !P && E, that is, NP && E. So we get two equivalent conditions. I have added a comment explaining this.
		(OldBranchCode == X86::COND_E && BranchCode == X86::COND_P)) {
		if (NewTBB != (FBB ? FBB : getFallThroughMBB(&MBB, TBB)))
		return true;

		davidxl: I might have missed other discussions (so I completely missed what COND_P_AND_NE means), but shouldn't (NE || NP) be equivalent to !(E && P), which means the branch code should be the negation of COND_P_AND_E? Similarly, (P || E) should be the negation of NE_AND_NP?
		// X86::COND_E_AND_NP usually has two different branch destinations.
		//
		// JP B1
		// JE B2
		// JMP B1
		// B1:
		// B2:
		//
		// Here this condition branches to B2 only if NP && E. It has another
		// equivalent form:
		//
		// JNE B1
		// JNP B2
		// JMP B1
		// B1:
		davidxl: For B1, the condition is NP_OR_E, so why not use COND_NP_OR_E as the branch code? Is the new code needed (after swapping TBB and FBB)?
		congh: This is actually why this patch was created: COND_NP_OR_E is equivalent to COND_P_AND_NE, but the latter has the second condition reversed relative to the former. COND_NP_OR_E has a JNP and a JE, while COND_P_AND_NE has a JNP and a JNE, or a JE and a JP. COND_NP_OR_E always has two identical targets, but COND_P_AND_NE doesn't. So they are different patterns. We use different names so that X86::GetOppositeBranchCondition() can return each one as the reverse condition code of the other.
		davidxl: I am not sure if we need to introduce the opposite branch code for E_OR_NP. Example:
		JE B1
		JP B2
		B1 (fall through):
		B2:
		For this case, AnalyzeBranch can return success with the following tuple: {BranchCode = E_OR_NP, TBB = B1, FBB = B2}. Later, when InsertBranch is called, there are two scenarios:
		1. Same as above, where B1 is the fall-through. The inserted code will be the same as above.
		2. B2 is the layout successor. In this second case, the generated code sequence should look like:
		JNP B1
		JE B1
		B2 (fall through):
		..
		B1:
		or
		JE B1
		JNP B1
		B2:
		..
		B1:
		Does it make sense?
		congh: In some places GetOppositeBranchCondition() is used to get the opposite branch code in order to generate an opposite branch. If we don't have an opposite branch code for COND_E_OR_NP, how do we reverse it?
		davidxl: For some reason GetOppositeBranchCondition was never called with COND_E_OR_NP before. Do you know why this path was not triggered?
		congh: There are two places where branches are reversed:
		1. AnalyzeBranch(), in which GetOppositeBranchCondition() is called when there is an unconditional jump at the end of an MBB. In the case of a je+jnp branch, a je+jp branch is then generated based only on the jnp (GetOppositeBranchCondition returns jnp for a jp), since at this point the COND_E_OR_NP pattern hasn't been recognized. If there is no jmp at the end, GetOppositeBranchCondition() won't be called. A je+jp pattern is handled similarly. When there is no jmp following je+jnp, a COND_E_OR_NP pattern will be recognized.
		2. Block placement: in this pass COND_E_OR_NP won't be handled with a check.
		congh: One reason we need new condition codes here is that we should make sure every condition code has an opposite condition code. This makes it easier to reverse branches without checking what the branch code is.
		// B2:
		//
		// Similarly it branches to B2 only if E && NP. That is why this condition
		// is named with COND_E_AND_NP.
		davidxl: To make sure the pattern is fully checked, I think NewTBB != TBB is also needed.
		congh: But there is nothing wrong when NewTBB == TBB. This pattern doesn't care whether they are the same or different basic blocks.
		BranchCode = X86::COND_E_AND_NP;
		} else
		davidxl: There is no need to change anything between lines 4028 and 4037 (to simplify the patch). Just add a combined assertion after the pattern recognition: assert((BranchCode != cond_np_or_e || BranchCode != cond_ne_or_p || NewTBB == TBB) && "Identical target BB expected"); Actually, since the previous early exit has been removed, the assert can fire, so the right thing to do is to add the TBB == NewTBB condition.
		congh: You are right. In practice I think that when there are COND_NP and COND_E, their targets are always the same (that is why I added the assertion in this patch). But to be safer, I have replaced the assertion with a check.
return true;		return true;

		davidxl: Should the condition NewTBB != TBB be added here too?
		congh: They can be the same target. This means we don't have to check whether they are the same or different targets.
		congh: I found that we cannot add NewTBB != TBB here. I found a test case in which the target of JNE and JNP and the fall-through block are the same block.
		# Machine code for function func: Post SSA
		Frame Objects:
		fi#-2: size=4, align=4, fixed, at location [SP+8]
		fi#-1: size=4, align=16, fixed, at location [SP+4]
		fi#0: size=4, align=4, at location [SP-4]
		Constant Pool:
		cp#0: -1.000000e+00, align=8
		BB#0: derived from LLVM BB %entry
		PUSH32r %EAX<undef>, %ESP<imp-def>, %ESP<imp-use>; flags: FrameSetup
		%XMM1<def> = CVTSS2SDrm %ESP, 1, %noreg, 8, %noreg; mem:LD4[FixedStack-1](align=16)
		%XMM0<def> = CVTSS2SDrm %ESP, 1, %noreg, 12, %noreg; mem:LD4[FixedStack-2]
		%XMM0<def,tied1> = MULSDrr %XMM0<kill,tied0>, %XMM1<kill>
		%XMM1<def> = FsFLD0SD
		UCOMISDrr %XMM0, %XMM1<kill>, %EFLAGS<imp-def>
		JNE_1 <BB#2>, %EFLAGS<imp-use>
		JNP_1 <BB#2>, %EFLAGS<imp-use>
		Successors according to CFG: BB#3(0x50000000 / 0x80000000 = 62.50%) BB#2(0x30000000 / 0x80000000 = 37.50%)
		BB#2: derived from LLVM BB %bb1
		Live Ins: %XMM0
		Predecessors according to CFG: BB#0
		%XMM0<def,tied1> = ADDSDrm %XMM0<kill,tied0>, %noreg, 1, %noreg, <cp#0>, %noreg; mem:LD8[ConstantPool]
		Successors according to CFG: BB#3(?%)
		BB#3: derived from LLVM BB %bb2
		Live Ins: %XMM0
		Predecessors according to CFG: BB#2 BB#0
		%XMM0<def> = CVTSD2SSrr %XMM0<kill>
		MOVSSmr %ESP, 1, %noreg, 0, %noreg, %XMM0<kill>; mem:ST4[FixedStack0]
		LD_F32m %ESP, 1, %noreg, 0, %noreg, %FPSW<imp-def,dead>; mem:LD4[FixedStack0]
		%EAX<def> = POP32r %ESP<imp-def>, %ESP<imp-use>; flags: FrameDestroy
		RETL
		# End machine code for function func.
		davidxl: Should these two JCCs be eliminated?
		congh: This is just an intermediate state at the point where AnalyzeBranch() is called. I believe they will be eliminated later.
// Update the MachineOperand.		// Update the MachineOperand.
		Gerolf: I would need a picture and examples to understand for which condition chains the TBB condition is relevant.
		congh (author): I have added a comment showing examples of X86::COND_P_AND_NE in which two branch destinations are different.
Cond[0].setImm(BranchCode);		Cond[0].setImm(BranchCode);
CondBranches.push_back(I);		CondBranches.push_back(I);
}		}
		Gerolf: The problem I have with this review is partially historical. COND_NE_OR_P etc. doesn't make sense to me without explanation. OR does not seem to be used in a logical sense since both branch conditions should be present. But your COND names also obfuscate the picture even more (perhaps just for me, though): previously the branch targets had to be identical, now they can be different. This is a new concept, and pressing it into the existing AnalyzeBranch routine makes the code harder to maintain. From this angle, it would be better to have separate functions that handle the new functionality.
		congh (author): This is because on X86 the equality/non-equality comparison between floating points is translated into two instructions, and the conditions of those two instructions represent a logical OR instead of AND, as they jump to the same destination. Normally the negation of OR is AND, and that is why the reverse condition is named with AND. I agree that it is not straightforward to understand that it has two different branch targets, but that is the correct way to reverse it.
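		The OR/AND relationship in the reply above can be checked exhaustively. What follows is only a minimal sketch, not code from the patch: the Flags struct and the helper names are hypothetical, and it models just the ZF and PF bits as set by ucomiss/ucomisd (ZF set for equal or unordered operands, PF set for unordered operands).

```cpp
#include <cassert>

// Hypothetical model of the two EFLAGS bits that matter after ucomiss/ucomisd.
struct Flags {
  bool ZF; // set when the operands compare equal or are unordered
  bool PF; // set when the operands are unordered (a NaN is involved)
};

// COND_NE_OR_P: true when either "jne" or "jp" would be taken.
bool condNEorP(Flags F) { return !F.ZF || F.PF; }

// COND_E_AND_NP: true only when both "je" and "jnp" would be taken.
bool condEandNP(Flags F) { return F.ZF && !F.PF; }

// Returns true if the two conditions disagree on every ZF/PF combination,
// i.e. each one is the exact negation of the other (De Morgan).
bool areExactNegations() {
  for (int Bits = 0; Bits < 4; ++Bits) {
    Flags F{(Bits & 1) != 0, (Bits & 2) != 0};
    if (condNEorP(F) == condEandNP(F))
      return false;
  }
  return true;
}

// Convenience self-check that can be called from any test driver.
void checkNegation() { assert(areExactNegations()); }
```

		Because the check covers all four flag combinations, it also covers the unordered case (ZF and PF both set), which is the one that makes a single jump insufficient.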

return false;		return false;
}		}

bool X86InstrInfo::AnalyzeBranch(MachineBasicBlock &MBB,		bool X86InstrInfo::AnalyzeBranch(MachineBasicBlock &MBB,
MachineBasicBlock *&TBB,		MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,		MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,		SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify) const {		bool AllowModify) const {
SmallVector<MachineInstr *, 4> CondBranches;		SmallVector<MachineInstr *, 4> CondBranches;
return AnalyzeBranchImpl(MBB, TBB, FBB, Cond, CondBranches, AllowModify);		return AnalyzeBranchImpl(MBB, TBB, FBB, Cond, CondBranches, AllowModify);
}		}

		davidxl: It is confusing here. The comment says the condition to B2 is NP_AND_E, but the branch code is P_AND_NE. Also, the condition to B1 is NE_OR_P, so why not use COND_NE_OR_P?
		congh (author): I found the comment is incorrect. It is for COND_E_AND_NP, not COND_P_AND_NE. I have updated the comment.
bool X86InstrInfo::AnalyzeBranchPredicate(MachineBasicBlock &MBB,		bool X86InstrInfo::AnalyzeBranchPredicate(MachineBasicBlock &MBB,
MachineBranchPredicate &MBP,		MachineBranchPredicate &MBP,
bool AllowModify) const {		bool AllowModify) const {
		davidxl: Should NewTBB == TBB be added here?
		congh (author): You mean NewTBB != TBB? See my comment above: they can be the same target.
using namespace std::placeholders;		using namespace std::placeholders;

SmallVector<MachineOperand, 4> Cond;		SmallVector<MachineOperand, 4> Cond;
SmallVector<MachineInstr *, 4> CondBranches;		SmallVector<MachineInstr *, 4> CondBranches;
if (AnalyzeBranchImpl(MBB, MBP.TrueDest, MBP.FalseDest, Cond, CondBranches,		if (AnalyzeBranchImpl(MBB, MBP.TrueDest, MBP.FalseDest, Cond, CondBranches,
AllowModify))		AllowModify))
		davidxl: Why not unconditionally return true in else {}?
		congh (author): I think it is OK to unconditionally return true here.
return true;		return true;

if (Cond.size() != 1)		if (Cond.size() != 1)
return true;		return true;

assert(MBP.TrueDest && "expected!");		assert(MBP.TrueDest && "expected!");

if (!MBP.FalseDest)		if (!MBP.FalseDest)
(80 lines not shown)	X86InstrInfo::InsertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,

if (Cond.empty()) {		if (Cond.empty()) {
// Unconditional branch?		// Unconditional branch?
assert(!FBB && "Unconditional branch with multiple successors!");		assert(!FBB && "Unconditional branch with multiple successors!");
BuildMI(&MBB, DL, get(X86::JMP_1)).addMBB(TBB);		BuildMI(&MBB, DL, get(X86::JMP_1)).addMBB(TBB);
return 1;		return 1;
}		}

		// If FBB is null, it is implied to be a fall-through block.
		bool FallThru = FBB == nullptr;

// Conditional branch.		// Conditional branch.
unsigned Count = 0;		unsigned Count = 0;
X86::CondCode CC = (X86::CondCode)Cond[0].getImm();		X86::CondCode CC = (X86::CondCode)Cond[0].getImm();
switch (CC) {		switch (CC) {
case X86::COND_NP_OR_E:
// Synthesize NP_OR_E with two branches.
BuildMI(&MBB, DL, get(X86::JNP_1)).addMBB(TBB);
++Count;
BuildMI(&MBB, DL, get(X86::JE_1)).addMBB(TBB);
++Count;
break;
case X86::COND_NE_OR_P:		case X86::COND_NE_OR_P:
// Synthesize NE_OR_P with two branches.		// Synthesize NE_OR_P with two branches.
BuildMI(&MBB, DL, get(X86::JNE_1)).addMBB(TBB);		BuildMI(&MBB, DL, get(X86::JNE_1)).addMBB(TBB);
++Count;		++Count;
BuildMI(&MBB, DL, get(X86::JP_1)).addMBB(TBB);		BuildMI(&MBB, DL, get(X86::JP_1)).addMBB(TBB);
++Count;		++Count;
break;		break;
		case X86::COND_E_AND_NP:
		// Use the next block of MBB as FBB if it is null.
		davidxl [Done]: Fix comment. It is not 'implied' -- this is guaranteed by AnalyzeBranch.
		if (FBB == nullptr) {
		FBB = getFallThroughMBB(&MBB, TBB);
		davidxl [Done]: Have a little wrapper 'getFallthroughBlock'.
		assert(FBB && "MBB cannot be the last block in function when the false "
		"body is a fall-through.");
		}
		// Synthesize COND_E_AND_NP with two branches.
		BuildMI(&MBB, DL, get(X86::JNE_1)).addMBB(FBB);
		++Count;
		BuildMI(&MBB, DL, get(X86::JNP_1)).addMBB(TBB);
		++Count;
		break;
default: {		default: {
unsigned Opc = GetCondBranchFromCond(CC);		unsigned Opc = GetCondBranchFromCond(CC);
BuildMI(&MBB, DL, get(Opc)).addMBB(TBB);		BuildMI(&MBB, DL, get(Opc)).addMBB(TBB);
++Count;		++Count;
}		}
}		}
if (FBB) {		if (!FallThru) {
// Two-way Conditional branch. Insert the second branch.		// Two-way Conditional branch. Insert the second branch.
BuildMI(&MBB, DL, get(X86::JMP_1)).addMBB(FBB);		BuildMI(&MBB, DL, get(X86::JMP_1)).addMBB(FBB);
++Count;		++Count;
}		}
return Count;		return Count;
}		}
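
A small simulation may help show why the COND_E_AND_NP sequence built above needs two different targets. This is a sketch under stated assumptions, not LLVM code: Flags, Jcc, Branch, and the emit/run helpers are hypothetical stand-ins, block names are plain strings, and only the ZF and PF bits are modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these types exist in LLVM.
struct Flags { bool ZF; bool PF; };
enum class Jcc { JNE, JP, JNP };
struct Branch { Jcc Op; std::string Target; };

// Mirrors the COND_NE_OR_P case: both jumps go to TBB.
std::vector<Branch> emitNEorP(const std::string &TBB) {
  return {{Jcc::JNE, TBB}, {Jcc::JP, TBB}};
}

// Mirrors the COND_E_AND_NP case: "jne" goes to FBB, "jnp" goes to TBB.
std::vector<Branch> emitEandNP(const std::string &TBB,
                               const std::string &FBB) {
  return {{Jcc::JNE, FBB}, {Jcc::JNP, TBB}};
}

// Whether a single conditional jump is taken for the given flags.
bool taken(Jcc Op, Flags F) {
  switch (Op) {
  case Jcc::JNE: return !F.ZF;
  case Jcc::JP:  return F.PF;
  case Jcc::JNP: return !F.PF;
  }
  return false;
}

// Walks the sequence in order; returns the first taken target, or the
// fall-through block when no jump is taken.
std::string run(const std::vector<Branch> &Seq, Flags F,
                const std::string &FallThru) {
  for (const Branch &B : Seq)
    if (taken(B.Op, F))
      return B.Target;
  return FallThru;
}

// Equal operands (ZF=1, PF=0) must reach TBB; unordered ones must not.
void checkSequences() {
  assert(run(emitEandNP("TBB", "FBB"), Flags{true, false}, "FBB") == "TBB");
  assert(run(emitEandNP("TBB", "FBB"), Flags{true, true}, "FBB") == "FBB");
}
```

In this model the unordered case (ZF and PF both set) takes neither jump of the E_AND_NP sequence, so it relies on the fall-through (or the trailing jmp) reaching FBB, which matches the assert on FBB in the diff.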

bool X86InstrInfo::		bool X86InstrInfo::
(2,544 lines not shown)	case X86::DEC8r:
return FuseKind == FuseInc;		return FuseKind == FuseInc;
}		}
}		}

bool X86InstrInfo::		bool X86InstrInfo::
ReverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {		ReverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {
assert(Cond.size() == 1 && "Invalid X86 branch condition!");		assert(Cond.size() == 1 && "Invalid X86 branch condition!");
X86::CondCode CC = static_cast<X86::CondCode>(Cond[0].getImm());		X86::CondCode CC = static_cast<X86::CondCode>(Cond[0].getImm());
if (CC == X86::COND_NE_OR_P || CC == X86::COND_NP_OR_E)
return true;
Cond[0].setImm(GetOppositeBranchCondition(CC));		Cond[0].setImm(GetOppositeBranchCondition(CC));
return false;		return false;
}		}
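
The diff above drops the early `return true` for the compound codes, so reversal now depends on the opposite-condition lookup knowing about them. That lookup is in a part of the file not shown here, so the following is only a sketch under the assumption that COND_NE_OR_P and COND_E_AND_NP map to each other. The enum is a hypothetical subset, and oppositeCond and reverseCond are illustrative names rather than LLVM functions.

```cpp
#include <cassert>

// Hypothetical subset of condition codes; the real X86::CondCode enum
// contains many more entries.
enum CondCode {
  COND_E, COND_NE, COND_P, COND_NP,
  COND_NE_OR_P, COND_E_AND_NP,
  COND_INVALID
};

// Sketch of an opposite-condition lookup that also covers compound codes.
CondCode oppositeCond(CondCode CC) {
  switch (CC) {
  case COND_E:        return COND_NE;
  case COND_NE:       return COND_E;
  case COND_P:        return COND_NP;
  case COND_NP:       return COND_P;
  case COND_NE_OR_P:  return COND_E_AND_NP;
  case COND_E_AND_NP: return COND_NE_OR_P;
  default:            return COND_INVALID;
  }
}

// Follows the convention visible in the diff: return true on failure and
// false on success, updating the code in place only on success.
bool reverseCond(CondCode &CC) {
  CondCode Opp = oppositeCond(CC);
  if (Opp == COND_INVALID)
    return true;
  CC = Opp;
  return false;
}

// Reversing twice should give back the original code.
void checkRoundTrip() {
  CondCode CC = COND_NE_OR_P;
  assert(!reverseCond(CC) && CC == COND_E_AND_NP);
  assert(!reverseCond(CC) && CC == COND_NE_OR_P);
}
```

Keeping the mapping an involution is what lets a later pass reverse a branch and then reverse it back without losing information.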

bool X86InstrInfo::		bool X86InstrInfo::
isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const {		isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const {
// FIXME: Return false for x87 stack register classes for now. We can't		// FIXME: Return false for x87 stack register classes for now. We can't
// allow any loads of these registers before FpGet_ST0_80.		// allow any loads of these registers before FpGet_ST0_80.
(615 lines not shown)

test/CodeGen/X86/block-placement.ll

(456 lines not shown)	bogus:
unreachable		unreachable
step:		step:
br label %exit		br label %exit
exit:		exit:
%merge = phi i32 [ 3, %step ], [ 6, %entry ]		%merge = phi i32 [ 3, %step ], [ 6, %entry ]
ret i32 %merge		ret i32 %merge
}		}

define void @fpcmp_unanalyzable_branch(i1 %cond) {		define void @fpcmp_unanalyzable_branch(i1 %cond) {
; This function's CFG contains an unanalyzable branch that is likely to be		; This function's CFG contains an once-unanalyzable branch (une on floating
		davidxl [Done]: Can you make the ordering check more strict here?
		chandlerc: Here and elsewhere, when updating a test, it would be good to convert it to use CHECK-LABEL at least, and then in the specific test cases, actually write narrow checks. For cases where we have very microscopic functions testing single instruction sequences, using update_llc_test_checks.py is incredibly useful. You can also look at the style of tests it generates and generate comparably structured checks for more complex test cases.
		congh (author): OK. I have updated this test according to the results from running update_llc_test_checks.py.
; split due to having a different high-probability predecessor.		; points). As now it becomes analyzable, we should get best layout in which each
; CHECK: fpcmp_unanalyzable_branch		; edge in 'entry' -> 'entry.if.then_crit_edge' -> 'if.then' -> 'if.end' is
		chandlerc: Especially, the checks just on these %foo terms make it hard to see at a glance what is being checked. Having a bit more syntax, or locating the checks inside the code itself, I think will make it much more clear.
; CHECK: %entry		; fall-through.
; CHECK: %exit		; CHECK-LABEL: fpcmp_unanalyzable_branch:
; CHECK-NOT: %if.then		; CHECK: # BB#0: # %entry
; CHECK-NOT: %if.end		; CHECK: # BB#1: # %entry.if.then_crit_edge
; CHECK-NOT: jne		; CHECK: .LBB10_4: # %if.then
; CHECK-NOT: jnp		; CHECK: .LBB10_5: # %if.end
; CHECK: jne		; CHECK: # BB#3: # %exit
; CHECK-NEXT: jnp		; CHECK: jne .LBB10_4
; CHECK-NEXT: %if.then		; CHECK-NEXT: jnp .LBB10_5
		; CHECK-NEXT: jmp .LBB10_4

entry:		entry:
; Note that this branch must be strongly biased toward		; Note that this branch must be strongly biased toward
; 'entry.if.then_crit_edge' to ensure that we would try to form a chain for		; 'entry.if.then_crit_edge' to ensure that we would try to form a chain for
; 'entry' -> 'entry.if.then_crit_edge' -> 'if.then'. It is the last edge in that		; 'entry' -> 'entry.if.then_crit_edge' -> 'if.then' -> 'if.end'.
; chain which would violate the unanalyzable branch in 'exit', but we won't even
; try this trick unless 'if.then' is believed to almost always be reached from
; 'entry.if.then_crit_edge'.
br i1 %cond, label %entry.if.then_crit_edge, label %lor.lhs.false, !prof !1		br i1 %cond, label %entry.if.then_crit_edge, label %lor.lhs.false, !prof !1

entry.if.then_crit_edge:		entry.if.then_crit_edge:
%.pre14 = load i8, i8* undef, align 1		%.pre14 = load i8, i8* undef, align 1
br label %if.then		br label %if.then

lor.lhs.false:		lor.lhs.false:
br i1 undef, label %if.end, label %exit		br i1 undef, label %if.end, label %exit

exit:		exit:
%cmp.i = fcmp une double 0.000000e+00, undef		%cmp.i = fcmp une double 0.000000e+00, undef
br i1 %cmp.i, label %if.then, label %if.end		br i1 %cmp.i, label %if.then, label %if.end, !prof !3
		davidxl [Done]: To make the test more robust, perhaps annotate this branch with weights {1, 1000}, so that from the 'exit' block it is not likely to branch into if.then.

if.then:		if.then:
%0 = phi i8 [ %.pre14, %entry.if.then_crit_edge ], [ undef, %exit ]		%0 = phi i8 [ %.pre14, %entry.if.then_crit_edge ], [ undef, %exit ]
%1 = and i8 %0, 1		%1 = and i8 %0, 1
store i8 %1, i8* undef, align 4		store i8 %1, i8* undef, align 4
br label %if.end		br label %if.end

if.end:		if.end:
ret void		ret void
}		}

!1 = !{!"branch_weights", i32 1000, i32 1}		!1 = !{!"branch_weights", i32 1000, i32 1}
		!3 = !{!"branch_weights", i32 1, i32 1000}

declare i32 @f()		declare i32 @f()
declare i32 @g()		declare i32 @g()
declare i32 @h(i32 %x)		declare i32 @h(i32 %x)

define i32 @test_global_cfg_break_profitability() {		define i32 @test_global_cfg_break_profitability() {
; Check that our metrics for the profitability of a CFG break are global rather		; Check that our metrics for the profitability of a CFG break are global rather
; than local. A successor may be very hot, but if the current block isn't, it		; than local. A successor may be very hot, but if the current block isn't, it
(142 lines not shown)
exit:		exit:
ret void		ret void
}		}

define void @unanalyzable_branch_to_best_succ(i1 %cond) {		define void @unanalyzable_branch_to_best_succ(i1 %cond) {
; Ensure that we can handle unanalyzable branches where the destination block		; Ensure that we can handle unanalyzable branches where the destination block
; gets selected as the optimal successor to merge.		; gets selected as the optimal successor to merge.
;		;
		; This branch is now analyzable and hence the destination block becomes the
		; hotter one. The right order is entry->bar->exit->foo.
		davidxl [Done]: Comment that the right order is: entry -> bar -> exit -> foo.
		;
; CHECK: unanalyzable_branch_to_best_succ		; CHECK: unanalyzable_branch_to_best_succ
; CHECK: %entry		; CHECK: %entry
; CHECK: %foo
; CHECK: %bar		; CHECK: %bar
		davidxl: Use CHECK-NEXT for more strict checking.
		congh (author): However, we cannot use CHECK-NEXT here. The output is:

		unanalyzable_branch_to_best_succ: # @unanalyzable_branch_to_best_succ
		    .cfi_startproc
		# BB#0: # %entry
		    subl $12, %esp
		.Ltmp39:
		    .cfi_def_cfa_offset 16
		    testb $1, 16(%esp)
		    je .LBB16_1
		.LBB16_2: # %bar
		    calll f
		.LBB16_3: # %exit
		    addl $12, %esp
		    retl
		.LBB16_1: # %foo
		    fldz
		    fucomp %st(0)
		    fnstsw %ax
		    sahf
		    jne .LBB16_2
		    jnp .LBB16_3
		    jmp .LBB16_2

		davidxl [Done]: Can you use the following to enforce the order?

		; CHECK-DAG: #entry
		; CHECK-NOT: <something>
		; CHECK-DAG: #bar

		congh (author): I don't get it. What is the problem with the current CHECKs? I think the order of those four blocks is already enforced.
		davidxl: Ok -- you are right.
; CHECK: %exit		; CHECK: %exit
		; CHECK: %foo

entry:		entry:
; Bias this branch toward bar to ensure we form that chain.		; Bias this branch toward bar to ensure we form that chain.
br i1 %cond, label %bar, label %foo, !prof !1		br i1 %cond, label %bar, label %foo, !prof !1

foo:		foo:
%cmp = fcmp une double 0.000000e+00, undef		%cmp = fcmp une double 0.000000e+00, undef
br i1 %cmp, label %bar, label %exit		br i1 %cmp, label %bar, label %exit
(405 lines not shown)

test/CodeGen/X86/fast-isel-cmp-branch2.ll

	; RUN: llc < %s -mtriple=x86_64-apple-darwin10 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-apple-darwin10 | FileCheck %s
	; RUN: llc < %s -fast-isel -fast-isel-abort=1 -mtriple=x86_64-apple-darwin10 | FileCheck %s			; RUN: llc < %s -fast-isel -fast-isel-abort=1 -mtriple=x86_64-apple-darwin10 | FileCheck %s

	define i32 @fcmp_oeq(float %x, float %y) {			define i32 @fcmp_oeq(float %x, float %y) {
	; CHECK-LABEL: fcmp_oeq			; CHECK-LABEL: fcmp_oeq
	; CHECK: ucomiss %xmm1, %xmm0			; CHECK: ucomiss %xmm1, %xmm0
	; CHECK-NEXT: jne {{LBB.+_1}}			; CHECK-NEXT: jne {{LBB.+_1}}
	; CHECK-NEXT: jnp {{LBB.+_2}}			; CHECK-NEXT: jp {{LBB.+_1}}
	%1 = fcmp oeq float %x, %y			%1 = fcmp oeq float %x, %y
	br i1 %1, label %bb1, label %bb2			br i1 %1, label %bb1, label %bb2
	bb2:			bb2:
	ret i32 1			ret i32 1
	bb1:			bb1:
	ret i32 0			ret i32 0
	}			}

	(140 lines not shown)
	bb1:			bb1:
	ret i32 0			ret i32 0
	}			}

	define i32 @fcmp_une(float %x, float %y) {			define i32 @fcmp_une(float %x, float %y) {
	; CHECK-LABEL: fcmp_une			; CHECK-LABEL: fcmp_une
	; CHECK: ucomiss %xmm1, %xmm0			; CHECK: ucomiss %xmm1, %xmm0
	; CHECK-NEXT: jne {{LBB.+_2}}			; CHECK-NEXT: jne {{LBB.+_2}}
	; CHECK-NEXT: jp {{LBB.+_2}}			; CHECK-NEXT: jnp {{LBB.+_1}}
	; CHECK-NEXT: jmp {{LBB.+_1}}
	%1 = fcmp une float %x, %y			%1 = fcmp une float %x, %y
	br i1 %1, label %bb1, label %bb2			br i1 %1, label %bb1, label %bb2
	bb2:			bb2:
	ret i32 1			ret i32 1
	bb1:			bb1:
	ret i32 0			ret i32 0
	}			}

	(120 lines not shown)

test/CodeGen/X86/fast-isel-cmp-branch3.ll

(11 lines not shown)	bb1:
ret i32 0		ret i32 0
}		}

define i32 @fcmp_oeq2(float %x) {		define i32 @fcmp_oeq2(float %x) {
; CHECK-LABEL: fcmp_oeq2		; CHECK-LABEL: fcmp_oeq2
; CHECK: xorps %xmm1, %xmm1		; CHECK: xorps %xmm1, %xmm1
; CHECK-NEXT: ucomiss %xmm1, %xmm0		; CHECK-NEXT: ucomiss %xmm1, %xmm0
; CHECK-NEXT: jne {{LBB.+_1}}		; CHECK-NEXT: jne {{LBB.+_1}}
; CHECK-NEXT: jnp {{LBB.+_2}}		; CHECK-NEXT: jp {{LBB.+_1}}
%1 = fcmp oeq float %x, 0.000000e+00		%1 = fcmp oeq float %x, 0.000000e+00
br i1 %1, label %bb1, label %bb2		br i1 %1, label %bb1, label %bb2
bb2:		bb2:
ret i32 1		ret i32 1
bb1:		bb1:
ret i32 0		ret i32 0
}		}

(304 lines not shown)	bb1:
ret i32 0		ret i32 0
}		}

define i32 @fcmp_une2(float %x) {		define i32 @fcmp_une2(float %x) {
; CHECK-LABEL: fcmp_une2		; CHECK-LABEL: fcmp_une2
; CHECK: xorps %xmm1, %xmm1		; CHECK: xorps %xmm1, %xmm1
; CHECK-NEXT: ucomiss %xmm1, %xmm0		; CHECK-NEXT: ucomiss %xmm1, %xmm0
; CHECK-NEXT: jne {{LBB.+_2}}		; CHECK-NEXT: jne {{LBB.+_2}}
; CHECK-NEXT: jp {{LBB.+_2}}		; CHECK-NEXT: jnp {{LBB.+_1}}
; CHECK-NEXT: jmp {{LBB.+_1}}
%1 = fcmp une float %x, 0.000000e+00		%1 = fcmp une float %x, 0.000000e+00
br i1 %1, label %bb1, label %bb2		br i1 %1, label %bb1, label %bb2
bb2:		bb2:
ret i32 1		ret i32 1
bb1:		bb1:
ret i32 0		ret i32 0
}		}

(120 lines not shown)

test/CodeGen/X86/fp-une-cmp.ll

(42 lines not shown)	bb1:
%add = fadd double %mul, -1.000000e+00		%add = fadd double %mul, -1.000000e+00
br label %bb2		br label %bb2

bb2:		bb2:
%phi = phi double [ %add, %bb1 ], [ %mul, %entry ]		%phi = phi double [ %add, %bb1 ], [ %mul, %entry ]
ret double %phi		ret double %phi
}		}

; FIXME: With branch weights indicated, bb2 should be placed ahead of bb1.

define double @profile_metadata(double %x, double %y) {		define double @profile_metadata(double %x, double %y) {
; CHECK-LABEL: profile_metadata:		; CHECK-LABEL: profile_metadata:
; CHECK: # BB#0: # %entry		; CHECK: # BB#0: # %entry
; CHECK-NEXT: mulsd %xmm1, %xmm0		; CHECK-NEXT: mulsd %xmm1, %xmm0
; CHECK-NEXT: xorpd %xmm1, %xmm1		; CHECK-NEXT: xorpd %xmm1, %xmm1
		davidxl: The branch sequence does imply BB2 is reordered before BB1. Is it better to explicitly test the label order? Also, why is the jump sequence not the optimal one:

		    jne bb2
		    jnp bb1
		bb2:
		bb1:

		congh (author): OK, I will test the BB orders. The generated assembly is shown below, which should be optimal.

		func2: # @func2
		# BB#0: # %entry
		    pushl %eax
		    cvtss2sd 8(%esp), %xmm1
		    cvtss2sd 12(%esp), %xmm0
		    mulsd %xmm1, %xmm0
		    xorpd %xmm1, %xmm1
		    ucomisd %xmm1, %xmm0
		    jne .LBB1_1
		    jp .LBB1_1
		.LBB1_2: # %bb2
		    cvtsd2ss %xmm0, %xmm0
		    movss %xmm0, (%esp)
		    flds (%esp)
		    popl %eax
		    retl
		.LBB1_1: # %bb1
		    addsd .LCPI1_0, %xmm0
		    jmp .LBB1_2
		.Lfunc_end1:
		    .size func2, .Lfunc_end1-func2
; CHECK-NEXT: ucomisd %xmm1, %xmm0		; CHECK-NEXT: ucomisd %xmm1, %xmm0
; CHECK-NEXT: jne .LBB1_1		; CHECK-NEXT: jne .LBB1_1
; CHECK-NEXT: jnp .LBB1_2		; CHECK-NEXT: jp .LBB1_1
; CHECK-NEXT: .LBB1_1: # %bb1
; CHECK-NEXT: addsd {{.*}}(%rip), %xmm0
; CHECK-NEXT: .LBB1_2: # %bb2		; CHECK-NEXT: .LBB1_2: # %bb2
; CHECK-NEXT: retq		; CHECK-NEXT: retq
		; CHECK-NEXT: .LBB1_1: # %bb1
		; CHECK-NEXT: addsd {{.*}}(%rip), %xmm0
		; CHECK-NEXT: jmp .LBB1_2

entry:		entry:
%mul = fmul double %x, %y		%mul = fmul double %x, %y
%cmp = fcmp une double %mul, 0.000000e+00		%cmp = fcmp une double %mul, 0.000000e+00
br i1 %cmp, label %bb1, label %bb2, !prof !1		br i1 %cmp, label %bb1, label %bb2, !prof !1

bb1:		bb1:
%add = fadd double %mul, -1.000000e+00		%add = fadd double %mul, -1.000000e+00
br label %bb2		br label %bb2

bb2:		bb2:
%phi = phi double [ %add, %bb1 ], [ %mul, %entry ]		%phi = phi double [ %add, %bb1 ], [ %mul, %entry ]
ret double %phi		ret double %phi
}		}

!1 = !{!"branch_weights", i32 1, i32 1000}		; Test if the negation of the non-equality check between floating points are
		; translated to jnp followed by jne.

		define void @foo(float %f) {
		; CHECK-LABEL: foo:
		; CHECK: # BB#0: # %entry
		; CHECK-NEXT: xorps %xmm1, %xmm1
		; CHECK-NEXT: ucomiss %xmm1, %xmm0
		; CHECK-NEXT: jne .LBB2_2
		; CHECK-NEXT: jnp .LBB2_1
		; CHECK-NEXT: .LBB2_2: # %if.then
		; CHECK-NEXT: jmp a # TAILCALL
		; CHECK-NEXT: .LBB2_1: # %if.end
		; CHECK-NEXT: retq
		entry:
		%cmp = fcmp une float %f, 0.000000e+00
		br i1 %cmp, label %if.then, label %if.end

		if.then:
		tail call void @a()
		br label %if.end

		if.end:
		ret void
		}

		declare void @a()

		!1 = !{!"branch_weights", i32 1, i32 1000}
		chandlerc: These checks don't really make sense to me. Why are they above the function, when the function has a CHECK-LABEL and seemingly similar checks?
		spatel: Those checks are the old lines from before the update check script was run. They should be deleted.
		congh (author): Yes. I have deleted them.

test/CodeGen/X86/x86-analyze-branch-jne-jp.s

This file was added.

				.text
				.file "../llvm/test/CodeGen/X86/x86-analyze-branch-jne-jp.ll"
				.globl foo
				.p2align 4, 0x90
				.type foo,@function
				foo: # @foo
				.cfi_startproc
				# BB#0: # %entry
				xorps %xmm1, %xmm1
				ucomiss %xmm1, %xmm0
				jne .LBB0_2
				jnp .LBB0_1
				.LBB0_2: # %if.then
				jmp a # TAILCALL
				.LBB0_1: # %if.end
				retq
				.Lfunc_end0:
				.size foo, .Lfunc_end0-foo
				.cfi_endproc


				.section ".note.GNU-stack","",@progbits


Revision Contents

Diff 50343

include/llvm/Target/TargetInstrInfo.h

lib/CodeGen/TailDuplication.cpp

lib/Target/X86/X86InstrInfo.h

lib/Target/X86/X86InstrInfo.cpp

test/CodeGen/X86/block-placement.ll

test/CodeGen/X86/fast-isel-cmp-branch2.ll

test/CodeGen/X86/fast-isel-cmp-branch3.ll

test/CodeGen/X86/fp-une-cmp.ll

test/CodeGen/X86/x86-analyze-branch-jne-jp.s
