Use 'mask and negate' for legalization of i1 source type with SIGN_EXTEND_INREG.
Can we do this in SelectionDAGLegalize::ExpandNode instead? I suppose in theory some platform might prefer to lower this using shifts, but I can't think of any off the top of my head.
lib/Target/X86/X86ISelLowering.cpp:16343
Given "SIGN_EXTEND_INREG(X, i1)", you can transform it to "-(X&1)". But you can't assume the input is zero-extended. I think your patch misbehaves for the following testcase:

  define void @_Z1fbi(i1 zeroext %a, i32 %b, i8* %p) local_unnamed_addr #0 {
  entry:
    %conv = trunc i32 %b to i1
    %or = or i1 %a, %conv
    %s = sext i1 %or to i8
    store i8 %s, i8* %p
    ret void
  }
Would we still need to do the 'and' masking op though? In that case, there's probably no win?
lib/Target/X86/X86ISelLowering.cpp:16343
Ah, I misunderstood the meaning/guarantee of the source value type of ISD::SIGN_EXTEND_INREG.

The current output for that testcase is:

  orb %dil, %sil
  shlb $7, %sil
  sarb $7, %sil
  movb %sil, (%rdx)

But with this patch:

  orb %dil, %sil
  negb %sil
  movb %sil, (%rdx)

And that's wrong if any of the higher bits of %b / %sil are set.
DAGCombine can eliminate the mask instruction in a lot of cases (if the value is in fact zero-extended). Also, the mask+neg is probably slightly more efficient than two shifts on most processors.
Yes, you're correct - thanks!
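For concreteness, here's a minimal sketch of the mask+negate expansion at the SelectionDAG level (my own illustration, not the committed patch; the helper name is made up):

  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  // Expand SIGN_EXTEND_INREG(X, i1) into 0 - (X & 1). The AND is required
  // for correctness because the high bits of X are not guaranteed to be
  // zero; DAGCombine can remove it later when they are known to be zero.
  static SDValue expandSextInRegI1(SelectionDAG &DAG, SDNode *Node) {
    SDLoc DL(Node);
    EVT VT = Node->getValueType(0);
    SDValue X = Node->getOperand(0);
    SDValue Masked =
        DAG.getNode(ISD::AND, DL, VT, X, DAG.getConstant(1, DL, VT));
    return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Masked);
  }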
So I tried the SelectionDAGLegalize::ExpandNode suggestion, and I see one problem case: microMIPS.
I don't know microMIPS (cc'ing @sdardis and @dsanders), but it doesn't appear to have a negate instruction. If we legalize to and+negate, the code grows from something like:

  sll $1, $1, 31
  jr $ra
  sra $2, $1, 31

To:

  andi16 $2, $2, 1
  li16 $3, 0
  subu16 $2, $3, $2
  jrc $ra
Given this potential regression, I'd like to proceed with the x86-only solution for now (I will add a TODO comment about making it more general). As noted in the initial summary, I have filed bugs for the PPC and ARM folks and linked to this patch, so they are aware of what is needed to pursue the common solution.
Patch updated:
This is still an x86-only solution, but now we correctly mask the operand before negation. As Eli noted, the mask is optimized away when we know the operand's top bits are already zero (via zeroext on the input parameter in the test cases).
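To spell out why the mask disappears in the zeroext cases, here is a hedged sketch of the known-bits check that DAGCombine relies on (illustrative helper of my own, not the exact in-tree fold):

  #include "llvm/ADT/APInt.h"
  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  // If every bit cleared by (X & 1) is already known to be zero (e.g. X
  // comes from a zeroext i1 argument), the AND is a no-op and can be dropped.
  static bool maskIsRedundant(SelectionDAG &DAG, SDValue X) {
    unsigned BW = X.getScalarValueSizeInBits();
    APInt HighBits = APInt::getHighBitsSet(BW, BW - 1); // all bits except bit 0
    return DAG.MaskedValueIsZero(X, HighBits);
  }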
We don't have a negate instruction; we use "nor $dst, $src, $zero" instead. "nor" is present in all our ISAs except MIPS16, which has an actual negate instruction.
Did that regression hit general MIPS code or just microMIPS? It's possible that we have a missing pattern for microMIPS in that case. I'll give this patch a whirl.
I only saw the regression for a RUN with the micromips attribute specified. Here are the tests that were affected by the LegalizeDAG patch which I'll attach here if you'd like to try it:
  LLVM :: CodeGen/Mips/llvm-ir/add.ll
  LLVM :: CodeGen/Mips/llvm-ir/mul.ll
  LLVM :: CodeGen/Mips/llvm-ir/sdiv.ll
  LLVM :: CodeGen/Mips/llvm-ir/srem.ll
  LLVM :: CodeGen/Mips/llvm-ir/sub.ll
  LLVM :: CodeGen/Mips/llvm-ir/urem.ll
  LLVM :: CodeGen/Mips/select.ll
  LLVM :: CodeGen/SystemZ/branch-07.ll
  LLVM :: CodeGen/SystemZ/risbg-01.ll
  LLVM :: CodeGen/SystemZ/shift-10.ll
  LLVM :: CodeGen/X86/negate-i1.ll
Arithmetic and bitwise negation are equivalent in this case, but just to mention it: the arithmetic negation is 'sub $dst, $zero, $src'. IIRC, GAS has a 'neg' alias for this, but I don't think it's implemented in LLVM yet.
There are some 'neg' aliases in LLVM for MIPS, which is what you're seeing in the produced assembly. The main reason for the difference is that it's not possible to use $zero with the 16-bit instructions.
As Daniel pointed out, you can use subtraction in that case. The produced assembly from that SelectionDAG patch is actually better than what we're currently getting in terms of size: li16 and subu16 are 16 bits each, while the two constant shifts are 32 bits each (andi16 + li16 + subu16 = 6 bytes vs. sll + sra = 8 bytes). The optimal case would be to use a 32-bit microMIPS instruction with the register $zero (better than current in terms of both size and instruction count). The new instruction pattern for microMIPS should be easy to fold away.
Ah, so if the LegalizeDAG patch is an improvement anyway (if not always optimal) for MIPS, then I'll update this patch to use that along with all of the regression test updates.
Looking closer at the SystemZ regression test differences, we have more instructions with the LegalizeDAG patch, so I think that target is missing a pattern ( cc'ing @uweigand and @jonpa ).
Example - the current code for test 'f7' in test/CodeGen/SystemZ/branch-07.ll is:
  cr %r2, %r3
  ipm %r0
  afi %r0, -268435456
  sra %r0, 31
  srlg %r1, %r3, 32
  srlg %r2, %r2, 32
  cr %r2, %r1
  ipm %r1
  afi %r1, -268435456
  sra %r1, 31
  sllg %r2, %r1, 32
  lr %r2, %r0
  br %r14
With the LegalizeDAG patch, it becomes:
  cr %r2, %r3
  ipm %r0
  afi %r0, -268435456
  srl %r0, 31
  lcr %r0, %r0
  srlg %r1, %r3, 32
  srlg %r2, %r2, 32
  cr %r2, %r1
  ipm %r1
  afi %r1, -268435456
  srl %r1, 31
  lcr %r1, %r1
  sllg %r2, %r1, 32
  lr %r2, %r0
  br %r14
On second thought, this is a missing combine for all targets:
  define i32 @topbit(i32 %x) {
    %sra = ashr i32 %x, 31
    %neg = sub i32 0, %sra
    ret i32 %neg
  }
Should be simplified to:
%neg = lshr i32 %x, 31
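A hedged sketch of what that combine could look like at the DAG level (my naming, not the in-tree DAGCombiner code):

  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  // Fold (sub 0, (sra X, BW-1)) -> (srl X, BW-1): the arithmetic shift
  // produces 0 or -1 from the sign bit, and negating that gives 0 or 1,
  // which is exactly the logical shift of the sign bit.
  static SDValue combineNegOfSignShift(SelectionDAG &DAG, SDNode *N) {
    if (N->getOpcode() != ISD::SUB || !isNullConstant(N->getOperand(0)))
      return SDValue();
    SDValue Sra = N->getOperand(1);
    EVT VT = N->getValueType(0);
    if (Sra.getOpcode() != ISD::SRA)
      return SDValue();
    auto *ShAmt = dyn_cast<ConstantSDNode>(Sra.getOperand(1));
    if (!ShAmt || ShAmt->getZExtValue() != VT.getScalarSizeInBits() - 1)
      return SDValue();
    SDLoc DL(N);
    return DAG.getNode(ISD::SRL, DL, VT, Sra.getOperand(0), Sra.getOperand(1));
  }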
Patch updated:
Use mask+negate universally via LegalizeDAG.
There are improvements in the tests for x86, PPC, ARM, and MIPS.
SystemZ looks neutral to me, but I have no experience with that target. I assume it would see the same improvement (optimize away the mask) if we added tests for that.
There's an extra 'negu' in the MIPS select test. I didn't check to see what is going on there.
Based on the earlier comments, I think we're ok with an added instruction in the microMIPS cases as long as there's a reduction in the code size.
test/CodeGen/X86/negate-i1.ll:72
This SAR is redundant. Does DAGCombine know that SAR(all_ones/all_zeros) is redundant?
test/CodeGen/X86/negate-i1.ll:72
It knows sometimes, but of course it missed this one. I'll work on that patch now.
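For reference, a minimal sketch (my naming, not in-tree code) of the check that would prove the SAR redundant:

  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  // If the input to (sra X, C) is already all sign bits (i.e. it is known
  // to be 0 or -1, as the negate of a masked i1 is), the arithmetic shift
  // changes nothing and can be removed.
  static bool sraIsRedundant(SelectionDAG &DAG, SDValue X) {
    return DAG.ComputeNumSignBits(X) == X.getScalarValueSizeInBits();
  }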
This isn't extra. The existing CHECK lines don't include the sll/sra pair, so this is actually a win, not a regression.
Before:
  mtc1 $5, $f1
  mtc1 $6, $f2
  sltu $1, $zero, $4
  sll $1, $1, 31
  sra $1, $1, 31
  mtc1 $1, $f0
  jr $ra
  sel.s $f0, $f2, $f1
After:
  sltu $1, $zero, $4
  negu $1, $1        <--- the 'and' mask was folded away, so we saved an instruction
  mtc1 $5, $f1
  mtc1 $6, $f2
  mtc1 $1, $f0
  jr $ra
  sel.s $f0, $f2, $f1
@sdardis / @dsanders : do you see any common folds that are missing based on the MIPS diffs? I don't think you want me trying any MIPS-specific hacks, so if there's a net win already, we should be ok to proceed?
LGTM, unless ARM/PPC backend maintainers want to jump in. Two comments inlined.
do you see any common folds that are missing based on the MIPS diffs?
I'm happy with the code-size reduction in the microMIPS case; the missing fold/optimization is the subtraction from $zero, but that shouldn't hold this patch up. I'll work on the instruction-count reduction later.
Thanks,
Simon
test/CodeGen/Mips/llvm-ir/add.ll:48-49 (Diff #74870)
Add a comment above here along the lines of:

  ; FIXME: This code sequence is inefficient as it should be 'subu $[[T0]], $zero, $[[T0]]'. This sequence is even better as it's a single instruction. See D25485 for the rest of the cases where this sequence occurs.
test/CodeGen/Mips/llvm-ir/mul.ll:23-26 (Diff #74742)
Unnecessary change.
Thanks, Simon. I'll make the suggested changes and get this in.
Note that the changes in the MIPS tests for sdiv/urem/srem with i1 values *are* universal folds (we can't divide by zero, so these should disappear?), but I'm wondering how those patterns would appear in the backend. We do have IR-level folds for those in InstSimplify.
Thanks.
Looking at the existing test output for division by i1, I believe it's correct but perhaps not optimal in the sense of "division by zero => undefined behaviour". Taking that view, division of an i1 by an i1 should always yield the numerator: on MIPS, division by zero yields an undefined result, so division by an i1 should be folded away (unless -mcheck-zero-division is active).
I'll investigate that issue in a bit, but it's an optimization issue, not a correctness issue.
Thanks,
Simon
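To make the division-by-i1 reasoning above concrete, here is a hypothetical sketch (my names, not existing MIPS or target-independent code), assuming the caller has already established that the divisor was an i1:

  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  // An i1 divisor is either 0 (division by zero: undefined result on MIPS
  // unless -mcheck-zero-division is active) or 1, so the quotient is just
  // the numerator and the remainder is zero.
  static SDValue foldDivRemByI1Divisor(SelectionDAG &DAG, SDNode *N) {
    unsigned Opc = N->getOpcode();
    if (Opc == ISD::SDIV || Opc == ISD::UDIV)
      return N->getOperand(0);
    if (Opc == ISD::SREM || Opc == ISD::UREM)
      return DAG.getConstant(0, SDLoc(N), N->getValueType(0));
    return SDValue();
  }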
test/CodeGen/PowerPC/negate-i1.ll:2-4 (Diff #74870)
Please add -verify-machineinstrs to the RUN command line; we make sure to add it to all tests in our backend. Also, -mtriple=powerpc64le-unknown-linux-gnu is a more common triple.
LGTM, unless ARM/PPC backend maintainers want to jump in. Two comments inlined.
LGTM too.
test/CodeGen/PowerPC/negate-i1.ll:2-4 (Diff #74870)
(I believe you just created and committed this in r284279. That is why I am asking to make a change in a place that is not modified in this patch :)
test/CodeGen/PowerPC/negate-i1.ll:2-4 (Diff #74870)
Correct - I added the test, so we could close PR30661 when this patch is committed. Note that the comment on line 1 gives away why I used the apple-darwin triple; I thought that would be an easier regex hack while adapting the script that we use for x86 auto-generation of CHECK lines. :) Thank you for pointing out the improvements - I'll fix these up.
Patch updated:
- Added FIXME comment to Mips test file to note optimization opportunity.
- Removed inadvertent changes to RUN lines in Mips test file.
- Added -verify-machineinstrs option for PPC test.
- Changed PPC test triple to powerpc64le-unknown-linux-gnu.
I'll let this sit a bit in case there is any feedback from the ARM coders.
Given "SIGN_EXTEND_INREG(X, i1)", you can transform it to "-(X&1)". But you can't assume the input is zero-extended. I think your patch misbehaves for the following testcase: