This is an archive of the discontinued LLVM Phabricator instance.

[X86] The TEST instruction is eliminated when BSF/TZCNT is used
ClosedPublic

Authored by ikulagin on Jun 29 2018, 4:29 AM.

Details

Reviewers

davide
craig.topper
spatel
RKSimon

Commits

rG02867f0fa3e6: [X86] The TEST instruction is eliminated when BSF/TZCNT is used
rL336768: [X86] The TEST instruction is eliminated when BSF/TZCNT is used

Summary

These changes address PR#31399.
Currently the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0,
which corresponds to the following LLVM IR:

%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
%tobool = icmp eq i32 %v, 0
%.op = add nuw nsw i32 %cnt, 1
%add = select i1 %tobool, i32 0, i32 %.op

and x86 asm code:

bsfl    %edi, %ecx
addl    $1, %ecx
testl   %edi, %edi
movl    $0, %eax
cmovnel %ecx, %eax
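
For readers less familiar with the intrinsic, the select described in the summary can be sketched in C++ as follows. The helper names `cttz_nonzero` and `ffs_lowered` are hypothetical and not part of the patch; the loop is only a stand-in for llvm.cttz with its zero-is-undefined flag set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm.cttz.i32(x, true): counts trailing zero
// bits. Only meaningful for x != 0, mirroring the "zero is undef" flag.
inline uint32_t cttz_nonzero(uint32_t x) {
  assert(x != 0 && "undefined for zero, like cttz with is_zero_undef=true");
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// ffs(x) written as the select from the summary:
// (x != 0) ? cttz(x) + 1 : 0
inline uint32_t ffs_lowered(uint32_t x) {
  return x != 0 ? cttz_nonzero(x) + 1 : 0;
}
```

Under this sketch, `ffs_lowered(8)` evaluates to 4 and `ffs_lowered(0)` to 0, which is the behaviour the assembly above has to preserve.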

Here the 'test' instruction can't be eliminated because
the 'add' instruction overwrites EFLAGS, in particular the ZF flag
that the 'bsf' instruction sets when 'x' is zero.

I have moved the 'add' instruction below the 'cmov' instruction
at the peephole optimization stage, during the compare instruction
optimization (optimizeCompareInstr), and adjusted the constant to
compensate. This implements the following transformation:

bsfl    %edi, %ecx
addl    $1, %ecx        # c1 = 1; this add is moved below the cmov
testl   %edi, %edi
movl    $0, %eax        # c2 = 0 -> c2 = c2 - c1 = -1, i.e. movl $-1, %eax
cmovnel %ecx, %eax
                        # <- the add is re-inserted here

It produces the following code:

bsfl    %edi, %ecx
movl    $-1, %eax
cmovnel %ecx, %eax
addl    $1, %eax
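
To make the flag argument concrete, here is a small C++ model of the two instruction orders. It is only a sketch: the `Flags` struct tracks ZF alone, all function names are hypothetical, and keeping the old destination when BSF sees a zero source is an assumption of the model (architecturally the destination is undefined in that case).

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of the one flag that matters here.
struct Flags {
  bool zf = false;
};

// BSF model: ZF is set when the source is zero. Keeping the old destination
// in that case is an assumption; architecturally the value is undefined.
inline uint32_t bsf(uint32_t src, uint32_t old_dst, Flags &f) {
  f.zf = (src == 0);
  if (src == 0)
    return old_dst;
  uint32_t n = 0;
  while ((src & 1u) == 0) {
    src >>= 1;
    ++n;
  }
  return n;
}

// ADD model: overwrites ZF based on its own result.
inline uint32_t add(uint32_t a, uint32_t b, Flags &f) {
  uint32_t r = a + b;
  f.zf = (r == 0);
  return r;
}

// CMOVNE model: takes src when ZF is clear, otherwise keeps dst.
inline uint32_t cmovne(uint32_t dst, uint32_t src, const Flags &f) {
  return f.zf ? dst : src;
}

// Original order with the TEST simply dropped: the ADD clobbers the ZF
// produced by BSF, so the CMOV sees the wrong flag.
inline uint32_t ffs_add_before_cmov(uint32_t x) {
  Flags f;
  uint32_t ecx = bsf(x, 0xDEADBEEFu, f);
  ecx = add(ecx, 1, f);
  uint32_t eax = 0; // movl $0 does not touch flags
  return cmovne(eax, ecx, f);
}

// Reordered sequence: the CMOV consumes the ZF from BSF directly, and the
// ADD runs last on the selected value (constant adjusted to -1).
inline uint32_t ffs_add_after_cmov(uint32_t x) {
  Flags f;
  uint32_t ecx = bsf(x, 0xDEADBEEFu, f);
  uint32_t eax = 0xFFFFFFFFu; // movl $-1
  eax = cmovne(eax, ecx, f);
  return add(eax, 1, f);
}
```

In this model `ffs_add_after_cmov(0)` yields 0 as ffs requires, while `ffs_add_before_cmov(0)` returns the stale register value plus one, which is why the TEST could not be dropped without moving the ADD.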

Diff Detail

Repository: rL LLVM

Event Timeline

ikulagin created this revision. Jun 29 2018, 4:29 AM

Herald added a subscriber: llvm-commits. Jun 29 2018, 4:29 AM

I need to look at this in more detail, but does this handle the case where the flags are still live past the cmov? Moving the add down would break that.

I wonder if we can just fix this pattern as a dag combine on the CMOV node before isel? That should be a lot simpler.

I found that fixing this pattern during DAG combining is indeed possible. I'll try it and see what happens.

I have fixed this pattern with a DAG combine on the CMOV node before instruction selection.
The following DAG combines are performed:
(CMOV (ADD (CTTZ X), C), C-1, (X != 0)) -> (ADD (CMOV (CTTZ X), -1, (X != 0)), C)
(CMOV C-1, (ADD (CTTZ X), C), (X == 0)) -> (ADD (CMOV C-1, (CTTZ X), (X == 0)), C)
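
The arithmetic behind these combines can be checked with a short C++ sketch. It assumes 32-bit wrapping arithmetic and uses hypothetical names (`count_trailing_zeros`, `select_before`, `select_after`, `combine_preserves_value`); it models only the values involved, not the SelectionDAG nodes. The constant placed in the new CMOV is the difference between the select constant and the add constant.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: trailing zero count, only called with x != 0.
inline uint32_t count_trailing_zeros(uint32_t x) {
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Before the combine: select(x != 0, cttz(x) + c, k)
inline uint32_t select_before(uint32_t x, uint32_t c, uint32_t k) {
  return x != 0 ? count_trailing_zeros(x) + c : k;
}

// After the combine: select(x != 0, cttz(x), k - c) + c
inline uint32_t select_after(uint32_t x, uint32_t c, uint32_t k) {
  uint32_t diff = k - c; // wraps modulo 2^32
  return (x != 0 ? count_trailing_zeros(x) : diff) + c;
}

// Compares both forms on a handful of sample inputs.
inline bool combine_preserves_value(uint32_t c, uint32_t k) {
  const uint32_t samples[] = {0u, 1u, 2u, 8u, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t x : samples) {
    if (select_before(x, c, k) != select_after(x, c, k))
      return false;
  }
  return true;
}
```

With c = 6 and k = 5 the difference is -1, which is the case named in the patterns above; c = 1 and k = 0 is the ffs case from the summary.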

craig.topper added reviewers: spatel, RKSimon. Jul 7 2018, 8:59 AM

We need test cases that use tzcnt too.

lib/Target/X86/X86ISelLowering.cpp
33435 ↗	(On Diff #154488)	You already checked that it was definitely a constant above. So you shouldn't need a dyn_cast here. Or you should just check the CC in the first if, then select your Add and your possible constant. And check that it is an Add and a constant.
33437 ↗	(On Diff #154488)	Probably should make sure the add only has one user. Otherwise you're increasing code size.
33443 ↗	(On Diff #154488)	The test cases only cover the CTTZ_ZERO_UNDEF version right?

ikulagin added a comment. Jul 9 2018, 12:05 AM

This comment was removed by ikulagin.

The test cases only cover the CTTZ_ZERO_UNDEF version right?

I can add a test for CTTZ, but when CTTZ is used the TEST instruction can't be eliminated
because the lowering produces three basic blocks, as follows:

bb.0:
  liveins: $edi
  %2:gr32 = COPY $edi
  %3:gr32 = MOV32ri 32
  TEST32rr %2:gr32, %2:gr32, implicit-def $eflags
  JE_1 %bb.2, implicit $eflags
  JMP_1 %bb.1
bb.1.cond.false:
  %0:gr32 = BSF32rr %2:gr32, implicit-def dead $eflags
bb.2.cond.end:
  %1:gr32 = PHI %3:gr32, %bb.0, %0:gr32, %bb.1
  TEST32rr %2:gr32, %2:gr32, implicit-def $eflags
  %4:gr32 = MOV32ri -1
  %5:gr32 = CMOVE32rr %4:gr32, %1:gr32, implicit $eflags
  %6:gr32 = ADD32ri8 %5:gr32, 6, implicit-def dead $eflags
  $eax = COPY %6:gr32
  RET 0, $eax

We would have to analyze bb.0 and bb.1.cond.false to eliminate the TEST instruction that feeds the CMOV.
Does it make sense to implement such an analysis and transformation, or will it suffice to eliminate the TEST
when CTTZ_ZERO_UNDEF is used?

What happens with cttz under -mattr=bmi, which enables the tzcnt instruction?

What happens with cttz under -mattr=bmi, which enables the tzcnt instruction?

Sorry, I described the case where cttz(X, FALSE) is used without BMI support.
With BMI everything is OK and the TEST is eliminated.
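
The difference between the two cases can be sketched in C++. This is an illustrative model with hypothetical names (`TzcntResult`, `tzcnt32`, `cttz_zero_defined_no_bmi`): TZCNT produces the operand size for a zero source and reports that through CF, whereas the non-BMI lowering of zero-defined cttz needs an explicit zero check around BSF, which is where the extra basic blocks in the MIR above come from.

```cpp
#include <cassert>
#include <cstdint>

// Result of the TZCNT model: the count plus the carry flag.
struct TzcntResult {
  uint32_t value;
  bool cf;
};

// TZCNT model: a zero source yields the operand size (32) and sets CF;
// otherwise CF is clear and the value is the trailing zero count.
inline TzcntResult tzcnt32(uint32_t src) {
  if (src == 0)
    return {32u, true};
  uint32_t n = 0;
  while ((src & 1u) == 0) {
    src >>= 1;
    ++n;
  }
  return {n, false};
}

// Model of cttz(x, false) without BMI: the zero case is handled by a
// branch (bb.0 in the MIR above), and BSF only runs for x != 0.
inline uint32_t cttz_zero_defined_no_bmi(uint32_t x) {
  if (x == 0)
    return 32u; // corresponds to MOV32ri 32 plus the TEST/JE in bb.0
  uint32_t n = 0; // corresponds to BSF32rr in bb.1.cond.false
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}
```

Both functions agree on the value for every input; the point of the sketch is that only the TZCNT form exposes the "source was zero" condition as a flag without a separate TEST.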

ikulagin marked 3 inline comments as done. Jul 9 2018, 1:34 AM

Or you should just check the CC in the first if, then select your Add and your possible constant. And check that it is an Add and a constant.

DONE.

Probably should make sure the add only has one user. Otherwise you're increasing code size.

DONE.

We need test cases that use tzcnt too.

Tests added.

LGTM

This revision is now accepted and ready to land. Jul 9 2018, 11:27 AM

Could you commit it, please? I don't have commit access.

Closed by commit rL336768: [X86] The TEST instruction is eliminated when BSF/TZCNT is used (authored by ctopper). Jul 11 2018, 12:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp       30 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp          7 lines
llvm/trunk/test/CodeGen/X86/dagcombine-select.ll    95 lines

Diff 154939

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

@@ if (checkBoolTestAndOrSetCCCombine(Cond, CC0, CC1, Flags, isAndSetCC)) {
                         Flags};
       SDValue LCMOV = DAG.getNode(X86ISD::CMOV, DL, N->getValueType(0), LOps);
       SDValue Ops[] = {LCMOV, TrueOp, DAG.getConstant(CC1, DL, MVT::i8), Flags};
       SDValue CMOV = DAG.getNode(X86ISD::CMOV, DL, N->getValueType(0), Ops);
       return CMOV;
     }
   }
 
+  // Handle (CMOV (ADD (CTTZ X), C), C-1, (X != 0)) ->
+  // (ADD (CMOV (CTTZ X), -1, (X != 0)), C) or
+  // (CMOV C-1, (ADD (CTTZ X), C), (X == 0)) ->
+  // (ADD (CMOV C-1, (CTTZ X), (X == 0)), C)
+  if (CC == X86::COND_NE || CC == X86::COND_E) {
+    auto *Cnst = CC == X86::COND_E ? dyn_cast<ConstantSDNode>(TrueOp)
+                                   : dyn_cast<ConstantSDNode>(FalseOp);
+    SDValue Add = CC == X86::COND_E ? FalseOp : TrueOp;
+
+    if (Cnst && Add.getOpcode() == ISD::ADD && Add.hasOneUse()) {
+      auto *AddOp1 = dyn_cast<ConstantSDNode>(Add.getOperand(1));
+      SDValue AddOp2 = Add.getOperand(0);
+      if (AddOp1 && (AddOp2.getOpcode() == ISD::CTTZ_ZERO_UNDEF ||
+                     AddOp2.getOpcode() == ISD::CTTZ)) {
+        APInt Diff = Cnst->getAPIntValue() - AddOp1->getAPIntValue();
+        if (CC == X86::COND_NE) {
+          Add = DAG.getNode(X86ISD::CMOV, DL, Add.getValueType(), AddOp2,
+                            DAG.getConstant(Diff, DL, Add.getValueType()),
+                            DAG.getConstant(CC, DL, MVT::i8), Cond);
+        } else {
+          Add = DAG.getNode(X86ISD::CMOV, DL, Add.getValueType(),
+                            DAG.getConstant(Diff, DL, Add.getValueType()),
+                            AddOp2, DAG.getConstant(CC, DL, MVT::i8), Cond);
+        }
+        return DAG.getNode(X86ISD::ADD, DL, Add.getValueType(), Add,
+                           SDValue(AddOp1, 0));
+      }
+    }
+  }
+
   return SDValue();
 }
 
 /// Different mul shrinking modes.
 enum ShrinkMode { MULS8, MULU8, MULS16, MULU16 };
 
 static bool canReduceVMulWidth(SDNode *N, SelectionDAG &DAG, ShrinkMode &Mode) {
   EVT VT = N->getOperand(0).getValueType();

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

@@ static X86::CondCode isUseDefConvertible(MachineInstr &MI) {
   case X86::POPCNT16rr:case X86::POPCNT16rm:
   case X86::POPCNT32rr:case X86::POPCNT32rm:
   case X86::POPCNT64rr:case X86::POPCNT64rm:
     return X86::COND_E;
   case X86::TZCNT16rr: case X86::TZCNT16rm:
   case X86::TZCNT32rr: case X86::TZCNT32rm:
   case X86::TZCNT64rr: case X86::TZCNT64rm:
     return X86::COND_B;
+  case X86::BSF16rr:
+  case X86::BSF16rm:
+  case X86::BSF32rr:
+  case X86::BSF32rm:
+  case X86::BSF64rr:
+  case X86::BSF64rm:
+    return X86::COND_E;
   }
 }
 
 /// Check if there exists an earlier instruction that
 /// operates on the same source operands and sets flags in the same way as
 /// Compare; remove Compare if possible.
 bool X86InstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, unsigned SrcReg,
                                         unsigned SrcReg2, int CmpMask,

llvm/trunk/test/CodeGen/X86/dagcombine-select.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=x86_64-unknown-unknown -verify-machineinstrs < %s | FileCheck -enable-var-scope %s
+; RUN: llc -mtriple=x86_64-unknown-unknown -verify-machineinstrs -mattr=+bmi < %s | FileCheck -check-prefix=BMI -enable-var-scope %s
 
 define i32 @select_and1(i32 %x, i32 %y) {
 ; CHECK-LABEL: select_and1:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: xorl %eax, %eax
 ; CHECK-NEXT: cmpl $11, %edi
 ; CHECK-NEXT: cmovgel %esi, %eax
 ; CHECK-NEXT: retq
@@
 ; CHECK-NEXT: retq
 ; CHECK-NEXT: .LBB19_1:
 ; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
 ; CHECK-NEXT: retq
   %sel = select i1 %cond, double -4.0, double 23.3
   %bo = frem double 5.1, %sel
   ret double %bo
 }
+
+declare i64 @llvm.cttz.i64(i64, i1)
+define i64 @cttz_64_eq_select(i64 %v) nounwind {
+; CHECK-LABEL: cttz_64_eq_select:
+; CHECK: # %bb.0:
+; CHECK-NEXT: bsfq %rdi, %rcx
+; CHECK-NEXT: movq $-1, %rax
+; CHECK-NEXT: cmoveq %rcx, %rax
+; CHECK-NEXT: addq $6, %rax
+; CHECK-NEXT: retq
+
+; BMI-LABEL: cttz_64_eq_select:
+; BMI: # %bb.0:
+; BMI-NEXT: tzcntq %rdi, %rcx
+; BMI-NEXT: movq $-1, %rax
+; BMI-NEXT: cmovbq %rcx, %rax
+; BMI-NEXT: addq $6, %rax
+; BMI-NEXT: retq
+  %cnt = tail call i64 @llvm.cttz.i64(i64 %v, i1 true)
+  %tobool = icmp eq i64 %v, 0
+  %.op = add nuw nsw i64 %cnt, 6
+  %add = select i1 %tobool, i64 5, i64 %.op
+  ret i64 %add
+}
+
+define i64 @cttz_64_ne_select(i64 %v) nounwind {
+; CHECK-LABEL: cttz_64_ne_select:
+; CHECK: # %bb.0:
+; CHECK-NEXT: bsfq %rdi, %rcx
+; CHECK-NEXT: movq $-1, %rax
+; CHECK-NEXT: cmoveq %rcx, %rax
+; CHECK-NEXT: addq $6, %rax
+; CHECK-NEXT: retq
+
+; BMI-LABEL: cttz_64_ne_select:
+; BMI: # %bb.0:
+; BMI-NEXT: tzcntq %rdi, %rcx
+; BMI-NEXT: movq $-1, %rax
+; BMI-NEXT: cmovbq %rcx, %rax
+; BMI-NEXT: addq $6, %rax
+; BMI-NEXT: retq
+  %cnt = tail call i64 @llvm.cttz.i64(i64 %v, i1 true)
+  %tobool = icmp ne i64 %v, 0
+  %.op = add nuw nsw i64 %cnt, 6
+  %add = select i1 %tobool, i64 %.op, i64 5
+  ret i64 %add
+}
+
+declare i32 @llvm.cttz.i32(i32, i1)
+define i32 @cttz_32_eq_select(i32 %v) nounwind {
+; CHECK-LABEL: cttz_32_eq_select:
+; CHECK: # %bb.0:
+; CHECK-NEXT: bsfl %edi, %ecx
+; CHECK-NEXT: movl $-1, %eax
+; CHECK-NEXT: cmovel %ecx, %eax
+; CHECK-NEXT: addl $6, %eax
+; CHECK-NEXT: retq
+
+; BMI-LABEL: cttz_32_eq_select:
+; BMI: # %bb.0:
+; BMI-NEXT: tzcntl %edi, %ecx
+; BMI-NEXT: movl $-1, %eax
+; BMI-NEXT: cmovbl %ecx, %eax
+; BMI-NEXT: addl $6, %eax
+; BMI-NEXT: retq
+  %cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
+  %tobool = icmp eq i32 %v, 0
+  %.op = add nuw nsw i32 %cnt, 6
+  %add = select i1 %tobool, i32 5, i32 %.op
+  ret i32 %add
+}
+
+define i32 @cttz_32_ne_select(i32 %v) nounwind {
+; CHECK-LABEL: cttz_32_ne_select:
+; CHECK: # %bb.0:
+; CHECK-NEXT: bsfl %edi, %ecx
+; CHECK-NEXT: movl $-1, %eax
+; CHECK-NEXT: cmovel %ecx, %eax
+; CHECK-NEXT: addl $6, %eax
+; CHECK-NEXT: retq
+
+; BMI-LABEL: cttz_32_ne_select:
+; BMI: # %bb.0:
+; BMI-NEXT: tzcntl %edi, %ecx
+; BMI-NEXT: movl $-1, %eax
+; BMI-NEXT: cmovbl %ecx, %eax
+; BMI-NEXT: addl $6, %eax
+; BMI-NEXT: retq
+  %cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
+  %tobool = icmp ne i32 %v, 0
+  %.op = add nuw nsw i32 %cnt, 6
+  %add = select i1 %tobool, i32 %.op, i32 5
+  ret i32 %add
+}