This is an archive of the discontinued LLVM Phabricator instance.

[X86] The TEST instruction is eliminated when BSF/TZCNT is used
ClosedPublic

Authored by ikulagin on Jun 29 2018, 4:29 AM.

Details

Summary

These changes cover the PR#31399.
Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0
and it corresponds to the following llvm code:

%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
%tobool = icmp eq i32 %v, 0
%.op = add nuw nsw i32 %cnt, 1
%add = select i1 %tobool, i32 0, i32 %.op

and x86 asm code:

bsfl          %edi, %ecx
addl         $1, %ecx
testl         %edi, %edi
movl        $0, %eax
cmovnel  %ecx, %eax

In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies the EFLAGS, namely, ZF flag
that is set by the 'bsf' instruction when 'x' is zero.

I have moved the 'add' instruction below the 'cmov' instruction
at the peephole optimization stage during the compare instruction
optimization (optimizeCompareInstr), i.e., to implement
the following transformation:

bsfl    %edi, %ecx 
addl    $1, %ecx       --------* c1 = 1 
testl   %edi, %edi                |
movl    $0, %eax                 *  c2 = 0 -> c2 = c2 - c1 = -1 
                                             *  transform to movl  $-1, %eax
cmovnel %ecx, %eax          |
                               <--------|

It produces the following code:

bsfl          %edi, %ecx
movl        $-1, %eax
cmovnel  %ecx, %eax
addl         $1, %eax

Diff Detail

Repository
rL LLVM

Event Timeline

ikulagin created this revision.Jun 29 2018, 4:29 AM

I need to look at this in more detail, but does this handle the case where the flags are still live past the cmov? Moving the add down would break that.

I wonder if we can just fix this pattern as a dag combine on the CMOV node before isel? That should be a lot simpler.

I figured out that the fixing this pattern while dag combining is possible indeed. I'll try to do it and see what happens.

ikulagin updated this revision to Diff 154488.Jul 7 2018, 3:29 AM

I have fixed this pattern as a DAG combining on the CMOV node before instruction selection.
The following DAG combinations are performed:
(CMOV (ADD (CTTZ X), C), C-1, (X != 0)) -> (ADD (CMOV (CTTZ X), -1, (X != 0)), C)
(CMOV C-1, (ADD (CTTZ X), C), (X == 0)) -> (ADD (CMOV C-1, (CTTZ X), (X == 0)), C)

We need tests cases that use tzcnt too.

lib/Target/X86/X86ISelLowering.cpp
33435 ↗(On Diff #154488)

You already checked that it was definitely a constant above. So you shouldn't need a dyn_cast here.

Or you should just check the CC in the first if, then select your Add and your possible constant. And check that it is an Add and a constant.

33437 ↗(On Diff #154488)

Probably should make sure the add only has one user. Otherwise you're increasing code size.

33443 ↗(On Diff #154488)

The test cases only cover the CTTZ_ZERO_UNDEF version right?

This comment was removed by ikulagin.

The test cases only cover the CTTZ_ZERO_UNDEF version right?

I can add the test for CTTZ, but in the case of using CTTZ the TEST instruction can't be eliminated
because it produces three basic blocks as follows:

bb.0:
  liveins: $edi
  %2:gr32 = COPY $edi
  %3:gr32 = MOV32ri 32
  TEST32rr %2:gr32, %2:gr32, implicit-def $eflags
  JE_1 %bb.2, implicit $eflags
  JMP_1 %bb.1
bb.1.cond.false:
  %0:gr32 = BSF32rr %2:gr32, implicit-def dead $eflags
bb.2.cond.end:
  %1:gr32 = PHI %3:gr32, %bb.0, %0:gr32, %bb.1
  TEST32rr %2:gr32, %2:gr32, implicit-def $eflags
  %4:gr32 = MOV32ri -1
  %5:gr32 = CMOVE32rr %4:gr32, %1:gr32, implicit $eflags
  %6:gr32 = ADD32ri8 %5:gr32, 6, implicit-def dead $eflags
  $eax = COPY %6:gr32
  RET 0, $eax

We should analyze the bb.0 and bb.1.cond.false to eliminate the TEST instruction followed by the CMOV.
Does it make sence to implement such analysis and transformation or it will suffice to eliminate the TEST
when CTTZ_ZERO_UNDEF is used?

What happens with cttz when -mattr=bmi which enables the tzcnt instruction

What happens with cttz when -mattr=bmi which enables the tzcnt instruction

Sorry, I described the case when cttz(X, FALSE) is used and no support for BMI.
With MBI everything is OK and the TEST is eliminated.

ikulagin marked 3 inline comments as done.Jul 9 2018, 1:34 AM
ikulagin updated this revision to Diff 154546.Jul 9 2018, 1:39 AM

Or you should just check the CC in the first if, then select your Add and your possible constant. And check that it is an Add and a constant.

DONE.

Probably should make sure the add only has one user. Otherwise you're increasing code size.

DONE.

We need tests cases that use tzcnt too.

Tests added.

This revision is now accepted and ready to land.Jul 9 2018, 11:27 AM

Could you commit it, please? I have no rights to do it.

This revision was automatically updated to reflect the committed changes.