This is an archive of the discontinued LLVM Phabricator instance.

[InstCombineCompares] Added shl optimization for the instruction - icmp ugt/ult/uge/ule (shl Const2, A), Const1
AbandonedPublic

Authored by ankur29.garg on Nov 5 2014, 2:28 AM.

Details

Summary

Hi,
The following patch implements the optimization for the instructions of the type:
icmp ugt/ult/uge/ule (shl Const2, A), Const1

Such instructions can be converted to:
icmp (operator) A, (LeadingZeros(Const2) - LeadingZeros(Const1))

where (operator) is based on the initial operator and the actual values of the constants.

This handles the unsigned inequality operators. Equality operators are handled in patch D5839.
This patch is an extension of above patch where only equality operators are handled for similar instructions.

For Signed Inequality operators, such optimizations can't be made as with each left shift a signed number may keep changing the sign. So, such optimizations can't be performed.

Please help in reviewing it.

Thanks.
Ankur

Diff Detail

Event Timeline

ankur29.garg retitled this revision from to [InstCombineCompares] Added shl optimization for the instruction - icmp ugt/ult/uge/ule (shl Const2, A), Const1.
ankur29.garg updated this object.
ankur29.garg edited the test plan for this revision. (Show Details)
ankur29.garg set the repository for this revision to rL LLVM.
ankur29.garg added a subscriber: Unknown Object (MLST).
majnemer added inline comments.Nov 6 2014, 1:50 AM
test/Transforms/InstCombine/icmp.ll
1549–1555

Consider when %a is 31: %shl will be 0 which will mean %cmp is actually false, not true.

1557–1588

These have similar problems to the previous test case.

1590–1597

Consider when %a is 31: %shl will be 0 which will mean %cmp should be false.

However, this transform would make %cmp true.

1599–1606

Consider when %a is 30: %shl will be 0 which will mean %cmp should be true.

However, this transform would make %cmp false.

1608–1651

I'm pretty sure these aren't correct transformations either.

Hi David,
Thanks for reviewing the patch.
I have made the changes to correct the error you pointed out.

There are two cases:
Const 1 < Const 2
In this case i have added separate if else for the cases you have mentioned. So, this case is correct now.
Const 1 > Const 2
In this case, if Const 2 has some trailing zeros, that means it can be made zero by left shifting by an amount less than its bit width. In such cases, after the transformation, expression will involve two comparisons. For example:

%shl = shl i32 76, %a
%cmp = icmp ugt i32 %shl, 108
ret i1 %cmp
here %a should be greater than 0 and less than 31
So, this transformation is infact increasing the number of operations required. After the transformation this would become:
and (icmp ugt i32 A, 0), (icmp ult i32 A, 31)
This involves 3 operations (greater than the earlier 2).
So, I haven't included transformations for such cases as it is not leading to optimization.
Please suggest any other way to do this, if possible.
Please review the updated revision.

Thanks.

majnemer edited edge metadata.Nov 6 2014, 3:49 AM

Hi David,
Thanks for reviewing the patch.
I have made the changes to correct the error you pointed out.

There are two cases:
Const 1 < Const 2
In this case i have added separate if else for the cases you have mentioned. So, this case is correct now.
Const 1 > Const 2
In this case, if Const 2 has some trailing zeros, that means it can be made zero by left shifting by an amount less than its bit width. In such cases, after the transformation, expression will involve two comparisons. For example:

%shl = shl i32 76, %a
%cmp = icmp ugt i32 %shl, 108
ret i1 %cmp
here %a should be greater than 0 and less than 31
So, this transformation is infact increasing the number of operations required. After the transformation this would become:
and (icmp ugt i32 A, 0), (icmp ult i32 A, 31)

(and (icmp ugt i32 A, 0), (icmp ult i32 A, 31)) is equivalent to (icmp ult (add i32 A, -1), 30)

This involves 3 operations (greater than the earlier 2).
So, I haven't included transformations for such cases as it is not leading to optimization.
Please suggest any other way to do this, if possible.
Please review the updated revision.

Thanks.

lib/Transforms/InstCombine/InstCombineCompares.cpp
1172–1175
1181

Please clang-format this.

1185–1196

Can this be handled in one step as:

return new ICmpInst(I.getPredicate(), A,
                    ConstantInt::get(A->getType(), Shift));
ankur29.garg edited edge metadata.

Hi,
I have made the changes as per your in-line comments.

About the transformation for even const2:

"(and (icmp ugt i32 A, 0), (icmp ult i32 A, 31)) is equivalent to (icmp ult (add i32 A, -1), 30)"

This still transforms the initial two instructions into 2 instructions. I wanted to ask, is it useful to include this transformation.

Hi,
I found some errors in this optimization. Left-shifting an integer may cause it become lesser or greater than a constant based on the ordering of the bits.
For example,
AP1 = 01010000
AP2 = 00000101

if %a = 4, AP2 = AP1
if %a = 5, AP2 u> AP1
if %a = 6, AP2 u< AP1

I don't think this optimization is possible for 'shl'.
I will work on finding another way to do this, if possible.

Thanks.

ankur29.garg abandoned this revision.Nov 27 2014, 3:11 AM

This transformation is not possible for 'shl' instruction (example in the previous comment).

Thanks.