This is an archive of the discontinued LLVM Phabricator instance.

PR19958 wrong code at -O1 and above on x86_64-linux-gnu (InstCombine)
ClosedPublic

Authored by suyog on Jun 9 2014, 4:07 AM.

Details

Summary

This patch rectifies icmp instruction combine problem seen because of previous optimization patch for 'ashr/lshr exact' as mentioned in bug 19958.
I admit i wrote the previous patch keeping only equality in mind (not considering all cases is bad practice, will be careful in future).

This patch for now handles 'icmp eq/ne' only for 'ashr/lsher exact'. Added relevant test cases.

For other icmp instructions, some observations:

define i1 @f(i32 %x) {

%y = lshr exact i32 8, %x
%cmp = icmp ult i32 %y, 2
ret i1 %cmp

}

Here, in above example, as %x increases, %y decreases and hence comparison with same icmp instruction was giving wrong output.

while in ,

define i1 @f(i32 %x) {

%y = lshr exact i32 -8, %x
%cmp = icmp ult i32 %y, -2
ret i1 %cmp

}

as %x increases, %y also increases mathematically. Hence comparison with same icmp instruction is fine in this case.

This was not captured in original patch and hence wrong output. This can be handled in following two ways:

  1. if const1,const2 are both positive, then icmp instruction should be swapped (something like getSwappedPredicate()). if const1,const2 are both negative, then keep the icmp instruction as it is. Note : getInversePredicate() won't work.
  2. if const1/const2 are both positive, swap the operands, keeping the icmp instruction as it is. No change if const1,const2 are both negative.

Note : Cases where const1 and const2 have opposite sign - one is positive other is negative are always true/false.

This is handled by simplifyicmp call and it never reaches our patch. I checked it with both ashr/lshr for this case.

I will verify if above two approaches for other icmp instructions are universally true for both ashr/lshr and come up with another patch.
Added a TODO for the same.

Please review this patch which handles equality for ashr/lshr exact.
Any comments/suggestions are most welcomed.

Thanks.

  • Suyog

Diff Detail

Repository
rL LLVM

Event Timeline

suyog updated this revision to Diff 10235.Jun 9 2014, 4:07 AM
suyog retitled this revision from to PR19958 wrong code at -O1 and above on x86_64-linux-gnu (InstCombine).
suyog updated this object.
suyog edited the test plan for this revision. (Show Details)
suyog added reviewers: rafael, spatel, nlewycky.
suyog added a subscriber: Unknown Object (MLST).
spatel edited edge metadata.Jun 9 2014, 1:29 PM

Please check for typos: lhsr -> lshr

spatel added a comment.Jun 9 2014, 1:33 PM

I'm not sure what LLVM test case policy is, but I think it would be good to have contra-test cases too. Eg, a test case that doesn't specify 'exact', a test case that isn't eq/ne - these would ensure that your code is not firing when it wasn't intended. Also, can you include a 'ne' test?

suyog updated this revision to Diff 10263.Jun 10 2014, 12:21 AM
suyog edited edge metadata.

Corrected typo.

Added test cases for

  • exact ne
  • no exact
  • no ne/eq

Please help in reviewing if this looks good or any more test cases are required.

Thanks

  • Suyog

Minor nit: fear the 80-column police
// (icmp eq/ne (ashr exact const2, A), const1) -> icmp eq/ne A, Log2(const2/const1)

This line's indentation looks wrong too:
return new ICmpInst(I.getPredicate(), A,

Major nit:
unsigned shift = Quotient.logBase2();

What if Quotient isn't power of 2?

I think this leads to miscompiled code:
$ cat 19958.ll
@a = common global i32 0, align 4
define i1 @main() {
%a = load i32* @a, align 4
%shr = lshr exact i32 80, %a
%cmp = icmp eq i32 %shr, 41 ; NOTE: non-power-of-2 comparison
ret i1 %cmp
}
$ ./opt 19958.ll -S | ./llc | ./clang -x assembler - -o a.out ; ./a.out ; echo $?
0
$ ./opt -instcombine 19958.ll -S | ./llc | ./clang -x assembler - -o a.out ; ./a.out ; echo $?
1

I like the additional test cases, but they would be better if the constants were at the limits rather than just random numbers in the middle of the possible values.

suyog updated this revision to Diff 10383.Jun 13 2014, 4:55 AM

Thanks for the review.

Updated patch to handle exact division and exact log2. We were handling exact division and exact log2 for ashr in instsimplify, and not for lshr.

Now we are handling for both. Updated test cases which includes constants to the extreme limit and not something random in between.
Also took care of 80 lines clang-format. :)

Please help in reviewing the patch.

suyog added a comment.Jun 24 2014, 9:38 AM

Gentle Ping !!

suyog added a comment.Jul 1 2014, 6:34 AM

Gentle Ping !!

spatel added a comment.Jul 2 2014, 2:33 PM

Sorry for the delay. LGTM...but considering that versions of this patch have caused miscompiles twice and I'm pretty new here, I think we should have someone with more instcombine experience give final approval in case I've missed anything.

nicholas added inline comments.
lib/Transforms/InstCombine/InstCombineCompares.cpp
2349 ↗(On Diff #10383)

APInt::isMinValue() is faster than APInt::operator== even though it's less clear to read.

2350 ↗(On Diff #10383)

APInt::sdivrem is as fast as a single sdiv or srem. Please use it then text the 'rem' results.

test/Transforms/InstCombine/icmp.ll
1427 ↗(On Diff #10383)

The bugs from last time were due to icmp non-equality comparisons, right? Are those tests already in this code or should you be adding tests to make sure you don't miscompile those cases here?

suyog updated this revision to Diff 11125.Jul 7 2014, 11:37 AM
suyog added a reviewer: dexonsmith.

Hi Duncan, Nick, Rafael, Sanjay,

I have modified the code to calculate shift by Log2(Const2) - Log2(Const1)
instead of Log2(Const2/Const1) to avoid expensive division operation as per Duncan's suggestion.
Also handled few more conditions for 0.
Added additional test cases as well.

(I am doing this for icmp eq/ne only for now,
other icmp instructions are TODO item as they need some verification)

Can you please help in reviewing this modified patch.

Your suggestions/comments are most awaited. :)

Thanks,
Suyog

suyog updated this revision to Diff 11217.Jul 9 2014, 1:25 PM

Hi Duncan,

Thanks for your explanation. I have handled case for 'lshr' separately now in the patch attached.

I have combined the logic for ashr/lshr as well as exact/non-exact as most of the logic is same.
Wherever there are special cases, i have handled them separately. I have added the logic for the
cases which were handled by 'instsimplify' to keep the logical flow as discussed.

I have combined the logic, since the 'match' functions are expensive, same for the log,
and our whole point of calculating difference of log was to avoid expensive division operation.

Please help in reviewing the patch. Your comments/suggestions are most welcomed :)

suyog updated this revision to Diff 11326.Jul 11 2014, 1:24 PM
suyog added a subscriber: dexonsmith.

Hi Duncan,

Thanks for your review.
Couldn't spot the miscompiles earlier because i didn't comment out
the 'simplify' call. Now i modified the code and tested almost every
combination.

I have modified the code as per your suggestions :

  • Made comments short and crisp as per your suggestion.
  • included lambda helper functions to return appropriate constant and icmp instruction.
  • included logic for both constants -1 for ashr as suggested (In my opinion if both constants are equal and not -1, then final icmp would be icmp eq/ne A, 0. The Predicate won't get inversed and will remain same. You had suggested using lambda function here to get inversed predicate. Correct me if my analysis is wrong.)
  • used 'bool IsAShr' to avoid recalculation.
  • removed LShrOperator block which was causing miscompiles.
  • removed unnecessary shl/lshr distinction for exact and non exact
  • moved all test cases to new file icmp-ashr.ll
  • used msb_high/low instead of +ve/-ve and opposite msb
  • tried including all combinations of ne/eq + exact/non-exact + lshr/ashr

Please help in reviewing the patch.

Your comments/suggestions are valuable and most awaited. :)

Thanks,
Suyog

suyog updated this revision to Diff 11351.Jul 13 2014, 10:08 AM

Hi Duncan,

Made changes as per your suggestions :

  • Wrote the whole logic in separate function. Didn't make it 'static', took 'cue' from other 'Foldicmp' functions.
  • Removed TODO, will probably raise a separate PR for it.
  • Updated comments, removed extra braces.
  • defined variables just before their use.

Can you please see if this looks good to you?
Your comments/suggestions are most welcomed.

Thanks,
Suyog

suyog updated this revision to Diff 11609.Jul 17 2014, 1:26 PM

Hi Duncan, Nick

As there is no guidance in the coding standards,
i looked at few files in InstCombine and comment style in them and
i found it better to include braces, wherever there is a comment inside 'if'.

if ( ) {
// comment
}

I removed/didn't add braces where there is no comment and a single statement block.

Please see if this looks good to you !!

suyog accepted this revision.Jul 22 2014, 12:21 PM
suyog added a reviewer: suyog.
This revision is now accepted and ready to land.Jul 22 2014, 12:21 PM
suyog closed this revision.Jul 22 2014, 12:28 PM
suyog updated this revision to Diff 11773.

Closed by commit rL213678 (authored by @suyog).