This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] fold (add/uaddo (xor a, -1), 1) -> (sub 0, a)
AbandonedPublic

Authored by deadalnix on May 21 2017, 7:42 PM.

Details

Summary

As per title. This is the very straightforward 2 complement subtraction transaform.

Event Timeline

deadalnix created this revision.May 21 2017, 7:42 PM

This kicks in for fold-pcmpeqd-2.ll . Looking at the assembly, things looks good, but I'm not really sure what this test is testing for, so if someone familiar could advice on what to do, that'd be great. @chandlerc , @dblaikie you worked on that, can you advice ?

efriedma added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2095

Are you sure this sets the overflow bit correctly?

deadalnix added inline comments.May 22 2017, 8:39 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2095

You made me doubt, so I put some test together, and it does indeed overflow correctly.

efriedma added inline comments.May 23 2017, 11:06 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2095

Are you really, 100% sure it sets the overflow bit correctly? "uaddo (a ^ -1), 1" sets the overflow bit if "a ^ 1 == -1", or "a == 0". "usubo 0, a" sets the overflow bit if "a !=0". Simple test program to show this:

#include <stdio.h>
__attribute((noinline)) int f(unsigned a, unsigned *ovf) {
   return __builtin_sub_overflow(0, a, ovf);
}
__attribute((noinline)) int f2(unsigned a, unsigned *ovf) {
  return __builtin_add_overflow(a ^ -1, 1, ovf);
}
int main() {
  unsigned sum, ovf;
  ovf = f(0, &sum); printf("%u %u\n", sum, ovf);
  ovf = f2(0, &sum); printf("%u %u\n", sum, ovf);
  ovf = f(1, &sum); printf("%u %u\n", sum, ovf);
  ovf = f2(1, &sum); printf("%u %u\n", sum, ovf);
}

I have no idea what clang is doing there. It seems like the intrinsic do not map directly to the uaddo/usubo. See for yourself the generated IR (in clang 3.8 that's what I have available ATM):

; Function Attrs: noinline norecurse nounwind uwtable
define i32 @f(i32 %a, i32* nocapture %ovf) #0 {
  %1 = zext i32 %a to i33
  %2 = sub nsw i33 0, %1
  %3 = trunc i33 %2 to i32
  %4 = and i33 %2, 4294967295
  %5 = icmp ne i33 %4, %2
  store i32 %3, i32* %ovf, align 4
  %6 = zext i1 %5 to i32
  ret i32 %6
}

I'm not working from C to begin with, so I'm not super familiar with these intrinsic and what they are supposed to do. i definitively want to get to the bottom of this and make sure this is correct, but we can't conclude from the result of the intrinsic that it is the case. Sadly, I'm not on my work computer right now, so I can't check out what DAG is generated and how it is combined ATM in this specific case.

Oh, oops, posted the wrong version. Corrected:

#include <stdio.h>
__attribute((noinline)) int f(unsigned a, unsigned *ovf) {
   return __builtin_usub_overflow(0, a, ovf);
}
__attribute((noinline)) int f2(unsigned a, unsigned *ovf) {
  return __builtin_uadd_overflow(a ^ -1, 1, ovf);
}
int main() {
  unsigned sum, ovf;
  ovf = f(0, &sum); printf("%u %u\n", sum, ovf);
  ovf = f2(0, &sum); printf("%u %u\n", sum, ovf);
  ovf = f(1, &sum); printf("%u %u\n", sum, ovf);
  ovf = f2(1, &sum); printf("%u %u\n", sum, ovf);
}

(I have no idea why clang generates such weird code for overloaded versions of the intrinsics.)

deadalnix abandoned this revision.May 25 2017, 2:48 AM

OK I was able to dig more. Something is screwed up with my test case. This is indeed not doing the right thing with the carry.