This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] fold (add/uaddo (xor a, -1), 1) -> (sub 0, a)
AbandonedPublic

Authored by deadalnix on May 21 2017, 7:42 PM.


Details

Reviewers

chandlerc
jyknight
nemanjai
mkuper
spatel
RKSimon
zvi
bkramer

Summary

As per title. This is the straightforward two's complement negation transform: (~a) + 1 == 0 - a.

Diff Detail

Build Status

Buildable 6636
Build 6636: arc lint + arc unit

Event Timeline

deadalnix created this revision. May 21 2017, 7:42 PM

This kicks in for fold-pcmpeqd-2.ll. Looking at the assembly, things look good, but I'm not really sure what this test is testing for, so if someone familiar with it could advise on what to do, that'd be great. @chandlerc, @dblaikie, you worked on that; can you advise?


efriedma added a subscriber: efriedma. May 22 2017, 2:53 PM

efriedma added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2095	Are you sure this sets the overflow bit correctly?

dblaikie removed a reviewer: dblaikie. May 22 2017, 4:48 PM

deadalnix added inline comments. May 22 2017, 8:39 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2095	You made me doubt, so I put some tests together, and it does indeed overflow correctly.

efriedma added inline comments. May 23 2017, 11:06 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

2095	Are you really, 100% sure it sets the overflow bit correctly? "uaddo (a ^ -1), 1" sets the overflow bit if "a ^ -1 == -1", i.e. "a == 0". "usubo 0, a" sets the overflow bit if "a != 0". Simple test program to show this:

#include <stdio.h>
__attribute((noinline)) int f(unsigned a, unsigned *ovf) {
   return __builtin_sub_overflow(0, a, ovf);
}
__attribute((noinline)) int f2(unsigned a, unsigned *ovf) {
  return __builtin_add_overflow(a ^ -1, 1, ovf);
}
int main() {
  unsigned sum, ovf;
  ovf = f(0, &sum); printf("%u %u\n", sum, ovf);
  ovf = f2(0, &sum); printf("%u %u\n", sum, ovf);
  ovf = f(1, &sum); printf("%u %u\n", sum, ovf);
  ovf = f2(1, &sum); printf("%u %u\n", sum, ovf);
}

I have no idea what clang is doing there. It seems the intrinsics do not map directly to uaddo/usubo. See for yourself the generated IR (from clang 3.8, which is what I have available at the moment):

; Function Attrs: noinline norecurse nounwind uwtable
define i32 @f(i32 %a, i32* nocapture %ovf) #0 {
  %1 = zext i32 %a to i33
  %2 = sub nsw i33 0, %1
  %3 = trunc i33 %2 to i32
  %4 = and i33 %2, 4294967295
  %5 = icmp ne i33 %4, %2
  store i32 %3, i32* %ovf, align 4
  %6 = zext i1 %5 to i32
  ret i32 %6
}

I'm not working from C to begin with, so I'm not very familiar with these intrinsics and what they are supposed to do. I definitely want to get to the bottom of this and make sure it is correct, but we can't conclude that from the result of the intrinsic. Sadly, I'm not on my work computer right now, so I can't check what DAG is generated and how it is combined in this specific case.

Oh, oops, posted the wrong version. Corrected:

#include <stdio.h>
__attribute((noinline)) int f(unsigned a, unsigned *ovf) {
   return __builtin_usub_overflow(0, a, ovf);
}
__attribute((noinline)) int f2(unsigned a, unsigned *ovf) {
  return __builtin_uadd_overflow(a ^ -1, 1, ovf);
}
int main() {
  unsigned sum, ovf;
  ovf = f(0, &sum); printf("%u %u\n", sum, ovf);
  ovf = f2(0, &sum); printf("%u %u\n", sum, ovf);
  ovf = f(1, &sum); printf("%u %u\n", sum, ovf);
  ovf = f2(1, &sum); printf("%u %u\n", sum, ovf);
}

(I have no idea why clang generates such weird code for overloaded versions of the intrinsics.)

OK I was able to dig more. Something is screwed up with my test case. This is indeed not doing the right thing with the carry.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	14 lines
test/CodeGen/X86/subcarry.ll	7 lines

Diff 99713

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


@@ SDValue DAGCombiner::visitADD(SDNode *N) {
   if (SimplifyDemandedBits(SDValue(N, 0)))
     return SDValue(N, 0);
 
   // fold (a+b) -> (a|b) iff a and b share no bits.
   if ((!LegalOperations || TLI.isOperationLegal(ISD::OR, VT)) &&
       VT.isInteger() && DAG.haveNoCommonBitsSet(N0, N1))
     return DAG.getNode(ISD::OR, DL, VT, N0, N1);
 
+  // fold (add (xor a, -1), 1) -> (sub 0, a)
+  if (N0.getOpcode() == ISD::XOR &&
+      isOneConstantOrOneSplatConstant(N1) &&
+      isAllOnesConstantOrAllOnesSplatConstant(N0.getOperand(1)))
+    return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT),
+                       N0.getOperand(0));
+
   if (SDValue Combined = visitADDLike(N0, N1, N))
     return Combined;
 
   if (SDValue Combined = visitADDLike(N1, N0, N))
     return Combined;
 
   return SDValue();
 }
@@ SDValue DAGCombiner::visitUADDO(SDNode *N) {
   if (isNullConstant(N1))
     return CombineTo(N, N0, DAG.getConstant(0, DL, CarryVT));
 
   // If it cannot overflow, transform into an add.
   if (DAG.computeOverflowKind(N0, N1) == SelectionDAG::OFK_Never)
     return CombineTo(N, DAG.getNode(ISD::ADD, DL, VT, N0, N1),
                      DAG.getConstant(0, DL, CarryVT));
 
+  // fold (uaddo (xor a, -1), 1) -> (sub 0, a)
+  if (N0.getOpcode() == ISD::XOR &&
+      isOneConstantOrOneSplatConstant(N1) &&
+      isAllOnesConstantOrAllOnesSplatConstant(N0.getOperand(1)))
+    return DAG.getNode(ISD::USUBO, DL, N->getVTList(),
+                       DAG.getConstant(0, DL, VT), N0.getOperand(0));
+
   if (SDValue Combined = visitUADDOLike(N0, N1, N))
     return Combined;
 
   if (SDValue Combined = visitUADDOLike(N1, N0, N))
     return Combined;
 
   return SDValue();
 }

test/CodeGen/X86/subcarry.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s
 
 %S = type { [4 x i64] }
 
 define %S @negate(%S* nocapture readonly %this) {
 ; CHECK-LABEL: negate:
 ; CHECK: # BB#0: # %entry
-; CHECK-NEXT: movq (%rsi), %rax
+; CHECK-NEXT: xorl %eax, %eax
+; CHECK-NEXT: subq (%rsi), %rax
 ; CHECK-NEXT: movq 8(%rsi), %rcx
-; CHECK-NEXT: notq %rax
-; CHECK-NEXT: addq $1, %rax
+; CHECK-NEXT: movq 16(%rsi), %rdx
 ; CHECK-NEXT: notq %rcx
 ; CHECK-NEXT: adcq $0, %rcx
-; CHECK-NEXT: movq 16(%rsi), %rdx
 ; CHECK-NEXT: notq %rdx
 ; CHECK-NEXT: adcq $0, %rdx
 ; CHECK-NEXT: movq 24(%rsi), %rsi
 ; CHECK-NEXT: notq %rsi
 ; CHECK-NEXT: adcq $0, %rsi
 ; CHECK-NEXT: movq %rax, (%rdi)
 ; CHECK-NEXT: movq %rcx, 8(%rdi)
 ; CHECK-NEXT: movq %rdx, 16(%rdi)
