
Details

Reviewers

spatel
RKSimon
llvm-commits

Summary

In the provided test case, DAG combiner fails to recognize and combine `(srl
(shl, i8:c0), i64:c1)` even though c0 == c1 in value (but exist as separate
SDNodes because of their different value types). The solution then is to compare
by value rather than node ID.
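
To make the distinction concrete, here is a minimal sketch in plain C++ (not LLVM code). `ToyConstant` and `ToyDAG` are hypothetical stand-ins for constant SDNodes and the DAG's node uniquing. The sketch assumes constants are uniqued per (bit width, value) pair, so an i8 32 and an i64 32 end up as different nodes even though their values agree.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-in for a constant SDNode: a bit width plus a value.
struct ToyConstant {
  unsigned Bits;
  uint64_t Value;
};

// Hypothetical stand-in for constant uniquing: one node per
// (bit width, value) pair, so equal values of different types never
// share a node.
class ToyDAG {
  std::map<std::pair<unsigned, uint64_t>, ToyConstant> Nodes;

public:
  const ToyConstant *getConstant(unsigned Bits, uint64_t Value) {
    assert(Bits >= 1 && Bits <= 64 && "toy model only handles up to 64 bits");
    auto It = Nodes.try_emplace(std::make_pair(Bits, Value),
                                ToyConstant{Bits, Value}).first;
    return &It->second; // std::map keeps element addresses stable
  }
};

// Comparison by node identity, analogous to `N0.getOperand(1) == N1`.
inline bool sameNode(const ToyConstant *A, const ToyConstant *B) {
  return A == B;
}

// Comparison by value: ignores the type the constant happens to carry.
inline bool sameValue(const ToyConstant *A, const ToyConstant *B) {
  return A->Value == B->Value;
}
```

In this model, `getConstant(8, 32)` and `getConstant(64, 32)` fail `sameNode` but pass `sameValue`, which is the situation the fold currently misses.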

Note that InstCombine (and thus, opt) fails to combine this as well.

This was discovered with the following bit of C:

#include <stdint.h>

typedef struct {
  uint32_t low;
  uint32_t high;
} pair;

pair cmpxchg8b(uint64_t *m, pair src) {
  pair rv = {0, 0};
  asm volatile("cmpxchg8b %2"
               : "+a"(rv.low), "+d"(rv.high), "+m"(m)
               : "b"(src.low), "c"(src.high)
               : "flags");
  return rv;
}

unsigned f(uint64_t *m, unsigned a, unsigned b) {
  pair p = {a > 0, b > 0};
  pair rv = cmpxchg8b(m, p);
  return rv.low != 0 && rv.high > 0;
}

for which clang generates:

f:
        pushq   %rbx
        testl   %esi, %esi
        setne   %al
        testl   %edx, %edx
        setne   %cl
        movzbl  %al, %ebx
        movzbl  %cl, %ecx
        movq    %rdi, -8(%rsp)
        xorl    %edx, %edx
        xorl    %eax, %eax
        #APP
        cmpxchg8b       -8(%rsp)
        #NO_APP
        testl   %eax, %eax
        setne   %al
        movl    %edx, %ecx      # <======
        testq   %rcx, %rcx
        setne   %cl
        andb    %al, %cl
        movzbl  %cl, %eax
        popq    %rbx
        retq

Diff Detail

Repository: rL LLVM

Event Timeline

bryant updated this revision to Diff 65224. Jul 23 2016, 4:57 AM

bryant retitled this revision to [DAGCombine] Match shift amount by value rather than relying on common sub-expressions.

bryant updated this object.

bryant added reviewers: llvm-commits, spatel, RKSimon.

bryant set the repository for this revision to rL LLVM.

bryant added a subscriber: llvm-commits.

RKSimon added inline comments. Jul 23 2016, 5:14 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4834	Please regenerate the patch with context, something like `svn diff --diff-cmd=diff -x -U999999`.
test/CodeGen/X86/cmp-zext-combine.ll
3	You shouldn't need the -O3. Please can you add an i686 test case as well, and use utils/update_llc_test_checks.py to generate full code output (you will need to add check prefixes to the 32/64 tests).

RKSimon mentioned this in D22626: [X86] Simplify cmp-zext-constant DAG patterns. Jul 23 2016, 5:15 AM

Diff updated with more context. Updated test case.

bryant marked 2 inline comments as done. Jul 23 2016, 2:23 PM

bryant added inline comments.

test/CodeGen/X86/cmp-zext-combine.ll
4	Updated with the output of the python tool. I am not sure at all how to replicate this on i686, as the combine rules are different for zext i16 to i32.

eli.friedman added a subscriber: eli.friedman. Jul 23 2016, 3:32 PM

eli.friedman added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4854	You can't use getZExtValue on an arbitrary constant; it will crash if the constant is too large.

Add uint64_t check to N0's shift amount operand.

bryant set the repository for this revision to rL LLVM. Jul 24 2016, 8:17 PM

bryant added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4854	Thanks. APInt indeed asserts that the bit width occupied by the value fits within a `uint64_t`. But doesn't the code two lines below (and in the rest of this function) make the same assumption about constant nodes, albeit about `N1C`?

eli.friedman added inline comments. Jul 25 2016, 10:48 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4854	There's some code near the beginning of the function which makes sure N1C is less than the bitwidth of the LHS. (LLVM only supports integers with sizes up to 2^24 bits or so.) Granted, it looks like it also uses getZExtValue incorrectly, so the following crashes:

define <2 x i128> @y(<2 x i128>* byval align 32) #0 {
entry:
  %a.addr = alloca <2 x i128>, align 32
  %a = load <2 x i128>, <2 x i128>* %0, align 32
  store <2 x i128> %a, <2 x i128>* %a.addr, align 32
  %1 = load <2 x i128>, <2 x i128>* %a.addr, align 32
  %shr = lshr <2 x i128> %1, <i128 -1, i128 -1>
  ret <2 x i128> %shr
}

Patch welcome. :)
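
The concern about calling getZExtValue on an arbitrary constant can be sketched without LLVM. The code below is a toy model only: `ToyWide128`, `tryGetZExtValue`, and `isValidShiftAmount` are hypothetical names, and the 128-bit value is modeled as two 64-bit halves. It shows the general idea of checking that a wide constant fits in 64 bits before extracting it, rather than asserting afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical 128-bit constant, split into two 64-bit halves.
struct ToyWide128 {
  uint64_t Hi;
  uint64_t Lo;
};

// Returns the value as uint64_t only when it actually fits; a checked
// alternative to an accessor that asserts on oversized values.
inline std::optional<uint64_t> tryGetZExtValue(const ToyWide128 &C) {
  if (C.Hi != 0)
    return std::nullopt;
  return C.Lo;
}

// A shift amount is usable only if it fits in 64 bits and is strictly
// below the bit width of the value being shifted.
inline bool isValidShiftAmount(const ToyWide128 &Amt, unsigned BitWidth) {
  assert(BitWidth > 0 && "bit width must be positive");
  std::optional<uint64_t> V = tryGetZExtValue(Amt);
  return V && *V < BitWidth;
}
```

With this shape, a shift amount such as `i128 -1` (both halves all ones) is rejected up front instead of tripping an assertion.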

RKSimon mentioned this in rL276855: [DAGCombiner] Use APInt directly to detect out of range shift constants. Jul 27 2016, 3:38 AM

RKSimon added inline comments. Jul 27 2016, 3:51 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4854	Fixed in rL276855 - we already had the fix for SHL, so I just updated LSHR/ASHR to match.

RKSimon mentioned this in D23007: [DAGCombiner] Better support for shifting large value type by constants. Aug 1 2016, 3:12 AM

RKSimon added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4854	I've created D23007 which would allow us to get away from using the getBitWidth / getZExtValue checks in all of these cases.

Updated patch to use APInt-based value comparison. Depends on D23007.

Ping. Has there been a decision on whether or not to accept this?

RKSimon mentioned this in rL278141: [DAGCombiner] Better support for shifting large value type by constants. Aug 9 2016, 10:47 AM

D23007 is now committed to trunk.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4865	Please generalise this using APInt.

Update per RKSimon: When computing the width of mask, use APInt.

This also patches the pre-existing undefined behavior that arises when the shift amount N1C is greater than BitSize: in that case (64 + N1C - BitSize) > 64 == bitwidth(~0ULL), and shifting ~0ULL by more than its bit width is U.B.
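
As a rough illustration of the mask computation and its edge case, here is a sketch in plain C++ using `uint64_t` rather than APInt. `lowBitsMask` and `shlThenSrl32` are hypothetical helper names. Returning zero for an out-of-range shift amount is a choice made for this sketch only and is not a claim about what the DAG combine does in that case.

```cpp
#include <cassert>
#include <cstdint>

// Mask of the bits that survive (srl (shl x, c), c) for a BitSize-bit
// value, BitSize <= 64. The expression ~0ULL >> (c + 64 - BitSize) is
// only defined while the shift count stays below 64, so c >= BitSize is
// handled separately (this sketch returns an all-zero mask there).
inline uint64_t lowBitsMask(unsigned BitSize, uint64_t ShAmt) {
  assert(BitSize >= 1 && BitSize <= 64 && "sketch covers up to 64 bits only");
  if (ShAmt >= BitSize)
    return 0;
  return ~0ULL >> (ShAmt + 64 - BitSize);
}

// Reference computation of the shift pair on a 32-bit value.
inline uint32_t shlThenSrl32(uint32_t X, unsigned C) {
  assert(C < 32 && "shift amount must be below the bit width");
  return (X << C) >> C;
}
```

For in-range shift amounts the two agree: `shlThenSrl32(X, C)` equals `X & lowBitsMask(32, C)`, which is the `(and x, cst2)` form the fold produces.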

RKSimon added inline comments. Aug 12 2016, 6:21 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4867	It'd be great if you can generalise this to work with larger types - i128 for instance.

bryant added inline comments. Aug 12 2016, 7:48 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4867	that would change the intent of the original code. would it still belong in this patch?

The problem summary notes that this fold is missing from InstCombine. Would we sidestep the later questions (and possibly target-specific problems) by doing the transform in IR?

Side note: bryant, your handle seems to have broken the internet :) -
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103596.html

In D22726#513819, @spatel wrote:

The problem summary notes that this fold is missing from InstCombine. Would we sidestep the later questions (and possibly target-specific problems) by doing the transform in IR?

I'm totally open to doing this in InstCombine, hence making the note in the first place.

Side note: bryant, your handle seems to have broken the internet :) -
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103596.html

Thanks (: I've reached out to Cameron and fixed the issue.

In D22726#514961, @bryant wrote:

In D22726#513819, @spatel wrote:

The problem summary notes that this fold is missing from InstCombine. Would we sidestep the later questions (and possibly target-specific problems) by doing the transform in IR?

I'm totally open to doing this in InstCombine, hence making the note in the first place.

Sure. Let me rephrase the question: are you seeing any cases where the opportunity to perform the optimization only emerges in the DAG? If no, then let's start a new patch that does the transform in InstCombine. The earlier IR optimization always beats a DAG combine in reach and can enable other folds to trigger.

bryant added inline comments. Aug 15 2016, 1:20 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4867	Also, `APInt::getAllOnesValue` only goes up to 2**64 - 1. So a patch for that would be needed beforehand?

hfinkel added a subscriber: hfinkel. Aug 15 2016, 1:34 AM

hfinkel added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4867	> Also, APInt::getAllOnesValue only goes up to 2**64 - 1.
What do you mean by this? APInt::getAllOnesValue, AFAIK, handles all bit widths.

Generalize to widths of up to 2**64 - 1.

bryant marked 4 inline comments as done. Aug 15 2016, 1:57 AM

bryant added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4867	Yeah, I read a bit too quickly. Please disregard. The transform is now generalized to up to 2**64 - 1 _in width_. Also, for efficiency's sake, I think it might be better for `c0` and `c1` and their subsequent comparison and other operations to be `uint64_t`, since they're ultimately restricted to that value range anyway.

Remove call to zeroExtendToMatch, since both shift amounts need to be within uint64_t.

RKSimon added inline comments. Aug 16 2016, 5:45 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4856	c1.eq(c0) will assert if they are not the same bitwidth

Reinstate zeroExtendToMatch.
Use OpSizeInBits instead of recomputing result width.
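
The width issue behind this update can be sketched as follows. This is a toy model in plain C++: `ToyInt`, `zeroExtendToMatchToy`, and `sameShiftAmount` are hypothetical names chosen for illustration, limited to 64 bits, and are not the LLVM implementations. The idea is only that an equality check which insists on equal bit widths needs the operands widened to a common width first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical fixed-width integer (up to 64 bits) whose comparison, like
// the one discussed above, refuses operands of different widths.
struct ToyInt {
  unsigned Bits;
  uint64_t Value;

  bool eq(const ToyInt &RHS) const {
    assert(Bits == RHS.Bits && "comparison requires equal bit widths");
    return Value == RHS.Value;
  }
};

// Zero-extend the narrower operand so both have the wider width. Zero
// extension leaves the unsigned value unchanged, so only the recorded
// width needs updating in this model.
inline void zeroExtendToMatchToy(ToyInt &A, ToyInt &B) {
  unsigned Wide = std::max(A.Bits, B.Bits);
  A.Bits = Wide;
  B.Bits = Wide;
}

// Value comparison that is safe for mixed widths.
inline bool sameShiftAmount(ToyInt A, ToyInt B) {
  zeroExtendToMatchToy(A, B);
  return A.eq(B);
}
```

Calling `sameShiftAmount(ToyInt{8, 32}, ToyInt{64, 32})` compares an 8-bit and a 64-bit shift amount without hitting the width assertion.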

bryant marked an inline comment as done. Aug 16 2016, 2:16 PM

bryant added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4856	You're right. That was the original premise of this patch.

Almost there!

Please can you add a (srl (shl x, c), c) i128 test to test\CodeGen\X86\shift-i128.ll to demonstrate its working for larger types

No. There are earlier combine rules that specifically match on iN, N > 64, and that distort the (shl (srl x, c), c) pattern.

In D22726#518905, @bryant wrote:

No. There are earlier combine rules that specifically match on iN, N > 64, and that distort the (shl (srl x, c), c) pattern.

Are you saying that they prevent the combine from happening or that the earlier combines assert due to still using getZExtValue() ?

In D22726#519454, @RKSimon wrote:

In D22726#518905, @bryant wrote:

No. There are earlier combine rules that specifically match on iN, N > 64, and that distort the (shl (srl x, c), c) pattern.

Are you saying that they prevent the combine from happening or that the earlier combines assert due to still using getZExtValue() ?

These have been dealt with so I think this patch is ready to go.

This revision is now accepted and ready to land. Sep 14 2016, 8:21 AM

Any chance that you will finish this please? rL284717 made the code change redundant, but the test is still of use.

I'm not sure that rL284717 fixes this:

$ llc -march=x86-64 < test/CodeGen/X86/cmp-zext-combine.ll 
nonzero32:                              # @nonzero32
        movl    %edi, %ecx              # <====== this is still here.
        xorl    %eax, %eax
        testq   %rcx, %rcx
        setne   %al
        retq

The original issue is still present, namely that shift amounts are compared by
DAG node instead of value (from DAGCombiner.cpp):

// fold (srl (shl x, c), c) -> (and x, cst2)
if (N0.getOpcode() == ISD::SHL && N0.getOperand(1) == N1 &&   // here.
    isConstantOrConstantVector(N1, /* NoOpaques */ true)) {
  SDLoc DL(N);

I don't think this is an issue with vectors, so I've updated this patch to
compare by value in the scalar case.

Test cases have also been added for the 16- and 64-bit cases, but the former
fails:

$ llc -march=x86-64 < test/CodeGen/X86/cmp-zext-combine.ll 
nonzero16:                              # @nonzero16
        shll    $16, %edi               # <====== bad.
        xorl    %eax, %eax
        cmpl    $65535, %edi            # imm = 0xFFFF
        seta    %al
        retq

So I'm rather inclined to follow spatel's advice and make the change in
InstCombine instead. The patch is quite small and handles cases of all
bitwidths.

Rebase this patch onto rL284717.
Add 16- and 64-bit test cases.

The InstCombine patch is here: https://reviews.llvm.org/D25913

With that patch:

$ opt -O3 cmp-zext-combine.ll | llc -march=x86-64
nonzero16:                              # @nonzero16
        xorl    %eax, %eax
        testw   %di, %di
        setne   %al
        retq

nonzero32:                              # @nonzero32
        xorl    %eax, %eax
        testl   %edi, %edi
        setne   %al
        retq

nonzero64:                              # @nonzero64
        xorl    %eax, %eax
        testq   %rdi, %rdi
        setne   %al
        retq

which are the results that we want.
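
Why a plain test/setne is the right answer can be checked in a few lines of C++. The sketch below mirrors the 32-bit test (zero-extend, shift left by the narrow width, unsigned compare against the narrow all-ones value). The 16-bit variant is inferred from the `shll $16` / `cmpl $65535` output shown earlier and is an assumption about that test's IR. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the 32-bit test: zext i32 -> i64, shl by 32, unsigned compare
// against 4294967295.
inline bool nonzero32ViaShift(uint32_t X) {
  uint64_t Wide = static_cast<uint64_t>(X);
  return (Wide << 32) > 4294967295ULL;
}

// Assumed 16-bit analogue: zext i16 -> i32, shl by 16, compare with 65535.
inline bool nonzero16ViaShift(uint16_t X) {
  uint32_t Wide = static_cast<uint32_t>(X);
  return (Wide << 16) > 65535U;
}

// Spot-check that both forms agree with a plain "x != 0" on boundary values.
inline void checkShiftCompareEquivalence() {
  const uint32_t Samples32[] = {0u, 1u, 0xFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t X : Samples32)
    assert(nonzero32ViaShift(X) == (X != 0));
  const uint16_t Samples16[] = {0, 1, 0x8000, 0xFFFF};
  for (uint16_t X : Samples16)
    assert(nonzero16ViaShift(X) == (X != 0));
}
```

After the shift, the low bits are all zero, so the shifted value exceeds the narrow all-ones constant exactly when some bit of the original value is set.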

RKSimon added inline comments. Oct 24 2016, 10:17 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4856	Change to use isConstOrConstSplat if you can.

Just an (overdue) update: I fixed this with Sanjay about two months ago (https://reviews.llvm.org/D25913). That particular InstCombine fold lets (IR similar to) the cases in this patch completely dodge the faulty SelectionDAG transform that generates SETCCs of the wrong width.

Simon: You've mentioned wanting to keep the tests. Is this still the case? Otherwise, I think this thread can be closed.

In D22726#637845, @bryant wrote:

Just an (overdue) update: I fixed this with Sanjay about two months ago (https://reviews.llvm.org/D25913). That particular InstCombine fold lets (IR similar to) the cases in this patch completely dodge the faulty SelectionDAG transform that generates SETCCs of the wrong width.

Simon: You've mentioned wanting to keep the tests. Is this still the case? Otherwise, I think this thread can be closed.

If it's been handled in InstCombine and you avoid the issue arising, then I'm happy for this patch to be abandoned.

bryant abandoned this revision. Feb 8 2017, 3:49 AM

Diff 67542

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[4,825 earlier lines not shown; enclosing context: if (N1C && N0.getOpcode() == ISD::SRL) {]
     }
   }

   // fold (srl (trunc (srl x, c1)), c2) -> 0 or (trunc (srl x, (add c1, c2)))
   if (N1C && N0.getOpcode() == ISD::TRUNCATE &&
       N0.getOperand(0).getOpcode() == ISD::SRL &&
       isa<ConstantSDNode>(N0.getOperand(0)->getOperand(1))) {
     uint64_t c1 =
       cast<ConstantSDNode>(N0.getOperand(0)->getOperand(1))->getZExtValue();
     uint64_t c2 = N1C->getZExtValue();
     EVT InnerShiftVT = N0.getOperand(0).getValueType();
     EVT ShiftCountVT = N0.getOperand(0)->getOperand(1).getValueType();
     uint64_t InnerShiftSize = InnerShiftVT.getScalarType().getSizeInBits();
     // This is only valid if the OpSizeInBits + c1 = size of inner shift.
     if (c1 + OpSizeInBits == InnerShiftSize) {
       SDLoc DL(N0);
       if (c1 + c2 >= InnerShiftSize)
         return DAG.getConstant(0, DL, VT);
       return DAG.getNode(ISD::TRUNCATE, DL, VT,
                          DAG.getNode(ISD::SRL, DL, InnerShiftVT,
                                      N0.getOperand(0)->getOperand(0),
                                      DAG.getConstant(c1 + c2, DL,
                                                      ShiftCountVT)));
     }
   }

   // fold (srl (shl x, c), c) -> (and x, cst2)
-  if (N1C && N0.getOpcode() == ISD::SHL && N0.getOperand(1) == N1) {
-    unsigned BitSize = N0.getScalarValueSizeInBits();
-    if (BitSize <= 64) {
-      uint64_t ShAmt = N1C->getZExtValue() + 64 - BitSize;
-      SDLoc DL(N);
-      return DAG.getNode(ISD::AND, DL, VT, N0.getOperand(0),
-                         DAG.getConstant(~0ULL >> ShAmt, DL, VT));
-    }
-  }
+  if (N1C && N0.getOpcode() == ISD::SHL &&
+      isa<ConstantSDNode>(N0.getOperand(1))) {
+    APInt c1 = cast<ConstantSDNode>(N0.getOperand(1))->getAPIntValue();
+    APInt c0 = N1C->getAPIntValue();
+    zeroExtendToMatch(c1, c0);
+    if (c1.eq(c0)) {
+      unsigned BitSize = N0.getScalarValueSizeInBits();
+      if (BitSize <= 64) {
+        APInt Mask = APInt::getAllOnesValue(64).lshr(64 + c0 - BitSize);
+        SDLoc DL(N);
+        return DAG.getNode(ISD::AND, DL, VT, N0.getOperand(0),
+                           DAG.getConstant(Mask, DL, VT));
+      }
+    }
+  }

   // fold (srl (anyextend x), c) -> (and (anyextend (srl x, c)), mask)
   if (N1C && N0.getOpcode() == ISD::ANY_EXTEND) {
     // Shifting in all undef bits?
     EVT SmallVT = N0.getOperand(0).getValueType();
     unsigned BitSize = SmallVT.getScalarSizeInBits();
     if (N1C->getZExtValue() >= BitSize)
       return DAG.getUNDEF(VT);
[10,192 later lines not shown]

test/CodeGen/X86/cmp-zext-combine.ll

This file was added.

; RUN: llc -march=x86-64 < %s | FileCheck %s

define i32 @nonzero(i32) {
; CHECK-LABEL: nonzero:
; CHECK-NEXT: xorl %eax, %eax
; CHECK-NEXT: testl %edi, %edi
; CHECK-NEXT: setne %al
; CHECK-NEXT: retq
  %b = zext i32 %0 to i64
  %c = shl i64 %b, 32
  %d = icmp ugt i64 %c, 4294967295
  %rv = zext i1 %d to i32
  ret i32 %rv
}

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Match shift amount by value rather than relying on common sub-expressions.
Abandoned, Public
