Page MenuHomePhabricator

[DAGCombine] Match shift amount by value rather than relying on common sub-expressions.
AbandonedPublic

Authored by bryant on Jul 23 2016, 4:57 AM.

Details

Summary

In the provided test case, DAG combiner fails to recognize and combine `(srl
(shl, i8:c0), i64:c1)` even though c0 == c1 in value (but exist as separate
SDNodes because of their different value types). The solution then is to compare
by value rather than node ID.

Note that InstCombine (and thus, opt) fails to combine this as well.

This was discovered with the following bit of C:

C
#include <stdint.h>

typedef struct {
  uint32_t low;
  uint32_t high;
} pair;

pair cmpxchg8b(uint64_t *m, pair src) {
  pair rv = {0, 0};
  asm volatile("cmpxchg8b %2"
               : "+a"(rv.low), "+d"(rv.high), "+m"(m)
               : "b"(src.low), "c"(src.high)
               : "flags");
  return rv;
}

unsigned f(uint64_t *m, unsigned a, unsigned b) {
  pair p = {a > 0, b > 0};
  pair rv = cmpxchg8b(m, p);
  return rv.low != 0 && rv.high > 0;
}

for which clang generates:

f:
        pushq   %rbx
        testl   %esi, %esi
        setne   %al
        testl   %edx, %edx
        setne   %cl
        movzbl  %al, %ebx
        movzbl  %cl, %ecx
        movq    %rdi, -8(%rsp)
        xorl    %edx, %edx
        xorl    %eax, %eax
        #APP
        cmpxchg8b       -8(%rsp)
        #NO_APP
        testl   %eax, %eax
        setne   %al
        movl    %edx, %ecx      # <======
        testq   %rcx, %rcx
        setne   %cl
        andb    %al, %cl
        movzbl  %cl, %eax
        popq    %rbx
        retq

Diff Detail

Repository
rL LLVM

Event Timeline

bryant updated this revision to Diff 65224.Jul 23 2016, 4:57 AM
bryant retitled this revision from to [DAGCombine] Match shift amount by value rather than relying on common sub-expressions..
bryant updated this object.
bryant added reviewers: llvm-commits, spatel, RKSimon.
bryant set the repository for this revision to rL LLVM.
bryant added a subscriber: llvm-commits.
RKSimon added inline comments.Jul 23 2016, 5:14 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4834

Please regenerate the patch with context - something like

svn diff --diff-cmd=diff -x -U999999
test/CodeGen/X86/cmp-zext-combine.ll
3

You shouldn't need the -O3

Please can you add a i686 test case as well, and use utils/update_llc_test_checks.py to generate full code output (you will need to add check prefixes to the 32/64 tests).

bryant updated this revision to Diff 65257.Jul 23 2016, 2:21 PM

Diff updated with more context. Updated test case.

bryant marked 2 inline comments as done.Jul 23 2016, 2:23 PM
bryant added inline comments.
test/CodeGen/X86/cmp-zext-combine.ll
4

Updated with the output of the python tool. I am not sure at all how to replicate this on i686, as the combine rules are different for zext i16 to i32.

bryant marked an inline comment as done.Jul 23 2016, 2:23 PM
eli.friedman added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4854

You can't use getZExtValue on an arbitrary constant; it will crash if the constant is too large.

bryant updated this revision to Diff 65290.Jul 24 2016, 8:17 PM
bryant removed rL LLVM as the repository for this revision.

Add uint64_t check to N0's shift amount operand.

bryant set the repository for this revision to rL LLVM.Jul 24 2016, 8:17 PM
bryant added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4854

Thanks. APInt indeed asserts that the bit width occupied by the value fits within a uint64_t.

Bbut doesn't the code two lines below (and in the rest of this function) make the same assumption about constant nodes, albeit about N1C?

eli.friedman added inline comments.Jul 25 2016, 10:48 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4854

There's some code near the beginning of the function which makes sure N1C is less than the bitwidth of the LHS. (LLVM only supports integers with sizes up to 2^24 bits or so.) Granted, it looks like it also uses getZExtValue incorrectly, so the following crashes:

define <2 x i128> @y(<2 x i128>* byval align 32) #0 {
entry:
  %a.addr = alloca <2 x i128>, align 32
  %a = load <2 x i128>, <2 x i128>* %0, align 32
  store <2 x i128> %a, <2 x i128>* %a.addr, align 32
  %1 = load <2 x i128>, <2 x i128>* %a.addr, align 32
  %shr = lshr <2 x i128> %1, <i128 -1, i128 -1>
  ret <2 x i128> %shr
}

Patch welcome. :)

RKSimon added inline comments.Jul 27 2016, 3:51 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4854

Fixed in rL276855 - we already had the fix for SHL, so I just updated LSHR/ASHR to match.

RKSimon added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4854

I've created D23007 which would allow us to get away from using the getBitWidth / getZExtValue checks in all of these cases.

bryant updated this revision to Diff 66427.Aug 1 2016, 8:39 PM

Updated patch to use APInt-based value comparison. Depends on D23007.

bryant marked an inline comment as done.Aug 1 2016, 8:40 PM
bryant added a comment.Aug 7 2016, 9:07 PM

Ping. Has there been decision on whether or not to accept this?

RKSimon edited edge metadata.Aug 9 2016, 10:57 AM

D23007 is now committed to trunk

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4865

Please generalise this using APInt.

bryant updated this revision to Diff 67542.EditedAug 10 2016, 10:04 AM
bryant edited edge metadata.

Update per RKSimon: When computing the width of mask, use APInt.

This also patches the previously-existing undefined behavior that results when the shift amount N1C is greater than BitSize, i.e. (64 + N1C - BitSize) > 64 == bitwidth(~0ULL) which is U.B.

bryant marked an inline comment as done.Aug 10 2016, 10:06 AM
RKSimon added inline comments.Aug 12 2016, 6:21 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4867

It's be great if you can generalise this to work with larger types - i128 for instance.

bryant added inline comments.Aug 12 2016, 7:48 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4867

that would change the intent of the original code. would it still belong in this patch?

spatel edited edge metadata.Aug 12 2016, 8:04 AM

The problem summary notes that this fold is missing from InstCombine. Would we sidestep the later questions (and possibly target-specific problems) by doing the transform in IR?

Side note: bryant, your handle seems to have broken the internet :) -
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103596.html

The problem summary notes that this fold is missing from InstCombine. Would we sidestep the later questions (and possibly target-specific problems) by doing the transform in IR?

I'm totally open to doing this in InstCombine, hence making the note in the first place.

Side note: bryant, your handle seems to have broken the internet :) -
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103596.html

Thanks (: I've reached out to Cameron and fixed the issue.

The problem summary notes that this fold is missing from InstCombine. Would we sidestep the later questions (and possibly target-specific problems) by doing the transform in IR?

I'm totally open to doing this in InstCombine, hence making the note in the first place.

Sure. Let me rephrase the question: are you seeing any cases where the opportunity to perform the optimization only emerges in the DAG? If no, then let's start a new patch that does the transform in InstCombine. The earlier IR optimization always beats a DAG combine in reach and can enable other folds to trigger.

bryant added inline comments.Aug 15 2016, 1:20 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4867

Also, APInt::getAllOnesValue only goes up to 2**64 - 1. So a patch for that would be needed beforehand?

hfinkel added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4867

Also, APInt::getAllOnesValue only goes up to 2**64 - 1.

What do you mean by this? APInt::getAllOnesValue, AFAIK, handles all bit widths.

bryant updated this revision to Diff 68004.Aug 15 2016, 1:52 AM
bryant edited edge metadata.

Generalize to widths of up to 2**64 - 1.

bryant marked 4 inline comments as done.Aug 15 2016, 1:57 AM
bryant added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4867

Yeah, I read a bit too quickly. Please disregard. The transform is now generalized to up to 2**64 - 1 _in width_.

Also, for efficiency's sake, I think it might be better for c0 and c1 and their subsequent comparison and other operations to be uint64_t, since they're ultimately restricted to that value range anyway.

bryant updated this revision to Diff 68006.Aug 15 2016, 2:03 AM
bryant marked an inline comment as done.

Remove call to zeroExtendToMatch, since both shift amounts need to be within uint64_t.

RKSimon added inline comments.Aug 16 2016, 5:45 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4856

c1.eq(c0) will assert if they are not the same bitwidth

bryant updated this revision to Diff 68253.Aug 16 2016, 2:15 PM
  • Reinstate zeroExtendToMatch.
  • Use OpSizeInBits instead of recomputing result width.
bryant marked an inline comment as done.Aug 16 2016, 2:16 PM
bryant added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4856

You're right. That was the original premise of this patch.

Almost there!

Please can you add a (srl (shl x, c), c) i128 test to test\CodeGen\X86\shift-i128.ll to demonstrate its working for larger types

bryant marked an inline comment as done.Aug 17 2016, 5:22 PM

No. There are earlier combine rules that match specifically match on iN, N > 64 that distort the (shl (srl x, c), c) pattern.

No. There are earlier combine rules that match specifically match on iN, N > 64 that distort the (shl (srl x, c), c) pattern.

No. There are earlier combine rules that match specifically match on iN, N > 64 that distort the (shl (srl x, c), c) pattern.

Are you saying that they prevent the combine from happening or that the earlier combines assert due to still using getZExtValue() ?

RKSimon accepted this revision.Sep 14 2016, 8:21 AM
RKSimon edited edge metadata.

No. There are earlier combine rules that match specifically match on iN, N > 64 that distort the (shl (srl x, c), c) pattern.

Are you saying that they prevent the combine from happening or that the earlier combines assert due to still using getZExtValue() ?

These have been dealt with so I think this patch is ready to go.

This revision is now accepted and ready to land.Sep 14 2016, 8:21 AM

Any chance that you will finish this please? rL284717 made the code change redundant, but the test is still of use.

I'm not sure that rL284717 fixes this:

$ llc -march=x86-64 < test/CodeGen/X86/cmp-zext-combine.ll 
nonzero32:                              # @nonzero32
        movl    %edi, %ecx              # <====== this is still here.
        xorl    %eax, %eax
        testq   %rcx, %rcx
        setne   %al
        retq

The original issue is still present, namely that shift amounts are compared by
DAG node instead of value (from DAGCombiner.cpp):

// fold (srl (shl x, c), c) -> (and x, cst2)
if (N0.getOpcode() == ISD::SHL && N0.getOperand(1) == N1 &&   // here.
    isConstantOrConstantVector(N1, /* NoOpaques */ true)) {
  SDLoc DL(N);

I don't think this is an issue with vectors, so I've updated this patch to
compare by value in the scalar case.

Test cases for have also been added for the 16- and 64-bit cases, but the former
fails:

$ llc -march=x86-64 < test/CodeGen/X86/cmp-zext-combine.ll 
nonzero16:                              # @nonzero16
        shll    $16, %edi               # <====== bad.
        xorl    %eax, %eax
        cmpl    $65535, %edi            # imm = 0xFFFF
        seta    %al
        retq

So I'm rather inclined to follow spatel's advice and make the change in
InstCombine instead. The patch is quite small and handles cases of all
bitwidths.

bryant updated this revision to Diff 75605.Oct 24 2016, 9:53 AM
bryant edited edge metadata.
  • Rebase this patch onto rL284717.
  • Add 16- and 64-bit test cases.

The InstCombine patch is here: https://reviews.llvm.org/D25913

With that patch:

$ opt -O3 cmp-zext-combine.ll | llc -march=x86-64
nonzero16:                              # @nonzero16
        xorl    %eax, %eax
        testw   %di, %di
        setne   %al
        retq

nonzero32:                              # @nonzero32
        xorl    %eax, %eax
        testl   %edi, %edi
        setne   %al
        retq

nonzero64:                              # @nonzero64
        xorl    %eax, %eax
        testq   %rdi, %rdi
        setne   %al
        retq

which are the results that we want.

RKSimon added inline comments.Oct 24 2016, 10:17 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4856

Change to use isConstOrConstSplat if you can.

bryant added a comment.Jan 6 2017, 6:23 AM

Just an (overdue) update: I've fixed this with Sanjay about two months ago in https://reviews.llvm.org/D25913 . That particular InstCombine allows (IR similar to) the cases in this patch to completely dodge the faulty SelectionDAG transform that generates SETCCs of the wrong width.

Simon: You've mentioned wanting to keep the tests. Is this still the case? Otherwise, I think this thread can be closed.

Just an (overdue) update: I've fixed this with Sanjay about two months ago in https://reviews.llvm.org/D25913 . That particular InstCombine allows (IR similar to) the cases in this patch to completely dodge the faulty SelectionDAG transform that generates SETCCs of the wrong width.

Simon: You've mentioned wanting to keep the tests. Is this still the case? Otherwise, I think this thread can be closed.

If its been handled in InstCombine and you avoid the issue arising then I'm happy for this patch to be abandoned.

bryant abandoned this revision.Feb 8 2017, 3:49 AM