This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] fold assertzexts separated by trunc
ClosedPublic

Authored by spatel on Aug 22 2017, 11:26 AM.

Details

Summary

If we have an AssertZext of a truncated value that has already been AssertZext'ed, we can assert on the wider source op to improve the zext-y knowledge.

This is an implementation of the change suggested in D36890 (and I think would obsolete that patch). Actually, I don't think the suggestion #1 that I made there is valid, but this achieves the same result in a safe way by matching the a larger pattern.

The fact that this only affects x86 makes me suspicious that x86 is doing something suboptimal to create these AssertZexts in the first place, but I'm not sure.

Diff Detail

Event Timeline

spatel created this revision.Aug 22 2017, 11:26 AM

Here's the effect on "select_C1_C2_zeroext" nodes in case that makes the transform clearer:

      t4: i32 = AssertZext t2, ValueType:ch:i8
    t5: i8 = truncate t4
  t7: i8 = AssertZext t5, ValueType:ch:i1
t8: i1 = truncate t7

-->

  t16: i32 = AssertZext t2, ValueType:ch:i1
t18: i1 = truncate t16
aaboud edited edge metadata.Aug 22 2017, 2:14 PM

I have one comment below.
By the way, I noticed that the double AssertZero occur only for the x86_64 (in i386 it does not happen).
It might be worth checking where it comes from, regardless of this patch.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7925

What about the other case:
assert (trunc (assert X, i1) to iN), i8 --> trunc (assert X, i1) to iN

Do not we want to cover this as well?
Or we are sure that such case will never occur?

aivchenk edited edge metadata.Aug 22 2017, 2:52 PM

That's certainly a much better approach than the original. Couple of comments/questions:

  • We eliminate the "outer" assert by making "inner" stronger. In a sense, this essentially moves the information about the extension further away from the use. Is there some sort of canonicalization or a convention for that? The other way of removing redundancy would be just to eliminate the weaker "inner" assert, though it would not solve the initial problem for handling AssertZext in the backend. But may be the problem is that we don't handle AssertZext properly?
  • Should we implement the same idea for AssertSext?

I have one comment below.
By the way, I noticed that the double AssertZero occur only for the x86_64 (in i386 it does not happen).
It might be worth checking where it comes from, regardless of this patch.

i386 is (mostly?) immune from this because it passes function arguments on the stack. The first assertzext (the one with i32/i8 types in most of these examples) is generated by X86TargetLowering::LowerFormalArguments(). We always get the zext assert source value type as i8 for an i1 arg because that's what is specified in "RegisterTypeForVT", but I think that's the root cause of the bug. Should we be using the IR type (i1) at this stage when creating assertzext nodes and the subsequent truncate node?

Adding some more potential reviewers. This is my first look at function arg lowering. Is the intermediate trunc to i8 of the formal arg necessary for other reasons? The naive change to X86TargetLowering::LowerFormalArguments() doesn't work for weird vector types or AVX512 masks:

@@ -3015,7 +3015,7 @@
           // Promoting a mask type (v*i1) into a register of type i64/i32/i16/i8
           ArgValue = lowerRegToMasks(ArgValue, VA.getValVT(), RegVT, dl, DAG);
         } else
-          ArgValue = DAG.getNode(ISD::TRUNCATE, dl, VA.getValVT(), ArgValue);
+          ArgValue = DAG.getNode(ISD::TRUNCATE, dl, Ins[InsIndex].ArgVT, ArgValue);
       }
     } else {
       assert(VA.isMemLoc());

We're trying to avoid adding a DAG combine for this pattern by not creating the pattern in the first place:

      t4: i32 = AssertZext t2, ValueType:ch:i8
    t5: i8 = truncate t4
  t7: i8 = AssertZext t5, ValueType:ch:i1
t8: i1 = truncate t7

-->

  t16: i32 = AssertZext t2, ValueType:ch:i1
t18: i1 = truncate t16
  • Should we implement the same idea for AssertSext?

Yes, we should. But after discovering that this code almost already exists in the Mips backend, I'd like to make that a small follow-up just to reduce the patch size and risk.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7925

I haven't seen that occur yet, but since it's easy to handle, I'll make the fold more general.

spatel updated this revision to Diff 113565.Sep 1 2017, 11:43 AM
spatel added a reviewer: efriedma.

Patch updated:
After finding that the same functionality exists in trunk, I think it makes sense to go ahead with this patch. Effectively, we're just taking a general fold that was hiding in the Mips backend and applying it to all targets.

Also, given that there is already another fold for assertzext(assertzext(x)), I assume there's already some scenario under which it makes sense to have folds for these nodes besides the one we're looking at. Therefore, there's not much benefit to trying to stub this pattern out at the source. We just account for the potential inefficiency in argument lowering with this fold.

efriedma added a subscriber: hans.Sep 13 2017, 3:42 PM
efriedma added inline comments.
test/CodeGen/X86/negate-i1.ll
134

This seems kind of scary... is an "i1 zeroext" actually guaranteed to be zero-extended to 64 bits?

craig.topper added inline comments.Sep 13 2017, 3:49 PM
test/CodeGen/X86/negate-i1.ll
134

Nope. It hit PR28540 which should be fixed by D37729

spatel added inline comments.Sep 13 2017, 4:12 PM
test/CodeGen/X86/negate-i1.ll
134

Should I make this patch dependent on that fix? Are we correct when we go from:

        t4: i32 = AssertZext t2, ValueType:ch:i8
      t5: i8 = truncate t4
    t7: i8 = AssertZext t5, ValueType:ch:i1
  t8: i1 = truncate t7
t9: i64 = sign_extend t8

to:

    t16: i32 = AssertZext t2, ValueType:ch:i1
  t18: i64 = any_extend t16
t15: i64 = sign_extend_inreg t18, ValueType:ch:i1
craig.topper edited edge metadata.Sep 18 2017, 1:53 PM

PR28540 is fixed now.

PR28540 is fixed now.

Thanks - so now for the questionable test, we'll get movl+negq rather than negq+movq (no changes to the output from this patch).

Any other known problems?

spatel updated this revision to Diff 115719.Sep 18 2017, 2:16 PM

Patch updated:
No code changes, but test output updated after the fix from rL313563.

PR28540 is fixed now.

Thanks - so now for the questionable test, we'll get movl+negq rather than negq+movq (no changes to the output from this patch).

That comment wasn't clear. I meant we still get the same DAG nodes for the questionable test from this combine:

    t16: i32 = AssertZext t2, ValueType:ch:i1
  t18: i64 = any_extend t16
t15: i64 = sign_extend_inreg t18, ValueType:ch:i1
This revision is now accepted and ready to land.Sep 18 2017, 2:25 PM
This revision was automatically updated to reflect the committed changes.