Page MenuHomePhabricator

[x86] use the IR type of formal args to create assertzext/assertsext and scalar truncate nodes

Authored by spatel on Aug 23 2017, 9:13 AM.



This is an implementation of the idea I suggested in D37017 as an alternate way to solve the multiple assertzext problem. I'm posting this just so it's clear what I had drafted. It could be completely wrong for reasons I don't understand yet. :)

  1. I've avoided the vector problems/questions by only using the IR type in a truncate of a scalar. I don't know what mapping we use for illegal vector types or AVX512 masks to formal args, but there must be some cases where vectors are mapped to scalars because I hit asserts without that limitation.
  2. I loosened an assert in SelectionDAGISel::LowerArguments() to account for this new/unexpected behavior.
  3. The tests diffs are a superset of those in D37017 currently because we handle assertsext here.

Diff Detail

Event Timeline

spatel created this revision.Aug 23 2017, 9:13 AM
aaboud added inline comments.Aug 23 2017, 1:37 PM

Do we still need this?
It is the reason for the second AssertZext instruction.

spatel added inline comments.

I don't think we'll get the 2nd assert for the affected tests with this patch because we trunc to i1, but let me know if you're seeing something else.

But yes, we do need this code - or at least there's some other change needed to account for that part. If I just don't set AssertOp, we'll get these x86 test fails and from what I saw it's because we need extra masking ops without the assert:
Failing Tests (5):

LLVM :: CodeGen/X86/bool-zext.ll
LLVM :: CodeGen/X86/illegal-bitfield-loadstore.ll
LLVM :: CodeGen/X86/negate-i1.ll
LLVM :: CodeGen/X86/sext-i1.ll
LLVM :: CodeGen/X86/tail-call-casts.ll
aaboud added inline comments.Aug 24 2017, 5:08 AM

I was using this test to debug the second AssertSext:

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"

define i32  @sbar(i1 signext %a) {
 %b = xor i1 %a, 1
 %sext = sext i1 %b to i32
 ret i32 %sext

command line:
llc -mtriple=x86_64-unknown-unknown -o -

DAG after this line:

SelectionDAG has 9 nodes:
            t0: ch = EntryToken
          t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
        t4: i32 = AssertSext t2, ValueType:ch:i8
      t5: i8 = truncate t4
    t7: i8 = AssertSext t5, ValueType:ch:i1
  t8: i1 = truncate t7
  t0: ch = EntryToken
spatel added inline comments.Aug 24 2017, 6:07 AM

Yes - that's exactly the case that this patch should be hitting. I see:
Creating new node: t4: i32 = AssertSext t2, ValueType:ch:i1
Creating new node: t5: i1 = truncate t4 there's no extra trunc+assertzext with this patch. You're not seeing that?

aaboud added inline comments.Aug 24 2017, 8:18 AM

You are right, this patch prevent this code from generating another AssertSext.
Now we return from getCopyFromParts at line 222, before adding the new AsserSext.


by the way, do we need these parentheses here and below with the one line "else if"/"else" ?

spatel added inline comments.Aug 24 2017, 8:44 AM

Technically, no - but since one clause in this chain had them (and I thought that has a good thing for readability), then all clauses in this chain should have them for symmetry. I think the LLVM coding guidelines are purposely ambiguous on this, but my preference is to brace more. It improves readability and helps avoid bugs if someone comes along and wants to add more code or comments - which I probably should do if this patch is functionally correct and going to proceed :)


Any thoughts about the validity of this approach? If not, I can clean up D37017 as the alternate way to improve the x86 codegen.

efriedma edited edge metadata.Aug 31 2017, 4:47 PM

Using more aggressive AssertZext/etc. when we can seems obviously good. I don't see what that has to do with modifying the type of ArgValue, though.

spatel added a comment.Sep 1 2017, 8:45 AM

Using more aggressive AssertZext/etc. when we can seems obviously good. I don't see what that has to do with modifying the type of ArgValue, though.

IIUC, you're suggesting the solution that @aaboud was probably making earlier, but I didn't recognize it at the time:

  1. Fix the x86 (and other targets*) assertzext generation to use the IR type because that has to be equal or smaller than the register type (that's the first part of the x86 diff in this rev of the patch near line 3000).
  2. Don't alter the ArgValue types.
  3. Fix the later chunk in SelectionDAGISel::LowerArguments() to not create a redundant assertzext if the target has already created one.

This produces identical x86 test diffs as we see here. We'll still have a trunc-of-trunc for i32 -> i8 -> i1 in the affected x86 cases, but those will get folded by existing logic.
*For in-tree targets, I would need to make a similar fix as the x86 change to PowerPC, Mips, and AMDGPU because I see regression test failures for all of those.

As I was looking at what Mips was doing I discovered:

// fold (AssertZext (trunc (AssertZext x))) -> (trunc (AssertZext x))
// if the type of the extension of the innermost AssertZext node is
// smaller from that of the outermost node, eg:
// (AssertZext:i32 (trunc:i32 (AssertZext:i64 X, i32)), i8)
//   -> (trunc:i32 (AssertZext X, i8))

This is similar to what I proposed as a general DAGCombine in D37017. Should we go ahead with that patch and avoid all of the target-specific changes? The upside is that out-of-tree targets likely have the same bug, so if we add a generic combine, everyone gets the improvement. The downside is slightly less efficiency from creating an unnecessary assert node that ends up being removed.

spatel abandoned this revision.Sep 1 2017, 11:46 AM

Abandoning - I've updated D37017; I think that makes more sense.