This is an archive of the discontinued LLVM Phabricator instance.

[X86][AArch64][NFC] Add tests for masked merge unfolding
ClosedPublic

Authored by lebedev.ri on Apr 12 2018, 5:01 AM.

Details

Summary

This is PR37104.

PR6773 will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, andl+andn/andps+andnps / bic/bsl would be generated. (see @out)
Now, they would no longer be generated (see @in).
I'm guessing llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp should be able to unfold this.
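To make the pattern concrete: the masked merge `(x & m) | (y & ~m)` takes bits of `x` where the mask is set and bits of `y` elsewhere, while the canonicalized form rewrites it using xors. A quick illustrative sketch of the equivalence (plain Python modelling 32-bit values; the function names echo the `@out`/`@in` tests but this is not code from the patch):

```python
MASK32 = 0xFFFFFFFF  # model 32-bit integer wrap-around

def out(x, y, m):
    # Unfolded masked merge: two ANDs plus an OR.
    # This is the shape that maps to andl+andn / bic / bsl.
    return ((x & m) | (y & ~m)) & MASK32

def in_(x, y, m):
    # Canonicalized form: ((x ^ y) & m) ^ y.
    # Where a mask bit is 1, (x^y)^y == x; where it is 0, 0^y == y.
    return (((x ^ y) & m) ^ y) & MASK32

# Spot-check the equivalence on a few inputs.
for x, y, m in [(0x12345678, 0x9ABCDEF0, 0x0000FFFF),
                (0xDEADBEEF, 0x01234567, 0x0F0F0F0F),
                (0, MASK32, 0x12345678)]:
    assert out(x, y, m) == in_(x, y, m)
```

The two forms are equivalent bit-for-bit, which is why a DAGCombiner unfold back to the and/andn shape is legal.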

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Apr 12 2018, 5:01 AM

Added some more tests with constant masks, for good measure.
Not sure if they are of much use, or whether that is too much?

Right, i can drop half of the x86 output if i split it into scalar and vector files.

Split the x86 testcase into scalar and vector files, halving the number of RUN lines in each of them.

Actually test AArch64 in the AArch64 test file, oops.

lebedev.ri planned changes to this revision.Apr 13 2018, 1:11 PM

Need to think more about the patterns/tests. Not needed *yet* anyway.

Don't know what changes are planned here, but this is on the right track. We want to have coverage of the possible canonical IR variations for various targets.

PowerPC with Altivec has a vsel instruction if you want even more coverage, but I don't think the PPC backend has the isel pattern-matching logic to produce that currently (cc @nemanjai).

test/CodeGen/X86/unfold-masked-merge-vector.ll
2

Not sure this RUN adds much value - I think SSE is required by the x86-64 ABI, so we may already be in undefined territory with these tests. :)

A target with -mattr=avx may be slightly more interesting, but there might not be enough ISA difference there to even make that worthwhile.
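For context, a lit RUN line for such an AVX variant might look like this (a hypothetical sketch in the usual lit/FileCheck style, not a line from the patch):

```
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx | FileCheck %s --check-prefix=CHECK-AVX
```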

Don't know what changes are planned here, but this is on the right track. We want to have coverage of the possible canonical IR variations for various targets.

I'm working on that right now, got it working, maybe will update this + post the dagcombiner part (that is where i should have put it, right?) in a few hours.

PowerPC with Altivec has a vsel instruction if you want even more coverage, but I don't think the PPC backend has the isel pattern-matching logic to produce that currently (cc @nemanjai).

Downside: i dropped the vector tests and only handle scalars for now.

Don't know what changes are planned here, but this is on the right track. We want to have coverage of the possible canonical IR variations for various targets.

I'm working on that right now, got it working, maybe will update this + post the dagcombiner part (that is where i should have put it, right?) in a few hours.

Yes, this will be in DAGCombiner.

PowerPC with Altivec has a vsel instruction if you want even more coverage, but I don't think the PPC backend has the isel pattern-matching logic to produce that currently (cc @nemanjai).

Downside: i dropped the vector tests and only handle scalars for now.

That's actually preferred. Small steps - easier to review and fix when the bug reports come in. :)

Don't know what changes are planned here, but this is on the right track. We want to have coverage of the possible canonical IR variations for various targets.

PowerPC with Altivec has a vsel instruction if you want even more coverage, but I don't think the PPC backend has the isel pattern-matching logic to produce that currently (cc @nemanjai).

Not sure this answer is pertinent any longer, but on PPC we can generate the VSX version of the vector select (xxsel) for rather specific situations. See test/CodeGen/PowerPC/vselect-constants.ll and test/CodeGen/PowerPC/vsx.ll.

Not sure this answer is pertinent any longer, but on PPC we can generate the VSX version of the vector select (xxsel) for rather specific situations. See test/CodeGen/PowerPC/vselect-constants.ll and test/CodeGen/PowerPC/vsx.ll.

I think it's still valid. Didn't mean to be cryptic - here's the vector example from this patch compiled for PPC64le:

define <4 x i32> @out_vec(<4 x i32> %x, <4 x i32> %y, <4 x i32> %mask) {
  %mx = and <4 x i32> %x, %mask
  %notmask = xor <4 x i32> %mask, <i32 -1, i32 -1, i32 -1, i32 -1>
  %my = and <4 x i32> %y, %notmask
  %r = or <4 x i32> %mx, %my
  ret <4 x i32> %r
}

$ ./llc -o - vsel.ll -mtriple=powerpc64le
xxland 0, 34, 36
xxlandc 1, 35, 36
xxlor 34, 0, 1
blr

xxsel 0, 35, 34, 36 ?

...
$ ./llc -o - vsel.ll -mtriple=powerpc64le
xxland 0, 34, 36
xxlandc 1, 35, 36
xxlor 34, 0, 1
blr

xxsel 0, 35, 34, 36 ?

Yes.

lebedev.ri edited the summary of this revision. (Show Details)

Revised tests, dropped vectors for now.

spatel added inline comments.Apr 17 2018, 4:27 PM
test/CodeGen/AArch64/unfold-masked-merge-scalar.ll
454–457 ↗(On Diff #142807)

If you mark functions with 'nounwind', it should remove the cfi noise.

lebedev.ri added inline comments.Apr 18 2018, 10:57 AM
test/CodeGen/AArch64/unfold-masked-merge-scalar.ll
454–457 ↗(On Diff #142807)

Does not work for aarch64, and i'd prefer to keep it consistent.

Revisited tests once more; added some more complex patterns (with non-trivial 'y' and/or 'm') that i expect could fail to be matched.

spatel added inline comments.Apr 20 2018, 10:24 AM
test/CodeGen/AArch64/unfold-masked-merge-scalar.ll
454–457 ↗(On Diff #142807)

By 'does not work', you mean the script wasn't working, right? Can you try again after:
rL330453

Hopefully, we can get rid of the cfi noise for both targets now.

Note that there's little consistency between targets, but I appreciate that goal (the vast majority of AArch tests don't have auto-generated checks).

lebedev.ri marked 3 inline comments as done.

Get rid of CFI noise now that it is possible after rL330453.

test/CodeGen/AArch64/unfold-masked-merge-scalar.ll
454–457 ↗(On Diff #142807)

Thank you, that was it!
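For readers following along: the fix amounts to marking each test function with the `nounwind` attribute, which lets llc omit the `.cfi_*` directives from the checked assembly. An illustrative sketch of what a scalar test function would look like (modelled on the vector example above; not the exact test body from the patch):

```llvm
define i32 @out(i32 %x, i32 %y, i32 %mask) nounwind {
  %mx = and i32 %x, %mask
  %notmask = xor i32 %mask, -1
  %my = and i32 %y, %notmask
  %r = or i32 %mx, %my
  ret i32 %r
}
```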

spatel accepted this revision.Apr 20 2018, 1:37 PM

LGTM. You could increase diversity in the constant mask tests by not using a single-string-of-set-bits constant (eg 0x0f0f0f0f instead of 0x0000ffff).

This revision is now accepted and ready to land.Apr 20 2018, 1:37 PM

LGTM. You could increase diversity in the constant mask tests by not using a single-string-of-set-bits constant (eg 0x0f0f0f0f instead of 0x0000ffff).

But since we don't do anything (not fold, not unfold) with these patterns with constant masks, is there any point in adding even more tests?

LGTM. You could increase diversity in the constant mask tests by not using a single-string-of-set-bits constant (eg 0x0f0f0f0f instead of 0x0000ffff).

But since we don't do anything (not fold, not unfold) with these patterns with constant masks, is there any point in adding even more tests?

I wouldn't add more tests, but different constants might show the variety of codegen options that AArch is using on these sequences (x86 probably has no differences in that respect?).

Split tests with variable mask and constant mask into separate files, add a bit more tests with different constant mask patterns.

This revision was automatically updated to reflect the committed changes.