This is an archive of the discontinued LLVM Phabricator instance.

[X86][AArch64][NFC] Add tests for masked merge unfolding
ClosedPublic

Authored by lebedev.ri on Apr 12 2018, 5:01 AM.

Details

Summary

This is PR37104.

PR6773 will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, andl+andn/andps+andnps / bic/bsl would be generated. (see @out)
Now, they would no longer be generated (see @in).
I'm guessing llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp should be able to unfold this.
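To make the pattern concrete: the masked merge `(x & m) | (y & ~m)` takes bits of `x` where the mask is set and bits of `y` elsewhere, while the canonicalized form rewrites it using xors. A quick illustrative sketch of the equivalence (plain Python modelling 32-bit values; the function names echo the `@out`/`@in` tests but this is not code from the patch):

```python
MASK32 = 0xFFFFFFFF  # model 32-bit integer wrap-around

def out(x, y, m):
    # Unfolded masked merge: two ANDs plus an OR.
    # This is the shape that maps to andl+andn / bic / bsl.
    return ((x & m) | (y & ~m)) & MASK32

def in_(x, y, m):
    # Canonicalized form: ((x ^ y) & m) ^ y.
    # Where a mask bit is 1, (x^y)^y == x; where it is 0, 0^y == y.
    return (((x ^ y) & m) ^ y) & MASK32

# Spot-check the equivalence on a few inputs.
for x, y, m in [(0x12345678, 0x9ABCDEF0, 0x0000FFFF),
                (0xDEADBEEF, 0x01234567, 0x0F0F0F0F),
                (0, MASK32, 0x12345678)]:
    assert out(x, y, m) == in_(x, y, m)
```

The two forms are equivalent bit-for-bit, which is why a DAGCombiner unfold back to the and/andn shape is legal.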

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Apr 12 2018, 5:01 AM

Added some more tests with constant masks, for good measure.
Not sure if they are of much use, or whether that is too much?

Right, i can drop half of the x86 output if i split it into scalar and vector files.

Split the x86 testcase into scalar and vector files, halving the number of RUN lines in each of them.

Actually test AArch64 in the AArch64 test file, oops.

lebedev.ri planned changes to this revision.Apr 13 2018, 1:11 PM

Need to think more about the patterns/tests. Not needed *yet* anyway.

Don't know what changes are planned here, but this is on the right track. We want to have coverage of the possible canonical IR variations for various targets.

PowerPC with Altivec has a vsel instruction if you want even more coverage, but I don't think the PPC backend has the isel pattern-matching logic to produce that currently (cc @nemanjai).

test/CodeGen/X86/unfold-masked-merge-vector.ll
2

Not sure this RUN adds much value - I think SSE is required by the x86-64 ABI, so we may already be in undefined territory with these tests. :)

A target with -mattr=avx may be slightly more interesting, but there might not be enough ISA difference there to even make that worthwhile.
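For context, a lit RUN line for such an AVX variant might look like this (a hypothetical sketch in the usual lit/FileCheck style, not a line from the patch):

```
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx | FileCheck %s --check-prefix=CHECK-AVX
```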

Don't know what changes are planned here, but this is on the right track. We want to have coverage of the possible canonical IR variations for various targets.

I'm working on that right now, got it working, maybe will update this + post the dagcombiner part (that is where i should have put it, right?) in a few hours.

PowerPC with Altivec has a vsel instruction if you want even more coverage, but I don't think the PPC backend has the isel pattern-matching logic to produce that currently (cc @nemanjai).

Downside: i dropped the vector tests and only handle scalars for now.

Don't know what changes are planned here, but this is on the right track. We want to have coverage of the possible canonical IR variations for various targets.

I'm working on that right now, got it working, maybe will update this + post the dagcombiner part (that is where i should have put it, right?) in a few hours.

Yes, this will be in DAGCombiner.

PowerPC with Altivec has a vsel instruction if you want even more coverage, but I don't think the PPC backend has the isel pattern-matching logic to produce that currently (cc @nemanjai).

Downside: i dropped the vector tests and only handle scalars for now.

That's actually preferred. Small steps - easier to review and fix when the bug reports come in. :)

Don't know what changes are planned here, but this is on the right track. We want to have coverage of the possible canonical IR variations for various targets.

PowerPC with Altivec has a vsel instruction if you want even more coverage, but I don't think the PPC backend has the isel pattern-matching logic to produce that currently (cc @nemanjai).

Not sure this answer is pertinent any longer, but on PPC we can generate the VSX version of the vector select (xxsel) for rather specific situations. See test/CodeGen/PowerPC/vselect-constants.ll and test/CodeGen/PowerPC/vsx.ll.

Not sure this answer is pertinent any longer, but on PPC we can generate the VSX version of the vector select (xxsel) for rather specific situations. See test/CodeGen/PowerPC/vselect-constants.ll and test/CodeGen/PowerPC/vsx.ll.

I think it's still valid. Didn't mean to be cryptic - here's the vector example from this patch compiled for PPC64le:

define <4 x i32> @out_vec(<4 x i32> %x, <4 x i32> %y, <4 x i32> %mask) {
  %mx = and <4 x i32> %x, %mask
  %notmask = xor <4 x i32> %mask, <i32 -1, i32 -1, i32 -1, i32 -1>
  %my = and <4 x i32> %y, %notmask
  %r = or <4 x i32> %mx, %my
  ret <4 x i32> %r
}

$ ./llc -o - vsel.ll -mtriple=powerpc64le
xxland 0, 34, 36
xxlandc 1, 35, 36
xxlor 34, 0, 1
blr

xxsel 0, 35, 34, 36 ?

...
$ ./llc -o - vsel.ll -mtriple=powerpc64le
xxland 0, 34, 36
xxlandc 1, 35, 36
xxlor 34, 0, 1
blr

xxsel 0, 35, 34, 36 ?

Yes.

lebedev.ri edited the summary of this revision. (Show Details)

Revised tests, dropped vectors for now.

spatel added inline comments.Apr 17 2018, 4:27 PM
test/CodeGen/AArch64/unfold-masked-merge-scalar.ll
454–457 ↗(On Diff #142807)

If you mark functions with 'nounwind', it should remove the cfi noise.

lebedev.ri added inline comments.Apr 18 2018, 10:57 AM
test/CodeGen/AArch64/unfold-masked-merge-scalar.ll
454–457 ↗(On Diff #142807)

Does not work for aarch64, and i'd prefer to keep it consistent.

Revisited tests once more; added some more complex patterns (with non-trivial 'y' and/or 'm') that i expect could fail to be matched.

spatel added inline comments.Apr 20 2018, 10:24 AM
test/CodeGen/AArch64/unfold-masked-merge-scalar.ll
454–457 ↗(On Diff #142807)

By 'does not work', you mean the script wasn't working, right? Can you try again after:
rL330453

Hopefully, we can get rid of the cfi noise for both targets now.

Note that there's little consistency between targets, but I appreciate that goal (the vast majority of AArch tests don't have auto-generated checks).

lebedev.ri marked 3 inline comments as done.

Get rid of CFI noise now that it is possible after rL330453.

test/CodeGen/AArch64/unfold-masked-merge-scalar.ll
454–457 ↗(On Diff #142807)

Thank you, that was it!
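For readers following along: the fix amounts to marking each test function with the `nounwind` attribute, which lets llc omit the `.cfi_*` directives from the checked assembly. An illustrative sketch of what a scalar test function would look like (modelled on the vector example above; not the exact test body from the patch):

```llvm
define i32 @out(i32 %x, i32 %y, i32 %mask) nounwind {
  %mx = and i32 %x, %mask
  %notmask = xor i32 %mask, -1
  %my = and i32 %y, %notmask
  %r = or i32 %mx, %my
  ret i32 %r
}
```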

spatel accepted this revision.Apr 20 2018, 1:37 PM

LGTM. You could increase diversity in the constant mask tests by not using a single-string-of-set-bits constant (eg 0x0f0f0f0f instead of 0x0000ffff).

This revision is now accepted and ready to land.Apr 20 2018, 1:37 PM

LGTM. You could increase diversity in the constant mask tests by not using a single-string-of-set-bits constant (eg 0x0f0f0f0f instead of 0x0000ffff).

But since we don't do anything (not fold, not unfold) with these patterns with constant masks, is there any point in adding even more tests?

LGTM. You could increase diversity in the constant mask tests by not using a single-string-of-set-bits constant (eg 0x0f0f0f0f instead of 0x0000ffff).

But since we don't do anything (not fold, not unfold) with these patterns with constant masks, is there any point in adding even more tests?

I wouldn't add more tests, but different constants might show the variety of codegen options that AArch is using on these sequences (x86 probably has no differences in that respect?).

Split tests with variable mask and constant mask into separate files, add a bit more tests with different constant mask patterns.

This revision was automatically updated to reflect the committed changes.