This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.
ClosedPublic

Authored by lebedev.ri on May 7 2018, 8:46 AM.

Details

Summary

This appears to be the last missing piece for the masked merge pattern handling in the backend.

This is PR37104.

PR6773 will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, andps+andnps / bsl would be generated. (see @out)
Now, they would no longer be generated (see @in), and we need to make sure that they are generated.

Before reviewing this, it is best to finish with the scalar part.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.May 7 2018, 8:46 AM

Rebased ontop of updated tests - X86 XOP test added.

spatel added inline comments.
lib/Target/AArch64/AArch64ISelLowering.h
444 ↗(On Diff #145959)

The name of the function is overly specific if we're including bsl and instructions too. I think we should either generalize that name or add a different hook for "hasBitwiseSelect".

lib/Target/X86/X86ISelLowering.cpp
4773–4776 ↗(On Diff #145959)

That seems odd. Does something bad happen with other types of 128-bit vectors (eg v8i16) with SSE1 only? I would've thought all element types would get mapped to andnps.

lebedev.ri added inline comments.May 9 2018, 2:03 PM
lib/Target/AArch64/AArch64ISelLowering.h
444 ↗(On Diff #145959)

Currently for AArch64, hasAndNot() is not overriden.
We could rename it to that, and make hasAndNotCompare() call hasAndNot().

Or is this about these hooks in general?

spatel added inline comments.May 9 2018, 3:53 PM
lib/Target/AArch64/AArch64ISelLowering.h
444 ↗(On Diff #145959)

It's a combination:

  1. It's not accurate to say here (or in x86) that we have AndNotCompare for vector types. That hook is supposed to indicate that the logic instruction also sets flags (bics vs. bic). It's not causing any problems right now AFAICT, but stretching the truth of that logic could cause trouble some day.
  1. bsl (or vpcmov for x86) are not and-not operations either, so some transform may be misled by us claiming those are and-not ops.

For AArch, we have vector bic. So here, I think we should override hasAndNot() and indicate that we have bic for scalar and bic for vector. It's really just a bonus for the masked-merge transform that we can shrink it down to bsl.

Similarly for x86, what we care about is that we can use andnps, and producing vpcmov (who knew) is a bonus. So we want to fix the override of hasAndNot() to indicate that legal vector types are supported with the right dose of SSE.

lib/Target/X86/X86ISelLowering.cpp
4773–4776 ↗(On Diff #145959)

Hmm...the other types seem to get legalized via scalarization before we have a chance to transform them into the x86-specific logic nodes.

Probably nobody cares about pre-SSE2 codegen at this point, but Craig or Simon may have suggestions about making the tests less noisy in that respect.

Rework X86/AArch64 hasAndNotCompare() / hasAndNot() hooks a little.

lib/Target/AArch64/AArch64ISelLowering.h
444 ↗(On Diff #145959)

I have reworked this a little, is this what you had in mind?

spatel added inline comments.May 10 2018, 9:58 AM
lib/Target/AArch64/AArch64ISelLowering.h
453–459 ↗(On Diff #146136)

No need to change hasAndNotCompare? Seem clearer to just "return true" as we do today.

spatel added inline comments.May 10 2018, 10:04 AM
lib/Target/AArch64/AArch64ISelLowering.h
453–459 ↗(On Diff #146136)

Oops...the current code isn't correct. It should be:

bool hasAndNotCompare(SDValue V) const override {
  // We can use bics for any scalar.
  return V.getValueType().isVector();
}
spatel added inline comments.May 10 2018, 10:06 AM
lib/Target/AArch64/AArch64ISelLowering.h
453–459 ↗(On Diff #146136)

Argh...forgot to invert:

bool hasAndNotCompare(SDValue V) const override {
  // We can use bics for any scalar.
  return V.getValueType().isScalarInteger();
}
lebedev.ri marked 7 inline comments as done.

Rework AArch64 hasAndNotCompare() / hasAndNot() hooks a little once more,
move them around.

spatel accepted this revision.May 21 2018, 7:51 AM

We may be waiting for the test coverage to be resolved, but this logic change LGTM

lib/Target/AArch64/AArch64ISelLowering.h
455 ↗(On Diff #146315)

Could just return true here? We could assert that we're not dealing with FP if that's a concern.

This revision is now accepted and ready to land.May 21 2018, 7:51 AM

We may be waiting for the test coverage to be resolved, but this logic change LGTM

Thank you for the review!

Waiting on @RKSimon ...

RKSimon requested changes to this revision.May 21 2018, 9:42 AM

I 'm happy with the illegal type tests to go in but the lost ANDNP combine could be a problem.

lib/Target/X86/X86ISelLowering.cpp
4776 ↗(On Diff #146315)

Its very unlikely we'll ever add MMX support to this - drop this FIXME.

test/CodeGen/X86/unfold-masked-merge-vector-variablemask-const.ll
65 ↗(On Diff #146315)

This regression is rather unfortunate - anyway that we can get the X86ISD::ANDNP back?

This revision now requires changes to proceed.May 21 2018, 9:42 AM
lebedev.ri added inline comments.May 21 2018, 10:05 AM
test/CodeGen/X86/unfold-masked-merge-vector-variablemask-const.ll
65 ↗(On Diff #146315)

Uhh, great observation :/
I think this case isn't even supposed to be affected,
since we skip all not xor's...
I guess isAllOnesConstantOrAllOnesSplatConstant() is somehow misbehaving.

Drop fixme, fix slight regression in dependent differential.

lebedev.ri marked 3 inline comments as done.
lebedev.ri added inline comments.
lib/Target/AArch64/AArch64ISelLowering.h
455 ↗(On Diff #146315)

I would think we should delegate to hasAndNotCompare(), since it is already there.

test/CodeGen/X86/unfold-masked-merge-vector-variablemask-const.ll
65 ↗(On Diff #146315)
RKSimon accepted this revision.May 21 2018, 1:28 PM

LGTM - thanks.

This revision is now accepted and ready to land.May 21 2018, 1:28 PM
lebedev.ri marked an inline comment as done.May 21 2018, 1:32 PM

LGTM - thanks.

Thank you for the review!