This is an archive of the discontinued LLVM Phabricator instance.

[X86] Prefer vpmovq2m over vpternlogd + vpcmpgtq
Needs ReviewPublic

Authored by davezarzycki on Jun 7 2021, 8:26 AM.

Download Raw Diff

Details

Reviewers

craig.topper
spatel
RKSimon
pengfei

Summary

Let's undo earlier optimizations that turn < 0 comparisons into > -1. This enables us to use vpmovq2m instead of vpternlogd + vpcmpgtq.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

davezarzycki created this revision.Jun 7 2021, 8:26 AM

Herald added subscribers: pengfei, hiraditya. · View Herald TranscriptJun 7 2021, 8:26 AM

davezarzycki requested review of this revision.Jun 7 2021, 8:26 AM

Harbormaster completed remote builds in B107998: Diff 350305.Jun 7 2021, 8:27 AM

I'm surprised that this didn't break existing tests. Where is the right place to update the test suite?

RKSimon added reviewers: RKSimon, pengfei.Jun 7 2021, 8:29 AM

RKSimon retitled this revision from Prefer vpmovq2m over vpternlogd + vpcmpgtq to [X86] Prefer vpmovq2m over vpternlogd + vpcmpgtq.

davezarzycki updated this revision to Diff 350322.Jun 7 2021, 9:17 AM

Harbormaster completed remote builds in B108010: Diff 350322.Jun 7 2021, 9:17 AM

Actually, wait. Something weird is going on at the mid-level. These two functions should generate the same optimized IR, right?

typedef int V __attribute__((vector_size(64)));

V lt_zero_x_y(V mask, V x, V y) { return mask <  0 ? x : y; }
V ge_zero_y_x(V mask, V x, V y) { return mask >= 0 ? y : x; }

In D103820#2803120, @davezarzycki wrote:
Actually, wait. Something weird is going on at the mid-level. These two functions should generate the same optimized IR, right?
typedef int V __attribute__((vector_size(64)));

V lt_zero_x_y(V mask, V x, V y) { return mask <  0 ? x : y; }
V ge_zero_y_x(V mask, V x, V y) { return mask >= 0 ? y : x; }

Should but currently we have a different IR. No canonicalization? Wow.

Plus, just cursious why with

typedef int V __attribute__((vector_size(4)));

we produce

define dso_local i32 @_Z11lt_zero_x_yDv1_iS_S_(i32 %0, i32 %1, i32 %2) local_unnamed_addr #0 {
  %4 = insertelement <1 x i32> poison, i32 %0, i32 0
  %5 = insertelement <1 x i32> poison, i32 %1, i32 0
  %6 = insertelement <1 x i32> poison, i32 %2, i32 0
  %7 = icmp sgt <1 x i32> %4, <i32 -1>
  %8 = select <1 x i1> %7, <1 x i32> %6, <1 x i32> %5
  %9 = extractelement <1 x i32> %8, i32 0
  ret i32 %9
}

Why not scalarize it on IR level?

In D103820#2803120, @davezarzycki wrote:
Actually, wait. Something weird is going on at the mid-level. These two functions should generate the same optimized IR, right?
typedef int V __attribute__((vector_size(64)));

V lt_zero_x_y(V mask, V x, V y) { return mask <  0 ? x : y; }
V ge_zero_y_x(V mask, V x, V y) { return mask >= 0 ? y : x; }

I'm not sure we canonicalize select operand order.

vpmovq2m only exists with avx512dq. Is this transform still useful with just avx512f?

In D103820#2803182, @craig.topper wrote:
In D103820#2803120, @davezarzycki wrote:
Actually, wait. Something weird is going on at the mid-level. These two functions should generate the same optimized IR, right?
typedef int V __attribute__((vector_size(64)));

V lt_zero_x_y(V mask, V x, V y) { return mask <  0 ? x : y; }
V ge_zero_y_x(V mask, V x, V y) { return mask >= 0 ? y : x; }
I'm not sure we canonicalize select operand order.

vpmovq2m only exists with avx512dq. Is this transform still useful with just avx512f?

On AVX512F? Sure. VPXOR has better code density than VPTERNLOG.

craig.topper added inline comments.Jun 7 2021, 10:20 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
22802	I'm a little confused. the vpmovq2m patter is (setgt imm_allzeros, X). Does this SETGE get canonicalized again?

davezarzycki added inline comments.Jun 7 2021, 10:29 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
22802	Oh, sorry about that. I originally made a logic error and converted > -1 to < 0 instead of >= 0; but at that time, I did verify that VPMOVQ2M was generated. You are right that the desired instruction is not generated with SETGE now that the logic error is fixed. In any case, I'd like to update the tests ahead of any code change. Is there a preferred file to put the current suboptimal code gen?

craig.topper added inline comments.Jun 7 2021, 11:08 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
22802	avx512-vec-cmp.ll seems like a decent file to put it in.

I filed a bug about the IR not being canonicalized. Let's wait for that before proceeding on generating VPMOVQ2M:

https://bugs.llvm.org/show_bug.cgi?id=50605

@davezarzycki What's going on the way phab is reporting the path in Revision Contents pane? llvm/lib/Target/X86 -> i/llvm/lib/Target/X86 ?!?

In D103820#2803573, @RKSimon wrote:

@davezarzycki What's going on the way phab is reporting the path in Revision Contents pane? llvm/lib/Target/X86 -> i/llvm/lib/Target/X86 ?!?

That's just git-diff. I thought Phab handled that but I guess not. From the documetation:

diff.mnemonicPrefix
If set, 'git diff' uses a prefix pair that is different from the standard "a/" and "b/" depending on what is being compared. When this configuration is in effect, reverse diff output also swaps the order of the prefixes:

git diff
compares the (i)ndex and the (w)ork tree;

git diff HEAD
compares a (c)ommit and the (w)ork tree;

git diff --cached
compares a (c)ommit and the (i)ndex;

git diff HEAD:file1 file2
compares an (o)bject and a (w)ork tree entity;

git diff --no-index a b
compares two non-git things (1) and (2).

@davezarzycki Are you still looking at this? What about PR50605?

Revision Contents

Path

Size

		llvm/	lib/	Target/	X86/
	i/	llvm/	lib/	Target/	X86/

X86ISelLowering.cpp

6 lines

Diff 350322

llvm/lib/Target/X86/X86ISelLowering.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 22,791 Lines • ▼ Show 20 Lines	static SDValue LowerIntVSETCC_AVX512(SDValue Op, SelectionDAG &DAG) {
ISD::CondCode SetCCOpcode = cast<CondCodeSDNode>(CC)->get();		ISD::CondCode SetCCOpcode = cast<CondCodeSDNode>(CC)->get();

// Prefer SETGT over SETLT.		// Prefer SETGT over SETLT.
if (SetCCOpcode == ISD::SETLT) {		if (SetCCOpcode == ISD::SETLT) {
SetCCOpcode = ISD::getSetCCSwappedOperands(SetCCOpcode);		SetCCOpcode = ISD::getSetCCSwappedOperands(SetCCOpcode);
std::swap(Op0, Op1);		std::swap(Op0, Op1);
}		}

		// Prefer >= 0 over > -1.
		if (SetCCOpcode == ISD::SETGT && ISD::isBuildVectorAllOnes(Op1.getNode())) {
		SetCCOpcode = ISD::SETGE;
		craig.topperUnsubmitted Not Done Reply Inline Actions I'm a little confused. the vpmovq2m patter is (setgt imm_allzeros, X). Does this SETGE get canonicalized again? craig.topper: I'm a little confused. the vpmovq2m patter is (setgt imm_allzeros, X). Does this SETGE get…
		davezarzyckiAuthorUnsubmitted Done Reply Inline Actions Oh, sorry about that. I originally made a logic error and converted > -1 to < 0 instead of >= 0; but at that time, I did verify that VPMOVQ2M was generated. You are right that the desired instruction is not generated with SETGE now that the logic error is fixed. In any case, I'd like to update the tests ahead of any code change. Is there a preferred file to put the current suboptimal code gen? davezarzycki: Oh, sorry about that. I originally made a logic error and converted > -1 to < 0 instead of >= 0…
		craig.topperUnsubmitted Not Done Reply Inline Actions avx512-vec-cmp.ll seems like a decent file to put it in. craig.topper: avx512-vec-cmp.ll seems like a decent file to put it in.
		Op1 = DAG.getConstant(0, dl, Op1.getValueType());
		}

return DAG.getSetCC(dl, VT, Op0, Op1, SetCCOpcode);		return DAG.getSetCC(dl, VT, Op0, Op1, SetCCOpcode);
}		}

/// Given a buildvector constant, return a new vector constant with each element		/// Given a buildvector constant, return a new vector constant with each element
/// incremented or decremented. If incrementing or decrementing would result in		/// incremented or decremented. If incrementing or decrementing would result in
/// unsigned overflow or underflow or this is not a simple vector constant,		/// unsigned overflow or underflow or this is not a simple vector constant,
/// return an empty value.		/// return an empty value.
static SDValue incDecVectorConstant(SDValue V, SelectionDAG &DAG, bool IsInc) {		static SDValue incDecVectorConstant(SDValue V, SelectionDAG &DAG, bool IsInc) {
▲ Show 20 Lines • Show All 29,459 Lines • Show Last 20 Lines