Details

Reviewers

RKSimon
andreadb
craig.topper
efriedma
spatel

Commits

rG8b5d9cbbfedc: [x86][DAG] Unroll vectorized FREMs that will become libcalls

Summary

Currently, two-element vectors produced as the result of a binary op are
widened to four-element vectors on x86 by
DAGTypeLegalizer::WidenVecRes_BinaryCanTrap. If the op still isn't legal
after widening, it is unrolled into 4 scalar ops in SelectionDAG before
being converted into libcalls. This way we end up with 4 libcalls (two of them
on known undef elements) instead of the two that are needed.

This patch modifies DAGTypeLegalizer::WidenVectorResult so that a binary op
that is known to be turned into a libcall is unrolled instead of being
widened. This prevents the creation of extra scalar instructions on known
undef elements and (eventually) libcalls with known undef parameters, which
would otherwise be created when the op is expanded after widening.
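As a rough illustration of the libcall counts described above, here is a toy C++ model. It is not LLVM code: `countLibcalls` and its parameters are hypothetical names introduced for this sketch, and it assumes the op has no legal vector or scalar lowering on the target.

```cpp
#include <cassert>

// Toy model, not LLVM code: counts the scalar libcalls emitted for one
// vector op that can only be lowered via a libcall.
//   origElts             - element count of the original (illegal) vector type
//   wideElts             - element count of the type it would be widened to
//   unrollBeforeWidening - true models the behaviour this patch aims for
unsigned countLibcalls(unsigned origElts, unsigned wideElts,
                       bool unrollBeforeWidening) {
  assert(origElts <= wideElts && "widening never shrinks the vector");
  // Widen-then-expand scalarizes every lane of the wide type, including the
  // undef padding lanes. Unrolling first only touches the original lanes.
  return unrollBeforeWidening ? origElts : wideElts;
}
```

For the `<2 x float>` case in the summary, `countLibcalls(2, 4, false)` models the old behaviour and `countLibcalls(2, 4, true)` models the patched one.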

llvm/test/CodeGen/X86/frem-libcall.ll:

Regression test for the change.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:

The change in SelectionDAG as mentioned above.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

n-omer created this revision. May 19 2022, 9:33 AM

Herald added a project: Restricted Project. May 19 2022, 9:33 AM

Herald added subscribers: StephenFan, ecnelises, pengfei, hiraditya.

n-omer requested review of this revision. May 19 2022, 9:33 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B165353: Diff 430716. May 19 2022, 9:34 AM

RKSimon added reviewers: efriedma, spatel. May 19 2022, 9:37 AM

Probably pull out the frem.ll and frem-libcall.ll tests into their own phab for review first - frem-libcall.ll in particular doesn't show the current problem in trunk (it generates 4 fmodf calls atm).

llvm/test/CodeGen/X86/frem-libcall.ll
5	Sort this to keep the description easier to notice:
	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=x86_64-linux-gnu < %s | FileCheck %s
	; Ensure vectorized FREMs are not widened/unrolled such that they get lowered
	; into libcalls on undef elements.
llvm/test/CodeGen/X86/frem.ll
4	(On Diff #430716)	Sort this to keep the description easier to notice:
	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=x86_64-linux-gnu < %s | FileCheck %s
	; Basic test coverage for FREM
95	(On Diff #430716)	very pedantic - but maybe sort the vector tests by size - the v8f16/v4f32/v2f64 first - then the 256 / 512-bit variants.
309	(On Diff #430716)	frem_v4f32

This makes the names and conventions here seem very confusing. The function is called WidenVecRes_BinaryCanTrap, but we're dealing with operations that can't actually trap. And it now has two different ways of unrolling an operation, depending on whether we're generating a libcall: the existing codepath uses smaller vectors, and the new one completely unrolls.

Can we unify the determination of whether we want to generate operations involving undef? And can we unify how we unroll an operation?

Why do we need to do anything more than what we do for ISD::FSIN and friends already? LegalizeDAG.cpp doesn't have an Expand other than libcall for ISD::FREM. So checking TLI.isOperationExpand like we do for FSIN should be sufficient, right?

Hi @RKSimon, I've updated the tests for the current trunk and split them off into https://reviews.llvm.org/D126055.

jmorse mentioned this in rGaed49eac87b8: [X86] Add tests for FREM. May 20 2022, 8:28 AM

Updating the diff for completeness because of D126055. Still looking into the reviewers' comments.

Harbormaster completed remote builds in B165542: Diff 431000. May 20 2022, 9:34 AM

Update diff based on review comments.

Add context to diff.

n-omer edited the summary of this revision. May 23 2022, 7:33 AM

Remove FIXME from test.

LGTM - it's annoying that we can't reuse the similar code for the unary ops, but not a big issue. I think we will probably need to add FPOW and the integer division/remainder, but we can address that later.

@craig.topper @efriedma Any comments?

Harbormaster completed remote builds in B165839: Diff 431371. May 23 2022, 8:54 AM

spatel added inline comments. May 23 2022, 9:34 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

3698

Instead of copying this block, create a lambda above the switch and call it from here and the existing code?

auto unrollExpandedOp = [&]() {
  // We're going to widen this vector op to a legal type by padding with undef
  // elements. If the wide vector op is eventually going to be expanded to
  // scalar libcalls, then unroll into scalar ops now to avoid unnecessary
  // libcalls on the undef elements.
  EVT VT = N->getValueType(0);
  EVT WideVecVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
  if (!TLI.isOperationLegalOrCustom(N->getOpcode(), WideVecVT) &&
      TLI.isOperationExpand(N->getOpcode(), VT.getScalarType())) {
    Res = DAG.UnrollVectorOp(N, WideVecVT.getVectorNumElements());
    return true;
  }
  return false;
};
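The condition inside the suggested lambda can be read in isolation: unroll only when the widened vector op is neither legal nor custom and the scalar op is itself marked Expand. The sketch below restates that predicate against a toy lookup table. `ToyTarget`, `Action`, `makeToyTarget`, `shouldUnrollBeforeWidening` and the string-based op/type names are all hypothetical stand-ins invented for this example; they are not the TargetLowering API, and the table entries are assumptions rather than real x86 settings.

```cpp
#include <map>
#include <string>
#include <utility>

enum class Action { Legal, Custom, Expand };

// Hypothetical stand-in for the target's per-(opcode, type) action table.
struct ToyTarget {
  std::map<std::pair<std::string, std::string>, Action> actions;

  void set(const std::string &op, const std::string &vt, Action a) {
    actions[std::make_pair(op, vt)] = a;
  }
  // Unlisted (op, type) pairs are treated as Legal in this toy model.
  Action get(const std::string &op, const std::string &vt) const {
    auto it = actions.find(std::make_pair(op, vt));
    return it == actions.end() ? Action::Legal : it->second;
  }
  bool isLegalOrCustom(const std::string &op, const std::string &vt) const {
    Action a = get(op, vt);
    return a == Action::Legal || a == Action::Custom;
  }
  bool isExpand(const std::string &op, const std::string &vt) const {
    return get(op, vt) == Action::Expand;
  }
};

// Same shape as the condition in the lambda: the wide vector op is not
// directly supported AND the scalar op would be expanded as well.
bool shouldUnrollBeforeWidening(const ToyTarget &t, const std::string &op,
                                const std::string &wideVT,
                                const std::string &scalarVT) {
  return !t.isLegalOrCustom(op, wideVT) && t.isExpand(op, scalarVT);
}

// Assumed example table: frem is Expand everywhere, fadd is Legal.
ToyTarget makeToyTarget() {
  ToyTarget t;
  t.set("frem", "v4f32", Action::Expand);
  t.set("frem", "f32", Action::Expand);
  t.set("fadd", "v4f32", Action::Legal);
  t.set("fadd", "f32", Action::Legal);
  return t;
}
```

With this table, `shouldUnrollBeforeWidening(t, "frem", "v4f32", "f32")` holds while the same query for `"fadd"` does not, which matches the intent of widening ordinary binary ops as before.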

Update diff based on reviewer's comments.

Thanks @spatel.

Harbormaster completed remote builds in B165871: Diff 431415. May 23 2022, 10:37 AM

RKSimon added inline comments. May 24 2022, 12:14 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3703	remove these braces now?
3794–3795	these braces can go now?

Update diff based on reviewer's comments.

Thanks @RKSimon.

Harbormaster completed remote builds in B166009: Diff 431616. May 24 2022, 2:24 AM

LGTM

This revision is now accepted and ready to land. May 24 2022, 2:24 AM

RKSimon mentioned this in rG64186e9b351a: [X86] Add test showing failure to expand <2 x float> fpow without widening to…. May 24 2022, 2:58 AM

This revision was landed with ongoing or failed builds. May 24 2022, 5:35 AM

Closed by commit rG8b5d9cbbfedc: [x86][DAG] Unroll vectorized FREMs that will become libcalls (authored by n-omer, committed by jmorse).

This revision was automatically updated to reflect the committed changes.

jmorse added a commit: rG8b5d9cbbfedc: [x86][DAG] Unroll vectorized FREMs that will become libcalls.

RKSimon mentioned this in rG11455e475889: [DAG] Unroll vectorized FPOW instructions before widening that will scalarize…. May 24 2022, 7:45 AM

Diff 431653

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

(3,574 lines not shown)	void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
   LLVM_DEBUG(dbgs() << "Widen node result " << ResNo << ": "; N->dump(&DAG);
              dbgs() << "\n");

   // See if the target wants to custom widen this node.
   if (CustomWidenLowerNode(N, N->getValueType(ResNo)))
     return;

   SDValue Res = SDValue();

+  auto unrollExpandedOp = [&]() {
+    // We're going to widen this vector op to a legal type by padding with undef
+    // elements. If the wide vector op is eventually going to be expanded to
+    // scalar libcalls, then unroll into scalar ops now to avoid unnecessary
+    // libcalls on the undef elements.
+    EVT VT = N->getValueType(0);
+    EVT WideVecVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
+    if (!TLI.isOperationLegalOrCustom(N->getOpcode(), WideVecVT) &&
+        TLI.isOperationExpand(N->getOpcode(), VT.getScalarType())) {
+      Res = DAG.UnrollVectorOp(N, WideVecVT.getVectorNumElements());
+      return true;
+    }
+    return false;
+  };
+
   switch (N->getOpcode()) {
   default:
 #ifndef NDEBUG
     dbgs() << "WidenVectorResult #" << ResNo << ": ";
     N->dump(&DAG);
     dbgs() << "\n";
 #endif
     llvm_unreachable("Do not know how to widen the result of this operator!");
(82 lines not shown)	#endif
   case ISD::VP_FADD:
   case ISD::VP_FSUB:
   case ISD::VP_FMUL:
   case ISD::VP_FDIV:
   case ISD::VP_FREM:
     Res = WidenVecRes_Binary(N);
     break;

+  case ISD::FREM:
+    if (unrollExpandedOp())
+      break;
+    // If the target has custom/legal support for the scalar FP intrinsic ops
+    // (they are probably not destined to become libcalls), then widen those
+    // like any other binary ops.
+    LLVM_FALLTHROUGH;
+
   case ISD::FADD:
   case ISD::FMUL:
   case ISD::FPOW:
   case ISD::FSUB:
   case ISD::FDIV:
-  case ISD::FREM:
   case ISD::SDIV:
   case ISD::UDIV:
   case ISD::SREM:
   case ISD::UREM:
     Res = WidenVecRes_BinaryCanTrap(N);
     break;

   case ISD::SMULFIX:
(66 lines not shown)	#include "llvm/IR/ConstrainedOps.def"
   case ISD::FLOG10:
   case ISD::FLOG2:
   case ISD::FNEARBYINT:
   case ISD::FRINT:
   case ISD::FROUND:
   case ISD::FROUNDEVEN:
   case ISD::FSIN:
   case ISD::FSQRT:
-  case ISD::FTRUNC: {
-    // We're going to widen this vector op to a legal type by padding with undef
-    // elements. If the wide vector op is eventually going to be expanded to
-    // scalar libcalls, then unroll into scalar ops now to avoid unnecessary
-    // libcalls on the undef elements.
-    EVT VT = N->getValueType(0);
-    EVT WideVecVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
-    if (!TLI.isOperationLegalOrCustom(N->getOpcode(), WideVecVT) &&
-        TLI.isOperationExpand(N->getOpcode(), VT.getScalarType())) {
-      Res = DAG.UnrollVectorOp(N, WideVecVT.getVectorNumElements());
-      break;
-    }
-  }
-  // If the target has custom/legal support for the scalar FP intrinsic ops
-  // (they are probably not destined to become libcalls), then widen those like
-  // any other unary ops.
-  LLVM_FALLTHROUGH;
+  case ISD::FTRUNC:
+    if (unrollExpandedOp())
+      break;
+    // If the target has custom/legal support for the scalar FP intrinsic ops
+    // (they are probably not destined to become libcalls), then widen those
+    // like any other unary ops.
+    LLVM_FALLTHROUGH;

   case ISD::ABS:
   case ISD::BITREVERSE:
   case ISD::BSWAP:
   case ISD::CTLZ:
   case ISD::CTLZ_ZERO_UNDEF:
   case ISD::CTPOP:
   case ISD::CTTZ:
(2,798 lines not shown)

llvm/test/CodeGen/X86/frem-libcall.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=x86_64-linux-gnu < %s | FileCheck %s

-; FIXME: Ensure vectorized FREMs are not widened/unrolled such that they get lowered
+; Ensure vectorized FREMs are not widened/unrolled such that they get lowered
 ; into libcalls on undef elements.

 define float @frem(<2 x float> %a0, <2 x float> %a1, <2 x float> %a2, <2 x float> *%p3) nounwind {
 ; CHECK-LABEL: frem:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: pushq %rbx
-; CHECK-NEXT: subq $80, %rsp
+; CHECK-NEXT: subq $64, %rsp
 ; CHECK-NEXT: movq %rdi, %rbx
 ; CHECK-NEXT: movaps %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT: movaps %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT: movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,3,3,3]
-; CHECK-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,3,3,3]
-; CHECK-NEXT: callq fmodf@PLT
-; CHECK-NEXT: movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; CHECK-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
-; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
-; CHECK-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]
-; CHECK-NEXT: callq fmodf@PLT
-; CHECK-NEXT: unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT: # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT: movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
 ; CHECK-NEXT: callq fmodf@PLT
 ; CHECK-NEXT: movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
+; CHECK-NEXT: movaps (%rsp), %xmm0 # 16-byte Reload
 ; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
 ; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
 ; CHECK-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,1,1]
 ; CHECK-NEXT: callq fmodf@PLT
 ; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
 ; CHECK-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT: unpcklpd (%rsp), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT: # xmm1 = xmm1[0],mem[0]
 ; CHECK-NEXT: divps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
 ; CHECK-NEXT: movaps %xmm1, %xmm0
 ; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1],xmm1[1,1]
 ; CHECK-NEXT: addss %xmm1, %xmm0
 ; CHECK-NEXT: movlps %xmm1, (%rbx)
-; CHECK-NEXT: addq $80, %rsp
+; CHECK-NEXT: addq $64, %rsp
 ; CHECK-NEXT: popq %rbx
 ; CHECK-NEXT: retq
 %frem = frem <2 x float> %a0, %a1
 %fdiv = fdiv <2 x float> %frem, %a2
 %ex0 = extractelement <2 x float> %fdiv, i32 0
 %ex1 = extractelement <2 x float> %fdiv, i32 1
 %res = fadd float %ex0, %ex1
 store <2 x float> %fdiv, <2 x float> *%p3
 ret float %res
 }
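
For readers less familiar with the IR, the computation exercised by the test can be sketched in plain C++. This is only a reference model of the test's arithmetic, assuming `frem` on floats behaves like the C library's `fmod` per lane; `fremTestModel` is a hypothetical name and the function is not part of the patch.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Reference model of the test function: per-lane frem, per-lane fdiv,
// store of the quotient vector, then the sum of its two lanes.
float fremTestModel(const std::array<float, 2> &a0,
                    const std::array<float, 2> &a1,
                    const std::array<float, 2> &a2,
                    std::array<float, 2> &p3) {
  for (std::size_t i = 0; i < 2; ++i) {
    float rem = std::fmod(a0[i], a1[i]); // %frem, one fmodf call per lane
    p3[i] = rem / a2[i];                 // %fdiv, stored through %p3
  }
  return p3[0] + p3[1];                  // %res = %ex0 + %ex1
}
```

Only two remainder computations are needed here, which is why the updated CHECK lines contain two `callq fmodf@PLT` instead of four.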

This is an archive of the discontinued LLVM Phabricator instance.

[x86][SelectionDAG] Unroll vectorized FREM instructions which will be lowered to libcalls
ClosedPublic
