This is an archive of the discontinued LLVM Phabricator instance.

Update the vectorizer cost model for getVectorInstrCost to reflect actual costs of the operations on Power8/Power9
Abandoned · Public

Authored by nemanjai on Aug 30 2016, 2:36 AM.


Details

Reviewers

wschmidt
kbarton
amehsan
hfinkel

Summary

Integer vector extractions and insertions are accomplished with direct moves which take about 5 cycles.
Single precision floating point values just need to be aligned and converted (and inserted with a vperm in the insert case).
Double precision floating point values just need to be aligned (and inserted using an xxpermdi in the insert case).

For Power9, 32-bit values can be inserted into a vector without a vperm so do not require loading a permute mask.

This patch reflects these aspects of the operations in getVectorInstrCost.
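
As a rough illustration of the insertion costs described above, the following self-contained sketch models them outside of LLVM. The `Elt` enum, the `Features` struct, the `Power7`/`Power8`/`Power9` presets, `UseFallback`, and `insertCost` are hypothetical names introduced only for this example; the numeric values mirror those used in the patch and are latency estimates, not measurements.

```cpp
#include <cassert>

// Hypothetical stand-ins for the element type and subtarget queries; these
// names are illustrative and are not LLVM API.
enum class Elt { I8, I16, I32, I64, F32, F64 };

struct Features {
  bool HasVSX;
  bool HasDirectMove; // available on Power8 and later
  bool HasP9Vector;   // Power9 vector extensions
};

constexpr Features Power7{true, false, false};
constexpr Features Power8{true, true, false};
constexpr Features Power9{true, true, true};

constexpr int DirectMoveCost = 5; // latency estimate quoted in the summary
constexpr int UseFallback = -1;   // caller falls through to the load-hit-store path

// Cost of inserting one scalar at a constant index.
int insertCost(Elt E, const Features &F) {
  // Assumption of this sketch: Power9 vector support implies direct moves.
  assert((!F.HasP9Vector || F.HasDirectMove) && "unexpected feature set");

  if (F.HasVSX && E == Elt::F64)
    return 1; // xxpermdi only
  if (F.HasP9Vector && E == Elt::F32)
    return 2; // convert + insert, no permute mask
  if (F.HasP9Vector && E == Elt::I32)
    return DirectMoveCost + 1; // move + insert, no permute mask
  if (F.HasDirectMove && E == Elt::I64)
    return DirectMoveCost + 1; // move + xxpermdi
  if (F.HasDirectMove && (E == Elt::I8 || E == Elt::I16 || E == Elt::I32))
    return DirectMoveCost + 3; // move + permute-mask load + vperm
  return UseFallback;
}
```

Under these assumptions a 32-bit integer insert costs 8 with only direct moves available and 6 once the Power9 insert is available, which is the difference the summary attributes to not loading a permute mask.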

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai updated this revision to Diff 69654. Aug 30 2016, 2:36 AM

nemanjai retitled this revision from to Update the vectorizer cost model for getVectorInstrCost to reflect actual costs of the operations on Power8/Power9.

nemanjai updated this object.

nemanjai added reviewers: hfinkel, kbarton, amehsan, wschmidt.

nemanjai set the repository for this revision to rL LLVM.

nemanjai added a subscriber: llvm-commits.

Herald added a subscriber: nemanjai. Aug 30 2016, 2:36 AM

nemanjai added a parent revision: D24021: Fix code-gen crash on Power9 when lowering insert_vector_elt node with variable index (PR30189). Aug 30 2016, 2:36 AM

Updated the very confusing condition for whether the type is an integer of the right width at the right index.

Please note:
This is not necessarily meant to be in a state where it can be approved/committed. This patch implements the costs of these instructions based on latency of the instructions themselves. As such, this patch is meant to start a discussion on exactly how we want the cost model to work and what we want the returned costs to represent.

As things stand now, the modeled costs of a direct move and an actual load-hit-store (LHS) hazard are not nearly representative of their actual relative costs. But I've tried to ensure that we fall through to the "LHS" code in cases where we will actually have an LHS hazard.

Added a few comments to start the discussion of how this should be designed.

lib/Target/PowerPC/PPCTargetTransformInfo.cpp
Line 315, inline comment by kbarton:

This seems to be the point where all of the interesting changes are, so I'll add my comments here. I would recommend that some of these values (e.g., DirectMoveCost) be defined somewhere else, either as an enumeration or a #define. I expect that these costs will change over time (with the hardware), and it would be good to have a clear and convenient mechanism to represent that as opposed to a bunch of condition checks in this function. Off the top of my head, I can think of a few general ways to design this:

1- Try to create a generic cost function here that is "built" based on values defined as constants.
2- Create a separate function for each architecture that needs to be updated with every new hardware (e.g., P8InstrCost, p9InstrCost, ...).
3- Create a class that can be used, with subclasses for new architectures that can override the defaults when values change.

I'm sure there are other possibilities as well.
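
The last of the design options in the comment above (a class whose subclasses override defaults) could look roughly like the sketch below. Every name here (`VectorCostTable`, `HypotheticalNewCoreCosts`, `repositionedExtractCost`) is invented for illustration and is not part of LLVM; the defaults of 5 and 1 are the values the patch uses, and the overridden value of 3 is a placeholder rather than a measurement.

```cpp
#include <cassert>

// Hypothetical interface; the class and method names are illustrative only.
class VectorCostTable {
public:
  virtual ~VectorCostTable() = default;

  // Defaults taken from the patch: a direct move costs 5, a swap/permute 1.
  virtual int directMoveCost() const { return 5; }
  virtual int permuteCost() const { return 1; }
};

// A made-up future core on which direct moves are cheaper. The value 3 is a
// placeholder for illustration, not a measurement.
class HypotheticalNewCoreCosts : public VectorCostTable {
public:
  int directMoveCost() const override { return 3; }
};

// Extracting an integer that first has to be repositioned in the register:
// one permute plus one direct move.
int repositionedExtractCost(const VectorCostTable &T) {
  assert(T.directMoveCost() >= 0 && T.permuteCost() >= 0);
  return T.directMoveCost() + T.permuteCost();
}
```

With this shape, the cost function itself never mentions a literal cycle count, so a new core only needs a subclass that overrides the values that changed.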

I start with comments about the code here and then get to the more general discussion:

1- Some hardcoded constants represent the cost of an operation. For example, in line 365 we have

DirectMoveCost + 1

and 1 seems to represent the cost of a swap/permute. There are other places where the value 1 has been used for this cost. We should clearly define a named constant equal to one for this cost and use that name, as is done for DirectMoveCost.

2- IIUC, there is an implicit assumption here, that the cost of direct move on pwr8 and pwr9 is the same. Is this known for a fact?

As a more general comment:

1- Is there a reason that this function cannot be replaced by a target description file? Apart from the fact that we would need to do extra work to support defining these costs in td files, is there a fundamental reason that they cannot be defined there?

nemanjai abandoned this revision. Apr 3 2018, 4:34 PM

Herald added a subscriber: rengolin. Apr 3 2018, 4:34 PM

Revision Contents

Path	Size
lib/Target/PowerPC/PPCISelLowering.cpp	2 lines
lib/Target/PowerPC/PPCInstrVSX.td	20 lines
lib/Target/PowerPC/PPCTargetTransformInfo.cpp	78 lines
test/Analysis/CostModel/PowerPC/insert_extract.ll	2 lines
test/CodeGen/PowerPC/swaps-le-5.ll	15 lines
test/CodeGen/PowerPC/swaps-le-6.ll	10 lines
test/CodeGen/PowerPC/vsx_insert_extract_le.ll	6 lines

Diff 69681

lib/Target/PowerPC/PPCISelLowering.cpp


@@ 553 lines not shown @@ if (Subtarget.hasAltivec()) {
   setCondCodeAction(ISD::SETUO, MVT::v4f32, Expand);
   setCondCodeAction(ISD::SETUEQ, MVT::v4f32, Expand);
   setCondCodeAction(ISD::SETO, MVT::v4f32, Expand);
   setCondCodeAction(ISD::SETONE, MVT::v4f32, Expand);

   if (Subtarget.hasVSX()) {
     setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v2f64, Legal);
     setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2f64, Legal);
+    setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v2f64, Custom);
     if (Subtarget.hasP8Vector()) {
       setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v4f32, Legal);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v4f32, Legal);
     }
     if (Subtarget.hasDirectMove() && isPPC64) {
       setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v16i8, Legal);
       setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v8i16, Legal);
       setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v4i32, Legal);
       setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v2i64, Legal);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v16i8, Legal);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v8i16, Legal);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v4i32, Legal);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2i64, Legal);
+      setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v2i64, Custom);
     }
     setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2f64, Legal);

     setOperationAction(ISD::FFLOOR, MVT::v2f64, Legal);
     setOperationAction(ISD::FCEIL, MVT::v2f64, Legal);
     setOperationAction(ISD::FTRUNC, MVT::v2f64, Legal);
     setOperationAction(ISD::FNEARBYINT, MVT::v2f64, Legal);
     setOperationAction(ISD::FROUND, MVT::v2f64, Legal);
@@ 11,640 lines not shown @@

lib/Target/PowerPC/PPCInstrVSX.td

@@ 844 lines not shown @@
 let Predicates = [IsBigEndian] in {
 def : Pat<(v2f64 (scalar_to_vector f64:$A)),
           (v2f64 (SUBREG_TO_REG (i64 1), $A, sub_64))>;

 def : Pat<(f64 (extractelt v2f64:$S, 0)),
           (f64 (EXTRACT_SUBREG $S, sub_64))>;
 def : Pat<(f64 (extractelt v2f64:$S, 1)),
           (f64 (EXTRACT_SUBREG (XXPERMDI $S, $S, 2), sub_64))>;
+def : Pat<(v2f64 (insertelt v2f64:$A, f64:$B, 0)),
+          (v2f64 (XXPERMDI (COPY_TO_REGCLASS $B, VSRC), $A, 1))>;
+def : Pat<(v2f64 (insertelt v2f64:$A, f64:$B, 1)),
+          (v2f64 (XXPERMDI $A, (COPY_TO_REGCLASS $B, VSRC), 0))>;
 }

 let Predicates = [IsLittleEndian] in {
 def : Pat<(v2f64 (scalar_to_vector f64:$A)),
           (v2f64 (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64),
                            (SUBREG_TO_REG (i64 1), $A, sub_64), 0))>;

 def : Pat<(f64 (extractelt v2f64:$S, 0)),
           (f64 (EXTRACT_SUBREG (XXPERMDI $S, $S, 2), sub_64))>;
 def : Pat<(f64 (extractelt v2f64:$S, 1)),
           (f64 (EXTRACT_SUBREG $S, sub_64))>;
+def : Pat<(v2f64 (insertelt v2f64:$A, f64:$B, 0)),
+          (v2f64 (XXPERMDI $A, (COPY_TO_REGCLASS $B, VSRC), 0))>;
+def : Pat<(v2f64 (insertelt v2f64:$A, f64:$B, 1)),
+          (v2f64 (XXPERMDI (COPY_TO_REGCLASS $B, VSRC), $A, 1))>;
 }

 // Additional fnmsub patterns: -ac + b == -(ac - b)
 def : Pat<(fma (fneg f64:$A), f64:$C, f64:$B),
           (XSNMSUBADP $B, $C, $A)>;
 def : Pat<(fma f64:$A, (fneg f64:$C), f64:$B),
           (XSNMSUBADP $B, $C, $A)>;

@@ 780 lines not shown @@ let Predicates = [IsBigEndian, HasDirectMove] in {
 def : Pat<(v16i8 (scalar_to_vector i32:$A)),
           (v16i8 (SUBREG_TO_REG (i64 1), MovesToVSR.BE_BYTE_0, sub_64))>;
 def : Pat<(v8i16 (scalar_to_vector i32:$A)),
           (v8i16 (SUBREG_TO_REG (i64 1), MovesToVSR.BE_HALF_0, sub_64))>;
 def : Pat<(v4i32 (scalar_to_vector i32:$A)),
           (v4i32 (SUBREG_TO_REG (i64 1), MovesToVSR.BE_WORD_0, sub_64))>;
 def : Pat<(v2i64 (scalar_to_vector i64:$A)),
           (v2i64 (SUBREG_TO_REG (i64 1), MovesToVSR.BE_DWORD_0, sub_64))>;
+def : Pat<(v2i64 (insertelt v2i64:$V, i64:$A, 0)),
+          (v2i64 (XXPERMDI
+                   (COPY_TO_REGCLASS MovesToVSR.BE_DWORD_0, VSRC), $V, 1))>;
+def : Pat<(v2i64 (insertelt v2i64:$V, i64:$A, 1)),
+          (v2i64 (XXPERMDI
+                   $V, (COPY_TO_REGCLASS MovesToVSR.BE_DWORD_0, VSRC), 0))>;
 def : Pat<(i32 (vector_extract v16i8:$S, 0)),
           (i32 VectorExtractions.LE_BYTE_15)>;
 def : Pat<(i32 (vector_extract v16i8:$S, 1)),
           (i32 VectorExtractions.LE_BYTE_14)>;
 def : Pat<(i32 (vector_extract v16i8:$S, 2)),
           (i32 VectorExtractions.LE_BYTE_13)>;
 def : Pat<(i32 (vector_extract v16i8:$S, 3)),
           (i32 VectorExtractions.LE_BYTE_12)>;
@@ 91 lines not shown @@ let Predicates = [IsLittleEndian, HasDirectMove] in {
 def : Pat<(v16i8 (scalar_to_vector i32:$A)),
           (v16i8 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>;
 def : Pat<(v8i16 (scalar_to_vector i32:$A)),
           (v8i16 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>;
 def : Pat<(v4i32 (scalar_to_vector i32:$A)),
           (v4i32 MovesToVSR.LE_WORD_0)>;
 def : Pat<(v2i64 (scalar_to_vector i64:$A)),
           (v2i64 MovesToVSR.LE_DWORD_0)>;
+def : Pat<(v2i64 (insertelt v2i64:$V, i64:$A, 0)),
+          (v2i64 (XXPERMDI
+                   $V, (COPY_TO_REGCLASS MovesToVSR.BE_DWORD_0, VSRC), 0))>;
+def : Pat<(v2i64 (insertelt v2i64:$V, i64:$A, 1)),
+          (v2i64 (XXPERMDI
+                   (COPY_TO_REGCLASS MovesToVSR.BE_DWORD_0, VSRC), $V, 1))>;
 def : Pat<(i32 (vector_extract v16i8:$S, 0)),
           (i32 VectorExtractions.LE_BYTE_0)>;
 def : Pat<(i32 (vector_extract v16i8:$S, 1)),
           (i32 VectorExtractions.LE_BYTE_1)>;
 def : Pat<(i32 (vector_extract v16i8:$S, 2)),
           (i32 VectorExtractions.LE_BYTE_2)>;
 def : Pat<(i32 (vector_extract v16i8:$S, 3)),
           (i32 VectorExtractions.LE_BYTE_3)>;
@@ 511 lines not shown @@

lib/Target/PowerPC/PPCTargetTransformInfo.cpp

@@ 306 lines not shown @@ int PPCTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) {

   return BaseT::getCastInstrCost(Opcode, Dst, Src);
 }

 int PPCTTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy) {
   return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy);
 }

 int PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
(inline comment by kbarton attached here; its text appears in the Event Timeline above)
   assert(Val->isVectorTy() && "This must be a vector type");

+  int DirectMoveCost = 5;
   int ISD = TLI->InstructionOpcodeToISD(Opcode);
   assert(ISD && "Invalid opcode");

-  if (ST->hasVSX() && Val->getScalarType()->isDoubleTy()) {
-    // Double-precision scalars are already located in index #0.
+  if (ST->hasQPX() && Val->getScalarType()->isFloatingPointTy()) {
+    // Floating point scalars are already located in index #0.
     if (Index == 0)
       return 0;

     return BaseT::getVectorInstrCost(Opcode, Val, Index);
-  } else if (ST->hasQPX() && Val->getScalarType()->isFloatingPointTy()) {
-    // Floating point scalars are already located in index #0.
-    if (Index == 0)
+  }
+  // Handle vector insertions and extractions differently because extractions
+  // are inherrently more efficient on Power hardware prior to Power9.
+  if (ISD == ISD::EXTRACT_VECTOR_ELT) {
+    // Doubles are already at index 1/0 (LE/BE)
+    if (ST->hasVSX() && Val->getScalarType()->isDoubleTy()) {
+      if (Index == ST->isLittleEndian() ? 1 : 0)
         return 0;
+      else // Otherwise it needs a swap
+        return 1;
+    }
+
+    // Integers are moved out of the following indices (LE/BE):
+    // 8-bit -> 8/7
+    // 16-bit -> 3/4
+    // 32-bit -> 2/1
+    // 64-bit -> 1/0
+    auto isAtNaturalIndex = [=](int BitWidth, bool IsLE) -> bool {
+      if (IsLE)
+        return (BitWidth == 64 && Index == 1) ||
+               (BitWidth == 32 && Index == 2) ||
+               (BitWidth == 16 && Index == 3) ||
+               (BitWidth = 8 && Index == 8);
+      else
+        return (BitWidth == 64 && Index == 0) ||
+               (BitWidth == 32 && Index == 1) ||
+               (BitWidth == 16 && Index == 4) ||
+               (BitWidth == 8 && Index == 7);
+      return false;
+    };
+
+    if (Val->isIntegerTy() && ST->hasDirectMove() &&
+        isAtNaturalIndex(Val->getScalarSizeInBits(), ST->isLittleEndian()))
+      return DirectMoveCost;
+    else if (ST->hasDirectMove() && Index != -1U)
+      // Otherwise they need repositioning in the vector register.
+      return DirectMoveCost + 1;
+  } else if (ISD == ISD::INSERT_VECTOR_ELT) {
+    // Doubles only need a variant of XXPERMDI to be inserted (when the index
+    // is constant).
+    if (ST->hasVSX() && Val->getScalarType()->isDoubleTy() &&
+        Index != -1U)
+      return 1;
+
+    // On Power9 we do not need to permute the vectors when inserting a 32-bit
+    // value. Floating point gets converted and inserted, integers are moved and
+    // inserted.
+    if (ST->hasP9Vector() && Val->getScalarType()->isFloatTy() &&
+        Index != -1U)
+      return 2; // Convert + insert
+    if (ST->hasP9Vector() && Val->getScalarType()->isIntegerTy(32) &&
+        Index != -1U)
+      return DirectMoveCost + 1; // Move + insert
+
+    // 64-bit integers need a direct move and a variant of XXPERMDI to be
+    // inserted (when the index is constant).
+    if (ST->hasDirectMove() && Val->getScalarType()->isIntegerTy(64) &&
+        Index != -1U)
+      return DirectMoveCost + 1;
+
+    // Smaller integers are inserted by moving into the VSR, loading a permute
+    // mask and permuting the vectors.
+    if (ST->hasDirectMove() && Val->getScalarType()->isIntegerTy() &&
+        Index != -1U)
+      return DirectMoveCost + 3;
+  } else
     return BaseT::getVectorInstrCost(Opcode, Val, Index);
-  }
+  // Other inserts/extracts will incur an LHS penalty.

   // Estimated cost of a load-hit-store delay. This was obtained
   // experimentally as a minimum needed to prevent unprofitable
   // vectorization for the paq8p benchmark. It may need to be
   // raised further if other unprofitable cases remain.
   unsigned LHSPenalty = 2;
   if (ISD == ISD::INSERT_VECTOR_ELT)
     LHSPenalty += 7;
@@ 93 lines not shown @@
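
The "natural index" table in the patch's comment (8-bit -> 8/7, 16-bit -> 3/4, 32-bit -> 2/1, 64-bit -> 1/0 for LE/BE) can be exercised in isolation with a sketch like the one below. It is a standalone approximation, not the patch's code: `integerExtractCost` is a hypothetical helper, and the function takes plain integers instead of LLVM types. The endian-dependent index is parenthesized, because in C++ an expression such as `Index == IsLE ? 1 : 0` parses as `(Index == IsLE) ? 1 : 0`, and every comparison uses `==`.

```cpp
#include <cassert>

constexpr int DirectMoveCost = 5; // value used by the patch

// True when the element already sits where a direct move reads it from,
// following the LE/BE index table in the patch's comment.
bool isAtNaturalIndex(unsigned BitWidth, unsigned Index, bool IsLE) {
  switch (BitWidth) {
  case 64: return Index == (IsLE ? 1u : 0u);
  case 32: return Index == (IsLE ? 2u : 1u);
  case 16: return Index == (IsLE ? 3u : 4u);
  case 8:  return Index == (IsLE ? 8u : 7u);
  default: return false;
  }
}

// Hypothetical helper: cost of extracting an integer element at a constant
// index when direct moves are available.
int integerExtractCost(unsigned BitWidth, unsigned Index, bool IsLE) {
  // A variable index is encoded as -1U in the patch and handled elsewhere.
  assert(Index != ~0u && "constant index expected");
  return isAtNaturalIndex(BitWidth, Index, IsLE)
             ? DirectMoveCost      // direct move only
             : DirectMoveCost + 1; // reposition, then direct move
}
```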

test/Analysis/CostModel/PowerPC/insert_extract.ll

 ; RUN: opt < %s -cost-model -analyze -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 | FileCheck %s
 target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
 target triple = "powerpc64-unknown-linux-gnu"

 define i32 @insert(i32 %arg) {
   ; CHECK: cost of 10 {{.*}} insertelement
   %x = insertelement <4 x i32> undef, i32 %arg, i32 0
   ret i32 undef
 }

 define i32 @extract(<4 x i32> %arg) {
-  ; CHECK: cost of 3 {{.*}} extractelement
+  ; CHECK: cost of 5 {{.*}} extractelement
   %x = extractelement <4 x i32> %arg, i32 0
   ret i32 %x
 }

test/CodeGen/PowerPC/swaps-le-5.ll

@@ 9 lines not shown @@
 entry:
   %0 = load <2 x double>, <2 x double>* @x, align 16
   %vecins = insertelement <2 x double> %0, double %y, i32 0
   store <2 x double> %vecins, <2 x double>* @z, align 16
   ret void
 }

 ; CHECK-LABEL: @bar0
-; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
-; CHECK-DAG: xxspltd [[REG2:[0-9]+]]
-; CHECK: xxpermdi [[REG3:[0-9]+]], [[REG2]], [[REG1]], 1
-; CHECK: stxvd2x [[REG3]]
+; CHECK-DAG: xxswapd [[REG1:[0-9]+]], 1
+; CHECK-DAG: lxvd2x [[REG2:[0-9]+]]
+; CHECK-NOT: xxswapd
+; CHECK:xxmrgld [[REG3:[0-9]+]], [[REG1]], [[REG2]]
 ; CHECK-NOT: xxswapd
+; CHECK: stxvd2x [[REG3]]

 define void @bar1(double %y) {
 entry:
   %0 = load <2 x double>, <2 x double>* @x, align 16
   %vecins = insertelement <2 x double> %0, double %y, i32 1
   store <2 x double> %vecins, <2 x double>* @z, align 16
   ret void
 }

 ; CHECK-LABEL: @bar1
-; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
-; CHECK-DAG: xxspltd [[REG2:[0-9]+]]
-; CHECK: xxmrghd [[REG3:[0-9]+]], [[REG1]], [[REG2]]
+; CHECK-DAG: xxswapd [[REG1:[0-9]+]], 1
+; CHECK-DAG: lxvd2x [[REG2:[0-9]+]]
+; CHECK: xxpermdi [[REG3:[0-9]+]], [[REG2]], [[REG1]], 1
 ; CHECK: stxvd2x [[REG3]]
 ; CHECK-NOT: xxswapd

 define void @baz0() {
 entry:
   %0 = load <2 x double>, <2 x double>* @z, align 16
   %1 = load <2 x double>, <2 x double>* @x, align 16
   %vecins = shufflevector <2 x double> %0, <2 x double> %1, <2 x i32> <i32 0, i32 2>
@@ 27 lines not shown @@

test/CodeGen/PowerPC/swaps-le-6.ll

@@ 14 lines not shown @@ entry:
   %vecins = insertelement <2 x double> %0, double %1, i32 0
   store <2 x double> %vecins, <2 x double>* @z, align 16
   ret void
 }

 ; CHECK-LABEL: @bar0
 ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
 ; CHECK-DAG: lxsdx [[REG2:[0-9]+]]
-; CHECK: xxspltd [[REG4:[0-9]+]], [[REG2]], 0
-; CHECK: xxpermdi [[REG5:[0-9]+]], [[REG4]], [[REG1]], 1
+; CHECK: xxswapd [[REG4:[0-9]+]], [[REG2]]
+; CHECK: xxmrgld [[REG5:[0-9]+]], [[REG4]], [[REG1]]
+; CHECK-NOT: xxswapd
 ; CHECK: stxvd2x [[REG5]]

 define void @bar1() {
 entry:
   %0 = load <2 x double>, <2 x double>* @x, align 16
   %1 = load double, double* @y, align 8
   %vecins = insertelement <2 x double> %0, double %1, i32 1
   store <2 x double> %vecins, <2 x double>* @z, align 16
   ret void
 }

 ; CHECK-LABEL: @bar1
 ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
 ; CHECK-DAG: lxsdx [[REG2:[0-9]+]]
-; CHECK: xxspltd [[REG4:[0-9]+]], [[REG2]], 0
-; CHECK: xxmrghd [[REG5:[0-9]+]], [[REG1]], [[REG4]]
+; CHECK: xxswapd [[REG4:[0-9]+]], [[REG2]]
+; CHECK: xxpermdi [[REG5:[0-9]+]], [[REG1]], [[REG4]], 1
+; CHECK-NOT: xxswapd
 ; CHECK: stxvd2x [[REG5]]

test/CodeGen/PowerPC/vsx_insert_extract_le.ll

 ; RUN: llc -verify-machineinstrs -mcpu=pwr8 -mattr=+vsx -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s

 define <2 x double> @testi0(<2 x double>* %p1, double* %p2) {
   %v = load <2 x double>, <2 x double>* %p1
   %s = load double, double* %p2
   %r = insertelement <2 x double> %v, double %s, i32 0
   ret <2 x double> %r

 ; CHECK-LABEL: testi0
 ; CHECK: lxvd2x 0, 0, 3
 ; CHECK: lxsdx 1, 0, 4
 ; CHECK: xxswapd 0, 0
-; CHECK: xxspltd 1, 1, 0
-; CHECK: xxpermdi 34, 0, 1, 1
+; CHECK: xxmrghd 34, 0, 1
 }

 define <2 x double> @testi1(<2 x double>* %p1, double* %p2) {
   %v = load <2 x double>, <2 x double>* %p1
   %s = load double, double* %p2
   %r = insertelement <2 x double> %v, double %s, i32 1
   ret <2 x double> %r

 ; CHECK-LABEL: testi1
 ; CHECK: lxvd2x 0, 0, 3
 ; CHECK: lxsdx 1, 0, 4
 ; CHECK: xxswapd 0, 0
-; CHECK: xxspltd 1, 1, 0
-; CHECK: xxmrgld 34, 1, 0
+; CHECK: xxpermdi 34, 1, 0, 1
 }

 define double @teste0(<2 x double>* %p1) {
   %v = load <2 x double>, <2 x double>* %p1
   %r = extractelement <2 x double> %v, i32 0
   ret double %r

 ; CHECK-LABEL: teste0
@@ 12 lines not shown @@
