This is an archive of the discontinued LLVM Phabricator instance.


Differential D155876

[PowerPC] vector cost model add cost to extract i1
Closed · Public

Authored by RolandF on Jul 20 2023, 11:53 AM.


Details

Reviewers

shchenz
stefanp

Commits

rG4d425f86632f: [PowerPC] vector cost model add cost to extract i1

Summary

Try to avoid some unprofitable predication on PPC. Recognize in the cost model that computing on i1 values requires an extra mask or compare operation.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

RolandF created this revision. (Jul 20 2023, 11:53 AM)

Herald added a project: Restricted Project. (Jul 20 2023, 11:53 AM)

Herald added subscribers: shchenz, kbarton, hiraditya, nemanjai.

RolandF requested review of this revision. (Jul 20 2023, 11:53 AM)

Herald added a project: Restricted Project. (Jul 20 2023, 11:53 AM)

Herald added subscribers: llvm-commits, wangpc.

RolandF added reviewers: shchenz, stefanp. (Jul 20 2023, 11:56 AM)

RolandF added a subscriber: power-llvm-team.

Harbormaster completed remote builds in B246991: Diff 542620. (Jul 20 2023, 4:18 PM)

Oops C++ fail and missed some tests.

shchenz added inline comments. (Jul 21 2023, 8:40 PM)

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

738

I must be missing something. I compiled the following cases:

define i1 @ext_ext_or_reduction_v4i1(<4 x i1> %z) {
  %z1 = extractelement <4 x i1> %z, i32 1
  ret i1 %z1
}

define i32 @ext_ext_or_reduction_v4i32(<4 x i32> %z) {
  %z1 = extractelement <4 x i32> %z, i32 1
  ret i32 %z1
}

It seems llc generates exactly the same instructions for both, on pwr9 as well as pwr8. So I don't understand why we need an extra cost for i1 types here. Must the i1 value have some kind of user?

RolandF added inline comments. (Jul 24 2023, 8:17 AM)

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

738

You will see the extra overhead if you compile the provided test case. But looking at your example, notice that the value produced has 64 or 32 live bits: there are garbage bits in the result. The return performs no calculation; it just functions like a copy, but the user would still have to get rid of those bits. The uses will be scalar code. If we try to charge those with the overhead, we will make the scalar loop cost more too, and scalar code will already have the masking code there.
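A small C++ model of the explanation above may help. It is a sketch under stated assumptions: the four-lane register layout, the example lane values, and the helper names (`extractLane`, `extractI1`, `countTrue`) are all hypothetical, and a plain AND stands in for whatever mask or compare the PowerPC backend actually emits.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model, not LLVM or PowerPC code: a 128-bit vector register is
// represented as four 32-bit lanes. After an extract, the scalar register
// holds a whole lane, but an i1 value is carried only by bit 0; the remaining
// bits are treated as garbage that a user must clear first.
using Vec4x32 = std::array<std::uint32_t, 4>;

// A plain extract: copy one lane out with no cleanup. This is all a bare
// `ret` needs, which is why the two llc outputs above look identical.
std::uint32_t extractLane(const Vec4x32 &V, unsigned Idx) { return V.at(Idx); }

// An i1 extract as a real user needs it: the same copy plus one mask, which
// corresponds to the single extra unit that MaskCost adds in the patch.
std::uint32_t extractI1(const Vec4x32 &V, unsigned Idx) {
  return extractLane(V, Idx) & 1u;
}

// Example scalar user: zero-extend two extracted i1 values and add them.
std::uint32_t countTrue(const Vec4x32 &V, unsigned A, unsigned B) {
  return extractI1(V, A) + extractI1(V, B);
}
```

With lanes such as `0xFFFFFFFF` and `0xDEADBEEF` (both with bit 0 set), `countTrue` yields 2, while adding the unmasked lanes does not; that leaked-garbage case is what the extra cost is meant to account for.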

I think considering the extra cost for the i1 type is reasonable.

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

738

I see. And pwr8 seems to use more rotate/clear instructions than pwr9, since pwr9 has vextubrx. I guess we don't need to model that detail for scalar operations?

llvm/test/Transforms/LoopVectorize/PowerPC/predcost.ll

nit: I ran bugpoint and got the following narrowed-down input:

target datalayout = "e-m:e-Fn32-i64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"

define dso_local void @_tc(ptr nocapture noundef %aaa, i64 noundef %bbb) local_unnamed_addr {
entry:
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.inc
  ret void

for.body:                                         ; preds = %for.inc, %entry
  %i.08 = phi i64 [ %inc, %for.inc ], [ 0, %entry ]
  %arrayidx = getelementptr inbounds i8, ptr %aaa, i64 %i.08
  %0 = load i8, ptr %arrayidx, align 1
  %cmp1 = icmp eq i8 %0, 0
  br i1 %cmp1, label %if.then, label %for.inc

if.then:                                          ; preds = %for.body
  store i8 32, ptr %arrayidx, align 1
  br label %for.inc

for.inc:                                          ; preds = %if.then, %for.body
  %inc = add nuw nsw i64 %i.08, 1
  %exitcond.not = icmp eq i64 %inc, %bbb
  br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body
}

This revision is now accepted and ready to land. (Jul 24 2023, 7:26 PM)

RolandF added inline comments. (Jul 26 2023, 11:41 AM)

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

738

For pwr8 there might be 1 added instruction or there might be 2. Uses in arithmetic/logical operations should only need 1. So I could use either 1 or 2, and 1 is enough to prevent the unprofitable case.

llvm/test/Transforms/LoopVectorize/PowerPC/predcost.ll

2

Thanks, will update on commit.

This revision was landed with ongoing or failed builds. (Aug 14 2023, 2:04 PM)

Closed by commit rG4d425f86632f: [PowerPC] vector cost model add cost to extract i1 (authored by RolandF).

This revision was automatically updated to reflect the committed changes.

RolandF added a commit: rG4d425f86632f: [PowerPC] vector cost model add cost to extract i1.

Revision Contents

Path                                                      Size
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp        15 lines
llvm/test/Analysis/CostModel/PowerPC/reduce-and.ll        8 lines
llvm/test/Analysis/CostModel/PowerPC/reduce-or.ll         8 lines
llvm/test/Transforms/LoopVectorize/PowerPC/predcost.ll    29 lines

Diff 550082

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

... (21 lines not shown) ...
 #include "llvm/Transforms/InstCombine/InstCombiner.h"
 #include "llvm/Transforms/Utils/Local.h"
 #include <optional>

 using namespace llvm;

 #define DEBUG_TYPE "ppctti"

+static cl::opt<bool> VecMaskCost("ppc-vec-mask-cost",
+cl::desc("add masking cost for i1 vectors"), cl::init(true), cl::Hidden);

 static cl::opt<bool> DisablePPCConstHoist("disable-ppc-constant-hoisting",
 cl::desc("disable constant hoisting on PPC"), cl::init(false), cl::Hidden);

 static cl::opt<bool>
 EnablePPCColdCC("ppc-enable-coldcc", cl::Hidden, cl::init(false),
 cl::desc("Enable using coldcc calling conv for cold "
 "internal functions"));

... (657 lines not shown) ...
   if (ST->hasVSX() && Val->getScalarType()->isDoubleTy()) {
     // Double-precision scalars are already located in index #0 (or #1 if LE).
     if (ISD == ISD::EXTRACT_VECTOR_ELT &&
         Index == (ST->isLittleEndian() ? 1 : 0))
       return 0;

     return Cost;

   } else if (Val->getScalarType()->isIntegerTy() && Index != -1U) {
+    unsigned EltSize = Val->getScalarSizeInBits();
+    // Computing on 1 bit values requires extra mask or compare operations.
+    unsigned MaskCost = VecMaskCost && EltSize == 1 ? 1 : 0;
     if (ST->hasP9Altivec()) {
       if (ISD == ISD::INSERT_VECTOR_ELT)
         // A move-to VSR and a permute/insert. Assume vector operation cost
         // for both (cost will be 2x on P9).
         return 2 * CostFactor;

       // It's an extract. Maybe we can do a cheap move-from VSR.
       unsigned EltSize = Val->getScalarSizeInBits();
       if (EltSize == 64) {
         unsigned MfvsrdIndex = ST->isLittleEndian() ? 1 : 0;
         if (Index == MfvsrdIndex)
           return 1;
       } else if (EltSize == 32) {
         unsigned MfvsrwzIndex = ST->isLittleEndian() ? 2 : 1;
         if (Index == MfvsrwzIndex)
           return 1;
       }

       // We need a vector extract (or mfvsrld). Assume vector operation cost.
       // The cost of the load constant for a vector extract is disregarded
       // (invariant, easily schedulable).
-      return CostFactor;
+      return CostFactor + MaskCost;

-    } else if (ST->hasDirectMove())
+    } else if (ST->hasDirectMove()) {
       // Assume permute has standard cost.
       // Assume move-to/move-from VSR have 2x standard cost.
-      return 3;
+      if (ISD == ISD::INSERT_VECTOR_ELT)
+        return 3;
+      return 3 + MaskCost;
+    }
   }

   // Estimated cost of a load-hit-store delay. This was obtained
   // experimentally as a minimum needed to prevent unprofitable
   // vectorization for the paq8p benchmark. It may need to be
   // raised further if other unprofitable cases remain.
   unsigned LHSPenalty = 2;
   if (ISD == ISD::INSERT_VECTOR_ELT)
... (346 lines not shown) ...
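To make the control flow of the patched integer branch easier to follow, here is a standalone C++ model of it. This is a sketch, not the LLVM code: `SubtargetModel`, `EltOp` and `integerEltCost` are hypothetical names, `int` stands in for `InstructionCost`, `CostFactor` is passed in rather than computed, and `-1` marks the case where the real function falls through to its load-hit-store logic.

```cpp
// Hypothetical, simplified model of the integer branch of
// PPCTTIImpl::getVectorInstrCost after this patch. None of these names are
// LLVM APIs.
enum class EltOp { Insert, Extract };

struct SubtargetModel {
  bool HasP9Altivec;
  bool HasDirectMove;
  bool IsLittleEndian;
};

// Mirrors the default (true) of the new hidden option -ppc-vec-mask-cost.
constexpr bool VecMaskCost = true;

// Returns the cost the patched branch would return, or -1 when the branch
// does not return and the real function continues past it.
int integerEltCost(const SubtargetModel &ST, EltOp Op, unsigned EltSize,
                   unsigned Index, int CostFactor) {
  // Computing on 1-bit values requires an extra mask or compare operation.
  int MaskCost = (VecMaskCost && EltSize == 1) ? 1 : 0;
  if (ST.HasP9Altivec) {
    if (Op == EltOp::Insert)
      return 2 * CostFactor;  // move-to VSR plus permute/insert
    // Cheap move-from VSR for a doubleword or word at the right index.
    if (EltSize == 64 && Index == (ST.IsLittleEndian ? 1u : 0u))
      return 1;
    if (EltSize == 32 && Index == (ST.IsLittleEndian ? 2u : 1u))
      return 1;
    return CostFactor + MaskCost;  // vector extract, plus mask for i1
  }
  if (ST.HasDirectMove) {
    if (Op == EltOp::Insert)
      return 3;  // inserts are not charged the mask
    return 3 + MaskCost;  // permute + move-from VSR, plus mask for i1
  }
  return -1;
}
```

With a pwr8-like model (no P9 Altivec, direct moves available), extracting an i1 costs 4 instead of 3, while inserts and wider elements are unchanged. That is consistent with the `%V1` cost moving from 3 to 4 in reduce-and.ll and reduce-or.ll, though the model does not try to reproduce how reduction costs are assembled.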

llvm/test/Analysis/CostModel/PowerPC/reduce-and.ll

 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
 ; RUN: opt < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 -passes="print<cost-model>" -cost-kind=throughput 2>&1 -disable-output | FileCheck %s

 define i32 @reduce_i1(i32 %arg) {
 ; CHECK-LABEL: 'reduce_i1'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2 = call i1 @llvm.vector.reduce.and.v2i1(<2 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4 = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8 = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16 = call i1 @llvm.vector.reduce.and.v16i1(<16 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 97 for instruction: %V32 = call i1 @llvm.vector.reduce.and.v32i1(<32 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 193 for instruction: %V64 = call i1 @llvm.vector.reduce.and.v64i1(<64 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 386 for instruction: %V128 = call i1 @llvm.vector.reduce.and.v128i1(<128 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 129 for instruction: %V32 = call i1 @llvm.vector.reduce.and.v32i1(<32 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 257 for instruction: %V64 = call i1 @llvm.vector.reduce.and.v64i1(<64 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 514 for instruction: %V128 = call i1 @llvm.vector.reduce.and.v128i1(<128 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
   %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
   %V2 = call i1 @llvm.vector.reduce.and.v2i1(<2 x i1> undef)
   %V4 = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> undef)
   %V8 = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> undef)
   %V16 = call i1 @llvm.vector.reduce.and.v16i1(<16 x i1> undef)
   %V32 = call i1 @llvm.vector.reduce.and.v32i1(<32 x i1> undef)
... (13 lines not shown) ...

llvm/test/Analysis/CostModel/PowerPC/reduce-or.ll

 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
 ; RUN: opt < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 -passes="print<cost-model>" -cost-kind=throughput 2>&1 -disable-output | FileCheck %s

 define i32 @reduce_i1(i32 %arg) {
 ; CHECK-LABEL: 'reduce_i1'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V1 = call i1 @llvm.vector.reduce.or.v1i1(<1 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V1 = call i1 @llvm.vector.reduce.or.v1i1(<1 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2 = call i1 @llvm.vector.reduce.or.v2i1(<2 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8 = call i1 @llvm.vector.reduce.or.v8i1(<8 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16 = call i1 @llvm.vector.reduce.or.v16i1(<16 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 97 for instruction: %V32 = call i1 @llvm.vector.reduce.or.v32i1(<32 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 193 for instruction: %V64 = call i1 @llvm.vector.reduce.or.v64i1(<64 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 386 for instruction: %V128 = call i1 @llvm.vector.reduce.or.v128i1(<128 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 129 for instruction: %V32 = call i1 @llvm.vector.reduce.or.v32i1(<32 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 257 for instruction: %V64 = call i1 @llvm.vector.reduce.or.v64i1(<64 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 514 for instruction: %V128 = call i1 @llvm.vector.reduce.or.v128i1(<128 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
   %V1 = call i1 @llvm.vector.reduce.or.v1i1(<1 x i1> undef)
   %V2 = call i1 @llvm.vector.reduce.or.v2i1(<2 x i1> undef)
   %V4 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> undef)
   %V8 = call i1 @llvm.vector.reduce.or.v8i1(<8 x i1> undef)
   %V16 = call i1 @llvm.vector.reduce.or.v16i1(<16 x i1> undef)
   %V32 = call i1 @llvm.vector.reduce.or.v32i1(<32 x i1> undef)
... (13 lines not shown) ...
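The changed CHECK lines in reduce-and.ll and reduce-or.ll follow a simple pattern, checked in the small C++ program below: where a cost changed, it rose by exactly the number of lanes (1, 32, 64, 128), and the 2- to 16-lane rows stayed at 2. Reading this as "one extra unit of MaskCost per extracted i1 lane" is an interpretation that fits the numbers, not something the review states; `Row`, `delta` and `onePerLaneOrUnchanged` are hypothetical helpers.

```cpp
#include <array>

// Costs copied from the CHECK lines above (pwr8, throughput). The and and or
// tests report identical numbers.
struct Row {
  unsigned Lanes;
  int Before;
  int After;
};

constexpr std::array<Row, 8> Rows = {{{1, 3, 4},
                                      {2, 2, 2},
                                      {4, 2, 2},
                                      {8, 2, 2},
                                      {16, 2, 2},
                                      {32, 97, 129},
                                      {64, 193, 257},
                                      {128, 386, 514}}};

// Increase in estimated cost introduced by the patch for one row.
constexpr int delta(const Row &R) { return R.After - R.Before; }

// True if the increase is exactly one unit per lane, or zero for rows whose
// cost did not change.
constexpr bool onePerLaneOrUnchanged(const Row &R) {
  return delta(R) == static_cast<int>(R.Lanes) || delta(R) == 0;
}
```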

llvm/test/Transforms/LoopVectorize/PowerPC/predcost.ll

This file was added.

; RUN: opt -ppc-vec-mask-cost=true -aa-pipeline=basic-aa -mcpu=pwr8 -S -passes=loop-vectorize < %s | FileCheck %s

target datalayout = "e-m:e-Fn32-i64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"

define dso_local void @_tc(ptr nocapture noundef %aaa, i64 noundef %bbb) local_unnamed_addr {
; CHECK-NOT: extractelement <16 x i1>
entry:
  br label %for.body

for.cond.cleanup.loopexit: ; preds = %for.inc
  ret void

for.body: ; preds = %for.inc, %entry
  %i.08 = phi i64 [ %inc, %for.inc ], [ 0, %entry ]
  %arrayidx = getelementptr inbounds i8, ptr %aaa, i64 %i.08
  %0 = load i8, ptr %arrayidx, align 1
  %cmp1 = icmp eq i8 %0, 0
  br i1 %cmp1, label %if.then, label %for.inc

if.then: ; preds = %for.body
  store i8 32, ptr %arrayidx, align 1
  br label %for.inc

for.inc: ; preds = %if.then, %for.body
  %inc = add nuw nsw i64 %i.08, 1
  %exitcond.not = icmp eq i64 %inc, %bbb
  br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body
}
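For readers who prefer source code to IR, here is a sketch of what `@_tc` computes. The test's original source is not part of the review, so `tc`, `Aaa` and `Bbb` are illustrative names. As a hedged reading of the test: vectorizing this loop needs a conditional (predicated) store, and the `CHECK-NOT: extractelement <16 x i1>` line presumably guards against the vectorizer scalarizing that store by extracting each bit of a 16-lane mask, the unprofitable case the extra MaskCost is meant to discourage.

```cpp
#include <cstdint>

// Illustrative C++ rendering of @_tc from predcost.ll. The IR branches from
// entry straight into for.body and exits when %inc == %bbb, so this is a
// do-while loop that assumes Bbb >= 1.
void tc(std::uint8_t *Aaa, std::int64_t Bbb) {
  std::int64_t I = 0;
  do {
    // %cmp1 = icmp eq i8 %0, 0: the per-element i1 condition.
    if (Aaa[I] == 0)
      Aaa[I] = 32;  // store i8 32, executed only on the if.then path
    ++I;
  } while (I != Bbb);
}
```

Only the zero bytes are rewritten, so a vectorized version must either mask the store or branch per lane; on pwr8 the latter is what requires extracting individual i1 mask lanes.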
