This is an archive of the discontinued LLVM Phabricator instance.

Differential D45271

[LV] Introduce TTI::getMinimumVF
Closed, Public

Authored by kparzysz on Apr 4 2018, 10:53 AM.

Details

Reviewers

hsaito
craig.topper
dcaballe

Commits

rGdfed941eec93: [LV] Introduce TTI::getMinimumVF
rL330062: [LV] Introduce TTI::getMinimumVF

Summary

The function getMinimumVF(ElemWidth) returns the minimum VF for a vector whose elements are ElemWidth bits wide. This value applies only to targets for which TTI::shouldMaximizeVectorBandwidth returns true. A return value of 0 indicates that there is no minimum VF.

This is a follow-up to D44735.
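To make the contract in the summary concrete, here is a minimal C++ sketch. Only the hook names getMinimumVF and shouldMaximizeVectorBandwidth come from the patch; `TargetHooks`, `Wide512Target`, and `effectiveMinimumVF` are hypothetical names invented for illustration (they are not LLVM classes), and the 512-bit register width is an assumption. It shows the two properties the summary states: 0 means "no minimum VF", and the value is only consulted when bandwidth maximization is enabled.

```cpp
#include <cassert>

// Hypothetical stand-in for the two target hooks named in the summary.
// This is NOT LLVM's TargetTransformInfo; it only mirrors the contract.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Default: do not widen the VF to fill a whole vector register.
  virtual bool shouldMaximizeVectorBandwidth(bool /*OptSize*/) const {
    return false;
  }
  // 0 is the sentinel for "this target imposes no minimum VF".
  virtual unsigned getMinimumVF(unsigned /*ElemWidth*/) const { return 0; }
};

// Hypothetical target whose vector registers are assumed to be 512 bits wide.
struct Wide512Target : TargetHooks {
  bool shouldMaximizeVectorBandwidth(bool /*OptSize*/) const override {
    return true;
  }
  unsigned getMinimumVF(unsigned ElemWidth) const override {
    assert(ElemWidth != 0 && "element width must be non-zero");
    return 512 / ElemWidth; // enough lanes to fill one register
  }
};

// Returns the minimum VF a vectorizer should honor, or 0 if none applies.
// The minimum is only meaningful when bandwidth maximization is enabled.
unsigned effectiveMinimumVF(const TargetHooks &T, bool OptSize,
                            unsigned ElemWidth) {
  if (!T.shouldMaximizeVectorBandwidth(OptSize))
    return 0;
  return T.getMinimumVF(ElemWidth);
}
```

In the patch itself, the vectorizer calls getMinimumVF only inside a block whose condition begins with `TTI.shouldMaximizeVectorBandwidth(OptForSize) ||` (see the LoopVectorize.cpp change), so folding the check into one helper, as above, is a simplification.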

Diff Detail

Repository: rL LLVM

Event Timeline

kparzysz created this revision. Apr 4 2018, 10:53 AM

kparzysz mentioned this in D44574: [LV] Introduce TTI::getMinimumVF. Apr 4 2018, 11:22 AM

I think the TTI part also looks reasonable, but I'll wait for Craig or others to comment on that.

lib/Transforms/Vectorize/LoopVectorize.cpp
Line 6163 (On Diff #140993): This change in the vectorizer looks reasonable to me.
test/Transforms/LoopVectorize/Hexagon/minimum-vf.ll
Line 1 (On Diff #140993): Sorry for being picky, but I think this test is unnecessarily large. Can't we have a much smaller, simpler test case? I think all we need is the widest type driving the VF below 64 and/or the constant trip count driving it below 64, right? Or am I missing something?
Line 5 (On Diff #140993): Please add a comment saying that the hard-coded 9 comes from the constant trip count, in case someone has to maintain the test case and/or validation later.

kparzysz added inline comments. Apr 4 2018, 1:54 PM

test/Transforms/LoopVectorize/Hexagon/minimum-vf.ll
Line 5 (On Diff #140993): I have reduced this testcase. The MaxVF of 9 actually came from the register usage calculation; it was a coincidence that the iteration count was 9 as well.

Reduced the testcase.

hsaito added inline comments. Apr 4 2018, 2:55 PM

test/Transforms/LoopVectorize/Hexagon/minimum-vf.ll
Line 5 (On Diff #140993): Can't we go with something as simple as the equivalent of this? If not, why?

for (i = 0; i < 9; i++) {
  a[i] += 1; // int add
  b[i] += 1; // char add
}

kparzysz added inline comments. Apr 5 2018, 8:09 AM

test/Transforms/LoopVectorize/Hexagon/minimum-vf.ll
Line 5 (On Diff #140993): I've tried this testcase; let's call it sl.c:

int a[9];
char b[9];

void foo() {
  for (unsigned i = 0; i < 9; ++i) {
    a[i]++;
    b[i]++;
  }
}

Output from clang -target hexagon -O2 -mhvx -mllvm -hexagon-autohvx -S sl.c -fno-unroll-loops -mllvm -debug-only=loop-vectorize:

LV: Checking a loop in "foo" from sl.c
LV: Interleaving disabled by the pass manager
LV: Loop hints: force=? width=0 unroll=1
LV: Found a loop: for.body
LV: Found an induction variable.
LV: We can vectorize this loop!
LV: Found a loop with a very small trip count. This loop is worth vectorizing only if no scalar iteration overheads are incurred.
LV: Found trip count: 9
LV: The Smallest and Widest types: 8 / 32 bits.
LV: The Widest register safe to use is: 512 bits.
LV(REG): Calculating max register usage:
LV: Found uniform instruction:   %exitcond = icmp eq i32 %inc3, 9
LV: Found uniform instruction:   %arrayidx = getelementptr inbounds [9 x i32], [9 x i32]* @a, i32 0, i32 %i.08
LV: Found uniform instruction:   %arrayidx1 = getelementptr inbounds [9 x i8], [9 x i8]* @b, i32 0, i32 %i.08
LV: Found uniform instruction:   %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found uniform instruction:   %inc3 = add nuw nsw i32 %i.08, 1
LV: Found scalar instruction:   %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found scalar instruction:   %inc3 = add nuw nsw i32 %i.08, 1
LV: Found uniform instruction:   %exitcond = icmp eq i32 %inc3, 9
LV: Found uniform instruction:   %arrayidx = getelementptr inbounds [9 x i32], [9 x i32]* @a, i32 0, i32 %i.08
LV: Found uniform instruction:   %arrayidx1 = getelementptr inbounds [9 x i8], [9 x i8]* @b, i32 0, i32 %i.08
LV: Found uniform instruction:   %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found uniform instruction:   %inc3 = add nuw nsw i32 %i.08, 1
LV: Found scalar instruction:   %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found scalar instruction:   %inc3 = add nuw nsw i32 %i.08, 1
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 3
LV(REG): At #5 Interval # 1
LV(REG): At #6 Interval # 2
LV(REG): At #7 Interval # 3
LV(REG): At #9 Interval # 1
LV(REG): At #10 Interval # 1
LV(REG): VF = 32
LV(REG): Found max usage: 2
LV(REG): Found invariant usage: 0
LV(REG): VF = 64
LV(REG): Found max usage: 4
LV(REG): Found invariant usage: 0
LV: Aborting. A tail loop is required with -Os/-Oz.
LV: Vectorization is possible but not beneficial.
LV: Interleaving is not beneficial.

The VF calculated above is 64, which is what we would have wanted, so this loop never reaches the minimum-VF override. With the testcase from this patch, on the other hand, the relevant output looks like this:

...
LV(REG): At #106 Interval # 2
LV(REG): At #108 Interval # 1
LV(REG): At #109 Interval # 1
LV(REG): VF = 18
LV(REG): Found max usage: 36
LV(REG): Found invariant usage: 16
LV: Overriding calculated MaxVF(9) with target's minimum: 64

Ping.

In D45271#1063304, @kparzysz wrote:

Ping.

I have no more comments. You probably want to add a few more reviewers who are familiar with TTI.

Since you're ok with the vectorizer change, I'll commit this. If someone has concerns regarding the TTI interface, I'll address it post-commit.

This revision was not accepted when it landed; it landed in state Needs Review. Apr 13 2018, 1:19 PM

Closed by commit rL330062: [LV] Introduce TTI::getMinimumVF (authored by kparzysz).

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                              Size
llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h            9 lines
llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h        2 lines
llvm/trunk/lib/Analysis/TargetTransformInfo.cpp                   4 lines
llvm/trunk/lib/Target/Hexagon/HexagonTargetTransformInfo.h        1 line
llvm/trunk/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp      4 lines
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp             7 lines
llvm/trunk/test/Transforms/LoopVectorize/Hexagon/lit.local.cfg    2 lines
llvm/trunk/test/Transforms/LoopVectorize/Hexagon/minimum-vf.ll    173 lines

Diff 142460

llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h

@@ ... @@ public:
   /// \return True if the vectorization factor should be chosen to
   /// make the vector of the smallest element type match the size of a
   /// vector register. For wider element types, this could result in
   /// creating vectors that span multiple vector registers.
   /// If false, the vectorization factor will be chosen based on the
   /// size of the widest element type.
   bool shouldMaximizeVectorBandwidth(bool OptSize) const;

+  /// \return The minimum vectorization factor for types of given element
+  /// bit width, or 0 if there is no minimum VF. The returned value only
+  /// applies when shouldMaximizeVectorBandwidth returns true.
+  unsigned getMinimumVF(unsigned ElemWidth) const;
+
   /// \return True if it should be considered for address type promotion.
   /// \p AllowPromotionWithoutCommonHeader Set true if promoting \p I is
   /// profitable without finding other extensions fed by the same input.
   bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const;

   /// \return The size of a cache line in bytes.
   unsigned getCacheLineSize() const;
@@ ... @@ public:
   virtual int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,
                             Type *Ty) = 0;
   virtual int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
                             Type *Ty) = 0;
   virtual unsigned getNumberOfRegisters(bool Vector) = 0;
   virtual unsigned getRegisterBitWidth(bool Vector) const = 0;
   virtual unsigned getMinVectorRegisterBitWidth() = 0;
   virtual bool shouldMaximizeVectorBandwidth(bool OptSize) const = 0;
+  virtual unsigned getMinimumVF(unsigned ElemWidth) const = 0;
   virtual bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) = 0;
   virtual unsigned getCacheLineSize() = 0;
   virtual llvm::Optional<unsigned> getCacheSize(CacheLevel Level) = 0;
   virtual llvm::Optional<unsigned> getCacheAssociativity(CacheLevel Level) = 0;
   virtual unsigned getPrefetchDistance() = 0;
   virtual unsigned getMinPrefetchStride() = 0;
   virtual unsigned getMaxPrefetchIterationsAhead() = 0;
@@ ... @@ unsigned getRegisterBitWidth(bool Vector) const override {
     return Impl.getRegisterBitWidth(Vector);
   }
   unsigned getMinVectorRegisterBitWidth() override {
     return Impl.getMinVectorRegisterBitWidth();
   }
   bool shouldMaximizeVectorBandwidth(bool OptSize) const override {
     return Impl.shouldMaximizeVectorBandwidth(OptSize);
   }
+  unsigned getMinimumVF(unsigned ElemWidth) const override {
+    return Impl.getMinimumVF(ElemWidth);
+  }
   bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) override {
     return Impl.shouldConsiderAddressTypePromotion(
         I, AllowPromotionWithoutCommonHeader);
   }
   unsigned getCacheLineSize() override {
     return Impl.getCacheLineSize();
   }

llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ ... @@ public:
   unsigned getNumberOfRegisters(bool Vector) { return 8; }

   unsigned getRegisterBitWidth(bool Vector) const { return 32; }

   unsigned getMinVectorRegisterBitWidth() { return 128; }

   bool shouldMaximizeVectorBandwidth(bool OptSize) const { return false; }

+  unsigned getMinimumVF(unsigned ElemWidth) const { return 0; }
+
   bool
   shouldConsiderAddressTypePromotion(const Instruction &I,
                                      bool &AllowPromotionWithoutCommonHeader) {
     AllowPromotionWithoutCommonHeader = false;
     return false;
   }

   unsigned getCacheLineSize() { return 0; }

llvm/trunk/lib/Analysis/TargetTransformInfo.cpp

@@ ... @@
 unsigned TargetTransformInfo::getMinVectorRegisterBitWidth() const {
   return TTIImpl->getMinVectorRegisterBitWidth();
 }

 bool TargetTransformInfo::shouldMaximizeVectorBandwidth(bool OptSize) const {
   return TTIImpl->shouldMaximizeVectorBandwidth(OptSize);
 }

+unsigned TargetTransformInfo::getMinimumVF(unsigned ElemWidth) const {
+  return TTIImpl->getMinimumVF(ElemWidth);
+}
+
 bool TargetTransformInfo::shouldConsiderAddressTypePromotion(
     const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const {
   return TTIImpl->shouldConsiderAddressTypePromotion(
       I, AllowPromotionWithoutCommonHeader);
 }

 unsigned TargetTransformInfo::getCacheLineSize() const {
   return TTIImpl->getCacheLineSize();
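The generic files above follow LLVM's type-erasure layering: the public TargetTransformInfo wrapper forwards to an abstract Concept, a templated Model forwards to a concrete implementation, and the base implementation supplies the default. That is why one new hook touches TargetTransformInfo.h in three spots, plus TargetTransformInfoImpl.h and TargetTransformInfo.cpp. The sketch below reproduces that shape in miniature, assuming C++14; `MiniTTI`, `DefaultImpl`, and `HVXLikeImpl` are hypothetical names, the 64-byte vector length is an assumption, and the real classes carry many more hooks and additional base layers not shown here.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified wrapper -> Concept -> Model -> Impl chain.
class MiniTTI {
  // Abstract interface: one pure virtual per hook.
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getMinimumVF(unsigned ElemWidth) const = 0;
  };

  // Model<T> adapts any type T that has a getMinimumVF member.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    unsigned getMinimumVF(unsigned ElemWidth) const override {
      return Impl.getMinimumVF(ElemWidth); // forward to the concrete impl
    }
  };

  std::unique_ptr<const Concept> TTIImpl;

public:
  template <typename T>
  explicit MiniTTI(T I) : TTIImpl(std::make_unique<Model<T>>(std::move(I))) {}

  // Public entry point, analogous to TargetTransformInfo::getMinimumVF.
  unsigned getMinimumVF(unsigned ElemWidth) const {
    return TTIImpl->getMinimumVF(ElemWidth);
  }
};

// Default behaviour: no minimum VF.
struct DefaultImpl {
  unsigned getMinimumVF(unsigned /*ElemWidth*/) const { return 0; }
};

// A target implementation that hides the default with its own version.
struct HVXLikeImpl : DefaultImpl {
  unsigned VectorLengthBytes = 64; // assumed register size in bytes
  unsigned getMinimumVF(unsigned ElemWidth) const {
    assert(ElemWidth != 0 && "element width must be non-zero");
    return (8 * VectorLengthBytes) / ElemWidth;
  }
};
```

Because Model<T> calls `Impl.getMinimumVF` by name, a target only has to declare a member with the same signature to replace the default, which is what the one-line addition to the Hexagon header does.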

llvm/trunk/lib/Target/Hexagon/HexagonTargetTransformInfo.h

@@ ... @@ public:
   /// \name Vector TTI Implementations
   /// @{

   unsigned getNumberOfRegisters(bool vector) const;
   unsigned getMaxInterleaveFactor(unsigned VF);
   unsigned getRegisterBitWidth(bool Vector) const;
   unsigned getMinVectorRegisterBitWidth() const;
   bool shouldMaximizeVectorBandwidth(bool OptSize) const { return true; }
+  unsigned getMinimumVF(unsigned ElemWidth) const;

   bool supportsEfficientVectorElementLoadStore() {
     return false;
   }

   unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) {
     return 0;
   }

llvm/trunk/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp

@@ ... @@
 unsigned HexagonTTIImpl::getRegisterBitWidth(bool Vector) const {
   return Vector ? getMinVectorRegisterBitWidth() : 32;
 }

 unsigned HexagonTTIImpl::getMinVectorRegisterBitWidth() const {
   return getST()->useHVXOps() ? getST()->getVectorLength()*8 : 0;
 }

+unsigned HexagonTTIImpl::getMinimumVF(unsigned ElemWidth) const {
+  return (8 * getST()->getVectorLength()) / ElemWidth;
+}
+
 unsigned HexagonTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
       unsigned Alignment, unsigned AddressSpace, const Instruction *I) {
   if (Opcode == Instruction::Load && Src->isVectorTy()) {
     VectorType *VecTy = cast<VectorType>(Src);
     unsigned VecWidth = VecTy->getBitWidth();
     if (VecWidth > 64) {
       // Assume that vectors longer than 64 bits are meant for HVX.
       if (getNumberOfRegisters(true) > 0) {
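As a worked example of the Hexagon formula: `getVectorLength()` is read here as the HVX vector length in bytes, an assumption based on getMinVectorRegisterBitWidth multiplying it by 8 to get bits. With the 64-byte mode selected by `+hvx-length64b` in the test, a register is 512 bits, which matches the "Widest register safe to use is: 512 bits" line in the debug log, and 512 / 8 = 64 lanes of i8, which matches the "target's minimum: 64" message. The helper `hvxMinimumVF` is hypothetical; it takes the vector length as a parameter instead of querying a subtarget.

```cpp
#include <cassert>

// Same arithmetic as HexagonTTIImpl::getMinimumVF in the diff above:
// (8 * vector length in bytes) / element width in bits.
// VectorLengthBytes is passed in directly; the real code reads it from
// getST()->getVectorLength().
unsigned hvxMinimumVF(unsigned VectorLengthBytes, unsigned ElemWidthBits) {
  assert(ElemWidthBits != 0 && "element width must be non-zero");
  unsigned RegisterBits = 8 * VectorLengthBytes;
  return RegisterBits / ElemWidthBits; // lanes needed to fill one register
}
```

For 64-byte HVX this gives 64 lanes for i8, 32 for i16, and 16 for i32; doubling the vector length to 128 bytes doubles each figure.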

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ ... @@ if (TTI.shouldMaximizeVectorBandwidth(OptForSize) ||
     // ones.
     unsigned TargetNumRegisters = TTI.getNumberOfRegisters(true);
     for (int i = RUs.size() - 1; i >= 0; --i) {
       if (RUs[i].MaxLocalUsers <= TargetNumRegisters) {
         MaxVF = VFs[i];
         break;
       }
     }
+    if (unsigned MinVF = TTI.getMinimumVF(SmallestType)) {
+      if (MaxVF < MinVF) {
+        DEBUG(dbgs() << "LV: Overriding calculated MaxVF(" << MaxVF
+                     << ") with target's minimum: " << MinVF << '\n');
+        MaxVF = MinVF;
+      }
+    }
   }
   return MaxVF;
 }

 VectorizationFactor
 LoopVectorizationCostModel::selectVectorizationFactor(unsigned MaxVF) {
   float Cost = expectedCost(1).first;
   const float ScalarCost = Cost;
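The LoopVectorize.cpp change keeps the existing rule (pick the largest candidate VF whose estimated register usage fits in the vector register file) and then raises the result to the target minimum if the register-based choice came out smaller. The sketch below isolates that decision with plain inputs. `selectMaxVF` is a hypothetical helper; the candidate VFs, per-VF usage numbers, and register budget are caller-supplied assumptions rather than values LLVM exposes in this form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of the MaxVF selection in the hunk above.
// VFs: candidate vectorization factors in increasing order.
// MaxLocalUsers[i]: estimated peak live vector registers at VFs[i].
// MinVF: target minimum, with 0 meaning "no minimum".
unsigned selectMaxVF(unsigned DefaultMaxVF, const std::vector<unsigned> &VFs,
                     const std::vector<unsigned> &MaxLocalUsers,
                     unsigned TargetNumRegisters, unsigned MinVF) {
  assert(VFs.size() == MaxLocalUsers.size());
  unsigned MaxVF = DefaultMaxVF;
  // Walk from the largest candidate down; keep the first one that fits.
  for (std::size_t i = VFs.size(); i-- > 0;) {
    if (MaxLocalUsers[i] <= TargetNumRegisters) {
      MaxVF = VFs[i];
      break;
    }
  }
  // A non-zero target minimum overrides a smaller register-based choice.
  if (MinVF != 0 && MaxVF < MinVF)
    MaxVF = MinVF;
  return MaxVF;
}
```

Feeding it the figures from the sl.c log (candidates 32 and 64 with max usage 2 and 4), an assumed default MaxVF of 16 (512 / 32), and an assumed budget of 32 vector registers yields 64 before the override is ever consulted, which is consistent with that loop not exercising the new code path. When no candidate fits, as with the MaxVF of 9 reported for the patch's test, the result is raised to the minimum instead.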

llvm/trunk/test/Transforms/LoopVectorize/Hexagon/lit.local.cfg

if not 'Hexagon' in config.root.targets:
    config.unsupported = True

llvm/trunk/test/Transforms/LoopVectorize/Hexagon/minimum-vf.ll

				; RUN: opt -march=hexagon -loop-vectorize -hexagon-autohvx -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s
				; REQUIRES: asserts

				; Check that TTI::getMinimumVF works. The calculated MaxVF was based on the
				; register pressure and was less than 64.
				; CHECK: LV: Overriding calculated MaxVF({{[0-9]+}}) with target's minimum: 64

				target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
				target triple = "hexagon"

				%s.0 = type { i8*, i32, i32, i32, i32 }

				@g0 = external dso_local local_unnamed_addr global %s.0**, align 4

				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #0
				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #0

				; Function Attrs: nounwind
				define hidden fastcc void @f0(i8* nocapture %a0, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i8 zeroext %a5) unnamed_addr #1 {
				b0:
				%v0 = alloca [4 x [9 x i16]], align 8
				%v1 = bitcast [4 x [9 x i16]]* %v0 to i8*
				call void @llvm.lifetime.start.p0i8(i64 72, i8* nonnull %v1) #2
				%v2 = add i32 %a1, -2
				%v3 = add i32 %a3, -9
				%v4 = icmp ugt i32 %v2, %v3
				%v5 = add i32 %a2, -2
				%v6 = add i32 %a4, -9
				%v7 = icmp ugt i32 %v5, %v6
				%v8 = or i1 %v4, %v7
				%v9 = load %s.0**, %s.0*** @g0, align 4, !tbaa !1
				%v10 = zext i8 %a5 to i32
				%v11 = getelementptr inbounds %s.0*, %s.0** %v9, i32 %v10
				%v12 = load %s.0*, %s.0** %v11, align 4, !tbaa !1
				%v13 = getelementptr inbounds %s.0, %s.0* %v12, i32 0, i32 0
				%v14 = load i8*, i8** %v13, align 4, !tbaa !5
				br i1 %v8, label %b1, label %b2

				b1: ; preds = %b1, %b0
				%v15 = phi i32 [ 0, %b0 ], [ %v119, %b1 ]
				%v16 = add i32 %v5, %v15
				%v17 = icmp slt i32 %v16, 0
				%v18 = icmp slt i32 %v16, %a4
				%v19 = select i1 %v18, i32 %v16, i32 %v3
				%v20 = select i1 %v17, i32 0, i32 %v19
				%v21 = mul i32 %v20, %a3
				%v22 = add i32 97, %v21
				%v23 = getelementptr inbounds i8, i8* %v14, i32 %v22
				%v24 = load i8, i8* %v23, align 1, !tbaa !8
				%v25 = zext i8 %v24 to i32
				%v26 = add i32 101, %v21
				%v27 = getelementptr inbounds i8, i8* %v14, i32 %v26
				%v28 = load i8, i8* %v27, align 1, !tbaa !8
				%v29 = zext i8 %v28 to i32
				%v30 = mul nsw i32 %v29, -5
				%v31 = add nsw i32 %v30, %v25
				%v32 = add i32 106, %v21
				%v33 = getelementptr inbounds i8, i8* %v14, i32 %v32
				%v34 = load i8, i8* %v33, align 1, !tbaa !8
				%v35 = zext i8 %v34 to i32
				%v36 = mul nuw nsw i32 %v35, 20
				%v37 = add nsw i32 %v36, %v31
				%v38 = add i32 111, %v21
				%v39 = getelementptr inbounds i8, i8* %v14, i32 %v38
				%v40 = load i8, i8* %v39, align 1, !tbaa !8
				%v41 = zext i8 %v40 to i32
				%v42 = mul nuw nsw i32 %v41, 20
				%v43 = add nsw i32 %v42, %v37
				%v44 = add i32 116, %v21
				%v45 = getelementptr inbounds i8, i8* %v14, i32 %v44
				%v46 = load i8, i8* %v45, align 1, !tbaa !8
				%v47 = zext i8 %v46 to i32
				%v48 = mul nsw i32 %v47, -5
				%v49 = add nsw i32 %v48, %v43
				%v50 = add i32 120, %v21
				%v51 = getelementptr inbounds i8, i8* %v14, i32 %v50
				%v52 = load i8, i8* %v51, align 1, !tbaa !8
				%v53 = zext i8 %v52 to i32
				%v54 = add nsw i32 %v49, %v53
				%v55 = trunc i32 %v54 to i16
				%v56 = getelementptr inbounds [4 x [9 x i16]], [4 x [9 x i16]]* %v0, i32 0, i32 0, i32 %v15
				store i16 %v55, i16* %v56, align 2, !tbaa !9
				%v57 = mul nsw i32 %v35, -5
				%v58 = add nsw i32 %v57, %v29
				%v59 = add nsw i32 %v42, %v58
				%v60 = mul nuw nsw i32 %v47, 20
				%v61 = add nsw i32 %v60, %v59
				%v62 = mul nsw i32 %v53, -5
				%v63 = add nsw i32 %v62, %v61
				%v64 = add i32 125, %v21
				%v65 = getelementptr inbounds i8, i8* %v14, i32 %v64
				%v66 = load i8, i8* %v65, align 1, !tbaa !8
				%v67 = zext i8 %v66 to i32
				%v68 = add nsw i32 %v63, %v67
				%v69 = trunc i32 %v68 to i16
				%v70 = getelementptr inbounds [4 x [9 x i16]], [4 x [9 x i16]]* %v0, i32 0, i32 1, i32 %v15
				store i16 %v69, i16* %v70, align 2, !tbaa !9
				%v71 = mul nsw i32 %v41, -5
				%v72 = add nsw i32 %v71, %v35
				%v73 = add nsw i32 %v60, %v72
				%v74 = mul nuw nsw i32 %v53, 20
				%v75 = add nsw i32 %v74, %v73
				%v76 = mul nsw i32 %v67, -5
				%v77 = add nsw i32 %v76, %v75
				%v78 = add i32 130, %v21
				%v79 = getelementptr inbounds i8, i8* %v14, i32 %v78
				%v80 = load i8, i8* %v79, align 1, !tbaa !8
				%v81 = zext i8 %v80 to i32
				%v82 = add nsw i32 %v77, %v81
				%v83 = trunc i32 %v82 to i16
				%v84 = getelementptr inbounds [4 x [9 x i16]], [4 x [9 x i16]]* %v0, i32 0, i32 2, i32 %v15
				store i16 %v83, i16* %v84, align 2, !tbaa !9
				%v85 = add i32 92, %v21
				%v86 = getelementptr inbounds i8, i8* %v14, i32 %v85
				%v87 = load i8, i8* %v86, align 1, !tbaa !8
				%v88 = zext i8 %v87 to i16
				%v89 = add i32 135, %v21
				%v90 = getelementptr inbounds i8, i8* %v14, i32 %v89
				%v91 = load i8, i8* %v90, align 1, !tbaa !8
				%v92 = zext i8 %v91 to i16
				%v93 = mul nsw i16 %v92, -5
				%v94 = add nsw i16 %v93, %v88
				%v95 = add i32 140, %v21
				%v96 = getelementptr inbounds i8, i8* %v14, i32 %v95
				%v97 = load i8, i8* %v96, align 1, !tbaa !8
				%v98 = zext i8 %v97 to i16
				%v99 = mul nuw nsw i16 %v98, 20
				%v100 = add nsw i16 %v99, %v94
				%v101 = add i32 145, %v21
				%v102 = getelementptr inbounds i8, i8* %v14, i32 %v101
				%v103 = load i8, i8* %v102, align 1, !tbaa !8
				%v104 = zext i8 %v103 to i16
				%v105 = mul nuw nsw i16 %v104, 20
				%v106 = add i16 %v105, %v100
				%v107 = add i32 150, %v21
				%v108 = getelementptr inbounds i8, i8* %v14, i32 %v107
				%v109 = load i8, i8* %v108, align 1, !tbaa !8
				%v110 = zext i8 %v109 to i16
				%v111 = mul nsw i16 %v110, -5
				%v112 = add i16 %v111, %v106
				%v113 = add i32 154, %v21
				%v114 = getelementptr inbounds i8, i8* %v14, i32 %v113
				%v115 = load i8, i8* %v114, align 1, !tbaa !8
				%v116 = zext i8 %v115 to i16
				%v117 = add i16 %v112, %v116
				%v118 = getelementptr inbounds [4 x [9 x i16]], [4 x [9 x i16]]* %v0, i32 0, i32 3, i32 %v15
				store i16 %v117, i16* %v118, align 2, !tbaa !9
				%v119 = add nuw nsw i32 %v15, 1
				%v120 = icmp eq i32 %v119, 19
				br i1 %v120, label %b2, label %b1

				b2: ; preds = %b1, %b0
				call void @llvm.lifetime.end.p0i8(i64 72, i8* nonnull %v1) #2
				ret void
				}

				attributes #0 = { argmemonly nounwind }
				attributes #1 = { nounwind "target-cpu"="hexagonv60" "target-features"="+hvx-length64b,+hvxv60" }
				attributes #2 = { nounwind }

				!llvm.module.flags = !{!0}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!2, !2, i64 0}
				!2 = !{!"any pointer", !3, i64 0}
				!3 = !{!"omnipotent char", !4, i64 0}
				!4 = !{!"Simple C/C++ TBAA"}
				!5 = !{!6, !2, i64 0}
				!6 = !{!"", !2, i64 0, !7, i64 4, !7, i64 8, !7, i64 12, !7, i64 16}
				!7 = !{!"int", !3, i64 0}
				!8 = !{!3, !3, i64 0}
				!9 = !{!10, !10, i64 0}
				!10 = !{!"short", !3, i64 0}