This is an archive of the discontinued LLVM Phabricator instance.

Differential D147522

[LoopVectorize] Take vscale into account when deciding to create epilogues
Closed · Public

Authored by david-arm on Apr 4 2023, 5:28 AM.

Details

Reviewers

sdesmalen
reames
hassnaa-arm
kmclaughlin
dmgreen
paulwalker-arm

Commits

rG69ee6533131d: [LoopVectorize] Take vscale into account when deciding to create epilogues

Summary

In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we
check whether the chosen main vector loop VF >= 16. If so, we
decide to create a vector epilogue loop. However, for scalable VFs this
check only compares the known minimum value and doesn't take
VScaleForTuning into account. We could be targeting a CPU where
vscale > 1, in which case the runtime VF would be a multiple
of the known minimum value.

This patch multiplies scalable VFs by VScaleForTuning before comparing
them against the threshold, and updates several tests that now produce
vector epilogues.
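The threshold check described above can be modelled as a small standalone C++ sketch. This is a simplified illustration, not LLVM's code: `ElementCountSketch` is a hypothetical stand-in for `llvm::ElementCount`, the tuning hint is modelled as a `std::optional<unsigned>` with the same `value_or(1)` fallback that the patch uses, and the threshold of 16 is taken from the summary.

```cpp
#include <optional>

// Hypothetical stand-in for llvm::ElementCount: a known minimum number of
// lanes, plus a flag saying whether that count is multiplied by vscale at
// runtime.
struct ElementCountSketch {
  unsigned KnownMinValue;
  bool Scalable;
};

// Threshold assumed to match the "VF >= 16" rule described in the summary.
constexpr unsigned EpilogueVectorizationMinVFSketch = 16;

// Pre-patch rule: only the known minimum lane count is compared, so a
// scalable VF of vscale x 8 never reaches the threshold.
constexpr bool isEpilogueVFLargeEnoughOld(ElementCountSketch VF) {
  return VF.KnownMinValue >= EpilogueVectorizationMinVFSketch;
}

// Post-patch rule: scalable VFs are scaled by the vscale value the target
// tunes for. An empty optional (no tuning hint) falls back to 1.
constexpr bool isEpilogueVFLargeEnough(ElementCountSketch VF,
                                       std::optional<unsigned> VScaleForTuning) {
  unsigned Multiplier = 1;
  if (VF.Scalable)
    Multiplier = VScaleForTuning.value_or(1);
  return Multiplier * VF.KnownMinValue >= EpilogueVectorizationMinVFSketch;
}

// vscale x 8 with a tuning vscale of 2 now behaves like a fixed VF of 16.
static_assert(!isEpilogueVFLargeEnoughOld({8, true}));
static_assert(isEpilogueVFLargeEnough({8, true}, 2u));
// With a tuning vscale of 1, or no hint at all, it stays below the threshold.
static_assert(!isEpilogueVFLargeEnough({8, true}, 1u));
static_assert(!isEpilogueVFLargeEnough({8, true}, std::nullopt));
// Fixed-width VFs are never scaled, whatever the tuning hint says.
static_assert(!isEpilogueVFLargeEnough({8, false}, 4u));
```

The `static_assert`s double as the usage example: they are evaluated at compile time, so the arithmetic is checked without a `main` function. As the review below points out, the fallback of 1 might later become the minimum vscale instead.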

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Apr 4 2023, 5:28 AM

Herald added a project: Restricted Project. Apr 4 2023, 5:28 AM

Herald added subscribers: shiva0217, hiraditya.

david-arm requested review of this revision. Apr 4 2023, 5:28 AM

Herald added a project: Restricted Project. Apr 4 2023, 5:28 AM

Herald added subscribers: llvm-commits, pcwang-thead, alextsao1999.

Harbormaster completed remote builds in B223546: Diff 510758. Apr 4 2023, 6:31 AM

It looks like intrinsiccost.ll is failing in the precommit tests?

In D147522#4243689, @fhahn wrote:

It looks like intrinsiccost.ll is failing in the precommit tests?

Well spotted @fhahn, thanks! Not sure what happened there as I thought I'd run make check. Oh well!

Reverted test changes for intrinsiccost.ll.

Harbormaster completed remote builds in B223738: Diff 511013. Apr 5 2023, 2:16 AM

david-arm added a reviewer: dmgreen. Apr 12 2023, 6:39 AM

I'm worried the test changes don't look relevant to the code change. I mean, sure, the changes are the effect of the code change, but the tests themselves are not related to epilogue vectorisation, nor do they specify what to tune for, so they've only changed because of the current default for this tuning option. This means the tests will change if/when the default changes. I think it would be better to have a dedicated test file that shows the result of a simple loop when RUN using several CPU tuning parameters, and for the other existing loop vectorisation tests to be independent of the effect of tuning when that effect is not relevant to what the test is protecting (which is how you're already handling runtime-check-size-based-threshold.ll).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Lines 5625–5626: You could use `Multiplier = getVScaleForTuning().value_or(1)` here if you wanted. At some point in the future we might want to change this to `value_or(MinVScale)` anyway.

Reverted test changes and added a new specific test for epilogue vectorisation with vscale tuning.

david-arm marked an inline comment as done. Apr 12 2023, 8:13 AM

I've a couple of suggestions but otherwise looks good.

llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-vscale-tune.ll
Lines 4–5: Perhaps add a RUN line for the default (i.e. with no -mcpu option) that can presumably reuse the check lines for CHECK-NV1.
Line 52: It's up to you Dave, but I don't see autogenerating the check lines offering any value here. If anything it makes it harder to understand the effect. Something simple like:

    ; CHECK-EPILOGUE: vec.epilog.ph:
    ; CHECK-EPILOGUE: load <vscale x 4 x i16>
    ; CHECK-NO-EPILOGUE-NOT: vec.epilog.ph:

seems like a clearer test?

This revision is now accepted and ready to land. Apr 12 2023, 8:37 AM

Harbormaster completed remote builds in B225079: Diff 512842. Apr 12 2023, 8:47 AM

Closed by commit rG69ee6533131d: [LoopVectorize] Take vscale into account when deciding to create epilogues (authored by david-arm). Apr 17 2023, 3:50 AM

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rG69ee6533131d: [LoopVectorize] Take vscale into account when deciding to create epilogues.

david-arm mentioned this in D148123: [AArch64][CostModel] Make sext/zext free if folded into a masked load. Apr 19 2023, 9:05 AM

Matt added a subscriber: Matt. Apr 19 2023, 9:31 AM

dewen added a subscriber: dewen. Tue, Nov 14, 1:25 AM

Herald added subscribers: wangpc, artagnon, sunshaoce. Tue, Nov 14, 1:25 AM

Revision Contents

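Path | Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 8 lines
llvm/test/Transforms/LoopVectorize/AArch64/runtime-check-size-based-threshold.ll | 6 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-vscale-tune.ll | 37 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-fneg.ll | 9 lines
llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll | 4 lines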

Diff 514154

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ ... @@ bool LoopVectorizationCostModel::isEpilogueVectorizationProfitable(
   // Allow the target to opt out entirely.
   if (!TTI.preferEpilogueVectorization())
     return false;
 
   // We also consider epilogue vectorization unprofitable for targets that don't
   // consider interleaving beneficial (eg. MVE).
   if (TTI.getMaxInterleaveFactor(VF) <= 1)
     return false;
-  // FIXME: We should consider changing the threshold for scalable
-  // vectors to take VScaleForTuning into account.
-  if (VF.getKnownMinValue() >= EpilogueVectorizationMinVF)
+  unsigned Multiplier = 1;
+  if (VF.isScalable())
+    Multiplier = getVScaleForTuning().value_or(1);
+  if ((Multiplier * VF.getKnownMinValue()) >= EpilogueVectorizationMinVF)
     return true;
   return false;
 }
 
 VectorizationFactor
 LoopVectorizationCostModel::selectEpilogueVectorizationFactor(
     const ElementCount MainLoopVF, const LoopVectorizationPlanner &LVP) {
   VectorizationFactor Result = VectorizationFactor::Disabled();

llvm/test/Transforms/LoopVectorize/AArch64/runtime-check-size-based-threshold.ll

-; RUN: opt -passes=loop-vectorize -mtriple=arm64-apple-iphoneos -vectorizer-min-trip-count=8 -S %s | FileCheck --check-prefixes=CHECK,DEFAULT %s
-; RUN: opt -passes=loop-vectorize -mtriple=arm64-apple-iphoneos -vectorizer-min-trip-count=8 -vectorize-memory-check-threshold=1 -S %s | FileCheck --check-prefixes=CHECK,THRESHOLD %s
+; RUN: opt -passes=loop-vectorize -mtriple=arm64-apple-iphoneos -vectorizer-min-trip-count=8 \
+; RUN: -enable-epilogue-vectorization=false -S %s | FileCheck --check-prefixes=CHECK,DEFAULT %s
+; RUN: opt -passes=loop-vectorize -mtriple=arm64-apple-iphoneos -vectorizer-min-trip-count=8 \
+; RUN: -enable-epilogue-vectorization=false -vectorize-memory-check-threshold=1 -S %s | FileCheck --check-prefixes=CHECK,THRESHOLD %s
 
 ; Tests for loops with large numbers of runtime checks. Check that loops are
 ; vectorized, if the loop trip counts are large and the impact of the runtime
 ; checks is very small compared to the expected loop runtimes.
 
 
 ; The trip count in the loop in this function is too to warrant large runtime checks.
 ; CHECK-LABEL: define {{.*}} @test_tc_too_small

llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-vscale-tune.ll

This file was added.

; RUN: opt -S -passes=loop-vectorize,instsimplify -force-vector-interleave=1 \
; RUN: -mcpu=neoverse-v1 < %s | FileCheck %s --check-prefix=CHECK-EPILOG
; RUN: opt -S -passes=loop-vectorize,instsimplify -force-vector-interleave=1 \
; RUN: -mcpu=neoverse-v1 < %s | FileCheck %s --check-prefix=CHECK-EPILOG
; RUN: opt -S -passes=loop-vectorize,instsimplify -force-vector-interleave=1 \
; RUN: -mcpu=neoverse-v2 < %s | FileCheck %s --check-prefix=CHECK-NO-EPILOG
; RUN: opt -S -passes=loop-vectorize,instsimplify -force-vector-interleave=1 \
; RUN: -mcpu=cortex-x2 < %s | FileCheck %s --check-prefix=CHECK-NO-EPILOG

				target triple = "aarch64-unknown-linux-gnu"

				define void @foo(ptr noalias nocapture readonly %p, ptr noalias nocapture %q, i64 %len) #0 {
				; CHECK-EPILOG: vec.epilog.ph:
				; CHECK-EPILOG: vec.epilog.vector.body:
				; CHECK-EPILOG: load <vscale x 4 x i16>

				; CHECK-NO-EPILOG-NOT: vec.epilog.vector.ph:
				; CHECK-NO-EPILOG-NOT: vec.epilog.vector.body:
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i16, ptr %p, i64 %indvars.iv
				%0 = load i16, ptr %arrayidx
				%add = add nuw nsw i16 %0, 2
				%arrayidx3 = getelementptr inbounds i16, ptr %q, i64 %indvars.iv
				store i16 %add, ptr %arrayidx3
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %len
				br i1 %exitcond, label %exit, label %for.body

				exit: ; preds = %for.body
				ret void
				}

				attributes #0 = { "target-features"="+sve" vscale_range(1,16) }
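The split between the CHECK-EPILOG and CHECK-NO-EPILOG RUN lines comes down to a small piece of arithmetic. The sketch below is illustrative only: it assumes the main loop is vectorized at vscale x 8 (plausible given the `<vscale x 4 x i16>` epilogue load, but not stated by the test), and it assumes tuning vscale values of 2 for neoverse-v1 and 1 for neoverse-v2 and cortex-x2, chosen to match the prefixes above rather than read from LLVM's AArch64 subtarget definitions. `CpuTuningSketch` and `expectsVectorEpilogue` are hypothetical names.

```cpp
#include <array>
#include <string_view>

// Hypothetical per-CPU tuning data. The VScaleForTuning values are
// assumptions for illustration, chosen to match the CHECK prefixes used
// by the RUN lines of the test.
struct CpuTuningSketch {
  std::string_view Name;
  unsigned VScaleForTuning;
};

constexpr std::array<CpuTuningSketch, 3> CpusSketch{{
    {"neoverse-v1", 2},
    {"neoverse-v2", 1},
    {"cortex-x2", 1},
}};

// Assumed main-loop VF for the i16 loop: vscale x 8.
constexpr unsigned MainLoopVFKnownMin = 8;
// Assumed epilogue threshold (the "VF >= 16" rule from the summary).
constexpr unsigned EpilogueMinVF = 16;

// Returns true when the tuned runtime VF reaches the epilogue threshold.
constexpr bool expectsVectorEpilogue(const CpuTuningSketch &Cpu) {
  return Cpu.VScaleForTuning * MainLoopVFKnownMin >= EpilogueMinVF;
}

static_assert(expectsVectorEpilogue(CpusSketch[0]));  // 2 * 8 = 16: CHECK-EPILOG
static_assert(!expectsVectorEpilogue(CpusSketch[1])); // 1 * 8 = 8: CHECK-NO-EPILOG
static_assert(!expectsVectorEpilogue(CpusSketch[2])); // 1 * 8 = 8: CHECK-NO-EPILOG
```

Under these assumptions, a CPU tuned for vscale 2 lands exactly on the threshold, which is why the test only needs one loop to show both outcomes.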

llvm/test/Transforms/LoopVectorize/AArch64/sve-fneg.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -passes=loop-vectorize,dce -mtriple aarch64-linux-gnu -mattr=+sve \
-; RUN: -prefer-predicate-over-epilogue=scalar-epilogue < %s -S | FileCheck %s
+; RUN: opt -passes=loop-vectorize,dce -prefer-predicate-over-epilogue=scalar-epilogue \
+; RUN: -enable-epilogue-vectorization=false < %s -S | FileCheck %s
 
-target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 target triple = "aarch64-unknown-linux-gnu"
 
 ; This should be vscale x 8 vectorized, maybe with some interleaving.
 
-define void @fneg(ptr nocapture noundef writeonly %d, ptr nocapture noundef readonly %s, i32 noundef %n) {
+define void @fneg(ptr nocapture noundef writeonly %d, ptr nocapture noundef readonly %s, i32 noundef %n) #0 {
 ; CHECK-LABEL: @fneg(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[S2:%.*]] = ptrtoint ptr [[S:%.*]] to i64
 ; CHECK-NEXT: [[D1:%.*]] = ptrtoint ptr [[D:%.*]] to i64
 ; CHECK-NEXT: [[CMP6:%.*]] = icmp sgt i32 [[N:%.*]], 0
 ; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 ; CHECK: for.body.preheader:
 ; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
@@ ... @@ for.body: ; preds = %for.body.preheader, %for.body
 %0 = load half, ptr %arrayidx, align 2
 %fneg = fneg half %0
 %arrayidx2 = getelementptr inbounds half, ptr %d, i64 %indvars.iv
 store half %fneg, ptr %arrayidx2, align 2
 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
 %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
 br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
 }
+
+attributes #0 = { "target-features"="+sve" vscale_range(1,16) }

llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
 ; REQUIRES: asserts
 ; RUN: opt -S -passes=loop-vectorize,instsimplify -force-vector-interleave=1 \
-; RUN: -debug-only=loop-vectorize 2>%t < %s | FileCheck %s
+; RUN: -enable-epilogue-vectorization=false -debug-only=loop-vectorize 2>%t < %s | FileCheck %s
 ; RUN: cat %t | FileCheck %s --check-prefix=CHECK-COST
 
 target triple = "aarch64-unknown-linux-gnu"
 
 define void @zext_i8_i16(ptr noalias nocapture readonly %p, ptr noalias nocapture %q, i32 %len) #0 {
 ; CHECK-COST-LABEL: LV: Checking a loop in 'zext_i8_i16'
 ; CHECK-COST: LV: Found an estimated cost of 0 for VF 1 For instruction: %conv = zext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF 2 For instruction: %conv = zext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF 4 For instruction: %conv = zext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF 8 For instruction: %conv = zext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 2 for VF 16 For instruction: %conv = zext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF vscale x 1 For instruction: %conv = zext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF vscale x 2 For instruction: %conv = zext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF vscale x 4 For instruction: %conv = zext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 0 for VF vscale x 8 For instruction: %conv = zext i8 %0 to i32
 
 ; CHECK-LABEL: define void @zext_i8_i16
 ; CHECK-SAME: (ptr noalias nocapture readonly [[P:%.*]], ptr noalias nocapture [[Q:%.*]], i32 [[LEN:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[LEN]], -1
 ; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
 ; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
 ; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP3]], 8
@@ ... @@
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF 2 For instruction: %conv = sext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF 4 For instruction: %conv = sext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF 8 For instruction: %conv = sext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 2 for VF 16 For instruction: %conv = sext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF vscale x 1 For instruction: %conv = sext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF vscale x 2 For instruction: %conv = sext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 1 for VF vscale x 4 For instruction: %conv = sext i8 %0 to i32
 ; CHECK-COST: LV: Found an estimated cost of 0 for VF vscale x 8 For instruction: %conv = sext i8 %0 to i32
 
 ; CHECK-LABEL: define void @sext_i8_i16
 ; CHECK-SAME: (ptr noalias nocapture readonly [[P:%.*]], ptr noalias nocapture [[Q:%.*]], i32 [[LEN:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[LEN]], -1
 ; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
 ; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
 ; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP3]], 8