This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Enable fixed length vectorization
ClosedPublic

Authored by reames on Aug 9 2022, 9:49 AM.


Details

Reviewers

sjarus
craig.topper
frasercrmck
kito-cheng

Commits

rGb45a262679ab: [RISCV] Enable fixed length vectors and loop vectorization with same

Summary

This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize and SLPVectorize). Specifically, this enables fixed length vectors which are known to fit within the underlying variable hardware vector size.

For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit -mcpu we can vectorize to the exact target hardware.

The LV impact is mostly related to vectorizer robustness. In cases where we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization. Note that there are a bunch of cases we haven't yet implemented, so in practice this is a fairly major shift towards auto-vectorizing more often.

On the SLP side, I haven't done anywhere near as detailed an evaluation, but the initial investigation I did do ran into a few open issues. Given that, I've posted a change which disables SLP even when fixed vectors are enabled, and marked it as a dependency for this one.
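
As a rough illustration of the sizing argument in the summary, the following sketch checks whether a fixed length vector fits in a register group when only a minimum VLEN is known. The helper name `fitsInMinVLEN` and its parameters are hypothetical; this is a simplified model under those assumptions, not the legality check LLVM actually performs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): does a fixed length vector of
// `numElts` elements, each `eltBits` wide, fit in a register group when
// only a guaranteed minimum VLEN and a maximum LMUL are known?
bool fitsInMinVLEN(uint64_t numElts, uint64_t eltBits,
                   uint64_t minVLenBits, uint64_t maxLMUL) {
  assert(minVLenBits > 0 && maxLMUL > 0 && "need a known minimum size");
  return numElts * eltBits <= minVLenBits * maxLMUL;
}
```

With the minimum VLEN of 128 mentioned above, `<4 x i32>` fits in a single register, while `<8 x i32>` only fits once a larger LMUL is allowed.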

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Aug 9 2022, 9:49 AM

Herald added a reviewer: sjarus. Aug 9 2022, 9:49 AM

Herald added a project: Restricted Project.

Herald added subscribers: sunshaoce, VincentWu, luke957 and 33 others.

reames requested review of this revision. Aug 9 2022, 9:49 AM

Herald added a project: Restricted Project. Aug 9 2022, 9:49 AM

Herald added subscribers: llvm-commits, alextsao1999, pcwang-thead and 2 others.

AVX512?

reames added inline comments. Aug 9 2022, 9:55 AM

llvm/test/CodeGen/RISCV/fold-vector-cmp.ll
13	This case demonstrates an unfortunate, but I think non-blocking interaction. Essentially, with vectors illegal, we force scalarization and then constant fold the individual scalar lanes. With vectors legal, we fail to recognize that scalarization is profitable, or that lane 0 is unused. As a result, we fail to constant fold. This is a general problem with vector codegen optimization, and not directly related to this change.
llvm/test/CodeGen/RISCV/vec3-setcc-crash.ll
2 ↗	(On Diff #451188)	TODO: This test needs to be restricted to continue exercising scalar lowering.
llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
85 ↗	(On Diff #451188)	TODO: We're deciding that fixed length vectorization is profitable over scalable vectorization when both are legal. That's odd, and probably indicates a problem in the cost model.
llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll
331 ↗	(On Diff #451188)	This is an example of fallback logic working as expected. (We have a known problem around scalable scatter/gather costing.)
llvm/test/Transforms/LoopVectorize/RISCV/scalable-divrem.ll
252 ↗	(On Diff #451188)	Again, fallback working as expected due to known problem with scalable vectorization.

reames edited the summary of this revision. Aug 9 2022, 9:55 AM

reames added inline comments. Aug 9 2022, 10:21 AM

llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
85 ↗	(On Diff #451188)	This turns out to be a known issue in the cost model. To return a conservatively correct cost, we're using an upper bound on VL. This results in scalable vectors (for which the maximum VL is potentially quite large) appearing unprofitable. In this case, the cost is a function of log2(max-VL), but we have other instances, such as scatter/gather, where the cost is linear in max-VL and magnifies this effect even further. I don't think this is a blocking issue for enabling fixed length vectorization.
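
To make the effect described in this comment concrete, here is a small sketch of how costing against an upper bound on VL penalizes scalable vectors. The cost shapes (`reductionCost`, `scatterGatherCost`) are hypothetical stand-ins for "log2 in VL" and "linear in VL"; they are not the values the RISCV cost model returns. The only outside fact assumed is that the RVV specification caps VLEN at 65536 bits.

```cpp
#include <cassert>
#include <cstdint>

// floor(log2(x)) for x >= 1.
unsigned log2Floor(uint64_t x) {
  assert(x >= 1);
  unsigned r = 0;
  while (x >>= 1)
    ++r;
  return r;
}

// Largest VL for a given VLEN, element width, and (integer) LMUL.
uint64_t maxVL(uint64_t vlenBits, uint64_t eltBits, uint64_t lmul) {
  return vlenBits / eltBits * lmul;
}

// Hypothetical cost shapes: a tree reduction grows with log2(VL),
// a scatter/gather grows linearly with VL.
unsigned reductionCost(uint64_t vl) { return log2Floor(vl); }
uint64_t scatterGatherCost(uint64_t vl) { return vl; }
```

Under these assumptions, a scalable e32 vector at LMUL 1 is costed at a VL of 2048, so the reduction looks more than five times as expensive as the same reduction on `<4 x i32>`, and a scatter/gather looks 512 times as expensive.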

Harbormaster completed remote builds in B180198: Diff 451188. Aug 9 2022, 1:55 PM

Rebase

reames added a parent revision: D131519: [RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL. Aug 9 2022, 2:47 PM

@reames

Below are some thoughts which might give you some food for thought, but frankly I just wanted to ask someone for advice^^.

We have a peculiar architecture that has only scalable vectors.
They are different compared to SVE and RISCV V extensions in that the minimum size is just one element and the maximum size is known (e.g. 32 64-bit elements).
The ISA has load / store instructions and a few others with both static and dynamic counters, i.e. you can write (pseudo code) "dst = vload.f32 rep 16 [ptr]" or "dst = vload.f32 rep vlen [ptr]".
We don't currently use the dynamic version, we just map fixed length vectors to pseudo register classes of the same size. This is quite messy, because these pseudo register classes also require pseudo instructions for each vector size (not only loads / stores, but all of them).
I'd like to rework it to have just one register class (like SVE does), but here is the problem: when spilling occurs, one needs to know the effective size of the spilled register. If we use one register class for all possible fixed vectors, the size is hard to figure out, if possible.
I guess you will face the same issue, but it should be easier for you, since you are taking the minimum VLEN, which is rather small, so you can just spill VLEN bits, no matter what the effective size is.

@barannikov88 - I don't see how your last comment connects to this review. If you want to ask a question on your hardware, please email me and we can chat briefly.

llvm/test/CodeGen/RISCV/vec3-setcc-crash.ll
2 ↗	(On Diff #451188)	Done
llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
85 ↗	(On Diff #451188)	Fix out for review as D131519

In D131508#3711061, @reames wrote:

@barannikov88 - I don't see how your last comment connects to this review. If you want to ask a question on your hardware, please email me and we can chat briefly.

Sorry for intervening. I wanted to bring to your attention that if you map fixed vectors to scalable registers, you will need a way to know the effective size of the register when a need for a spill arises.
I don't know if the ISA allows you to extract the number of elements from the internal part of the vector register, but you can always spill 128 bits (the min VLEN), wasting some stack space if the effective size is smaller.
That is just of more importance for our target, so I was wondering how are you going to solve this spilling issue, if at all.

ADD
I won't take more of your time and will just ask on the Discourse forum. Sorry if my point was inappropriate, I probably misunderstood the commit message (my English is far from good).

In D131508#3711122, @barannikov88 wrote:

In D131508#3711061, @reames wrote:

@barannikov88 - I don't see how your last comment connects to this review. If you want to ask a question on your hardware, please email me and we can chat briefly.

Sorry for intervening. I wanted to bring to your attention that if you map fixed vectors to scalable registers, you will need a way to know the effective size of the register when a need for a spill arises.
I don't know if the ISA allows you to extract the number of elements from the internal part of the vector register, but you can always spill 128 bits (the min VLEN), wasting some stack space if the effective size is smaller.
That is just of more importance for our target, so I was wondering how are you going to solve this spilling issue, if at all.

On RISCV, the current implementation of fixed length vectors uses however many bits of the vector register are required. When spilling a vector register, we currently spill the full register, and make no attempt at tracking which sub-lanes are live. We could in theory spill less if sub-lanes of the register are dead, but a) this is unlikely to be a significant stack savings, and b) the spill instructions easiest to use work on full vector registers.
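
The trade-off in point (a) can be sketched with a little arithmetic. The helpers below are hypothetical and assume a whole-register spill writes VLEN/8 bytes per register; they only illustrate how much of a spilled register a fixed length vector leaves unused.

```cpp
#include <cassert>
#include <cstdint>

// Bytes written when spilling `numRegs` whole vector registers.
uint64_t fullRegisterSpillBytes(uint64_t vlenBits, uint64_t numRegs) {
  return vlenBits / 8 * numRegs;
}

// Bytes actually occupied by a fixed length vector.
uint64_t fixedVectorBytes(uint64_t numElts, uint64_t eltBits) {
  return numElts * eltBits / 8;
}

// Stack bytes spilled but not holding live fixed-vector data.
uint64_t unusedSpillBytes(uint64_t vlenBits, uint64_t numRegs,
                          uint64_t numElts, uint64_t eltBits) {
  const uint64_t spilled = fullRegisterSpillBytes(vlenBits, numRegs);
  const uint64_t live = fixedVectorBytes(numElts, eltBits);
  assert(live <= spilled && "vector must fit in the spilled registers");
  return spilled - live;
}
```

For `<4 x i32>` in one register, nothing is wasted at VLEN 128 and 16 bytes are wasted at VLEN 256.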

Harbormaster completed remote builds in B180261: Diff 451278. Aug 9 2022, 5:07 PM

Matt added a subscriber: Matt. Aug 9 2022, 8:03 PM

reames retitled this revision from [WIP][RISCV] Enable fixed length vectorization to [RISCV] Enable fixed length vectorization. Aug 13 2022, 8:47 AM

reames edited the summary of this revision.

reames added reviewers: craig.topper, frasercrmck, kito-cheng.

reames set the repository for this revision to rG LLVM Github Monorepo.

Herald added a subscriber: StephenFan. Aug 13 2022, 8:47 AM

reames mentioned this in D132680: [RISCV] Disable SLP vectorization by default due to unresolved profitability issues. Aug 25 2022, 10:48 AM

reames added a parent revision: D132680: [RISCV] Disable SLP vectorization by default due to unresolved profitability issues.

reames edited the summary of this revision. Aug 25 2022, 10:51 AM

LGTM

llvm/test/CodeGen/RISCV/fpclamptosat_vec.ll
17 ↗	(On Diff #451278)	This might need a rebase. I don't think we use vncvt.x.x.w anymore after D132041

This revision is now accepted and ready to land. Aug 26 2022, 12:32 PM

reames mentioned this in rGa310637132e1: [RISCV] Disable SLP vectorization by default due to unresolved profitability…. Aug 26 2022, 2:12 PM

This revision was landed with ongoing or failed builds. Aug 26 2022, 2:45 PM

Closed by commit rGb45a262679ab: [RISCV] Enable fixed length vectors and loop vectorization with same (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rGb45a262679ab: [RISCV] Enable fixed length vectors and loop vectorization with same.

Revision Contents

Path                                                        Size
llvm/lib/Target/RISCV/RISCVSubtarget.cpp                    2 lines
llvm/test/Analysis/CostModel/RISCV/active_lane_mask.ll      20 lines
llvm/test/CodeGen/RISCV/fold-vector-cmp.ll                  29 lines
llvm/test/Transforms/LoopVectorize/RISCV/illegal-type.ll    58 lines

Diff 456028

llvm/lib/Target/RISCV/RISCVSubtarget.cpp

[... 38 lines not shown ...] static cl::opt<int> RVVVectorBitsMax(
     cl::init(0), cl::Hidden);

 static cl::opt<int> RVVVectorBitsMin(
     "riscv-v-vector-bits-min",
     cl::desc("Assume V extension vector registers are at least this big, "
              "with zero meaning no minimum size is assumed. A value of -1 "
              "means use Zvl*b extension. This is primarily used to enable "
              "autovectorization with fixed width vectors."),
-    cl::init(0), cl::Hidden);
+    cl::init(-1), cl::Hidden);

 static cl::opt<unsigned> RVVVectorLMULMax(
     "riscv-v-fixed-length-vector-lmul-max",
     cl::desc("The maximum LMUL value to use for fixed length vectors. "
              "Fractional LMUL values are not supported."),
     cl::init(8), cl::Hidden);

 static cl::opt<bool> RISCVDisableUsingConstantPoolForLargeInts(
[... 158 lines not shown ...]
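
The effect of switching the default from 0 to -1 can be read off the option description. The sketch below models only what that description says; `resolveMinVectorBits` and `zvlLenBits` are hypothetical names, and the real subtarget code may apply further validation that is not shown in this diff.

```cpp
#include <cassert>

// Hypothetical model of the "riscv-v-vector-bits-min" description:
//   -1 -> use the size implied by the Zvl*b extension
//    0 -> no minimum size is assumed
//   >0 -> assume registers are at least this many bits
unsigned resolveMinVectorBits(int optionValue, unsigned zvlLenBits) {
  if (optionValue == -1)
    return zvlLenBits;
  if (optionValue == 0)
    return 0;
  assert(optionValue > 0 && "other negative values are not modelled");
  return static_cast<unsigned>(optionValue);
}
```

Under this reading, the old default yields no assumed minimum, while the new default yields 128 for a +V target, which matches the minimum VLEN stated in the summary.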

llvm/test/Analysis/CostModel/RISCV/active_lane_mask.ll

[... 9 lines not shown ...]
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %mask_nxv1i1_i64 = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i64(i64 undef, i64 undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %mask_nxv16i1_i32 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 undef, i32 undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %mask_nxv8i1_i32 = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i32(i32 undef, i32 undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %mask_nxv4i1_i32 = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 undef, i32 undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %mask_nxv2i1_i32 = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i32(i32 undef, i32 undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %mask_nxv1i1_i32 = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i32(i32 undef, i32 undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %mask_nxv32i1_i64 = call <vscale x 32 x i1> @llvm.get.active.lane.mask.nxv32i1.i64(i64 undef, i64 undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %mask_nxv16i1_i16 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i16(i16 undef, i16 undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %mask_v16i1_i64 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 undef, i64 undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %mask_v8i1_i64 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 undef, i64 undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %mask_v4i1_i64 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 undef, i64 undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %mask_v2i1_i64 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 undef, i64 undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %mask_v16i1_i32 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 undef, i32 undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %mask_v8i1_i32 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i32(i32 undef, i32 undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %mask_v4i1_i32 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 undef, i32 undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %mask_v2i1_i32 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i32(i32 undef, i32 undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %mask_v32i1_i64 = call <32 x i1> @llvm.get.active.lane.mask.v32i1.i64(i64 undef, i64 undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %mask_v16i1_i16 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i16(i16 undef, i16 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %mask_v16i1_i64 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 undef, i64 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %mask_v8i1_i64 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 undef, i64 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %mask_v4i1_i64 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 undef, i64 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %mask_v2i1_i64 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 undef, i64 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %mask_v16i1_i32 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %mask_v8i1_i32 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i32(i32 undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %mask_v4i1_i32 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %mask_v2i1_i32 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i32(i32 undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %mask_v32i1_i64 = call <32 x i1> @llvm.get.active.lane.mask.v32i1.i64(i64 undef, i64 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %mask_v16i1_i16 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i16(i16 undef, i16 undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
 %mask_nxv16i1_i64 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 undef, i64 undef)
 %mask_nxv8i1_i64 = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 undef, i64 undef)
 %mask_nxv4i1_i64 = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 undef, i64 undef)
 %mask_nxv2i1_i64 = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i64(i64 undef, i64 undef)
 %mask_nxv1i1_i64 = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i64(i64 undef, i64 undef)

[... 47 lines not shown ...]
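
For readers unfamiliar with the intrinsic being costed here, this is a minimal scalar model of `llvm.get.active.lane.mask`: lane `i` is active when `base + i` is less than `n`, compared as unsigned values and computed without overflow. The function name `activeLaneMask` is hypothetical, and the sketch says nothing about how the mask is lowered on RISCV.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar model of the intrinsic: mask[i] = (base + i) < n, unsigned.
std::vector<bool> activeLaneMask(uint64_t base, uint64_t n, unsigned lanes) {
  std::vector<bool> mask(lanes);
  for (unsigned i = 0; i < lanes; ++i) {
    // Written as i < n - base so that base + i can never wrap around.
    mask[i] = base < n && i < n - base;
  }
  return mask;
}
```

For example, with `base = 6`, `n = 8` and four lanes, only the first two lanes are active.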

llvm/test/CodeGen/RISCV/fold-vector-cmp.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -start-after codegenprepare -mtriple=riscv64 -mattr=-v -o - %s | FileCheck %s
-; RUN: llc -start-after codegenprepare -mtriple=riscv64 -mattr=+v -o - %s | FileCheck %s
+; RUN: llc -start-after codegenprepare -mtriple=riscv64 -mattr=-v -o - %s | FileCheck --check-prefix=CHECK-NOV %s
+; RUN: llc -start-after codegenprepare -mtriple=riscv64 -mattr=+v -o - %s | FileCheck --check-prefix=CHECK-V %s

 ; Reproducer for https://github.com/llvm/llvm-project/issues/55168.
 ; We should always return 1 (and not -1).
 define i32 @test(i32 %call.i) {
-; CHECK-LABEL: test:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a0, 1
-; CHECK-NEXT: ret
+; CHECK-NOV-LABEL: test:
+; CHECK-NOV: # %bb.0:
+; CHECK-NOV-NEXT: li a0, 1
+; CHECK-NOV-NEXT: ret
+;
+; CHECK-V-LABEL: test:
+; CHECK-V: # %bb.0:
+; CHECK-V-NEXT: lui a1, 524288
+; CHECK-V-NEXT: vsetivli zero, 2, e32, mf2, ta, mu
+; CHECK-V-NEXT: vmv.v.x v8, a1
+; CHECK-V-NEXT: vsetvli zero, zero, e32, mf2, tu, mu
+; CHECK-V-NEXT: vmv.s.x v8, a0
+; CHECK-V-NEXT: addiw a0, a1, 2
+; CHECK-V-NEXT: vsetvli zero, zero, e32, mf2, ta, mu
+; CHECK-V-NEXT: vmslt.vx v0, v8, a0
+; CHECK-V-NEXT: vmv.v.i v8, 0
+; CHECK-V-NEXT: vmerge.vim v8, v8, 1, v0
+; CHECK-V-NEXT: vsetivli zero, 1, e32, mf2, ta, mu
+; CHECK-V-NEXT: vslidedown.vi v8, v8, 1
+; CHECK-V-NEXT: vmv.x.s a0, v8
+; CHECK-V-NEXT: ret
 %t2 = insertelement <2 x i32> <i32 poison, i32 -2147483648>, i32 %call.i, i64 0
 %t3 = icmp slt <2 x i32> %t2, <i32 -2147483646, i32 -2147483646>
 %t4 = zext <2 x i1> %t3 to <2 x i32>
 %t6 = extractelement <2 x i32> %t4, i64 1
 ret i32 %t6
 }
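
The reason this test "should always return 1" can be checked with a scalar model of the IR. The sketch below mirrors the four IR instructions lane by lane; the function name `foldVectorCmp` is hypothetical. Only lane 1 is extracted, and lane 1 never depends on the argument, which is why the scalarized form folds to a constant.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of @test: lane 0 holds the argument, lane 1 holds INT32_MIN.
int32_t foldVectorCmp(int32_t call) {
  const std::array<int32_t, 2> t2 = {call, INT32_MIN};  // insertelement
  const int32_t rhs = -2147483646;
  std::array<int32_t, 2> t4{};
  for (int lane = 0; lane < 2; ++lane) {
    const bool lt = t2[lane] < rhs;  // icmp slt
    t4[lane] = lt ? 1 : 0;           // zext i1 -> i32
  }
  return t4[1];                      // extractelement, lane 1
}
```

Whatever value is passed in, lane 1 compares -2147483648 against -2147483646, so the result is 1.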

llvm/test/Transforms/LoopVectorize/RISCV/illegal-type.ll

[... 95 lines not shown ...]

 for.end:
 ret void
 }

 define void @uniform_store_i1(i1* noalias %dst, i64* noalias %start, i64 %N) {
 ; CHECK-LABEL: @uniform_store_i1(
 ; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N:%.*]], 1
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], 4
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK: vector.ph:
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 4
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
+; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i64, i64* [[START:%.*]], i64 [[N_VEC]]
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64*> poison, i64* [[START]], i32 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64*> [[BROADCAST_SPLATINSERT]], <2 x i64*> poison, <2 x i32> zeroinitializer
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <2 x i64*> poison, i64* [[START]], i32 0
+; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <2 x i64*> [[BROADCAST_SPLATINSERT3]], <2 x i64*> poison, <2 x i32> zeroinitializer
+; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
+; CHECK: vector.body:
+; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi i64* [ [[START]], [[VECTOR_PH]] ], [ [[PTR_IND:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i64, i64* [[POINTER_PHI]], <2 x i64> <i64 0, i64 1>
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i64, i64* [[POINTER_PHI]], <2 x i64> <i64 2, i64 3>
+; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i64*> [[TMP1]], i32 0
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i64, i64* [[TMP3]], i32 0
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[TMP4]] to <2 x i64>*
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i64>, <2 x i64>* [[TMP5]], align 4
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i64, i64* [[TMP3]], i32 2
+; CHECK-NEXT: [[TMP7:%.*]] = bitcast i64* [[TMP6]] to <2 x i64>*
+; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP7]], align 4
+; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, <2 x i64*> [[TMP1]], i64 1
+; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, <2 x i64*> [[TMP2]], i64 1
+; CHECK-NEXT: [[TMP10:%.*]] = icmp eq <2 x i64*> [[TMP8]], [[BROADCAST_SPLAT]]
+; CHECK-NEXT: [[TMP11:%.*]] = icmp eq <2 x i64*> [[TMP9]], [[BROADCAST_SPLAT4]]
+; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i1> [[TMP10]], i32 0
+; CHECK-NEXT: store i1 [[TMP12]], i1* [[DST:%.*]], align 1
+; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP10]], i32 1
+; CHECK-NEXT: store i1 [[TMP13]], i1* [[DST]], align 1
+; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i1> [[TMP11]], i32 0
+; CHECK-NEXT: store i1 [[TMP14]], i1* [[DST]], align 1
+; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x i1> [[TMP11]], i32 1
+; CHECK-NEXT: store i1 [[TMP15]], i1* [[DST]], align 1
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; CHECK-NEXT: [[PTR_IND]] = getelementptr i64, i64* [[POINTER_PHI]], i64 4
+; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK: middle.block:
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[CMP_N]], label [[END:%.*]], label [[SCALAR_PH]]
+; CHECK: scalar.ph:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64* [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[START]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
-; CHECK-NEXT: [[FIRST_SROA:%.*]] = phi i64* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[START:%.*]], [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ]
+; CHECK-NEXT: [[FIRST_SROA:%.*]] = phi i64* [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ]
 ; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
-; CHECK-NEXT: [[TMP0:%.*]] = load i64, i64* [[FIRST_SROA]], align 4
+; CHECK-NEXT: [[TMP17:%.*]] = load i64, i64* [[FIRST_SROA]], align 4
 ; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i64, i64* [[FIRST_SROA]], i64 1
 ; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i64* [[INCDEC_PTR]], [[START]]
-; CHECK-NEXT: store i1 [[CMP_NOT]], i1* [[DST:%.*]], align 1
-; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[IV]], [[N:%.*]]
-; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[END:%.*]], !llvm.loop [[LOOP0]]
+; CHECK-NEXT: store i1 [[CMP_NOT]], i1* [[DST]], align 1
+; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[IV]], [[N]]
+; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[END]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK: end:
 ; CHECK-NEXT: ret void
 ;
 entry:
 br label %for.body

 for.body:
 %first.sroa = phi i64* [ %incdec.ptr, %for.body ], [ %start, %entry ]
[... 48 lines not shown ...]
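
The new CHECK lines for `uniform_store_i1` show the usual fixed length vectorization skeleton: a minimum-iteration check, a vector body that advances by 4 elements per iteration (two `<2 x i64>` loads), and a scalar remainder loop. The sketch below reproduces only the trip-count arithmetic from those CHECK lines; `TripSplit` and `splitTripCount` are hypothetical names, and the sketch assumes `n + 1` does not wrap.

```cpp
#include <cassert>
#include <cstdint>

struct TripSplit {
  uint64_t vectorElems;  // elements handled by vector.body
  uint64_t scalarElems;  // elements left for the scalar for.body loop
};

// Mirrors [[TMP0]], [[MIN_ITERS_CHECK]], [[N_MOD_VF]] and [[N_VEC]].
TripSplit splitTripCount(uint64_t n) {
  const uint64_t step = 4;      // VF of 2 times an interleave count of 2
  const uint64_t trip = n + 1;  // [[TMP0]] = add i64 [[N]], 1
  if (trip < step)              // min.iters.check: too short to vectorize
    return {0, trip};
  const uint64_t nVec = trip - trip % step;  // [[N_VEC]]
  return {nVec, trip - nVec};
}
```

For `n = 9` the trip count is 10, so 8 elements go through the vector body and 2 through the scalar loop; for `n = 2` the whole loop stays scalar.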
