This is an archive of the discontinued LLVM Phabricator instance.

Differential D99951

[NFC] Add scalable vectorisation tests for int/FP <> int/FP conversions
Closed, Public

Authored by david-arm on Apr 6 2021, 6:30 AM.

Details

Reviewers

sdesmalen
kmclaughlin
peterwaller-arm
fhahn
CarolineConcatto

Commits

rGcf7276820c50: [NFC] Add scalable vectorisation tests for int/FP <> int/FP conversions

Summary

We can already vectorise loops that involve int<>int, fp<>fp, int<>fp
and fp<>int conversions, however we didn't previously have any tests
for them. This patch adds some tests for each conversion type.
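For orientation, the scalar loops under test correspond roughly to C source like the sketch below. This is an illustration only, not taken from the patch: the function names simply mirror some of the IR test names, and the `restrict` qualifiers stand in for the `noalias` parameter attributes in the IR.

```c
#include <stdint.h>

/* Hypothetical C counterparts of four of the IR tests. Each loop performs
 * one element-wise conversion, which the loop vectoriser is expected to
 * widen into the matching vector cast. */

/* int -> int, widening signed: sext i8 to i32 */
void s8_to_s32(int32_t *restrict dst, const int8_t *restrict src, int64_t n) {
    for (int64_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* int -> int, widening unsigned: zext i8 to i16 */
void u8_to_u16(uint16_t *restrict dst, const uint8_t *restrict src, int64_t n) {
    for (int64_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* int -> int, narrowing: trunc i64 to i8 */
void s64_to_s8(int8_t *restrict dst, const int64_t *restrict src, int64_t n) {
    for (int64_t i = 0; i < n; i++)
        dst[i] = (int8_t)src[i];
}

/* fp -> fp, narrowing: fptrunc double to float */
void f64_to_f32(float *restrict dst, const double *restrict src, int64_t n) {
    for (int64_t i = 0; i < n; i++)
        dst[i] = (float)src[i];
}
```

The IR that clang actually emits for such loops may differ in detail from the hand-reduced tests in this revision.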

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm requested review of this revision. · Apr 6 2021, 6:30 AM

david-arm created this revision.

Herald added a project: Restricted Project. · Apr 6 2021, 6:30 AM

Herald added a subscriber: llvm-commits.

david-arm added a parent revision: D99935: [AArch64] Add instruction costs for FP_TO_UINT and FP_TO_SINT with half types. · Apr 6 2021, 6:31 AM

Harbormaster completed remote builds in B97294: Diff 335497. · Apr 6 2021, 7:25 AM

david-arm added a reviewer: CarolineConcatto. · Apr 9 2021, 5:57 AM

Would it make sense to have all the conversion tests in a single file, as the tests are quite compact?

llvm/test/Transforms/LoopVectorize/AArch64/sve-conv-int-to-int.ll
6 ↗	(On Diff #335497)	Do we still need this? IIRC most of those checks have been removed by now?

Sure, I can combine them into a single test if you think that's better? I wasn't sure whether in general we prefer to have more concise test files or to have fewer, larger files. I could create a file with a name like sve-type-conv.ll.

llvm/test/Transforms/LoopVectorize/AArch64/sve-conv-int-to-int.ll
6 ↗	(On Diff #335497)	Yes you're right. When I created the patch originally I think we still needed them, but I'll rebase and remove them.

In D99951#2679444, @david-arm wrote:

Sure, I can combine them into a single test if you think that's better? I wasn't sure whether in general we prefer to have more concise test files or to have fewer, larger files. I could create a file with a name like sve-type-conv.ll.

Sounds good to me. I don't think there's a general answer really. In this case it seems like they all test the same/similar code path and there might be less processing overhead with fewer, larger files than with many small files.

Removed warnings check line.
Combined all tests into one file.

david-arm marked an inline comment as done. · Apr 9 2021, 7:43 AM

Harbormaster completed remote builds in B97985: Diff 336453. · Apr 9 2021, 8:18 AM

kmclaughlin added inline comments. · Apr 23 2021, 3:39 AM

llvm/test/Transforms/LoopVectorize/AArch64/sve-type-conv.ll
157–158	I'm not sure why this test and the one below are using both a `[u|s]itofp` and an `fptrunc`, is it possible to just use `uitofp i64 %0 to half` here?
218	nit: Should this test be called `@u8_to_u16`?

Removed unnecessary fptrunc from tests

david-arm marked 2 inline comments as done. · Apr 23 2021, 4:00 AM

david-arm added inline comments.

llvm/test/Transforms/LoopVectorize/AArch64/sve-type-conv.ll
157–158	Yeah, this is an artefact of clang. When compiling a loop such as this: `void foo(float16_t *dst, uint64_t *src, long long n) { for (long long i = 0; i < n; i++) dst[i] = src[i]; }` it generates IR with two-part conversions, i.e. uitofp + fptrunc. Not sure why, but I've changed the test anyway.

Thanks for updating the tests, LGTM!

This revision is now accepted and ready to land. · Apr 23 2021, 5:09 AM

Harbormaster completed remote builds in B100530: Diff 339971. · Apr 23 2021, 5:28 AM

Closed by commit rGcf7276820c50: [NFC] Add scalable vectorisation tests for int/FP <> int/FP conversions (authored by david-arm). · Apr 26 2021, 3:02 AM

This revision was automatically updated to reflect the committed changes.

david-arm marked an inline comment as done.

david-arm added a commit: rGcf7276820c50: [NFC] Add scalable vectorisation tests for int/FP <> int/FP conversions.

Revision Contents

Path: llvm/test/Transforms/LoopVectorize/AArch64/sve-type-conv.ll
Size: 266 lines

Diff 340470

llvm/test/Transforms/LoopVectorize/AArch64/sve-type-conv.ll

This file was added.

				; RUN: opt -loop-vectorize -dce -instcombine < %s -S | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"


				define void @f16_to_f32(float* noalias nocapture %dst, half* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @f16_to_f32(
				; CHECK: vector.body
				; CHECK: %{{.*}} = fpext <vscale x 8 x half> %{{.*}} to <vscale x 8 x float>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.07 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds half, half* %src, i64 %i.07
				%0 = load half, half* %arrayidx, align 2
				%conv = fpext half %0 to float
				%arrayidx1 = getelementptr inbounds float, float* %dst, i64 %i.07
				store float %conv, float* %arrayidx1, align 4
				%inc = add nuw nsw i64 %i.07, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				define void @f64_to_f32(float* noalias nocapture %dst, double* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @f64_to_f32(
				; CHECK: vector.body
				; CHECK: %{{.*}} = fptrunc <vscale x 8 x double> %{{.*}} to <vscale x 8 x float>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.07 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds double, double* %src, i64 %i.07
				%0 = load double, double* %arrayidx, align 8
				%conv = fptrunc double %0 to float
				%arrayidx1 = getelementptr inbounds float, float* %dst, i64 %i.07
				store float %conv, float* %arrayidx1, align 4
				%inc = add nuw nsw i64 %i.07, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				define void @f16_to_s8(i8* noalias nocapture %dst, half* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @f16_to_s8(
				; CHECK: vector.body
				; CHECK: %{{.*}} = fptosi <vscale x 8 x half> %{{.*}} to <vscale x 8 x i8>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.08 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds half, half* %src, i64 %i.08
				%0 = load half, half* %arrayidx, align 2
				%conv1 = fptosi half %0 to i8
				%arrayidx2 = getelementptr inbounds i8, i8* %dst, i64 %i.08
				store i8 %conv1, i8* %arrayidx2, align 1
				%inc = add nuw nsw i64 %i.08, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				define void @f32_to_u64(i64* noalias nocapture %dst, float* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @f32_to_u64(
				; CHECK: vector.body
				; CHECK: %{{.*}} = fptoui <vscale x 8 x float> %{{.*}} to <vscale x 8 x i64>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.07 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds float, float* %src, i64 %i.07
				%0 = load float, float* %arrayidx, align 4
				%conv = fptoui float %0 to i64
				%arrayidx1 = getelementptr inbounds i64, i64* %dst, i64 %i.07
				store i64 %conv, i64* %arrayidx1, align 8
				%inc = add nuw nsw i64 %i.07, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				define void @s8_to_f32(float* noalias nocapture %dst, i8* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @s8_to_f32(
				; CHECK: vector.body
				; CHECK: %{{.*}} = sitofp <vscale x 8 x i8> %{{.*}} to <vscale x 8 x float>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.07 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i8, i8* %src, i64 %i.07
				%0 = load i8, i8* %arrayidx, align 1
				%conv = sitofp i8 %0 to float
				%arrayidx1 = getelementptr inbounds float, float* %dst, i64 %i.07
				store float %conv, float* %arrayidx1, align 4
				%inc = add nuw nsw i64 %i.07, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				define void @u16_to_f32(float* noalias nocapture %dst, i16* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @u16_to_f32(
				; CHECK: vector.body
				; CHECK: %{{.*}} = uitofp <vscale x 8 x i16> %{{.*}} to <vscale x 8 x float>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.07 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i16, i16* %src, i64 %i.07
				%0 = load i16, i16* %arrayidx, align 2
				%conv = uitofp i16 %0 to float
				%arrayidx1 = getelementptr inbounds float, float* %dst, i64 %i.07
				store float %conv, float* %arrayidx1, align 4
				%inc = add nuw nsw i64 %i.07, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				define void @u64_to_f16(half* noalias nocapture %dst, i64* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @u64_to_f16(
				; CHECK: vector.body
				; CHECK: %{{.*}} = uitofp <vscale x 8 x i64> %{{.*}} to <vscale x 8 x half>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.08 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i64, i64* %src, i64 %i.08
				%0 = load i64, i64* %arrayidx, align 8
				%conv1 = uitofp i64 %0 to half
				%arrayidx2 = getelementptr inbounds half, half* %dst, i64 %i.08
				store half %conv1, half* %arrayidx2, align 2
				%inc = add nuw nsw i64 %i.08, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				define void @s64_to_f16(half* noalias nocapture %dst, i64* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @s64_to_f16(
				; CHECK: vector.body
				; CHECK: %{{.*}} = sitofp <vscale x 8 x i64> %{{.*}} to <vscale x 8 x half>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.08 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i64, i64* %src, i64 %i.08
				%0 = load i64, i64* %arrayidx, align 8
				%conv1 = sitofp i64 %0 to half
				%arrayidx2 = getelementptr inbounds half, half* %dst, i64 %i.08
				store half %conv1, half* %arrayidx2, align 2
				%inc = add nuw nsw i64 %i.08, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				define void @s8_to_s32(i32* noalias nocapture %dst, i8* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @s8_to_s32(
				; CHECK: vector.body
				; CHECK: %{{.*}} = sext <vscale x 8 x i8> %{{.*}} to <vscale x 8 x i32>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.07 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i8, i8* %src, i64 %i.07
				%0 = load i8, i8* %arrayidx, align 1
				%conv = sext i8 %0 to i32
				%arrayidx1 = getelementptr inbounds i32, i32* %dst, i64 %i.07
				store i32 %conv, i32* %arrayidx1, align 4
				%inc = add nuw nsw i64 %i.07, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				define void @u8_to_u16(i16* noalias nocapture %dst, i8* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @u8_to_u16(
				; CHECK: vector.body
				; CHECK: %{{.*}} = zext <vscale x 8 x i8> %{{.*}} to <vscale x 8 x i16>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.07 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i8, i8* %src, i64 %i.07
				%0 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %0 to i16
				%arrayidx1 = getelementptr inbounds i16, i16* %dst, i64 %i.07
				store i16 %conv, i16* %arrayidx1, align 2
				%inc = add nuw nsw i64 %i.07, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				define void @s64_to_s8(i8* noalias nocapture %dst, i64* noalias nocapture readonly %src, i64 %N) #0 {
				; CHECK-LABEL: @s64_to_s8(
				; CHECK: vector.body
				; CHECK: %{{.*}} = trunc <vscale x 8 x i64> %{{.*}} to <vscale x 8 x i8>
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.07 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i64, i64* %src, i64 %i.07
				%0 = load i64, i64* %arrayidx, align 8
				%conv = trunc i64 %0 to i8
				%arrayidx1 = getelementptr inbounds i8, i8* %dst, i64 %i.07
				store i8 %conv, i8* %arrayidx1, align 1
				%inc = add nuw nsw i64 %i.07, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret void
				}


				attributes #0 = { "target-features"="+sve" }

				!0 = distinct !{!0, !1, !2, !3, !4, !5}
				!1 = !{!"llvm.loop.mustprogress"}
				!2 = !{!"llvm.loop.vectorize.width", i32 8}
				!3 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
				!4 = !{!"llvm.loop.interleave.count", i32 1}
				!5 = !{!"llvm.loop.vectorize.enable", i1 true}
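The loop metadata above requests a scalable vectorisation factor of 8 with no interleaving. As a rough sketch, and assuming a clang recent enough to accept the `scalable` argument to `vectorize_width` (the version guard below is an assumption, not something stated in this revision), similar hints can be written at source level with `#pragma clang loop`. The function name is borrowed from the test for illustration.

```c
#include <stdint.h>

/* Hypothetical source-level counterpart of @u16_to_f32 with loop hints.
 * The pragma is guarded so that other compilers, or older clang releases
 * assumed not to support the scalable form, simply skip it. */
void u16_to_f32(float *restrict dst, const uint16_t *restrict src, int64_t n) {
#if defined(__clang__) && __clang_major__ >= 13
#pragma clang loop vectorize_width(8, scalable) interleave_count(1)
#endif
    for (int64_t i = 0; i < n; i++)
        dst[i] = (float)src[i]; /* uitofp i16 to float */
}
```

Whether the loop is then vectorised with `<vscale x 8 x ...>` types still depends on the target, e.g. building for AArch64 with SVE enabled, as the `"target-features"="+sve"` attribute does in the test.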