This is an archive of the discontinued LLVM Phabricator instance.

Differential D155839

[mlir][Linalg] Add an end-to-end test for scalable vectorisation
Closed · Public

Authored by awarzynski on Jul 20 2023, 6:53 AM.

Details

Reviewers

c-rhodes
dcaballe
benmxwl-arm
nicolasvasilache

Commits

rG9d927d039728: [mlir][Linalg] Add an end-to-end test for scalable vectorisation

Summary

This patch adds our first integration test for scalable vectorisation in
Linalg. It simply runs linalg.fill to fill a scalable vector with a
pre-defined f32 value. The result is printed to stdout.

Note that with scalable architectures, the vector size is not known at
compile time, but it is known at runtime. For this reason, the length of
the output generated by the new test depends on the hardware implementation. For
Arm's SVE we do know that there will be at least 4 f32 elements in every
scalable vector register. CHECK lines were designed accordingly.

In order to see what happens for different implementations of SVE, you
can use the following QEMU settings:

qemu-aarch64 -cpu max,sve128=on
qemu-aarch64 -cpu max,sve512=on
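
The arithmetic behind the "at least 4 f32 elements" statement can be sketched in Python. This is an illustrative model only and not part of the patch: the helper name f32_lanes is hypothetical, and the sketch assumes the architectural SVE rule that vector lengths are multiples of 128 bits between 128 and 2048, with vscale counting the 128-bit granules. The test itself computes the element count as 4 * vscale.

```python
# Illustrative model only; f32_lanes is a hypothetical helper, not part of MLIR.
SVE_GRANULE_BITS = 128  # SVE vector lengths are multiples of 128 bits
SVE_MAX_BITS = 2048     # architectural maximum vector length
F32_BITS = 32


def f32_lanes(vector_length_bits: int) -> int:
    """Return the number of f32 elements in one SVE vector of the given length."""
    if (
        vector_length_bits < SVE_GRANULE_BITS
        or vector_length_bits > SVE_MAX_BITS
        or vector_length_bits % SVE_GRANULE_BITS != 0
    ):
        raise ValueError(f"not a valid SVE vector length: {vector_length_bits}")
    vscale = vector_length_bits // SVE_GRANULE_BITS
    # Same computation as the test: 4 f32 elements per 128-bit granule, times vscale.
    return (SVE_GRANULE_BITS // F32_BITS) * vscale


if __name__ == "__main__":
    for bits in (128, 512):
        print(f"sve{bits}=on: {f32_lanes(bits)} f32 elements")
```

Under these assumptions, and provided the emulated vector length is exactly the requested one, the test would print 4 values with sve128=on and 16 values with sve512=on.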

At the moment, this test is only enabled when MLIR_RUN_ARM_SVE_TESTS is set.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

awarzynski created this revision. Jul 20 2023, 6:53 AM

Herald added a project: Restricted Project. Jul 20 2023, 6:53 AM

Herald added subscribers: bviyer, Moerafaat, bzcheeseman and 25 others.

awarzynski requested review of this revision. Jul 20 2023, 6:53 AM

Herald added a reviewer: nicolasvasilache. Jul 20 2023, 6:53 AM

Herald added a project: Restricted Project.

Herald added subscribers: wangpc, alextsao1999, limo1996 and 2 others.

Harbormaster completed remote builds in B246905: Diff 542501. Jul 20 2023, 9:50 AM

Matt added a subscriber: Matt. Jul 20 2023, 10:12 AM

Matt added inline comments. Jul 20 2023, 10:15 AM

mlir/test/Integration/Dialect/Linalg/CPU/Scalable/fill-1d.mlir
Line 35: Nit: s/for/four/ (Perhaps f32 would be a bit clearer than FP, too?)

Thanks!

This revision is now accepted and ready to land. Jul 20 2023, 10:39 AM

Thanks Andrzej, I've left a few suggestions that could simplify this a little, but overall LGTM. I did try to run it locally, but it fails for me when running under Lit (exit code 74); it runs fine when I manually run it using the command from lit -a. Initially I thought it was because I didn't have D155920, but I rebased and it still doesn't work. I also tried a fresh build. FWIW, the relevant flags I'm using are -DMLIR_INCLUDE_INTEGRATION_TESTS=ON -DARM_EMULATOR_EXECUTABLE="$HOME/qemu-7.1.0/build/qemu-aarch64" -DMLIR_RUN_ARM_SVE_TESTS=ON. Suspect my build isn't configured correctly?

mlir/test/Integration/Dialect/Linalg/CPU/Scalable/fill-1d.mlir
Line 2: there's nothing in the ArmSVE dialect used here?
Line 2: there's no affine ops?
Lines 2–4: nit: with unused `printMemrefF32` removed I get the same IR with these flags
Lines 59–60: unused

c-rhodes mentioned this in D156519: [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors. Jul 31 2023, 2:58 AM

Incorporate suggestions from Matt and Cullen - thanks!

@c-rhodes, no luck reproducing your failure :( Could you try again?

In D155839#4547617, @awarzynski wrote:

Incorporate suggestions from Matt and Cullen - thanks!

@c-rhodes, no luck reproducing your failure :( Could you try again?

Just tried it again and it works for me, not sure what changed, apologies for confusion. This LGTM cheers.

Harbormaster completed remote builds in B249271: Diff 545734. Jul 31 2023, 12:40 PM

Closed by commit rG9d927d039728: [mlir][Linalg] Add an end-to-end test for scalable vectorisation (authored by awarzynski). Aug 1 2023, 1:32 AM

This revision was automatically updated to reflect the committed changes.

awarzynski added a commit: rG9d927d039728: [mlir][Linalg] Add an end-to-end test for scalable vectorisation.

Revision Contents

Path	Size
mlir/test/Integration/Dialect/Linalg/CPU/Scalable/fill-1d.mlir	56 lines
mlir/test/Integration/Dialect/Linalg/CPU/Scalable/lit.local.cfg	9 lines

Diff 545962

mlir/test/Integration/Dialect/Linalg/CPU/Scalable/fill-1d.mlir

This file was added.

// RUN: mlir-opt %s -test-transform-dialect-interpreter -test-transform-dialect-erase-schedule -lower-vector-mask -one-shot-bufferize -test-lower-to-llvm | \

// RUN: %mcr_aarch64_cmd -e=entry -entry-point-result=void --march=aarch64 --mattr="+sve" -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils | \

c-rhodes: there's nothing in the ArmSVE dialect used here?

c-rhodes: there's no affine ops?

// RUN: FileCheck %s

c-rhodes: nit: with unused `printMemrefF32` removed I get the same IR with these flags

- // RUN: mlir-opt %s -lower-affine -convert-scf-to-cf -convert-vector-to-llvm="enable-arm-sve" -finalize-memref-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -canonicalize | \
- // RUN: mlir-opt %s -test-transform-dialect-interpreter --test-transform-dialect-erase-schedule | \
- // RUN: mlir-opt --lower-vector-mask -one-shot-bufferize -convert-scf-to-cf -func-bufferize -test-lower-to-llvm | \
+ // RUN: mlir-opt %s -test-transform-dialect-interpreter -test-transform-dialect-erase-schedule -lower-vector-mask -one-shot-bufferize -test-lower-to-llvm | \
// RUN: %mcr_aarch64_cmd -e=entry -entry-point-result=void --march=aarch64 --mattr="+sve" -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils | \

func.func @printTestEnd() {
  %0 = llvm.mlir.addressof @str_sve_end : !llvm.ptr<array<24 x i8>>
  %1 = llvm.mlir.constant(0 : index) : i64
  %2 = llvm.getelementptr %0[%1, %1]
    : (!llvm.ptr<array<24 x i8>>, i64, i64) -> !llvm.ptr<i8>
  llvm.call @printCString(%2) : (!llvm.ptr<i8>) -> ()
  return
}

func.func @entry() {
  %c4 = arith.constant 4 : index
  %c0 = arith.constant 0 : index
  %step = arith.constant 1 : index
  %c1_f32 = arith.constant 123.0 : f32

  %vscale = vector.vscale
  %vl_fp = arith.muli %c4, %vscale : index
  %vec = bufferization.alloc_tensor(%vl_fp) : tensor<?xf32>

  %vec_out = scf.for %i = %c0 to %vl_fp step %step iter_args(%vin = %vec) -> tensor<?xf32> {
    %vout = tensor.insert %c1_f32 into %vin[%i] : tensor<?xf32>
    scf.yield %vout : tensor<?xf32>
  }

  %pi = arith.constant 3.14 : f32
  %vec_out_1 = linalg.fill ins(%pi : f32) outs(%vec_out : tensor<?xf32>) -> tensor<?xf32>

  // There are at least 4 f32 elements in every SVE vector. For implementations
  // with wider vectors, you should see more elements being printed.
  // CHECK: 3.14

Matt: Nit: s/for/four/ (Perhaps f32 would be a bit clearer than FP, too?)

%vec_out_1 = linalg.fill ins(%pi : f32) outs(%vec_out : tensor<?xf32>) -> tensor<?xf32>
- // There is at least for FP elements in every SVE vector. For implementations
+ // There are at least four f32 elements in every SVE vector. For implementations
// with wider vectors, you should see more elements being printed.

  // CHECK: 3.14
  scf.for %i = %c0 to %vl_fp step %step {
    %element = tensor.extract %vec_out_1[%i] : tensor<?xf32>
    vector.print %element : f32
  }

  // CHECK: SVE: END OF TEST OUTPUT
  func.call @printTestEnd() : () -> ()

  return
}

transform.sequence failures(propagate) {
^bb1(%arg1: !transform.any_op):
  %0 = transform.structured.match ops{["linalg.fill"]} in %arg1 : (!transform.any_op) -> !transform.any_op
  transform.structured.masked_vectorize %0 vector_sizes [[4]] : !transform.any_op
}

llvm.func @printCString(!llvm.ptr<i8>)
llvm.mlir.global internal constant @str_sve_end("SVE: END OF TEST OUTPUT\0A")

c-rhodes: unused

mlir/test/Integration/Dialect/Linalg/CPU/Scalable/lit.local.cfg

This file was added.

import sys

# ArmSVE tests must be enabled via build flag.
if not config.mlir_run_arm_sve_tests:
    config.unsupported = True

# No JIT on win32.
if sys.platform == "win32":
    config.unsupported = True
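
The gating logic in lit.local.cfg can be restated as a small pure function, which makes the two conditions easy to check in isolation. This is only a sketch: is_unsupported is a hypothetical helper, and its parameters stand in for config.mlir_run_arm_sve_tests and sys.platform, which lit provides when it loads the real file.

```python
# Hypothetical helper mirroring the two checks in lit.local.cfg; not part of the patch.
def is_unsupported(run_arm_sve_tests: bool, platform: str) -> bool:
    """Return True when the tests in this directory would be marked unsupported."""
    # ArmSVE tests must be enabled via the build flag.
    if not run_arm_sve_tests:
        return True
    # No JIT on win32.
    if platform == "win32":
        return True
    return False


if __name__ == "__main__":
    print(is_unsupported(run_arm_sve_tests=True, platform="linux"))
```

The tests run only when the SVE flag is enabled and the host platform is not win32; either condition alone is enough to skip the directory.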
