This is an archive of the discontinued LLVM Phabricator instance.

[mlir][ArmSME] Use memref indices for load and store
ClosedPublic

Authored by c-rhodes on Jul 31 2023, 6:52 AM.

Details

Summary

This patch extends the ArmSME load and store op lowering to use the
memref indices. To verify this, an integration test is added that loads
two 32-bit element ZA tiles from memory and stores them back to memory
in reverse order.

Depends on D156467 D156558

Diff Detail

Event Timeline

c-rhodes created this revision. Jul 31 2023, 6:52 AM
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
c-rhodes requested review of this revision. Jul 31 2023, 6:52 AM
dcaballe accepted this revision. Jul 31 2023, 10:37 PM

Thanks!

mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp
34

nit: SmallVector -> SmallVectorImpl
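
For context, the distinction this nit draws can be sketched with a simplified analogue (this is not LLVM's actual `SmallVector` implementation, which also handles heap growth): helpers that only read or append should take the size-erased base class, so callers can pass a vector with any inline capacity without templating the helper on that capacity.

```cpp
#include <cassert>
#include <cstddef>

// Size-erased base: knows where the data lives, not how much inline storage
// the concrete vector reserved. (Illustrative analogue of
// llvm::SmallVectorImpl; growth handling is omitted.)
struct IntVecImpl {
  int *data;
  size_t count, capacity;
  IntVecImpl(int *d, size_t cap) : data(d), count(0), capacity(cap) {}
  void push_back(int v) {
    assert(count < capacity && "no growth in this sketch");
    data[count++] = v;
  }
  size_t size() const { return count; }
};

// Concrete vector with N slots of inline storage.
template <unsigned N>
struct IntVec : IntVecImpl {
  int storage[N];
  IntVec() : IntVecImpl(storage, N) {}
};

// Accepting the base lets callers pass IntVec<2>, IntVec<8>, ... without
// templating this helper on the inline size -- the point of the nit.
void appendTwo(IntVecImpl &out) {
  out.push_back(10);
  out.push_back(20);
}
```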

This revision is now accepted and ready to land. Jul 31 2023, 10:37 PM

Thanks Cullen! A few minor comments/suggestions.

I think that it would be nice to update one of the tests (or more) as follows:

// BEFORE
func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
  %c0 = arith.constant 0 : index
  %tile = arm_sme.tile_load %src[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
  return
}

// AFTER
func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
  %c0 = arith.constant 0 : index
  %c123 = arith.constant 123 : index
  %tile = arm_sme.tile_load %src[%c123, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
  return
}

Otherwise it's a bit tricky to see what has changed - with c0 in all tests the effective address does not change, does it?

mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp
114

[nit] Suggesting a more descriptive name.

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
138–139

I think that this comment should be moved further down.

mlir/test/Conversion/ArmSMEToSCF/arm-sme-to-scf.mlir
13–14

[nit] "TILE_SLICE_OFFSET" suggests offset into ZA (that's where tile slices are defined). But this is for a plain memref. Perhaps OFFSET? I am also thinking that the key point of this patch is to clearly differentiate between memory offsets (low level LLVM concept) and "tile slice idx" (SME concept).

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
1–18

Rather than duplicating this, could you use DEFINE and REDEFINE? Alternatively, you could move this to a separate file. There's quite a lot going on here.

29

Could you add a comment that would highlight the difference between z0_d_f64 and load_store_two_za_s_tiles?

242–243

[nit] Small suggestion for a more descriptive name.

Matt added a subscriber: Matt. Aug 1 2023, 2:44 PM
c-rhodes updated this revision to Diff 546409. Aug 2 2023, 4:51 AM

address comments

c-rhodes marked 4 inline comments as done. Aug 2 2023, 4:58 AM

Thanks Cullen! A few minor comments/suggestions.

Thanks for the comments.

I think that it would be nice to update one of the tests (or more) as follows:

// BEFORE
func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
  %c0 = arith.constant 0 : index
  %tile = arm_sme.tile_load %src[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
  return
}

// AFTER
func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
  %c0 = arith.constant 0 : index
  %c123 = arith.constant 123 : index
  %tile = arm_sme.tile_load %src[%c123, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
  return
}

Otherwise it's a bit tricky to see what has changed - with c0 in all tests the effective address does not change, does it?

This isn't changing anything in the example you provided; the offset is adjusted when materializing the tile slice loop, and that is tested in the -convert-arm-sme-to-scf tests.

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
138–139

I think that this comment should be moved further down.

That's not part of this patch, but done anyway.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
1–18

Rather than duplicating this, could you use DEFINE and REDEFINE? Alternatively, you could move this to a separate file. There's quite a lot going on here.

I was thinking this should have a single entry point that calls both tests, but ZA isn't currently re-enabled after calls, so that's not possible until that's resolved; it should be once it is.

242–243

[nit] Small suggestion for a more descriptive name.

I understand you intend that to mean the element type of the tile, but appending the type usually indicates the value is of that type, which this isn't; it's an index. I've kept it as is.

awarzynski accepted this revision. Aug 3 2023, 12:36 AM

LGTM, thanks! I've left a few more comments, but feel free to ignore.

This isn't changing anything in the example you provided; the offset is adjusted when materializing the tile slice loop, and that is tested in the -convert-arm-sme-to-scf tests.

Clarified what I had in mind inline in a test. Not a blocker.

mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir
68

I was suggesting to change %c0 = 0 to e.g. %c123 = 123 in places like this one. Right now (with %c0) it looks like this:

// BEFORE
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8

becomes:

// AFTER
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[OFF1:.*]] = llvm.add %[[OFF0]], %[[C0_I64]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8

Note that OFF0 and OFF1 are identical so the benefits of this patch are obfuscated. The resulting code is correct _with_ and _without_ your change (because the offset is 0).

Replacing %c0 with %c123 makes things more interesting:

// BEFORE
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8

becomes:

// AFTER
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[OFF1:.*]] = llvm.add %[[OFF0]], %[[C123_I64]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8

In this case the "BEFORE" is obviously broken.
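
The mul + add sequence quoted above can be sketched in plain C++ (a hedged illustration of the lowered address arithmetic, not the actual MLIR conversion code): the tile slice index is scaled by the leading stride, then the user-provided memref index is added on top; that added term is exactly what the "BEFORE" output lacks.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the lowered IR: llvm.mul (scale by stride), then llvm.add (fold in
// the memref index supplied by the user). Names here are illustrative.
int64_t effectiveOffset(int64_t tileSliceIdx, int64_t stride0,
                        int64_t userIndex) {
  int64_t off0 = tileSliceIdx * stride0; // %[[OFF0]] = llvm.mul ...
  return off0 + userIndex;               // %[[OFF1]] = llvm.add ...
}
```

With a zero index (the old %c0 tests), the add is a no-op and the bug is invisible; with 123, the two computations diverge by exactly 123 elements.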

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
1–18

I was thinking this should have a single entry point that calls both tests, but ZA isn't currently re-enabled after calls, so that's not possible until that's resolved; it should be once it is.

I don't follow - with DEFINE/REDEFINE you can re-use everything while keeping the entry point custom:

// DEFINE: %{entry_point} = za0_d_f64

// DEFINE: %{compile} = mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \
// DEFINE:   -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
// DEFINE:   -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize \
// DEFINE:   -allocate-arm-sme-tiles -test-lower-to-llvm
// DEFINE: %{translate} = mlir-translate -mlir-to-llvmir | \
// DEFINE: %{run} = %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \
// DEFINE:   --entry-function=%{entry_point} \
// DEFINE:   --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext \
// DEFINE:   --dlopen=%mlir_native_utils_lib_dir/libmlir_runner_utils%shlibext | \
// RUN: %{compile} | %{translate} | %{run} | FileCheck %s --check-prefix=CHECK-ZA0_D

// REDEFINE: %{entry_point} = load_store_two_za_s_tiles
// RUN: %{compile} | %{translate} | %{run} | FileCheck %s
242–243

I understand you intend that to mean the element type

That's not the main thing that I had in mind :) Sorry, I should've been clearer.

My main suggestion here is to avoid variable names like e.g. size (or name etc.). With names like this my first question would be "_size_ of what?" (or "_name_ of what?"). I just wanted to clarify, this is just a nit.

This revision was automatically updated to reflect the committed changes.
c-rhodes marked an inline comment as done.
c-rhodes marked 2 inline comments as done. Aug 3 2023, 1:55 AM

LGTM, thanks! I've left a few more comments, but feel free to ignore.

This isn't changing anything in the example you provided; the offset is adjusted when materializing the tile slice loop, and that is tested in the -convert-arm-sme-to-scf tests.

Clarified what I had in mind inline in a test. Not a blocker.

Addressed your final comments before landing. Cheers, Andrzej.

mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir
68

I was suggesting to change %c0 = 0 to e.g. %c123 = 123 in places like this one. Right now (with %c0) it looks like this:

// BEFORE
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8

becomes:

// AFTER
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[OFF1:.*]] = llvm.add %[[OFF0]], %[[C0_I64]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8

Note that OFF0 and OFF1 are identical so the benefits of this patch are obfuscated. The resulting code is correct _with_ and _without_ your change (because the offset is 0).

Replacing %c0 with %c123 makes things more interesting:

// BEFORE
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8

becomes:

// AFTER
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[OFF1:.*]] = llvm.add %[[OFF0]], %[[C123_I64]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8

In this case the "BEFORE" is obviously broken.

Ah ok, sure. I've updated vector_load_i8.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
1–18

I was thinking this should have a single entry point that calls both tests, but ZA isn't currently re-enabled after calls, so that's not possible until that's resolved; it should be once it is.

I don't follow - with DEFINE/REDEFINE you can re-use everything while keeping the entry point custom:

// DEFINE: %{entry_point} = za0_d_f64

// DEFINE: %{compile} = mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \
// DEFINE:   -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
// DEFINE:   -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize \
// DEFINE:   -allocate-arm-sme-tiles -test-lower-to-llvm
// DEFINE: %{translate} = mlir-translate -mlir-to-llvmir | \
// DEFINE: %{run} = %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \
// DEFINE:   --entry-function=%{entry_point} \
// DEFINE:   --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext \
// DEFINE:   --dlopen=%mlir_native_utils_lib_dir/libmlir_runner_utils%shlibext | \
// RUN: %{compile} | %{translate} | %{run} | FileCheck %s --check-prefix=CHECK-ZA0_D

// REDEFINE: %{entry_point} = load_store_two_za_s_tiles
// RUN: %{compile} | %{translate} | %{run} | FileCheck %s

Updated, this is nice. Also updated to use mlir-cpu-runner.

242–243

I understand you intend that to mean the element type

That's not the main thing that I had in mind :) Sorry, I should've been clearer.

My main suggestion here is to avoid variable names like e.g. size (or name etc.). With names like this my first question would be "_size_ of what?" (or "_name_ of what?"). I just wanted to clarify, this is just a nit.

Sorry, when I first read this I didn't spot size; I've updated it.