Thanks!
mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp
- Line 34: nit: SmallVector -> SmallVectorImpl
Thanks Cullen! A few minor comments/suggestions.
I think that it would be nice to update one of the tests (or more) as follows:
```mlir
// BEFORE
func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
  %c0 = arith.constant 0 : index
  %tile = arm_sme.tile_load %src[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
  return
}

// AFTER
func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
  %c0 = arith.constant 0 : index
  %c123 = arith.constant 123 : index
  %tile = arm_sme.tile_load %src[%c123, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
  return
}
```
Otherwise it's a bit tricky to see what has changed; with %c0 in all tests the effective address does not change, does it?
mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp
- Line 112: [nit] Suggesting a more descriptive name.
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
- Line 135: I think that this comment should be moved further down.
mlir/test/Conversion/ArmSMEToSCF/arm-sme-to-scf.mlir
- Line 13: [nit] "TILE_SLICE_OFFSET" suggests an offset into ZA (that's where tile slices are defined), but this is for a plain memref. Perhaps OFFSET? I am also thinking that the key point of this patch is to clearly differentiate between memory offsets (a low-level LLVM concept) and the "tile slice index" (an SME concept).
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
- Lines 12–21: Rather than duplicating this, could you use DEFINE and REDEFINE? Alternatively, you could move this to a separate file. There's quite a lot going on here.
- Line 33: Could you add a comment that would highlight the difference between za0_d_f64 and load_store_two_za_s_tiles?
- Lines 246–247: [nit] Small suggestion for a more descriptive name.
Thanks for the comments.
I think that it would be nice to update one of the tests (or more) as follows:
```mlir
// BEFORE
func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
  %c0 = arith.constant 0 : index
  %tile = arm_sme.tile_load %src[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
  return
}

// AFTER
func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
  %c0 = arith.constant 0 : index
  %c123 = arith.constant 123 : index
  %tile = arm_sme.tile_load %src[%c123, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
  return
}
```

Otherwise it's a bit tricky to see what has changed; with %c0 in all tests the effective address does not change, does it?
This isn't changing anything in the example you provided; the offset is adjusted when materializing the tile slice loop, and that is tested in the -convert-arm-sme-to-scf tests.
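For readers following along, a minimal hypothetical sketch of the tile slice loop being referred to (invented names; this is not the actual output of -convert-arm-sme-to-scf):

```mlir
// Illustrative only: %offset and %num_tile_slices are made-up names.
// The tile is loaded one slice at a time; the user-supplied row index
// is added to the loop induction variable, so a non-zero offset changes
// the effective address of every slice load.
scf.for %tile_slice_idx = %c0 to %num_tile_slices step %c1 {
  %row = arith.addi %tile_slice_idx, %offset : index
  // ... load tile slice %tile_slice_idx from %src[%row, %c0] ...
}
```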
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
- Line 135: That's not part of this patch, but done anyway.
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
- Lines 12–21: I was thinking this should have a single entry point that calls both tests, but ZA isn't currently re-enabled after calls, so that's not possible until that's resolved. It should be once it is.
- Lines 246–247: I understand you intend that to mean the element type of the tile, but appending a type usually indicates the value is of that type, which this isn't; it's an index. I've kept it as is.
LGTM, thanks! I've left a few more comments, but feel free to ignore.
This isn't changing anything in the example you provided; the offset is adjusted when materializing the tile slice loop, and that is tested in the -convert-arm-sme-to-scf tests.
Clarified what I had in mind inline in a test. Not a blocker.
mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir
- Line 62: I was suggesting to change %c0 = 0 to e.g. %c123 = 123 in places like this one. Right now (with %c0) it looks like this:

```
// BEFORE
// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]] : i64
// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
```

becomes:

```
// AFTER
// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]] : i64
// CHECK-NEXT: %[[OFF1:.*]] = llvm.add %[[OFF0]], %[[C0_I64]] : i64
// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
```

Note that OFF0 and OFF1 are identical, so the benefits of this patch are obfuscated. The resulting code is correct _with_ and _without_ your change (because the offset is 0). Replacing %c0 with %c123 makes things more interesting:

```
// BEFORE
// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]] : i64
// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
```

becomes:

```
// AFTER
// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]] : i64
// CHECK-NEXT: %[[OFF1:.*]] = llvm.add %[[OFF0]], %[[C123_I64]] : i64
// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
```

In this case the "BEFORE" is obviously broken.
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
- Lines 12–21: I don't follow; with DEFINE/REDEFINE you can re-use everything while keeping the entry point custom:

```
// DEFINE: %{entry_point} = za0_d_f64
// DEFINE: %{compile} = mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \
// DEFINE:   -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
// DEFINE:   -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize \
// DEFINE:   -allocate-arm-sme-tiles -test-lower-to-llvm
// DEFINE: %{translate} = mlir-translate -mlir-to-llvmir | \
// DEFINE: %{run} = %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \
// DEFINE:   --entry-function=%{entry_point} \
// DEFINE:   --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext \
// DEFINE:   --dlopen=%mlir_native_utils_lib_dir/libmlir_runner_utils%shlibext | \

// RUN: %{compile} | %{translate} | %{run} | FileCheck %s --check-prefix=CHECK-ZA0_D

// REDEFINE: %{entry_point} = load_store_two_za_s_tiles
// RUN: %{compile} | %{translate} | %{run} | FileCheck %s
```
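For readers unfamiliar with these lit directives: DEFINE introduces a named substitution, later RUN lines expand it, and REDEFINE replaces its value for the RUN lines that follow. A minimal, generic sketch (the runner name and check prefixes are invented for illustration, not taken from this test):

```
// DEFINE: %{entry_point} = main_a
// DEFINE: %{run} = some-runner --entry-function=%{entry_point} %s
// RUN: %{run} | FileCheck %s --check-prefix=CHECK-A

// REDEFINE: %{entry_point} = main_b
// RUN: %{run} | FileCheck %s --check-prefix=CHECK-B
```

Only the entry point is redefined; the shared pipeline is written once.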
- Lines 246–247: That's not the main thing that I had in mind :) Sorry, I should've been clearer. My main suggestion here is to avoid variable names like e.g. size (or name, etc.). With names like this my first question would be "_size_ of what?" (or "_name_ of what?"). Just to clarify, this is only a nit.
Addressed your final comments before landing. Cheers Andrzej!
mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir
- Line 62: Ah ok, sure. I've updated vector_load_i8.
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir
- Lines 12–21: Updated, this is nice. Also updated to use mlir-cpu-runner.
- Lines 246–247: Sorry, when I first read this I didn't spot "size"; I've updated it.