LLVM backend for AArch64 does not currently support product reductions
so it requires some test code duplication to have a version for the
addition reductions. For all the other tests, we can run the vanilla
version with VLA compilation options.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_reductions.mlir | ||
---|---|---|
11–12 ↗ | (On Diff #425205) | I need to remove this |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_reductions_vla.mlir | ||
---|---|---|
2 | For the fully duplicated test (new files), would it make sense to put them in mlir/test/Integration/Dialect/SparseTensor/CPU/ArmSVE Similar to what we did for mlir/test/Integration/Dialect/Vector/CPU/ArmSVE |
Hi @jsetoain , thanks very much for this patch! I know that you have no access to SVE hardware ATM, so it might be tricky for you to finish this. And we (Arm) would love to have this merged :) Shall I take over? I do have access to the required testing infra.
Btw, these tests run mostly fine for me, but lli behaves slightly differently to mlir-cpu-runner - I had to add return 0 for the tests to pass. So, why not just use mlir-cpu-runner instead? I suspect that that's something to do with SVE support?
Hi Andrzej! By all means, please, feel free to take over. I'm trying to figure out a way to transfer the patch to you. The reason why I'm using lli here is because mlir-cpu-runner was not working. I don't remember the issue, exactly, and it might have been fixed by now. If you manage to get the test running with mlir-cpu-runner, that's definitely preferable to lli.
Thanks! 😊
@aartbik Given https://reviews.llvm.org/D136183, we should probably park this for now?
CC @jsetoain
Abandoning this for now. Let's wait for updates on the sparse compiler before we decide what to do next: https://discourse.llvm.org/t/mlir-sparse-compiler-progress/60479/8.
We will soon be back online with a much better SIMD approach!
Apologies for the slight detour in the meantime.
Rebased on top of main.
I switched from lli to mlir-cpu-runner as that was the path of least resitance for now. However, there's no -march and -mattr in mlir-cpu-runner. I need to double check that indeed SVE code is generated and run. If not, we either need to update mlir-cpu-runner or I need to use lli instead.
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_flatten.mlir | ||
---|---|---|
28 | Just a very quick comment, this will not be the right way to run the sparse vectorizer. We will have to pass in the flags to the pipeline command sparse-compiler (see original vector example), but adding this support is still TBD. I was waiting for the vectorizer to be "production" ready before adding those ;-) |
Revision https://reviews.llvm.org/D139581 should enable you to restart the efforts again!
Rebase on top of main
@aartbik Could you take a quick and let me know whether this makes sense? 3
tests are currently failing for me:
Failed Tests (3): MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_quantized_matmul.mlir MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sampled_matmul.mlir MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sampled_mm_fusion.mlir
I didn't get a chance to triage just yet.
Can you please rebase until at least https://reviews.llvm.org/D139983 and try again.
Also, please note you may need enable-buffer-initialization=true if you use vector_transfer/print to CHECK the output if you use codegen instead of lib.
Rebase on top of main
I've also simplified my changes a bit to make them less intrusive.
mlir/test/Integration/Dialect/SparseTensor/CPU/ArmSVE/lit.local.cfg | ||
---|---|---|
20 ↗ | (On Diff #483214) | period at end |
mlir/test/Integration/Dialect/SparseTensor/CPU/ArmSVE/sparse_reductions_vla.mlir | ||
1 ↗ | (On Diff #483214) | refresh my memory, why do we need the ArmSVE specific directory? |
mlir/test/Integration/Dialect/SparseTensor/CPU/lit.local.cfg | ||
21 | period at end | |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_cast.mlir | ||
13 | Phrase this a bit in the same style as the others, If SVE is available, do the same run, but now with direct IR generation and VLA vectorization. | |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_filter_conv2d.mlir | ||
85 | Do you see a way to avoid adding this to *all* tests? Can we put this in its own file and let the command line compile take care of it? |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_filter_conv2d.mlir | ||
---|---|---|
85 | Here is an idea, add a new file entry.mlir with just this entry method, and then do RUN: cat $s entry.mlir | rest of pipeline so that the two files are concatenated into a single stream |
Address PR comments
Thanks for the comments @aartbik :)
- Moved entry_lli to a dedicated file (avoids code duplication)
- Unified tests to use direct IR generation
- Updated comments
refresh my memory, why do we need the ArmSVE specific directory?
Let me get back to you tomorrow - first I need to go over the discussion and refresh my own memory.
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_filter_conv2d.mlir | ||
---|---|---|
85 | Thanks! I've derived a slightly different solution - see the latest update. it turns out lli will happily consume multiple files. Let me know whether you have any preference - I'm happy to try your suggestion instead. |
mlir/test/Integration/Dialect/SparseTensor/CPU/Inputs/main_for_lli.ll | ||
---|---|---|
1 ↗ | (On Diff #485408) | I am okay keeping this at the same level. No need to introduce a whole new dir for just this |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_filter_conv2d.mlir | ||
80 | please make sure to avoid unrelated diffs | |
85 | even better |
mlir/test/Integration/Dialect/SparseTensor/CPU/Inputs/main_for_lli.ll | ||
---|---|---|
1 ↗ | (On Diff #485408) | We need to make sure that this file is not considered as a yet another test file. By putting it in "Inputs", we don't require any additional logic for this - see the definition of config.excludes. I've not been able to find any documentation for this, but I've seen it used in various other sub-projects. |
mlir/test/Integration/Dialect/SparseTensor/CPU/ArmSVE/sparse_reductions_vla.mlir | ||
---|---|---|
1 ↗ | (On Diff #483214) | The answer is in the original commit summary:
I did check the backend for a clear indication of what is currently supported and what is not, but it's a bit convoluted:
I will reach out internally to learn a bit more about the status of SVE in the context of reductions, but need to wait for folks to return from their Xmas breaks first. I do think that we should avoid code duplication though and creating a new sub-directory here. Instead, I suggest splitting "sparse_reductions.mlir" into 2 files:
Let me upload this so that you get a better idea. |
Split "sparse_reductions.mlir" into 2 test files
I split "sparse_reductions.mlir" into:
- sparse_reductions_1.mlir (reductions supported by SVE)
- sparse_reductions_2.mlir (all other reductions)
This way we can avoid adding "ArmSVE/sparse_reductions_vla.mlir", which
duplicated SVE-supported reductions from "sparse_reductions.mlir".
Remove the remaining unrelated changes
Also made sure that all SVE tests use direct IR generation
Yeah, this is LGTM after making sure the emulator works and addressing my last nit request.
mlir/test/Integration/Dialect/SparseTensor/CPU/Inputs/main_for_lli.ll | ||
---|---|---|
1 ↗ | (On Diff #485408) | Ah, okay, makes sense. |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_reductions_2.mlir | ||
1 ↗ | (On Diff #485704) | I like this solution of splitting up the files much better than having yet another directory. Thanks! But one last nit, I don't really like the _1 _2 naming, as it does not add much. Perhaps we can some up with a better name, like just sparse_reductions.mlir for the original and sparse_reductions_prod.mlir for this file, so it becomes more descriptive? |
I've experimented with both QEMU and ArmIE. While QEMU works, it crashes at the end of a simulation when running lli. I've tried to reduce my issue, but I can only reproduce it with lli (which is a relatively huge binary). As I haven't been able to find any evidence that QEMU actually ever worked here (end-to-end), I suggest leaving it for now. lli is rather too large as a reproducer. Any suggestions how to reduce this or where to report it?
Instead, I used ArmIE. It worked without any issues (I needed to replace lli with %lli in tests). It's a free emulator and the official instructions worked for me just fine. So I am hoping that this is sufficient here. I used the following CMake set-up:
cmake -DMLIR_INCLUDE_INTEGRATION_TESTS=True -DMLIR_RUN_ARM_SVE_TESTS=True -DARM_EMULATOR_OPTIONS="-msve-vector-bits=128" -DARM_EMULATOR_EXECUTABLE="armie" <other cmake options>
I'm just about to send a small update that will address the outstanding comment - rename "sparse_reductions_{1|2}.mlir". So, is this ready to be merged? :)
I'm just about to send a small update that will address the outstanding comment - rename "sparse_reductions_{1|2}.mlir". So, is this ready to be merged? :)
Aart is currently on vacation and will be back after Jan 20th. I am in his team and I would suggest that we wait until he is back because I am afraid that I do not have enough context/knowledge to make the decision by myself for this patch ;-)
period at end