An LDR will implicitly zero the rest of the vector, so vector_insert(zeros, load, 0) can use a single load. This adds tablegen patterns for both scaled and unscaled loads, detecting where we are inserting a load into the lower element of a zero vector.
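The equivalence the patch relies on can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: registers are modelled as 16-byte little-endian byte strings, and the function names `insert_into_zero_vector` and `scalar_fp_load` are hypothetical, not LLVM APIs.

```python
VEC_BYTES = 16  # width of an AArch64 SIMD&FP register (128 bits)


def insert_into_zero_vector(memory: bytes, addr: int, elem_size: int) -> bytes:
    """Model vector_insert(zeros, load, 0): load one element and
    place it in lane 0 of an all-zero vector."""
    vec = bytearray(VEC_BYTES)                      # zero vector
    vec[0:elem_size] = memory[addr:addr + elem_size]  # insert into lane 0
    return bytes(vec)


def scalar_fp_load(memory: bytes, addr: int, elem_size: int) -> bytes:
    """Model a scalar LDR (b/h/s/d form): the low elem_size bytes are
    loaded and the remaining bytes of the register are zeroed."""
    loaded = memory[addr:addr + elem_size]
    return loaded + bytes(VEC_BYTES - elem_size)


memory = bytes(range(1, 33))  # arbitrary non-zero test data
for elem_size in (1, 2, 4, 8):
    a = insert_into_zero_vector(memory, 5, elem_size)
    b = scalar_fp_load(memory, 5, elem_size)
    assert a == b
```

In this model the two operations produce the same register contents for every element size, which is why a single load can replace the load-plus-insert sequence.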
Repository: rG LLVM Github Monorepo
Event Timeline
This commit (https://github.com/llvm/llvm-project/commit/83bbd3fdbd75295669cf97967c38810d427c5c25) causes a regression in a downstream project: https://github.com/openxla/iree/issues/12546.
The effect is incorrect results in matrix multiplications, where the result is now filled with zeros instead of the correct, nonzero matrix entries. I will try to debug this some more.
Minimized end-to-end MLIR testcase here: https://github.com/openxla/iree/pull/12556
To make this more helpful, here are:
- the LLVM IR output by our MLIR compiler (i.e. the input to LLVM aarch64 codegen here): https://gist.github.com/bjacob/2ed1bce14ae4d67b4261adee70089e29
- the good generated aarch64 code (before the regression): https://gist.github.com/bjacob/e69201fc4528ea516fc1788cabb597f0
- the bad generated aarch64 code (after the regression): https://gist.github.com/bjacob/e364981e10e878c4b092728fe6e087aa
Hi - thanks for the report. From a look at the assembly, it seems the offset might be wrong. These instructions specifically:
10534: 40 f4 7f 3d ldr b0, [x2, #4093]
vs
10534: 46 0c 00 d1 sub x6, x2, #3
1053c: c0 00 40 0d ld1 { v0.b }[0], [x6]
I think I see the problem. It looks like it should be using an LDUR for those instructions. When printing assembly it will produce an ldr b0, [x2, -3] instruction, but emitting object files gives the large positive offset. I will put a fix in for that issue now.
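To illustrate why -3 shows up as #4093, here is a small Python sketch that decodes the bytes quoted above as an LDR (immediate, SIMD&FP, unsigned offset) instruction and pulls out its 12-bit immediate field. The helper names `decode_ldr_simd_unsigned_imm` and `pick_load_form` are hypothetical, and `pick_load_form` is a simplified statement of the architectural offset ranges, not the actual tablegen patterns from the fix.

```python
import struct


def decode_ldr_simd_unsigned_imm(raw: bytes) -> dict:
    """Split a little-endian 32-bit instruction word into the fields of
    LDR (immediate, SIMD&FP), unsigned-offset form."""
    (word,) = struct.unpack("<I", raw)
    return {
        "size": (word >> 30) & 0x3,     # 0 together with opc == 1: byte load
        "opc": (word >> 22) & 0x3,
        "imm12": (word >> 10) & 0xFFF,  # unsigned, scaled by element size
        "rn": (word >> 5) & 0x1F,       # base register
        "rt": word & 0x1F,              # destination register
    }


def pick_load_form(offset: int, elem_size: int) -> str:
    """Simplified choice between the scaled and unscaled load forms."""
    if offset >= 0 and offset % elem_size == 0 and offset // elem_size <= 0xFFF:
        return "LDR"    # unsigned 12-bit offset, scaled by elem_size
    if -256 <= offset <= 255:
        return "LDUR"   # signed 9-bit offset, unscaled
    return "materialize address"


fields = decode_ldr_simd_unsigned_imm(bytes.fromhex("40f47f3d"))
assert fields["imm12"] == 4093 and fields["rn"] == 2 and fields["rt"] == 0
# The observed offset is what -3 becomes when truncated to 12 unsigned bits.
assert (-3) & 0xFFF == fields["imm12"]
assert pick_load_form(-3, 1) == "LDUR"
```

The decoded immediate is consistent with a -3 offset having been placed in an unsigned 12-bit field, and an offset of -3 falls in the range that only the unscaled LDUR form can encode.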
I've been unable to reproduce the same output from https://gist.github.com/bjacob/2ed1bce14ae4d67b4261adee70089e29 though. I probably don't know the right set of commands, and was just using mlir-translate to convert the file to LLVM IR. Do you know what commands are needed to compile it to assembly?
There is hopefully a fix in 1c6ea961938488997712763762079e535b8b704. Please let me know whether that fixes your issue, and if you have details on getting assembly from MLIR. Thanks.
Thank you very much for the quick fix. I confirm that https://reviews.llvm.org/rG1c6ea961938488997712763762079e535b8b704e fixes the regression.
You probably won't need this anymore since you were able to fix this without it, but just for completeness, here was how to reproduce:
- Build https://github.com/openxla/iree following the normal build instructions. Note that IREE uses its own submodule, third_party/llvm-project.
- Run the IREE compiler from the build directory with these flags:
tools/iree-compile --iree-llvm-target-triple=aarch64-none-linux-android29 --iree-hal-target-backends=llvm-cpu ~/pack_testcase.mlir -o /tmp/a.vmfb --iree-llvm-keep-linker-artifacts
Where the input file pack_testcase.mlir is:
func.func @pack_pad_transpose_1x9xi8_into_2x4x8x4xi8(%arg0 : tensor<1x9xi8>) -> tensor<2x4x8x4xi8> {
  %empty = tensor.empty() : tensor<2x4x8x4xi8>
  %c0_i8 = arith.constant 0 : i8
  %pack = tensor.pack %arg0 padding_value(%c0_i8 : i8) outer_dims_perm = [1, 0] inner_dims_pos = [1, 0] inner_tiles = [8, 4] into %empty : tensor<1x9xi8> -> tensor<2x4x8x4xi8>
  return %pack : tensor<2x4x8x4xi8>
}
Thanks to the --iree-llvm-keep-linker-artifacts flag, it will print the path to the generated .so, like this:
/usr/local/google/home/benoitjacob/pack_testcase.mlir:4:11: remark: linker artifacts for embedded_elf_arm_64 preserved: /tmp/pack_pad_transpose_1x9xi8_into_2x4x8x4xi8_dispatch_0-9c98ea.so
You can then objdump that as usual:
$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-objdump -d /tmp/pack_pad_transpose_1x9xi8_into_2x4x8x4xi8_dispatch_0-9c98ea.so