When lowering memref.copy operations whose operands do not have a contiguous layout in memory, the MemRefToLLVM conversion pass emits llvm.alloca operations to promote the memref descriptors to the stack. The original stack position is never restored after these allocations, so when the copy operation is embedded in a loop with a high trip count, the stack grows by one allocation per iteration until the program crashes with a segmentation fault.
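Schematically, the lowered loop body claims a fresh stack slot on every iteration and nothing ever releases it. The following is a hand-written sketch of that pattern in the LLVM dialect, not the literal pass output (the promoted descriptor struct is replaced by a plain i64 for brevity, and @grows_stack is a hypothetical name):

llvm.func @grows_stack(%n: i64) {
  %c0 = llvm.mlir.constant(0 : i64) : i64
  %c1 = llvm.mlir.constant(1 : i64) : i64
  llvm.br ^loop(%c0 : i64)
^loop(%i: i64):
  %cont = llvm.icmp "slt" %i, %n : i64
  llvm.cond_br %cont, ^body, ^exit
^body:
  // Stands in for the promoted descriptor: one stack slot per iteration,
  // never released, so the stack pointer only moves down.
  %slot = llvm.alloca %c1 x i64 : (i64) -> !llvm.ptr<i64>
  %next = llvm.add %i, %c1 : i64
  llvm.br ^loop(%next : i64)
^exit:
  llvm.return
}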
Below is a minimal example illustrating the issue:
#map = affine_map<(d0, d1) -> (d0 * 64 + d1 + 1056)>
module {
  func.func @main() {
    %arg0 = memref.alloc() : memref<32x64xi64>
    %arg1 = memref.alloc() : memref<16x32xi64>
    %lb = arith.constant 0 : index
    %ub = arith.constant 100000 : index
    %step = arith.constant 1 : index
    %slice = memref.subview %arg0[16, 32] [16, 32] [1, 1] : memref<32x64xi64> to memref<16x32xi64, #map>
    scf.for %i = %lb to %ub step %step {
      memref.copy %slice, %arg1 : memref<16x32xi64, #map> to memref<16x32xi64>
    }
    return
  }
}
When running the code above, e.g., with mlir-cpu-runner, the execution crashes with a segmentation fault:
$ mlir-opt \
    --convert-scf-to-cf \
    --convert-memref-to-llvm \
    --convert-func-to-llvm \
    --convert-cf-to-llvm \
    --reconcile-unrealized-casts <file> | \
  mlir-cpu-runner \
    -e main -entry-point-result=void \
    --shared-libs=$PWD/build/lib/libmlir_c_runner_utils.so
[...]
Segmentation fault
This patch makes the memref.copy lowering in the MemRefToLLVM pass emit a matching pair of llvm.intr.stacksave and llvm.intr.stackrestore operations around the promotion of the memref descriptors and the subsequent call to memrefCopy, so that the stack returns to its original position after the call.
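For illustration, the emitted code now has roughly the following shape. This is a hand-written sketch with simplified types, not the exact pass output: @copy_sketch is a hypothetical function, %elemSize, %src and %dst stand in for the values the lowering actually computes, and the promoted descriptor is again reduced to an i64:

llvm.func @memrefCopy(i64, !llvm.ptr<i8>, !llvm.ptr<i8>)

llvm.func @copy_sketch(%elemSize: i64, %src: !llvm.ptr<i8>, %dst: !llvm.ptr<i8>) {
  %c1 = llvm.mlir.constant(1 : i64) : i64
  // Remember the stack position before promoting the descriptors.
  %sp = llvm.intr.stacksave : !llvm.ptr<i8>
  // Descriptor promotion: one alloca per memref operand (type simplified).
  %desc = llvm.alloca %c1 x i64 : (i64) -> !llvm.ptr<i64>
  llvm.call @memrefCopy(%elemSize, %src, %dst) : (i64, !llvm.ptr<i8>, !llvm.ptr<i8>) -> ()
  // Pop the promoted descriptors so an enclosing loop cannot grow the stack.
  llvm.intr.stackrestore %sp : !llvm.ptr<i8>
  llvm.return
}

With the save/restore pair in place, the per-iteration allocations are released on every trip through an enclosing loop, so the stack usage stays bounded.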