The Loop Vectorizer can generate mis-optimized code (see https://bugs.llvm.org/show_bug.cgi?id=34245).
In the case of GEMM, we rely only on the SLP Vectorizer, one of the two LLVM vectorizers. Consequently, we disable the Loop Vectorizer for the innermost loop by inserting mark nodes and emitting the corresponding loop metadata.
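For comparison, the same request can be made at the source level. The following is a minimal sketch, not Polly's implementation. It assumes Clang, whose `#pragma clang loop vectorize(disable)` attaches loop metadata asking the Loop Vectorizer not to vectorize the annotated loop (other compilers ignore the pragma). The function `gemm_naive` and the size `N` are hypothetical.

```c
/* Hypothetical naive GEMM: C += A * B for row-major N x N matrices. */
#define N 4

static void gemm_naive(double C[N][N], const double A[N][N],
                       const double B[N][N]) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++) {
      double sum = C[i][j];
      /* Clang-specific: ask the Loop Vectorizer not to vectorize the
         innermost loop; the pragma must directly precede the loop. */
#pragma clang loop vectorize(disable)
      for (int k = 0; k < N; k++)
        sum += A[i][k] * B[k][j];
      C[i][j] = sum;
    }
}
```

Calling `gemm_naive` with a zero-initialized `C` and `B` set to the identity matrix leaves `C` equal to `A`; the pragma only affects how the innermost loop is optimized, not the result.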
P.S.: I haven't managed to insert the mark nodes directly before AST for nodes, since isolation can produce if statements and, AFAIU, we aren't able to modify their children. For example, we can get the following:
// Mark node
if (…)
  for (…)
Consequently, we aren't able to modify the for loop while handling the mark node.
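To make the shape above concrete, here is a minimal sketch using hypothetical, heavily simplified node types (not isl's AST API). After isolation, the mark's direct child is the if node, and the for loop sits one level further down, so a handler that only looks at the mark's immediate child does not find the loop it is meant to annotate.

```c
#include <stddef.h>

/* Hypothetical, simplified AST node kinds; these are NOT isl types. */
enum node_kind { NODE_MARK, NODE_IF, NODE_FOR };

struct node {
  enum node_kind kind;
  struct node *child; /* a single child is enough for this illustration */
};

/* Return the for node under the mark only if it is the mark's direct
   child; return NULL otherwise, e.g. when isolation has placed an if
   node between the mark and the loop. */
static struct node *loop_directly_under_mark(struct node *mark) {
  if (mark == NULL || mark->kind != NODE_MARK || mark->child == NULL)
    return NULL;
  return mark->child->kind == NODE_FOR ? mark->child : NULL;
}
```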