This patch implements the functionality of llvm.loop.vectorize.ivdep.enable metadata. The metadata indicates that dependencies of Unknown type can be assumed to be Safe during the CanVecMem check. This patch depends on D69391
Details
Diff Detail
Event Timeline
[serious] We did not come to a conclusion during the discussion about the semantics ivdep being implementation-dependent.
Also consider that vectorize_enable(assume_safety) has been designed such that if previous passes move code into the loop that has not been in the source code loop, the safety semantic does not apply to the moved code. This implementation would apply ivdep on the moved code (which might be okay if we shift the responsibility to the code motion pass to remove the metadata).
llvm/docs/LangRef.rst | ||
---|---|---|
5492 | [serious] Please also describe the relationship to llvm.mem.parallel_loop_access. | |
llvm/include/llvm/Analysis/LoopAccessAnalysis.h | ||
204 | [style] Start variables with a capital letter. Also to not use two different styles on the same line. | |
llvm/lib/Analysis/LoopAccessAnalysis.cpp | ||
2035–2037 | [serious] areDepsSafe is an expensive call which we should not do redundantly when only one of the result is actually used. [suggestion] Use only CanVecMem and update it according to ivdep. |
Passed Ivdep hint from LoopVectorizationLegality using GetLAA, to avoid computing areDepsSafe twice.
Thank you for checking this diff and for the suggestions! Sorry for the delayed update on this.
I was under the impression that Unknown dependencies were ambiguous to the vectorizer and so would be along the lines of the Cray semantics. Please suggest what would be a good approach for this.
Also consider that vectorize_enable(assume_safety) has been designed such that if previous passes move code into the loop that has not been in the source code loop, the safety semantic does not apply to the moved code. This implementation would apply ivdep on the moved code (which might be okay if we shift the responsibility to the code motion pass to remove the metadata).
I believe vectorize_enable(assume_safety) does this by marking the memory access instructions in the original loop (before any code is moved into the loop) with llvm.access.group metadata? Would it be recommended to do something similar for ivdep?
llvm/docs/LangRef.rst | ||
---|---|---|
5492 | llvm.loop.parallel_accesses metadata indicates that no dependencies exist between instructions marked with llvm.access.group metadata and can be executed in parallel, whereas llvm.loop.vectorize.ivdep.enable indicates that Unknown dependencies are safe for vectorization. |
Cray's semantics are dangerous in that it may unpredictably miscompile depending on which compiler you are using. Just being slower would have been fine. The issue is that there might not be a good approach and I am not convinced that approximated compatibility to Cray is a good-enough reason to introduce unpredictable miscompilation. However, just parsing #pragma ivdep but ignoring it might be fine.
Why may it miscompile unpredictably? Consider:
#pragma ivdep for (int i = 0; i < n; i+=1) sum[i/n] += A[i];
Since 0 <= i < n, i/n will always evaluate to 0. But does the compiler known? If it does, it might recognize the reduction in
double tmp = sum[0]; for (int i = 0; i < n; i+=1) tmp += A[i]; sum[0] = tmp;
and vectorize it correctly. However, if it does not, ivdep allows it to ignore the loop-carried dependency and write arbitrary values to sum[0] (for instance just the last value sum[0] = A[n-1]), i.e. miscompile the reduction.
We cannot assume that LLVM's capability to recognize dependencies and reductions are stricly superior over Cray's, hence miscompile programs that Cray's compiler did not. Even worse, the user cannot deduce whether their code will be compiled correctly or not.
Do you try getting this uptreamed? It won't be accepted before we come to a conclusion for the unpredictable miscompilation.
[serious] Please also describe the relationship to llvm.mem.parallel_loop_access.