For the ordered construct with the simd clause, the outlined region
cannot be inlined. Otherwise, there may be unexpected behavior when
the captured statements are vectorized.
Details
Diff Detail
Event Timeline
The following test case fails after https://reviews.llvm.org/rGaf000197c4214926bd7d0862d86f89aed5f20da6.
#include <iostream> using namespace std; int main() { float a[10]; int i, N = 10; for (i = 0; i < N; i++) a[i] = 0; #pragma omp simd for (i = 0; i < N; i++) { #pragma omp ordered simd a[i] = a[i-1] + 1.0; } for (i = 0; i < N; i++) cout << a[i] << " "; cout << endl; }
$ clang++ -fopenmp simd.cpp && ./a.out 1 2 3 4 5 6 7 8 9 10 $ clang++ -fopenmp -O1 simd.cpp && ./a.out 1 1 1 1 2 1 1 1 2 3
It is fixed by this patch.
That's UB; https://godbolt.org/z/5nb61G3vY
/app/example.cpp:13:12: runtime error: index -1 out of bounds for type 'float [10]' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /app/example.cpp:13:12 in 1 2 3 4 5 6 7 8 9 10
Do you have an example that is miscompiled and doesn't contain UB?
for (i = 0; i < N; i++) --> for (i = 1; i < N; i++)
#include <iostream> using namespace std; int main() { float a[10]; int i, N = 10; for (i = 0; i < N; i++) a[i] = 0; #pragma omp simd for (i = 1; i < N; i++) { #pragma omp ordered simd a[i] = a[i-1] + 1.0; } for (i = 0; i < N; i++) cout << a[i] << " "; cout << endl; }
$ clang++ -fopenmp simd.cpp && ./a.out 0 1 2 3 4 5 6 7 8 9 $ clang++ -fopenmp -O1 simd.cpp && ./a.out 0 1 1 1 1 2 1 1 1 2
Aha. But i don't think this is the right fix,
the fact that the inlining manifests the miscompile is a symptom.
Preventing the outlined region from being inlined would also hurt OpenMP performance considerably.
Please notice that this remove is only inside emitOutlinedOrderedFunction, which is only used when ordered simd directive is there. According to OpenMP 5.0 Spec, the ordered construct either specifies a structured block in a worksharing-loop, simd, or worksharing-loop SIMD region that will be executed in the order of the loop iterations.
$ clang++ -fopenmp simd.cpp -O1 -Xclang -disable-llvm-passes && ./a.out 0 1 2 3 4 5 6 7 8 9 $ clang++ -fopenmp simd.cpp -O2 && ./a.out 0 1 2 3 4 5 6 7 8 9 $ clang++ -fopenmp simd.cpp -O3 && ./a.out 0 1 2 3 4 5 6 7 8 9
This bug is not in clang frontend. I will post it in bugzilla.
Another question is why not add llvm::Attribute::AlwaysInline when CGM.getCodeGenOpts().OptimizationLevel is 0? @jhuber6 I think it is correct to add the attribute when CGM.getCodeGenOpts().OptimizationLevel is 0.
As noted before, this is not the right fix. Not inlining should not cause a semantic difference here, the problem is something else.
@jdoerfert Yes, I agree that this is not the right fix. What I mean is to abandon this PR and post this bug in bugzilla to let someone who knows more about optimization passes to solve it considering this may not be one frontend bug according to the result of the command clang++ -fopenmp simd.cpp -O1 -Xclang -disable-llvm-passes. I have to admit that always inlining the outlined function gives correct results since optmization passes vectorize the statements if there is no dependence (eg. a[i] = 1.0) and add memcheck if there may be dependence (eg. a[i] = b[i] + 1.0). Although this vectorization violate the definition of ordered construct in OpenMP 5.0 standard, it's safe to do it. Anyway, I think it's not one big problem here.
Just to prevent the IR from changing when optimizations aren't enabled.
@jhuber6 Thanks for the reply. I think not generating outlined function is the right way when there is no optimization. Using the code for ordered threads like the following should be ok. This should not be one big deal since few users use O0 to run their applications. Do you think if we should make this change?
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp index 6ede4c3189d4..0ca5c7b71084 100644 --- a/clang/lib/CodeGen/CGStmtOpenMP.cpp +++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp @@ -5443,11 +5443,12 @@ void CodeGenFunction::EmitOMPOrderedDirective(const OMPOrderedDirective &S) { CGM.getOpenMPRuntime().emitDoacrossOrdered(*this, DC); return; } + unsigned OptLevel = CGM.getCodeGenOpts().OptimizationLevel; const auto *C = S.getSingleClause<OMPSIMDClause>(); - auto &&CodeGen = [&S, C, this](CodeGenFunction &CGF, - PrePostActionTy &Action) { + auto &&CodeGen = [&S, OptLevel, C, this](CodeGenFunction &CGF, + PrePostActionTy &Action) { const CapturedStmt *CS = S.getInnermostCapturedStmt(); - if (C) { + if (C && OptLevel > 0) { llvm::SmallVector<llvm::Value *, 16> CapturedVars; CGF.GenerateOpenMPCapturedVars(*CS, CapturedVars); llvm::Function *OutlinedFn =
Abandon this PR since this is not right fix. Continuing discussion will be on openmp-dev mail list. This bug should be fixed if ordered simd is processed correctly in frontend and vectorization pass.