This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/lib/CodeGen/
-
lib/
-
CodeGen/
-
CGStmtOpenMP.cpp

Differential D109321

[clang][OpenMP] Fix the bug in codegen for ordered directive
AbandonedPublic

Authored by peixin on Sep 6 2021, 6:27 AM.

Download Raw Diff

Details

Reviewers

jhuber6
jdoerfert
kiranchandramohan
Meinersbur
clementval
kiranktp
Leporacanthicus
bryanpkc
arnamoy10
Chuanfeng
lebedev.ri

Summary

For the ordered construct with the simd clause, the outlined region
cannot be inlined. Otherwise, there may be unexpected behavior when
the captured statements are vectorized.

Diff Detail

Event Timeline

peixin created this revision.Sep 6 2021, 6:27 AM

Herald added subscribers: guansong, yaxunl. · View Herald TranscriptSep 6 2021, 6:27 AM

peixin requested review of this revision.Sep 6 2021, 6:27 AM

Herald added subscribers: cfe-commits, sstefan1. · View Herald TranscriptSep 6 2021, 6:27 AM

The following test case fails after https://reviews.llvm.org/rGaf000197c4214926bd7d0862d86f89aed5f20da6.

#include <iostream>
using namespace std;

int main() {
  float a[10];
  int i, N = 10;
  for (i = 0; i < N; i++)
    a[i] = 0;

  #pragma omp simd
  for (i = 0; i < N; i++) {
    #pragma omp ordered simd
    a[i] = a[i-1] + 1.0;
  }

  for (i = 0; i < N; i++)
    cout << a[i] << "  ";
  cout << endl;
}

$ clang++ -fopenmp simd.cpp && ./a.out
1  2  3  4  5  6  7  8  9  10
$ clang++ -fopenmp -O1 simd.cpp && ./a.out
1  1  1  1  2  1  1  1  2  3

It is fixed by this patch.

In D109321#2985244, @peixin wrote:

The following test case fails after https://reviews.llvm.org/rGaf000197c4214926bd7d0862d86f89aed5f20da6.

#include <iostream>
using namespace std;

int main() {
  float a[10];
  int i, N = 10;
  for (i = 0; i < N; i++)
    a[i] = 0;

  #pragma omp simd
  for (i = 0; i < N; i++) {
    #pragma omp ordered simd
    a[i] = a[i-1] + 1.0;
  }

  for (i = 0; i < N; i++)
    cout << a[i] << "  ";
  cout << endl;
}

$ clang++ -fopenmp simd.cpp && ./a.out
1  2  3  4  5  6  7  8  9  10
$ clang++ -fopenmp -O1 simd.cpp && ./a.out
1  1  1  1  2  1  1  1  2  3

It is fixed by this patch.

That's UB; https://godbolt.org/z/5nb61G3vY

/app/example.cpp:13:12: runtime error: index -1 out of bounds for type 'float [10]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /app/example.cpp:13:12 in 
1  2  3  4  5  6  7  8  9  10

Do you have an example that is miscompiled and doesn't contain UB?

This revision now requires changes to proceed.Sep 6 2021, 6:33 AM

for (i = 0; i < N; i++) --> for (i = 1; i < N; i++)

#include <iostream>
using namespace std;

int main() {
  float a[10];
  int i, N = 10;
  for (i = 0; i < N; i++)
    a[i] = 0;

  #pragma omp simd
  for (i = 1; i < N; i++) {
    #pragma omp ordered simd
    a[i] = a[i-1] + 1.0;
  }

  for (i = 0; i < N; i++)
    cout << a[i] << "  ";
  cout << endl;
}

$ clang++ -fopenmp simd.cpp && ./a.out
0 1  2  3  4  5  6  7  8  9
$ clang++ -fopenmp -O1 simd.cpp && ./a.out
0 1  1  1  1  2  1  1  1  2

Aha. But i don't think this is the right fix,
the fact that the inlining manifests the miscompile is a symptom.

In D109321#2985281, @lebedev.ri wrote:

Aha. But i don't think this is the right fix,
the fact that the inlining manifests the miscompile is a symptom.

Preventing the outlined region from being inlined would also hurt OpenMP performance considerably.

In D109321#2985284, @jhuber6 wrote:

In D109321#2985281, @lebedev.ri wrote:

Aha. But i don't think this is the right fix,
the fact that the inlining manifests the miscompile is a symptom.

Preventing the outlined region from being inlined would also hurt OpenMP performance considerably.

Please notice that this remove is only inside emitOutlinedOrderedFunction, which is only used when ordered simd directive is there. According to OpenMP 5.0 Spec, the ordered construct either specifies a structured block in a worksharing-loop, simd, or worksharing-loop SIMD region that will be executed in the order of the loop iterations.

Harbormaster completed remote builds in B122758: Diff 370906.Sep 6 2021, 7:09 AM

$ clang++ -fopenmp simd.cpp -O1 -Xclang -disable-llvm-passes && ./a.out
0 1  2  3  4  5  6  7  8  9
$ clang++ -fopenmp simd.cpp -O2 && ./a.out
0 1  2  3  4  5  6  7  8  9
$ clang++ -fopenmp simd.cpp -O3 && ./a.out
0 1  2  3  4  5  6  7  8  9

This bug is not in clang frontend. I will post it in bugzilla.

Another question is why not add llvm::Attribute::AlwaysInline when CGM.getCodeGenOpts().OptimizationLevel is 0? @jhuber6 I think it is correct to add the attribute when CGM.getCodeGenOpts().OptimizationLevel is 0.

As noted before, this is not the right fix. Not inlining should not cause a semantic difference here, the problem is something else.

In D109321#2986678, @peixin wrote:
$ clang++ -fopenmp simd.cpp -O1 -Xclang -disable-llvm-passes && ./a.out
0 1  2  3  4  5  6  7  8  9
$ clang++ -fopenmp simd.cpp -O2 && ./a.out
0 1  2  3  4  5  6  7  8  9
$ clang++ -fopenmp simd.cpp -O3 && ./a.out
0 1  2  3  4  5  6  7  8  9
This bug is not in clang frontend. I will post it in bugzilla.

Another question is why not add llvm::Attribute::AlwaysInline when CGM.getCodeGenOpts().OptimizationLevel is 0? @jhuber6 I think it is correct to add the attribute when CGM.getCodeGenOpts().OptimizationLevel is 0.

Just to prevent the IR from changing when optimizations aren't enabled.

In D109321#2987783, @jdoerfert wrote:

As noted before, this is not the right fix. Not inlining should not cause a semantic difference here, the problem is something else.

@jdoerfert Yes, I agree that this is not the right fix. What I mean is to abandon this PR and post this bug in bugzilla to let someone who knows more about optimization passes to solve it considering this may not be one frontend bug according to the result of the command clang++ -fopenmp simd.cpp -O1 -Xclang -disable-llvm-passes. I have to admit that always inlining the outlined function gives correct results since optmization passes vectorize the statements if there is no dependence (eg. a[i] = 1.0) and add memcheck if there may be dependence (eg. a[i] = b[i] + 1.0). Although this vectorization violate the definition of ordered construct in OpenMP 5.0 standard, it's safe to do it. Anyway, I think it's not one big problem here.

Just to prevent the IR from changing when optimizations aren't enabled.

@jhuber6 Thanks for the reply. I think not generating outlined function is the right way when there is no optimization. Using the code for ordered threads like the following should be ok. This should not be one big deal since few users use O0 to run their applications. Do you think if we should make this change?

diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 6ede4c3189d4..0ca5c7b71084 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -5443,11 +5443,12 @@ void CodeGenFunction::EmitOMPOrderedDirective(const OMPOrderedDirective &S) {
       CGM.getOpenMPRuntime().emitDoacrossOrdered(*this, DC);
     return;
   }
+  unsigned OptLevel = CGM.getCodeGenOpts().OptimizationLevel;
   const auto *C = S.getSingleClause<OMPSIMDClause>();
-  auto &&CodeGen = [&S, C, this](CodeGenFunction &CGF,
-                                 PrePostActionTy &Action) {
+  auto &&CodeGen = [&S, OptLevel, C, this](CodeGenFunction &CGF,
+                                           PrePostActionTy &Action) {
     const CapturedStmt *CS = S.getInnermostCapturedStmt();
-    if (C) {
+    if (C && OptLevel > 0) {
       llvm::SmallVector<llvm::Value *, 16> CapturedVars;
       CGF.GenerateOpenMPCapturedVars(*CS, CapturedVars);
       llvm::Function *OutlinedFn =

Abandon this PR since this is not right fix. Continuing discussion will be on openmp-dev mail list. This bug should be fixed if ordered simd is processed correctly in frontend and vectorization pass.

Revision Contents

Path

Size

clang/

lib/

CodeGen/

CGStmtOpenMP.cpp

4 lines

Diff 370906

clang/lib/CodeGen/CGStmtOpenMP.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

	Show First 20 Lines • Show All 5,356 Lines • ▼ Show 20 Lines
	static llvm::Function *emitOutlinedOrderedFunction(CodeGenModule &CGM,			static llvm::Function *emitOutlinedOrderedFunction(CodeGenModule &CGM,
	const CapturedStmt *S,			const CapturedStmt *S,
	SourceLocation Loc) {			SourceLocation Loc) {
	CodeGenFunction CGF(CGM, /suppressNewContext=/true);			CodeGenFunction CGF(CGM, /suppressNewContext=/true);
	CodeGenFunction::CGCapturedStmtInfo CapStmtInfo;			CodeGenFunction::CGCapturedStmtInfo CapStmtInfo;
	CGF.CapturedStmtInfo = &CapStmtInfo;			CGF.CapturedStmtInfo = &CapStmtInfo;
	llvm::Function Fn = CGF.GenerateOpenMPCapturedStmtFunction(S, Loc);			llvm::Function Fn = CGF.GenerateOpenMPCapturedStmtFunction(S, Loc);
	Fn->setDoesNotRecurse();			Fn->setDoesNotRecurse();
	if (CGM.getCodeGenOpts().OptimizationLevel != 0)			Fn->removeFnAttr(llvm::Attribute::AlwaysInline);
	Fn->addFnAttr(llvm::Attribute::AlwaysInline);			Fn->addFnAttr(llvm::Attribute::NoInline);
	return Fn;			return Fn;
	}			}

	void CodeGenFunction::EmitOMPOrderedDirective(const OMPOrderedDirective &S) {			void CodeGenFunction::EmitOMPOrderedDirective(const OMPOrderedDirective &S) {
	if (CGM.getLangOpts().OpenMPIRBuilder) {			if (CGM.getLangOpts().OpenMPIRBuilder) {
	llvm::OpenMPIRBuilder &OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder();			llvm::OpenMPIRBuilder &OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder();
	using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;			using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;

	▲ Show 20 Lines • Show All 1,946 Lines • Show Last 20 Lines