This is an archive of the discontinued LLVM Phabricator instance.

Differential D109175

[openmp] Emit deferred diag only when device compilation presents
ClosedPublic

Authored by weiwang on Sep 2 2021, 10:47 AM.

Details

Reviewers

jdoerfert
yaxunl

Commits

rGb283d55c90dd: [openmp] Emit deferred diag only when device compilation presents

Summary

There is no need to check for deferred diags when neither device compilation nor
an offload target is requested. Skipping the check results in a considerable
build time improvement in some cases.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

weiwang created this revision. Sep 2 2021, 10:47 AM

Herald added subscribers: hoy, dexonsmith, wenlei and 3 others. Sep 2 2021, 10:47 AM

weiwang requested review of this revision. Sep 2 2021, 10:47 AM

Herald added a reviewer: jdoerfert. Sep 2 2021, 10:47 AM

Herald added a project: Restricted Project.

Herald added subscribers: cfe-commits, sstefan1.

weiwang added a reviewer: yaxunl. Sep 2 2021, 10:49 AM

weiwang added a subscriber: sugak.

Our internal codebase never uses the target directive. Once the deferred diags are bypassed, we observe an 18% e2e build time improvement.

In D109175#2980333, @weiwang wrote:

Our internal codebase never uses the target directive. Once the deferred diags are bypassed, we observe an 18% e2e build time improvement.

Is that with -fopenmp or without?
That seems, kinda a lot more than i would have expected,
perhaps there are some other ways to reduce the overhead other than this approach?

In D109175#2980446, @lebedev.ri wrote:

Is that with -fopenmp or without?
That seems, kinda a lot more than i would have expected,
perhaps there are some other ways to reduce the overhead other than this approach?

This is with -fopenmp and no other omp related flags. I'd prefer a more generic way of fixing this, but right now this seems to be the most direct way.

Harbormaster completed remote builds in B122344: Diff 370328. Sep 2 2021, 11:54 AM

In D109175#2980446, @lebedev.ri wrote:

Is that with -fopenmp or without?
That seems, kinda a lot more than i would have expected,
perhaps there are some other ways to reduce the overhead other than this approach?

Yeah, though the slowdown in build time is blocking us from moving to newer llvm for many projects. Currently I think it makes sense to give users the option to decide whether they want a faster build or better diagnostics. Of course, if the slowdown is fully addressed in the future, the switch could be removed too.

Why do we need this flag, is the absence of -fopenmp-targets not sufficient?

In D109175#2980744, @jdoerfert wrote:

Why do we need this flag, is the absence of -fopenmp-targets not sufficient?

Just double checked, this is the full omp related options currently in use:

"-fopenmp"
"-fopenmp-version=31"
"-fopenmp-version=31"
"-fopenmp-cuda-parallel-target-regions"

We saw a huge number of DECLS_TO_CHECK_FOR_DEFERRED_DIAGS records. I don't know if this has anything to do with the omp version being 31, since prior to 5.0, everything is available on host.

In D109175#2980806, @weiwang wrote:
Just double checked, this is the full omp related options currently in use:
"-fopenmp"
"-fopenmp-version=31"
"-fopenmp-version=31"
"-fopenmp-cuda-parallel-target-regions"
We saw a huge number of DECLS_TO_CHECK_FOR_DEFERRED_DIAGS records. I don't know if this has anything to do with the omp version being 31, since prior to 5.0, everything is available on host.

I don't think we are selective right now. As I was saying, disable deferred parsing if fopenmp-targets is missing, no need for this option.

In D109175#2980900, @jdoerfert wrote:
I don't think we are selective right now. As I was saying, disable deferred parsing if fopenmp-targets is missing, no need for this option.

Sure I can certainly make the change. To make sure I understand you correctly: if -fopenmp-targets (or maybe fopenmp-is-device too) is not given from cmdline, we can just skip the deferred diags like this option does?

In D109175#2980905, @weiwang wrote:
Sure I can certainly make the change. To make sure I understand you correctly: if -fopenmp-targets (or maybe fopenmp-is-device too) is not given from cmdline, we can just skip the deferred diags like this option does?

I thought so, @ABataev wdyt?

In D109175#2981054, @jdoerfert wrote:
I thought so, @ABataev wdyt?

Yes, deferred diags are used only for target-dependent compilation, so should be enough to check if IsTargetSpecified is false.

update as discussed.

Changed a few affected tests to explicitly add target triple and added a case where deferred diags are skipped.

weiwang retitled this revision from "[openmp] Add clang cc1 option -fopenmp-skip-deferred-diags" to "[openmp] Emit deferred diag only when device compilation presents". Sep 3 2021, 11:18 AM

weiwang edited the summary of this revision.

Harbormaster completed remote builds in B122556: Diff 370630. Sep 3 2021, 11:56 AM

hafixo added a commit: rCRT373035: hwasan: Compatibility fixes for short granules. Sep 6 2021, 12:44 AM

hafixo added a commit: rGc336557f0238: hwasan: Compatibility fixes for short granules. Sep 6 2021, 12:47 AM

thopre removed a commit: rGc336557f0238: hwasan: Compatibility fixes for short granules. Sep 7 2021, 2:47 AM

thopre removed a commit: rCRT373035: hwasan: Compatibility fixes for short granules. Sep 7 2021, 2:51 AM

I agree with Johannes and Alexey that deferred diags are only needed when !LangOpts.OMPTargetTriples.empty(). However, I am not sure whether it is only needed in device compilation.

For other offloading languages like CUDA/HIP it is needed in both device and host compilation.

In D109175#2986782, @yaxunl wrote:

I agree with Johannes and Alexey that deferred diags are only needed when !LangOpts.OMPTargetTriples.empty(). However, I am not sure whether it is only needed in device compilation.

For other offloading languages like CUDA/HIP it is needed in both device and host compilation.

Technically, we might even want to delay in host only mode for OpenMP, but that is something we can revisit (e.g., by dynamically setting a flag based on the directives we've seen).
@yaxunl Should we for now check if there is any associated offload job?
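
The idea of setting a flag dynamically from the directives seen can be illustrated with a small standalone model. This is only one possible reading of the suggestion, not Clang code: the class name `DeferredDiagTracker`, its methods, and the rule for which directive names count as offloading are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model; none of these names exist in Clang.
class DeferredDiagTracker {
public:
  // Called for every OpenMP directive encountered while parsing.
  void noteDirective(const std::string &Name) {
    // Assumed rule: any directive whose name starts with "target", plus
    // "declare target", is evidence that offloading matters for this TU.
    if (Name.rfind("target", 0) == 0 || Name == "declare target")
      SawOffloadDirective = true;
  }

  // Called for every declaration that might later need a deferred check.
  void noteDecl(const std::string &Name) { Pending.push_back(Name); }

  // Declarations to examine at the end of the translation unit: all recorded
  // ones if an offload directive was seen, none otherwise.
  std::vector<std::string> declsToCheck() const {
    if (!SawOffloadDirective)
      return {};
    return Pending;
  }

private:
  bool SawOffloadDirective = false;
  std::vector<std::string> Pending;
};

// Spot check of the model.
inline void checkTracker() {
  DeferredDiagTracker T;
  T.noteDecl("foo");
  assert(T.declsToCheck().empty());     // no offload directive seen yet
  T.noteDirective("target teams");
  assert(T.declsToCheck().size() == 1); // flag flipped, "foo" is now checked
}
```

Note that this model still records every declaration and only skips the final check, so it would not by itself reduce the number of recorded declarations; a real implementation would have to decide how to handle declarations seen before the first offload directive.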

In D109175#2987048, @jdoerfert wrote:

Technically, we might even want to delay in host only mode for OpenMP, but that is something we can revisit (e.g., by dynamically setting a flag based on the directives we've seen).
@yaxunl Should we for now check if there is any associated offload job?

Shall we go ahead and get this change in and think about a longer-term solution later?

In D109175#2989823, @weiwang wrote:

Shall we go ahead and get this change in and think about a longer-term solution later?

LGTM. This patch should be sufficient to limit deferred diags to OpenMP with offloading. Device compilation is covered by OpenMPIsDevice and host compilation is covered by !LangOpts.OMPTargetTriples.empty(). I will leave the decision to Johannes.
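
The gating described in this comment can be modeled in isolation. The sketch below is a simplified stand-in rather than the Clang sources: `LangOptsSketch` and the three helper functions are hypothetical names, and only the field names (`OpenMP`, `OpenMPIsDevice`, `OMPTargetTriples`, `CUDA`, `SYCLIsDevice`) are taken from the diff in this revision.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the few clang::LangOptions fields the patch
// consults. This is a model for illustration, not the real class.
struct LangOptsSketch {
  bool OpenMP = false;                       // -fopenmp
  bool OpenMPIsDevice = false;               // -fopenmp-is-device
  std::vector<std::string> OMPTargetTriples; // -fopenmp-targets=...
  bool CUDA = false;
  bool SYCLIsDevice = false;
};

// Hypothetical helper: true when OpenMP offloading is in play, either because
// this is the device compilation or because the host compilation was given
// offload targets.
inline bool openMPOffloadActive(const LangOptsSketch &LO) {
  return LO.OpenMP && (LO.OpenMPIsDevice || !LO.OMPTargetTriples.empty());
}

// Models the condition guarding file-scope variables; the isFileVarDecl()
// query is passed in as a plain bool here.
inline bool shouldRecordVar(const LangOptsSketch &LO, bool IsFileVarDecl) {
  return openMPOffloadActive(LO) && IsFileVarDecl;
}

// Models the language-option part of the condition guarding function bodies.
// CUDA and SYCL device compilation are unaffected by the OpenMP gate.
inline bool shouldRecordFunction(const LangOptsSketch &LO) {
  return openMPOffloadActive(LO) || LO.CUDA || LO.SYCLIsDevice;
}

// A few spot checks of the model.
inline void checkExamples() {
  LangOptsSketch HostOnly;
  HostOnly.OpenMP = true; // plain -fopenmp, no offload targets
  assert(!shouldRecordFunction(HostOnly));
  assert(!shouldRecordVar(HostOnly, /*IsFileVarDecl=*/true));

  LangOptsSketch HostWithTargets = HostOnly;
  HostWithTargets.OMPTargetTriples.push_back("amdgcn-amd-amdhsa");
  assert(shouldRecordFunction(HostWithTargets));

  LangOptsSketch Device = HostOnly;
  Device.OpenMPIsDevice = true;
  assert(shouldRecordVar(Device, /*IsFileVarDecl=*/true));
}
```

In this model a plain `-fopenmp` host build records nothing, which corresponds to the build configuration weiwang described earlier in the thread, while CUDA compilations still record functions as before.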

In D109175#2989997, @yaxunl wrote:

LGTM. This patch should be sufficient to limit deferred diags to OpenMP with offloading. Device compilation is covered by OpenMPIsDevice and host compilation is covered by !LangOpts.OMPTargetTriples.empty(). I will leave the decision to Johannes.

Thanks. @jdoerfert, could you approve this?

LGTM

This revision is now accepted and ready to land. Sep 15 2021, 10:02 AM

Closed by commit rGb283d55c90dd: [openmp] Emit deferred diag only when device compilation presents (authored by weiwang). Oct 25 2021, 11:19 AM

This revision was automatically updated to reflect the committed changes.

weiwang added a commit: rGb283d55c90dd: [openmp] Emit deferred diag only when device compilation presents.

Revision Contents

Path                                             Size
clang/lib/Sema/SemaDecl.cpp                      8 lines
clang/test/OpenMP/declare_target_messages.cpp    7 lines
clang/test/SemaCUDA/openmp-target.cu             4 lines

Diff 382069

clang/lib/Sema/SemaDecl.cpp

@@ ... @@ void Sema::AddInitializerToDecl(Decl *RealDecl, Expr *Init, bool DirectInit) {
   if (CXXDirectInit) {
     assert(DirectInit && "Call-style initializer must be direct init.");
     VDecl->setInitStyle(VarDecl::CallInit);
   } else if (DirectInit) {
     // This must be list-initialization. No other way is direct-initialization.
     VDecl->setInitStyle(VarDecl::ListInit);
   }

-  if (LangOpts.OpenMP && VDecl->isFileVarDecl())
+  if (LangOpts.OpenMP &&
+      (LangOpts.OpenMPIsDevice || !LangOpts.OMPTargetTriples.empty()) &&
+      VDecl->isFileVarDecl())
     DeclsToCheckForDeferredDiags.insert(VDecl);
   CheckCompleteVariableDeclaration(VDecl);
 }

 /// ActOnInitializerError - Given that there was an error parsing an
 /// initializer for the given declaration, try to return to some form
 /// of sanity.
 void Sema::ActOnInitializerError(Decl *D) {
@@ ... @@ Decl *Sema::ActOnFinishFunctionBody(Decl *dcl, Stmt *Body,
   PopFunctionScopeInfo(ActivePolicy, dcl);
   // If any errors have occurred, clear out any temporaries that may have
   // been leftover. This ensures that these temporaries won't be picked up for
   // deletion in some later function.
   if (hasUncompilableErrorOccurred()) {
     DiscardCleanupsInEvaluationContext();
   }

-  if (FD && (LangOpts.OpenMP || LangOpts.CUDA || LangOpts.SYCLIsDevice)) {
+  if (FD && ((LangOpts.OpenMP && (LangOpts.OpenMPIsDevice ||
+                                  !LangOpts.OMPTargetTriples.empty())) ||
+             LangOpts.CUDA || LangOpts.SYCLIsDevice)) {
     auto ES = getEmissionStatus(FD);
     if (ES == Sema::FunctionEmissionStatus::Emitted ||
         ES == Sema::FunctionEmissionStatus::Unknown)
       DeclsToCheckForDeferredDiags.insert(FD);
   }

   return dcl;
 }

clang/test/OpenMP/declare_target_messages.cpp

 // RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp45 -fopenmp -fopenmp-version=45 -fnoopenmp-use-tls -ferror-limit 100 -o - %s
-// RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp5,host5 -fopenmp -fnoopenmp-use-tls -ferror-limit 100 -o - %s
+// RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp5,host5 -fopenmp -fopenmp-targets=x86_64-apple-macos10.7.0 -fnoopenmp-use-tls -ferror-limit 100 -o - %s
 // RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp5,dev5 -fopenmp -fopenmp-is-device -fopenmp-targets=x86_64-apple-macos10.7.0 -aux-triple x86_64-apple-macos10.7.0 -fnoopenmp-use-tls -ferror-limit 100 -o - %s

-// RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp5,host5 -fopenmp-simd -fnoopenmp-use-tls -ferror-limit 100 -o - %s
+// RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp5,host5 -fopenmp-simd -fopenmp-targets=x86_64-apple-macos10.7.0 -fnoopenmp-use-tls -ferror-limit 100 -o - %s
-// RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp5,host5 -fopenmp-simd -fopenmp-is-device -fnoopenmp-use-tls -ferror-limit 100 -o - %s
+// RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp5,host5 -fopenmp-simd -fopenmp-is-device -fopenmp-targets=x86_64-apple-macos10.7.0 -fnoopenmp-use-tls -ferror-limit 100 -o - %s
 // RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp45 -fopenmp-version=45 -fopenmp-simd -fnoopenmp-use-tls -ferror-limit 100 -o - %s

+// RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp5 -fopenmp -fnoopenmp-use-tls -ferror-limit 100 -o - %s
 #pragma omp end declare target // expected-error {{unexpected OpenMP directive '#pragma omp end declare target'}}

 int a, b, z; // omp5-error {{variable captured in declare target region must appear in a to clause}}
 __thread int t; // expected-note {{defined as threadprivate or thread local}}

 #pragma omp declare target . // expected-error {{expected '(' after 'declare target'}}

 #pragma omp declare target

clang/test/SemaCUDA/openmp-target.cu

 // RUN: %clang_cc1 -triple x86_64 -verify=expected,dev \
 // RUN: -verify-ignore-unexpected=note \
-// RUN: -fopenmp -fopenmp-version=50 -o - %s
+// RUN: -fopenmp -fopenmp-version=50 -fopenmp-targets=amdgcn-amd-amdhsa -o - %s
 // RUN: %clang_cc1 -triple x86_64 -verify -verify-ignore-unexpected=note\
-// RUN: -fopenmp -fopenmp-version=50 -o - -x c++ %s
+// RUN: -fopenmp -fopenmp-version=50 -fopenmp-targets=amdgcn-amd-amdhsa -o - -x c++ %s
 // RUN: %clang_cc1 -triple x86_64 -verify=dev -verify-ignore-unexpected=note\
 // RUN: -fcuda-is-device -o - %s

 #if __CUDA__
 #include "Inputs/cuda.h"
 __device__ void cu_devf();
 #endif