This is an archive of the discontinued LLVM Phabricator instance.

[openmp] Emit deferred diag only when device compilation presents
ClosedPublic

Authored by weiwang on Sep 2 2021, 10:47 AM.

Details

Summary

There is no need to check for deferred diag when device compilation or target is
not given. This results in considerable build time improvement in some cases.

Diff Detail

Event Timeline

weiwang created this revision.Sep 2 2021, 10:47 AM
weiwang requested review of this revision.Sep 2 2021, 10:47 AM
Herald added a project: Restricted Project. · View Herald Transcript
weiwang added a subscriber: sugak.

Our internal codebase never uses the target directive. Once the deferred diags is bypassed, we observed 18% e2e build time improvement.

Our internal codebase never uses the target directive. Once the deferred diags is bypassed, we observed 18% e2e build time improvement.

Is that with -fopenmp or without?
That seems, kinda a lot more than i would have expected,
perhaps there are some other ways to reduce the overhead other than this approach?

Our internal codebase never uses the target directive. Once the deferred diags is bypassed, we observed 18% e2e build time improvement.

Is that with -fopenmp or without?
That seems, kinda a lot more than i would have expected,
perhaps there are some other ways to reduce the overhead other than this approach?

This is with -fopenmp and no other omp related flags. I'd prefer a more generic way of fixing this, but right now this seems to be most direct way.

Our internal codebase never uses the target directive. Once the deferred diags is bypassed, we observed 18% e2e build time improvement.

Is that with -fopenmp or without?
That seems, kinda a lot more than i would have expected,
perhaps there are some other ways to reduce the overhead other than this approach?

Yeah, though the slow down in build time is blocking us from moving to newer llvm for many projects. Currently I think it makes sense to give user the option to decide whether they want faster build or better diagnostics. Of course if the slow down is fully addressed in the future, the switch could be removed too.

Why do we need this flag, is the absence of -fopenmp-targets not sufficient?

weiwang added a comment.EditedSep 2 2021, 1:24 PM

Why do we need this flag, is the absence of -fopenmp-targets not sufficient?

Just double checked, this is the full omp related options currently in use:

"-fopenmp"
"-fopenmp-version=31"
"-fopenmp-version=31"
"-fopenmp-cuda-parallel-target-regions"

We saw a huge number of DECLS_TO_CHECK_FOR_DEFERRED_DIAGS records. I don't know if this has anything to do with omp version being 31, since prior 5.0, everything is available on host.

Why do we need this flag, is the absence of -fopenmp-targets not sufficient?

Just double checked, this is the full omp related options currently in use:

"-fopenmp"
"-fopenmp-version=31"
"-fopenmp-version=31"
"-fopenmp-cuda-parallel-target-regions"

We saw a huge number of DECLS_TO_CHECK_FOR_DEFERRED_DIAGS records. I don't know if this has anything to do with omp version being 31, since prior 5.0, everything is available on host.

I don't think we are selective right now. As I was saying, disable deferred parsing if fopenmp-targets is missing, no need for this option.

Why do we need this flag, is the absence of -fopenmp-targets not sufficient?

Just double checked, this is the full omp related options currently in use:

"-fopenmp"
"-fopenmp-version=31"
"-fopenmp-version=31"
"-fopenmp-cuda-parallel-target-regions"

We saw a huge number of DECLS_TO_CHECK_FOR_DEFERRED_DIAGS records. I don't know if this has anything to do with omp version being 31, since prior 5.0, everything is available on host.

I don't think we are selective right now. As I was saying, disable deferred parsing if fopenmp-targets is missing, no need for this option.

Sure I can certainly make the change. To make sure I understand you correctly: if -fopenmp-targets (or maybe fopenmp-is-device too) is not given from cmdline, we can just skip the deferred diags like this option does?

Why do we need this flag, is the absence of -fopenmp-targets not sufficient?

Just double checked, this is the full omp related options currently in use:

"-fopenmp"
"-fopenmp-version=31"
"-fopenmp-version=31"
"-fopenmp-cuda-parallel-target-regions"

We saw a huge number of DECLS_TO_CHECK_FOR_DEFERRED_DIAGS records. I don't know if this has anything to do with omp version being 31, since prior 5.0, everything is available on host.

I don't think we are selective right now. As I was saying, disable deferred parsing if fopenmp-targets is missing, no need for this option.

Sure I can certainly make the change. To make sure I understand you correctly: if -fopenmp-targets (or maybe fopenmp-is-device too) is not given from cmdline, we can just skip the deferred diags like this option does?

I thought so, @ABataev wdyt?

Why do we need this flag, is the absence of -fopenmp-targets not sufficient?

Just double checked, this is the full omp related options currently in use:

"-fopenmp"
"-fopenmp-version=31"
"-fopenmp-version=31"
"-fopenmp-cuda-parallel-target-regions"

We saw a huge number of DECLS_TO_CHECK_FOR_DEFERRED_DIAGS records. I don't know if this has anything to do with omp version being 31, since prior 5.0, everything is available on host.

I don't think we are selective right now. As I was saying, disable deferred parsing if fopenmp-targets is missing, no need for this option.

Sure I can certainly make the change. To make sure I understand you correctly: if -fopenmp-targets (or maybe fopenmp-is-device too) is not given from cmdline, we can just skip the deferred diags like this option does?

I thought so, @ABataev wdyt?

Yes, deferred diags are used only for target-dependent compilation, so should be enough to check if IsTargetSpecified is false.

weiwang updated this revision to Diff 370630.EditedSep 3 2021, 11:17 AM

update as discussed.

Changed a few affected tests to explicitly add target triple and added a case where deferred diags are skipped.

weiwang retitled this revision from [openmp] Add clang cc1 option -fopenmp-skip-deferred-diags to [openmp] Emit deferred diag only when device compilation presents.Sep 3 2021, 11:18 AM
weiwang edited the summary of this revision. (Show Details)
yaxunl added a comment.Sep 7 2021, 7:33 AM

I agree with Johannes and Alexey that deferred diags are only needed when LangOpts.OMPTargetTriples.empty(). However, I am not sure whether it is only needed in device compilation.

For other offloading languages like CUDA/HIP it is needed in both device and host compilation.

I agree with Johannes and Alexey that deferred diags are only needed when LangOpts.OMPTargetTriples.empty(). However, I am not sure whether it is only needed in device compilation.

For other offloading languages like CUDA/HIP it is needed in both device and host compilation.

Technically, we might even want to delay in host only mode for OpenMP, but that is something we can revisit (e.g., by dynamically setting a flag based on the directives we've seen).
@yaxunl Should we for now check if there is any associated offload job?

I agree with Johannes and Alexey that deferred diags are only needed when LangOpts.OMPTargetTriples.empty(). However, I am not sure whether it is only needed in device compilation.

For other offloading languages like CUDA/HIP it is needed in both device and host compilation.

Technically, we might even want to delay in host only mode for OpenMP, but that is something we can revisit (e.g., by dynamically setting a flag based on the directives we've seen).
@yaxunl Should we for now check if there is any associated offload job?

Shall we go ahead and get this change in and think about more longer term solution later?

I agree with Johannes and Alexey that deferred diags are only needed when LangOpts.OMPTargetTriples.empty(). However, I am not sure whether it is only needed in device compilation.

For other offloading languages like CUDA/HIP it is needed in both device and host compilation.

Technically, we might even want to delay in host only mode for OpenMP, but that is something we can revisit (e.g., by dynamically setting a flag based on the directives we've seen).
@yaxunl Should we for now check if there is any associated offload job?

Shall we go ahead and get this change in and think about more longer term solution later?

LGTM. This patch should be sufficient to limit deferred diags to OpenMP with offloading. Device compilation is covered by OpenMPIsDevice and host compilation is covered by !LangOpts.OMPTargetTriples.empty(). I will leave the decision to Johannes.

I agree with Johannes and Alexey that deferred diags are only needed when LangOpts.OMPTargetTriples.empty(). However, I am not sure whether it is only needed in device compilation.

For other offloading languages like CUDA/HIP it is needed in both device and host compilation.

Technically, we might even want to delay in host only mode for OpenMP, but that is something we can revisit (e.g., by dynamically setting a flag based on the directives we've seen).
@yaxunl Should we for now check if there is any associated offload job?

Shall we go ahead and get this change in and think about more longer term solution later?

LGTM. This patch should be sufficient to limit deferred diags to OpenMP with offloading. Device compilation is covered by OpenMPIsDevice and host compilation is covered by !LangOpts.OMPTargetTriples.empty(). I will leave the decision to Johannes.

Thanks. @jdoerfert. Could you approve this?

This revision is now accepted and ready to land.Sep 15 2021, 10:02 AM