There is no need to check for deferred diag when device compilation or target is
not given. This results in considerable build time improvement in some cases.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Our internal codebase never uses the target directive. Once the deferred diags is bypassed, we observed 18% e2e build time improvement.
Is that with -fopenmp or without?
That seems, kinda a lot more than i would have expected,
perhaps there are some other ways to reduce the overhead other than this approach?
This is with -fopenmp and no other omp related flags. I'd prefer a more generic way of fixing this, but right now this seems to be most direct way.
Yeah, though the slow down in build time is blocking us from moving to newer llvm for many projects. Currently I think it makes sense to give user the option to decide whether they want faster build or better diagnostics. Of course if the slow down is fully addressed in the future, the switch could be removed too.
Just double checked, this is the full omp related options currently in use:
"-fopenmp" "-fopenmp-version=31" "-fopenmp-version=31" "-fopenmp-cuda-parallel-target-regions"
We saw a huge number of DECLS_TO_CHECK_FOR_DEFERRED_DIAGS records. I don't know if this has anything to do with omp version being 31, since prior 5.0, everything is available on host.
I don't think we are selective right now. As I was saying, disable deferred parsing if fopenmp-targets is missing, no need for this option.
Sure I can certainly make the change. To make sure I understand you correctly: if -fopenmp-targets (or maybe fopenmp-is-device too) is not given from cmdline, we can just skip the deferred diags like this option does?
Yes, deferred diags are used only for target-dependent compilation, so should be enough to check if IsTargetSpecified is false.
update as discussed.
Changed a few affected tests to explicitly add target triple and added a case where deferred diags are skipped.
I agree with Johannes and Alexey that deferred diags are only needed when LangOpts.OMPTargetTriples.empty(). However, I am not sure whether it is only needed in device compilation.
For other offloading languages like CUDA/HIP it is needed in both device and host compilation.
Technically, we might even want to delay in host only mode for OpenMP, but that is something we can revisit (e.g., by dynamically setting a flag based on the directives we've seen).
@yaxunl Should we for now check if there is any associated offload job?
Shall we go ahead and get this change in and think about more longer term solution later?
LGTM. This patch should be sufficient to limit deferred diags to OpenMP with offloading. Device compilation is covered by OpenMPIsDevice and host compilation is covered by !LangOpts.OMPTargetTriples.empty(). I will leave the decision to Johannes.