This is an archive of the discontinued LLVM Phabricator instance.

[WIP][NOT FOR COMMIT][PROTOTYPE] clang-scan-deps: dependency scanning tool rough prototype
Needs ReviewPublic

Authored by arphaman on Oct 16 2018, 6:48 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

This patch is a prototype implementation of the clang-scan-deps tool. This tool computes the set of file dependencies for a particular compiler invocation using some optimizations that are outlined below. This tool makes the non-modular dependency scanning up to 10 times faster for particular workloads (e.g. llc target, 1542 C++ files) on one of our machines, when compared to parallel invocations of clang with -Eonly. We are still in the early stages of proper modules support, but our initial crude prototype can get up to 4x when run on the first 1000 files from clang’s compilation database for a build of LLVM with modules turned on.

We run the full Clang preprocessor. Here’s what we do to reduce its workload:

  • Minimize sources by stripping away unused tokens. We keep only the interesting PP directives (#define, #if, #include, etc.), i.e. those that might impact the set of dependencies. The patch includes an additional standalone clang-filter-to-includes tool that can be used to perform this minimization.
  • Assume the filesystem is immutable for one run of the service, and cache the files and their minimized contents in memory in a global cache.
  • Skip over excluded preprocessor ranges by bumping up the buffer pointer in the lexer instead of lexing the skipped tokens.

Right now we still haven't integrated the explicit module experiments with this tool, and at the moment we are using the tool to:

  • benchmark the performance of the fast scanner.
  • compare the accuracy to the regular -Eonly invocation.

In the future it might be interesting to extend this tool to provide an interface that the build system can use for the pre scanning phase for the explicit modules.

Diff Detail

Repository
rC Clang

Event Timeline

arphaman created this revision.Oct 16 2018, 6:48 PM

With regards to D53352: you can change the diff related to a patch whilst keeping discuccion and metadata of the diff.

Please add a generic description to the diff for an easy skimming on what you are accomplishing.

If I get this right, your tool will spit out a CPP file that is only include directives and perhaps the related conditional logic, or the final output of your tool is a file list? This is different than the -M flags in a way that it keeps conditions sane, rather than spitting out what includes were used if the input, with regards to the compiler options, was to be compiled?

Have you checked the tool Include What You Use? I'm curious in what way the mishappenings of that tool present themselves in yours. There were some challenges not overcome in a good way in IWYU, their readme goes into a bit more detail on this.

ddunbar added inline comments.Oct 18 2018, 10:18 AM
lib/Lex/FilterToIncludes.cpp
628

What is our feeling w.r.t. _Pragma, which can in theory influence the preprocessor. I'm not sure this model can sanely support it?

arphaman added inline comments.Oct 18 2018, 10:40 AM
lib/Lex/FilterToIncludes.cpp
628

We need to look into that. In the worst case we can always avoid minimizing the file if it has a _Pragma anywhere in it.

arphaman added inline comments.Oct 18 2018, 10:54 AM
lib/Lex/FilterToIncludes.cpp
628

We also have to handle cases like this one:

foo.h:

#define PP _Pragma

bar.h:

PP ("foo")

Unfortunately this can't be handled by just disabling minimization for foo.h as PP will be stripped out from bar.h. Furthermore, this pattern is quite reasonable, so we should expect it in the code we receive. Right now I can't think of a reasonable solution for this problem.

There's also a more "evil" case:

#define P(A, B) A##B

P(_Pra,gma)("foo")

It would be reasonable to introduce a warning for the use of _Pragma token that was created using PP token concatenation and just hope that our users won't use this pattern.

arphaman added inline comments.Oct 18 2018, 11:03 AM
lib/Lex/FilterToIncludes.cpp
628

One possible solution to the first issue is to simply fallback to the regular -Eonly invocation if we detect an include of a file that has a macro with a _Pragma in it. Then we could emit a remark to the user saying that their build could be faster if they rewrite their code so that this pattern no longer occurs.

arphaman edited the summary of this revision. (Show Details)Oct 18 2018, 11:37 AM

With regards to D53352: you can change the diff related to a patch whilst keeping discuccion and metadata of the diff.

Good point, thanks!

Please add a generic description to the diff for an easy skimming on what you are accomplishing.

I added the description, thanks.

If I get this right, your tool will spit out a CPP file that is only include directives and perhaps the related conditional logic, or the final output of your tool is a file list?

It's both, as there are two tools in this patch. The first is the clang-filter-to-includes tool, which is wrapper around our source minimization optimization (i.e. it spits out a source with #includes and other PP directives). The clang-scan-deps tool integrates this optimization into a service-like tool that will produce a set of dependencies for a set of compilation commands. Right now it's mainly a benchmark that compares the speed of the fast scanner to the regular preprocessor invocation.

This is different than the -M flags in a way that it keeps conditions sane, rather than spitting out what includes were used if the input, with regards to the compiler options, was to be compiled?

It's supposed to produce identical output to the run of -Eonly with the -MD flag. Right now we don't know how we want to expose the dependency set to the clients.

Have you checked the tool Include What You Use? I'm curious in what way the mishappenings of that tool present themselves in yours. There were some challenges not overcome in a good way in IWYU, their readme goes into a bit more detail on this.

I haven't looked into IWYU that much. Could you please elaborate on which mishappenings you think might present themselves here?

dexonsmith added inline comments.Oct 18 2018, 6:52 PM
lib/Lex/FilterToIncludes.cpp
628

Hmm. Our plan for @import is to stop supporting code such as:

#define IMPORT(M) @import M
IMPORT(LLVMSupport);

where the @import is buried. I.e., start erroring out in normal Clang builds. Perhaps it would be reasonable to similarly disallow _Pragma usage that buries import/include statements. I.e., do not support (even in normal builds) the following:

#define IMPORT(M) _Pragma("clang module import " #M)
IMPORT(LLVMSupport)

Possibly, this could be a warning that is error-by-default.

jsji removed a subscriber: jsji.Oct 18 2018, 7:02 PM
kimgr added a subscriber: kimgr.Oct 28 2018, 8:06 AM

Hi Alex,

chapuni added a subscriber: chapuni.Nov 6 2018, 1:35 PM
aganea added a subscriber: aganea.Apr 9 2019, 6:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2019, 6:57 AM
kousikk removed a subscriber: kousikk.Jul 10 2019, 1:59 PM
kousikk added a subscriber: kousikk.