This is an archive of the discontinued LLVM Phabricator instance.

Introduce a source minimizer that reduces source to directives that might affect the dependency list for a compilation

Authored by arphaman on Dec 7 2018, 4:38 PM.



This patch introduces a dependency directives source minimizer to clang that minimizes header and source files to the minimum necessary preprocessor directives for evaluating includes. It reduces the source down to #define, #include, #import, @import, and any conditional preprocessor logic that contains one of those.

The source minimizer works by lexing the input with a custom fast lexer that recognizes the preprocessor directives it cares about, and emitting those directives in the minimized source. It ignores source code, comments, and normalizes whitespace. It gives up and fails if seems any directives that it doesn't recognize as valid (e.g. #define 0).

In addition to the source minimizer this patch adds a print-dependency-directives-minimized-source CC1 option that allows you to invoke the minimizer using clang directly.

There a couple of known issues with the source minimizer:

  • It fails to detect @import that was formed in a macro expansion. We are planning to add a warning to discourage this use.
  • It fails to detect _Pragma ("clang import"). We are planning to probably add a warning to discourage this use.
  • It assumes raw string literals are valid when minimizing source without respecting language mode.

This is based on code that was included in the original WIP patch I posted before the dev meeting: . It's based on the original filter-to-includes code written by @dexonsmith . We are planning to use to implement fast dependency scanning for explicit module builds.

Diff Detail


Event Timeline

arphaman created this revision.Dec 7 2018, 4:38 PM
arphaman updated this revision to Diff 193214.Apr 1 2019, 6:05 PM

Rebased patch, added a #warning/error fix from @dexonsmith.

Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2019, 6:05 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript

@Bigcheese Please take a look. I'll post the basic tool that runs the preprocessor on regular vs minimized sources by Wed.

MaskRay added a subscriber: MaskRay.Apr 2 2019, 3:08 AM

Are you planning to do this recursively?
The minimizer does not help much for the following example, while Sema.h contains 10,000 lines of useless code.

#include "clang/Sema/Sema.h"

int foo() {}

Are you planning to do this recursively?
The minimizer does not help much for the following example, while Sema.h contains 10,000 lines of useless code.

#include "clang/Sema/Sema.h"

int foo() {}

Not recursively, but on each file *independently* by intercepting stat and open in a custom VFS layer. The old patch at (which Alex points to above) might give you more context.

aganea added a subscriber: aganea.Apr 9 2019, 6:56 AM
arphaman updated this revision to Diff 194331.Apr 9 2019, 8:01 AM

Updated to the new LLVM license comments.

Bigcheese requested changes to this revision.Apr 12 2019, 5:09 PM

I have a bit more review to do, but this is what I've found so far. The naming comments are just suggestions, but the digit separators' are actually an issue.

25 ↗(On Diff #194331)

This is a really long namespace name, not sure what else to call it though.

36 ↗(On Diff #194331)

Is @import actually a preprocessor directive? For C++20 modules all the modules bits end up being declarations.

923–924 ↗(On Diff #194331)

Can we give better error messages here? Should minimizeSourceToDependencyDirectives take a DiagnosticEngine?

28 ↗(On Diff #194331)

I'm not sure Lexer is the best name for this class. It's really a combined lexer and minimizer and I was a bit confused by some things until I realized that. I think it would make more sense to name this Minimizer and the associated lex as minimize.

203 ↗(On Diff #194331)

assert(First <= Last) to match the other checks?

241 ↗(On Diff #194331)

I don't think this handles digit separators correctly.

#include <bob>
int a = 0'1;
#include <foo>
This revision now requires changes to proceed.Apr 12 2019, 5:09 PM
arphaman updated this revision to Diff 200371.May 20 2019, 3:58 PM
arphaman marked 8 inline comments as done.
  • Added diagnostic support.
  • Fixed the issue with C++14 number separators.
  • Other fixes requested by @Bigcheese .
36 ↗(On Diff #194331)

No, @import is actually a declaration. I've updated the comments.

923–924 ↗(On Diff #194331)

Sure, I added support for diagnostics!

241 ↗(On Diff #194331)

Thanks, that was true! I fixed it to handle digit separators.

arphaman updated this revision to Diff 200376.May 20 2019, 4:53 PM

Remove some outdated commented out code.

This revision is now accepted and ready to land.May 21 2019, 5:59 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2019, 3:57 PM
thakis added a subscriber: thakis.Jun 4 2019, 6:38 AM

Any reason this isn't tested via lit instead?