This is an archive of the discontinued LLVM Phabricator instance.

preprocessor supports `-dI` flag
ClosedPublic

Authored by elsteveogrande on Sep 30 2016, 7:45 PM.

Details

Summary

Already exists in gcc (look for -dCHARS in https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/Preprocessor-Options.html#Preprocessor-Options ).

This change aims to add this option, pass it through to the preprocessor via the options class, and when inclusions occur we output some information (+ test cases). In my own testing and experimenting with it I do see similar results to gcc's.

cd ~/build  # build dir for llvm+clang
echo '#include <header1.h>' > test.cpp
echo 'int main() { return FOO; }' >> test.cpp
echo '#include "header2.h"' > header1.h
echo '#include "/tmp/header3.h"' >> header1.h
echo '#define FOO 42' > /tmp/header2.h
echo '#define BAR 43' > /tmp/header3.h

bin/clang++ -cc1 -E -isystem /tmp -I $(pwd) -dI test.cpp
# 1 "test.cpp"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 319 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.cpp" 2
#include <header1.h> /* clang -E -dI */
# 1 "test.cpp"
# 1 "/Users/steveo/build/header1.h" 1
#include "header2.h" /* clang -E -dI */
# 1 "/Users/steveo/build/header1.h"
# 1 "/tmp/header2.h" 1 3
# 2 "/Users/steveo/build/header1.h" 2
#include "/tmp/header3.h" /* clang -E -dI */
# 2 "/Users/steveo/build/header1.h"
# 1 "/tmp/header3.h" 1
# 3 "/Users/steveo/build/header1.h" 2
# 2 "test.cpp" 2
int main() { return 42; }

Equivalent GCC invocation:

g++ -E -isystem /tmp -I $(pwd) -dI test.cpp
# 1 "test.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.cpp"
#include <header1.h>
# 1 "test.cpp"
# 1 "/home/steveo/build/header1.h" 1
#include "header2.h"
# 1 "/home/steveo/build/header1.h"
# 1 "/tmp/header2.h" 1 3 4
# 2 "/home/steveo/build/header1.h" 2
#include "/tmp/header3.h"
# 2 "/home/steveo/build/header1.h"
# 1 "/tmp/header3.h" 1
# 2 "/home/steveo/build/header1.h" 2
# 2 "test.cpp" 2
int main() { return
# 2 "test.cpp" 3 4
                   42
# 2 "test.cpp"
                      ; }

The line markers just following the #include seem to reflect the source code location where the inclusion occurred, and the actual path to the included file. This diff doesn't change the behavior or ordering of that; however it seems to reliably occur in both clang and gcc.

Diff Detail

Repository
rL LLVM

Event Timeline

elsteveogrande retitled this revision from to preprocessor supports `-dI` flag.
elsteveogrande updated this object.
elsteveogrande set the repository for this revision to rL LLVM.
elsteveogrande updated this object.
elsteveogrande updated this object.

Added a couple of notes

stable/src/llvm/tools/clang/lib/Frontend/PrintPreprocessedOutput.cpp
241 ↗(On Diff #73185)

Re-reading this diff, I saw hat my editor trimmed EOL whitespace automatically and I let these minor changes become part of the patch. Sorry for the noise :)

390 ↗(On Diff #73185)

The FIXME and hardcoded #include are copied from the clause just above this. Seems getting the text for this token wasn't that easy. I can try again and solve this in a future patch, if it's ok to let this ine slide, @rsmith.

stable/src/llvm/tools/clang/test/Preprocessor/dump_import.m
1 ↗(On Diff #73185)

Also: I should have said ^#include at the end of this line. Fortunately if the TODO is fixed, the preprocessor will report #import and this test will still work :)

Updated to actually use the include token's text, not a hardcoded #include. Updated unit test to expect the right word here.

elsteveogrande set the repository for this revision to rL LLVM.

Ping :) Hoping someone on cfe-commits might be able to take a quick look...

rsmith added inline comments.Oct 3 2016, 4:33 PM
stable/src/llvm/tools/clang/lib/Frontend/PrintPreprocessedOutput.cpp
320–321 ↗(On Diff #73185)

Do we really this? We're only using these sanitized paths within /*...*/ comments, and generating a newline in that context seems fine. The only thing that seems necessary to escape is a */ within the path, which this doesn't handle.

390 ↗(On Diff #73185)

This is fine for now.

stable/src/llvm/tools/clang/test/Preprocessor/dump_import.m
1 ↗(On Diff #73185)

It's still better to check for #include here with a FIXME in the test, since that way once we fix the issue, we'll know to update the test.

Also the other issues are old comments, now fixed with the latest updates to this. This does now take the value of the include token: include or include_once or import etc.; and send that to output.

Thanks very much @rsmith ! (Sorry to have been a pest.)

stable/src/llvm/tools/clang/lib/Frontend/PrintPreprocessedOutput.cpp
320–321 ↗(On Diff #73185)

right you are! I was thinking of // when I was doing that escaping logic.

I should either escape */, or even better, just use // for comments, is that ok? (I feel like // is more robust because it'll really ignore characters until the end-of-line.)

elsteveogrande added inline comments.Oct 3 2016, 5:48 PM
stable/src/llvm/tools/clang/lib/Frontend/PrintPreprocessedOutput.cpp
320–321 ↗(On Diff #73185)

never mind, can't use // with c89, ansi, etc. Will correct this escaping business.

elsteveogrande updated this object.

Fixed escaping function; simplified that function as well as the diagnostic output.

majnemer added inline comments.
lib/Frontend/PrintPreprocessedOutput.cpp
321 ↗(On Diff #73394)

Don't pass StringRef by const reference, just pass it by value.

321–325 ↗(On Diff #73394)

Variables should start with uppercase characters.

344 ↗(On Diff #73394)

This could be a StringRef instead of a std::string.

346 ↗(On Diff #73394)

Pointers lean right.

348 ↗(On Diff #73394)

Why not just use getName ?

Thanks for the feedback @majnemer! Fixed these. I _think_ I used StringRef right; I assume it's like std::string in that it manages its own memory (i.e. I didn't create references to deallocated strings, etc.).

elsteveogrande marked 5 inline comments as done.Oct 3 2016, 10:40 PM
rsmith added inline comments.Oct 4 2016, 8:57 AM
lib/Frontend/PrintPreprocessedOutput.cpp
328 ↗(On Diff #73398)

Putting a \ before a * won't stop it being recognised as part of a */.

329 ↗(On Diff #73398)

Is this really necessary? It'll be very ugly on Windows.

396 ↗(On Diff #73398)

Please do this in the preceding case too.

Thanks again @rsmith! Updates will be coming; I have some other fixes as well.

lib/Frontend/PrintPreprocessedOutput.cpp
329 ↗(On Diff #73398)

Yeah, there will be tons of double-backslashes because of this. I think to sidestep this -- plus the derp moment above about \/* still breaking comments regardless -- I could perhaps simplify all this escaping.

Side note, to explain the motivation of this diff and why I'm jumping through hoops and spending so much time to make this properly-escaped; skip to last paragraph if it's not of interest :).

The ulterior motive is to allow our build system (http://buckbuild.com) to determine how include paths were resolved during preprocessing of some file. So, for example, when checking an existing precompiled header to see if it's compatible with some other subsequent build (which we'd like to use the PCH in), and assuming other flags like -g -O2 -fPIC and so on match, we need to know whether given lists of include paths (that from the PCH build, and the current build) are "compatible". Like, would a #include <thread> resolve to the same libc++ version of thread.

Anyway: so I was trying hard to make this "magic comment" machine-readable but it's getting pretty hairy, with escapes and backslashes and having to somehow escape */ and so on. (Escaping forward slashes will make this ugly on non-Windows :) ). Since this is a purpose-built and non-generic, directory name escaper, I could try:

  • fix the */ problem, and \\, somehow. Change */ to *\/ but don't bother escaping slashes all the time (only on comment-busting slashes).
  • just drop asterisks and "weird stuff" from filenames, solving those, but causing other problems (correctness, mangling up pathnames).
376–378 ↗(On Diff #73398)

Will fix this site as mentioned.

396 ↗(On Diff #73398)

Will do. Actually I found PP.getSpelling(token) in this same file, worked fine :)

majnemer added inline comments.Oct 4 2016, 10:46 AM
lib/Frontend/PrintPreprocessedOutput.cpp
344 ↗(On Diff #73398)

Could this just return a StringRef? You could use an empty StringRef on failure.

321–325 ↗(On Diff #73394)

Please uppercase all your other variables too.

Thanks again @majnemer! Other vars are fixed, will send an update shortly.

lib/Frontend/PrintPreprocessedOutput.cpp
344 ↗(On Diff #73398)

I'll get rid of this in favor of PP.getSpelling(token) instead.

Also (and I added this in the unit tests) it happens to work for include, include_next, include_macros, import.

  • Fix a few more style nits according to LLVM style guide.
  • Further fixed w/ clang-format -style=LLVM, kept (most of) recommended changes.
  • Added more unit tests (more include cases: #include_next, -imacro)
  • Drop token-to-text function; discovered PP.getSpelling(token).
  • Fixed an existing FIXME where we want to preserve the correct inclusion token and not use a hardcoded one.
  • Clean up escape function, make safer w/r/t assumptions about buffer sizes and make parameter passing less ugly. Improved logic, finally got simple escaping of end-of-comment sequence right after several hairbrained tries :)

Thanks for taking time to provide input @rsmith @majnemer !

elsteveogrande marked 4 inline comments as done and an inline comment as not done.Oct 4 2016, 11:39 AM
majnemer added inline comments.Oct 4 2016, 5:57 PM
lib/Frontend/PrintPreprocessedOutput.cpp
101 ↗(On Diff #73518)

Please use the LLVM naming convention for dumpIncludeDirectives.

399 ↗(On Diff #73518)

I'd spell out the type of TokenText, it is not clear which overload of getSpelling will get called.

400 ↗(On Diff #73518)

assert(!TokenText.empty());

404 ↗(On Diff #73518)

!SearchPath.empty()

409–411 ↗(On Diff #73518)

Why not have sanitizePath return a std::vector<char>?

Thanks -- will do one more pass

lib/Frontend/PrintPreprocessedOutput.cpp
409–411 ↗(On Diff #73518)

Hmm, good idea. Better than caller-passed vectors, and it be whatever size is needed (in the called method I'd CharVector.reserve anyway). The return of that vector should get elided.

elsteveogrande marked 6 inline comments as done.Oct 4 2016, 7:30 PM

Addressed most recent comments, hopefully squashed the last few things.

majnemer added inline comments.Oct 5 2016, 3:53 PM
lib/Frontend/PrintPreprocessedOutput.cpp
331–349 ↗(On Diff #73625)

I think this loop would be easier to understand like so:

while (!Path.empty()) {
  if (Path.consume_front("\\")) {
    Buffer.push_back("\\\\");
  } else if (Path.consume_front("\"")) {
    Buffer.push_back("\\\"");
  } else if (Path.consume_front("*/")) {
    Buffer.push_back("*\\/");
  } else {
    Buffer.push_back(Path.front());
    Path = Path.drop_front();
  }
}

The big takeaway is that we now avoid messy I + 1 < N type checks.

elsteveogrande marked an inline comment as done.Oct 5 2016, 6:15 PM
elsteveogrande added inline comments.
lib/Frontend/PrintPreprocessedOutput.cpp
331–349 ↗(On Diff #73625)

yes! I like this much better, thanks. Didn't know about that method, very handy.

Also at the risk of changing more stuff and possibly churning more (thank you for sticking with me this long!) I'll change it to use either a std::stringstream, or std::string with operator+= to build it up.

A std::vector<char> won't take additional strings push_back'ed like that, and while it's more handy than char[], it's slightly weird, when this is stringstream's purpose, imho.

elsteveogrande marked an inline comment as done.

Using consume_front(sequence), cleaner escaping code.

elsteveogrande added inline comments.Oct 6 2016, 10:46 AM
lib/Frontend/PrintPreprocessedOutput.cpp
329–330 ↗(On Diff #73720)

Note: I'm pretty sure this is about as efficient as it gets for string-building (and then also, string-builds will tend to happen only once per include). The string has its internal char-vector-like structure, pre-sized to a generous amount, so extra futzing should not occur, making this O(length of strings appended to it); and the return should optimize away so that it's not copied.

More importantly though, this is now quite a bit cleaner.

elsteveogrande marked an inline comment as done.Oct 6 2016, 6:05 PM

@rsmith @majnemer Ready to review once more.

(Pardon the nag but I wanted to ensure you knew. If there's another round of fixes I'll change to "changes planned", fix, and then move back to "needs review" when ready again. So you'll know if it's in a ready state or not :) )

majnemer edited edge metadata.Oct 6 2016, 9:32 PM

The change from an algorithmic POV looks good but I can't speak to whether or not the approach is sound, I'll leave that to @rsmith.

lib/Frontend/PrintPreprocessedOutput.cpp
404 ↗(On Diff #73518)

This review comment was never addressed.

elsteveogrande added inline comments.Oct 6 2016, 9:52 PM
lib/Frontend/PrintPreprocessedOutput.cpp
388–394 ↗(On Diff #73720)

facepalm -- how did I miss that size() > 0. I'll fix that along with the next batch of updates.

Or another idea. To get this out the door, perhaps I just drop this and sanitizePath? The essential part (for me) is getting the -dI option working, and printing at least the #includes. More useful diagnostic info can get tacked on later perhaps.

elsteveogrande edited edge metadata.

Dropped escaping function, adds logic to this diff which isn't 100% necessary at this time. See updated description, under test plan, for details. The output already does hint at the path which was used to resolve that include.

elsteveogrande updated this object.

update description w/ arc diff --verbatim

elsteveogrande marked an inline comment as done.Oct 7 2016, 4:40 AM

Hi @rsmith -- now this simply reports the #include line (or whatever token) without fiddly path escaping. So this is simplified and there's less room for error and such.

ping :) Wanted to get sign off on this (simplified) diff if possible.

Thanks! Sorry to nag. (This would be one part of the solution for really long build times for us.)

cc a few more devs who have dealt with frontend lately and might be familiar with this part. :)

vsk added inline comments.Oct 10 2016, 9:39 PM
test/Preprocessor/dump_include.c
3 ↗(On Diff #73917)

Could you use FileCheck instead of grep? E.g;

// RUN: %clang_cc1 -w -E -dI -isystem %S -imacros %S/dump_include.h %s -o - | FileCheck %s
// CHECK: {{^}}#include <dump_
// CHECK: {{^}}#include "dump_
// CHECK: {{^}}#include_next "dump_
// CHECK: {{^}}#__include_macros "{{.*}}dump"

This applies to your other test as well.

Thanks very much @vsk for the feedback! Updated per your recommendations; CHECK looks much better

elsteveogrande marked an inline comment as done.Oct 11 2016, 8:58 AM

Ping again -- addressed all issues, looking ok now?

Note about this change + review process here. First I know the every 1-2 day pinging generates noise on the lists, and for that I'm SorryNotSorry :-p

I just want to ensure this is reaching at least the right people for feedback + eventually accepting.

Also this diff is pretty low-risk...

  • It's short and doesn't change too much.
  • What it does change is essentially opt-in with a new flag. So this shouldn't break existing toolchains and projects, because nothing uses this yet.
  • It's largely "additive" and again shouldn't change existing behavior, just allowing for new functionality.
  • This new flag is for feature parity w/ gcc, and so there is some prescription for its behavior. Though not officially documented (absent from GCC docs, for example, but with a small amount of info in man 1 gcc), it seems to work roughly the same.
  • Has unit test coverage which I believe covers all possible usage cases (different ways to #include files).
  • I stripped away unneeded code which was creating some sticking points, working around that for now, shortening this diff more. (And reducing mystery and having to make decisions about e.g. string escaping and weird stuff.)
  • Had quite a bit of feedback already from @rsmith, @majnemer, @vsk (thank you all!) -- addressed a number of issues and cleaned this up a lot.

Is this in acceptable shape? Any objections to accept, land, then revert if catastrophe (doubt it for reasons above)?

Thanks again!

rsmith accepted this revision.Oct 15 2016, 2:07 PM
rsmith edited edge metadata.

Thanks for your patience, this looks great to me. Do you need someone to commit this for you?

This revision is now accepted and ready to land.Oct 15 2016, 2:07 PM

Thank you! I tried with`arc land` but got a 403...

I'll need help landing this... I also tried git svn dcommit but command hung there forever.

Thanks!

ping - seems I'm unable to commit this myself. Thanks!

elsteveogrande edited edge metadata.

Fixed an error. A newline is sometimes not appended prior to this #include.

When returning from an included file which doesn't have a trailing newline, the #include is stuck at the end of some other line, i.e. the include's own # isn't the first character in the line.

Added a new line if needed like the other code just above it. I copied+pasted the moveTo as well.

Ping, can anyone help with committing this? It's accepted, just needs to be landed. (I don't have commit access.) Thanks!

This revision was automatically updated to reflect the committed changes.
bruno edited edge metadata.Oct 28 2016, 10:11 AM

The test is failing on windows:

http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/141

I reverted the patch for now in r285416, can you take a look?

cfe/trunk/test/Preprocessor/dump_include.c
2

Thanks very much @bruno and sorry for the trouble here! Looks like I was expecting a forward slash at the beginning, forgetting about drive letters on Windows -- and backslashes (actually looks like backslashes and forward slashes can get mixed?).

Indeed the output on windows looked like:

#__include_macros "C:\b\slave\clang-x86-windows-msvc2015\clang-x86-windows-msvc2015\llvm\tools\clang\test\Preprocessor/dump_include.h" /* clang -E -dI */

which actually looks correct. I'll have to correct my test here for this case. Will have this updated in a couple of minutes. Thanks very much!