Page MenuHomePhabricator

Add support for a backdoor driver option that enables emitting header usage information in JSON to a file
ClosedPublic

Authored by ahatanak on Nov 14 2022, 5:22 PM.

Details

Summary

Each line in the file is a JSON object that has the name of the main source file followed by the list of system header files included directly or indirectly from that file.

For example:

{"source":"/tmp/foo.c", "includes":["/usr/include/stdio.h", "/usr/include/stdlib.h"]}

To reduce the amount of data written to the file, only the system headers that are directly included from a file that isn't in the system directory are recorded.

Diff Detail

Event Timeline

ahatanak created this revision.Nov 14 2022, 5:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 14 2022, 5:22 PM
ahatanak requested review of this revision.Nov 14 2022, 5:22 PM

Mostly looks good. Is this expected to work with Clang modules at all?

clang/include/clang/Driver/Driver.h
248

Please document this.

clang/include/clang/Driver/Options.td
5678

Could this be an enum of the desired format instead? It would default to textual and be overwritten with something like -header-include-format=json? I think it would result in simpler interface if in the future we decide to add another format.

clang/lib/Frontend/HeaderIncludeGen.cpp
68

Do we really want this to be a set? I can imagine clients wanting the include order to be preserved and with potential duplicates (for textual headers without guards or #pragma once).

If the set semantics are indeed what we want, I'd recommend using llvm::StringSet<> instead.

ahatanak updated this revision to Diff 475984.Nov 16 2022, 6:59 PM
ahatanak marked 3 inline comments as done.

Address review comments.

Mostly looks good. Is this expected to work with Clang modules at all?

My use case only has to keep track of textual headers.

jansvoboda11 added inline comments.Nov 17 2022, 12:49 PM
clang/include/clang/Driver/Driver.h
240

Can we make this HeaderIncludeFormatEnum instead of std::string?

clang/include/clang/Frontend/DependencyOutputOptions.h
49

I'd either drop the Enum suffix or replace it with Kind. But up to you.

clang/lib/Frontend/HeaderIncludeGen.cpp
258

This can be StringRef I think.

ahatanak updated this revision to Diff 476306.Nov 17 2022, 6:29 PM
ahatanak marked 2 inline comments as done.

Address review comments. Also, change check string in test case to fix pre-merge checks that were failing.

clang/lib/Frontend/HeaderIncludeGen.cpp
258

I simplified this part by removing the code that checks whether MainID is valid. It's set when BeginSourceFile is called, so it should be valid.

jansvoboda11 added inline comments.Nov 18 2022, 2:04 PM
clang/lib/Frontend/HeaderIncludeGen.cpp
294–295

Hmm, this new logic seems orthogonal to the output format. Changing the set of headers only based on the desired format seems counter-intuitive. I think we should either allow specifying the desired logic via a separate command-line flag, or decouple the new feature from the -header-include-file flag entirely. What are your thoughts?

(Sorry I didn't notice this earlier.)

ahatanak updated this revision to Diff 480101.Dec 5 2022, 7:19 AM
ahatanak marked an inline comment as done.

Add another environment variable CC_PRINT_HEADERS_FILTERING, which can be used to specify whether header information should be filtered or not.

Currently, CC_PRINT_HEADERS_FORMAT=json, CC_PRINT_HEADERS_FILTERING=on and CC_PRINT_HEADERS_FORMAT=textual, CC_PRINT_HEADERS_FILTERING=off are allowed. CC_PRINT_HEADERS=1 gets translated to CC_PRINT_HEADERS_FORMAT=textual, CC_PRINT_HEADERS_FILTERING=off.

jansvoboda11 added inline comments.Dec 5 2022, 1:58 PM
clang/lib/Driver/ToolChains/Clang.cpp
5571

Let's change "on" to something that won't interfere with potential future filters. How about "no-transitive-system"? (or "only-direct-system" if that better fits the semantics)

Same for the command-line counterpart.

clang/lib/Frontend/HeaderIncludeGen.cpp
161

I don't think asserting on invalid user input is a good strategy. Let's emit a diagnostic and exit gracefully instead.

clang/tools/driver/driver.cpp
279

This should probably be a proper diagnostic.

ahatanak added inline comments.Dec 6 2022, 12:06 PM
clang/lib/Frontend/HeaderIncludeGen.cpp
161

The combination of (textual,filtering) is rejected by the driver, so the assertion should always succeed if the user invoked the driver and set the environment variables. Do we need a diagnostic for the case where -header-include-format and -header-include-filtering are passed on the CC1 command line by the user?

jansvoboda11 added inline comments.Dec 6 2022, 4:06 PM
clang/lib/Frontend/HeaderIncludeGen.cpp
161

Ah, you're right about the driver. Yes. I think we should emit a diagnostic when this is passed to cc1 on the command line. Good place for this might be ParseDependencyOutputArgs in CompilerInvocation.cpp, after the marshalling layer populates the DependencyOutputOptions fields.

ahatanak updated this revision to Diff 480737.Dec 6 2022, 7:01 PM
ahatanak marked 3 inline comments as done.

Address review comments.

jansvoboda11 accepted this revision.Dec 7 2022, 2:26 PM

LGTM, thank you for working through this!

This revision is now accepted and ready to land.Dec 7 2022, 2:26 PM
akyrtzi accepted this revision.Dec 7 2022, 2:45 PM
This revision was landed with ongoing or failed builds.Dec 7 2022, 3:31 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Dec 7 2022, 4:11 PM

Looks like this breaks tests on windows: http://45.33.8.238/win/71451/step_7.txt

Please take a look and revert for now if it takes a while to fix.

SeanP added a subscriber: SeanP.Thu, Jan 5, 10:28 AM
SeanP added inline comments.
clang/tools/driver/driver.cpp
251

This change is not POSIX compliant. If T is char *, the next call to getenv() on line 253 invalidates the value saved into OptVal. getenv() is allowed to use the same buffer for the return value. That means the value OptVal is pointing at will be overwritten by the next call.

When T is bool you need the !! operators to get the char * converted to bool correctly.

I suggest leaving this function as it was and calling getenv() directly for the scenario involving CC_PRINT_HEADERS_FORMAT. Make sure you save the getenv() results into std::string so the value isn't overridden.

ahatanak added inline comments.Thu, Jan 5, 11:05 PM
clang/tools/driver/driver.cpp
251

Thank you for notifying me of the bug. It should be fixed in https://github.com/llvm/llvm-project/commit/34aa2e24c89ae39c0db4254d8aafcae0285dbe34