This is an archive of the discontinued LLVM Phabricator instance.

[clang] Disallow differences in defines used for creating and using PCH
ClosedPublic

Authored by mstorsjo on May 30 2022, 1:35 PM.

Details

Summary

Some amount of differences between defines when creating and using
a PCH were intentionally allowed in
c379c072405f39bca1d3552408fc0427328e8b6d (and later extended in
b63687519610a73dd565be1fec28332211b4df5b).

When using a PCH (or when picking a PCH out of a directory containing
multiple candidates) Clang used to accept the header if there were
defines on the command line when creating the PCH that are missing
when using the PCH, or vice versa, defines only set when using the
PCH.

The only cases where Clang explicitly rejected the use of a PCH
is if there was an explicit conflict between the options, e.g.
-DFOO=1 vs -DFOO=2, or -DFOO vs -UFOO.

The latter commit added a FIXME that we really should check whether
mismatched defines actually were used somewhere in the PCH, so that
the define would affect the outcome. This FIXME has stood unaddressed
since 2012.

This differs from GCC, which rejects PCH files if the defines differ
at all.

This is also problematic when using a directory containing multiple
candidate PCH files, one built with -DFOO and one without. When
attempting to include a PCH and iterating over the candidates in the
directory, Clang will essentially pick the first one out of the two,
even if there existed a better, exact match in the directory.

This fixes https://github.com/lhmouse/mcfgthread/issues/63.

This is likely to be controversial. In tree, 84 tests break by
making this change. This patch only fixes up the core tests for
this feature (to bring it up for discussion); out of the rest, some
probably are easy to fix, while others seem to be designed to specifically
test cases with such discrepancies.

Diff Detail

Event Timeline

mstorsjo created this revision.May 30 2022, 1:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2022, 1:35 PM
mstorsjo requested review of this revision.May 30 2022, 1:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2022, 1:35 PM
rnk added a comment.Jun 15 2022, 3:23 PM

I don't have a great answer here, but yes, dllexport macro norms sort of run directly counter to the normal ways that people use PCH. It seems like projects will need to have a library module / PCH file for building a DLL and then a PCH / module for consuming the DLL.

I don't have a great answer here, but yes, dllexport macro norms sort of run directly counter to the normal ways that people use PCH. It seems like projects will need to have a library module / PCH file for building a DLL and then a PCH / module for consuming the DLL.

This isn’t about sharing a PCH between building and consuming a DLL - that’d be highly unusual use of PCHs indeed.

It’s about building a library as both DLL and static library at the same time. For that, you’d have a PCH directory containing two versions of the same PCH, one built with -DDLLEXPORT and one without. When building the object files for the two copies of the library (for the static or DLL version), the compiler gets called with or without -DDLLEXPORT, and is pointed at the directory containing the two copies of the PCH.

GCC successfully picks the right one out of the directory, while Clang just picks the first one it tries. Clang also does try to pick the right one out of a directory with multiple copies (picking the right one, if you have one built with -DFOO=1 and one with -DFOO=2), but Clang considers that if you have a define entirely missing (PCH built with -DDLLEXPORT but the same define absent when including the header, or vice versa) that they’re probably compatible, and pick the first one out of the directory.

I can somewhat understand the rationale - if not using a PCH directory, but only one single file, Clang does allow you to include that while defining -DUNRELATED. (Clang’s implementation does have a 10 year old todo about checking that the differing define indeed was unrelated and not used by the PCH in any way.) GCC doesn’t allow that, but requires there to be an exact match between all -D arguments between when the PCH was built and consumed.

So with GCC’s stricter logic here, I think it would make sense to make Clang equally strict, but I wonder if that’d break Clang-only usecases that have ended up relying on Clang’s more tolerant behaviour here. (There even are tests testing specifically the fuzzy allowing of differences.) Maybe the safest would be to only make it strict in the case when picking the right one out of a directory with alternatives?

mstorsjo updated this revision to Diff 443461.Jul 9 2022, 2:58 PM

Reduce the scope and impact of the change: When telling Clang to include a specifically named PCH file, keep tolerating the same set of mismatches as before. (Clang has been tolerating such differences for over 10 years, and there are a number of tests specifically for such fuzzy matches.) When using a GCC style directory with multiple alternative PCH files, require an exact match, to avoid the case where multiple ones are acceptable and Clang erroneously picks the first seemingly acceptable one.

In this form, the patch no longer breaks other testcases than the one that is meant to be fixed/changed.

This still has a small risk of breaking someone's setup, but as this makes Clang match GCC's behaviour for the compatible option, the risk is probably quite small.

(A more comprehensive solution to avoid breaking that case, would be to iterate over the PCH directory, first requiring a strict match, and if none is found, iterate over the directory again, tolerating inexact matches like before. But I don't think that's necessary.)

Seeking additional opinions on the PCH change: @dexonsmith @arphaman

The use case makes sense to me. It seems reasonable to me that the compiler shouldn't use the same PCH file with mismatched defines on the command line.

clang/lib/Serialization/ASTReader.cpp
637–649

These feel like they should be a "validation level" enum. Multiple optional boolean parameters in sequence is usually a hazard for future refactorings.

I haven’t reviewed the details of the patch and won’t have time to do so (at least for a while), but the description of the intended (more narrow) scope SGTM. With the scope limited to GCC directories this should be fine.

There are cases where it’s safe to have mismatched defines (e.g., if a define can be proven during a cheap dependency scan not to matter for a given PCH, it’s better to canonicalize away the define and share the artifact more broadly) but if I understand correctly this patch won’t preclude compiler smarts like that.

If I’m right, then I think this can go in once @rnk is happy. (If I’m wrong, maybe better to wait for one of the people I pinged.)

I haven’t reviewed the details of the patch and won’t have time to do so (at least for a while), but the description of the intended (more narrow) scope SGTM. With the scope limited to GCC directories this should be fine.

There are cases where it’s safe to have mismatched defines (e.g., if a define can be proven during a cheap dependency scan not to matter for a given PCH, it’s better to canonicalize away the define and share the artifact more broadly) but if I understand correctly this patch won’t preclude compiler smarts like that.

Yup. Any implementation of such logic hasn’t materialized during the 10 years since todos hinting that we should implement that, but it doesn’t seem to be a big practical issue either, outside of false positive matches with gcc PCH directories - so I guess it’s not a high priority in practice.

If I’m right, then I think this can go in once @rnk is happy. (If I’m wrong, maybe better to wait for one of the people I pinged.)

Yeah, I’d be happy to add a validation level enum to make things a bit clearer - I’ll try to revise the patch soon.

mstorsjo updated this revision to Diff 448708.Jul 29 2022, 1:24 PM

Update the parameters to the internal checkPreprocessorOptions function to use a three-state enum instead of having two separate bool flags.

The 15.x release already has been branched, but I'd like to have this backported and included if possible (the regression risk should be quite small as long as it's restricted to GCC PCH directories). To handle that, I'd omit the release notes when pushing to the main branch (as the release notes document differs between the branches) and manually add that to the release branch, if this is accepted for backport.

There are cases where it’s safe to have mismatched defines (e.g., if a define can be proven during a cheap dependency scan not to matter for a given PCH, it’s better to canonicalize away the define and share the artifact more broadly) but if I understand correctly this patch won’t preclude compiler smarts like that.

Yup. Any implementation of such logic hasn’t materialized during the 10 years since todos hinting that we should implement that, but it doesn’t seem to be a big practical issue either, outside of false positive matches with gcc PCH directories - so I guess it’s not a high priority in practice.

Note, FWIW, that clang-scan-deps is being used to canonicalize command-lines as of the last year; e.g., there are patches up (maybe committed?) to remove unused search paths for implicitly-discovered explicit modules. Finding unused defines is a bit harder and less urgent (search path explosion had to be fixed for performance parity with on-demand implicit modules, which have an unsound optimization of ignoring search path differences), but there’s now a framework where it can be reasonably implemented if/when it’s identified as a major bottleneck in builds.

(Fallout from unsoundness in on-demand implicit modules is a big reason progress has been slow in this area.)

@rnk Can you have another look here? There doesn’t really seem to be anyone else who’s able to give it a review at the moment, but @dexonsmith seemed ok with the idea.

Do you think it would make sense to introduce a command-line flag that would control this behavior?

Also, I'd like to test this patch internally to see if it affects anything on our end. I should be able to do so next week.

Do you think it would make sense to introduce a command-line flag that would control this behavior?

Hmm - I don’t think it’d be necessary. If using the gcc style PCH directory, I would also kinda expect it to be used in a setting where it’s occasionally built with GCC (which requires strict matches) or with a build system that enforces such a setup anyway.

Also, I'd like to test this patch internally to see if it affects anything on our end. I should be able to do so next week.

Thanks, that’d be very much appreciated!

rnk accepted this revision.Aug 5 2022, 2:09 PM

I don't have any concerns, this seems reasonable to me.

Since Jan offered to test it, please wait to hear back before landing.

This revision is now accepted and ready to land.Aug 5 2022, 2:09 PM

Also, I'd like to test this patch internally to see if it affects anything on our end. I should be able to do so next week.

Ping @jansvoboda11 - any update on testing this?

mstorsjo updated this revision to Diff 451434.Aug 10 2022, 6:20 AM

Rebased, to fix (trivial) conflicts in the release notes file.

jansvoboda11 accepted this revision.Aug 10 2022, 10:55 AM

LGTM, no issues on our side.