This is an archive of the discontinued LLVM Phabricator instance.

[clang] Add the flag -ffile-reproducible
ClosedPublic

Authored by ayzhao on Mar 30 2022, 2:58 PM.

Details

Summary

When Clang generates the path prefix (i.e. the path of the directory
where the file is) when generating FILE, __builtin_FILE(), and
std::source_location, Clang uses the platform-specific path separator
character of the build environment where Clang _itself_ is built. This
leads to inconsistencies in Chrome builds where Clang running on
non-Windows environments uses the forward slash (/) path separator
while Clang running on Windows builds uses the backslash (\) path
separator. To fix this, we add a flag -ffile-reproducible (and its
inverse, -fno-file-reproducible) to have Clang use the target's
platform-specific file separator character.

Additionally, the existing flags -fmacro-prefix-map and
-ffile-prefix-map now both imply -ffile-reproducible. This can be
overriden by setting -fno-file-reproducible.

[0]: https://crbug.com/1310767

Diff Detail

Event Timeline

ayzhao created this revision.Mar 30 2022, 2:58 PM
ayzhao requested review of this revision.Mar 30 2022, 2:58 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 30 2022, 2:58 PM
ayzhao added a subscriber: rnk.
ayzhao updated this revision to Diff 419277.Mar 30 2022, 3:01 PM

Remove errant diff

As said elsewhere, I think this is too simplistic.

It's good to make this dependent on the triple, instead of on the host platform. But we should have a flag that controls which slash direction to use on windows triples, since different people will want different things.

And since most people who use clang-cl run it on Windows, the default for that flag should imho stay a backslash (but projects can add the forward direction flag if they want).

(I don't think the argument about '\' in include lines applies to this patch (?))

hans added a comment.Mar 31 2022, 7:00 AM

we should have a flag that controls which slash direction to use on windows triples, since different people will want different things.
And since most people who use clang-cl run it on Windows, the default for that flag should imho stay a backslash (but projects can add the forward direction flag if they want).

I guess this is the part that's not entirely clear to me. Do people really *want* the backslash? Would anything break if we just use the forward slash?

This is MSVC's current behaviour:

C:\src\tmp>type a.cc foo\a.h

a.cc


#include "foo/a.h"

int main() { f(); return 0; }

foo\a.h


#include <stdio.h>

inline void f() { printf("%s\n", __FILE__); }

C:\src\tmp>cl /nologo a.cc && a.exe
a.cc
C:\src\tmp\foo/a.h

I think

C:\src\tmp\foo/a.h

looks pretty weird, and canonicalizing on slashes would be an improvement even if it diverges from MSVC.

(I don't think the argument about '\' in include lines applies to this patch (?))

It kinda does in that developers, even on Windows, already use forward slashes to refer to included files:

#include "foo/bar.h"

and so having the compiler prefer forward slashes for the whole filename is in line with that.

While this would make clang-cl diverge from MSVC, maybe this is one of those cases where we want to diverge because clang-cl's way is better. Unless it breaks something.

clang/lib/AST/Expr.cpp
2194–2195

I was going to say perhaps we should put a comment about why Windows needs different treatment, but as we'd need to put the comment in three different places, maybe this (remapPathPrefix() and make_preferred()) should be factored out to a utility function?

Maybe Martin or Aaron have opinions here too?

ayzhao added inline comments.Mar 31 2022, 10:52 AM
clang/lib/AST/Expr.cpp
2194–2195

Extracting these lines into a separate function (say, for example, handle_file_macro(...)?) makes sense. However, I'm not sure which file this function would belong:

  • The function would need to accept a LangOpts and a TargetInfo object. Both of these classes are Clang specific.
  • It shouldn't belong in llvm::sys::path::Style because the file declaring that namespace exists in the llvm directory, which shouldn't depend on things in Clang.
  • We could put it in LangOpts, but TargetInfo.h includes LangOpts.h, so this would create a circular dependency.
  • TargetInfo doesn't make much sense as it doesn't have any code generation / language specific logic.

WDYT?

When targeting Windows, the path separator used when targeting Windows depends on the build environment when Clang _itself_ is built. This leads to inconsistencies in Chrome builds where Clang running on non-Windows environments uses the forward slash (/) path separator while Clang running on Windows builds uses the backslash (\) path separator. To fix this, we make Clang use forward slashes whenever it is targeting Windows.

IMO, this problem formulation is a bit too narrow/specific - you'd have the same phenomenon if you'd cross compile things on Windows, targeting e.g. Linux. So I think the condition for normalizing paths would be if the build host OS is Windows, not based on the target OS. (I.e. in the current patch form, the normalization that is chosen, when targeting Windows from Unix, is a no-op anyway.)

Then secondly - if the source paths are relative paths, making them OS-agnostic in this way does make sense. But if they are absolute, they are pretty much by definition specific to the build host, and can't be expected to make sense on a different platform. So for such a case, there's less reason to try to normalize the paths. But as Windows path handling in most cases does tolerate paths with forward slashes, I guess it's ok to do anyway.

Overall, I think this is a sensible thing to do. I'm not entirely sure if an option for the behaviour is needed or not though - I guess that's the main question.

Do you know what cl.exe does to paths in pdb files? Does it write mixed-slashiness for foo/bar.h includes, or does it write backslashes throughout?

Windows can handle slashes, but several tools can't. I worry that if we do something different than cl, some random corner case might break (dbghelp, or some source server thing or some custom debug info processor somewhere).

Then secondly - if the source paths are relative paths, making them OS-agnostic in this way does make sense. But if they are absolute, they are pretty much by definition specific to the build host,

We use /pdbsourcepath:X:\fake\prefix to write a deterministic absolute path to the output file at link time (and -fdebug-compilation-dir . to get deterministic compiler output – see https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html). The motivation is to get exactly the same output when building on linux and windows hosts.

Windows can handle slashes, but several tools can't. I worry that if we do something different than cl, some random corner case might break (dbghelp, or some source server thing or some custom debug info processor somewhere).

Yup, that's a valid concern. Especially PDB handling is pretty particular about requiring backslashes, as far as I've seen.

Then secondly - if the source paths are relative paths, making them OS-agnostic in this way does make sense. But if they are absolute, they are pretty much by definition specific to the build host,

We use /pdbsourcepath:X:\fake\prefix to write a deterministic absolute path to the output file at link time (and -fdebug-compilation-dir . to get deterministic compiler output – see https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html). The motivation is to get exactly the same output when building on linux and windows hosts.

Ok, nice.

Do you agree that if we go down this path doing this, we should also do it whenever the host OS is windows, i.e. also for windows->linux cross compilation, for consistency?

thakis added a comment.EditedApr 1 2022, 7:03 AM

Do you agree that if we go down this path doing this, we should also do it whenever the host OS is windows, i.e. also for windows->linux cross compilation, for consistency?

Yes, to me it makes sense that this should depend on the target triple, not on the host system (ie always use slashes when targeting non-Windows, and jury still a bit out on what exactly to do when targeting Windows).

hans added a comment.Apr 1 2022, 8:47 AM

Do you know what cl.exe does to paths in pdb files? Does it write mixed-slashiness for foo/bar.h includes, or does it write backslashes throughout?

Looks like just backslashes:

cl /nologo /Zi a.cc && a.exe
llvm-pdbutil.exe dump --files \src\tmp\a.pdb | grep foo
- (MD5: B97B3B57F55C87D06A0379A4FBCA51F0) C:\src\tmp\foo\a.h

(To make sure llvm-pdbutil didn't canonicalize, I looked at the pdb in vim and the backslashes are there.)

It appears that lld does the same.

Windows can handle slashes, but several tools can't. I worry that if we do something different than cl, some random corner case might break (dbghelp, or some source server thing or some custom debug info processor somewhere).

Since cl is already mixing back and forward slashes, any tools consuming FILE can presumably handle both. So diverging msvc by canonicalizing here doesn't seem that risky, we just have to decide in which direction.

clang/lib/AST/Expr.cpp
2194–2195

It would have to be somewhere in Clang.
Maybe it could be a method in Preprocessor.

ayzhao updated this revision to Diff 420350.Apr 4 2022, 5:12 PM

Extract logic into a separate function

ayzhao marked 2 inline comments as done.Apr 4 2022, 5:13 PM

Maybe Martin or Aaron have opinions here too?

Naively, this seems wrong to me. Yes, Windows sometimes allows you to use forward slashes, but that is not the path separator which Windows users are most often presented with in terms of documentation, screen shots, presentation in Explorer, etc. My feeling is that the default behavior on Windows needs to be to use backslashes and not forward slashes. Having an option to canonicalize in the other direction would be reasonable, as does differing what the default is based on the target.

hans added a comment.Apr 5 2022, 7:43 AM

My feeling is that the default behavior on Windows needs to be to use backslashes and not forward slashes.

Okay, how would folks feel about always canonicalizing __FILE__ etc. to use backslashes when targeting Windows?

My feeling is that the default behavior on Windows needs to be to use backslashes and not forward slashes.

Okay, how would folks feel about always canonicalizing __FILE__ etc. to use backslashes when targeting Windows?

FWIW (I have little Windows experience and no stake in this!), defaulting to backslashes more sense to me. Also seems easy enough to add a preprocessor option to configure this.

I'm curious though, if you're building on POSIX (so the file starts out as POSIX) and targeting Windows, does calling make_preferred() actually give you something reasonable in the general case? Or should it be used *only* when there's a hit in LangOpts.remapPathPrefix()? Sorry if this has been discussed and rejected; I haven't followed the full thread; feel free to ignore if it's not relevant; but it seems to me that when -ffile-prefix-map is NOT being used, then __FILE__ ought to correspond to the build environment (if anything, it should use Style::native).

Normalizing to \ seems better to me than normalizing to / too.

I abstractly like changing as little as necessary at every stage, which in this case would mean changing just the slashiness of slashes that clang itself adds, so I'm still weakly in favor of that. But I also won't insist on it.

My feeling is that the default behavior on Windows needs to be to use backslashes and not forward slashes.

Okay, how would folks feel about always canonicalizing __FILE__ etc. to use backslashes when targeting Windows?

FWIW (I have little Windows experience and no stake in this!), defaulting to backslashes more sense to me. Also seems easy enough to add a preprocessor option to configure this.

I'm curious though, if you're building on POSIX (so the file starts out as POSIX) and targeting Windows, does calling make_preferred() actually give you something reasonable in the general case? Or should it be used *only* when there's a hit in LangOpts.remapPathPrefix()? Sorry if this has been discussed and rejected; I haven't followed the full thread; feel free to ignore if it's not relevant; but it seems to me that when -ffile-prefix-map is NOT being used, then __FILE__ ought to correspond to the build environment (if anything, it should use Style::native).

(We don't use -ffile-prefix-map. As https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html explains, using it means your commandlines *must* be machine-dependent, which we don't want. So we use -ffile-compilation-dir=. instead which has the same effect but doesn't have that drawback.)

So, the general consensus seems to be that we should use backslashes when targeting Windows.

I implemented using only backslashes for Windows; however, clang/test/SemaCXX/destructor.cpp fails when running on Linux with the following error (among other errors, but the one below is the most important).

...
error: 'error' diagnostics seen but not expected:
  Line 32: '<REDACTED_PATH_TO_LLVM_CHECKOUT_WITH_BACKSLASHES>\\clang\\test\\SemaCXX\\destructor.cpp' file not found
...

The reason for this is that the test has Clang target windows and the test also has the statement #include __FILE__.

One way to fix this would be to have every macro that accepts a path internally convert the path to the build environment's path format, but TBH I'm not sure. What do you all think?

So, the general consensus seems to be that we should use backslashes when targeting Windows.

I implemented using only backslashes for Windows; however, clang/test/SemaCXX/destructor.cpp fails when running on Linux with the following error (among other errors, but the one below is the most important).

...
error: 'error' diagnostics seen but not expected:
  Line 32: '<REDACTED_PATH_TO_LLVM_CHECKOUT_WITH_BACKSLASHES>\\clang\\test\\SemaCXX\\destructor.cpp' file not found
...

The reason for this is that the test has Clang target windows and the test also has the statement #include __FILE__.

One way to fix this would be to have every macro that accepts a path internally convert the path to the build environment's path format, but TBH I'm not sure. What do you all think?

Wow, #include __FILE__ is kind of amazing...

It sounds like you're suggesting changing the #include directive to start canonicalizing its argument. I'm not a fan of that idea; header search is hard enough to reason about.

I suggest, instead:

  • Only canonicalize __FILE__ for the target platform when there's a command-line flag that suggests it's okay. I suggest adding -ffile-reproducible, which could be implied by -ffile-prefix-map but also specified separately when -ffile-prefix-map isn't being used.
  • When __FILE__ is being canonicalized, it's okay to fail on #include __FILE__ if the canonicalized path doesn't work as an argument to #include in the current build environment.

This would mean that -ffile-reproducible makes #include __FILE__ non-portable, which would be fine as long as #include __FILE__ isn't load-bearing for real life usecases. My guess is that it'd be okay?

What do others think?

  • Only canonicalize __FILE__ for the target platform when there's a command-line flag that suggests it's okay. I suggest adding -ffile-reproducible, which could be implied by -ffile-prefix-map but also specified separately when -ffile-prefix-map isn't being used.

(Could also be implied by -ffile-compilation-dir=)

hans added a comment.Apr 14 2022, 6:54 AM

So, the general consensus seems to be that we should use backslashes when targeting Windows.

I implemented using only backslashes for Windows; however, clang/test/SemaCXX/destructor.cpp fails when running on Linux with the following error (among other errors, but the one below is the most important).

...
error: 'error' diagnostics seen but not expected:
  Line 32: '<REDACTED_PATH_TO_LLVM_CHECKOUT_WITH_BACKSLASHES>\\clang\\test\\SemaCXX\\destructor.cpp' file not found
...

The reason for this is that the test has Clang target windows and the test also has the statement #include __FILE__.

That seems like an unusual use case of __FILE__. I wonder if we could just change the test to #include "destructor.cpp" instead and not worry about the case of #include __FILE__ in cross-builds targeting Windows.

ayzhao added a subscriber: rsmith.Apr 14 2022, 10:52 AM

So, the general consensus seems to be that we should use backslashes when targeting Windows.

I implemented using only backslashes for Windows; however, clang/test/SemaCXX/destructor.cpp fails when running on Linux with the following error (among other errors, but the one below is the most important).

...
error: 'error' diagnostics seen but not expected:
  Line 32: '<REDACTED_PATH_TO_LLVM_CHECKOUT_WITH_BACKSLASHES>\\clang\\test\\SemaCXX\\destructor.cpp' file not found
...

The reason for this is that the test has Clang target windows and the test also has the statement #include __FILE__.

That seems like an unusual use case of __FILE__. I wonder if we could just change the test to #include "destructor.cpp" instead and not worry about the case of #include __FILE__ in cross-builds targeting Windows.

This line was introduced in D37235 (coincidentally written by @thakis), and in that review, @rsmith made a comment on this exact line that said

Do we guarantee that __FILE__ names a path that can be used to include the current file? In other tests, we add -include %s to the RUN: line to model this situation.

ayzhao updated this revision to Diff 427771.May 6 2022, 3:29 PM

Implement -ffile-reproducible

ayzhao retitled this revision from [clang] Use forward slash as the path separator for Windows in __FILE__, __builtin_FILE(), and std::source_location to [clang] Add the flag -ffile-reproducible.May 6 2022, 3:31 PM
ayzhao edited the summary of this revision. (Show Details)
ayzhao updated this revision to Diff 427773.May 6 2022, 3:36 PM

missing newline

ayzhao updated this revision to Diff 427774.May 6 2022, 3:39 PM

another missing newline

ayzhao added a comment.May 6 2022, 3:39 PM

So, the general consensus seems to be that we should use backslashes when targeting Windows.

I implemented using only backslashes for Windows; however, clang/test/SemaCXX/destructor.cpp fails when running on Linux with the following error (among other errors, but the one below is the most important).

...
error: 'error' diagnostics seen but not expected:
  Line 32: '<REDACTED_PATH_TO_LLVM_CHECKOUT_WITH_BACKSLASHES>\\clang\\test\\SemaCXX\\destructor.cpp' file not found
...

The reason for this is that the test has Clang target windows and the test also has the statement #include __FILE__.

One way to fix this would be to have every macro that accepts a path internally convert the path to the build environment's path format, but TBH I'm not sure. What do you all think?

Wow, #include __FILE__ is kind of amazing...

It sounds like you're suggesting changing the #include directive to start canonicalizing its argument. I'm not a fan of that idea; header search is hard enough to reason about.

I suggest, instead:

  • Only canonicalize __FILE__ for the target platform when there's a command-line flag that suggests it's okay. I suggest adding -ffile-reproducible, which could be implied by -ffile-prefix-map but also specified separately when -ffile-prefix-map isn't being used.
  • When __FILE__ is being canonicalized, it's okay to fail on #include __FILE__ if the canonicalized path doesn't work as an argument to #include in the current build environment.

This would mean that -ffile-reproducible makes #include __FILE__ non-portable, which would be fine as long as #include __FILE__ isn't load-bearing for real life usecases. My guess is that it'd be okay?

What do others think?

Patch has been updated to implement -ffile-reproducible

aaron.ballman added inline comments.May 9 2022, 8:57 AM
clang/include/clang/Driver/Options.td
3020–3023 ↗(On Diff #427774)

These are both getting a bit long, can you wrap to 80 col?

clang/lib/Driver/ToolChains/Clang.cpp
1489–1490 ↗(On Diff #427774)

I'm reasonably sure this does the same thing (but you should check to be sure).

clang/lib/Frontend/CompilerInvocation.cpp
4102–4105 ↗(On Diff #427774)

I was hoping there was a clean way to do this using hasFlag() but I wasn't really excited by what I was getting from that; I think this is okay, but no need to qualify the option names.

clang/lib/Lex/PPMacroExpansion.cpp
1895–1900

LLVM style nit for single-line if/else clauses (the diff view makes it look more scary than it is).

ayzhao updated this revision to Diff 428121.May 9 2022, 10:13 AM
ayzhao marked 4 inline comments as done.

code review comments

hans accepted this revision.May 9 2022, 11:23 AM

I still wish this could just be the default behavior, but this flag seems a good compromise.

LGTM (but please wait for Aaron too)

clang/include/clang/Driver/Options.td
1516 ↗(On Diff #428121)

I think "build environment" is maybe confusing, and "host" might be the better word.

This revision is now accepted and ready to land.May 9 2022, 11:23 AM
ayzhao updated this revision to Diff 428139.May 9 2022, 11:26 AM

Clarify --fno-file-reproducible HelpText

ayzhao marked an inline comment as done.May 9 2022, 11:27 AM
This revision was landed with ongoing or failed builds.May 11 2022, 2:05 PM
Closed by commit rG6398f3f2e904: [clang] Add the flag -ffile-reproducible (authored by ayzhao, committed by hans). · Explain Why
This revision was automatically updated to reflect the committed changes.