This is an archive of the discontinued LLVM Phabricator instance.

[Preprocessor] Implement -fminimize-whitespace.
ClosedPublic

Authored by Meinersbur on Jun 19 2021, 6:32 PM.

Details

Summary

This patch adds the -fminimize-whitespace with the following effects:

  • If combined with -E, remove as much non-line-breaking whitespace as possible
  • If combined with -E -P, removes as much whitespace as possible, including line-breaks.

The motivation is to reduce the amount of insignificant changes in the preprocessed output with source files where only whitespace has been changes (add/remove comments, clang-format, etc.) which is in particular useful with ccache.

A patch for ccache for using this flag has been proposed to ccache as well: https://github.com/ccache/ccache/pull/815, which will use -fnormalize-whitespace when clang-13 has been detected (assuming this will be merged before the release fork), and additionally uses -P in "unify_mode". ccache already had a unify_mode in an older version which was removed because of problems that using the preprocessor itself does not have (such that the custom tokenizer did not recognize C++11 raw strings).

This patch slightly reorganizes which part is responsible for adding newlines that are required for semantics. It is now either startNewLineIfNeeded() or MoveToLine() but never both; this avoids the ShouldUpdateCurrentLine workaround and avoids redundant lines being inserted in some cases. It also fixes a mandatory newline not inserted after a _Pragma("...") that is expanded into a #pragma.

Diff Detail

Event Timeline

Meinersbur created this revision.Jun 19 2021, 6:32 PM
Meinersbur requested review of this revision.Jun 19 2021, 6:32 PM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
Meinersbur edited the summary of this revision. (Show Details)
  • Undo "typechange" changes
Meinersbur edited the summary of this revision. (Show Details)Jun 19 2021, 7:27 PM
Meinersbur edited reviewers, added: aaron.ballman, rsmith, ilya-biryukov, dblaikie, juliehockett, rnk; removed: jdoerfert, Restricted Project.
Meinersbur removed projects: Restricted Project, Restricted Project.

This is probably more @aaron.ballman 's wheelhouse, but for my money this seems pretty problematic - will make quoted text in compiler diagnostics weird/difficult to read, etc.

Do you have a particular use case that has exceptionally frequent whitespace-only changes?
Or is this something you think would be generally applicable? How applicable, do you think? Do you have some data showing a % of rebuilds that would be avoided in practice/based on real code bases, etc?

This is probably more @aaron.ballman 's wheelhouse, but for my money this seems pretty problematic - will make quoted text in compiler diagnostics weird/difficult to read, etc.

It is not meant for human readability but automatic processing. The linked ccache patch either re-runs the compiler with on the original source file if there is any diagnostic output (run_second_cpp=false) or(inclusive) uses the preprocessor output only to determine whether there was a significant change in the first place (run_second_cpp=true, default).

To discourage use by humans, we could make the flag available only behind -Xclang, but I'd expect people to only use the flag if they have a need for it anyway. Compiler diagnostics invoked on -E -P output is already not very useful; -fnormalize-whitespace would potentially cause the entire source be on a single line. If this is a show-stopper, I could try making the diagnostic's quoted text print only an excerpt of a line if it becomes too long.

Do you have a particular use case that has exceptionally frequent whitespace-only changes?

Every edit of a (multi-line) comment and running of clang-format. The former makes me hesitant to improve comments in central header files because I know it means that I have to rebuild the entire code base; it should not be a reason to not improve documentation. The latter is best-practice before any commit. Again, if it includes a central header file, it means rebuilding significant portions.

ccache uses preprocessor mode by default while it also supports strict binary comparison ("direct mode") and I assume there is a reason for this. However, every non-intra-line change causes the preprocessor output to change due to the line number in the #line directive changing, making the preprocessor mode much less effective than it could be. Very few NF changes take place within a single line only.

Unlike gcc, clang's -E -P output still includes empty lines (unless there are more than 8 consecutive empty lines). That is, there is hardly a benefit to use -P output in ccache with clang.

Or is this something you think would be generally applicable? How applicable, do you think? Do you have some data showing a % of rebuilds that would be avoided in practice/based on real code bases, etc?

If it supports my case, I can work with builds using ccache for a while: one configured using -fnormalize-whitespace, the other without, and compare ccache statistics. I have used it already, but never comparatively over the same time frame. However, I think the above argument is already pretty good.

Some bikeshedding: Calls to AvoidConcat could be avoided by always inserting a space between tokens at the cost of making the output larger. In the current form, the flag could also be named -fminimize-whitespace.

Meinersbur edited the summary of this revision. (Show Details)Jun 21 2021, 4:53 PM

One of the concerns I'd have, for instance (have you done some broad testing of these patches on sizable code bases?) is that it wouldn't surprise me if clang had some scalability bugs/issues with very long source lines - so it might be necessary to introduce some (arbitrary?) newlines to break up the code. Though I'm not sure - no need to do that pre-emptively, but might be good to have some data that indicates whether this might be a problem or not.

During a trial phase while compiling everything twice with ccache I got the following results.

Only unify_mode::

$ ccache -d . -s
cache directory                     .
primary config                      ./ccache.conf
secondary config (readonly)         /home/meinersbur/install/ccache/release/etc/ccache.conf
stats updated                       Fri Jun 25 02:00:22 2021
stats zeroed                        Wed Jun 23 03:22:36 2021
cache hit (direct)                  1001
cache hit (preprocessed)            5904
cache miss                         10717
cache hit rate                     39.18 %
compile failed                         5
cleanups performed                     0
files in cache                     38182
cache size                           1.9 GB
max cache size                       5.5 GB

unify_mode with -fnormalize-whitespace:

$ ccache -d . -s
cache directory                     .
primary config                      ./ccache.conf
secondary config (readonly)         /home/meinersbur/install/ccache/release/etc/ccache.conf
stats updated                       Fri Jun 25 02:04:19 2021
stats zeroed                        Wed Jun 23 03:22:31 2021
cache hit (direct)                  1001
cache hit (preprocessed)            2663
cache miss                         13957
cache hit rate                     20.79 %
compile failed                         5
cleanups performed                     0
files in cache                     44661
cache size                           2.4 GB
max cache size                       5.5 GB

Admittedly, I focused on changes (of Clang and Polly) like refactoring, improving comments, minimize difference to upstream (clang-)formatting etc. during the testing.

The difference is most pronounced with changes such as rG3e8d1e8b12ba9017b861fff94afdd4a29b39de17:

cache hit (direct)                     0
cache hit (preprocessed)               0
cache miss                          2245
cache hit rate                      0.00 %

vs

cache hit (direct)                     0
cache hit (preprocessed)            2235
cache miss                            10
cache hit rate                     99.55 %

(the misses come from compilations with warning diagnostics)

One of the concerns I'd have, for instance (have you done some broad testing of these patches on sizable code bases?) is that it wouldn't surprise me if clang had some scalability bugs/issues with very long source lines - so it might be necessary to introduce some (arbitrary?) newlines to break up the code. Though I'm not sure - no need to do that pre-emptively, but might be good to have some data that indicates whether this might be a problem or not.

I found no such issues during my trials. However, I think the request is understandable an I would implement it on request. It introduces a new problem having to determine where no newlines mat be introduced (e.g. within a #pragma).

Admittedly, I focused on changes (of Clang and Polly) like refactoring, improving comments, minimize difference to upstream (clang-)formatting etc. during the testing.

Yeah, I was more curious about general purpose/average changes, but anyway.

One of the concerns I'd have, for instance (have you done some broad testing of these patches on sizable code bases?) is that it wouldn't surprise me if clang had some scalability bugs/issues with very long source lines - so it might be necessary to introduce some (arbitrary?) newlines to break up the code. Though I'm not sure - no need to do that pre-emptively, but might be good to have some data that indicates whether this might be a problem or not.

I found no such issues during my trials. However, I think the request is understandable an I would implement it on request. It introduces a new problem having to determine where no newlines mat be introduced (e.g. within a #pragma).

Yeah, no need to get ahead of that - just something to be aware of/on the look out if this feature ends up in use.

One other thing: This wouldn't be usable when using debug info, presumably, because it'd refer to the wrong lines, etc.

Meinersbur added a comment.EditedJun 27 2021, 11:23 PM

One other thing: This wouldn't be usable when using debug info, presumably, because it'd refer to the wrong lines, etc.

This is considered in the ccache patch. By default (unless a corresponding "sloppiness" option is used), only direct hits (by binary file comparison) are used if: There is compiler output, debug info is generated, or .incbin assembly instruction is used.

However, this is orthogonal to -fnormalize-whitespace. It is a mitigation of using -E -P as preprocessor intermediate.

I'm still mulling over the feature, but I have some nits with the patch while I was exploring it. I share the concerns raised by @dblaikie and would add that it's very common for implementers to ask developers to run their code through -E mode to submit preprocessed output in order to reproduce a customer issue with the compiler, and I worry that uses of this flag will have unintended consequences in that scenario. However, I don't yet have a concrete "this code demonstrates the problem I'm worried about" example to discuss, so this worry may be unfounded. The "very long line" example mentioned by @dblaikie is sort of along these lines (sorry for the bad pun).

clang/docs/ClangCommandLineReference.rst
2480

You might want to wrap this to 80-col limits.

aaron.ballman added inline comments.Jun 29 2021, 8:36 AM
clang/lib/Frontend/PrintPreprocessedOutput.cpp
138
140
177

Can Tok be const Token & instead?

184
639
Meinersbur marked 6 inline comments as done.
Meinersbur added a comment.EditedJun 29 2021, 1:43 PM

... would add that it's very common for implementers to ask developers to run their code through -E mode to submit preprocessed output in order to reproduce a customer issue with the compiler, and I worry that uses of this flag will have unintended consequences in that scenario.

Why would one add -fnormalize-whitespace for this use case?

The "very long line" example mentioned by @dblaikie is sort of along these lines (sorry for the bad pun).

I can add forced line breaks (after/before 80 cols? configurable?) if requested.

clang/lib/Frontend/PrintPreprocessedOutput.cpp
177

Note that this is a renamed bool HandleFirstTokOnLine(Token &Tok). Renamed because it is now called for every token.

ping

It would be nice if we could get this into clang 13 since the ccache part detects support for -fnormalize-whitespace when detecting clang 13+. Probing the compiler for whether it supports -fnormalize-whitespace on every run would be prohibitively expensive.

... would add that it's very common for implementers to ask developers to run their code through -E mode to submit preprocessed output in order to reproduce a customer issue with the compiler, and I worry that uses of this flag will have unintended consequences in that scenario.

Why would one add -fnormalize-whitespace for this use case?

I don't think they'd *add* it, I worry it's already used by their build system for reasons unknown and the developer replicates it when reporting the bug because they're not looking at what flags can be removed.

The "very long line" example mentioned by @dblaikie is sort of along these lines (sorry for the bad pun).

I can add forced line breaks (after/before 80 cols? configurable?) if requested.

I don't think that's necessary just yet. If there's an issue in practice, it seems like we could address it once we have the real world use in front of us. However, It'd help my confidence if you were able to run this functionality over a large corpus of code to see if any issues do pop out, if that's plausible for you.

That said, I think I've convinced myself that this functionality is reasonable (if a bit novel), but I'd like to see it diagnosed when used outside of a preprocessor invocation because it really serves no purpose there. I also think we should rename it because "normalize whitespace" can be ambiguous depending on what context you're reading the words in.

clang/docs/ClangCommandLineReference.rst
2484

Should this option be ignored when not performing a preprocessing action (and documented as such)?

clang/include/clang/Driver/Options.td
1789

I think I'd feel most comfortable if this didn't use "normalize" in the name. When I hear that, I think it's going to be doing something like converting all (perhaps funky Unicode) notions of whitespace into ASCII space characters. But this option doesn't actually do that -- it's removing as much whitespace from the preprocessed output as possible. Would it be better named as -felide-unnecessary-whitespace or something along those lines?

Meinersbur marked an inline comment as done.Jul 18 2021, 10:05 PM

... would add that it's very common for implementers to ask developers to run their code through -E mode to submit preprocessed output in order to reproduce a customer issue with the compiler, and I worry that uses of this flag will have unintended consequences in that scenario.

Why would one add -fnormalize-whitespace for this use case?

I don't think they'd *add* it, I worry it's already used by their build system for reasons unknown and the developer replicates it when reporting the bug because they're not looking at what flags can be removed.

I made the flag an error if used without -E, so the build system cannot add it.

Flags such as -P, -fuse-line-directives, -frewrite-imports and -frewrite-includes are also silently ignored without -E. This requirement seems exclusive for -felide-unnecessary-whitespace. It also suggests that it is not a problematic scenario.

The "very long line" example mentioned by @dblaikie is sort of along these lines (sorry for the bad pun).

I can add forced line breaks (after/before 80 cols? configurable?) if requested.

I don't think that's necessary just yet. If there's an issue in practice, it seems like we could address it once we have the real world use in front of us. However, It'd help my confidence if you were able to run this functionality over a large corpus of code to see if any issues do pop out, if that's plausible for you.

I ran it (using ccache) over llvm-project and the llvm-test-suite. Is there another large corpus do you have in mind?

I'd like to see it diagnosed when used outside of a preprocessor invocation because it really serves no purpose there.

I made the flag an error if not used without -E.

I also think we should rename it because "normalize whitespace" can be ambiguous depending on what context you're reading the words in.

Changed to your suggested -felide-unnecessary-whitespace, although I think name will not allow use to insert whitespace e.g. for keeping the line length down. If that's not required, I would prefer -fminimize-whitespace over -felide-unnecessary-whitespace`.

clang/docs/ClangCommandLineReference.rst
2484

I changed it to an error if used without -E.

clang/include/clang/Driver/Options.td
1789

I don't think it is ambiguous what whitespace means for the preprocessor. Currently, -E already converts into '\n' and '\n' only.

I don't like the suggested name but still applied it to this patch. I'd prefer -fminimize-whitespace. However, the name would be wrong if we add additional whitespace such as line-breaks to avoid the long line problem, of we unconditionally always emit a space between tokens to not having to call AvoidConcat which would still serve the same goal.

Meinersbur marked an inline comment as done.
  • Rename -fnormalize-whitespace to -felide-unnecessary-whitespace
  • error when used without -E
  • Reabse

I don't think they'd *add* it, I worry it's already used by their build system for reasons unknown and the developer replicates it when reporting the bug because they're not looking at what flags can be removed.

I made the flag an error if used without -E, so the build system cannot add it.

Flags such as -P, -fuse-line-directives, -frewrite-imports and -frewrite-includes are also silently ignored without -E. This requirement seems exclusive for -felide-unnecessary-whitespace. It also suggests that it is not a problematic scenario.

Perhaps it's not a problematic scenario, but I'd consider those cases to also be bugs. Silently ignoring things the user wrote (either in code or on the command line) is typically user-hostile.

The "very long line" example mentioned by @dblaikie is sort of along these lines (sorry for the bad pun).

I can add forced line breaks (after/before 80 cols? configurable?) if requested.

I don't think that's necessary just yet. If there's an issue in practice, it seems like we could address it once we have the real world use in front of us. However, It'd help my confidence if you were able to run this functionality over a large corpus of code to see if any issues do pop out, if that's plausible for you.

I ran it (using ccache) over llvm-project and the llvm-test-suite. Is there another large corpus do you have in mind?

A Linux or BSD distro worth of code would be really nice, but I don't insist. The trouble with llvm-project is that it has a somewhat uniform style for code, and I've found that tools in distro packages tend to exercise more interesting cases sometimes.

I also think we should rename it because "normalize whitespace" can be ambiguous depending on what context you're reading the words in.

Changed to your suggested -felide-unnecessary-whitespace, although I think name will not allow use to insert whitespace e.g. for keeping the line length down. If that's not required, I would prefer -fminimize-whitespace over -felide-unnecessary-whitespace`.

I'd be perfectly happy with -fminimize-whitespace as well if that's your preference.

clang/docs/ClangCommandLineReference.rst
2484

Thanks! Can you add that info to the documentation here?

  • Rename -felide-unnecessary-whitespace to -fminimize-whitespace
  • Add -fminimize-whitespace only vlid to be used to docs
  • Adjust some code comments
  • Rebase

but for my money this seems pretty problematic - will make quoted text in compiler diagnostics weird/difficult to read, etc.

FWIW, clang has a line length limit of 4096 for printing diagnostics.

Meinersbur retitled this revision from [Preprocessor] Implement -fnormalize-whitespace. to [Preprocessor] Implement -fminimize-whitespace..Jul 20 2021, 2:47 PM
Meinersbur edited the summary of this revision. (Show Details)
Meinersbur marked 2 inline comments as done.
  • Error if -fminimize-whitespace is applied to asm files
  • Join lines with escaped newlines
  • Added some FIXMEs for grandfathered bugs

I compiled the Linux kernel (default config, excludes most drivers. Takes ~18mins to compile) and indeed found two problems:

  1. Linux contains assembler-with-cpp files. When compiling with clang, it processes it with -E before passing the result to as. There are different rules on whether whitespace can be completely removed (AvoidConcat) in asm files. I changed ccache to not pass -fminimize-whitespace to assembler files and added an error if one tries.
  1. Line breaks outside of macros are (more) significant in assembler files. I tried to rely more on Tok.getLocation() than Tok.isAtStartOfLine() (which is pretty unreliable) to determine line breaks, which unfortunately caused differences even without -fminimize-whitespace. The patch is using Tok.isAtStartOfLine() again.

The clang-13 release branch will be created on July 27. Is there any hope to get this merged by then?

  • Revert test changes
aaron.ballman accepted this revision.Jul 22 2021, 4:42 AM

I compiled the Linux kernel (default config, excludes most drivers. Takes ~18mins to compile) and indeed found two problems:

Thank you, that's super helpful validation!

The clang-13 release branch will be created on July 27. Is there any hope to get this merged by then?

I think there is, but I'm an eternal optimist. :-) This LGTM, but you should wait a day or two before landing it in case @dblaikie or @rsmith have concerns.

clang/lib/Driver/ToolChains/Clang.cpp
6050–6053

One downside to this approach as opposed to using a switch is that it'll be easy to add new languages and forget to update this. However, we don't do that particularly often and this will loudly tell users about the lack of support, so I think it's fine as-is.

This revision is now accepted and ready to land.Jul 22 2021, 4:42 AM

I think there is, but I'm an eternal optimist. :-) This LGTM, but you should wait a day or two before landing it in case @dblaikie or @rsmith have concerns.

Of course.

clang/lib/Driver/ToolChains/Clang.cpp
6050–6053

I used the types::isABC functions because if a new input type is added, it will either be added to one of the isABC functions. If not, it would be save to not allow whitespace minimization with it until verified.

However, I did not think about the warning for missing enums. I am going to use a switch instead.

  • Introduce types::isDerivedFromC
  • Print input type in error message

Because of how large the switch construct would have been, I created a new function types::isDerivedFromC together with the other functions. These commonly have default-cases so compilers would not warn when Types.def is amended, but isDerivedFromC can be found between other functions that would need to be updated.

aaron.ballman accepted this revision.Jul 23 2021, 4:50 AM

Because of how large the switch construct would have been, I created a new function types::isDerivedFromC together with the other functions. These commonly have default-cases so compilers would not warn when Types.def is amended, but isDerivedFromC can be found between other functions that would need to be updated.

Thank you, I think that turned out cleanly!

clang/lib/Driver/Types.cpp
173 ↗(On Diff #360977)

Heh, I had to go look this one up. TIL... :-D

This revision was landed with ongoing or failed builds.Jul 25 2021, 9:39 PM
This revision was automatically updated to reflect the committed changes.

This broke some cases for me, where the output is missing some newlines, breaking the output badly. I’ll try to provide a repro…

This broke some cases for me, where the output is missing some newlines, breaking the output badly. I’ll try to provide a repro…

https://martin.st/temp/repro.tar.xz

clang -target x86_64-windows-gnu -E -P _mkerrcodes.h -Iinclude

The output (at the end) has missing newlines (e.g. around GPG_ERR_ETXTBSY).

Looking into it...

Meinersbur added a comment.EditedJul 27 2021, 5:08 PM

Proposed fix here: D106924

The reproducer had another whitespace difference before and after D104601: typedef __builtin_va_list __gnuc_va_list; instead of typedef __builtin_va_list __gnuc_va_list; (two spaces before "typedef" instead of one). I assume this is not the issue, especially since the source file (include\vadefs.h) actually has two spaces before "typedef".

romanovvlad added a subscriber: romanovvlad.EditedJul 29 2021, 4:30 AM

Hi @Meinersbur,

It seems the patch introduces one more regression. The following test doesn't pass on Windows:

// RUN: %clang -E %s -o %t.ii
// RUN: %clang %t.ii

#include "string.h"

int main() {
  return 0;
}

The following macro from vcruntime.h:

#define _CRT_BEGIN_C_HEADER            \
    __pragma(pack(push, _CRT_PACKING)) \
    extern "C" {

Becomes

#pragma pack(push, 8) extern "C" {

in the preprocessed file.

I'm not an expert in this code, but partially returning old behavior helped with that:

diff --git a/clang/lib/Frontend/PrintPreprocessedOutput.cpp b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
index b725956..b49b247 100644
--- a/clang/lib/Frontend/PrintPreprocessedOutput.cpp
+++ b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
@@ -772,6 +772,7 @@ static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
   bool IsStartOfLine = false;
   char Buffer[256];
   while (1) {
+    Callbacks->MoveToLine(Tok.getLocation(), /*RequireStartOfLine=*/false);
     // Two lines joined with line continuation ('\' as last character on the
     // line) must be emitted as one line even though Tok.getLine() returns two
     // different values. In this situation Tok.isAtStartOfLine() is false even

Could you please take a look?

alexfh added a subscriber: alexfh.Jul 29 2021, 4:07 PM

This commit changes the behavior of clang -E -P even when no -fminimize-whitespace is used. This breaks certain use cases like using clang to preprocess files for flex, which turns out to be sensitive to the presence of line breaks in places where C++ compilers aren't.

An isolated test case:

$ clang-old -E -x c++ -P - -o /tmp/pp.good
#define I(x, ...) \
 x { return X##x; }



#ifndef A
#define A(op, x) I(op, x)
#endif

A(foo, {







})
A(bar, {})
$ cat /tmp/pp.good
foo { return Xfoo; }
bar { return Xbar; }
$ clang-new -E -x c++ -P - -o /tmp/pp.bad
#define I(x, ...) \
 x { return X##x; }



#ifndef A
#define A(op, x) I(op, x)
#endif

A(foo, {







})
A(bar, {})
$ cat /tmp/pp.bad
foo { return Xfoo; }bar { return Xbar; }

Please fix or revert the commit.

Thanks!

This commit changes the behavior of clang -E -P even when no -fminimize-whitespace is used. This breaks certain use cases like using clang to preprocess files for flex, which turns out to be sensitive to the presence of line breaks in places where C++ compilers aren't.

An isolated test case:

$ clang-old -E -x c++ -P - -o /tmp/pp.good
#define I(x, ...) \
 x { return X##x; }



#ifndef A
#define A(op, x) I(op, x)
#endif

A(foo, {







})
A(bar, {})
$ cat /tmp/pp.good
foo { return Xfoo; }
bar { return Xbar; }
$ clang-new -E -x c++ -P - -o /tmp/pp.bad
#define I(x, ...) \
 x { return X##x; }



#ifndef A
#define A(op, x) I(op, x)
#endif

A(foo, {







})
A(bar, {})
$ cat /tmp/pp.bad
foo { return Xfoo; }bar { return Xbar; }

Please fix or revert the commit.

Thanks!

IIUC, c6b0b16c0f55c34f4eaa05184815bbbe97f4b750 was aimed to fix a similar issue, but it doesn't fix this specific case. If providing a proper fix quickly isn't feasible, consider reverting these patches.

@tstellar, FYI there are still issues being reported with this commit (which was made before the branch). One fix is committed and a backport request was made in https://llvm.org/PR51261, but there’s still a couple other outstanding issues here.

Looking at this once again, I'm pretty sure that reverting this is much safer than trying to fix forward. Unless there's a really trivial fix to the outstanding issues, I'll revert this later today.

I am working on a fix, but I wouldn't mind reverting.

I am working on a fix, but I wouldn't mind reverting.

I would be comfortable reverting the feature for Clang 13 and fixing forward on trunk (unless there's a need to revert from trunk temporarily, of course). Giving the feature a full release to bake isn't a bad outcome.

@romanovvlad This is due to -fms-extensions. It Expands __Pragma to #pragma instead of keeping it a __Pragma. This is a one-line fix.

@alexfh This was fixed by D106924

I would suggest to not revert. Will upload a patch for -fms-extensions after some cleanup.

I am working on a fix, but I wouldn't mind reverting.

I would be comfortable reverting the feature for Clang 13 and fixing forward on trunk (unless there's a need to revert from trunk temporarily, of course). Giving the feature a full release to bake isn't a bad outcome.

While this means that ccache can only use this feature when clang-14 is detected, I fully understand.

I am working on a fix, but I wouldn't mind reverting.

I would be comfortable reverting the feature for Clang 13 and fixing forward on trunk (unless there's a need to revert from trunk temporarily, of course). Giving the feature a full release to bake isn't a bad outcome.

While this means that ccache can only use this feature when clang-14 is detected, I fully understand.

Correct -- that would be unfortunate as I believe you were hoping for this to be in Clang 13 for ccache support. My fear of having this in Clang 13 is that very early testing has found several breakages so I worry if we've not had enough time to shake all those out. Having a bit more time might build our confidence. (So the result, to me, should be either that we back this out of the Clang 13 branch if we're still worried or we patch the Clang 13 branch if we think we've got all the cases, but I want to make sure we don't leave it somewhere in the middle for the release branch.)

Patch uploaded here: D107183

Correct -- that would be unfortunate as I believe you were hoping for this to be in Clang 13 for ccache support.

Yes, that would have been the ideal outcome.

@romanovvlad This is due to -fms-extensions. It Expands __Pragma to #pragma instead of keeping it a __Pragma. This is a one-line fix.

@alexfh This was fixed by D106924

You're right. It turns out that my comment https://reviews.llvm.org/D104601#2915122 was a mistake and c6b0b16c0f55c34f4eaa05184815bbbe97f4b750 actually fixes this particular issue.

I would suggest to not revert. Will upload a patch for -fms-extensions after some cleanup.

I'm running more thorough testing of clang with the c6b0b16c0f55c34f4eaa05184815bbbe97f4b750 commit to ensure we're not missing anything else here.

FYI, 13.0.0 rc1 is getting tagged on Monday, so it’d be good to have the branch in a usable state by then, either with all fixes we currently have in main, or reverted.

@mstorsjo I've taken all steps required for the resolution suggested @aaron.ballman. llvm.org/PR51300 suberseedes llvm.org/PR51261 (release-13.x branch is under @tstellard's control). D107183 has been pushed to main to fix the outstanding issue discovered by @romanovvlad (unfortunately without pre-commit review, but given that this might be blocking them, the patch is small and you would like the situation to be resolved by then, I think a post-commit review is acceptable).

@alexfh Happy to fix any additional issues you may find. The -fminimize-whitespace feature unfortunately required some restructuring of the the code since the decision on when to insert newline were done in many different places using different logics.

Patch uploaded here: D107183

The patch resolves the issue. Thanks!

Do we have all the issues fixed in trunk yet or do we need to revert in the release/13.x branch.

Do we have all the issues fixed in trunk yet or do we need to revert in the release/13.x branch.

All known issues have been fixed on trunk. However, @alexfh said they are going to do more thorough testing and @aaron.ballman has not changed their suggestion to revert on release/13.x. Since you just reverted on clang-13.x, I'd say the decision has been made.

RKSimon added inline comments.
clang/lib/Frontend/PrintPreprocessedOutput.cpp
684

@Meinersbur Static analysis is warning that these are both the same - should one be EmittedDirectiveOnThisLine ?

This commit seems to cause some regression for "-save-temps" as there is no new line before the pragma. See the below test case, -E will output
int test();#pragma clang assume_nonnull
It will fail the compilation on the preprocessed output with
error: expected unqualified-id
int test();#pragma clang assume_nonnull end

^

Thanks,

Manman

cat test-pragma.mm
// RUN: %clang -E -o - %s | FileCheck %s

#define CF_ASSUME_NONNULL_BEGIN _Pragma("clang assume_nonnull begin")
#define CF_ASSUME_NONNULL_END _Pragma("clang assume_nonnull end")

CF_ASSUME_NONNULL_BEGIN
int test();

CF_ASSUME_NONNULL_END
// CHECK-NOT: test();#pragma clang assume_nonnull

Meinersbur added a comment.EditedNov 4 2021, 10:58 PM

It will fail the compilation on the preprocessed output with
error: expected unqualified-id
int test();#pragma clang assume_nonnull end

The handler for assume_nonull passes an invalid SourceLocation if embedded in in a _Pragma. Not sure whether this is already a bug, but we should insert semantically relevant newlines even if the line for the #pragma is unknown. Fixed in rG1606022fab2d90ed8ee6d15800ec1c2c293db20e. Thanks for the report.

I looked into why it did work before -fminimize-whitespace introduced since the if (PLoc.isInvalid()) return; bail-out existed as well. There was another function startNewLineIfNeeded(bool ShouldUpdateCurrentLine) that was called regardless of PLoc and I folded the functionality into MoveToLine since having two functions with unclear separation between them (i.e. when should ShouldUpdateCurrentLine be true) made things difficult.

clang/lib/Frontend/PrintPreprocessedOutput.cpp
684

Fixed in rG8f099d17a1bee857ada3c5af032cfcb559252024. Thanks for the report and sorry for not seeing you comment until now.