This is an archive of the discontinued LLVM Phabricator instance.

[flang] Produce proper "preprocessor output" for -E option
ClosedPublic

Authored by klausler on Jul 23 2021, 4:56 PM.

Details

Summary

[flang] Produce proper "preprocessor output" for -E option

Rename the current -E option to "-E -Xflang -fno-reformat".

Add a new Parsing::EmitPreprocessedSource() routine to convert the
cooked character stream output of the prescanner back to something
more closely resembling output from a traditional preprocessor;
call this new routine when -E appears.

The new -E output is suitable for use as fixed form Fortran source to
compilation by (one hopes) any Fortran compiler. If the original
top-level source file had been free form source, the output will be
suitable for use as free form source as well; otherwise there may be
diagnostics about missing spaces if they were indeed absent in the
original fixed form source.

Unless the -P option appears, #line directives are interspersed
with the output (but be advised, f18 will ignore these if presented
with them in a later compilation).

An effort has been made to preserve original alphabetic character case
and source indentation.

Add -P and -fno-reformat to the new drivers.

Tweak test options to avoid confusion with prior -E output; use
-fno-reformat where needed, but prefer to keep -E, sometimes
in concert with -P, on most, updating expected results accordingly.

Diff Detail

Event Timeline

klausler created this revision.Jul 23 2021, 4:56 PM
klausler requested review of this revision.Jul 23 2021, 4:56 PM
klausler updated this revision to Diff 361398.Jul 23 2021, 6:26 PM

Update code to match commit comments.

Thank you for submitting this. It will unblock https://gitlab.kitware.com/cmake/cmake/-/issues/22387, which is quite an important milestone for Flang.

With this change, tests in https://github.com/llvm/llvm-project/tree/main/flang/test/Preprocessing require re-formatting or they will start failing.

Also, could a regression test be added that fails without your change? For example:

! Verify that the output from `-E` is valid fixed-form source. See
! https://bugs.llvm.org/show_bug.cgi?id=50993.

! RUN: %flang_fc1 -E %s 2>&1 | %flang_fc1 -fsyntax-only -ffixed-form 2>&1 | FileCheck %s --allow-empty

! CHECK-NOT: error
! CHECK-NOT: warning

! https://bugs.llvm.org/show_bug.cgi?id=51219
! CHECK-NOT: Character in fixed-form label field must be a digit

      PROGRAM HELLO
        write(*, *), "hello, world!"
      END PROGRAM

Feel free to edit and re-use this one ^^^. I also created https://bugs.llvm.org/show_bug.cgi?id=5121.9. Otherwise that final CHECK-NOT above is confusing.

I was hoping to get a code review. Tests will be added before submission.

The link you gave to bugzilla does not work.

With this change, tests in https://github.com/llvm/llvm-project/tree/main/flang/test/Preprocessing require re-formatting or they will start failing.

They all pass without change.

I was hoping to get a code review. Tests will be added before submission.

I read it and it made a lot of sense to me. It's fairly straightforward, like you suggested it would. The commit message makes it clear _what_ and _why_ is happening and that's consistent with what the code does. Glad to see -fdebug-dump-cooked-chars and -P being introduced. The latter is consistent with gfortran, which is important for us. This change should be sufficient for us to make progress with CMake.

I downloaded and tested your patch - hence my remark about the tests (they failed for me). Also, extra tests would really help to identify potential corner cases. Anyway, I scanned the implementation again and left a few inline comments.

The link you gave to bugzilla does not work.

Sorry, my bad: https://bugs.llvm.org/show_bug.cgi?id=51219.

flang/include/flang/Parser/parsing.h
40

[nit] Why not preprocess only? This is triggered by -E and does a bit more than just prescanning. But I don't mind.

flang/lib/Parser/parsing.cpp
104

This should be Fortran::parser::Options::fixedFormColumns rather than hard-coded 72.

136

Could you add an assert here assert(position->line > sourceLine && "sourceLine exceeded the actual source line")? There are not too many comments here and asserts can be very informative.

138

Why 10 and why make this case special?

149

This will only make sense for fixed-form, right?

They all pass without change.

Yes, but with this patch they fail. For example, https://github.com/llvm/llvm-project/blob/main/flang/test/Preprocessing/pp001.F:

bin/f18 -E /llvm-project/flang/test/Preprocessing/pp001.F
!dir$ fixed
#line "/llvm-project/flang/test/Preprocessing/pp001.F" 4
      integer, parameter :: KWM = 666

      if (777 .eq. 777) then
        print *, 'pp001.F yes'
      else
        print *, 'pp001.F no: ',777
      end if
      end

i.e., spaces in if (777 .eq. 777) then are preserved and hence ! CHECK: if(777.eq.777)then is no longer valid. This is the output that I get without your change:

integer,parameter::kwm=666
if(777.eq.777)then
print*,'pp001.F yes'
else
print*,'pp001.F no: ',777
endif
end
klausler marked 5 inline comments as done.Jul 26 2021, 11:33 AM
klausler added inline comments.
flang/include/flang/Parser/parsing.h
40

From the perspective of the Parsing module, we're just going to run the prescanner. F18 has no preprocessing phase; there is no way to just run its preprocessor. If there's a better accurate name that you would prefer, perhaps "preprocessedOutputOnly", let me know.

flang/lib/Parser/parsing.cpp
104

There's no guarantee that a later Fortran compilation of the -E output will also be using a nonstandard fixed form statement length. 72 is safer.

136

That case could arise in some future where f18 honors #line directives in its input.

138

Emitting a few newlines can be more readable and occupy fewer bytes than the equivalent #line directive would have required. The 10 is the number of bytes in a #line 123 directive (with newline).

149

The output is always valid fixed form Fortran.

awarzynski added inline comments.Jul 27 2021, 9:52 AM
flang/include/flang/Parser/parsing.h
40

prescanAndReformat?

flang/lib/Parser/parsing.cpp
104

Makes sense. Could you add a comment then? Otherwise it's just a magic number. A comment will prevent folks (including my future self) from changing this to Fortran::parser::Options::fixedFormColumns.

138

[nit] A non-magic number would be great (e.g. unsigned newLineCharLitmit = 10). Or a comment.

155

Non-digit in the first 6 columns (fixed form) is an error, right? Why not assert or issue a diagnostic here? This could hide a potential bug.

klausler marked 6 inline comments as done.Jul 27 2021, 10:09 AM
klausler added inline comments.
flang/lib/Parser/parsing.cpp
155

It would be an error if we were examining the original source file, yes, but this code runs later in compilation when we are reformatting the cooked character stream, from which excess spaces have been removed by prescanning.

awarzynski added inline comments.Jul 27 2021, 1:16 PM
flang/lib/Parser/parsing.cpp
155

Ah, so ch is assumed to be a valid character here. That wasn't clear.

A comment for this case would be appreciated (similar to e.g. // Wrap long lines in a portable fashion, perhaps // Add whitespaces in the first 6 columns to turn this into a valid fixed-form file?).

klausler marked 4 inline comments as done.Jul 27 2021, 1:49 PM
klausler added inline comments.
flang/include/flang/Parser/parsing.h
40

Whatever you like; will do.

klausler updated this revision to Diff 362174.Jul 27 2021, 2:04 PM
klausler marked an inline comment as done.

Add new/newly supported options to new driver.

Update options in tests.

Address review comments.

Thank you for updating the new driver and fixing the tests. Everything builds fine and tests pass. Tested with the new driver. Since you are adding this in the new driver too, I suggest that for consistency sake we rename -fdebug-dump-cooked-chars . Let me elaborate.

All options in the form of -fdebug-dump-<something> correspond to an action and have a dedicated instance of FrontendAction (in FrontendActions.h). -fdebug-dump-cooked-chars is a feature option that configures -E (i.e. preprocessing) rather than an action in itself. Indeed, you tweaked PrintPreprocessedAction rather than created a new action, which makes a lot of sense. As -fdebug-dump-cooked-chars is not an action, there are a few changes here that are not required (see inline comments). As for the spelling itself, -fdebug-dump-cooked-chars will not dump anything on its own. It needs to be paired with -E. So the name is a bit misleading. Perhaps we could have something like -fcooked-only or -fno-reformat? Naming is hard, I have no particular preferences.

Apologies for not bringing this up earlier. In f18, we don't differentiate between "action" options and "feature" options, so it doesn't matter that much. But in flang-new it becomes more relevant. Let me know if you'd like me to go into more detail with respect to action/non-action flags.

clang/include/clang/Driver/Options.td
4535–4536 ↗(On Diff #362174)

This is the decision point (action vs non-action option):

  • Group<Preprocessor_Group> --> -fdebug-dump-cooked-chars is a preprocessor flag (i.e. tweaks the behavior of -E)
  • Group<Action_Group> --> -fdebug-dump-cooked-chars is an action flag

Just highlighting. I think that Preprocessor_Group is exactly what we want here.

flang/include/flang/Frontend/FrontendOptions.h
31–33 ↗(On Diff #362174)

Not needed, -fdebug-dump-cooked-chars is not an action.

255–276 ↗(On Diff #362174)

Do you think that we need accessors and mutators here? We've kept FrontendOptions rather basic - it's just a plain aggregate. There's no intention to make it any more "clever". Keeping everything "public" allows us to have a rather basic and concise interface.

If we do add member methods for noLineDirectives_ and noLineDirectives_, then perhaps we should do it for all member variables? But then we'll bloat this class.

Also, we've been using bitfields rather than bools for this class to keep it slim. Could we stick to bitfields for consistency?

flang/lib/Frontend/CompilerInvocation.cpp
122–124 ↗(On Diff #362174)

In this switch statement we only decide _which_ action to run. In this case, it's PrintPreprocessedInput.

Preprocessor options are set in parsePreprocessorArgs. Could you move this code there?

126–127 ↗(On Diff #362174)

This case is not needed, -fdebug-dump-cooked-chars is not an action.

flang/lib/Parser/parsing.cpp
57–58

This is nly required for fixed-form, but not for free-form. That's fine, but it's worth adding a comment that this is intentional. For example:

// Make sure that the first 6 columns are formatted correctly. We focus on the standard Fortran fixed form, which is also valid free form.

My future self will be grateful.

flang/test/Preprocessing/dash-E.F90
1 ↗(On Diff #362174)

How about statement labels? That's still not tested.

klausler marked 5 inline comments as done.Jul 28 2021, 9:10 AM
klausler added inline comments.
flang/include/flang/Frontend/FrontendOptions.h
31–33 ↗(On Diff #362174)

It used to be so in my first patch. Removed.

255–276 ↗(On Diff #362174)

I put them there because you're using names in that class with terminal underscores, and that's a convention in f18 (and google3) meant only for private data members; I didn't want to perpetuate that error. If everything here is meant to be public, this should be a struct whose members are all data and all public.

Bitfields can't have default initializers, which is the best way to initialize these things; otherwise the default value is separated from the declaration and possibly in another file.

klausler marked 2 inline comments as done.Jul 28 2021, 11:05 AM

As for the spelling itself, -fdebug-dump-cooked-chars will not dump anything on its own. It needs to be paired with -E. So the name is a bit misleading. Perhaps we could have something like -fcooked-only or -fno-reformat? Naming is hard, I have no particular preferences.

Tell me what you want, and I'll change the patch to use it. Otherwise I'll have to iterate.

klausler updated this revision to Diff 362499.Jul 28 2021, 12:41 PM

Address review comments (except for renaming -fdebug-dump-cooked-chars, need to know new name for that)

klausler marked 2 inline comments as done.Jul 28 2021, 12:42 PM
klausler updated this revision to Diff 362546.Jul 28 2021, 2:52 PM

Fix new test's RUN line; add --strict-whitespace to FileCheck.

As for the spelling itself, -fdebug-dump-cooked-chars will not dump anything on its own. It needs to be paired with -E. So the name is a bit misleading. Perhaps we could have something like -fcooked-only or -fno-reformat? Naming is hard, I have no particular preferences.

Tell me what you want, and I'll change the patch to use it. Otherwise I'll have to iterate.

-fno-reformat, please.

flang/include/flang/Frontend/FrontendOptions.h
255–276 ↗(On Diff #362174)

I put them there because you're using names in that class with terminal underscores, and that's a convention in f18 (and google3) meant only for private data members; I didn't want to perpetuate that error. If everything here is meant to be public, this should be a struct whose members are all data and all public.

Ack. Fixed in https://reviews.llvm.org/D107062

Bitfields can't have default initializers, which is the best way to initialize these things; otherwise the default value is separated from the declaration and possibly in another file.

Sounds like each approach has its pros and cons. I would still prefer consistency and to rely on the constructor to initialise everything (which is nearby, in the same file). If you are strongly in favour bools, then let's update all fields at the same time. Preferably in a dedicated patch.

awarzynski added inline comments.Jul 29 2021, 7:02 AM
clang/include/clang/Driver/Options.td
4540–4541 ↗(On Diff #362546)

In this group, all options have the following flags set:

let Flags = [FC1Option, FlangOnlyOption] in {

This means that we only support them in the frontend driver (FC1Option). Frontend-only options should be skipped in clang/lib/Driver/ToolChains/Flang.cpp (the bridge between the compiler driver and the frontend driver). This way, flang-new will ignore it:

bin/flang-new  -E -P  -fdebug-dump-cooked-chars  file.f
flang-new: warning: argument unused during compilation: '-fdebug-dump-cooked-chars'

Users can still use it with -Xflang, e.g.

flang-new -E -Xflang -fdebug-dump-cooked-chars file.f

This way, we avoid exposing all of the frontend options in the compiler driver mode.

clang/lib/Driver/ToolChains/Flang.cpp
43 ↗(On Diff #362546)

You can delete options::OPT_fdebug_dump_cooked_chars from here. It won't be supported by the compiler driver, but that's fine. It's mostly for compiler developers, so it's fine to restrict it to the frontend driver. You can still pass it to flang-new -fc1 with -Xflang.

As for -P, could you add it to https://github.com/llvm/llvm-project/blob/main/flang/test/Driver/frontend-forwarding.f90?

awarzynski added inline comments.Jul 29 2021, 7:27 AM
flang/lib/Frontend/CompilerInvocation.cpp
378–379 ↗(On Diff #362546)

I would replace this comment with a diagnostic, e.g. :

const unsigned diagID = diags.getCustomDiagID(clang::DiagnosticsEngine::Warning, "'-P' will be ignored ('-P' and '-fno-reformat' are incompatible)");
diags.Report(diagID);
}

This way, the driver becomes a bit more friendly towards end-users. You will have to update this method a tiny bit. Here's a patch with a test (it's a small change):

diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 04905f103daf..3d38d6dba3ff 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -342,8 +342,10 @@ static std::string getIntrinsicDir() {
 ///
 /// \param [in] opts The preprocessor options instance
 /// \param [out] args The list of input arguments
-static void parsePreprocessorArgs(
-    Fortran::frontend::PreprocessorOptions &opts, llvm::opt::ArgList &args) {
+static bool parsePreprocessorArgs(Fortran::frontend::PreprocessorOptions &opts,
+    llvm::opt::ArgList &args, clang::DiagnosticsEngine &diags) {
+  unsigned numErrorsBefore = diags.getNumErrors();
+
   // Add macros from the command line.
   for (const auto *currentArg : args.filtered(
            clang::driver::options::OPT_D, clang::driver::options::OPT_U)) {
@@ -372,13 +374,24 @@ static void parsePreprocessorArgs(
         ? PPMacrosFlag::Include
         : PPMacrosFlag::Exclude;

+
   if (args.hasArg(clang::driver::options::OPT_fdebug_dump_cooked_chars)) {
     opts.set_dumpCookedChars(true);
-  } else if (args.hasArg(clang::driver::options::OPT_P)) {
-    // If both -P and -fdebug-dump-cooked-chars appear, -P is
-    // unclaimed and will elicit a warning
+  }
+
+  if (args.hasArg(clang::driver::options::OPT_P)) {
     opts.set_noLineDirectives(true);
   }
+
+  if (opts.noLineDirectives() && opts.dumpCookedChars()) {
+    const unsigned diagID =
+        diags.getCustomDiagID(clang::DiagnosticsEngine::Warning,
+            "'-P' will be ignored ('-P' and '-fno-reformat' are incompatible)");
+    diags.Report(diagID);
+    opts.set_noLineDirectives(false);
+  }
+
+  return diags.getNumErrors() == numErrorsBefore;
 }

 /// Parses all semantic related arguments and populates the variables
@@ -541,7 +554,7 @@ bool CompilerInvocation::CreateFromArgs(CompilerInvocation &res,
   }

   success &= ParseFrontendArgs(res.frontendOpts(), args, diags);
-  parsePreprocessorArgs(res.preprocessorOpts(), args);
+  success &= parsePreprocessorArgs(res.preprocessorOpts(), args, diags);
   success &= parseSemaArgs(res, args, diags);
   success &= parseDialectArgs(res, args, diags);
   success &= parseDiagArgs(res, args, diags);
diff --git a/flang/test/Driver/pp-flags-warning.f90 b/flang/test/Driver/pp-flags-warning.f90
new file mode 100644
index 000000000000..5fbf3a1574a5
--- /dev/null
+++ b/flang/test/Driver/pp-flags-warning.f90
@@ -0,0 +1,8 @@
+! REQUIRES: new-flang-driver
+
+! RUN: %flang_fc1 -E -P -fdebug-dump-cooked-chars %s 2>&1 | FileCheck %s
+
+! CHECK: warning: '-P' will be ignored ('-P' and '-fno-reformat' are incompatible)
+
+program hello
+end

Feel free to use this (with or without edits).

klausler marked 4 inline comments as done.Jul 29 2021, 11:48 AM
klausler added inline comments.
flang/include/flang/Frontend/FrontendOptions.h
255–276 ↗(On Diff #362174)

They've been moved to PreprocessorOptions.

flang/lib/Frontend/CompilerInvocation.cpp
378–379 ↗(On Diff #362546)

Now that -fno-reformat is a developer-only flag, this is less important. Changing to accept both. The output of -fno-reformat has no #line directives so the -P is at worst redundant when both appear.

klausler updated this revision to Diff 362841.Jul 29 2021, 11:51 AM
klausler marked 2 inline comments as done.
klausler edited the summary of this revision. (Show Details)

Rename original -fdebug-dump-cooked-chars to -fno-reformat, available only in the frontend driver.

Thank you for all the comments in EmitPreprocessedSource! I've left a couple of final suggestions, but nothing major.

flang/include/flang/Frontend/PreprocessorOptions.h
47–68 ↗(On Diff #362841)

Could you make noReformat and noLineDirective public (for consistency with other member variables here)? If you have a strong preference for private members, then all of them should be private.

Btw, I've updated PreprocessorOptions in https://reviews.llvm.org/D107062 so that it is consistent with FrontendOptions.

flang/lib/Parser/parsing.cpp
113

If you insert continue after ++sourceLine, you can avoid the following else block. That's also one level of indentation less, which makes things easier to read.

klausler marked 2 inline comments as done.Jul 30 2021, 8:48 AM
klausler added inline comments.
flang/lib/Parser/parsing.cpp
113

Adding gotos, however spelled, tends to make code harder to follow than simple nested structures. No.

klausler updated this revision to Diff 363112.Jul 30 2021, 8:58 AM
klausler marked an inline comment as done.

Address final(?) comments

awarzynski accepted this revision.Jul 30 2021, 9:29 AM

LGTM. This patch is much appreciated and thank you for addressing my comments.

flang/lib/Parser/parsing.cpp
113
This revision is now accepted and ready to land.Jul 30 2021, 9:29 AM
klausler updated this revision to Diff 363212.Jul 30 2021, 3:01 PM
klausler marked an inline comment as done.
klausler edited the summary of this revision. (Show Details)

Revert some test changes; instead of adding "-fno-reformat" to most tests using -E, apply a mix of updated test results with some use of -P where necessary.

This revision was landed with ongoing or failed builds.Jul 30 2021, 3:14 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 30 2021, 3:14 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript