This is an archive of the discontinued LLVM Phabricator instance.

[flang] Stricter "implicit continuation" in preprocessing
ClosedPublic

Authored by klausler on Jul 17 2023, 10:33 AM.

Details

Summary

The prescanner performs implicit line continuation when it looks
like the parenthesized arguments of a call to a function-like macro
may span multiple lines. In an attempt to work more like a
Fortran-oblivious C preprocessor, the prescanner will act as if
the following lines had been continuations so that the function-like
macro could be invoked.

This still seems like a good idea, but a recent bug report on
LLVM's GitHub issue tracker shows one way in which it could trigger
inadvertently and mess up a program. So this patch makes the
conditions for implicit line continuation much more strict.

First, the leading parenthesis has to have been preceded by an
identifier that's known to be a macro name. (It doesn't have to
be a function-like macro, since it's possible for a keyword-like
macro to expand to the name of a function-like macro.) Second,
no macro definition can ever have had unbalanced parentheses in
its replacement text.

Also cleans up some parenthesis recognition code to fix some
issues found in testing, so that a token with leading or trailing
spaces can still be recognized as a parenthesis or comma.

Fixes https://github.com/llvm/llvm-project/issues/63844.

Diff Detail

Event Timeline

klausler created this revision.Jul 17 2023, 10:33 AM
Herald added a project: Restricted Project. ยท View Herald Transcript
Herald added subscribers: sunshaoce, jdoerfert. ยท View Herald Transcript
klausler requested review of this revision.Jul 17 2023, 10:33 AM

Hi @klausler can you upload a patch with context. I left a few initial comments though.

Thanks!

flang/docs/Preprocessing.md
77

I'm not sure there is a test already for this case, is there?

Just to confirm I understand this correctly, this new paragraph covers this case below:

subroutine sub(a)
  implicit none
  integer :: a

#define foo(x) ((x) + 1)
#define bar foo

  a = bar  (
a
)

end subroutine sub

that should be preprocessed something similar to:

subroutine sub(a)
  implicit none
  integer :: a

  a = (( a ) + 1)

end subroutine sub
flang/include/flang/Parser/char-block.h
67

I'm confused by this function. I've tested it with several inputs and it always seems to return ' '?

My tests here: https://www.godbolt.org/z/af9oc74aE

Maybe I'm missing something. Can you add a comment about its purpose?

klausler added inline comments.Jul 18 2023, 6:37 AM
flang/docs/Preprocessing.md
77

Yes.

flang/include/flang/Parser/char-block.h
67

It returns the only non-blank character, if it is the only non-blank character.

klausler updated this revision to Diff 541516.Jul 18 2023, 7:21 AM

Add comment.

Thanks for the update, @klausler

I added some more comments.

(Pretty please ๐Ÿ™‚ , add the full context otherwise reviewing the change gets unnecessarily harder)

flang/include/flang/Parser/char-block.h
67

Ah, right a case I forgot to test ๐Ÿ˜… . Thanks.

Would it make sense not to use a loop? Something like this:

char OnlyNonBlank() const {
    if (size() == 1) {
        char ch{*begin()};
        if (ch != ' ' && ch != '\t') {
            return ch;
        }
    }
    return ' ';
}
flang/lib/Parser/preprocessor.cpp
1198

anyMacroWithUnbalancedParantheses_ is only getting updated when we find a directive and we do not seem to clean it up, so we end carrying wrong state up to the point where we decide we allow an implicit continuation.

Consider the following testcase:

program main
  implicit none
#define FOO(x) ((x) + 1)
#define BAR (
  integer :: k

  k = FOO(
    4)
end program main

This fails with the patch

$ flang-new -E -o- t.F90 
error: Could not scan t.F90
./t.F90:8:10: error: Unmatched '('
    k = FOO(
           ^
./t.F90:9:6: error: Unmatched ')'
      4)
       ^

but if you remove (or comment) the definition of BAR the file is scanned correctly.

flang-new -E -o- t.F90 
#line "./t.F90" 2
      program main
      implicit none


      integer :: k

      k = (( 4) + 1)

      end program main
klausler added inline comments.Jul 19 2023, 9:11 AM
flang/include/flang/Parser/char-block.h
67

The entire point of this function is to ignore leading and trailing blanks to expose a single-character token. Every character needs to be examined. This requires a loop.

flang/lib/Parser/preprocessor.cpp
1198

The whole point of this flag is to disable all implicit continuations if any macro definition has unbalanced parentheses. I can't figure out how to do it safely and I've spent too much time on this topic already.

klausler updated this revision to Diff 542075.Jul 19 2023, 9:33 AM

Fix setting of anyMacroWithUnbalancedParentheses_ -- once true, it must stay true.

This revision is now accepted and ready to land.Jul 21 2023, 12:22 PM

Any further comments, rogfer01?

rogfer01 accepted this revision.Jul 27 2023, 11:28 PM

LGTM. I suggest adding an XFAIL'd test but I'll leave this at your choice.

flang/include/flang/Parser/char-block.h
67

Right. Thanks for the clarification!

flang/lib/Parser/preprocessor.cpp
1198

Right. My concern is that the definition of a macro with unbalanced parentheses that it is not even used in the code causes a later implicit continuation to be disabled.

This is a bit surprising and I think it may be confusing. But it seems an unusual situation so I understand its impact is low and can be fixed later. I suggest to add a XFAIL testcase for it at least.

(gfortran and pgf90 don't have a problem with this. ifort/ifx rejects that but that seems because it does not support this kind of implicit continuations.)

This revision was automatically updated to reflect the committed changes.