This is an archive of the discontinued LLVM Phabricator instance.

[flang] Fix semantic checks for C919
ClosedPublic

Authored by peixin on May 28 2022, 7:49 AM.

Details

Summary

The previous semantic analysis does not consider when the last part-ref
is scalar or complex part. Refactor the previous code and bring all the
checks into one place. The check starts from the designator by
extracting the dataref wrapped including the substring and complex part
and recursively check the base objects.

Co-authored-by: Peter Klausler <pklausler@nvidia.com>

Diff Detail

Event Timeline

peixin created this revision.May 28 2022, 7:49 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: jdoerfert. · View Herald Transcript
peixin requested review of this revision.May 28 2022, 7:49 AM
peixin added inline comments.May 28 2022, 8:02 AM
flang/lib/Semantics/expression.cpp
260

I think this message is not clear enough. For example, users fixing one problem for call sub(t(:)%t3(:)%t2(:)%t1(:)%s(1,:)(1:2)) by changing from s(1,:)(1:2) into s(1,1)(1:2) will get another error.

klausler requested changes to this revision.May 28 2022, 10:29 AM
klausler added inline comments.
flang/lib/Semantics/expression.cpp
252

This message is misspelled and inaccurate, and the loop does not do what I think you think it does. It is traversing the subscripts of an array reference like A%C(:,:), not multiple component part-refs.

296

This new message is also misspelled and will be inaccurate for a case like A(:)%C where there is only one component.

1105

Similar problems as previous messages, and perhaps redundant with the other checks. Is there a case that this new code catches that the other checks do not?

This revision now requires changes to proceed.May 28 2022, 10:29 AM
peixin added inline comments.May 28 2022, 11:40 AM
flang/lib/Semantics/expression.cpp
252

Right. This may can be the following:

A derived type accessbility cannot have more than one part references with nonzero rank

296

Same as above.

1105

Sorry, my bad. When I fix the test cases, I missed this.

The case not covered by previous checks is when the last part ref is scalar variable.

program m
  type mt
    integer :: x
    character(10) :: s
  end type
  type mt2
    type(mt) :: t1(2)
  end type
  type mt3
    type(mt2) :: t2(2)
  end type
  type mt4
    type(mt3) :: t3(2)
  end type
  type(mt4) :: t(2)

  call sub(t%t3%t2%t1%x)
  call sub2(t%t3%t2%t1%s(1:2))
end
peixin added inline comments.May 28 2022, 11:47 AM
flang/lib/Semantics/expression.cpp
252

I am thinking why we don't use the description the standard says.

The data reference cannot have more than one part references with nonzero rank

BTW, here are gfortran and ifort message:

gfortran: Two or more part references with nonzero rank must not be specified at (1)
ifort: A component cannot be an array if the encompassing structure is an array

klausler added inline comments.May 28 2022, 11:52 AM
flang/lib/Semantics/expression.cpp
252

I understand the constraint. I'll figure out why the existing checks didn't catch the case of a scalar terminal component.

The existing checks nearly work, but they don't recurse to check base objects. Try this patch and see whether you like it -- it brings the checks into one place and handles all of the cases:

diff --git a/flang/include/flang/Semantics/expression.h b/flang/include/flang/Semantics/expression.h
index 6c0d385a..c3755f0a 100644
--- a/flang/include/flang/Semantics/expression.h
+++ b/flang/include/flang/Semantics/expression.h
@@ -322,6 +322,7 @@ private:
       DataRef &&, const Symbol &, const semantics::Scope &);
   MaybeExpr CompleteSubscripts(ArrayRef &&);
   MaybeExpr ApplySubscripts(DataRef &&, std::vector<Subscript> &&);
+  void CheckRanks(const DataRef &);
   MaybeExpr TopLevelChecks(DataRef &&);
   std::optional<Expr<SubscriptInteger>> GetSubstringBound(
       const std::optional<parser::ScalarIntExpr> &);
diff --git a/flang/lib/Semantics/expression.cpp b/flang/lib/Semantics/expression.cpp
index b2330db5..c8685d8f 100644
--- a/flang/lib/Semantics/expression.cpp
+++ b/flang/lib/Semantics/expression.cpp
@@ -249,20 +249,6 @@ MaybeExpr ExpressionAnalyzer::CompleteSubscripts(ArrayRef &&ref) {
           symbolRank, symbol.name(), subscripts);
     }
     return std::nullopt;
-  } else if (Component * component{ref.base().UnwrapComponent()}) {
-    int baseRank{component->base().Rank()};
-    if (baseRank > 0) {
-      int subscriptRank{0};
-      for (const auto &expr : ref.subscript()) {
-        subscriptRank += expr.Rank();
-      }
-      if (subscriptRank > 0) { // C919a
-        Say("Subscripts of component '%s' of rank-%d derived type "
-            "array have rank %d but must all be scalar"_err_en_US,
-            symbol.name(), baseRank, subscriptRank);
-        return std::nullopt;
-      }
-    }
   } else if (const auto *object{
                  symbol.detailsIf<semantics::ObjectEntityDetails>()}) {
     // C928 & C1002
@@ -306,20 +292,48 @@ MaybeExpr ExpressionAnalyzer::ApplySubscripts(
       std::move(dataRef.u));
 }

+// C919a - only one part-ref of a data-ref may have rank > 0
+void ExpressionAnalyzer::CheckRanks(const DataRef &dataRef) {
+  common::visit(
+      common::visitors{
+          [this](const Component &component) {
+            const Symbol &symbol{component.GetLastSymbol()};
+            if (int componentRank{symbol.Rank()}; componentRank > 0) {
+              if (int baseRank{component.base().Rank()}; baseRank > 0) {
+                Say("Reference to whole rank-%d component '%s' of rank-%d array of derived type is not allowed"_err_en_US,
+                    componentRank, symbol.name(), baseRank);
+              }
+            } else {
+              CheckRanks(component.base());
+            }
+          },
+          [this](const ArrayRef &arrayRef) {
+            if (const auto *component{arrayRef.base().UnwrapComponent()}) {
+              int subscriptRank{0};
+              for (const Subscript &subscript : arrayRef.subscript()) {
+                subscriptRank += subscript.Rank();
+              }
+              if (subscriptRank > 0) {
+                if (int componentBaseRank{component->base().Rank()};
+                    componentBaseRank > 0) {
+                  Say("Subscripts of component '%s' of rank-%d derived type array have rank %d but must all be scalar"_err_en_US,
+                      component->GetLastSymbol().name(), componentBaseRank,
+                      subscriptRank);
+                }
+              } else {
+                CheckRanks(component->base());
+              }
+            }
+          },
+          [](const SymbolRef &) {},
+          [](const CoarrayRef &) {},
+      },
+      dataRef.u);
+}
+
 // Top-level checks for data references.
 MaybeExpr ExpressionAnalyzer::TopLevelChecks(DataRef &&dataRef) {
-  if (Component * component{std::get_if<Component>(&dataRef.u)}) {
-    const Symbol &symbol{component->GetLastSymbol()};
-    int componentRank{symbol.Rank()};
-    if (componentRank > 0) {
-      int baseRank{component->base().Rank()};
-      if (baseRank > 0) { // C919a
-        Say("Reference to whole rank-%d component '%%%s' of "
-            "rank-%d array of derived type is not allowed"_err_en_US,
-            componentRank, symbol.name(), baseRank);
-      }
-    }
-  }
+  CheckRanks(dataRef);
   return Designate(std::move(dataRef));
 }

The existing checks nearly work, but they don't recurse to check base objects. Try this patch and see whether you like it -- it brings the checks into one place and handles all of the cases:

diff --git a/flang/include/flang/Semantics/expression.h b/flang/include/flang/Semantics/expression.h
index 6c0d385a..c3755f0a 100644
--- a/flang/include/flang/Semantics/expression.h
+++ b/flang/include/flang/Semantics/expression.h
@@ -322,6 +322,7 @@ private:
       DataRef &&, const Symbol &, const semantics::Scope &);
   MaybeExpr CompleteSubscripts(ArrayRef &&);
   MaybeExpr ApplySubscripts(DataRef &&, std::vector<Subscript> &&);
+  void CheckRanks(const DataRef &);
   MaybeExpr TopLevelChecks(DataRef &&);
   std::optional<Expr<SubscriptInteger>> GetSubstringBound(
       const std::optional<parser::ScalarIntExpr> &);
diff --git a/flang/lib/Semantics/expression.cpp b/flang/lib/Semantics/expression.cpp
index b2330db5..c8685d8f 100644
--- a/flang/lib/Semantics/expression.cpp
+++ b/flang/lib/Semantics/expression.cpp
@@ -249,20 +249,6 @@ MaybeExpr ExpressionAnalyzer::CompleteSubscripts(ArrayRef &&ref) {
           symbolRank, symbol.name(), subscripts);
     }
     return std::nullopt;
-  } else if (Component * component{ref.base().UnwrapComponent()}) {
-    int baseRank{component->base().Rank()};
-    if (baseRank > 0) {
-      int subscriptRank{0};
-      for (const auto &expr : ref.subscript()) {
-        subscriptRank += expr.Rank();
-      }
-      if (subscriptRank > 0) { // C919a
-        Say("Subscripts of component '%s' of rank-%d derived type "
-            "array have rank %d but must all be scalar"_err_en_US,
-            symbol.name(), baseRank, subscriptRank);
-        return std::nullopt;
-      }
-    }
   } else if (const auto *object{
                  symbol.detailsIf<semantics::ObjectEntityDetails>()}) {
     // C928 & C1002
@@ -306,20 +292,48 @@ MaybeExpr ExpressionAnalyzer::ApplySubscripts(
       std::move(dataRef.u));
 }

+// C919a - only one part-ref of a data-ref may have rank > 0
+void ExpressionAnalyzer::CheckRanks(const DataRef &dataRef) {
+  common::visit(
+      common::visitors{
+          [this](const Component &component) {
+            const Symbol &symbol{component.GetLastSymbol()};
+            if (int componentRank{symbol.Rank()}; componentRank > 0) {
+              if (int baseRank{component.base().Rank()}; baseRank > 0) {
+                Say("Reference to whole rank-%d component '%s' of rank-%d array of derived type is not allowed"_err_en_US,
+                    componentRank, symbol.name(), baseRank);
+              }
+            } else {
+              CheckRanks(component.base());
+            }
+          },
+          [this](const ArrayRef &arrayRef) {
+            if (const auto *component{arrayRef.base().UnwrapComponent()}) {
+              int subscriptRank{0};
+              for (const Subscript &subscript : arrayRef.subscript()) {
+                subscriptRank += subscript.Rank();
+              }
+              if (subscriptRank > 0) {
+                if (int componentBaseRank{component->base().Rank()};
+                    componentBaseRank > 0) {
+                  Say("Subscripts of component '%s' of rank-%d derived type array have rank %d but must all be scalar"_err_en_US,
+                      component->GetLastSymbol().name(), componentBaseRank,
+                      subscriptRank);
+                }
+              } else {
+                CheckRanks(component->base());
+              }
+            }
+          },
+          [](const SymbolRef &) {},
+          [](const CoarrayRef &) {},
+      },
+      dataRef.u);
+}
+
 // Top-level checks for data references.
 MaybeExpr ExpressionAnalyzer::TopLevelChecks(DataRef &&dataRef) {
-  if (Component * component{std::get_if<Component>(&dataRef.u)}) {
-    const Symbol &symbol{component->GetLastSymbol()};
-    int componentRank{symbol.Rank()};
-    if (componentRank > 0) {
-      int baseRank{component->base().Rank()};
-      if (baseRank > 0) { // C919a
-        Say("Reference to whole rank-%d component '%%%s' of "
-            "rank-%d array of derived type is not allowed"_err_en_US,
-            componentRank, symbol.name(), baseRank);
-      }
-    }
-  }
+  CheckRanks(dataRef);
   return Designate(std::move(dataRef));
 }

Thanks for this.

it brings the checks into one place and handles all of the cases

This is really what I like. I have two more concerns.

  1. TopLevelChecks is called in both Analyze(const parser::Substring & and Analyze(const parser::Designator. There is redundant check for the sub1 call in the following case:
program m
  type mt
    integer :: x
    character(10) :: s
  end type
  type mt2
    type(mt) :: t1(2)
  end type
  type mt3
    type(mt2) :: t2(2)
  end type
  type mt4
    type(mt3) :: t3(2)
  end type
  type(mt4) :: t(2)
  call sub(t%t3%t2%t1%x)
  call sub1(t(1)%t3(1)%t2(1)%t1(1)%s(1:))
end
| | | | | ActualArg -> Expr = 't%t3%t2%t1%x'
| | | | | | Designator -> DataRef -> StructureComponent
| | | | | | | DataRef -> StructureComponent
| | | | | | | | DataRef -> StructureComponent
| | | | | | | | | DataRef -> StructureComponent
| | | | | | | | | | DataRef -> Name = 't'
| | | | | | | | | | Name = 't3'
| | | | | | | | | Name = 't2'
| | | | | | | | Name = 't1'
| | | | | | | Name = 'x'
| | | | | ActualArg -> Expr = 't(1_8)%t3(1_8)%t2(1_8)%t1(1_8)%s(1_8:10_8)'
| | | | | | Designator -> Substring
| | | | | | | DataRef -> StructureComponent
| | | | | | | | DataRef -> ArrayElement
| | | | | | | | | DataRef -> StructureComponent
| | | | | | | | | | DataRef -> ArrayElement
| | | | | | | | | | | DataRef -> StructureComponent
| | | | | | | | | | | | DataRef -> ArrayElement
| | | | | | | | | | | | | DataRef -> StructureComponent
| | | | | | | | | | | | | | DataRef -> ArrayElement
| | | | | | | | | | | | | | | DataRef -> Name = 't'
| | | | | | | | | | | | | | | SectionSubscript -> Integer -> Expr = '1_4'
| | | | | | | | | | | | | | | | LiteralConstant -> IntLiteralConstant = '1'
| | | | | | | | | | | | | | Name = 't3'
| | | | | | | | | | | | | SectionSubscript -> Integer -> Expr = '1_4'
| | | | | | | | | | | | | | LiteralConstant -> IntLiteralConstant = '1'
| | | | | | | | | | | | Name = 't2'
| | | | | | | | | | | SectionSubscript -> Integer -> Expr = '1_4'
| | | | | | | | | | | | LiteralConstant -> IntLiteralConstant = '1'
| | | | | | | | | | Name = 't1'
| | | | | | | | | SectionSubscript -> Integer -> Expr = '1_4'
| | | | | | | | | | LiteralConstant -> IntLiteralConstant = '1'
| | | | | | | | Name = 's'
| | | | | | | SubstringRange
| | | | | | | | Scalar -> Integer -> Expr = '1_4'
| | | | | | | | | LiteralConstant -> IntLiteralConstant = '1'

To look at the parse-tree, moving CheckRanks inside Analyze(const parser::Designator seems to be better.

  1. The purpose of returning void for CheckRanks is to give users more semantic error messages, right? Can we return one boolean value for it and return std::nullopt in caller if it fails? In this case, for one expression, we abort the analysis if there is one error in case there will be one internal error or seg fault? This is not one hard suggestion.

Your suggestions seem fine to me. If you move CheckRanks out of TopLevelChecks, then TopLevelChecks can just become a call to Designate, too.

If you move CheckRanks out of TopLevelChecks, then TopLevelChecks can just become a call to Designate, too.

Yes, that's what I also wanted to suggest, too. Thanks for the confirm. Do you want to fix this patch and create one PR considering you contribute more than me for this patch, or let me finish it with some more regression tests of the last symbol is scalar non-char and char and name you as an co-author?

If you move CheckRanks out of TopLevelChecks, then TopLevelChecks can just become a call to Designate, too.

Yes, that's what I also wanted to suggest, too. Thanks for the confirm. Do you want to fix this patch and create one PR considering you contribute more than me for this patch, or let me finish it with some more regression tests of the last symbol is scalar non-char and char and name you as an co-author?

I'm happy either way, just let me know.

peixin added a comment.EditedMay 29 2022, 8:29 PM

@klausler Sorry, I may mislead you.

  1. I dig deep for this and find my proposed suggestions are not entirely correct. Your solution also cannot support all the cases. The DataRef is defined here (https://github.com/llvm/llvm-project/blob/42c3c70a9e3f0a697b8794f4e8b603091ffb12a4/flang/include/flang/Evaluate/variable.h#L285-L299). It does not contain the substring. So I missed the substring.

And Designator is used as follows:

(gdb) pt d
type = const struct Fortran::parser::Designator {
    Fortran::parser::CharBlock source;
    std::variant<Fortran::parser::DataRef, Fortran::parser::Substring> u;
subroutine s()
  interface
    function real_info(i)
    end
  end interface
  type mt
    complex :: c, c2(2)
    integer :: x, x2(2)
    character(10) :: s, s2(2)
   contains
    procedure, nopass :: info => real_info
  end type
  type mt2
    type(mt) :: t1(2)
  end type
  type mt3
    type(mt2) :: t2(2)
  end type
  type mt4
    type(mt3) :: t3(2)
  end type
  type(mt4) :: t(2)
  call sub5(t%t3%t2%t1%c%RE)
  call sub6(t%t3%t2%t1%c2(1)%RE)
end

The parse tree is like the following:

| | | | | ActualArg -> Expr = 't%t3%t2%t1%c%RE'
| | | | | | Designator -> DataRef -> StructureComponent
| | | | | | | DataRef -> StructureComponent
| | | | | ActualArg -> Expr = 't%t3%t2%t1%c2(1_8)%RE'
| | | | | | Designator -> DataRef -> StructureComponent
| | | | | | | DataRef -> ArrayElement
| | | | | | | | DataRef -> StructureComponent

The CheckRanks cannot be at the position of previous TopLevelChecks (https://github.com/llvm/llvm-project/blob/88af539c0eaa5a282e6c1632797d33dfc0ebcb65/flang/lib/Semantics/expression.cpp#L374-L376) since the result after Analyze for complex part and others are different and the dataRef for complex part is empty. So the real solution should be like the following:

MaybeExpr ExpressionAnalyzer::Analyze(const parser::Designator &d) {
  auto restorer{GetContextualMessages().SetLocation(d.source)};
  FixMisparsedSubstring(d);
  if (auto *dataRef{std::get_if<parser::DataRef>(&d.u)}) {
    if (!CheckRanks(std::move(*dataRef))) {
      return std::nullopt;
    }
  } else if (auto *substr{std::get_if<parser::Substring>(&d.u)}) {
    ...
  }

  // These checks have to be deferred to these "top level" data-refs where
  // we can be sure that there are no following subscripts (yet).
  if (MaybeExpr result{Analyze(d.u)}) {
    if (std::optional<DataRef> dataRef{ExtractDataRef(std::move(result))}) {
      return Designate(std::move(*dataRef));
    }
    return result;
  }
  return std::nullopt;
}

If you agree, I would like to finish this.

  1. I hit some more unimplemented semantic checks:
function real_info1(i)
end
subroutine real_info2()
end
subroutine s()
  interface
    function real_info1(i)
    end
    subroutine real_info2()
    end
  end interface
  type mt
    complex :: c, c2(2)
    integer :: x, x2(2)
    character(10) :: s, s2(2)
   contains
    procedure, nopass :: info1 => real_info1
    procedure, nopass :: info2 => real_info2
  end type
  type mt2
    type(mt) :: t1(2)
  end type
  type mt3
    type(mt2) :: t2(2)
  end type
  type mt4
    type(mt3) :: t3(2)
  end type
  type(mt4) :: t(2)

  call sub0(t%t3%t2%t1%info1(i))
  call sub1(t%t3%t2%t1%info2) ! test this when remove the nopass
  call t%t3%t2%t1%info2
  call sub2(t%t3%t2%t1%info2) ! test this with nopass attribute
end

These four are not implemented. Do you want to take up some, then I can file some tickets for you? After finish this patch, I will also start to look at one or more.

  1. When I read the code in expression.cpp, some regression tests are missed. I mean, only have function code. The test cases in this patch is one example. Do you have any plans to add them? I can also help add some when I read the code and find the missed ones.
peixin updated this revision to Diff 432887.May 30 2022, 4:32 AM

@klausler Please check if this is the right fix. This should be able to cover all the scenarios. And I think the previous ExtractDataRef is not complete.

I just don't understand your last two comments. But if you find checks missing in semantics, and they're not related to this patch, please file them as bugs.

I just don't understand your last two comments. But if you find checks missing in semantics, and they're not related to this patch, please file them as bugs.

For the second comment, I find some checks missing in semantics when I extend the test cases. I will file them.

For the third comment, I mean there are some semantic checks without regression tests. Should we add the regression tests later?

These two comments are not related to this patch. Sorry, I should ask you by creating one issue in llvm-project.

peixin retitled this revision from [flang] Add one missed semantic check for derived type accessbility to [flang] Fix semantic checks for C919.May 30 2022, 6:10 PM
peixin edited the summary of this revision. (Show Details)
peixin set the repository for this revision to rG LLVM Github Monorepo.
peixin removed a project: Restricted Project.
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2022, 6:10 PM
peixin updated this revision to Diff 433058.May 31 2022, 5:34 AM
peixin edited the summary of this revision. (Show Details)
  1. Add description in summary.
  2. Add co-author.
  3. Add regression tests.
peixin added a comment.Jun 6 2022, 8:47 AM

@klausler Is the current revision the right solution? If not, please feel free to take up this.

klausler accepted this revision.Jun 6 2022, 8:56 AM
This revision is now accepted and ready to land.Jun 6 2022, 8:56 AM
This revision was automatically updated to reflect the committed changes.