This is an archive of the discontinued LLVM Phabricator instance.

[clang][OpeMP] Model OpenMP structured-block in AST (PR40563)
ClosedPublic

Authored by lebedev.ri on Mar 11 2019, 9:03 AM.

Details

Summary

https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3:

structured block

For C/C++, an executable statement, possibly compound, with a single entry at the
top and a single exit at the bottom, or an OpenMP construct.

COMMENT: See Section 2.1 on page 38 for restrictions on structured
blocks.
2.1 Directive Format

Some executable directives include a structured block. A structured block:
• may contain infinite loops where the point of exit is never reached;
• may halt due to an IEEE exception;
• may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a
_Noreturn specifier (in C) or a noreturn attribute (in C/C++);
• may be an expression statement, iteration statement, selection statement, or try block, provided
that the corresponding compound statement obtained by enclosing it in { and } would be a
structured block; and

Restrictions
Restrictions to structured blocks are as follows:
• Entry to a structured block must not be the result of a branch.
• The point of exit cannot be a branch out of the structured block.
C / C++
• The point of entry to a structured block must not be a call to setjmp().
• longjmp() and throw() must not violate the entry/exit criteria.

Of particular note here is the fact that OpenMP structured blocks are as-if noexcept,
in the same sense as with the normal noexcept functions in C++.
I.e. if throw happens, and it attempts to travel out of the noexcept function
(here: out of the current structured-block), then the program terminates.

Now, one of course can say that since it is explicitly prohibited by the Specification,
then any and all programs that violate this Specification contain undefined behavior,
and are unspecified, and thus no one should care about them. Just don't write broken code /s

But i'm not sure this is a reasonable approach.
I have personally had oss-fuzz issues of this origin - exception thrown inside
of an OpenMP structured-block that is not caught, thus causing program termination.
This issue isn't all that hard to catch, it's not any particularly different from
diagnosing the same situation with the normal noexcept function.

Now, clang static analyzer does not presently model exceptions.
But clang-tidy has a simplisic bugprone-exception-escape check,
and it is even refactored as a ExceptionAnalyzer class for reuse.
So it would be trivial to use that analyzer to check for
exceptions escaping out of OpenMP structured blocks. (D59466)

All that sounds too great to be true. Indeed, there is a caveat.
Presently, it's practically impossible to do. To check a OpenMP structured block
you need to somehow 'get' the OpenMP structured block, and you can't because
it's simply not modelled in AST. CapturedStmt/CapturedDecl is not it's representation.

Now, it is of course possible to write e.g. some AST matcher that would e.g.
match every OpenMP executable directive, and then return the whatever Stmt is
the structured block of said executable directive, if any.
But i said practically. This isn't practical for the following reasons:

  1. This will bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version.
  2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have RecursiveASTVisitor and friends. 2 > 1, so now you have code duplication.

So it would be reasonable (and is fully within clang AST spirit) to not
force every single consumer to do that work, but instead store that knowledge
in the correct, and appropriate place - AST, class structure.

Now, there is another hoop we need to get through.
It isn't fully obvious how to model this.
The best solution would of course be to simply add a OMPStructuredBlock transparent
node. It would be optimal, it would give us two properties:

  • Given this OMPExecutableDirective, what's it OpenMP structured block?
  • It is trivial to check whether the Stmt* is a OpenMP structured block (isa<OMPStructuredBlock>(ptr))

But OpenMP structured block isn't necessarily the first, direct child of OMP*Directive.
(even ignoring the clang's CapturedStmt/CapturedDecl that were inserted inbetween).
So i'm not sure whether or not we could re-create AST statements after they were already created?
There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12

1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity.
2. You will need to support serialization/deserialization.
3. You will need to support template instantiation.
4. You will need to support codegen and take this new construct to account in each OpenMP directive.

Instead, there is an functionally-equivalent, alternative solution, consisting of two parts.

Part 1:

  • Add a member function isStandaloneDirective() to the OMPExecutableDirective class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment.
  • Add a member function getStructuredBlock() to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement.

This way, given an OMPExecutableDirective, we can get it's structured block.
Also, since the knowledge is ingrained into the clang OpenMP implementation,
it will not cause any duplication, and hopefully won't bitrot.

Great we achieved 1 of 2 properties of OMPStructuredBlock approach.

Thus, there is a second part needed:

  • How can we check whether a given Stmt* is OMPStructuredBlock?

Well, we can't really, in general. I can see this workaround:

class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> {
  using Base = RecursiveASTVisitor<FunctionASTVisitor>;
public:
  bool VisitOMPExecDir(OMPExecDir *D) {
    OmpStructuredStmts.emplace_back(D.getStructuredStmt());
  }
  bool VisitSOMETHINGELSE(???) {
    if(InOmpStructuredStmt)
      HI!
  }
  bool TraverseStmt(Stmt *Node) {
    if (!Node)
      return Base::TraverseStmt(Node);
    if (OmpStructuredStmts.back() == Node)
      ++InOmpStructuredStmt;
    Base::TraverseStmt(Node);
    if (OmpStructuredStmts.back() == Node) {
      OmpStructuredStmts.pop_back();
      --InOmpStructuredStmt;
    }
    return true;
  }
  std::vector<Stmt*> OmpStructuredStmts;
  int InOmpStructuredStmt = 0;
};

But i really don't see using it in practice.
It's just too intrusive; and again, requires knowledge duplication.

.. but no. The solution lies right on the ground.
Why don't we simply store this i'm a openmp structured block in the bitfield of the Stmt itself?
This does not appear to have any impact on the memory footprint of the clang AST,
since it's just a single extra bit in the bitfield. At least the static assertions don't fail.
Thus, indeed, we can achieve both of the properties without a new AST node.

We can cheaply set that bit right in sema, at the end of Sema::ActOnOpenMPExecutableDirective(),
by just calling the getStructuredBlock() that we just added.
Test coverage that demonstrates all this has been added.

This isn't as great with serialization though. Most of it does not use abbrevs,
so we do end up paying the full price (4 bytes?) instead of a single bit.
That price, of course, can be reclaimed by using abbrevs.
In fact, i suspect that might not just reclaim these bytes, but pack these PCH significantly.

I'm not seeing a third solution. If there is one, it would be interesting to hear about it.
("just don't write code that would require isa<OMPStructuredBlock>(ptr)" is not a solution.)

Fixes PR40563.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Mar 11 2019, 9:03 AM
lebedev.ri retitled this revision from [private][clang][OpeMP] Model OpenMP structured-block in AST (PR40563) to [clang][OpeMP] Model OpenMP structured-block in AST (PR40563).Mar 11 2019, 9:05 AM
ABataev added inline comments.Mar 11 2019, 9:42 AM
include/clang/AST/StmtOpenMP.h
335 ↗(On Diff #190079)

No need to insert it into each class, just add:

Stmt * OMPExecutableDirective::getStructuredBlock() const {
  if (!hasAssociatedStmt() || !getAssociatedStmt())
    return nullptr;
  if (auto *LD = dyn_cast<OMPLoopDirective>(this))
    return LD->getBody();
  return getInnermostCapturedStmt()->getCapturedStmt();
}
gribozavr added inline comments.Mar 12 2019, 2:20 AM
include/clang/AST/StmtOpenMP.h
266 ↗(On Diff #190079)

"the AST node"

2158 ↗(On Diff #190079)

This comment (and other similar comments on getStructuredBlockImpl()) are implementation comments and should be written in the function body.

include/clang/Basic/OpenMPKinds.def
18 ↗(On Diff #190079)

Why not add a default definition:

#  define OPENMP_EXECUTABLE_DIRECTIVE(Name, Class) OPENMP_DIRECTIVE(Name)

(Also for OPENMP_EXECUTABLE_DIRECTIVE_EXT below.)

test/AST/ast-dump-openmp-atomic.c
1 ↗(On Diff #190079)

I'm not a fan of ast-dump tests. They are fragile, difficult to update, difficult to read (when they are 700 lines long), and it is unclear what the important parts are.

WDYT about converting them to unit tests? See clang/unittests/AST/StmtPrinterTest.cpp for an example.

gribozavr added inline comments.Mar 12 2019, 2:23 AM
include/clang/AST/StmtOpenMP.h
269 ↗(On Diff #190079)

Why not Stmt * as a return value?

Thanks for taking a look, some replies.

include/clang/AST/StmtOpenMP.h
269 ↗(On Diff #190079)

Good question.

Because usually checking a pointer for null is an exceptional
case, you normally have the AST node, you don't check
every single pointer for null, i don't think?

This is absolutely not the case here, null is not an
exceptional failure state here, it's the expected return value
for stand-alone directives. With Optional, you explicitly
signal "hey, i'm not a plain pointer, pay attention!",
thus hopefully maybe preventing some issues.

I'm not sure we can more nicely solve this with an extra base class[es],
at least not without having only two direct subclasses of OMPExecutableDirective:

  • OMPNonstandaloneExecutableDirective.
  • OMPStandaloneExecutableDirective.

and two different OMPOrderedDirective classes (one standalone, one not).
But somehow i don't think there is any chance of that being accepted,
even though it would result in slightly cleaner AST.

335 ↗(On Diff #190079)

I absolutely can do that, you are sure that is the most future-proof state?
In particular, i want to re-point-out that if it's implemented like this,
in the base class, then the sub-class may(will) not even know about this function,
and thus 'forget' to update it, should it not be giving the correct answer for
that new specific OpenMP executable directive.

You are sure it's better to implement it in the OMPExecutableDirective itself?

include/clang/Basic/OpenMPKinds.def
18 ↗(On Diff #190079)

Hm, i have never seen that chaining pattern in .def headers before,
so it could be surprising behavior.

test/AST/ast-dump-openmp-atomic.c
1 ↗(On Diff #190079)

They are difficult to update.

The updating part is true, for all of the clang tests,
they unlike llvm tests have no update scripts, so it's a major pain.
Here, it's is actually reasonably simple:

  1. prune old // CHECK* lines
  2. run the run line
  3. prepend each line of output with // CHECK-NEXT:
  4. scrub addresses, s/0x[0-9a-f]+/0x{{.*}}/
  5. scrub filepath prefix, again a simple string-replace
  6. prune everything after TranslationUnitDecl and before first FunctionDecl
  7. replace a few // CHECK-NEXT: with // CHECK:
  8. append the final processed output to the test file. done.

It would not be too hard to write an update tool for this, but i've managed with a simple macro..

They are fragile, difficult to read (when they are 700 lines long),
and it is unclear what the important parts are.

This is kind-of intentional. I've tried to follow the llvm-proper approach of gigantic
tests that are overly verbose, but quickly cement the state so any change will
be noticed. That has been a very successful approach for LLVM.

In fact, the lack of this ast-dumper test coverage was not helping
the recent ast-dumper refactoring, ask @aaron.ballman / @steveire.

So no, i really don't want to convert the tests into a less verbose state.

I suppose i could add a more fine-grained tests,
though readability is arguable. Here, it is obvious nothing is omitted,
and you can quickly see the openmp_structured_block by Ctrl+F'ing for it.
With more distilled tests, it's less obvious. (i'm not talking about StmtPrinterTest)

ABataev added inline comments.Mar 12 2019, 10:56 AM
include/clang/AST/StmtOpenMP.h
335 ↗(On Diff #190079)

Yes, I'm sure. It is the universal solution and all future classes must be compatible with it. If they are not, then they are incorrect.

gribozavr added inline comments.Mar 12 2019, 11:08 AM
include/clang/AST/StmtOpenMP.h
269 ↗(On Diff #190079)

Because usually checking a pointer for null is an exceptional
case, you normally have the AST node, you don't check
every single pointer for null, i don't think?

When deciding whether to check for null, we look at the function contract and check those pointers that are nullable. For example, IfStmt::getThen() does not return NULL in valid ASTs, but IfStmt::getElse() does.

This is absolutely not the case here, null is not an
exceptional failure state here, it's the expected return value
for stand-alone directives.

Yes, I understand what optional is and how a new codebase could use it with pointers. However, LLVM and Clang code has not been using optional to indicate nullability for pointers. I could only find 136 occurrences of Optional<.*\*> in llvm-project.git.

include/clang/Basic/OpenMPKinds.def
18 ↗(On Diff #190079)

It is used in Clang, for example, in llvm/tools/clang/include/clang/AST/TypeNodes.def, or llvm/tools/clang/include/clang/AST/BuiltinTypes.def. I was surprised that this file does not use this pattern, to be honest.

test/AST/ast-dump-openmp-atomic.c
1 ↗(On Diff #190079)

Here, it's is actually reasonably simple:

Unfortunately, an 8-step process is anything but simple.

This is kind-of intentional. I've tried to follow the llvm-proper approach of gigantic
tests that are overly verbose, but quickly cement the state so any change will
be noticed.

That's pretty much a definition of a "fragile test". The tests should check what they intend to check.

That has been a very successful approach for LLVM.

I don't think LLVM does this, the CHECK lines are for the most part manually crafted.

In fact, the lack of this ast-dumper test coverage was not helping
the recent ast-dumper refactoring, ask @aaron.ballman / @steveire.

If we need tests for AST dumper, we should write tests for it. OpenMP tests should not double for AST dumper tests.

lebedev.ri added inline comments.Mar 12 2019, 11:16 AM
include/clang/AST/StmtOpenMP.h
335 ↗(On Diff #190079)

Aha! Well, ok then.

Do you also suggest that Optional<> is too fancy?
Would it be better to do this instead?

bool isStandaloneDirective() const {
  return !hasAssociatedStmt() || !getAssociatedStmt();
}

// Requires: !isStandaloneDirective()
Stmt *OMPExecutableDirective::getStructuredBlock() const {
  assert(!isStandaloneDirective() && "Standalone Executable OpenMP directives don't have structured blocks.")
  if (auto *LD = dyn_cast<OMPLoopDirective>(this))
    return LD->getBody();
  return getInnermostCapturedStmt()->getCapturedStmt();
}

Hm, maybe that actually conveys more meaning..

lebedev.ri added inline comments.Mar 12 2019, 12:15 PM
test/AST/ast-dump-openmp-atomic.c
1 ↗(On Diff #190079)

Here, it's is actually reasonably simple:

Unfortunately, an 8-step process is anything but simple.

Hah, what i was saying is that it is entirely automatable,
all steps are identical each time.

That's pretty much a definition of a "fragile test". The tests should check what they intend to check.
I don't think LLVM does this, the CHECK lines are for the most part manually crafted.

No.
I know that 'most' of the new/updated tests are not manual,
from looking at the tests in commits, it's not an empty statement.

llvm/test/CodeGen$ for i in `ls -1d *`; do echo -n "$i "; find $i -iname \*.ll | xargs grep "update_llc" | wc -l; done | sort -r -n -k 2,2
X86 1427
PowerPC 125
RISCV 90
...
llvm/test/CodeGen$ find X86/ -iname *.ll | wc -l
3370
llvm/test/CodeGen$ find PowerPC/ -iname *.ll | wc -l
1019
llvm/test/CodeGen$ find RISCV/ -iname *.ll | wc -l
105

So a half of X86 backend tests, and pretty much all RISCV backend tests.

llvm/test$ for i in `ls -1d *`; do echo -n "$i "; find $i -iname \*.ll | xargs grep "update_test" | wc -l; done | sort -r -n -k 2,2
Transforms 845
Analysis 17
CodeGen 6
Other 2
$ for i in `ls -1d *`; do echo -n "$i "; find $i -iname \*.ll | wc -l; done | sort -r -n -k 2,2
Transforms 4620
Analysis 804

So sure, by the numbers, 'most' aren't auto-generated.
It's a question of legacy mainly.

I've both added llvm tests, and clang tests.
llvm tests are a breeze, just come up with sufficient IR,
and run the script [verify that the CHECK is what you thought it is,
or tune IR until it is], and you're done. With clang (codegen mostly),
one needs to first come up with the test, and then with the check lines,
and be really careful not to misspell CHECK, or else you're suddenly
not testing anything. clang tests are at least 3x more time-costly,
and i suspect thus have much less coverage they could have.

If we need tests for AST dumper, we should write tests for it. OpenMP tests should not double for AST dumper tests.

I'll take a look at StmtPrinterTest, but i'm not sure i agree it's better.

lebedev.ri added inline comments.Mar 12 2019, 1:11 PM
include/clang/AST/StmtOpenMP.h
335 ↗(On Diff #190079)

Great, that doesn't work, and highlights my concerns.
target enter data / target exit data / target update are stand-alone directives as per the spec,
but not as per that isStandaloneDirective() check ^.
https://godbolt.org/z/0tE93s

Is this a bug, or intentional?

ABataev added inline comments.Mar 12 2019, 1:33 PM
include/clang/AST/StmtOpenMP.h
335 ↗(On Diff #190079)

Well, this is an incompatibility caused by previous not-quite correct implementation. It was reworked already, but these incorrect children still remain, I just had no time to clean them out. You can fix this.

lebedev.ri added inline comments.Mar 12 2019, 1:35 PM
include/clang/AST/StmtOpenMP.h
335 ↗(On Diff #190079)

Okay, that is reassuring, thanks.

gribozavr added inline comments.Mar 12 2019, 1:54 PM
test/AST/ast-dump-openmp-atomic.c
1 ↗(On Diff #190079)

Hah, what i was saying is that it is entirely automatable, all steps are identical each time.

What about the issue of reviewing the resulting diffs? How likely is that the programmer will notice an unintended change in the sea of diffs?

If my arguments don't convince you, here's some industry wisdom: https://testing.googleblog.com/2015/01/testing-on-toilet-change-detector-tests.html

lebedev.ri added inline comments.Mar 13 2019, 7:27 AM
include/clang/AST/StmtOpenMP.h
335 ↗(On Diff #190079)

this is an incompatibility caused by previous not-quite correct implementation. It was reworked already,
but these incorrect children still remain, I just had no time to clean them out. You can fix this.

I have looked into this, and while parsing/sema changes are trivial, it has consequences for CodeGen.
In particular CodeGenFunction::EmitOMPTargetEnterDataDirective() & friends can no longer
create OMPLexicalScope with OMPD_task, or else it will assert that there are no associated stmts.

And more importantly, in the end CodeGenFunction::EmitOMPTargetTaskBasedDirective() tries to, again,
get these assoc stmts, and fails. I'm guessing it shouldn't just bailout, then it would not emit anything?

Any hints would be appreciated.

ABataev added inline comments.Mar 13 2019, 7:49 AM
include/clang/AST/StmtOpenMP.h
335 ↗(On Diff #190079)

Ahh, now I recall why it was required. It is needed to support the asynchronous data transfers. It can be reworked to avoid adding those special associated statements but it requires significant amount of time.
You can just add a check for those 3 directives in this Stmt *OMPExecutableDirective::getStructuredBlock() and return nullptr for them explicitly. I think it is still better than adding a whole bunch of those getStructuredStmt functions for each class. Plus, using this single function, you can apply it to the base OMPExecutableDirective class rather than to the derived classes. Some matchers could prefer to do this.

lebedev.ri marked 17 inline comments as done.
lebedev.ri edited the summary of this revision. (Show Details)

Addressed review notes
Split off baseline tests to a separate diff, to actually demonstrate the diff
Did not add more fine-grained tests as of this moment.

IIRC, last time I looked only 4 statement/expression classes currently have some abbreviation defined. It would probably be useful to have someone go systematically through the list of classes and fix this.

IIRC, last time I looked only 4 statement/expression classes currently have some abbreviation defined.

Yep, that is what i'm seeing in this diff.

It would probably be useful to have someone go systematically through the list of classes and fix this.

Yeah, i suspect that might be really beneficial.

IIRC, last time I looked only 4 statement/expression classes currently have some abbreviation defined.

Yep, that is what i'm seeing in this diff.

It would probably be useful to have someone go systematically through the list of classes and fix this.

Yeah, i suspect that might be really beneficial.

Especially given that more people are going to use modules now that they are in the process of being standardized.

IIRC, last time I looked only 4 statement/expression classes currently have some abbreviation defined.

Yep, that is what i'm seeing in this diff.

It would probably be useful to have someone go systematically through the list of classes and fix this.

Yeah, i suspect that might be really beneficial.

Especially given that more people are going to use modules now that they are in the process of being standardized.

Ah, good point, filed https://bugs.llvm.org/show_bug.cgi?id=41072

riccibruno added inline comments.Mar 15 2019, 3:03 PM
include/clang/AST/Stmt.h
100 ↗(On Diff #190437)

What about a comment explaining the purpose of this bit ?

include/clang/AST/StmtOpenMP.h
274 ↗(On Diff #190437)

This is not const-correct. The const-qualified method should return a const Stmt *.

lebedev.ri added inline comments.Mar 15 2019, 3:12 PM
include/clang/AST/StmtOpenMP.h
274 ↗(On Diff #190437)

Hmm, true; though that is not uncommon for all of clang.
I guess it will be best to return const-qualified here, and add a wrapper that would cast it away.

lebedev.ri marked 8 inline comments as done.

Added more fine-grained tests.

Hopefully all good now ?

riccibruno added inline comments.Mar 16 2019, 6:27 AM
include/clang/AST/StmtOpenMP.h
274 ↗(On Diff #190957)

Uh, and what about the non-const version of getStructuredBlock ? I think that you just have to do the usual const_cast dance to implement the non-const version in term of the const version.

Apart from this I have no other comment.

lebedev.ri added inline comments.Mar 16 2019, 6:39 AM
lib/AST/StmtOpenMP.cpp
40 ↗(On Diff #190957)

@riccibruno
getBody() exists in const-only variant
getInnermostCapturedStmt() does exist in both the const and non-const variants,
but the const variant does const_cast and defers to non-const variant.
getCapturedStmt() is good though.

So at best i can add

Stmt *getStructuredBlock() {
  return const_cast<Stmt *>(const_cast<OMPExecutableDirective *>(this)->getStructuredBlock());
}

I'm not sure it's worth it?

Drop unused helper from unit test.

riccibruno added inline comments.Mar 16 2019, 7:14 AM
lib/AST/StmtOpenMP.cpp
40 ↗(On Diff #190957)

Or perhaps something like the following to avoid one const_cast:

Stmt *getStructuredBlock() {
  const OMPExecutableDirective *ConstThis = this;
  return const_cast<Stmt *>(ConstThis->getStructuredBlock());
}

My point is that in itself this is a minor thing, but not defining consistently both versions (when appropriate) inevitably leads to annoying errors. For example suppose that you only define the const version of getStructuredBlock(), and then there is some random function bar with the signature void bar(Stmt *S).

You then try:

// Intentionally not `const Stmt *` since we will set some random flag in S.
void bar(Stmt *S);

// Non-const since we are modifying it.
OMPExecutableDirective *OED = something;
bar(OED->getStructuredBlock())

which fails because OED->getStructuredBlock() has type const Stmt*, even though every other pointee type is non-const qualified.

So unless it is intentional to forbid modifying the statement returned by getStructuredBlock(), defining both versions of getStructuredBlock() avoid this problem. Yes it is annoying to have to define the two versions but as far as I know this is inherent to C++ unfortunately.

lebedev.ri added inline comments.Mar 16 2019, 7:20 AM
lib/AST/StmtOpenMP.cpp
40 ↗(On Diff #190957)

Hmm, okay, will add during next update / before landing, whichever happens next.

riccibruno added inline comments.Mar 16 2019, 7:26 AM
lib/AST/StmtOpenMP.cpp
40 ↗(On Diff #190957)

Sounds good.

lebedev.ri edited the summary of this revision. (Show Details)Mar 16 2019, 1:55 PM
gribozavr added inline comments.Mar 18 2019, 4:12 AM
include/clang/AST/Stmt.h
102 ↗(On Diff #190959)

"non-standalone OpenMP executable directives"

test/PCH/stmt-openmp_structured_block-bit.cpp
8 ↗(On Diff #190959)

Maybe try imitating a more recently written test, like llvm/tools/clang/test/PCH/cxx2a-compare.cpp ? It has both header and user written in the same file, for improved readability.

test/PCH/stmt-openmp_structured_block-bit.h
2 ↗(On Diff #190959)

Is cxx_exprs.h needed?

I don't think -std=c++11 is needed either.

Same in RUN lines below.

Maybe try imitating a more recently written test, like llvm/tools/clang/test/PCH/cxx2a-compare.cpp ?

5 ↗(On Diff #190959)

The point of PCH tests is to test the serialization of ASTs into PCH. In this line the cxx_exprs.h header is being initialized, it does not contain any OpenMP constructs, so I'm not sure what this test is supposed to test.

6 ↗(On Diff #190959)

I don't see any CHECK lines for FileCheck. (FileCheck fails the test if there are no CHECK lines.)

unittests/AST/OMPStructuredBlockTest.cpp
9 ↗(On Diff #190959)

"of Stmt"

27 ↗(On Diff #190959)

Eventually you'll move it to ASTMatchers.h, right?

41 ↗(On Diff #190959)

extra "k" before "Matcher"

unittests/AST/PrintTest.h
15 ↗(On Diff #190959)

Please don't write using namespace in headers. Instead, put the code below into the clang namespace.

19 ↗(On Diff #190959)

No need for this, make PrintStmt() static.

Before i do address any further bike^W review notes in other
reviews in this path stack, i want to finish with this review.
If this change doesn't stick, rest is pointless waste of everyone's time.

@ABataev please do tell if there is any outstanding issues here.
I think i have addressed everything..

As far as I'm concerned, I don't have any major revision requests. I haven't reviewed the unit tests (I'm planning to), but for non-test code, I don't have any further comments.

The OpenMP part looks good.

gribozavr added inline comments.Mar 20 2019, 2:39 AM
unittests/AST/OMPStructuredBlockTest.cpp
64 ↗(On Diff #190959)

Same comment as in the other patch -- I would prefer that the source is inlined into the ASSERT_TRUE, with implicit string concatenation and "\n"s, if raw strings don't work.

75 ↗(On Diff #190959)

Could you also add an assertion that OMPStructuredBlockMatcher() matches zero AST nodes? Similarly in every test that has no structured blocks.

121 ↗(On Diff #190959)

This test file repeats this pattern many times, only with different OpenMP directives:

TEST(OMPStructuredBlock, Test~~~Directive0)
TEST(OMPStructuredBlock, Test~~~Directive1)
TEST(OMPStructuredBlock, Test~~~DirectiveCollapse1)
TEST(OMPStructuredBlock, Test~~~DirectiveCollapse2)
TEST(OMPStructuredBlock, Test~~~DirectiveCollapse22)

WDYT about deduplicating it using parameterized tests? For example, see AssignmentTest in llvm/tools/clang/unittests/Analysis/ExprMutationAnalyzerTest.cpp.

656 ↗(On Diff #190959)

Is it only a pretty-printer problem or something more severe?

704 ↗(On Diff #190959)

Extra "k" before "Matcher".

712 ↗(On Diff #190959)

Extra "k" before "Matcher".

1481 ↗(On Diff #190959)

"TestTeams"

unittests/AST/PrintTest.h
1 ↗(On Diff #190959)

"ASTPrint.h"?

lebedev.ri added inline comments.Mar 20 2019, 4:07 AM
unittests/AST/OMPStructuredBlockTest.cpp
64 ↗(On Diff #190959)

I still don't understand the reasoning here.
It feels like a change just for the sake of the change.

gribozavr added inline comments.Mar 20 2019, 5:17 AM
unittests/AST/OMPStructuredBlockTest.cpp
64 ↗(On Diff #190959)

These variables don't improve readability. Also, if there are multiple such variables in one test, it is easy to mistakenly use an incorrect one in the ASSERT_TRUE line.

lebedev.ri marked 21 inline comments as done.

Addressed all nits excluding the removal of multiline string literals.

The OpenMP part looks good.

Great. Can you stamp this then please? :)

include/clang/AST/Stmt.h
102 ↗(On Diff #190959)

English isn't my native language; to me the current variant (attempts to?) convey
that structured block can only ever exist for non-standalone executable directives.
I.e. "structured block of non-standalone executable directives" is a misnomer.

The variant without [] does not convey any of that, i think, it just says
that "it's a structured block of non-standalone OpenMP executable directives",
it isn't obvious from that (without knowing OpenMP details) that there
wouldn't be "a structured block of standalone OpenMP executable directive".

Isn't there an important difference there?

unittests/AST/OMPStructuredBlockTest.cpp
64 ↗(On Diff #190959)

Can you please quote something about this from
https://llvm.org/docs/CodingStandards.html
https://llvm.org/docs/ProgrammersManual.html
etc?
If not this is entirely subjective/stylistic,
and this was the form that was more readable to me.
(easier to write, too, those "\n" will look horrible)
Also, what about the cases with two assertions? (not just the new ones)

I really don't see how that would be a good change.

656 ↗(On Diff #190959)
gribozavr accepted this revision.Mar 20 2019, 8:15 AM
gribozavr added inline comments.
include/clang/AST/Stmt.h
102 ↗(On Diff #190959)

I didn't catch this difference implied by "[]".

I.e. "structured block of non-standalone executable directives" is a misnomer.

Why is it a misnomer? It might be overly verbose, but it is correct.

However, I see that this fact is important to highlight, so I'd suggest to spell it out:

"This bit is set only for the Stmts that are the structured-block of OpenMP executable directives. Directives that have a structured block are called "non-standalone" directives."

Please also fix the typo "OpeMP" => OpenMP.

unittests/AST/OMPStructuredBlockTest.cpp
64 ↗(On Diff #190959)

There are lots of readability points that are not formalized in the coding style, and yet, are followed in the code.

For cases with two assertions we need to have a variable to avoid code duplication, so it is OK there -- the variable has a purpose.

If you want to use raw string literals, could I at least ask you to drop the first line onto the same level of indentation as the rest of the lines?

  const char *Source =
      R"(void test(int i) {
#pragma omp atomic
++i;
})";

change to:

  const char *Source = R"(
void test(int i) {
#pragma omp atomic
++i;
})";
This revision is now accepted and ready to land.Mar 20 2019, 8:15 AM
ABataev accepted this revision.Mar 20 2019, 8:32 AM

LG for the OpenMP part

lebedev.ri marked 5 inline comments as done.

Few last nits.

@ABataev @gribozavr thank you for the reviews!

include/clang/AST/Stmt.h
102 ↗(On Diff #190959)

I.e. "structured block of non-standalone executable directives" is a misnomer.

Why is it a misnomer? It might be overly verbose, but it is correct.

Blargh.
I meant to write "structured block of *standalone* executable directives" would make no sense.

".."

Yes, that looks cleaner, thank you!

unittests/AST/OMPStructuredBlockTest.cpp
64 ↗(On Diff #190959)

If you want to use raw string literals, could I at least ask you to drop the first line onto the same level of indentation as the rest of the lines?

Absolutely.

Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2019, 9:32 AM
rupprecht added inline comments.
cfe/trunk/unittests/AST/OMPStructuredBlockTest.cpp
58

Looks like this test fails when the default is not libomp, e.g. DCLANG_DEFAULT_OPENMP_RUNTIME=libgomp
Using -fopenmp=libomp explicitly here causes the test to pass in the case. Does that seem like the right fix, or has something gone wrong?

So if I understand the history here:

  • the IsOMPStructuredBlock bit on Stmt was added to implement getStructuredBlock()
  • after review, getStructuredBlock() doesn't use the bit
  • the bit is used in the textual AST dump, and an AST matcher (which is in turn never used in-tree)

Can we have the bit back please :-)
We're currently trying to add a HasErrors bit to improve error-recovery and some nodes are completely full.
Using a bit on every Stmt to improve the text dump a little and provide a matcher (only relevant to OMP files) seems disproportionately expensive.

If the bit is unused except for propagation, please remove it.

So if I understand the history here:

  • the IsOMPStructuredBlock bit on Stmt was added to implement getStructuredBlock()
  • after review, getStructuredBlock() doesn't use the bit
  • the bit is used in the textual AST dump, and an AST matcher (which is in turn never used in-tree)

Can we have the bit back please :-)
We're currently trying to add a HasErrors bit to improve error-recovery and some nodes are completely full.
Using a bit on every Stmt to improve the text dump a little and provide a matcher (only relevant to OMP files) seems disproportionately expensive.

If the bit is unused except for propagation, please remove it.

This review (& https://bugs.llvm.org/show_bug.cgi?id=40563 in general)
has left me so burnout as to all this openmp stuff that i never
followed-up with the actual planned usages of this bit.

The reasons why this bit is needed (i think they were spelled out here)
still apply. One obvious example is https://bugs.llvm.org/show_bug.cgi?id=41102,
where after we'd teach ExceptionAnalyzer to look into CapturedStmt / CapturedDecl,
we'd immediately have false-positives for these OpenMP structured blocks,
so we'd need to avoid traversing into them.

Doing that via a single bit check should be trivial, as opposed to,
i dunno, maintainting a set of all the openmp structured block statements,
and before visiting each statement seeing if it is in that set?
That kinda sounds slow.

So if I understand the history here:

  • the IsOMPStructuredBlock bit on Stmt was added to implement getStructuredBlock()
  • after review, getStructuredBlock() doesn't use the bit
  • the bit is used in the textual AST dump, and an AST matcher (which is in turn never used in-tree)

Can we have the bit back please :-)
We're currently trying to add a HasErrors bit to improve error-recovery and some nodes are completely full.
Using a bit on every Stmt to improve the text dump a little and provide a matcher (only relevant to OMP files) seems disproportionately expensive.

If the bit is unused except for propagation, please remove it.

This review (& https://bugs.llvm.org/show_bug.cgi?id=40563 in general)
has left me so burnout as to all this openmp stuff that i never
followed-up with the actual planned usages of this bit.

I'm sorry to hear that.

The reasons why this bit is needed (i think they were spelled out here)
still apply. One obvious example is https://bugs.llvm.org/show_bug.cgi?id=41102,
where after we'd teach ExceptionAnalyzer to look into CapturedStmt / CapturedDecl,
we'd immediately have false-positives for these OpenMP structured blocks,
so we'd need to avoid traversing into them.

Doing that via a single bit check should be trivial, as opposed to,
i dunno, maintainting a set of all the openmp structured block statements,
and before visiting each statement seeing if it is in that set?
That kinda sounds slow.

Well, two points.

The first is that language dialects have to take a back-seat to general-purpose
mechanisms. I care a lot about Objective-C ARC, and that dialect has some
type qualifiers that are used very frequently, but the representation does not
and will never prioritize those type qualifiers over things that are common to
C/C++, and so we just use extra memory when compiling ARC.

The second is that we can pretty easily solve your problem without adding a
bit to Stmt. Captured statements are like blocks in the sense that traverses
that walk into captured statements will always pass through a node in the AST
that knows it's introducing a captured statement, so they should be easy to just
record that the traversal is currently within a captured statement, or even
maintain a stack of such contexts. If it helps to add "callbacks" to some CRTP
visitor class when entering and exiting a special local context, we can do that.

d5edcb90643104d6911da5c0ff44c4f33fff992f, looking forward to seeing better error recovery.

d5edcb90643104d6911da5c0ff44c4f33fff992f, looking forward to seeing better error recovery.

Thanks very much for your help Roman, I'm also sorry this has been difficult.

The plan for error recovery is to add a RecoveryExpr to the AST that can represent the known structure of an invalid expression (e.g. a function call with no viable overloads - this has subexpressions and possibly a known type) instead of dropping it as we do today. This will be used to improve diagnostics and also expose a more meaningful AST to tools.

A "transitive-has-errors" bit on Expr is needed as various places currently assume that if an Expr exists, the code is valid. (TypoExpr is a complicated special snowflake, and we should also be able to reduce the number of places that special-case it).

d5edcb90643104d6911da5c0ff44c4f33fff992f, looking forward to seeing better error recovery.

Thanks very much for your help Roman, I'm also sorry this has been difficult.

The plan for error recovery is to add a RecoveryExpr to the AST that can represent the known structure of an invalid expression (e.g. a function call with no viable overloads - this has subexpressions and possibly a known type) instead of dropping it as we do today. This will be used to improve diagnostics and also expose a more meaningful AST to tools.

A "transitive-has-errors" bit on Expr is needed as various places currently assume that if an Expr exists, the code is valid. (TypoExpr is a complicated special snowflake, and we should also be able to reduce the number of places that special-case it).

Yep, sounds reasonable, thank you for the explanation; that does match my expectations
as i was paying some attention to the previous related discussions.