This is an archive of the discontinued LLVM Phabricator instance.

Comment Sema: Run checks only when appropriate (NFC)
Needs ReviewPublic

Authored by aaronpuchert on Nov 12 2021, 12:20 PM.

Details

Reviewers
gribozavr2
Summary

Instead of checking within the checks whether they apply, we check
before calling them. This allows us to fetch the CommandInfo only once
instead of for every check. Also eliminates some redundant checks and
deduplicates code in Sema::checkBlockCommandDuplicate.

Diff Detail

Event Timeline

aaronpuchert requested review of this revision.Nov 12 2021, 12:20 PM
aaronpuchert created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptNov 12 2021, 12:20 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
aaronpuchert added a comment.EditedNov 12 2021, 12:51 PM

Forgot to run clang-format, will do this when I have to update or land.

clang/lib/AST/CommentSema.cpp
630–632

If you'd be happier here with a regular pointer I don't mind.

I find the code more readable and robust when the check* function checks its applicability itself. After this refactoring, it is not so clear when each check functions applies, what are the correct conditions to call them. To ensure correct usage, probably we should be adding equivalent asserts to the beginning of each function; at which point I'm not sure if the new code is better.

For example, consider checkContainerDecl. It used to start with:

const CommandInfo *Info = Traits.getCommandInfo(Comment->getCommandID());
if (!Info->IsRecordLikeDetailCommand || isRecordLikeDecl())
  return;

This code clearly shows the condition for the warning: the command is "RecordDetailLike" but the decl is not "RecordLike". In the new code this logic is split across two functions.

Is repeated CommandInfo fetching a performance issue? It shouldn't be because it is basically address arithmetic on the static array of commands using the command ID as an index.

I find the code more readable and robust when the check* function checks its applicability itself. After this refactoring, it is not so clear when each check functions applies, what are the correct conditions to call them.

The current code doesn't really do that either. Wouldn't we have to call all check* functions in all actOn* methods? But we only call a certain subset of checks which then check additional applicability conditions internally. Also we do check some conditions before: actOnBlockCommandFinish knows not to call checkReturnsCommand and checkDeprecatedCommand when there is no declaration, which requires more knowledge about what the checks do than to write

if (Info->IsReturnsCommand)
  checkReturnsCommand(Command);
if (Info->IsDeprecatedCommand)
  checkDeprecatedCommand(Command);

which seems pretty natural.

Now I don't think we want to go though the whole battery for every command that we encounter, so if we let the actOn* methods decide which checks to run, we can also let them decide when to run them.

To ensure correct usage, probably we should be adding equivalent asserts to the beginning of each function; at which point I'm not sure if the new code is better.

To some extent D113794 is doing that. (By having llvm_unreachable in the switches.) In the remaining functions nothing bad would happen, I think (other than false positive warnings).

For example, consider checkContainerDecl. [...] This code clearly shows the condition for the warning: the command is "RecordDetailLike" but the decl is not "RecordLike". In the new code this logic is split across two functions.

After the change the call site is

if (Info->IsRecordLikeDetailCommand)
  checkContainerDecl(BC);

So it's saying: if we have a command providing the details for a "RecordLike", let's check if we really have a "RecordLike" there. (Perhaps it should be checkRecordLikeDecl?) Then the check function is doing the check, i.e. "do we actually have a RecordLike here"?

So the idea is that actOn* methods decide which checks to run, while the check functions can assume they're applicable. Ideally they'd only see the declaration, but we have to pass the command in for diagnostics.

Is repeated CommandInfo fetching a performance issue? It shouldn't be because it is basically address arithmetic on the static array of commands using the command ID as an index.

Not that I observed, though in relative terms it could be: take as an extreme example actOnBlockCommandFinish where we don't actually need to run any of the checks, because we have an unrelated block command. Then we're fetching the same data four times instead of just once, and not doing anything else. Compared to the remaining compilation process it's likely not an issue though, that's right.

Apply clang-format.

Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2022, 1:20 PM

In general, I think this approach is defensible. Though I do wonder if we want to put an assert in the check functions to assert that the command is what you'd expect, so someone calling the check functions incorrectly will get some early feedback. WDYT?

clang/lib/AST/CommentSema.cpp
666–667

e.g., turn this into an assert so if someone calls checkDeprectedCommand when it's not a deprecated command, they get yelled at. This makes asserts builds slightly slower (because we're getting the command ID twice), but I don't think the overhead should be all that noticeable in practice.