This is an archive of the discontinued LLVM Phabricator instance.

[clang] format string checks for `InitListExpr`
AbandonedPublic

Authored by inclyc on Aug 6 2022, 1:13 AM.

Details

Summary

this patch enhances clang check format strings defined in list
initialization expressions. Before this patch clang just thought these
expressions are not string literal.

Example:

constexpr const char *foo() { return "%s %d"; }

struct Bar {
  static constexpr char value[] = {'%', 's', '%', 'd', '\0'};
};

constexpr const char *foobar() { return Bar::value; }

constexpr const char *foooobar() { return foobar(); }

int main() {
  printf(foo(), "abc", "def");
  printf(Bar::value, "abc", "def");
  printf(foobar(), "abc", "def");
  printf(foooobar(), "abc", "def");
  return 0;
}

Diagnostics after this patch:

<source>:14:24: warning: format specifies type 'int' but the argument has type 'const char *' [-Wformat]
  printf(foo(), "abc", "def");
         ~~~~~         ^~~~~
<source>:3:42: note: format string is defined here
constexpr const char *foo() { return "%s %d"; }
                                         ^~
                                         %s
<source>:15:29: warning: format specifies type 'int' but the argument has type 'const char *' [-Wformat]
  printf(Bar::value, "abc", "def");
                            ^~~~~
<source>:16:27: warning: format specifies type 'int' but the argument has type 'const char *' [-Wformat]
  printf(foobar(), "abc", "def");
                          ^~~~~
<source>:17:29: warning: format specifies type 'int' but the argument has type 'const char *' [-Wformat]
  printf(foooobar(), "abc", "def");
                            ^~~~~
4 warnings generated.

Diff Detail

Event Timeline

inclyc created this revision.Aug 6 2022, 1:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2022, 1:13 AM
inclyc published this revision for review.Aug 6 2022, 1:18 AM
inclyc added a project: Restricted Project.
inclyc added a subscriber: Restricted Project.
inclyc updated this revision to Diff 450494.Aug 6 2022, 1:20 AM

typo fix

Not a formal review of course, but for the diagnostics, I'm missing something that tells the user what the format string ended up looking like; in your example, the output never mentions that it checked "%s%d".

clang/lib/Sema/SemaChecking.cpp
8707

I'd prefer not to add more FIXME comments.

inclyc added a comment.Aug 6 2022, 6:20 AM

There are too many things changed in this patch, I think the function of displaying the evaluation results, FIXME warning, these can be done in subsequent patches.

inclyc updated this revision to Diff 450528.Aug 6 2022, 6:32 AM

Use isa<> to check Expr class

inclyc updated this revision to Diff 450530.Aug 6 2022, 6:39 AM

rebase && qualify const

ping

FWIW, we usually only ping a review that hasn't had any activity in a week or more (it's not uncommon for reviews to sit for a few days while people think about them or reviewers are busy on other stuff).

I've not had the chance to do an in-depth review yet, but I have two thoughts I can share early on. I think you can simplify the patch a little bit by treating a string literal and an initializer list the same way (using an iterator to walk over the elements) instead of ginning up a fake StringLiteral AST node (that's a very heavy handed way to implement this). However, even with that simplification, I'm not certain the use cases for the diagnostic happen enough to warrant this large of a change in how we process format strings. I had encouraged such a diagnostic given the equivalence of the construct with string literals, but I was imagining that the support would be a few lines of code rather than anything substantial like this. Perhaps you can find a way to make the changes less invasive, but if not, I think we may want to hold off on this change until a user files an issue pointing out some real world code that would be easier to fix if they had such a diagnostic.

ping

FWIW, we usually only ping a review that hasn't had any activity in a week or more (it's not uncommon for reviews to sit for a few days while people think about them or reviewers are busy on other stuff).

I've not had the chance to do an in-depth review yet, but I have two thoughts I can share early on. I think you can simplify the patch a little bit by treating a string literal and an initializer list the same way (using an iterator to walk over the elements) instead of ginning up a fake StringLiteral AST node (that's a very heavy handed way to implement this). However, even with that simplification, I'm not certain the use cases for the diagnostic happen enough to warrant this large of a change in how we process format strings. I had encouraged such a diagnostic given the equivalence of the construct with string literals, but I was imagining that the support would be a few lines of code rather than anything substantial like this. Perhaps you can find a way to make the changes less invasive, but if not, I think we may want to hold off on this change until a user files an issue pointing out some real world code that would be easier to fix if they had such a diagnostic.

Clang seems to depend class FormatStringLiteral when performing format string checking, which has its SourceLocation, a wrapped pointer. With these information we can generate FixIt Hints directly on source codes. In previous version, we probably haven't even considered checking the format string evaluated from InitListExpr (just consider it is not a string literal). I've tried adding some ways to treat them as equals before, but it seems more invasive than this patch. Actually, I don't think any user would ever define a format string like {'%', 'd', 0} (That's exactly the reason I split format string checks in two patches). So maybe we should abandon this patch because there are no enough benefits to apply this?

but it seems more invasive than this patch.

It may be necessary to separate out the existing parts that check for formatting strings and related location information, because InitListExpr may not have proper information of source code location that can be determined during the format string checking.

ping

FWIW, we usually only ping a review that hasn't had any activity in a week or more (it's not uncommon for reviews to sit for a few days while people think about them or reviewers are busy on other stuff).

I've not had the chance to do an in-depth review yet, but I have two thoughts I can share early on. I think you can simplify the patch a little bit by treating a string literal and an initializer list the same way (using an iterator to walk over the elements) instead of ginning up a fake StringLiteral AST node (that's a very heavy handed way to implement this). However, even with that simplification, I'm not certain the use cases for the diagnostic happen enough to warrant this large of a change in how we process format strings. I had encouraged such a diagnostic given the equivalence of the construct with string literals, but I was imagining that the support would be a few lines of code rather than anything substantial like this. Perhaps you can find a way to make the changes less invasive, but if not, I think we may want to hold off on this change until a user files an issue pointing out some real world code that would be easier to fix if they had such a diagnostic.

Clang seems to depend class FormatStringLiteral when performing format string checking, which has its SourceLocation, a wrapped pointer. With these information we can generate FixIt Hints directly on source codes. In previous version, we probably haven't even considered checking the format string evaluated from InitListExpr (just consider it is not a string literal). I've tried adding some ways to treat them as equals before, but it seems more invasive than this patch. Actually, I don't think any user would ever define a format string like {'%', 'd', 0} (That's exactly the reason I split format string checks in two patches). So maybe we should abandon this patch because there are no enough benefits to apply this?

I think that might be the best course of action, though I'm really sorry to have asked you to do this work only to turn around and say "nevermind." I just don't see enough users writing their format strings that way for the amount of effort it takes to support diagnosing it.

inclyc abandoned this revision.Aug 9 2022, 5:37 PM