This is an archive of the discontinued LLVM Phabricator instance.

[clang] format string checking for conpile-time evaluated str literal
ClosedPublic

Authored by inclyc on Aug 1 2022, 8:10 AM.

Details

Summary

This patch enhances clang's ability to check compile-time determinable
string literals as format strings, and can give FixIt hints at literals
(unlike gcc). Issue https://github.com/llvm/llvm-project/issues/55805
mentiond two compile-time string cases. And this patch partially fixes
one.

constexpr const char* foo() {
  return "%s %d";
}
int main() {
   printf(foo(), "abc", "def");
   return 0;
}

This patch enables clang check format string for this:

<source>:4:24: warning: format specifies type 'int' but the argument has type 'const char *' [-Wformat]
  printf(foo(), "abc", "def");
         ~~~~~         ^~~~~
<source>:2:42: note: format string is defined here
constexpr const char *foo() { return "%s %d"; }
                                         ^~
                                         %s
1 warning generated.

Signed-off-by: YingChi Long <me@inclyc.cn>

Diff Detail

Event Timeline

inclyc created this revision.Aug 1 2022, 8:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2022, 8:10 AM
inclyc edited the summary of this revision. (Show Details)Aug 1 2022, 8:11 AM
inclyc updated this revision to Diff 449023.Aug 1 2022, 8:14 AM

add newline

inclyc published this revision for review.Aug 1 2022, 8:17 AM
inclyc added reviewers: mizvekov, rsmith, aaron.ballman.
inclyc added a project: Restricted Project.
inclyc added a subscriber: Restricted Project.
inclyc added inline comments.Aug 1 2022, 8:28 AM
clang/test/Sema/format-strings-scanf.c
235 ↗(On Diff #449023)

These new notes are FixIt hints, looks much better than before.

clang/test/Sema/format-strings.c
263 ↗(On Diff #449023)

Here, s4 is declared as char * const and initialized as const char, so warning generates at the declaration of s4 is sufficient? Since modify characters in string literal causes UB.

Initializing 'char *const' with an expression of type 'const char[6]' discards qualifiers

Currently the checker in this patch just qualifies "hello" as a string literal, and does not report -Wformat-nonliteral as before

tbaeder added a subscriber: tbaeder.Aug 1 2022, 9:19 PM
tbaeder added inline comments.
clang/lib/Sema/SemaChecking.cpp
8507

A comment above this line would be helpful. Would also visually separate it from the return above, which just confused me.

8510

I think you should be able to unify the two if statements.

Can you not dyn_cast_or_null<StringLiteral>(Result.Val.getLValueBase()) here instead of casting to Expr* and checking the StmtClass?

clang/test/Sema/format-strings-scanf.c
235 ↗(On Diff #449023)

Can you show some sample output?

inclyc added inline comments.Aug 1 2022, 11:26 PM
clang/lib/Sema/SemaChecking.cpp
8510

LValueBase seems to be a wrapper of a pointer that has its own dyn_cast method.

I have changed code here like this, but it cannot compile

auto *MaybeStringLiteral =
        dyn_cast_or_null<StringLiteral *>(Result.Val.getLValueBase());
    if (MaybeStringLiteral) {
      return checkFormatStringExpr(S, MaybeStringLiteral, Args, APK, format_idx,
                                   firstDataArg, Type, CallType,
                                   /*InFunctionCall*/ false, CheckedVarArgs,
                                   UncoveredArg, Offset);
    }
<workspace>/llvm-project/llvm/include/llvm/Support/Casting.h:64:53: error: type 'clang::StringLiteral *' cannot be used prior to '::' because it has no members
  static inline bool doit(const From &Val) { return To::classof(&Val); }
clang/test/Sema/format-strings-scanf.c
235 ↗(On Diff #449023)

Source:

// sample.cpp
#include <cstdio>

int main() {
  int *i;
  scanf(1 ? "%s %d" : "%d", i);
}

previous version:

sample.cpp:5:29: warning: format specifies type 'char *' but the argument has type 'int *' [-Wformat]
  scanf(1 ? "%s %d" : "%d", i);
             ~~             ^
             %d
sample.cpp:5:18: warning: more '%' conversions than data arguments [-Wformat-insufficient-args]
  scanf(1 ? "%s %d" : "%d", i);
                ~^
2 warnings generated.

this patch highlights cond ? T : F expressions:

sample.cpp:5:29: warning: format specifies type 'char *' but the argument has type 'int *' [-Wformat]
  scanf(1 ? "%s %d" : "%d", i);
        ~~~~~~~~~~~~~~~~~~  ^
sample.cpp:5:14: note: format string is defined here
  scanf(1 ? "%s %d" : "%d", i);
             ^~
             %d
sample.cpp:5:9: warning: more '%' conversions than data arguments [-Wformat-insufficient-args]
  scanf(1 ? "%s %d" : "%d", i);
        ^~~~~~~~~~~~~~~~~~
sample.cpp:5:18: note: format string is defined here
  scanf(1 ? "%s %d" : "%d", i);
                ~^
2 warnings generated.
inclyc updated this revision to Diff 449191.Aug 1 2022, 11:30 PM

Make if statements concise and add comments

tbaeder added inline comments.Aug 1 2022, 11:32 PM
clang/lib/Sema/SemaChecking.cpp
8510

Ah, shame. This ends up calling dyn_cast on a llvm::Pointerunion as far as I can see, which seems to return nullptr in case the cast is unsuccessful anyway, so you could try just exploiting that: https://llvm.org/doxygen/classllvm_1_1PointerUnion.html#a9147352f78d98ee246f25d3300e0aaba

clang/test/Sema/format-strings-scanf.c
235 ↗(On Diff #449023)

Alright, might be a bit verbose but I agree that it is useful.

inclyc added inline comments.Aug 1 2022, 11:41 PM
clang/test/Sema/format-strings-scanf.c
236 ↗(On Diff #449191)
// long macro
#include <cstdio>

#define SOME_STRANGE_MACRO(a, b) ((0) ? (a) : (b))

int main() {
  int *i;
  scanf(SOME_STRANGE_MACRO("%d", "%d %s"), i);
}

previous:

sample.cpp:7:39: warning: more '%' conversions than data arguments [-Wformat-insufficient-args]
  scanf(SOME_STRANGE_MACRO("%d", "%d %s"), i);
                                     ~^
sample.cpp:3:48: note: expanded from macro 'SOME_STRANGE_MACRO'
#define SOME_STRANGE_MACRO(a, b) ((0) ? (a) : (b))
                                               ^
1 warning generated.

now:

sample.cpp:7:9: warning: more '%' conversions than data arguments [-Wformat-insufficient-args]
  scanf(SOME_STRANGE_MACRO("%d", "%d %s"), i);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sample.cpp:3:34: note: expanded from macro 'SOME_STRANGE_MACRO'
#define SOME_STRANGE_MACRO(a, b) ((0) ? (a) : (b))
                                 ^~~~~~~~~~~~~~~~~
sample.cpp:7:39: note: format string is defined here
  scanf(SOME_STRANGE_MACRO("%d", "%d %s"), i);
                                     ~^
sample.cpp:3:48: note: expanded from macro 'SOME_STRANGE_MACRO'
#define SOME_STRANGE_MACRO(a, b) ((0) ? (a) : (b))
                                               ^
1 warning generated.

I think it is better to underline the buggy expression, if this sucks maybe we can check if this is constexpr after checking conditional operator ? : or somehow other statement classes ?

Getting a review usually takes far longer than just a few hours, so don't ping too early. I also want an opinion from at least one other reviewer.

clang/lib/Sema/SemaChecking.cpp
8508

The comment should be full sentences, start with an upper-case character and end in a period:

// If this expression can be evaluated at compile-time,
// check if the result is a StringLiteral and retry

(just a suggestion)

8510

What about this? I.e. usong StringLiteral* directly in the dyn_cast call?

clang/test/Sema/format-strings-scanf.c
236 ↗(On Diff #449191)

I think the output is fine as-is for this patch, but could be improved later. Looking at this as a human, I feel as if both of the "expanded from macro" notes are redundant. But I guess they can provide useful information in other cases.

inclyc added inline comments.Aug 2 2022, 2:32 AM
clang/lib/Sema/SemaChecking.cpp
8510
FAILED: tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaChecking.cpp.o 
/usr/lib/llvm/14/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I<llvm-project>/build/tools/clang/lib/Sema -I<llvm-project>/clang/lib/Sema -I<llvm-project>/clang/include -I<llvm-project>/build/tools/clang/include -I<llvm-project>/build/include -I<llvm-project>/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -fno-common -Woverloaded-virtual -Wno-nested-anon-types -g  -fno-exceptions -fno-rtti -std=c++14 -MD -MT tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaChecking.cpp.o -MF tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaChecking.cpp.o.d -o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaChecking.cpp.o -c <llvm-project>/clang/lib/Sema/SemaChecking.cpp
In file included from <llvm-project>/clang/lib/Sema/SemaChecking.cpp:14:
In file included from <llvm-project>/clang/include/clang/AST/APValue.h:18:
In file included from <llvm-project>/llvm/include/llvm/ADT/APFloat.h:19:
In file included from <llvm-project>/llvm/include/llvm/ADT/ArrayRef.h:15:
<llvm-project>/llvm/include/llvm/ADT/STLExtras.h:199:42: error: implicit instantiation of undefined template 'llvm::FirstIndexOfType<const clang::StringLiteral *>'
    : std::integral_constant<size_t, 1 + FirstIndexOfType<T, Us...>::value> {};
                                         ^
<llvm-project>/llvm/include/llvm/ADT/STLExtras.h:199:42: note: in instantiation of template class 'llvm::FirstIndexOfType<const clang::StringLiteral *, clang::DynamicAllocLValue>' requested here
<llvm-project>/llvm/include/llvm/ADT/STLExtras.h:199:42: note: in instantiation of template class 'llvm::FirstIndexOfType<const clang::StringLiteral *, clang::TypeInfoLValue, clang::DynamicAllocLValue>' requested here
<llvm-project>/llvm/include/llvm/ADT/STLExtras.h:199:42: note: in instantiation of template class 'llvm::FirstIndexOfType<const clang::StringLiteral *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>' requested here
<llvm-project>/llvm/include/llvm/ADT/PointerUnion.h:230:30: note: in instantiation of template class 'llvm::FirstIndexOfType<const clang::StringLiteral *, const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>' requested here
    return F.Val.getInt() == FirstIndexOfType<To, PTs...>::value;
                             ^
<llvm-project>/llvm/include/llvm/ADT/PointerUnion.h:248:27: note: in instantiation of function template specialization 'llvm::CastInfoPointerUnionImpl<const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>::isPossible<const clang::StringLiteral *>' requested here
    return Impl::template isPossible<To>(f);
                          ^
<llvm-project>/llvm/include/llvm/Support/Casting.h:312:19: note: in instantiation of member function 'llvm::CastInfo<const clang::StringLiteral *, llvm::PointerUnion<const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>>::isPossible' requested here
    if (!Derived::isPossible(f))
                  ^
<llvm-project>/llvm/include/llvm/Support/Casting.h:407:23: note: in instantiation of member function 'llvm::DefaultDoCastIfPossible<const clang::StringLiteral *, llvm::PointerUnion<const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>, llvm::CastInfo<const clang::StringLiteral *, llvm::PointerUnion<const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>>>::doCastIfPossible' requested here
    return ForwardTo::doCastIfPossible(const_cast<NonConstFrom>(f));
                      ^
<llvm-project>/clang/include/clang/AST/APValue.h:167:37: note: in instantiation of function template specialization 'llvm::PointerUnion<const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>::dyn_cast<const clang::StringLiteral *>' requested here
    T dyn_cast() const { return Ptr.dyn_cast<T>(); }
                                    ^
<llvm-project>/clang/lib/Sema/SemaChecking.cpp:8509:44: note: in instantiation of function template specialization 'clang::APValue::LValueBase::dyn_cast<const clang::StringLiteral *>' requested here
    auto *LVE = Result.Val.getLValueBase().dyn_cast<const StringLiteral *>();
                                           ^
<llvm-project>/llvm/include/llvm/ADT/STLExtras.h:196:46: note: template is declared here
template <typename T, typename... Us> struct FirstIndexOfType;
                                             ^
In file included from <llvm-project>/clang/lib/Sema/SemaChecking.cpp:14:
In file included from <llvm-project>/clang/include/clang/AST/APValue.h:22:
<llvm-project>/llvm/include/llvm/ADT/PointerUnion.h:234:12: error: no matching function for call to 'isPossible'
    assert(isPossible<To>(F) && "cast to an incompatible type !");
           ^~~~~~~~~~~~~~
/usr/include/assert.h:90:27: note: expanded from macro 'assert'
     (static_cast <bool> (expr)                                         \
                          ^~~~
<llvm-project>/llvm/include/llvm/ADT/PointerUnion.h:251:53: note: in instantiation of function template specialization 'llvm::CastInfoPointerUnionImpl<const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>::doCast<const clang::StringLiteral *>' requested here
  static To doCast(From &f) { return Impl::template doCast<To>(f); }
                                                    ^
<llvm-project>/llvm/include/llvm/Support/Casting.h:314:21: note: in instantiation of member function 'llvm::CastInfo<const clang::StringLiteral *, llvm::PointerUnion<const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>>::doCast' requested here
    return Derived::doCast(f);
                    ^
<llvm-project>/llvm/include/llvm/Support/Casting.h:407:23: note: in instantiation of member function 'llvm::DefaultDoCastIfPossible<const clang::StringLiteral *, llvm::PointerUnion<const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>, llvm::CastInfo<const clang::StringLiteral *, llvm::PointerUnion<const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>>>::doCastIfPossible' requested here
    return ForwardTo::doCastIfPossible(const_cast<NonConstFrom>(f));
                      ^
<llvm-project>/clang/include/clang/AST/APValue.h:167:37: note: in instantiation of function template specialization 'llvm::PointerUnion<const clang::ValueDecl *, const clang::Expr *, clang::TypeInfoLValue, clang::DynamicAllocLValue>::dyn_cast<const clang::StringLiteral *>' requested here
    T dyn_cast() const { return Ptr.dyn_cast<T>(); }
                                    ^
<llvm-project>/clang/lib/Sema/SemaChecking.cpp:8509:44: note: in instantiation of function template specialization 'clang::APValue::LValueBase::dyn_cast<const clang::StringLiteral *>' requested here
    auto *LVE = Result.Val.getLValueBase().dyn_cast<const StringLiteral *>();
                                           ^
<llvm-project>/llvm/include/llvm/ADT/PointerUnion.h:229:45: note: candidate template ignored: substitution failure [with To = const clang::StringLiteral *]
  template <typename To> static inline bool isPossible(From &F) {
                                            ^
2 errors generated.
[1908/2420] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExpr.cpp.o
ninja: build stopped: subcommand failed.

oops, did I misunderstand you? this is my source code:

// If this expression can be evaluated at compile-time,
// check if the result is a StringLiteral and retry
Expr::EvalResult Result;
if (E->EvaluateAsRValue(Result, S.Context) && Result.Val.isLValue()) {
  auto *LVE = Result.Val.getLValueBase().dyn_cast<const StringLiteral *>();
  if (LVE)
    return checkFormatStringExpr(
        S, LVE, Args, APK, format_idx, firstDataArg, Type, CallType,
        /*InFunctionCall*/ false, CheckedVarArgs, UncoveredArg, Offset);
}
tbaeder added inline comments.Aug 2 2022, 2:42 AM
clang/lib/Sema/SemaChecking.cpp
8510

Okay, the documentation tricked me, sorry. Should be fine as-is.

inclyc updated this revision to Diff 449236.Aug 2 2022, 2:49 AM

update comments

Thank you for working on this, this is heading in the right direction! Please be sure to also add a release note so users know about the new functionality.

clang/lib/Sema/SemaChecking.cpp
8512
8513
clang/test/Sema/format-strings.c
625 ↗(On Diff #449236)

Most of these new notes like this seems chatty -- perhaps we should silence the note when it's going to be located on the same line as the warning?

263 ↗(On Diff #449023)

I think we don't want to lose the warning here -- a char * const is not a string literal or a reasonable replacement for one.

inclyc updated this revision to Diff 449510.Aug 2 2022, 7:10 PM

Now only checks functions calls

inclyc added a comment.Aug 2 2022, 7:28 PM

The previous version of this patch checked constant string literals before checking various types of statements, while the current version only checks for compile-time evaluation literals after function calls, and now other regression tests remain as they are.

aaron.ballman accepted this revision.Aug 4 2022, 5:33 AM

LGTM aside from some small corrections. Thank you for this! Do you need me to commit on your behalf? If so, what name and email address would you like me to use for patch attribution?

clang/lib/Sema/SemaChecking.cpp
8719–8723
8840
This revision is now accepted and ready to land.Aug 4 2022, 5:33 AM
inclyc added a comment.Aug 4 2022, 6:00 AM

Thank you! I already have the commit access, I can commit this myself ^_^

This revision was landed with ongoing or failed builds.Aug 4 2022, 6:08 AM
This revision was automatically updated to reflect the committed changes.

Thank you! I already have the commit access, I can commit this myself ^_^

Excellent! Feel free to land when you're ready. :-)

clang/lib/Sema/SemaChecking.cpp
8719–8723

This one should still be const Expr * -- we only use auto when the type is spelled out explicitly in the initialization.

inclyc added a comment.Aug 4 2022, 6:12 AM

Thank you! I already have the commit access, I can commit this myself ^_^

Excellent! Feel free to land when you're ready. :-)

Sorry I haven't noticed this yet, should I revert changes and commit again?

Thank you! I already have the commit access, I can commit this myself ^_^

Excellent! Feel free to land when you're ready. :-)

Sorry I haven't noticed this yet, should I revert changes and commit again?

No worries! And no need to revert; you can land an NFC change that switches the auto to Expr (no need for a review).

inclyc added a comment.Aug 4 2022, 6:23 AM

Last question, issue https://github.com/llvm/llvm-project/issues/55805 mentioned another case, some initListExpr evaulated as "StringLiteral"(Array in fact, but maybe able to consider as a literal), should we implement this?

Last question, issue https://github.com/llvm/llvm-project/issues/55805 mentioned another case, some initListExpr evaulated as "StringLiteral"(Array in fact, but maybe able to consider as a literal), should we implement this?

Yes, I think we should. The following are functionally equivalent:

static constexpr const char *str1 = "%s";
static constexpr const char str2[] = { '%', 's', 0 };