This is an archive of the discontinued LLVM Phabricator instance.

Reduce false positives in printf/scanf format checker
ClosedPublic

Authored by AndyG on Dec 18 2015, 3:36 AM.

Details

Summary

The printf/scanf format checker is a little over-zealous in handling the conditional operator. This patch reduces work by not checking code-paths that are never used and reduces false positives regarding uncovered arguments, for example in the code fragment:

printf(minimal ? "%i\n" : "%i: %s\n", code, msg);

Diff Detail

Event Timeline

AndyG updated this revision to Diff 43213.Dec 18 2015, 3:36 AM
AndyG retitled this revision from to Reduce false positives in printf/scanf format checker.
AndyG updated this object.
AndyG added a reviewer: cfe-commits.
AndyG edited reviewers, added: rsmith, dblaikie; removed: cfe-commits.Dec 21 2015, 1:38 AM
AndyG added a subscriber: cfe-commits.

Richard, David,

Sorry, I've added you as reviewers by proximity to recent relevant changes in SemaChecking.cpp. Hope you don't mind. Please tell me who to redirect to, if there are more applicable reviewers.

Cheers
Andy

Richard, you have been recommended to me as a suitable reviewer by David. If that's not ok with you, please could you recommend another! Thanks, Andy.

rtrieu edited edge metadata.Jan 25 2016, 12:18 PM

Hi Andy,

Could you give some motivation for this patch? From your example:

printf(minimal ? "%i\n" : "%i: %s\n", code, msg);

I would expect the first format string to produce a warning that msg is not used.

AndyG added a comment.EditedJan 26 2016, 12:06 AM

Hi Richard,

Thank you for looking at my patch. I would argue that the code

printf(minimal ? "%i\n" : "%i: %s\n", code, msg);

is valid and should not cause a warning due to unused arguments since at least one code path uses all the arguments. This in my view is the problem with the diagnostic as it currently stands.

I should expect that the following would cause the warning:

printf(minimal ? "%i\n" : "%i: %s\n", code, msg, unused);

which indeed it still does under this patch for the argument 'unused'. Also, where the compiler can determine that a code path is never followed then it will still produce the warning, as in:

printf(1 ? "%i\n" : "%i: %s\n", code, msg);

Cheers,
Andy

It looks like you are limiting to one diagnostic per printf, no matter the number of format strings. How is the case when multiple format strings would trigger the warning?

AndyG added a comment.Jan 28 2016, 6:25 AM

Yes, but only for the "data argument not used" warning. All other warnings are unaffected by the change, for example:

test.cpp:9:51: warning: format specifies type 'char *' but the argument has type 'int' [-Wformat]                                                  
    printf(minimal ? "%s %s %i\n" : "%i %i %i\n", code, msg);
                      ~~                          ^~~~
                      %d
test.cpp:9:30: warning: more '%' conversions than data arguments [-Wformat]
    printf(minimal ? "%s %s %i\n" : "%i %i %i\n", code, msg);
                            ~^
test.cpp:9:57: warning: format specifies type 'int' but the argument has type 'const char *' [-Wformat]
    printf(minimal ? "%s %s %i\n" : "%i %i %i\n", code, msg);
                                        ~~              ^~~
                                        %s
test.cpp:9:45: warning: more '%' conversions than data arguments [-Wformat]
    printf(minimal ? "%s %s %i\n" : "%i %i %i\n", code, msg);
                                           ~^
4 warnings generated.

But, as you observed, for the unused argument it only warns for the first instance:

test.cpp:9:53: warning: data argument not used by format string [-Wformat-extra-args]
    printf(minimal ? "%i\n" : "%i %s\n", code, msg, msg);
                              ~~~~~~~~~             ^
1 warning generated.

This is in comparison to the original diagnostic:

test.cpp:9:48: warning: data argument not used by format string [-Wformat-extra-args]
    printf(minimal ? "%i\n" : "%i %s\n", code, msg, msg);
                     ~~~~~~                    ^
test.cpp:9:53: warning: data argument not used by format string [-Wformat-extra-args]
    printf(minimal ? "%i\n" : "%i %s\n", code, msg, msg);
                              ~~~~~~~~~             ^
2 warnings generated.

However, may I ask what the specific concern is here? If this diagnostic appears then the user knows that all the strings do not use this and any subsequent argument. Effectively the original situation doesn't really tell the user anything more useful - it only duplicates the warning and actually, as can be seen above, it is potentially confusing since it suggests two positions for the unused argument(s). Compare this with the new diagnostic which shows only one string, and this string is the first string which uses the most arguments, so it is very clear which arguments are not used.

What do you think? Is this an adequate argument?

AndyG added a comment.Feb 3 2016, 2:30 PM

Thoughts? Am I good to go?

Cheers,
Andy

rtrieu added a comment.Feb 3 2016, 3:29 PM

Oops, forget to hit send on my last comment.

My concern is something like:

printf(condition ? "first message: %d" : "second message: %d", 5, 10);

There's nothing in the code implying that either message is the one that should be using two arguments, so which one will be highlighted? At that point, we may need to consider rewriting the message to indicate that multiple format strings would be affected.

AndyG added a comment.Feb 3 2016, 11:49 PM

In your case, the first string would be highlighted only. Yes, I see what you mean. Is it possible to have multiple ranges for the diagnostic? By which I mean, to produce the following:

test.cpp:x:y: warning: data argument not used by format string [-Wformat-extra-args]
    printf(condition ? "first message: %d" : "second message: %d", 5, 10);
                       ~~~~~~~~~~~~~~~~~~~   ~~~~~~~~~~~~~~~~~~~~     ^

If so, I think the diagnostic should still track just the strings utilising the most arguments, so this would be the case with three messages, missing out the middle string since it is already using a sub-set of the arguments anyway:

test.cpp:x:y: warning: data argument not used by format string [-Wformat-extra-args]
    printf(a ? "%d %d" : b ? "%d" : "%d %d", 5, 10, 20);
                ~~~~~                ~~~~~          ^

If you think this isn't sufficient, then my suggestion would be to make the range of the diagnostic the complete format expression, and let the programmer assess it themselves.

rtrieu added a comment.Feb 4 2016, 8:51 PM

In your case, the first string would be highlighted only. Yes, I see what you mean. Is it possible to have multiple ranges for the diagnostic? By which I mean, to produce the following:

test.cpp:x:y: warning: data argument not used by format string [-Wformat-extra-args]
    printf(condition ? "first message: %d" : "second message: %d", 5, 10);
                       ~~~~~~~~~~~~~~~~~~~   ~~~~~~~~~~~~~~~~~~~~     ^

If so, I think the diagnostic should still track just the strings utilising the most arguments, so this would be the case with three messages, missing out the middle string since it is already using a sub-set of the arguments anyway:

test.cpp:x:y: warning: data argument not used by format string [-Wformat-extra-args]
    printf(a ? "%d %d" : b ? "%d" : "%d %d", 5, 10, 20);
                ~~~~~                ~~~~~          ^

If you think this isn't sufficient, then my suggestion would be to make the range of the diagnostic the complete format expression, and let the programmer assess it themselves.

I believe your two examples are the way to go, highlighting all the format strings using the most arguments. You can pass multiple SourceRange's to a diagnostic. To do that, in your Diagnose function, store the result of P.Diag in a local reference, then pass in the extra SourceRange's, then pass that reference to EmitFormatDiagnostic.

AndyG updated this revision to Diff 47426.EditedFeb 10 2016, 1:14 AM
AndyG removed reviewers: dblaikie, rsmith.

All strings matching the highest uncovered argument are now highlighted in the diagnostic.

EDIT: I also rebased against r260222

AndyG added a comment.Feb 15 2016, 1:51 AM

Richard, are you happy with this latest version? Can I proceed to commit it?

Thanks.

Richard, are you happy with this latest version? Can I proceed to commit it?

Thanks.

No, this is not ready yet. This code is old and needs a bit of clean up, and because your patch touches it, you will be one that needs to clean it up before committing your patch. Mainly, the changes are needed so that you can pass your arg handler properly through the function calls.

include/clang/Sema/Sema.h
9100–9106 ↗(On Diff #47426)

Remove this from Sema.h and make the function static in SemaChecking.cpp. It is only called from SemaChecking.cpp anyways and doesn't need access to anything in Sema.

lib/Sema/SemaChecking.cpp
3261

Make this a class. Its members needs to be segmented into private and public parts.

3262–3264

This section should be private section.

3266–3268

These should be made public.

3522

Pass in UncoveredArg and update within CheckFormatString function

3598–3604

FirstUncoveredArg is now a private member so there's two paths, either will work:

  1. Create two public function for UncoveredArgHandler,
bool UncoveredArgHandler::hasUncoveredArg();
unsigned UncoveredArgHandler::getUncoveredArg();

then use:

if (UncoveredArg.hasUncoveredArg()) {
  unsigned ArgIdx = UncoveredArg.getUncoveredArg() + firstDataArg;

getUncoveredArg() should never return negative numbers as that should remain an implementation detail.

  1. Pass both Args and firstDataArg to Diagnose and do the logic there.
3675

Don't change the call signature here, leave it as a zero argument call. Have another function return the first uncovered argument.

3914–3966

Move these two functions up to where the class is defined.

3930–3931

Use a range-based for-loop here.

5027–5028

Make this a static void function.
Add a new first argument "Sema &S".

static void CheckFormatString(Sema &S, const StringLiteral *FExpr,

The other changes for changing this function are listed below. Also, a forward declaration of this function is necessary higher in this file since a previous function will call this function.

5027–5028

FormatStringType and VariadicCallType need to qualified with "Sema::"

5028

Pass in an UncoveredArgHandler here instead of an unsigned.

5040–5041

*this => S
Here and throughout this function

5041–5042

PDiag => S.PDiag
Here and throughout this function

5051–5052

Context => S.Context
Here and throughout this function

5077–5078

FST_* => Sema::FST_*
Here and throughout this function

5092

This line should remain the same. Additional function should be added to check the status of uncovered arguments and pass that to UncoveredArgHandler.

5102

You have modified the Scanf handling function, but there are no additional scanf tests.

test/Sema/format-strings.c
107

Add a test for having the format string as a constant literal defined elsewhere. This should generate a note to point to the original format string. You changes removes this note. I am fine having this listed as a FIXME for a follow up patch.

AndyG marked 7 inline comments as done.Feb 22 2016, 5:15 AM

I've removed CheckFormatString from Sema.h and make it a static function inside SemaChecking.cpp in r261522. I'll submit a new diff here with the remaining requested changes for this patch when I have a moment.

AndyG marked 11 inline comments as done.Feb 22 2016, 7:44 AM

Revised patch coming shortly...

lib/Sema/SemaChecking.cpp
3675

DoneProcessing is that function! To move the logic into another function would leave nothing behind :o)

Passing UncoveredArgHandler& into CheckFormatHandler allows this to be worked around.

3914–3966

UncoveredArgHandler::Update has been moved up, but UncoveredArgHandler::Diagnose has been left since it has a dependency on CheckFormatHandler.

3930–3931

Except that it is iterating from the second item in the list. I don't think there is an easier way to do it.

AndyG updated this revision to Diff 48678.Feb 22 2016, 7:47 AM
AndyG marked 2 inline comments as done.

Patch additionally re-based off r261522.

AndyG added inline comments.Feb 22 2016, 8:55 AM
lib/Sema/SemaChecking.cpp
3912

Rather than FExpr, I think this should be OrigFormatExpr?

A few more comments.

Patch additionally re-based off r261522.

It's usually a bad idea to rebase in the middle of a string of patches. Phabricator isn't aware of revisions, so while a base to latest patch will show up fine, attempting to diff the original patch to the latest will show all the upstream changes as well.

lib/Sema/SemaChecking.cpp
3279

Add an assert that NewFirstUncoveredArg >= 0

3284–3288

Move this to a new function "setAllCovered()"

3823–3833

Use two different function calls depending on the result of the find_first(). This also preserves the existing assert.

signed notCoveredArg = CoveredArgs.find_first();
if (notCoveredArg >= 0) {
  assert((unsigned)notCoveredArg < NumDataArgs);
  UncoveredArg.Update(notCoveredArg, OrigFormatExpr);
} else {
  UncoveredArg.setAllCovered();
}
3912

OrigFormatExpr is probably better.

3930–3931

That raises the question why the first element is different from the rest.

5092

Not exactly what I said, but this works too.

AndyG updated this revision to Diff 49058.Feb 25 2016, 7:25 AM
AndyG marked 5 inline comments as done.

Updated patch according to the comments made. Also spotted a corner-case where a double-diagnostic was produced, for example in the following where one string has too few arguments and the other string too many:

printf(minimal ? "%i\n" : "%i %s %s\n", code, msg);

Hitherto, this produced two diagnostics: the first (correct) diagnostic that more '%' conversions exist than data arguments, but then a second diagnostic regarding uncovered arguments. The second diagnostic is now suppressed since the format string that has too many does also cover all arguments.

AndyG added inline comments.Feb 25 2016, 7:30 AM
lib/Sema/SemaChecking.cpp
3930–3931

I've put this as a range-based for-loop as requested. The reason it wasn't before is that one expression is passed through the call to EmitFormatDiagnostic. Ultimately, I think EmitFormatDiagnostic needs to be reworked to support multiple format strings - and this will probably resolve the FIXME relating to the missing diagnostic note in the tests.

rtrieu accepted this revision.Feb 25 2016, 9:08 PM
rtrieu edited edge metadata.

LGTM
The other issues brought up in this review can go in follow up patches.

This revision is now accepted and ready to land.Feb 25 2016, 9:08 PM
AndyG closed this revision.Feb 26 2016, 7:39 AM