This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add io-functions-misused checker
Needs RevisionPublic

Authored by hgabii on Jan 30 2018, 3:12 AM.

Details

Summary

Add misc-io-functions-misused checker to warns for cases when the return value of certain standard iostream C functions (fgetc(), getc(), getchar(), fgetwc(), getwc(), getwchar(), istream::get()) return int value which is then incorrectly stored in char typed values.

Diff Detail

Event Timeline

hgabii created this revision.Jan 30 2018, 3:12 AM

Why only the wide versions?
Even narrow versions return an int so that you can check for EOF.

May be bugprone is better module then misc?

docs/ReleaseNotes.rst
63

Please add () to function names and enclose each of them in ``. Same in documentation.

Eugene.Zelenko edited reviewers, added: alexfh, aaron.ballman, hokein, ilya-biryukov; removed: Restricted Project.Jan 30 2018, 10:53 AM
alexfh requested changes to this revision.Jan 31 2018, 4:04 AM
alexfh added inline comments.
clang-tidy/misc/IoFunctionsMisusedCheck.cpp
33 ↗(On Diff #131940)

matchesName can be pretty expensive and should generally be avoided. What set of types should be matched here? Can they all be listed here (using hasAnyName)?

43–46 ↗(On Diff #131940)

It looks like the code doesn't need to distinguish between these two. They can be bound to the same ID then.

clang-tidy/misc/IoFunctionsMisusedCheck.h
19 ↗(On Diff #131940)

s/checker/check/

docs/clang-tidy/checks/misc-io-functions-misused.rst
26–27 ↗(On Diff #131940)

The formatting is off.

This revision now requires changes to proceed.Jan 31 2018, 4:04 AM
dkrupp added a subscriber: dkrupp.May 25 2018, 3:50 AM
hgabii updated this revision to Diff 153863.Jul 3 2018, 12:07 AM

Sorry, I forgot about my pull request.

I fixed the issues what are identified by reviewers :

  • Check moved to bugprone module
  • () and `` added
  • matchesName replaced with hasAnyName

-Unnecessary IDs depracated

  • clang-tidy/misc/IoFunctionsMisusedCheck.h:19: "s/checker/check/" => Fixed all places
  • Formatting fixed

Only thing left is to match all IO functions, not just the wide versions.

hgabii retitled this revision from [clang-tidy] Add misc-io-functions-misused checker to [clang-tidy] Add io-functions-misused checker.Jul 3 2018, 7:20 AM
aaron.ballman added inline comments.Jul 3 2018, 8:12 AM
clang-tidy/bugprone/BugproneTidyModule.cpp
84

This name reads a bit awkwardly because usually "misused" would come before "io functions". However, given that we've already said it's prone to bugs, I think "misuse" is somewhat implied. Perhaps bugprone-io-functions or `bugprone-misused-io-functions'?

However, I think this is actually checking an existing CERT rule and perhaps should be added to the CERT module as cert-fio34-c and perhaps that's the only module it should live in?

clang-tidy/bugprone/IoFunctionsMisusedCheck.cpp
25–31 ↗(On Diff #153863)

You can combine these hasAnyName() calls into one given that they are otherwise identical matchers.

26 ↗(On Diff #153863)

Why only the wide char versions and not the narrow char versions as well? For instance, those are problematic on systems where sizeof(int) == sizeof(char). See https://wiki.sei.cmu.edu/confluence/display/c/FIO34-C.+Distinguish+between+characters+read+from+a+file+and+EOF+or+WEOF for more details.

alexfh requested changes to this revision.Jul 11 2018, 8:58 AM

(removing from my dashboard)

This revision now requires changes to proceed.Jul 11 2018, 8:58 AM
gerazo added a subscriber: gerazo.Nov 26 2018, 7:53 AM
hgabii updated this revision to Diff 176732.Dec 4 2018, 4:33 PM
hgabii marked an inline comment as done.
hgabii edited the summary of this revision. (Show Details)
hgabii marked an inline comment as done.Dec 4 2018, 4:36 PM

I moved to bugprone module and renamed to bugprone-io-functions.
I added as a reference for cert-fio34-c.

Eugene.Zelenko added inline comments.Dec 4 2018, 4:45 PM
docs/clang-tidy/checks/bugprone-io-functions.rst
5

Please adjust length.

hgabii updated this revision to Diff 176736.Dec 4 2018, 4:52 PM
hgabii marked an inline comment as done.Dec 4 2018, 4:55 PM
aaron.ballman added inline comments.Dec 6 2018, 10:21 AM
clang-tidy/bugprone/IoFunctionsCheck.cpp
32

::std::istream

47–49

This diagnostic confuses me, so perhaps you can explain the situation you want to diagnose a bit further.

FIO34-C is about situations where sizeof(char) == sizeof(int) and the call returns EOF/WEOF. In that case, there's no way to distinguish between an EOF/error and a valid character. Suggesting to explicitly cast to a character type doesn't add any safety to the code -- the user needs to insert calls to feof() or ferror() instead (to make it portable, anyway) and should store the character value in a sufficiently large integer type before doing the comparison against EOF.

Are you trying to catch a different kind of bug?

hgabii updated this revision to Diff 177713.Dec 11 2018, 7:47 AM
hgabii marked 3 inline comments as done.Dec 11 2018, 7:52 AM
hgabii added inline comments.
clang-tidy/bugprone/IoFunctionsCheck.cpp
32

True. Fixed.

47–49

Yes, I want to diagnose this problem. Do you think I need to add suggestions into the warning message to use feof() or ferror()?

aaron.ballman added inline comments.Dec 14 2018, 8:43 AM
clang-tidy/bugprone/IoFunctionsCheck.cpp
47–49

I think there are two different situations to think about:

  1. If the target architecture has sizeof(char) == sizeof(int), no amount of casting will help the user because the bit pattern for EOF is the same whether stored as an int or a char on that platform. There, the user really must use feof() and ferror(). This situation is theoretically even worse for the wide versions of the APIs because sizeof(wint_t) == sizeof(wchar_t) quite commonly, except that it would require a terrible choice in character sets for wchar_t by the implementation because Unicode makes guarantees about certain bit patterns not being a valid character (like 0xFFFF in UTF-16). I don't know that we have to worry about the wide-char versions at all in practice, but if you know of a platform where this is a legitimate concern, I'd love to hear about it.
  1. If the target architecture does not have this identical-size-in-band-error-indicator problem, then it might make sense to diagnose the code for portability reasons or it might make sense to not diagnose at all because the user won't run into it. Might be worth making this an optional feature of the check.

Either way, I don't know that we'll be able to suggest fixes for the user in a diagnostic message of reasonable length. This might be a case where we want to tell the user about the danger and let the documentation tell them how to fix it. Maybe a diagnostic like: "the value for %select{EOF|WEOF}0 is indistinguishable from a valid character with the same bit pattern; consider explicit checks for error or end-of-file indicators"

alexfh requested changes to this revision.Oct 12 2020, 6:56 AM
This revision now requires changes to proceed.Oct 12 2020, 6:56 AM