This is an archive of the discontinued LLVM Phabricator instance.

[libc] add fuzz target for strtointeger functions
ClosedPublic

Authored by michaelrj on Dec 15 2022, 4:38 PM.

Details

Summary

The string to integer conversion functions are well suited to
differential fuzzing, and this patch adds a target to enable just that.
It also fixes a bug in the fuzzing comparison logic and changes atoi
slightly to match the behavior described in the C standard.

Diff Detail

Event Timeline

michaelrj created this revision.Dec 15 2022, 4:38 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 15 2022, 4:38 PM
michaelrj requested review of this revision.Dec 15 2022, 4:38 PM
sivachandra added inline comments.Dec 15 2022, 10:54 PM
libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp
31

constexpr and VALID_CHARS.

38

Add a comment briefly explaining the fuzz strategy. Like, give the information that the first byte of of data will be treated as the target base for strto* functions.

49

Why should we make the input clean? It defeats the purpose of fuzzing in a way. Since this is a differential fuzz setup, I am not really worried about garbage. Also, valid ASCII characters make half of the uint8_t range anyway. So, we don't have to worry about most of the data inputs being just invalid numbers. Also, the above list of valid chars does not have the + and - signs.

54

should

libc/src/stdlib/atoi.cpp
16 ↗(On Diff #483388)

I think this change is trying to match the standard prescribed behavior. Can you please do this separately and add unit tests with inputs which after affected by this behavior if any?

michaelrj updated this revision to Diff 484052.Dec 19 2022, 1:19 PM
michaelrj marked 3 inline comments as done.

address comments

libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp
49

The reason I want to clean the input is because otherwise almost all of the inputs won't actually exercise the function at all. If the first character is a question mark (for example), then the function won't actually parse anything and it will return 0. Testing that isn't bad, but given that more than half of the ascii range falls into that category there will be a lot of cycles spent on those obvious cases instead of actually testing the branches.

sivachandra added inline comments.Dec 19 2022, 2:22 PM
libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp
49

Your argument is fair. My point is that it might not help in exercising all code paths. Take decimal numbers for example. The set of valid characters is only 10 out of the 62 there are in VALID_CHARS now. But, there are more valid numbers as you progress to higher bases. For base 36, I think there are no invalid inputs at all? You can choose to implement two modes - one for human testing and another for continuous testing. For human testing, clean the input like you do in this patch currently. For continuous testing, don't clean the input. You will have to add two targets with a distinguishing pre-processor macro.

You should add - and + to the set of valid chars. Or, you can prepend a - and + to each clean input in the human mode and test again.

michaelrj updated this revision to Diff 484103.Dec 19 2022, 3:21 PM
michaelrj marked 3 inline comments as done.

move to compile flag for cleaner input

libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp
49

this sounds like a good compromise to me.

libc/src/stdlib/atoi.cpp
16 ↗(On Diff #483388)

I've uploaded a separate patch here: https://reviews.llvm.org/D140350

sivachandra accepted this revision.Dec 19 2022, 4:00 PM
This revision is now accepted and ready to land.Dec 19 2022, 4:00 PM

rebase before landing

lntue added inline comments.Dec 20 2022, 10:36 AM
libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp
51

It looks like cleaner is not used anymore and not get deleted afterward?

clean up the last references to the "cleaner" object

lntue accepted this revision.Dec 20 2022, 10:44 AM
This revision was landed with ongoing or failed builds.Dec 20 2022, 10:48 AM
This revision was automatically updated to reflect the committed changes.