The string to integer conversion functions are well suited to
differential fuzzing, and this patch adds a target to enable just that.
It also fixes a bug in the fuzzing comparison logic and changes atoi
slightly to match the behavior described in the C standard.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp | ||
---|---|---|
31 | constexpr and VALID_CHARS. | |
38 | Add a comment briefly explaining the fuzz strategy. Like, give the information that the first byte of of data will be treated as the target base for strto* functions. | |
49 | Why should we make the input clean? It defeats the purpose of fuzzing in a way. Since this is a differential fuzz setup, I am not really worried about garbage. Also, valid ASCII characters make half of the uint8_t range anyway. So, we don't have to worry about most of the data inputs being just invalid numbers. Also, the above list of valid chars does not have the + and - signs. | |
54 | should | |
libc/src/stdlib/atoi.cpp | ||
16 ↗ | (On Diff #483388) | I think this change is trying to match the standard prescribed behavior. Can you please do this separately and add unit tests with inputs which after affected by this behavior if any? |
address comments
libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp | ||
---|---|---|
49 | The reason I want to clean the input is because otherwise almost all of the inputs won't actually exercise the function at all. If the first character is a question mark (for example), then the function won't actually parse anything and it will return 0. Testing that isn't bad, but given that more than half of the ascii range falls into that category there will be a lot of cycles spent on those obvious cases instead of actually testing the branches. |
libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp | ||
---|---|---|
49 | Your argument is fair. My point is that it might not help in exercising all code paths. Take decimal numbers for example. The set of valid characters is only 10 out of the 62 there are in VALID_CHARS now. But, there are more valid numbers as you progress to higher bases. For base 36, I think there are no invalid inputs at all? You can choose to implement two modes - one for human testing and another for continuous testing. For human testing, clean the input like you do in this patch currently. For continuous testing, don't clean the input. You will have to add two targets with a distinguishing pre-processor macro. You should add - and + to the set of valid chars. Or, you can prepend a - and + to each clean input in the human mode and test again. |
move to compile flag for cleaner input
libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp | ||
---|---|---|
49 | this sounds like a good compromise to me. | |
libc/src/stdlib/atoi.cpp | ||
16 ↗ | (On Diff #483388) | I've uploaded a separate patch here: https://reviews.llvm.org/D140350 |
libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp | ||
---|---|---|
51 | It looks like cleaner is not used anymore and not get deleted afterward? |
constexpr and VALID_CHARS.