This is an archive of the discontinued LLVM Phabricator instance.

[libFuzzer] Fix DataFlow.cpp logic when tracing long inputs.
Closed · Public

Authored by Dor1s on Apr 10 2019, 1:42 PM.

Details

Summary
  1. Do not create DFSan labels for the bytes which we do not trace. This is where we run out of labels in the first place.
  2. When dumping the traces to disk, make sure to offset the label identifiers by the number of the first byte in the trace range.
  3. For the last label, make sure to write it at the last position of the trace bit string, as that label represents the input size, not any particular byte.
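
The three points above can be illustrated with a minimal Python sketch. This is not the DataFlow.cpp code: the function name trace_bit_string and its parameters are hypothetical, and the sketch assumes a half-open trace range [beg, end) with one label per traced byte plus one final label for the input size.

```python
def trace_bit_string(input_len, beg, end, touched_local, size_touched):
    """Build a trace bit string of length input_len + 1 (hypothetical helper).

    touched_local: indices, relative to beg, of traced bytes a function depends on.
    size_touched:  True if the function depends on the input-size label.
    """
    assert 0 <= beg <= end <= input_len
    bits = ["0"] * (input_len + 1)
    for local in touched_local:
        # Only end - beg byte labels exist, one per traced byte (point 1).
        assert 0 <= local < end - beg
        # Offset by the first byte of the range (point 2).
        bits[beg + local] = "1"
    if size_touched:
        # The size label always goes to the last position (point 3).
        bits[input_len] = "1"
    return "".join(bits)


# A 5-byte input traced over bytes 3 and 4 only.
tail = trace_bit_string(5, 3, 5, {0, 1}, True)
```

With this layout, every dump has the same width regardless of the traced range, which is what makes dumps from different ranges mergeable position by position.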

Also fixed the bug with division in Python which I introduced when I migrated the scripts to Python 3 (// is required for integer division).
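
A small sketch of the division pitfall, assuming the scripts compute a midpoint of a byte range; the helper name midpoint is hypothetical and not taken from the scripts.

```python
def midpoint(beg, end):
    # Floor division keeps the result an int in Python 3.
    return beg + (end - beg) // 2


beg, end = 0, 5
true_div = beg + (end - beg) / 2   # "/" always yields a float in Python 3
floor_div = midpoint(beg, end)     # "//" yields an int for int operands
```

A float midpoint cannot be used where an integer offset is expected (for example, range() rejects it with a TypeError), which is why the migrated code needs //.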

Without these fixes, the scripts waste too much time unsuccessfully trying to
collect and process traces from long inputs. For more context, see
https://github.com/google/oss-fuzz/issues/1632#issuecomment-481761789

Event Timeline

Dor1s created this revision. Apr 10 2019, 1:42 PM
Herald added projects: Restricted Project, Restricted Project. Apr 10 2019, 1:42 PM
Herald added subscribers: Restricted Project, delcypher.
Dor1s updated this revision to Diff 194583. Apr 10 2019, 1:45 PM

Update the test to reflect the change and make sure it passes.

kcc added a comment. Apr 10 2019, 4:49 PM

I don't think this is right. This subprocess is *expected* to fail with exactly this message when we run out of labels,
and then we handle the input as two subsets, and so on.
But this error must not happen if the range is >= 2 (two labels should not cause this error), so the process converges.
We cannot predict when we run out of labels -- for some 8k inputs it will work from the first attempt,
for some much smaller inputs it will require several bisections.
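
The bisection described above can be sketched as follows. This is only an illustration: collect_ranges and try_trace are hypothetical names, and try_trace stands in for running the tracing subprocess on a byte range and reporting whether it finished without running out of labels.

```python
def collect_ranges(beg, end, try_trace):
    """Return the list of [beg, end) ranges that were traced successfully.

    try_trace(beg, end) is a hypothetical callable: True on success,
    False when the tracer ran out of labels for that range.
    """
    if try_trace(beg, end):
        return [(beg, end)]
    if end - beg <= 1:
        # A range this small is not expected to exhaust the labels,
        # so treat a failure here as a real error instead of recursing.
        raise RuntimeError("tracing failed for range %d %d" % (beg, end))
    mid = beg + (end - beg) // 2
    # Handle the input as two subsets, and so on.
    return (collect_ranges(beg, mid, try_trace) +
            collect_ranges(mid, end, try_trace))


# Stand-in tracer that only succeeds on ranges of at most 2 bytes.
ranges = collect_ranges(0, 8, lambda b, e: e - b <= 2)
```

Because each failed attempt halves the range, the recursion converges as long as sufficiently small ranges always succeed.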

I would simply limit the size of the inputs used in DFT-based fuzzing to something like 8K for now, and see how it works.

Hey @kcc, I've figured out what the problem is here. Will update the description and the CL shortly.

Dor1s updated this revision to Diff 194919. Apr 12 2019, 11:07 AM

This patch fixes the issue with the long inputs by not creating DFSan labels
for the bytes which we do not trace, and by offsetting the resulting labels to
make the dumps easily mergeable.

Dor1s retitled this revision from "[libFuzzer] Skip too long inputs in the data flow scripts." to "[Draft] [libFuzzer] Fix DataFlow.cpp logic when tracing long inputs." Apr 12 2019, 11:10 AM
Dor1s edited the summary of this revision.
Dor1s edited the summary of this revision.

Please see this draft proposal / explanation for the issue. The bisection in python is fine! :)

Btw, I did some testing locally with a 5-byte input. I traced it 3 times:

  • full
  • range 0 3 (bytes 0 to 2)
  • range 3 5 (bytes 3 to 4)

See the traces below:

root@7fc00dc69f76:/out# cat full start end
F1 111111
F4 111111
F7 111111
F11 111111
F12 100001

F1 111001
F4 111001
F7 111001
F11 111001
F12 100001

F1 000111
F4 000111
F7 000111
F11 000111
F12 000001

root@7fc00dc69f76:/out# cat full3 start3 end3
F1 111111
F4 111111
F7 111111
F11 111111
F12 100001

F1 111001
F4 111001
F7 111001
F11 111001
F12 100001

F1 000111
F4 000111
F7 000111
F11 000111
F12 000001

If we merge start and end, we get full -- these three were obtained using the current implementation. After applying this CL, merging start3 and end3 likewise yields full3, which is equal to full, so I believe nothing is broken.
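
The merge check above can be reproduced with a short Python sketch. It assumes that merging two traces means a per-function bitwise OR of the bit strings, which is consistent with the dumps shown; the function names merge_bits and merge_traces are hypothetical and this is not the actual merge script.

```python
def merge_bits(a, b):
    """Bitwise OR of two equal-length bit strings."""
    assert len(a) == len(b)
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))


def merge_traces(first, second):
    """Merge two {function: bit string} traces function by function."""
    merged = dict(first)
    for func, bits in second.items():
        merged[func] = merge_bits(merged[func], bits) if func in merged else bits
    return merged


# Values copied from the dumps above.
start = {"F1": "111001", "F4": "111001", "F7": "111001",
         "F11": "111001", "F12": "100001"}
end = {"F1": "000111", "F4": "000111", "F7": "000111",
       "F11": "000111", "F12": "000001"}
full = {"F1": "111111", "F4": "111111", "F7": "111111",
        "F11": "111111", "F12": "100001"}

merged = merge_traces(start, end)
```

Under that assumption, merged equals full, including F12, where only the first byte and the size label are set.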

kcc added a comment. Apr 12 2019, 12:25 PM

Code LGTM, but please also add a test that would fail with current code and pass with your change.
Either extend test/fuzzer/dataflow.test or add another one nearby.

Dor1s updated this revision to Diff 194949. Apr 12 2019, 1:28 PM
Dor1s edited the summary of this revision.

Add the test, remove debug logging, fix Python 3 division.

Dor1s updated this revision to Diff 194950. Apr 12 2019, 1:29 PM

fix a typo

Dor1s retitled this revision from "[Draft] [libFuzzer] Fix DataFlow.cpp logic when tracing long inputs." to "[libFuzzer] Fix DataFlow.cpp logic when tracing long inputs." Apr 12 2019, 1:30 PM
Dor1s edited the summary of this revision.
kcc accepted this revision. Apr 12 2019, 1:31 PM

LGTM++

This revision is now accepted and ready to land. Apr 12 2019, 1:31 PM
This revision was automatically updated to reflect the committed changes.