This is an archive of the discontinued LLVM Phabricator instance.

Use binary write mode in WriteToFile function to avoid appended \r characters on Windows
ClosedPublic

Authored by tuktuk on Mar 29 2019, 12:40 PM.

Details

Reviewers

kcc
vitalybuka

Commits

rG3db6ad2bcfc5: Use binary write mode in WriteToFile function to avoid appended \r characters…
rCRT357807: Use binary write mode in WriteToFile function to avoid appended \r characters…
rL357807: Use binary write mode in WriteToFile function to avoid appended \r characters…

Summary

When using libFuzzer on Windows, every byte in a crash sample that equals \n (0x0A) is written to disk as a \r\n sequence, because the output file is opened in text mode. As a consequence, crashes are not reproducible. This patch opens the file in binary mode to fix the issue. The patch does not affect POSIX systems.
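
The effect of the mode string can be observed with a small standalone program. The sketch below is not part of the patch; the helper names WriteBytes and FileSizeOnDisk are hypothetical and only serve as illustration. It writes a buffer through fopen with a caller-chosen mode and then counts the bytes that actually ended up on disk. With a buffer holding the single byte 0x0A, mode "w" is expected to produce 2 bytes on Windows and mode "wb" 1 byte; on POSIX systems both modes produce 1 byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: writes Bytes to Path using the given fopen mode
// ("w" for text mode, "wb" for binary mode).
// Returns false if the file could not be opened or fully written.
bool WriteBytes(const std::string &Path,
                const std::vector<unsigned char> &Bytes, const char *Mode) {
  assert(Mode != nullptr);
  FILE *Out = fopen(Path.c_str(), Mode);
  if (!Out) return false;
  size_t Written = fwrite(Bytes.data(), 1, Bytes.size(), Out);
  bool Closed = fclose(Out) == 0;
  return Closed && Written == Bytes.size();
}

// Hypothetical helper: counts the bytes stored in the file, reading it in
// binary mode so that no translation happens on the way back.
// Returns -1 if the file cannot be opened.
long FileSizeOnDisk(const std::string &Path) {
  FILE *In = fopen(Path.c_str(), "rb");
  if (!In) return -1;
  long Count = 0;
  while (fgetc(In) != EOF) ++Count;
  fclose(In);
  return Count;
}
```

Calling WriteBytes("sample.bin", {'\n'}, "w") followed by FileSizeOnDisk("sample.bin") on a Windows machine should make the extra \r visible, assuming the C runtime applies its usual text-mode translation.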

Patch by tuktuk

Diff Detail

Repository: rL LLVM

Event Timeline

tuktuk created this revision. Mar 29 2019, 12:40 PM

Herald added projects: Restricted Project, Restricted Project. Mar 29 2019, 12:40 PM

Herald added subscribers: Restricted Project, llvm-commits.

vitalybuka accepted this revision. Mar 29 2019, 12:46 PM

This revision is now accepted and ready to land. Mar 29 2019, 12:46 PM

Oh, can you add a test so that it fails without the patch on Windows?

In D60008#1448260, @vitalybuka wrote:

Oh, can you add a test so that it fails without the patch on Windows?

Hello,

I am happy to provide a simple example, but I do not know the code base well enough to turn it into a test.

With the following fuzz target:

// fuzz_target.cc
#include <cstddef>
#include <cstdint>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size)
{
  // Guard against the empty input before reading Data[0].
  if (Size > 0 && Data[0] == '\n') {
    __builtin_trap();
  }
  return 0;
}

The produced crash is not reproducible:

> "C:\Program Files\LLVM\bin\clang++.exe" -fsanitize=address,fuzzer fuzz_target.cc -o fuzzer.exe

> fuzzer.exe
SUMMARY: libFuzzer: deadly signal
Test unit written to ./crash-adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
Base64: Cg==

> fuzzer.exe crash-adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
Running: crash-adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
Executed crash-adc83b19e793491b1c6ea0fd8b46cd9f32e592fc in 2 ms
***
*** NOTE: fuzzing was not performed, you have only
***       executed the target code on a fixed set of inputs.
***

Indeed, the content of the produced file is "\r\n", not "\n".

The expected result is the one obtained with the following fuzz target:

// fuzz_target.cc
#include <cstddef>
#include <cstdint>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size)
{
  // Guard against the empty input before reading Data[0].
  if (Size > 0 && Data[0] == 'a') {
    __builtin_trap();
  }
  return 0;
}

The produced crash is reproducible as expected:

> "C:\Program Files\LLVM\bin\clang++.exe" -fsanitize=address,fuzzer fuzz_target.cc -o fuzzer.exe

> fuzzer.exe
SUMMARY: libFuzzer: deadly signal
Test unit written to ./crash-86f7e437faa5a7fce15d1ddcb9eaeaea377667b8
Base64: YQ==

> fuzzer.exe crash-86f7e437faa5a7fce15d1ddcb9eaeaea377667b8
fuzzer: Running 1 inputs 1 time(s) each.
Running: crash-86f7e437faa5a7fce15d1ddcb9eaeaea377667b8
==12580== ERROR: libFuzzer: deadly signal

Something like that, but I would recommend something more generic:

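#include <cstddef>
#include <cstdint>
#include <cstdlib>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size)
{
  // Only consider large inputs.
  if (Size < 1000000)
    return 0;

  // or any other simple hash
  size_t Sum = Size;
  for (size_t i = 0; i < Size; ++i)
    Sum += Data[i];

  // Crash on an arbitrary hash value.
  if (Sum % 100 == 19)
    abort();

  return 0;
}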

rm -rf SOME_UNIQUE_PREFIX*
Run with -max_len=2000000 -artifact_prefix=SOME_UNIQUE_PREFIX; it should crash quickly.

Then rerun on SOME_UNIQUE_PREFIX*. If any character in the input is altered by Write/Read, the crash should stop being reproducible.
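
The reasoning behind this check can be sketched outside of libFuzzer. The helpers below are hypothetical (SimpleHash and TranslateNewlines are illustrative names, not libFuzzer functions), and the newline translation is simulated in memory rather than produced by a real text-mode write on Windows. SimpleHash is the size-plus-byte-sum hash from the fuzz target above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The simple hash from the fuzz target: input size plus the sum of all bytes.
size_t SimpleHash(const std::vector<uint8_t> &Data) {
  size_t Sum = Data.size();
  for (uint8_t B : Data) Sum += B;
  return Sum;
}

// Simulates the effect of a text-mode write on Windows:
// every '\n' byte is stored as the two bytes "\r\n".
std::vector<uint8_t> TranslateNewlines(const std::vector<uint8_t> &Data) {
  std::vector<uint8_t> Out;
  Out.reserve(Data.size());
  for (uint8_t B : Data) {
    if (B == '\n') Out.push_back('\r');
    Out.push_back(B);
  }
  // Translation can only add bytes, never remove them.
  assert(Out.size() >= Data.size());
  return Out;
}
```

Each translated '\n' adds one byte to the size and 13 (the value of '\r') to the sum, so the hash grows by 14 per newline. The condition Sum % 100 == 19 therefore usually stops holding for the reloaded artifact, although not for every newline count: with 50 newlines the hash grows by 700 and the remainder modulo 100 is unchanged.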

There is a problem with that generic solution: nothing in it forces libFuzzer to generate many different byte values. In my experiments the data I get is always very poorly distributed, so in practice it is not that generic.
I tried a fuzz target that requires at least one instance of every possible byte value, but libFuzzer did not manage to generate such data.
However, I can confirm that the following fuzz target distinguishes pre-patch from patched versions using the (non-)reproducibility test we talked about:

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

// Tests preservation of crashing inputs when writing to disk.
#include <cstddef>
#include <cstdint>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
    static bool Found[256];

    // Make sure Data is quite long

    if (Size < 1000) {
        return 0;
    }

    // Make sure Data has at least one '\n' byte value

    // Checking that Data has all possible byte values would be more satisfying
    // but libfuzzer doesn't seem powerful enough for this right now

    for (size_t i = 0; i < 256; ++i) {
        Found[i] = false;
    }
    for (size_t i = 0; i < Size; ++i) {
        Found[Data[i]] = true;
    }
    if (!Found['\n']) {
        return 0;
    }

    // Crash upon some hash value from Size and Data

    // Reproducibility will only happen if hash value is the same

    size_t Sum = Size;
    for (size_t i = 0; i < Size; ++i) {
        Sum += Data[i];
    }
    if (Sum % 100 == 19) {
        __builtin_trap();
    }

    return 0;
}

I spent some time trying to understand llvm lit syntax but I couldn't produce a satisfying test file that I would be able to run from my Windows machine. Could you do this part? Thanks a lot.

Test

Herald added subscribers: jdoerfert, dexonsmith. Mar 31 2019, 10:48 PM

Harbormaster completed remote builds in B29886: Diff 193043. Mar 31 2019, 10:48 PM

Can you try the test on Windows?

Looks good to me! Here are the results without / with patch.

D:\llvm-project\compiler-rt> python ..\build\Release\bin\llvm-lit.py --param build_config=x64 --param build_mode=Release --param llvm_site_config=../build/test/lit.site.cfg test\fuzzer\reload.test
llvm-lit.py: D:/llvm-project/compiler-rt/test/fuzzer/lit.cfg:32: note: lsan feature unavailable
llvm-lit.py: D:/llvm-project/compiler-rt/test/fuzzer/lit.cfg:39: note: msan feature unavailable
llvm-lit.py: D:/llvm-project/compiler-rt/test/fuzzer/lit.cfg:57: note: linux feature unavailable
-- Testing: 1 tests, single process --
FAIL: libFuzzer :: reload.test (1 of 1)
Testing Time: 25.05s
********************
Failing Tests (1):
    libFuzzer :: reload.test

  Unexpected Failures: 1

D:\llvm-project\compiler-rt> python ..\build\Release\bin\llvm-lit.py --param build_config=x64 --param build_mode=Release --param llvm_site_config=../build/test/lit.site.cfg test\fuzzer\reload.test
llvm-lit.py: D:/llvm-project/compiler-rt/test/fuzzer/lit.cfg:32: note: lsan feature unavailable
llvm-lit.py: D:/llvm-project/compiler-rt/test/fuzzer/lit.cfg:39: note: msan feature unavailable
llvm-lit.py: D:/llvm-project/compiler-rt/test/fuzzer/lit.cfg:57: note: linux feature unavailable
-- Testing: 1 tests, single process --
PASS: libFuzzer :: reload.test (1 of 1)
Testing Time: 9.71s
  Expected Passes    : 1

Execution times are similar across multiple tries. I am not sure why it takes longer to fail without the patch than to succeed with it.

vitalybuka edited the summary of this revision. Apr 5 2019, 12:58 PM

If you like, you can update your name in the summary, since I can't see it in your profile. Then I will submit the patch.

It's good as it is, you can submit the patch. Thanks for your time!

Closed by commit rL357807: Use binary write mode in WriteToFile function to avoid appended \r characters… (authored by vitalybuka). Apr 5 2019, 1:15 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: delcypher. Apr 5 2019, 1:15 PM

Revision Contents

Path                                             Size
compiler-rt/trunk/lib/fuzzer/FuzzerIO.cpp        2 lines
compiler-rt/trunk/test/fuzzer/ReloadTest.cpp     24 lines
compiler-rt/trunk/test/fuzzer/reload.test        13 lines

Diff 193955

compiler-rt/trunk/lib/fuzzer/FuzzerIO.cpp

 }

 void CopyFileToErr(const std::string &Path) {
   Printf("%s", FileToString(Path).c_str());
 }

 void WriteToFile(const Unit &U, const std::string &Path) {
   // Use raw C interface because this function may be called from a sig handler.
-  FILE *Out = fopen(Path.c_str(), "w");
+  FILE *Out = fopen(Path.c_str(), "wb");
   if (!Out) return;
   fwrite(U.data(), sizeof(U[0]), U.size(), Out);
   fclose(Out);
 }

 void ReadDirToVectorOfUnits(const char *Path, Vector<Unit> *V,
                             long *Epoch, size_t MaxSize, bool ExitOnError) {
   long E = Epoch ? *Epoch : 0;

compiler-rt/trunk/test/fuzzer/ReloadTest.cpp

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

// Test that fuzzer we can reload artifacts with any bytes inside.
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <set>
#include <stdio.h>

extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
  std::srand(Seed);
  std::generate(Data, Data + MaxSize, std::rand);
  return MaxSize;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  if (Size > 5000 && std::set<uint8_t>(Data, Data + Size).size() > 255 &&
      (uint8_t)std::accumulate(Data, Data + Size, uint8_t(Size)) == 0)
    __builtin_trap();
  return 0;
}

compiler-rt/trunk/test/fuzzer/reload.test

RUN: %cpp_compiler %S/ReloadTest.cpp -o %t-ReloadTest
RUN: not %run %t-ReloadTest -max_len=10000 -seed=1 -timeout=15 -len_control=0 -exact_artifact_path=%t.crash 2>&1 | FileCheck %s

CHECK: Test unit written to {{.*}}reload.test.tmp.crash

RUN: not %run %t-ReloadTest %t.crash 2>&1 | FileCheck %s --check-prefix=ARTIFACT

ARTIFACT: Running: {{.*}}reload.test.tmp.crash
ARTIFACT: ERROR: libFuzzer: deadly signal

# Sanity check that altered artifact is not going to crash
RUN: echo z >> %t.crash
RUN: %run %t-ReloadTest %t.crash
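
The crash condition in ReloadTest.cpp and the final sanity check can be explored in isolation. The sketch below is an illustration, not part of the committed test: WouldCrash copies the condition from LLVMFuzzerTestOneInput but returns a bool instead of trapping, and MakeCyclicInput is a hypothetical helper that builds an input satisfying it. The appended bytes assume a POSIX shell, where echo z adds the two bytes 'z' and '\n'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <set>
#include <vector>

// Same condition as in ReloadTest.cpp: more than 5000 bytes, all 256 byte
// values present, and (size + sum of bytes) equal to 0 modulo 256.
bool WouldCrash(const std::vector<uint8_t> &Data) {
  const uint8_t *Begin = Data.data();
  const uint8_t *End = Begin + Data.size();
  return Data.size() > 5000 &&
         std::set<uint8_t>(Begin, End).size() > 255 &&
         (uint8_t)std::accumulate(Begin, End, uint8_t(Data.size())) == 0;
}

// Hypothetical input builder: the byte values 0..255 repeated Cycles times.
std::vector<uint8_t> MakeCyclicInput(size_t Cycles) {
  std::vector<uint8_t> Data;
  Data.reserve(Cycles * 256);
  for (size_t i = 0; i < Cycles * 256; ++i)
    Data.push_back(uint8_t(i % 256));
  assert(Data.size() == Cycles * 256);
  return Data;
}
```

With 20 cycles the input has 5120 bytes, and both the size and the byte sum are multiples of 256, so WouldCrash returns true. Appending 'z' and '\n' shifts the 8-bit total by 134, so the altered input no longer satisfies the condition, which is the behaviour the last two RUN lines rely on.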