This is an archive of the discontinued LLVM Phabricator instance.


Differential D65661

[compiler-rt] Move FDP to include/fuzzer/FuzzedDataProvider.h for easier use.
ClosedPublic

Authored by Dor1s on Aug 2 2019, 8:12 AM.


Details

Reviewers

kcc
morehouse

Commits

rGf1b0a93e3a77: [compiler-rt] Move FDP to include/fuzzer/FuzzedDataProvider.h for easier use.
rL367917: [compiler-rt] Move FDP to include/fuzzer/FuzzedDataProvider.h for easier use.
rCRT367917: [compiler-rt] Move FDP to include/fuzzer/FuzzedDataProvider.h for easier use.

Summary

FuzzedDataProvider is a helper class for writing fuzz targets that fuzz
multiple inputs simultaneously. The header is supposed to be used for fuzzing
engine agnostic fuzz targets (i.e. the same target can be used with libFuzzer,
AFL, honggfuzz, and other engines). The common thing though is that fuzz targets
are typically compiled with clang, as it provides all sanitizers as well as
different coverage instrumentation modes. Therefore, making this FDP class a
part of the compiler-rt installation package would make it easier to develop
and distribute fuzz targets across different projects, build systems, etc.
Some context is also available in https://github.com/google/oss-fuzz/pull/2547.

This CL does not delete the header from lib/fuzzer/utils directory in order to
provide the downstream users some time for a smooth migration to the new
header location.

Diff Detail

Repository

rCRT Compiler Runtime

Build Status

Buildable 36171
Build 36170: arc lint + arc unit

Event Timeline

Dor1s created this revision. Aug 2 2019, 8:12 AM

Herald added projects: Restricted Project, Restricted Project. Aug 2 2019, 8:12 AM

Herald added subscribers: Restricted Project, delcypher, mgorny and 2 others.

Harbormaster completed remote builds in B36034: Diff 213052. Aug 2 2019, 8:13 AM

I'm not opposing, but i have a question - this is not fuzzer specific at all, right?
This is just Span on steroids - knows its size and byte position within the buffer,
and has methods to change the position by consuming bytes; nothing more?

I'm not opposing, but i have a question - this is not fuzzer specific at all, right?

Yes, see the summary above.

This is just Span on steroids - knows its size and byte position within the buffer,
and has methods to change the position by consuming bytes; nothing more?

No, span is harmful for fuzzing, as its boundaries are not instrumented (i.e. we can miss some buffer under-/overflows). The FDP takes care of that by allocating dedicated buffers for separate inputs. Plus, it provides various other helpers like ConsumeBool or PickValueInArray to save people from writing custom tricks like data++[0] % something again and again.

It has evolved from similar classes invented in Chrome and some other Google projects, and it did prove to be useful.

I should probably add some documentation in LLVM. As of now there is a short documentation for FDP in google/fuzzing repo: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider
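To illustrate the point about helpers replacing hand-written tricks, here is a minimal sketch. `TinyProvider`, `PickModeByHand`, and `PickModeWithHelper` are hypothetical names introduced only for this example; `TinyProvider` is a simplified stand-in and not the real FuzzedDataProvider (for instance, it always reads from the front of the input).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in, NOT the real FuzzedDataProvider. It only shows the
// shape of a helper-based API and always consumes bytes from the front.
class TinyProvider {
public:
  TinyProvider(const uint8_t *data, size_t size) : data_(data), size_(size) {
    assert(data != nullptr || size == 0);
  }

  // Returns the next byte, or 0 once the input is exhausted.
  uint8_t ConsumeByte() {
    if (size_ == 0)
      return 0;
    --size_;
    return *data_++;
  }

  // Uses the lowest bit of one byte; false when no data remains.
  bool ConsumeBool() { return (ConsumeByte() & 1) != 0; }

  // Picks one element of a fixed-size array using one input byte.
  template <typename T, size_t N> T PickValueInArray(const T (&array)[N]) {
    static_assert(N > 0, "The array must be non-empty.");
    return array[ConsumeByte() % N];
  }

  size_t remaining_bytes() const { return size_; }

private:
  const uint8_t *data_;
  size_t size_;
};

// Hand-rolled style: each fuzz target repeats the bounds check and the modulo.
inline int PickModeByHand(const uint8_t *&data, size_t &size) {
  static const int kModes[] = {10, 20, 30};
  if (size == 0)
    return kModes[0];
  --size;
  return kModes[data++[0] % 3];
}

// Helper style: the bounds check and the modulo live in one place.
inline int PickModeWithHelper(TinyProvider &provider) {
  static const int kModes[] = {10, 20, 30};
  return provider.PickValueInArray(kModes);
}
```

Both functions pick the same element for the same first byte; the helper version simply cannot forget the empty-input check.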

In D65661#1612413, @Dor1s wrote:

I'm not opposing, but i have a question - this is not fuzzer specific at all, right?

Yes, see the summary above.

This is just Span on steroids - knows its size and byte position within the buffer,
and has methods to change the position by consuming bytes; nothing more?

No, span is harmful for fuzzing, as its boundaries are not instrumented (i.e. we can miss some buffer under-/overflows). The FDP takes care of that by allocating dedicated buffers for separate inputs.

The word that is throwing me off here is "inputs".
If fuzzer gave us a buffer of 8 bytes, and we consume it as 2 consecutive 4-byte integers,
the terminology here means those are 2 separate inputs, right?
And while span would act like a light-weight view with no extra allocations,
this would allocate a *separate* buffer for each of these "inputs"?
That's it? I think this should be documented better, if it's not already.

Plus, it provides various other helpers like ConsumeBool or PickValueInArray to save people from writing custom tricks like data++[0] % something again and again.

It has evolved from similar classes invented in Chrome and some other Google projects, and it did prove to be useful.

I should probably add some documentation in LLVM. As of now there is a short documentation for FDP in google/fuzzing repo: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider

the terminology here means those are 2 separate inputs, right?

Yes! Can you think of a better name for that? I thought "argument" or "parameter" could work, but what if those values are used somehow else (e.g. a number of iterations) and not passed to any function explicitly, then they are not technically arguments or parameters :/ "input" sounds pretty abstract to me, but any better suggestions are totally welcome.

And while span would act like a light-weight view with no extra allocations,
this would allocate a *separate* buffer for each of these "inputs"?

I don't see a sane way to use span in your example. Most likely it'll end up with two integer variables on the stack, which is what FDP would do as well.

But if you interpret the buffer like int32_t*[2] and pass the address to the target API, you may miss some bugs because of the way instrumentation works. We had cases like this, when e.g. off-by-one bugs were not detected. Educating every fuzzer developer on that matter doesn't scale. Providing an API that solves the problem does :)
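The difference between viewing part of the fuzzer's buffer in place and copying it into a dedicated allocation can be sketched as follows. `CopySubBuffer` and `PointsIntoInput` are hypothetical helpers written for this illustration, not part of the header under review. The sketch assumes the usual behavior of the vector range constructor (an allocation sized to the data), which the standard does not strictly guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Copies `length` bytes starting at `offset` into its own heap buffer. With a
// sanitizer such as ASan, a read past the end of this copy is expected to be
// reported, because the copy does not sit inside the larger input buffer.
inline std::vector<uint8_t> CopySubBuffer(const uint8_t *data, size_t size,
                                          size_t offset, size_t length) {
  assert(offset <= size && length <= size - offset);
  std::vector<uint8_t> copy(data + offset, data + offset + length);
  copy.shrink_to_fit(); // Non-binding request to drop any spare capacity.
  return copy;
}

// Returns true if `p` points at a byte inside [data, data + size).
// std::less / std::less_equal give a well-defined order for any pointers.
inline bool PointsIntoInput(const uint8_t *p, const uint8_t *data,
                            size_t size) {
  std::less_equal<const uint8_t *> le;
  std::less<const uint8_t *> lt;
  return le(data, p) && lt(p, data + size);
}
```

For an 8-byte input split into two 4-byte pieces, the byte one past the first in-place view is still a valid byte of the input, so an off-by-one read there goes unnoticed; the byte one past the end of the copy is outside the input altogether.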

In D65661#1612447, @Dor1s wrote:

the terminology here means those are 2 separate inputs, right?

Yes! Can you think of a better name for that? I thought "argument" or "parameter" could work, but what if those values are used somehow else (e.g. a number of iterations) and not passed to any function explicitly, then they are not technically arguments or parameters :/ "input" sounds pretty abstract to me, but any better suggestions are totally welcome.

Aha, now it makes sense. Sub-buffers?

And while span would act like a light-weight view with no extra allocations,
this would allocate a *separate* buffer for each of these "inputs"?

I don't see a sane way to use span in your example. Most likely it'll end up with two integer variables on the stack, which is what FDP would do as well.

But if you interpret the buffer like int32_t*[2]

That's what i meant, yes.

and pass the address to the target API, you may miss some bugs because of the way instrumentation works. We had cases like this, when e.g. off-by-one bugs were not detected.

Yeah, that i understand first-hand (:

Educating every fuzzer developer on that matter doesn't scale. Providing an API that solves the problem does :)

Though this wrapper won't solve all the issues, this would need to be more or less consistently
used throughout all the code being fuzzed, not just on the surface in just the fuzz target.
Which means it would need to be *always* available, not just when building with clang+fuzzer,
which means it would need to be located in every project's sources.
In other words, i guess i'm not sure how putting it in clang helps, other than creating compiler lock-in.

If this change gets accepted, any clang user should be able to use it via #include <fuzzer/FuzzedDataProvider.h>, rather than have it somewhere in their repo. It is not tied to -fsanitize=fuzzer or any other compiler flags.

In D65661#1612541, @Dor1s wrote:

If this change gets accepted, any clang user should be able to use it via #include <fuzzer/FuzzedDataProvider.h>, rather than have it somewhere in their repo. It is not tied to -fsanitize=fuzzer or any other compiler flags.

I think that misses the point, it's still a compiler lock-in - does that work with gcc?

In D65661#1612546, @lebedev.ri wrote:

In D65661#1612541, @Dor1s wrote:

If this change gets accepted, any clang user should be able to use it via #include <fuzzer/FuzzedDataProvider.h>, rather than have it somewhere in their repo. It is not tied to -fsanitize=fuzzer or any other compiler flags.

I think that misses the point, it's still a compiler lock-in - does that work with gcc?

I don't think gcc is a good choice for fuzzing :) That is also covered in the CL summary.

But seriously, GCC users can just grab this header and use it -- that's the current state of the world, so this CL doesn't make life for GCC users harder.

In D65661#1612558, @Dor1s wrote:

In D65661#1612546, @lebedev.ri wrote:

In D65661#1612541, @Dor1s wrote:

If this change gets accepted, any clang user should be able to use it via #include <fuzzer/FuzzedDataProvider.h>, rather than have it somewhere in their repo. It is not tied to -fsanitize=fuzzer or any other compiler flags.

I think that misses the point, it's still a compiler lock-in - does that work with gcc?

I don't think gcc is a good choice for fuzzing :)

...
Yep, i failed to convey the problem.
Any code that uses that header will get locked into the compiler that provides said header,
unless they resort to bundling said header in the first place, which will be frowned upon.
Any code - at least the fuzzers, which somewhat contradicts the recommendations of oss-fuzz
for fuzz targets to be tested during *normal* builds. And really, this structure would need to
be consistently used through the code for best results,
in which case the entire codebase will get compiler-locked.

But i guess these concerns are completely insane/alien to everyone in today's monoculture world.

That is also covered in the CL summary.

Keep the header at the old location as well for smooth migration.

Harbormaster completed remote builds in B36062: Diff 213172. Aug 2 2019, 8:37 PM

Dor1s edited the summary of this revision. Aug 2 2019, 8:41 PM

In D65661#1612573, @lebedev.ri wrote:

Yep, i failed to convey the problem.
Any code that uses that header will get locked into the compiler that provides said header,
unless they resort to bundling said header in the first place, which will be frowned upon.

Without this patch, any project that wants to use this header needs to add a vendored copy of FDP to their source repository. With this patch, non-clang projects will still have to do this, though projects that build with clang will not. To me, this seems very beneficial for every project on OSS-Fuzz (and any other project that uses libFuzzer), since they support building with clang anyway.

Any code - at least the fuzzers, which somewhat contradicts the recommendations of oss-fuzz
for fuzz targets to be tested during *normal* builds. And really, this structure would need to
be consistently used through the code for best results,
in which case the entire codebase will get compiler-locked.

Yes, the alternative is to use a vendored FDP. I agree it's ugly, but no uglier than is currently required.

I think this patch has more upside than downside. We strictly improve the case for projects using clang already, while leaving things the same for non-clang.
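For projects that build with more than one compiler, one possible way to combine the two options discussed here is to prefer the toolchain header and fall back to a vendored copy. The sketch below is an assumption, not something proposed in this review: the macro `FDP_FROM_TOOLCHAIN`, the function name, and the vendored path are all hypothetical, and the fallback include is left commented out because that path does not exist here.

```cpp
// Prefer the header shipped with the toolchain when it is available.
// __has_include is standard since C++17 and offered earlier as an extension
// by Clang and GCC.
#if defined(__has_include)
#if __has_include(<fuzzer/FuzzedDataProvider.h>)
#include <fuzzer/FuzzedDataProvider.h>
#define FDP_FROM_TOOLCHAIN 1
#endif
#endif

#ifndef FDP_FROM_TOOLCHAIN
#define FDP_FROM_TOOLCHAIN 0
// Otherwise fall back to a copy vendored in the project. The path below is
// hypothetical, so the include is left commented out in this sketch:
// #include "third_party/fuzzer/FuzzedDataProvider.h"
#endif

// Reports which branch was taken, so a build log or a test can check it.
constexpr bool UsingToolchainFuzzedDataProvider() {
  return FDP_FROM_TOOLCHAIN != 0;
}
```

This keeps the fuzz target source identical across compilers, at the cost of still carrying the vendored copy that the comment above calls ugly.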

This revision is now accepted and ready to land. Aug 5 2019, 11:29 AM

Rebase + re-run the tests locally

Harbormaster completed remote builds in B36171: Diff 213442. Aug 5 2019, 12:54 PM

Closed by commit rL367917: [compiler-rt] Move FDP to include/fuzzer/FuzzedDataProvider.h for easier use. (authored by Dor1s). Aug 5 2019, 12:55 PM

This revision was automatically updated to reflect the committed changes.

In D65661#1615277, @morehouse wrote:

Without this patch, any project that wants to use this header needs to add a vendored copy of FDP to their source repository. With this patch, non-clang projects will still have to do this, though projects that build with clang will not.

Wasn't @lebedev.ri's point that projects might use multiple compilers? Projects that build with Clang might also build with other compilers. So what you call "non-clang" should probably be "not-exclusively-clang", and the only projects benefiting are those that build only with Clang.

It seems to me as if this header is compiler-independent, so why is it in Clang's builtin include directory (on Linux that's <prefix>/lib/clang/<version>/include) as opposed to the system include directory (on Linux, <prefix>/include)? The builtin include directory is supposed to contain headers that are "private" to Clang, because they contain non-standard C/C++ that is only guaranteed to be supported by Clang. (Basically stuff that starts with two underscores like builtins.)

Revision Contents

Path                                              Size
include/CMakeLists.txt                            9 lines
include/fuzzer/FuzzedDataProvider.h               245 lines
lib/fuzzer/FuzzerExtFunctions.def                 8 lines
lib/fuzzer/tests/CMakeLists.txt                   2 lines
lib/fuzzer/tests/FuzzedDataProviderUnittest.cpp   2 lines
lib/fuzzer/utils/FuzzedDataProvider.h             4 lines
lib/sanitizer_common/scripts/check_lint.sh        4 lines

Diff 213442

include/CMakeLists.txt

	if (COMPILER_RT_BUILD_SANITIZERS)			if (COMPILER_RT_BUILD_SANITIZERS)
	set(SANITIZER_HEADERS			set(SANITIZER_HEADERS
	sanitizer/allocator_interface.h			sanitizer/allocator_interface.h
	sanitizer/asan_interface.h			sanitizer/asan_interface.h
	sanitizer/common_interface_defs.h			sanitizer/common_interface_defs.h
	sanitizer/coverage_interface.h			sanitizer/coverage_interface.h
	sanitizer/dfsan_interface.h			sanitizer/dfsan_interface.h
	sanitizer/hwasan_interface.h			sanitizer/hwasan_interface.h
	sanitizer/linux_syscall_hooks.h			sanitizer/linux_syscall_hooks.h
	sanitizer/lsan_interface.h			sanitizer/lsan_interface.h
	sanitizer/msan_interface.h			sanitizer/msan_interface.h
	sanitizer/netbsd_syscall_hooks.h			sanitizer/netbsd_syscall_hooks.h
	sanitizer/scudo_interface.h			sanitizer/scudo_interface.h
	sanitizer/tsan_interface.h			sanitizer/tsan_interface.h
	sanitizer/tsan_interface_atomic.h			sanitizer/tsan_interface_atomic.h
	)			)
				set(FUZZER_HEADERS
				fuzzer/FuzzedDataProvider.h
				)
	endif(COMPILER_RT_BUILD_SANITIZERS)			endif(COMPILER_RT_BUILD_SANITIZERS)

	if (COMPILER_RT_BUILD_XRAY)			if (COMPILER_RT_BUILD_XRAY)
	set(XRAY_HEADERS			set(XRAY_HEADERS
	xray/xray_interface.h			xray/xray_interface.h
	xray/xray_log_interface.h			xray/xray_log_interface.h
	xray/xray_records.h			xray/xray_records.h
	)			)
	endif(COMPILER_RT_BUILD_XRAY)			endif(COMPILER_RT_BUILD_XRAY)

	set(COMPILER_RT_HEADERS			set(COMPILER_RT_HEADERS
	${SANITIZER_HEADERS}			${SANITIZER_HEADERS}
				${FUZZER_HEADERS}
	${XRAY_HEADERS})			${XRAY_HEADERS})

	set(output_dir ${COMPILER_RT_OUTPUT_DIR}/include)			set(output_dir ${COMPILER_RT_OUTPUT_DIR}/include)

	# Copy compiler-rt headers to the build tree.			# Copy compiler-rt headers to the build tree.
	set(out_files)			set(out_files)
	foreach( f ${COMPILER_RT_HEADERS} )			foreach( f ${COMPILER_RT_HEADERS} )
	set( src ${CMAKE_CURRENT_SOURCE_DIR}/${f} )			set( src ${CMAKE_CURRENT_SOURCE_DIR}/${f} )
	Show All 9 Lines
	add_dependencies(compiler-rt compiler-rt-headers)			add_dependencies(compiler-rt compiler-rt-headers)
	set_target_properties(compiler-rt-headers PROPERTIES FOLDER "Compiler-RT Misc")			set_target_properties(compiler-rt-headers PROPERTIES FOLDER "Compiler-RT Misc")

	# Install sanitizer headers.			# Install sanitizer headers.
	install(FILES ${SANITIZER_HEADERS}			install(FILES ${SANITIZER_HEADERS}
	COMPONENT compiler-rt-headers			COMPONENT compiler-rt-headers
	PERMISSIONS OWNER_READ OWNER_WRITE GROUP_READ WORLD_READ			PERMISSIONS OWNER_READ OWNER_WRITE GROUP_READ WORLD_READ
	DESTINATION ${COMPILER_RT_INSTALL_PATH}/include/sanitizer)			DESTINATION ${COMPILER_RT_INSTALL_PATH}/include/sanitizer)
				# Install fuzzer headers.
				install(FILES ${FUZZER_HEADERS}
				COMPONENT compiler-rt-headers
				PERMISSIONS OWNER_READ OWNER_WRITE GROUP_READ WORLD_READ
				DESTINATION ${COMPILER_RT_INSTALL_PATH}/include/fuzzer)
	# Install xray headers.			# Install xray headers.
	install(FILES ${XRAY_HEADERS}			install(FILES ${XRAY_HEADERS}
	COMPONENT compiler-rt-headers			COMPONENT compiler-rt-headers
	PERMISSIONS OWNER_READ OWNER_WRITE GROUP_READ WORLD_READ			PERMISSIONS OWNER_READ OWNER_WRITE GROUP_READ WORLD_READ
	DESTINATION ${COMPILER_RT_INSTALL_PATH}/include/xray)			DESTINATION ${COMPILER_RT_INSTALL_PATH}/include/xray)

	if (NOT CMAKE_CONFIGURATION_TYPES) # don't add this for IDEs.			if (NOT CMAKE_CONFIGURATION_TYPES) # don't add this for IDEs.
	add_custom_target(install-compiler-rt-headers			add_custom_target(install-compiler-rt-headers
	Show All 13 Lines

include/fuzzer/

This directory was added.

include/fuzzer/FuzzedDataProvider.h

This file was added.

				//===- FuzzedDataProvider.h - Utility header for fuzz targets ---- C++ - ===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				// A single header library providing an utility class to break up an array of
				// bytes. Whenever run on the same input, provides the same output, as long as
				// its methods are called in the same order, with the same arguments.
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_FUZZER_FUZZED_DATA_PROVIDER_H_
				#define LLVM_FUZZER_FUZZED_DATA_PROVIDER_H_

				#include <limits.h>
				#include <stddef.h>
				#include <stdint.h>

				#include <algorithm>
				#include <cstring>
				#include <initializer_list>
				#include <string>
				#include <type_traits>
				#include <utility>
				#include <vector>

				class FuzzedDataProvider {
				public:
				// |data| is an array of length |size| that the FuzzedDataProvider wraps to
				// provide more granular access. |data| must outlive the FuzzedDataProvider.
				FuzzedDataProvider(const uint8_t *data, size_t size)
				: data_ptr_(data), remaining_bytes_(size) {}
				~FuzzedDataProvider() = default;

				// Returns a std::vector containing |num_bytes| of input data. If fewer than
				// |num_bytes| of data remain, returns a shorter std::vector containing all
				// of the data that's left. Can be used with any byte sized type, such as
				// char, unsigned char, uint8_t, etc.
				template <typename T> std::vector<T> ConsumeBytes(size_t num_bytes) {
				num_bytes = std::min(num_bytes, remaining_bytes_);
				return ConsumeBytes<T>(num_bytes, num_bytes);
				}

				// Similar to |ConsumeBytes|, but also appends the terminator value at the end
				// of the resulting vector. Useful, when a mutable null-terminated C-string is
				// needed, for example. But that is a rare case. Better avoid it, if possible,
				// and prefer using |ConsumeBytes| or |ConsumeBytesAsString| methods.
				template <typename T>
				std::vector<T> ConsumeBytesWithTerminator(size_t num_bytes,
				T terminator = 0) {
				num_bytes = std::min(num_bytes, remaining_bytes_);
				std::vector<T> result = ConsumeBytes<T>(num_bytes + 1, num_bytes);
				result.back() = terminator;
				return result;
				}

				// Returns a std::string containing |num_bytes| of input data. Using this and
				// |.c_str()| on the resulting string is the best way to get an immutable
				// null-terminated C string. If fewer than |num_bytes| of data remain, returns
				// a shorter std::string containing all of the data that's left.
				std::string ConsumeBytesAsString(size_t num_bytes) {
				static_assert(sizeof(std::string::value_type) == sizeof(uint8_t),
				"ConsumeBytesAsString cannot convert the data to a string.");

				num_bytes = std::min(num_bytes, remaining_bytes_);
				std::string result(
				reinterpret_cast<const std::string::value_type *>(data_ptr_),
				num_bytes);
				Advance(num_bytes);
				return result;
				}

				// Returns a number in the range [min, max] by consuming bytes from the
				// input data. The value might not be uniformly distributed in the given
				// range. If there's no input data left, always returns |min|. |min| must
				// be less than or equal to |max|.
				template <typename T> T ConsumeIntegralInRange(T min, T max) {
				static_assert(std::is_integral<T>::value, "An integral type is required.");
				static_assert(sizeof(T) <= sizeof(uint64_t), "Unsupported integral type.");

				if (min > max)
				abort();

				// Use the biggest type possible to hold the range and the result.
				uint64_t range = static_cast<uint64_t>(max) - min;
				uint64_t result = 0;
				size_t offset = 0;

				while (offset < sizeof(T) * CHAR_BIT && (range >> offset) > 0 &&
				remaining_bytes_ != 0) {
				// Pull bytes off the end of the seed data. Experimentally, this seems to
				// allow the fuzzer to more easily explore the input space. This makes
				// sense, since it works by modifying inputs that caused new code to run,
				// and this data is often used to encode length of data read by
				// |ConsumeBytes|. Separating out read lengths makes it easier to modify the
				// contents of the data that is actually read.
				--remaining_bytes_;
				result = (result << CHAR_BIT) | data_ptr_[remaining_bytes_];
				offset += CHAR_BIT;
				}

				// Avoid division by 0, in case |range + 1| results in overflow.
				if (range != std::numeric_limits<decltype(range)>::max())
				result = result % (range + 1);

				return static_cast<T>(min + result);
				}

				// Returns a std::string of length from 0 to |max_length|. When it runs out of
				// input data, returns what remains of the input. Designed to be more stable
				// with respect to a fuzzer inserting characters than just picking a random
				// length and then consuming that many bytes with |ConsumeBytes|.
				std::string ConsumeRandomLengthString(size_t max_length) {
				// Reads bytes from the start of |data_ptr_|. Maps "\\" to "\", and maps "\"
				// followed by anything else to the end of the string. As a result of this
				// logic, a fuzzer can insert characters into the string, and the string
				// will be lengthened to include those new characters, resulting in a more
				// stable fuzzer than picking the length of a string independently from
				// picking its contents.
				std::string result;

				// Reserve the anticipated capacity to prevent several reallocations.
				result.reserve(std::min(max_length, remaining_bytes_));
				for (size_t i = 0; i < max_length && remaining_bytes_ != 0; ++i) {
				char next = ConvertUnsignedToSigned<char>(data_ptr_[0]);
				Advance(1);
				if (next == '\\' && remaining_bytes_ != 0) {
				next = ConvertUnsignedToSigned<char>(data_ptr_[0]);
				Advance(1);
				if (next != '\\')
				break;
				}
				result += next;
				}

				result.shrink_to_fit();
				return result;
				}

				// Returns a std::vector containing all remaining bytes of the input data.
				template <typename T> std::vector<T> ConsumeRemainingBytes() {
				return ConsumeBytes<T>(remaining_bytes_);
				}

				// Prefer using |ConsumeRemainingBytes| unless you actually need a std::string
				// object.
				// Returns a std::string containing all remaining bytes of the input data.
				std::string ConsumeRemainingBytesAsString() {
				return ConsumeBytesAsString(remaining_bytes_);
				}

				// Returns a number in the range [Type's min, Type's max]. The value might
				// not be uniformly distributed in the given range. If there's no input data
				// left, always returns |min|.
				template <typename T> T ConsumeIntegral() {
				return ConsumeIntegralInRange(std::numeric_limits<T>::min(),
				std::numeric_limits<T>::max());
				}

				// Reads one byte and returns a bool, or false when no data remains.
				bool ConsumeBool() { return 1 & ConsumeIntegral<uint8_t>(); }

				// Returns a copy of a value selected from a fixed-size |array|.
				template <typename T, size_t size>
				T PickValueInArray(const T (&array)[size]) {
				static_assert(size > 0, "The array must be non empty.");
				return array[ConsumeIntegralInRange<size_t>(0, size - 1)];
				}

				template <typename T>
				T PickValueInArray(std::initializer_list<const T> list) {
				// static_assert(list.size() > 0, "The array must be non empty.");
				return *(list.begin() + ConsumeIntegralInRange<size_t>(0, list.size() - 1));
				}

				// Return an enum value. The enum must start at 0 and be contiguous. It must
				// also contain |kMaxValue| aliased to its largest (inclusive) value. Such as:
				// enum class Foo { SomeValue, OtherValue, kMaxValue = OtherValue };
				template <typename T> T ConsumeEnum() {
				static_assert(std::is_enum<T>::value, "|T| must be an enum type.");
				return static_cast<T>(ConsumeIntegralInRange<uint32_t>(
				0, static_cast<uint32_t>(T::kMaxValue)));
				}

				// Reports the remaining bytes available for fuzzed input.
				size_t remaining_bytes() { return remaining_bytes_; }

				private:
				FuzzedDataProvider(const FuzzedDataProvider &) = delete;
				FuzzedDataProvider &operator=(const FuzzedDataProvider &) = delete;

				void Advance(size_t num_bytes) {
				if (num_bytes > remaining_bytes_)
				abort();

				data_ptr_ += num_bytes;
				remaining_bytes_ -= num_bytes;
				}

				template <typename T>
				std::vector<T> ConsumeBytes(size_t size, size_t num_bytes_to_consume) {
				static_assert(sizeof(T) == sizeof(uint8_t), "Incompatible data type.");

				// The point of using the size-based constructor below is to increase the
				// odds of having a vector object with capacity being equal to the length.
				// That part is always implementation specific, but at least both libc++ and
				// libstdc++ allocate the requested number of bytes in that constructor,
				// which seems to be a natural choice for other implementations as well.
				// To increase the odds even more, we also call |shrink_to_fit| below.
				std::vector<T> result(size);
				std::memcpy(result.data(), data_ptr_, num_bytes_to_consume);
				Advance(num_bytes_to_consume);

				// Even though |shrink_to_fit| is also implementation specific, we expect it
				// to provide an additional assurance in case vector's constructor allocated
				// a buffer which is larger than the actual amount of data we put inside it.
				result.shrink_to_fit();
				return result;
				}

				template <typename TS, typename TU> TS ConvertUnsignedToSigned(TU value) {
				static_assert(sizeof(TS) == sizeof(TU), "Incompatible data types.");
				static_assert(!std::numeric_limits<TU>::is_signed,
				"Source type must be unsigned.");

				// TODO(Dor1s): change to `if constexpr` once C++17 becomes mainstream.
				if (std::numeric_limits<TS>::is_modulo)
				return static_cast<TS>(value);

				// Avoid using implementation-defined unsigned to signed conversions.
				// To learn more, see https://stackoverflow.com/questions/13150449.
				if (value <= std::numeric_limits<TS>::max())
				return static_cast<TS>(value);
				else {
				constexpr auto TS_min = std::numeric_limits<TS>::min();
				return TS_min + static_cast<char>(value - TS_min);
				}
				}

				const uint8_t *data_ptr_;
				size_t remaining_bytes_;
				};

				#endif // LLVM_FUZZER_FUZZED_DATA_PROVIDER_H_
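
The header's comments describe two consumption strategies: integers are pulled from the end of the input, while random-length strings are read from the front with a backslash acting as an escape and terminator. The sketch below restates both as free functions so the behavior can be traced on a small input. The `Input` struct and the names `TakeIntInRange` and `TakeString` are hypothetical; the integer version is fixed to `uint32_t`, and plain `static_cast<char>` is used instead of the header's conversion helper, which is only adequate for the ASCII bytes used here.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical cursor over the fuzzer input, for illustration only.
struct Input {
  const uint8_t *data;
  size_t remaining;
};

// Integers come from the END of the input: the last unread byte is taken
// first, and the value is reduced modulo (range + 1).
inline uint32_t TakeIntInRange(Input &in, uint32_t min, uint32_t max) {
  assert(min <= max);
  uint64_t range = static_cast<uint64_t>(max) - min;
  uint64_t result = 0;
  size_t offset = 0;
  while (offset < sizeof(uint32_t) * CHAR_BIT && (range >> offset) > 0 &&
         in.remaining != 0) {
    --in.remaining;
    result = (result << CHAR_BIT) | in.data[in.remaining];
    offset += CHAR_BIT;
  }
  // range fits in 32 bits here, so range + 1 cannot overflow uint64_t.
  result %= (range + 1);
  return static_cast<uint32_t>(min + result);
}

// Strings come from the FRONT of the input. "\\" maps to a single backslash;
// a backslash followed by any other byte ends the string.
inline std::string TakeString(Input &in, size_t max_length) {
  std::string result;
  for (size_t i = 0; i < max_length && in.remaining != 0; ++i) {
    char next = static_cast<char>(in.data[0]);
    ++in.data;
    --in.remaining;
    if (next == '\\' && in.remaining != 0) {
      next = static_cast<char>(in.data[0]);
      ++in.data;
      --in.remaining;
      if (next != '\\')
        break;
    }
    result += next;
  }
  return result;
}
```

Because lengths and choices are read from the tail and string contents from the head, a fuzzer can insert characters into a string without shifting the bytes that encode the integers.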

lib/fuzzer/FuzzerExtFunctions.def

	Show All 10 Lines
	// the macro is:			// the macro is:
	//			//
	// EXT_FUNC(<name>, <return_type>, <function_signature>, <warn_if_missing>)			// EXT_FUNC(<name>, <return_type>, <function_signature>, <warn_if_missing>)
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	// Optional user functions			// Optional user functions
	EXT_FUNC(LLVMFuzzerInitialize, int, (int argc, char **argv), false);			EXT_FUNC(LLVMFuzzerInitialize, int, (int argc, char **argv), false);
	EXT_FUNC(LLVMFuzzerCustomMutator, size_t,			EXT_FUNC(LLVMFuzzerCustomMutator, size_t,
	(uint8_t * Data, size_t Size, size_t MaxSize, unsigned int Seed),			(uint8_t *Data, size_t Size, size_t MaxSize, unsigned int Seed),
	false);			false);
	EXT_FUNC(LLVMFuzzerCustomCrossOver, size_t,			EXT_FUNC(LLVMFuzzerCustomCrossOver, size_t,
	(const uint8_t * Data1, size_t Size1,			(const uint8_t *Data1, size_t Size1,
	const uint8_t * Data2, size_t Size2,			const uint8_t *Data2, size_t Size2,
	uint8_t * Out, size_t MaxOutSize, unsigned int Seed),			uint8_t *Out, size_t MaxOutSize, unsigned int Seed),
	false);			false);

	// Sanitizer functions			// Sanitizer functions
	EXT_FUNC(__lsan_enable, void, (), false);			EXT_FUNC(__lsan_enable, void, (), false);
	EXT_FUNC(__lsan_disable, void, (), false);			EXT_FUNC(__lsan_disable, void, (), false);
	EXT_FUNC(__lsan_do_recoverable_leak_check, int, (), false);			EXT_FUNC(__lsan_do_recoverable_leak_check, int, (), false);
	EXT_FUNC(__sanitizer_acquire_crash_state, int, (), true);			EXT_FUNC(__sanitizer_acquire_crash_state, int, (), true);
	EXT_FUNC(__sanitizer_install_malloc_and_free_hooks, int,			EXT_FUNC(__sanitizer_install_malloc_and_free_hooks, int,
	Show All 17 Lines

lib/fuzzer/tests/CMakeLists.txt

Show First 20 Lines • Show All 69 Lines • ▼ Show 20 Lines	generate_compiler_rt_tests(FuzzerTestObjects
SOURCES FuzzerUnittest.cpp ${COMPILER_RT_GTEST_SOURCE}		SOURCES FuzzerUnittest.cpp ${COMPILER_RT_GTEST_SOURCE}
RUNTIME ${LIBFUZZER_TEST_RUNTIME}		RUNTIME ${LIBFUZZER_TEST_RUNTIME}
DEPS gtest ${LIBFUZZER_TEST_RUNTIME_DEPS}		DEPS gtest ${LIBFUZZER_TEST_RUNTIME_DEPS}
CFLAGS ${LIBFUZZER_UNITTEST_CFLAGS} ${LIBFUZZER_TEST_RUNTIME_CFLAGS}		CFLAGS ${LIBFUZZER_UNITTEST_CFLAGS} ${LIBFUZZER_TEST_RUNTIME_CFLAGS}
LINK_FLAGS ${LIBFUZZER_UNITTEST_LINK_FLAGS} ${LIBFUZZER_TEST_RUNTIME_LINK_FLAGS})		LINK_FLAGS ${LIBFUZZER_UNITTEST_LINK_FLAGS} ${LIBFUZZER_TEST_RUNTIME_LINK_FLAGS})
set_target_properties(FuzzerUnitTests PROPERTIES		set_target_properties(FuzzerUnitTests PROPERTIES
RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})		RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})

list(APPEND LIBFUZZER_UNITTEST_CFLAGS -I${COMPILER_RT_SOURCE_DIR}/lib/fuzzer/utils)

set(FuzzedDataProviderTestObjects)		set(FuzzedDataProviderTestObjects)
generate_compiler_rt_tests(FuzzedDataProviderTestObjects		generate_compiler_rt_tests(FuzzedDataProviderTestObjects
FuzzedDataProviderUnitTests "FuzzerUtils-${arch}-Test" ${arch}		FuzzedDataProviderUnitTests "FuzzerUtils-${arch}-Test" ${arch}
SOURCES FuzzedDataProviderUnittest.cpp ${COMPILER_RT_GTEST_SOURCE}		SOURCES FuzzedDataProviderUnittest.cpp ${COMPILER_RT_GTEST_SOURCE}
DEPS gtest ${LIBFUZZER_TEST_RUNTIME_DEPS}		DEPS gtest ${LIBFUZZER_TEST_RUNTIME_DEPS}
CFLAGS ${LIBFUZZER_UNITTEST_CFLAGS} ${LIBFUZZER_TEST_RUNTIME_CFLAGS}		CFLAGS ${LIBFUZZER_UNITTEST_CFLAGS} ${LIBFUZZER_TEST_RUNTIME_CFLAGS}
LINK_FLAGS ${LIBFUZZER_UNITTEST_LINK_FLAGS} ${LIBFUZZER_TEST_RUNTIME_LINK_FLAGS})		LINK_FLAGS ${LIBFUZZER_UNITTEST_LINK_FLAGS} ${LIBFUZZER_TEST_RUNTIME_LINK_FLAGS})
set_target_properties(FuzzedDataProviderUnitTests PROPERTIES		set_target_properties(FuzzedDataProviderUnitTests PROPERTIES
RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})		RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
endif()		endif()

lib/fuzzer/tests/FuzzedDataProviderUnittest.cpp

	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

	#include "gtest/gtest.h"			#include "gtest/gtest.h"
	#include <cstdint>			#include <cstdint>
	#include <cstdlib>			#include <cstdlib>

	#include "FuzzedDataProvider.h"			#include <fuzzer/FuzzedDataProvider.h>

	// The test is intentionally extensive, as behavior of |FuzzedDataProvider| must			// The test is intentionally extensive, as behavior of |FuzzedDataProvider| must
	// not be broken, given that many fuzz targets depend on it. Changing the			// not be broken, given that many fuzz targets depend on it. Changing the
	// behavior might invalidate existing corpora and make the fuzz targets using			// behavior might invalidate existing corpora and make the fuzz targets using
	// |FuzzedDataProvider| to lose code coverage accumulated over time.			// |FuzzedDataProvider| to lose code coverage accumulated over time.

	/* A random 1KB buffer generated by:			/* A random 1KB buffer generated by:
	$ python -c "import os; print ',\n'.join([', '.join(['0x%02X' % ord(i) for i \			$ python -c "import os; print ',\n'.join([', '.join(['0x%02X' % ord(i) for i \
	▲ Show 20 Lines • Show All 337 Lines • Show Last 20 Lines

lib/fuzzer/utils/FuzzedDataProvider.h

[... 20 lines not shown ...]
#include <cstring>
#include <initializer_list>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

class FuzzedDataProvider {
public:
  // |data| is an array of length |size| that the FuzzedDataProvider wraps to
  // provide more granular access. |data| must outlive the FuzzedDataProvider.
  FuzzedDataProvider(const uint8_t *data, size_t size)
      : data_ptr_(data), remaining_bytes_(size) {}
  ~FuzzedDataProvider() = default;

  // Returns a std::vector containing |num_bytes| of input data. If fewer than
  // |num_bytes| of data remain, returns a shorter std::vector containing all
[... 143 lines not shown ...] template <typename T> T ConsumeEnum() {
    static_assert(std::is_enum<T>::value, "|T| must be an enum type.");
    return static_cast<T>(ConsumeIntegralInRange<uint32_t>(
        0, static_cast<uint32_t>(T::kMaxValue)));
  }

  // Reports the remaining bytes available for fuzzed input.
  size_t remaining_bytes() { return remaining_bytes_; }

private:
  FuzzedDataProvider(const FuzzedDataProvider &) = delete;
  FuzzedDataProvider &operator=(const FuzzedDataProvider &) = delete;

  void Advance(size_t num_bytes) {
    if (num_bytes > remaining_bytes_)
      abort();

    data_ptr_ += num_bytes;
[... 48 lines not shown ...]
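
Reader's note on the excerpt above: the header describes a byte accessor that returns a shorter vector when less input remains, and a private Advance() that aborts on over-consumption. The sketch below isolates that pattern in a hypothetical stand-in class. MiniProvider and TakeBytes are illustrative names that do not appear in the diff, and the body of TakeBytes and the final decrement in Advance() are assumptions, since those lines are hidden in this view. It is not the real FuzzedDataProvider implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for illustration only; not the real FuzzedDataProvider.
class MiniProvider {
public:
  // |data| must outlive the provider, as in the header excerpt.
  MiniProvider(const uint8_t *data, size_t size)
      : data_ptr_(data), remaining_bytes_(size) {}

  // Returns up to |num_bytes| bytes. If fewer remain, the result is shorter
  // and contains everything that is left (assumed behavior, based on the
  // comment shown in the excerpt).
  std::vector<uint8_t> TakeBytes(size_t num_bytes) {
    num_bytes = std::min(num_bytes, remaining_bytes_);
    std::vector<uint8_t> result(data_ptr_, data_ptr_ + num_bytes);
    Advance(num_bytes);
    return result;
  }

  // Reports how many input bytes have not been consumed yet.
  size_t remaining_bytes() const { return remaining_bytes_; }

private:
  MiniProvider(const MiniProvider &) = delete;
  MiniProvider &operator=(const MiniProvider &) = delete;

  // Moves the cursor forward; consuming more than remains is a logic error.
  void Advance(size_t num_bytes) {
    if (num_bytes > remaining_bytes_)
      std::abort();

    data_ptr_ += num_bytes;
    remaining_bytes_ -= num_bytes; // assumed; this line is hidden in the diff
  }

  const uint8_t *data_ptr_;
  size_t remaining_bytes_;
};
```

Because the public accessor clamps the request before calling Advance(), the abort() guard can only fire if some internal caller miscounts, so arbitrary fuzzer input cannot trigger it through this path.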

lib/sanitizer_common/scripts/check_lint.sh

[... 60 lines not shown ...]
}

if [ "${COMPILER_RT}" = "" ]; then
  COMPILER_RT=projects/compiler-rt
fi
LIT_TESTS=${COMPILER_RT}/test
# Headers
SANITIZER_INCLUDES=${COMPILER_RT}/include/sanitizer
-run_lint ${SANITIZER_INCLUDES_LINT_FILTER} ${SANITIZER_INCLUDES}/*.h &
+FUZZER_INCLUDES=${COMPILER_RT}/include/fuzzer
+run_lint ${SANITIZER_INCLUDES_LINT_FILTER} ${SANITIZER_INCLUDES}/*.h \
+  ${FUZZER_INCLUDES}/*.h &

# Sanitizer_common
COMMON_RTL=${COMPILER_RT}/lib/sanitizer_common
run_lint ${COMMON_RTL_INC_LINT_FILTER} ${COMMON_RTL}/*.cpp \
  ${COMMON_RTL}/*.h \
  ${COMMON_RTL}/tests/*.cpp &

# Interception
[... 65 lines not shown ...]