This is an archive of the discontinued LLVM Phabricator instance.

[Fuzzer] First step to thread affinity
Abandoned, Public

Authored by devnexen on Jun 9 2018, 2:28 AM.


Details

Reviewers

kcc
morehouse

Summary

Adds a new option (with two modes) that binds each job to a specific CPU.
There is no complex CPU binding logic yet; checking the CPU load before binding, for example, ought to be possible later on.
Implemented for Linux and FreeBSD at the moment.
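
For readers unfamiliar with the underlying call, here is a minimal, Linux-only sketch of pinning a thread to one CPU. It is not the patch's code: the patch calls pthread_setaffinity_np on the native handle of a std::thread, whereas this sketch uses sched_setaffinity on the calling thread, and the helper names PinCurrentThreadToCpu, FirstAllowedCpu and AllowedCpuCount are hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros with glibc
#endif
#include <sched.h>

#include <cassert>

// Hypothetical helper (not from the patch): restrict the calling thread to a
// single CPU. Returns true on success; does nothing outside Linux.
bool PinCurrentThreadToCpu(int Cpu) {
#if defined(__linux__)
  assert(Cpu >= 0 && Cpu < CPU_SETSIZE);
  cpu_set_t Set;
  CPU_ZERO(&Set);
  CPU_SET(Cpu, &Set);
  // A pid of 0 means "the calling thread".
  return sched_setaffinity(0, sizeof(Set), &Set) == 0;
#else
  (void)Cpu;
  return false;
#endif
}

// Hypothetical helper: lowest-numbered CPU the calling thread may run on,
// or -1 if the mask cannot be read.
int FirstAllowedCpu() {
#if defined(__linux__)
  cpu_set_t Set;
  CPU_ZERO(&Set);
  if (sched_getaffinity(0, sizeof(Set), &Set) != 0)
    return -1;
  for (int I = 0; I < CPU_SETSIZE; ++I)
    if (CPU_ISSET(I, &Set))
      return I;
#endif
  return -1;
}

// Hypothetical helper: number of CPUs in the calling thread's affinity mask,
// or -1 if the mask cannot be read.
int AllowedCpuCount() {
#if defined(__linux__)
  cpu_set_t Set;
  CPU_ZERO(&Set);
  if (sched_getaffinity(0, sizeof(Set), &Set) == 0)
    return CPU_COUNT(&Set);
#endif
  return -1;
}
```

Calling PinCurrentThreadToCpu(FirstAllowedCpu()) narrows the caller to a CPU it was already permitted to use, after which AllowedCpuCount() should report 1. An unbound thread typically reports every online CPU in its mask, so "first set bit" does not by itself mean the thread is bound.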

Diff Detail

Repository: rCRT Compiler Runtime

Event Timeline

devnexen created this revision. Jun 9 2018, 2:28 AM

Herald added subscribers: Restricted Project, llvm-commits, krytarowski, emaste. Jun 9 2018, 2:28 AM

Intentionally implemented only on the platforms I could test, even though I know that NetBSD, for example, has a very similar API. Windows is doable as well, since its API is pretty straightforward; somebody else could implement it, or later on I can grab a Windows VM.

devnexen edited the summary of this revision. Jun 9 2018, 2:31 AM

What's the gain? Performance improvement? What are the numbers for benchmarking?

I wonder too.
This looks like something to fix in the kernel. Programs should not each have to do this to get good performance. That's the OS's job. The OS knows the current load, what other apps are running, and what their affinity is. LibFuzzer doesn't.

I understand your points, but I usually see that the jobs are often bound to cpu0, while this CPU can potentially be pretty busy.
But I made it an option; it's not something that is forced.

In general, we are adding CPU/OS-specific code, and I'm not sure whether it's worth the maintenance cost.

NetBSD uses a similar but different API. If it is really needed, I would rather opt for a higher-level third-party library, but that might be overkill. Reimplementing it by open-coding might be overkill too.

If you want to bind fuzzing to cpu1, you can use userland cpu affinity tools to do it without changing the code.

I understand your points, but I usually see that the jobs are often bound to cpu0, while this CPU can potentially be pretty busy.

Whoever is doing this needs to be fixed.
It's exactly this kind of code that introduces such problems. It tries to be "smart" but does not have the full picture. And we are adding more such code, so we will have more such problems. If one uses rand mode and 2 threads happen to be bound to the same core, we lose half of the performance. The sequential mode occupies cores sequentially, so two independent processes will oversubscribe the first half of the cores and leave the remaining cores idle.
If you are adding these flags to some scripts, you can just use taskset there.
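
The two failure modes described above can be reproduced with a small model that involves no real threads. This is only a sketch: the function names are hypothetical, SequentialCores mirrors the round-robin counter used when sched_workers is 1 in the diff, and the random mode is assumed to pick cores uniformly and independently.

```cpp
#include <cassert>
#include <vector>

// Core chosen for each worker by a round-robin policy that restarts at core 0
// in every process (modelled on the sched_workers == 1 branch of the patch).
std::vector<unsigned> SequentialCores(unsigned NumWorkers, unsigned NumCores) {
  assert(NumCores > 0);
  std::vector<unsigned> Cores;
  for (unsigned I = 0; I < NumWorkers; ++I)
    Cores.push_back(I % NumCores);
  return Cores;
}

// Number of pinned workers per core when NumProcesses independent processes
// each pin NumWorkers workers with SequentialCores.
std::vector<unsigned> CoreLoad(unsigned NumProcesses, unsigned NumWorkers,
                               unsigned NumCores) {
  std::vector<unsigned> Load(NumCores, 0);
  for (unsigned P = 0; P < NumProcesses; ++P)
    for (unsigned Core : SequentialCores(NumWorkers, NumCores))
      ++Load[Core];
  return Load;
}

// Probability that at least two of NumWorkers workers share a core when each
// core is picked uniformly at random (assumed model of sched_workers == 2).
double RandomCollisionProbability(unsigned NumWorkers, unsigned NumCores) {
  assert(NumCores > 0);
  if (NumWorkers > NumCores)
    return 1.0;  // pigeonhole: some core must be shared
  double AllDistinct = 1.0;
  for (unsigned I = 0; I < NumWorkers; ++I)
    AllDistinct *= static_cast<double>(NumCores - I) / NumCores;
  return 1.0 - AllDistinct;
}
```

Under this model, CoreLoad(2, 4, 8) puts two workers on each of cores 0 to 3 and none on cores 4 to 7, and RandomCollisionProbability(2, 8) is 1/8: even two workers on an eight-core machine have a one-in-eight chance of sharing a core.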

True. What I tried to say in the description is that, ideally, the CPU load ought to be checked first, but that's a lot of code :-)

No, you can't solve this by checking CPU load. The load, the affinity, and the set of other processes and their affinities all change dynamically. And this can't be solved by periodic re-tuning either, because the re-tuning logic of other processes is unknown. Only the OS has the information and means to do this properly and efficiently.

Well, on Linux you can check process info in /proc/<proc>/status, I believe, but that might be overkill.

Besides being overkill, it also does not provide enough info. And you simply may not have permission to read it. And you may be running inside a pid namespace. And two processes can do a lock-step dance: both overload cpu 0, both decide to jump to cpu 1, both overload cpu 1, both decide to jump to cpu 0, etc.
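
The lock-step behaviour can be illustrated with a toy model. This is a sketch under strong assumptions that are not in the review: exactly two processes, exactly two CPUs, both processes sample the load at the same instant, and each moves whenever the other CPU looks less loaded. The function name SimulateLockStep is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Toy model of two self-tuning processes on a two-CPU machine. Returns the
// CPU of each process after every round.
std::vector<std::array<int, 2>> SimulateLockStep(int Rounds) {
  assert(Rounds >= 0);
  std::array<int, 2> Cpu = {0, 0};  // both processes start on CPU 0
  std::vector<std::array<int, 2>> History;
  for (int R = 0; R < Rounds; ++R) {
    // Snapshot of the per-CPU load that both processes observe.
    std::array<int, 2> Load = {0, 0};
    for (int C : Cpu)
      ++Load[C];
    // Each process decides from the same stale snapshot, unaware that the
    // other process is making the same move.
    std::array<int, 2> Next = Cpu;
    for (int P = 0; P < 2; ++P) {
      int Other = 1 - Cpu[P];
      if (Load[Other] < Load[Cpu[P]])
        Next[P] = Other;
    }
    Cpu = Next;
    History.push_back(Cpu);
  }
  return History;
}
```

In this model the two processes never separate: they land together on CPU 1, then together on CPU 0, and so on, while one CPU stays idle in every round.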

Only the OS has the information and means to do this properly and efficiently.

And we did not even get to Windows, Mac, Fuchsia and Akaros.

This is absolutely not the business we want to get into with libFuzzer. Moreover, this is absolutely not libFuzzer-specific. Reimplementing the OS within each and every program out there is wrong.

Ok fair points :-)

Revision Contents

Path                                Size
lib/fuzzer/FuzzerDriver.cpp         40 lines
lib/fuzzer/FuzzerFlags.def          1 line
lib/fuzzer/FuzzerUtil.h             4 lines
lib/fuzzer/FuzzerUtilFuchsia.cpp    3 lines
lib/fuzzer/FuzzerUtilPosix.cpp      34 lines
lib/fuzzer/FuzzerUtilWindows.cpp    3 lines

Diff 150614

lib/fuzzer/FuzzerDriver.cpp

Context not available.
 }

 static int RunInMultipleProcesses(const Vector<std::string> &Args,
-                                  unsigned NumWorkers, unsigned NumJobs) {
+                                  Random &Rand) {
   std::atomic<unsigned> Counter(0);
   std::atomic<bool> HasErrors(false);
+  unsigned NumCores = NumberOfCpuCores();
+  unsigned w = 0;
   Command Cmd(Args);
   Cmd.removeFlag("jobs");
   Cmd.removeFlag("workers");
   Vector<std::thread> V;
   std::thread Pulse(PulseThread);
   Pulse.detach();
-  for (unsigned i = 0; i < NumWorkers; i++)
-    V.push_back(std::thread(WorkerThread, std::ref(Cmd), &Counter, NumJobs, &HasErrors));
+  for (unsigned i = 0; i < Flags.workers; i++) {
+    std::thread Wthread(WorkerThread, std::ref(Cmd), &Counter, Flags.jobs, &HasErrors);
+    if (Flags.sched_workers == 1) {
+      if (w == NumCores)
+        w = 0;
+      SchedWorkerThread(Wthread, w++);
+    } else if (Flags.sched_workers == 2) {
+      w = Rand(NumCores);
+      SchedWorkerThread(Wthread, w);
+    }
+    V.push_back(std::move(Wthread));
+  }
   for (auto &T : V)
     T.join();
   return HasErrors ? 1 : 0;
Context not available.
     Printf("Running %u workers\n", Flags.workers);
   }

+  unsigned Seed = Flags.seed;
+  // Initialize Seed.
+  if (Seed == 0)
+    Seed =
+        std::chrono::system_clock::now().time_since_epoch().count() + GetPid();
+
+  Random Rand(Seed);
+
   if (Flags.workers > 0 && Flags.jobs > 0)
-    return RunInMultipleProcesses(Args, Flags.workers, Flags.jobs);
+    return RunInMultipleProcesses(Args, Rand);

   FuzzingOptions Options;
   Options.Verbosity = Flags.verbosity;
Context not available.
   if (Flags.data_flow_trace)
     Options.DataFlowTrace = Flags.data_flow_trace;

-  unsigned Seed = Flags.seed;
-  // Initialize Seed.
-  if (Seed == 0)
-    Seed =
-        std::chrono::system_clock::now().time_since_epoch().count() + GetPid();
   if (Flags.verbosity)
     Printf("INFO: Seed: %u\n", Seed);

-  Random Rand(Seed);
+  int CpuBind;
+  GetSchedWorker(&CpuBind);
+
+  if (CpuBind >= 0)
+    Printf("INFO: CPU id: %d\n", CpuBind);

   auto *MD = new MutationDispatcher(Rand, Options);
   auto *Corpus = new InputCorpus(Options.OutputCorpus);
   auto F = new Fuzzer(Callback, Corpus, *MD, Options);
Context not available.

lib/fuzzer/FuzzerFlags.def

Context not available.
 FUZZER_FLAG_INT(analyze_dict, 0, "Experimental")
 FUZZER_DEPRECATED_FLAG(use_clang_coverage)
 FUZZER_FLAG_STRING(data_flow_trace, "Experimental: use the data flow trace")
+FUZZER_FLAG_INT(sched_workers, 0, "Set workers per cpu")
Context not available.

lib/fuzzer/FuzzerUtil.h

Context not available.

 #include "FuzzerDefs.h"
 #include "FuzzerCommand.h"
+#include <thread>

 namespace fuzzer {

Context not available.

 void SleepSeconds(int Seconds);

+void SchedWorkerThread(std::thread &Wthread, int C);
+void GetSchedWorker(int *C);
+
 unsigned long GetPid();

 size_t GetPeakRSSMb();
Context not available.

lib/fuzzer/FuzzerUtilFuchsia.cpp

Context not available.
   return Info.koid;
 }

+void SchedWorkerThread(std::thread &WorkerThread, int C) {}
+void GetSchedWorker(int *C) { *C = -1; }
+
 size_t GetPeakRSSMb() {
   zx_status_t rc;
   zx_info_task_stats_t Info;
Context not available.

lib/fuzzer/FuzzerUtilPosix.cpp

Context not available.
 #include <sys/types.h>
 #include <thread>
 #include <unistd.h>
+#ifdef __FreeBSD__
+#include <pthread_np.h>
+typedef cpuset_t cpu_set_t;
+#endif

 namespace fuzzer {

Context not available.

 unsigned long GetPid() { return (unsigned long)getpid(); }

+void SchedWorkerThread(std::thread &WorkerThread, int C) {
+  pthread_t NativeThread = WorkerThread.native_handle();
+#if defined(__linux__) || defined(__FreeBSD__)
+  cpu_set_t Cset;
+  CPU_ZERO(&Cset);
+  CPU_SET(C, &Cset);
+  if (pthread_setaffinity_np(NativeThread, sizeof(Cset), &Cset) != 0)
+    fprintf(stderr, "Error thread affinity setting for cpu %d\n", C);
+#endif
+}
+
+void GetSchedWorker(int *C) {
+  unsigned long NumCores = std::thread::hardware_concurrency();
+  *C = -1;
+#if defined(__linux__) || defined(__FreeBSD__)
+  cpu_set_t Cset;
+  CPU_ZERO(&Cset);
+  if (pthread_getaffinity_np(pthread_self(), sizeof(Cset), &Cset) != 0) {
+    fprintf(stderr, "Error thread affinity capture\n");
+  } else {
+    for (size_t i = 0; i < NumCores; i++) {
+      if (CPU_ISSET(i, &Cset)) {
+        *C = i;
+        break;
+      }
+    }
+  }
+#endif
+}
+
 size_t GetPeakRSSMb() {
   struct rusage usage;
   if (getrusage(RUSAGE_SELF, &usage))
Context not available.

lib/fuzzer/FuzzerUtilWindows.cpp

Context not available.

 unsigned long GetPid() { return GetCurrentProcessId(); }

+void SchedWorkerThread(std::thread &WorkerThread, int C) {}
+void GetSchedWorker(int *C) { *C = -1; }
+
 size_t GetPeakRSSMb() {
   PROCESS_MEMORY_COUNTERS info;
   if (!GetProcessMemoryInfo(GetCurrentProcess(), &info, sizeof(info)))
Context not available.