This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profdata] Avoid keeping reference to every files
Needs Review · Public

Authored by paulsemel on Jun 21 2019, 2:05 PM.

Details

Reviewers

bogner
danielcdh
xur
dnovillo
davidxl
wmi

Summary

This is definitely not the ideal solution to solve this problem.
But I think this is better than destroying my RAM.

Diff Detail

Repository: rL LLVM

Event Timeline

paulsemel created this revision. Jun 21 2019, 2:05 PM

Herald added a project: Restricted Project. Jun 21 2019, 2:05 PM

Herald added a subscriber: llvm-commits.

Adding reviewers who are familiar with SamplePGO.

davidxl added a reviewer: wmi. Jun 21 2019, 2:14 PM

We had better have a test for it. It is not a strict NFC, and we should verify that it works as expected: if we miss any place that still uses a StringRef referring to a string in the file data buffer, we may have a dangling pointer after we free the buffer.

llvm/tools/llvm-profdata/llvm-profdata.cpp
Line 444: Please add a comment explaining why we need FunctionNames. I think it is because FunctionSamples and SampleRecord both use StringRef, and the original strings they refer to live in the data buffer. To remove the dependence on the data buffer, we use a name table to save all the name strings. The usage is not very straightforward.
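
The concern in this inline comment can be illustrated with a minimal standalone sketch. It does not use LLVM's StringSet or StringRef: NameTable and intern are hypothetical names, and std::unordered_set<std::string> with std::string_view stands in for the name table, assuming C++17.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical stand-in for the patch's `StringSet<> FunctionNames`: it owns
// a copy of every name, so views handed out stay valid after the input
// buffer that the name was read from has been released.
class NameTable {
public:
  // Returns a view of the table-owned copy of Name, inserting it if needed.
  std::string_view intern(std::string_view Name) {
    // std::unordered_set is node-based, so references to stored elements
    // remain valid when the table rehashes.
    return *Names.emplace(Name).first;
  }

  std::size_t size() const { return Names.size(); }

private:
  std::unordered_set<std::string> Names;
};
```

A view returned by intern points at storage owned by the table rather than at the file data, which is the property the merged profile map needs once the reader is destroyed.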

In D63671#1554457, @wmi wrote:

We had better have a test for it. It is not a strict NFC, and we should verify that it works as expected: if we miss any place that still uses a StringRef referring to a string in the file data buffer, we may have a dangling pointer after we free the buffer.

I agree with you that this is not completely NFC. However, I couldn't see a good way to test this functionality (in any case, accessing a dangling pointer here would lead to a SIGSEGV, since we unmap the files).

Add comment to describe why we need this string table
Add one test case.

paulsemel marked an inline comment as done. Jun 24 2019, 4:06 PM

paulsemel retitled this revision from "[llvm-profdata] [NFC] Avoir keeping reference to every files" to "[llvm-profdata] Avoid keeping reference to every files". Jun 24 2019, 4:32 PM

ping @wmi

In D63671#1556584, @paulsemel wrote:

Add one test case.

Thanks. If we remove the code that repoints the call-target string references from the file data buffer to the FunctionNames set, will the test fail?

In D63671#1563092, @wmi wrote:

Thanks. If we remove the code that repoints the call-target string references from the file data buffer to the FunctionNames set, will the test fail?

No, because this change is more or less NFC, meaning that I want to keep the same behavior as before.
To have a test that actually fails if we remove this code, we would need a very heavy and time-consuming test, which is, imo, not what we want for a test.

Do we really need a test that fails if we remove this code?

In D63671#1564933, @paulsemel wrote:

No, because this change is more or less NFC, meaning that I want to keep the same behavior as before.
To have a test that actually fails if we remove this code, we would need a very heavy and time-consuming test, which is, imo, not what we want for a test.

Do we really need a test that fails if we remove this code?

Is it possible to add an option to fill the file data buffer with zeros after the file is released? The option would only be enabled for testing. With the option on, even a small test could catch a difference in the merge result if the patch is incorrect.

In D63671#1565126, @wmi wrote:

Is it possible to add an option to fill the file data buffer with zeros after the file is released? The option would only be enabled for testing. With the option on, even a small test could catch a difference in the merge result if the patch is incorrect.

I realize I wasn't clear. I mean fill the data buffer with zeros after reading and merging of the file are done, just before the file is released.
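
A rough standalone model of this testing hook, under stated assumptions: the file buffer is a writable heap allocation (a std::vector<char> here), and MergedEntry, mergeOne and the Names set are hypothetical names, not llvm-profdata code. The buffer is zero-filled while it is still alive, so a name that was not copied into the name table reads back as zeros instead of triggering undefined behavior.

```cpp
#include <algorithm>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical model of what the merged profile keeps for one function:
// only a view of the name, not a copy.
struct MergedEntry {
  std::string_view Name;
};

// Reads the name out of Buffer, optionally repoints it at storage owned by
// Names, then zero-fills Buffer just before it would be released.
inline MergedEntry mergeOne(std::vector<char> &Buffer,
                            std::unordered_set<std::string> *Names) {
  std::string_view Name(Buffer.data(), Buffer.size());
  if (Names)
    Name = *Names->emplace(Name).first; // now refers to table-owned storage
  MergedEntry Entry{Name};
  // Testing hook: poison the buffer while it is still valid memory.
  std::fill(Buffer.begin(), Buffer.end(), '\0');
  return Entry;
}
```

With Names supplied, the returned name is unaffected by the zero fill. With nullptr, the returned view still points into the zeroed buffer, so a small test comparing merge output would see the difference.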

In D63671#1568997, @wmi wrote:

I realize I wasn't clear. I mean fill the data buffer with zeros after reading and merging of the file are done, just before the file is released.

I am very sorry, but I don't really understand what you want... The files are mmap'd with read-only protection, so zeroing the data isn't really an option!
Plus, we munmap the file when we destroy the object at the end of the loop, so accessing the data again would already result in a segfault... I don't think we can do much more here, but maybe I am missing something?

Sorry, I didn't have time to work on this over the past 10 days.

ping @wmi

From my last reply: "I assumed the input files are small since we want to have a small testcase. If the input files are small, the buffer won't be allocated through mmap but through new (see getOpenFileImpl in lib/Support/MemoryBuffer.cpp)".

Do you think we can construct a testcase with small inputs so the buffer could be writable?

Plus, we munmap the file when we destroy the object at the end of the loop, so accessing the data again would already result in a segfault...

If that is the case, why didn't you see a SEGV when you removed the code that repoints the call-target string references from the file data buffer to the FunctionNames set? I think the munmapped area could possibly be remapped when reading in the next file?

Revision Contents

Path	Size
llvm/include/llvm/ProfileData/SampleProf.h	28 lines
llvm/tools/llvm-profdata/llvm-profdata.cpp	15 lines

Diff 206071

llvm/include/llvm/ProfileData/SampleProf.h

[12 unchanged lines not shown]

 #ifndef LLVM_PROFILEDATA_SAMPLEPROF_H
 #define LLVM_PROFILEDATA_SAMPLEPROF_H

 #include "llvm/ADT/DenseSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSet.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/Module.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorOr.h"
 #include "llvm/Support/MathExtras.h"
 #include <algorithm>
 #include <cstdint>

[148 unchanged lines not shown; context: public:]

   /// Return true if this sample record contains function calls.
   bool hasCalls() const { return !CallTargets.empty(); }

   uint64_t getSamples() const { return NumSamples; }
   const CallTargetMap &getCallTargets() const { return CallTargets; }

   /// Merge the samples in \p Other into this record.
   /// Optionally scale sample counts by \p Weight.
-  sampleprof_error merge(const SampleRecord &Other, uint64_t Weight = 1) {
+  sampleprof_error merge(const SampleRecord &Other,
+                         StringSet<> *FunctionNames = nullptr,
+                         uint64_t Weight = 1) {
     sampleprof_error Result = addSamples(Other.getSamples(), Weight);
     for (const auto &I : Other.getCallTargets()) {
-      MergeResult(Result, addCalledTarget(I.first(), I.second, Weight));
+      StringRef FName = I.first();
+      if (FunctionNames) {
+        FName = FunctionNames->insert(FName).first->first();
+      }
+      MergeResult(Result, addCalledTarget(FName, I.second, Weight));
     }
     return Result;
   }

   void print(raw_ostream &OS, unsigned Indent) const;
   void dump() const;

 private:

[154 unchanged lines not shown; context: public:]

   /// Return all the callsite samples collected in the body of the function.
   const CallsiteSampleMap &getCallsiteSamples() const {
     return CallsiteSamples;
   }

   /// Merge the samples in \p Other into this one.
   /// Optionally scale samples by \p Weight.
-  sampleprof_error merge(const FunctionSamples &Other, uint64_t Weight = 1) {
+  sampleprof_error merge(const FunctionSamples &Other,
+                         StringSet<> *FunctionNames = nullptr,
+                         uint64_t Weight = 1) {
     sampleprof_error Result = sampleprof_error::success;
     Name = Other.getName();
     MergeResult(Result, addTotalSamples(Other.getTotalSamples(), Weight));
     MergeResult(Result, addHeadSamples(Other.getHeadSamples(), Weight));
     for (const auto &I : Other.getBodySamples()) {
       const LineLocation &Loc = I.first;
       const SampleRecord &Rec = I.second;
-      MergeResult(Result, BodySamples[Loc].merge(Rec, Weight));
+      MergeResult(Result, BodySamples[Loc].merge(Rec, FunctionNames, Weight));
     }
     for (const auto &I : Other.getCallsiteSamples()) {
       const LineLocation &Loc = I.first;
       FunctionSamplesMap &FSMap = functionSamplesAt(Loc);
-      for (const auto &Rec : I.second)
-        MergeResult(Result, FSMap[Rec.first].merge(Rec.second, Weight));
+      for (const auto &Rec : I.second) {
+        if (FunctionNames) {
+          StringRef FName = Rec.second.getName();
+          FName = FunctionNames->insert(FName).first->first();
+          const_cast<FunctionSamples &>(Rec.second).setName(FName);
+        }
+        MergeResult(Result,
+                    FSMap[Rec.first].merge(Rec.second, FunctionNames, Weight));
+      }
     }
     return Result;
   }

   /// Recursively traverses all children, if the total sample count of the
   /// corresponding function is no less than \p Threshold, add its corresponding
   /// GUID to \p S. Also traverse the BodySamples to add hot CallTarget's GUID
   /// to \p S.

[208 unchanged lines not shown]

llvm/tools/llvm-profdata/llvm-profdata.cpp

 //===- llvm-profdata.cpp - LLVM profile data tool -------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // llvm-profdata merges .profdata files.
 //
 //===----------------------------------------------------------------------===//

 #include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSet.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/ProfileData/InstrProfReader.h"
 #include "llvm/ProfileData/InstrProfWriter.h"
 #include "llvm/ProfileData/ProfileCommon.h"
 #include "llvm/ProfileData/SampleProfReader.h"
 #include "llvm/ProfileData/SampleProfWriter.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Errc.h"

[411 unchanged lines not shown; context: static void mergeSampleProfile(const WeightedFileVector &Inputs,]

   using namespace sampleprof;
   auto WriterOrErr =
       SampleProfileWriter::create(OutputFilename, FormatMap[OutputFormat]);
   if (std::error_code EC = WriterOrErr.getError())
     exitWithErrorCode(EC, OutputFilename);

   auto Writer = std::move(WriterOrErr.get());
   StringMap<FunctionSamples> ProfileMap;
-  SmallVector<std::unique_ptr<sampleprof::SampleProfileReader>, 5> Readers;
+  StringSet<> FunctionNames;
   LLVMContext Context;
   for (const auto &Input : Inputs) {
     auto ReaderOrErr = SampleProfileReader::create(Input.Filename, Context);
     if (std::error_code EC = ReaderOrErr.getError())
       exitWithErrorCode(EC, Input.Filename);

-    // We need to keep the readers around until after all the files are
-    // read so that we do not lose the function names stored in each
-    // reader's memory. The function names are needed to write out the
-    // merged profile map.
-    Readers.push_back(std::move(ReaderOrErr.get()));
-    const auto Reader = Readers.back().get();
+    const auto &Reader = ReaderOrErr.get();
     if (std::error_code EC = Reader->read())
       exitWithErrorCode(EC, Input.Filename);

     StringMap<FunctionSamples> &Profiles = Reader->getProfiles();
     for (StringMap<FunctionSamples>::iterator I = Profiles.begin(),
                                               E = Profiles.end();
          I != E; ++I) {
       sampleprof_error Result = sampleprof_error::success;
       FunctionSamples Remapped =
           Remapper ? remapSamples(I->second, *Remapper, Result)
                    : FunctionSamples();
       FunctionSamples &Samples = Remapper ? Remapped : I->second;
       StringRef FName = Samples.getName();
-      MergeResult(Result, ProfileMap[FName].merge(Samples, Input.Weight));
+      FName = FunctionNames.insert(FName).first->first();
+      Samples.setName(FName);
+      MergeResult(Result, ProfileMap[FName].merge(Samples, &FunctionNames,
+                                                  Input.Weight));
       if (Result != sampleprof_error::success) {
         std::error_code EC = make_error_code(Result);
         handleMergeWriterError(errorCodeToError(EC), Input.Filename, FName);
       }
     }
   }
   Writer->write(ProfileMap);
 }

[604 unchanged lines not shown]