This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profdata] Support JSON as an output-only format
ClosedPublic

Authored by kazu on Aug 1 2022, 2:34 PM.

Details

Summary

This patch teaches llvm-profdata to output the sample profile in the
JSON format. The new option is intended to be used for research and
development purposes. For example, one can write a Python script to
take a JSON file and analyze how similar different inline instances of
a given function are to each other.

I've chosen JSON because Python can parse it reasonably fast, and it
just takes a couple of lines to read the whole data:

import json
with open('profile.json') as f:
  profile = json.load(f)
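
As a sketch of the analysis mentioned in the summary, the parsed profile can be walked to gather every inline instance of a given function. The field names used here (`name`, `body`, `callsites`) are hypothetical stand-ins, not the exact keys llvm-profdata emits:

```python
# Hypothetical record shape: each function entry carries a "name",
# flat "body" samples, and nested "callsites" for inlined callees.
def collect_inline_instances(functions, target):
    """Return the body samples of every (inlined) instance of `target`."""
    found = []

    def walk(record):
        if record.get("name") == target:
            found.append(record.get("body", []))
        for callsite in record.get("callsites", []):
            walk(callsite)

    for fn in functions:
        walk(fn)
    return found

# Synthetic stand-in for the json.load(f) result above.
profile = [
    {"name": "main", "body": [], "callsites": [
        {"name": "helper", "body": [{"line": 1, "samples": 10}]},
        {"name": "helper", "body": [{"line": 1, "samples": 7}]},
    ]},
]
print(len(collect_inline_instances(profile, "helper")))  # -> 2
```

From here, comparing the per-line sample lists of the collected instances gives a similarity measure between inline instances.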

Diff Detail

Event Timeline

kazu created this revision. Aug 1 2022, 2:34 PM
Herald added a project: Restricted Project. · View Herald Transcript. Aug 1 2022, 2:34 PM
kazu requested review of this revision. Aug 1 2022, 2:34 PM
hoy added a comment. Aug 2 2022, 10:43 PM

Perhaps make it explicit in the title that JSON is an output-only format so far?

llvm/lib/ProfileData/SampleProfWriter.cpp
539 ↗(On Diff #449128)

Wondering if this is necessary since S.getBodySamples() returns a BodySampleMap object which is already ordered.

Similarly with getCallsiteSamples below.

llvm/test/tools/llvm-profdata/sample-profile-json.test
4

nit: use JSON-next for this line and below

Do you absolutely need this to be a supported format and generated natively in LLVM tools?

For a profile format, we'd expect full reader/writer support in both llvm-profdata and llvm-profgen. Having a partially supported, output-only format seems less than ideal.

For research purposes, would it be possible to generate JSON from a text profile through offline scripts?

Another option is to leverage llvm-profdata show with an extra flag to output in JSON form. The show output doesn't need to be a valid profile format, and it doesn't need to be loadable by the profile reader.

Re-reply in Phab: +1 on the show command direction. If we add a reader in the future, consider promoting this into a supported format.

kazu updated this revision to Diff 451251. Aug 9 2022, 1:23 PM

Updated to move the JSON format support to "llvm-profdata show".

kazu marked 2 inline comments as done. Aug 9 2022, 1:35 PM

Do you absolutely need this to be a supported format and generated natively in LLVM tools?

For a profile format, we'd expect full reader/writer support in both llvm-profdata and llvm-profgen. Having a partially supported, output-only format seems less than ideal.

For research purposes, would it be possible to generate JSON from a text profile through offline scripts?

Another option is to leverage llvm-profdata show with an extra flag to output in JSON form. The show output doesn't need to be a valid profile format, and it doesn't need to be loadable by the profile reader.

The main reason for adding the JSON support in an llvm tool is that I can leverage the parser.

llvm-profdata show makes sense.

I wrote another "dumper" just for the JSON format in SampleProfileReader. It's harder than I thought to wire SampleProfileWriter* to llvm-profdata show. Specifically, I had trouble casting raw_fd_ostream to raw_ostream for json::OStream purposes.

kazu retitled this revision from [llvm-profdata] Add --json to [llvm-profdata] Support JSON as an output-only format. Aug 9 2022, 1:37 PM
kazu edited the summary of this revision. (Show Details)

I've manually updated the patch description as that's not part of the diff in Phabricator.

davidxl added inline comments. Aug 9 2022, 2:06 PM
llvm/lib/ProfileData/SampleProfReader.cpp
87

too many nesting levels. Perhaps split out the lambda?

115

split out the lambda?

kazu updated this revision to Diff 451298. Aug 9 2022, 4:01 PM

Broke up the big lambda into smaller pieces.

kazu marked 2 inline comments as done. Aug 9 2022, 4:04 PM

I've broken up the big lambda into smaller pieces. It should be easier to see that the output has two major parts, namely `BodySamples` and `CallsiteSamples`, with the implementation details factored out elsewhere.

PTAL. Thanks!
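
To illustrate, consuming that two-part shape could look like the following sketch; the keys `body`, `samples`, and `callsites` are illustrative assumptions, not the exact field names emitted by the patch:

```python
# Sum samples across a record's flat body and all nested callsite
# records (the two major parts of the output described above).
def total_samples(record):
    total = sum(entry["samples"] for entry in record.get("body", []))
    for callsite in record.get("callsites", []):
        total += total_samples(callsite)
    return total

record = {
    "body": [{"line": 3, "samples": 5}],
    "callsites": [
        {"body": [{"line": 1, "samples": 2}], "callsites": []},
    ],
}
print(total_samples(record))  # -> 7
```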

davidxl accepted this revision. Aug 9 2022, 4:22 PM

lgtm

This revision is now accepted and ready to land. Aug 9 2022, 4:22 PM
This revision was landed with ongoing or failed builds. Aug 9 2022, 4:25 PM
This revision was automatically updated to reflect the committed changes.