This is an archive of the discontinued LLVM Phabricator instance.


Differential D58584

[XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"
Closed, Public

Authored by lebedev.ri on Feb 23 2019, 2:17 PM.


Details

Reviewers

dberris
kpw
sammccall

Commits

rGee57e9e190b8: Merging r354764: --------------------------------------------------------------…
rL354856: Merging r354764:
rG49b6f81a74ae: [XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"
rL354764: [XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"

Summary

This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert

Abstractions are great.
Readable code is great.
A JSON support library is a *good* idea.

Unfortunately, there is an internal detail of llvm::json::Object that one
needs to be aware of: it is backed by llvm::DenseMap. So every
llvm::json::Object, even one that stores only a single int entry,
pays the full price of an llvm::DenseMap.
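To make "the full price" concrete, here is a toy model (not LLVM code) of how a DenseMap-style table sizes its bucket array. It assumes the growth policy we understand llvm::DenseMap to use: nothing is allocated before the first insertion, the first insertion allocates at least 64 buckets, and the table doubles whenever an insertion would leave it 3/4 full. The bucket and entry sizes are parameters, not measured values. Under these assumptions, each six-key trace event object in the reverted code (name, ph, tid, pid, ts, sf) carries 64 buckets.

```cpp
#include <cassert>
#include <cstddef>

// Toy model, not LLVM code. Assumed policy (as we understand llvm::DenseMap):
// no buckets until the first insert, then at least 64 buckets, doubling
// whenever an insert would make the table 3/4 full.
constexpr std::size_t kMinBuckets = 64;

// Bucket count after inserting NumEntries distinct keys into an empty table.
std::size_t bucketsFor(std::size_t NumEntries) {
  if (NumEntries == 0)
    return 0; // Nothing is allocated before the first insertion.
  std::size_t Buckets = kMinBuckets;
  while (NumEntries * 4 >= Buckets * 3)
    Buckets *= 2;
  return Buckets;
}

// Heap bytes for the bucket array, given an assumed bucket size in bytes.
std::size_t hashObjectBytes(std::size_t NumEntries, std::size_t BucketBytes) {
  return bucketsFor(NumEntries) * BucketBytes;
}

// Heap bytes for an exactly sized flat array of key/value pairs.
std::size_t flatObjectBytes(std::size_t NumEntries, std::size_t EntryBytes) {
  return NumEntries * EntryBytes;
}
```

With an arbitrary 48-byte bucket, hashObjectBytes(6, 48) gives 3072 bytes against 288 for a flat array of the same six entries; the real ratio depends on the actual key and value sizes, but the fixed 64-bucket floor is what makes small objects expensive.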

And for llvm-xray, that price matters.

I was trying to analyse the performance of llvm-exegesis' analysis mode,
and for that I wanted to view the LLVM XRay log visualization in the Chrome
trace viewer. But llvm-xray convert was sluggish, and sometimes it was
even killed by the OOM killer.

xray-log.llvm-exegesis.lwZ0sT was recorded from llvm-exegesis
(compiled with -fxray-instruction-threshold=128) running in
analysis mode over a -benchmarks-file with 10099 points (one full
latency measurement set); that run normally takes 0.387s.

Timings:
Old, with llvm::json (copied from D58580):

$ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT 

 Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):

          21346.24 msec task-clock                #    1.000 CPUs utilized            ( +-  0.28% )
               314      context-switches          #   14.701 M/sec                    ( +- 59.13% )
                 1      cpu-migrations            #    0.037 M/sec                    ( +-100.00% )
           2181354      page-faults               # 102191.251 M/sec                  ( +-  0.02% )
       85477442102      cycles                    # 4004415.019 GHz                   ( +-  0.28% )  (83.33%)
       14526427066      stalled-cycles-frontend   #   16.99% frontend cycles idle     ( +-  0.70% )  (83.33%)
       32371533721      stalled-cycles-backend    #   37.87% backend cycles idle      ( +-  0.27% )  (33.34%)
       67896890228      instructions              #    0.79  insn per cycle         
                                                  #    0.48  stalled cycles per insn  ( +-  0.03% )  (50.00%)
       14592654840      branches                  # 683631198.653 M/sec               ( +-  0.02% )  (66.67%)
         212207534      branch-misses             #    1.45% of all branches          ( +-  0.94% )  (83.34%)

           21.3502 +- 0.0585 seconds time elapsed  ( +-  0.27% )

New, after this revert:

$ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT

 Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs):

           7178.38 msec task-clock                #    1.000 CPUs utilized            ( +-  0.26% )
               182      context-switches          #   25.402 M/sec                    ( +- 28.84% )
                 0      cpu-migrations            #    0.046 M/sec                    ( +- 70.71% )
             33701      page-faults               # 4694.994 M/sec                    ( +-  0.88% )
       28761053971      cycles                    # 4006833.933 GHz                   ( +-  0.26% )  (83.32%)
        2028297997      stalled-cycles-frontend   #    7.05% frontend cycles idle     ( +-  1.61% )  (83.32%)
       10773154901      stalled-cycles-backend    #   37.46% backend cycles idle      ( +-  0.38% )  (33.36%)
       36199132874      instructions              #    1.26  insn per cycle         
                                                  #    0.30  stalled cycles per insn  ( +-  0.03% )  (50.02%)
        6434504227      branches                  # 896420204.421 M/sec               ( +-  0.03% )  (66.68%)
          73355176      branch-misses             #    1.14% of all branches          ( +-  1.46% )  (83.33%)

            7.1807 +- 0.0190 seconds time elapsed  ( +-  0.26% )

So using llvm::json nearly triples the run time on this test case
(21.35s vs. 7.18s: that is ~3x, times, not percent).

Memory:
Old, with llvm::json:

total runtime: 39.88s.
bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s)
calls to allocation functions: 33267816 (834135/s)
temporary memory allocations: 5832298 (146235/s)
peak heap memory consumption: 9.21GB
peak RSS (including heaptrack overhead): 147.98GB
total memory leaked: 1.09MB

New, after this revert:

total runtime: 17.42s.
bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s)
calls to allocation functions: 21382982 (1227284/s)
temporary memory allocations: 232858 (13364/s)
peak heap memory consumption: 350.69MB
peak RSS (including heaptrack overhead): 2.55GB
total memory leaked: 79.95KB

Diff:

total runtime: -22.46s.
bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s)
calls to allocation functions: -11884834 (529155/s)
temporary memory allocations: -5599440 (249307/s)
peak heap memory consumption: -8.86GB
peak RSS (including heaptrack overhead): 0B
total memory leaked: -1.01MB

So on *this* test case, using llvm::json increases *peak* heap memory consumption ~27x,
and the total number of bytes allocated ~15x (the number of allocation calls grows
only ~1.6x). These numbers are times, *not* percent.
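The ratios quoted in this summary can be rechecked from the reported figures. The sketch below just divides each "old" value by the matching "new" value; the only assumption is that heaptrack's "GB" means 1024 "MB" (with 1000, the peak-heap factor is about 26 rather than about 27).

```cpp
#include <cassert>

// "Old" is the llvm::json version, "New" the streaming version restored here.
struct Measurement {
  double Old;
  double New;
  // How many times larger Old is than New ("times", not percent).
  constexpr double factor() const { return Old / New; }
};

// Mean wall-clock seconds from the perf stat runs.
constexpr Measurement kSeconds{21.3502, 7.1807};
// Peak heap in MB; assumes heaptrack's GB is 1024 MB.
constexpr Measurement kPeakHeapMB{9.21 * 1024, 350.69};
// Total bytes allocated, in GB.
constexpr Measurement kAllocatedGB{79.07, 5.12};
// Number of calls to allocation functions.
constexpr Measurement kAllocCalls{33267816, 21382982};
```

kSeconds.factor() comes out at about 2.97 and kAllocatedGB.factor() at about 15.4, while kAllocCalls.factor() is only about 1.56. In other words, the llvm::json version made somewhat more allocation calls, but its allocations were on average roughly ten times larger (about 2.4KB versus about 240 bytes per call).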

Note also that with llvm::json, memory usage is clearly unbounded: it grows
with the length of the log, so peak memory consumption keeps increasing
as logs get longer. This isn't so with the dumb code: each event is written
out as soon as it is produced, so there is no accumulating memory consumption
and peak memory consumption stays fixed. Naturally, that means it can handle
*much* larger logs without OOM'ing.
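The streaming approach restored by this revert can be sketched in standard C++ as follows. This is a simplified illustration, not the llvm-xray code: Event is a hypothetical record type, names are assumed not to need JSON escaping, and std::ostream stands in for raw_ostream. As with the loop_count counter in the diff, one flag decides whether a separator is needed, and each event is written as soon as it is seen, so nothing proportional to the number of events is kept for the output.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified trace event; not the llvm-xray record type.
struct Event {
  std::string Name; // Function name (assumed to need no JSON escaping).
  char Phase;       // 'B' for begin, 'E' for end.
  uint32_t TId;
  double TimestampUs;
};

// Streams a Chrome-trace-style "traceEvents" array: each event is formatted
// and written immediately; the only state kept across events is one flag.
template <typename Range>
void streamTraceEvents(const Range &Events, std::ostream &OS) {
  OS << "{\"traceEvents\": [";
  bool First = true;
  for (const Event &E : Events) {
    OS << (First ? "\n" : ",\n"); // Separator before every event but the first.
    First = false;
    OS << "  {\"name\": \"" << E.Name << "\", \"ph\": \"" << E.Phase
       << "\", \"tid\": \"" << E.TId << "\", \"ts\": \"" << E.TimestampUs
       << "\"}";
  }
  OS << "\n]}\n";
}
```

Because the function only needs to iterate its input once, the events could just as well come from a lazily decoded log instead of a container.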

Readability is good, but the price here is simply unacceptable.
It's too bad none of this analysis was done during the development and review of D50129 itself.

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. Feb 23 2019, 2:17 PM

Herald added subscribers: jdoerfert, courbet. Feb 23 2019, 2:17 PM

lebedev.ri added a subscriber: hans. Feb 23 2019, 11:53 PM

riccibruno added a subscriber: riccibruno. Feb 24 2019, 5:13 AM

LGTM -- thanks!

What I wish existed was a stateful JSON output stream implementation instead of the build-everything-in-memory model. Maybe someday. :)
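A stateful JSON output stream of the kind wished for here could look roughly like the sketch below. JsonStream is a hypothetical class, not an LLVM API: it writes each token immediately and keeps only one "first element?" flag per open object or array, so its memory grows with nesting depth rather than with the size of the document. String escaping is deliberately minimal.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal streaming JSON writer (not an LLVM API).
class JsonStream {
public:
  explicit JsonStream(std::ostream &Out) : OS(Out) {}

  void objectBegin() { separate(); OS << '{'; First.push_back(true); }
  void objectEnd() { First.pop_back(); OS << '}'; }
  void arrayBegin() { separate(); OS << '['; First.push_back(true); }
  void arrayEnd() { First.pop_back(); OS << ']'; }

  // Writes an object key; the next value written belongs to it.
  void key(const std::string &K) {
    separate();
    writeString(K);
    OS << ':';
    AfterKey = true;
  }
  void value(const std::string &V) { separate(); writeString(V); }
  void value(double V) { separate(); OS << V; }

private:
  // Emits a comma before every element except the first in its container;
  // a value that directly follows its key gets no comma.
  void separate() {
    if (AfterKey) {
      AfterKey = false;
      return;
    }
    if (First.empty())
      return; // Top-level value: nothing to separate from.
    if (!First.back())
      OS << ',';
    First.back() = false;
  }
  // Escapes only '"' and '\\'; a real writer must also escape control chars.
  void writeString(const std::string &S) {
    OS << '"';
    for (char C : S) {
      if (C == '"' || C == '\\')
        OS << '\\';
      OS << C;
    }
    OS << '"';
  }

  std::ostream &OS;
  std::vector<bool> First; // One flag per open object/array.
  bool AfterKey = false;
};
```

For example, objectBegin(); key("ph"); value("B"); objectEnd(); writes {"ph":"B"}. The caller is responsible for balanced Begin/End calls; a production writer would check for misuse.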

This revision is now accepted and ready to land. Feb 24 2019, 1:47 PM

In D58584#1408350, @dberris wrote:

LGTM -- thanks!

Thank you. Will land later today.

What I wish existed was a stateful JSON output stream implementation instead of the build-everything-in-memory model. Maybe someday. :)

Yeah. LLVM's YAML library is that way.
I did try to bend it to my will and produce [identical] JSON, but did not really succeed.

Closed by commit rL354764: [XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert" (authored by lebedevri). Feb 24 2019, 11:40 PM

This revision was automatically updated to reflect the committed changes.

Was I added as a reviewer because you want this merged to release_80?

In D58584#1409124, @hans wrote:

Was I added as a reviewer because you want this merged to release_80?

Yep, I did file https://bugs.llvm.org/show_bug.cgi?id=40839

In D58584#1409168, @lebedev.ri wrote:

In D58584#1409124, @hans wrote:

Was I added as a reviewer because you want this merged to release_80?

Yep, I did file https://bugs.llvm.org/show_bug.cgi?id=40839

Thanks! I'm reading my email in the wrong order it seems :-) Merged in r354856.

lebedev.ri mentioned this in D60609: Use native llvm JSON library for time profiler output. Apr 12 2019, 11:33 PM

Revision Contents

Path                                            Size
llvm/trunk/tools/llvm-xray/xray-converter.cpp   109 lines

Diff 188101

llvm/trunk/tools/llvm-xray/xray-converter.cpp

[First 11 lines unchanged; not shown]
 #include "xray-converter.h"
 
 #include "trie-node.h"
 #include "xray-registry.h"
 #include "llvm/DebugInfo/Symbolize/Symbolize.h"
 #include "llvm/Support/EndianStream.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/FormatVariadic.h"
-#include "llvm/Support/JSON.h"
 #include "llvm/Support/ScopedPrinter.h"
 #include "llvm/Support/YAMLTraits.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/XRay/InstrumentationMap.h"
 #include "llvm/XRay/Trace.h"
 #include "llvm/XRay/YAMLXRayRecord.h"
 
 using namespace llvm;
[207 unchanged lines not shown; enclosing function: findOrCreateStackNode]
   NodeStore.push_front({FuncId, Parent, {}, {stack_id, std::move(siblings)}});
   StackTrieNode *CurrentStack = &NodeStore.front();
   for (auto *sibling : CurrentStack->ExtraData.siblings)
     sibling->ExtraData.siblings.push_back(CurrentStack);
   ParentCallees.push_back(CurrentStack);
   return CurrentStack;
 }
 
+void writeTraceViewerRecord(uint16_t Version, raw_ostream &OS, int32_t FuncId,
+                            uint32_t TId, uint32_t PId, bool Symbolize,
+                            const FuncIdConversionHelper &FuncIdHelper,
+                            double EventTimestampUs,
+                            const StackTrieNode &StackCursor,
+                            StringRef FunctionPhenotype) {
+  OS << " ";
+  if (Version >= 3) {
+    OS << llvm::formatv(
+        R"({ "name" : "{0}", "ph" : "{1}", "tid" : "{2}", "pid" : "{3}", )"
+        R"("ts" : "{4:f4}", "sf" : "{5}" })",
+        (Symbolize ? FuncIdHelper.SymbolOrNumber(FuncId)
+                   : llvm::to_string(FuncId)),
+        FunctionPhenotype, TId, PId, EventTimestampUs,
+        StackCursor.ExtraData.id);
+  } else {
+    OS << llvm::formatv(
+        R"({ "name" : "{0}", "ph" : "{1}", "tid" : "{2}", "pid" : "1", )"
+        R"("ts" : "{3:f3}", "sf" : "{4}" })",
+        (Symbolize ? FuncIdHelper.SymbolOrNumber(FuncId)
+                   : llvm::to_string(FuncId)),
+        FunctionPhenotype, TId, EventTimestampUs, StackCursor.ExtraData.id);
+  }
+}
+
 } // namespace
 
 void TraceConverter::exportAsChromeTraceEventFormat(const Trace &Records,
                                                     raw_ostream &OS) {
   const auto &FH = Records.getFileHeader();
   auto Version = FH.Version;
   auto CycleFreq = FH.CycleFrequency;
 
   unsigned id_counter = 0;
 
+  OS << "{\n \"traceEvents\": [";
   DenseMap<uint32_t, StackTrieNode *> StackCursorByThreadId{};
   DenseMap<uint32_t, SmallVector<StackTrieNode *, 4>> StackRootsByThreadId{};
   DenseMap<unsigned, StackTrieNode *> StacksByStackId{};
   std::forward_list<StackTrieNode> NodeStore{};
-  // Create a JSON Array which will hold all trace events.
-  json::Array TraceEvents;
+  int loop_count = 0;
   for (const auto &R : Records) {
+    if (loop_count++ == 0)
+      OS << "\n";
+    else
+      OS << ",\n";
+
     // Chrome trace event format always wants data in micros.
     // CyclesPerMicro = CycleHertz / 10^6
     // TSC / CyclesPerMicro == TSC * 10^6 / CycleHertz == MicroTimestamp
     // Could lose some precision here by converting the TSC to a double to
     // multiply by the period in micros. 52 bit mantissa is a good start though.
     // TODO: Make feature request to Chrome Trace viewer to accept ticks and a
     // frequency or do some more involved calculation to avoid dangers of
     // conversion.
     double EventTimestampUs = double(1000000) / CycleFreq * double(R.TSC);
     StackTrieNode *&StackCursor = StackCursorByThreadId[R.TId];
     switch (R.Type) {
     case RecordTypes::CUSTOM_EVENT:
     case RecordTypes::TYPED_EVENT:
       // TODO: Support typed and custom event rendering on Chrome Trace Viewer.
       break;
     case RecordTypes::ENTER:
     case RecordTypes::ENTER_ARG:
       StackCursor = findOrCreateStackNode(StackCursor, R.FuncId, R.TId,
                                           StackRootsByThreadId, StacksByStackId,
                                           &id_counter, NodeStore);
       // Each record is represented as a json dictionary with function name,
       // type of B for begin or E for end, thread id, process id,
       // timestamp in microseconds, and a stack frame id. The ids are logged
       // in an id dictionary after the events.
-      TraceEvents.push_back(json::Object({
-          {"name", Symbolize ? FuncIdHelper.SymbolOrNumber(R.FuncId)
-                             : llvm::to_string(R.FuncId)},
-          {"ph", "B"},
-          {"tid", llvm::to_string(R.TId)},
-          {"pid", llvm::to_string(Version >= 3 ? R.PId : 1)},
-          {"ts", llvm::formatv("{0:f4}", EventTimestampUs)},
-          {"sf", llvm::to_string(StackCursor->ExtraData.id)},
-      }));
+      writeTraceViewerRecord(Version, OS, R.FuncId, R.TId, R.PId, Symbolize,
+                             FuncIdHelper, EventTimestampUs, *StackCursor, "B");
       break;
     case RecordTypes::EXIT:
     case RecordTypes::TAIL_EXIT:
       // No entries to record end for.
       if (StackCursor == nullptr)
         break;
       // Should we emit an END record anyway or account this condition?
       // (And/Or in loop termination below)
       StackTrieNode *PreviousCursor = nullptr;
       do {
-        TraceEvents.push_back(json::Object({
-            {"name", Symbolize
-                         ? FuncIdHelper.SymbolOrNumber(StackCursor->FuncId)
-                         : llvm::to_string(StackCursor->FuncId)},
-            {"ph", "E"},
-            {"tid", llvm::to_string(R.TId)},
-            {"pid", llvm::to_string(Version >= 3 ? R.PId : 1)},
-            {"ts", llvm::formatv("{0:f4}", EventTimestampUs)},
-            {"sf", llvm::to_string(StackCursor->ExtraData.id)},
-        }));
+        if (PreviousCursor != nullptr) {
+          OS << ",\n";
+        }
+        writeTraceViewerRecord(Version, OS, StackCursor->FuncId, R.TId, R.PId,
+                               Symbolize, FuncIdHelper, EventTimestampUs,
+                               *StackCursor, "E");
         PreviousCursor = StackCursor;
         StackCursor = StackCursor->Parent;
       } while (PreviousCursor->FuncId != R.FuncId && StackCursor != nullptr);
       break;
     }
   }
+  OS << "\n ],\n"; // Close the Trace Events array.
+  OS << " "
+     << "\"displayTimeUnit\": \"ns\",\n";
 
   // The stackFrames dictionary substantially reduces size of the output file by
   // avoiding repeating the entire call stack of function names for each entry.
-  json::Object StackFrames;
-  for (const auto &Stack : StacksByStackId) {
-    const auto &StackId = Stack.first;
-    const auto &StackFunctionNode = Stack.second;
-    json::Object::iterator It;
-    std::tie(It, std::ignore) = StackFrames.insert({
-        llvm::to_string(StackId),
-        json::Object{
-            {"name",
-             Symbolize ? FuncIdHelper.SymbolOrNumber(StackFunctionNode->FuncId)
-                       : llvm::to_string(StackFunctionNode->FuncId)}},
-    });
-    if (StackFunctionNode->Parent != nullptr)
-      It->second.getAsObject()->insert(
-          {"parent", llvm::to_string(StackFunctionNode->Parent->ExtraData.id)});
+  OS << R"( "stackFrames": {)";
+  int stack_frame_count = 0;
+  for (auto map_iter : StacksByStackId) {
+    if (stack_frame_count++ == 0)
+      OS << "\n";
+    else
+      OS << ",\n";
+    OS << " ";
+    OS << llvm::formatv(
+        R"("{0}" : { "name" : "{1}")", map_iter.first,
+        (Symbolize ? FuncIdHelper.SymbolOrNumber(map_iter.second->FuncId)
+                   : llvm::to_string(map_iter.second->FuncId)));
+    if (map_iter.second->Parent != nullptr)
+      OS << llvm::formatv(R"(, "parent": "{0}")",
+                          map_iter.second->Parent->ExtraData.id);
+    OS << " }";
   }
-  json::Object TraceJSON{
-      {"displayTimeUnit", "ns"},
-      {"traceEvents", std::move(TraceEvents)},
-      {"stackFrames", std::move(StackFrames)},
-  };
-
-  // Pretty-print the JSON using two spaces for indentations.
-  OS << formatv("{0:2}", json::Value(std::move(TraceJSON)));
+  OS << "\n }\n"; // Close the stack frames map.
+  OS << "}\n"; // Close the JSON entry.
 }
 
 namespace llvm {
 namespace xray {
 
 static CommandRegistration Unused(&Convert, []() -> Error {
   // FIXME: Support conversion to BINARY when upgrading XRay trace versions.
   InstrumentationMap Map;
[52 unchanged lines not shown]
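The timestamp conversion in exportAsChromeTraceEventFormat, EventTimestampUs = 10^6 / CycleFreq * TSC, and the precision caveat in its comment can be checked with a small standalone sketch. tscToMicros copies the expression from the diff; tscIsExactInDouble is a hypothetical helper showing that a double represents every integer up to 2^53 exactly (52 stored significand bits plus an implicit leading bit), but not every integer above that.

```cpp
#include <cassert>
#include <cstdint>

// Same expression as in exportAsChromeTraceEventFormat: CycleFreq is the TSC
// frequency in Hz, so 10^6 / CycleFreq is the tick period in microseconds.
double tscToMicros(uint64_t TSC, double CycleFreq) {
  return double(1000000) / CycleFreq * double(TSC);
}

// Hypothetical helper: true if converting TSC to double loses nothing.
// Restricted to TSC < 2^63 so that converting the rounded double back to
// uint64_t is always well-defined.
bool tscIsExactInDouble(uint64_t TSC) {
  assert(TSC < (uint64_t(1) << 63));
  double D = static_cast<double>(TSC);
  return static_cast<uint64_t>(D) == TSC;
}
```

For example, 3,000,000,000 ticks at an assumed 2 GHz convert to 1,500,000 microseconds. At an assumed 3 GHz TSC, 2^53 ticks is roughly 35 days, so the exactness limit is only reached after long uptimes (assuming the counter starts near zero at boot); the division by CycleFreq adds its own rounding regardless.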