This is an archive of the discontinued LLVM Phabricator instance.

[XRay] Implement `llvm-xray convert` -- trace file conversion
ClosedPublic

Authored by dberris on Sep 8 2016, 10:59 PM.

Details

Summary

This is the second part of a multi-part change to define additional
subcommands to the llvm-xray tool.

This change defines a conversion subcommand that takes XRay log files and
turns them from one format to another (binary or YAML). This currently
only supports the first version of the log file format, defined in the
compiler-rt runtime.

Depends on D21987.

Diff Detail

Repository
rL LLVM

Event Timeline

dberris updated this revision to Diff 70787.Sep 8 2016, 10:59 PM
dberris retitled this revision from to [XRay] Implement `llvm-xray convert` -- trace file conversion.
dberris updated this object.
dberris added reviewers: dblaikie, echristo.
dberris added a subscriber: llvm-commits.
dberris updated this revision to Diff 73610.Oct 5 2016, 2:36 AM
  • Use the command registry instead of hard-coding into main
dberris updated this revision to Diff 74093.Oct 9 2016, 9:20 PM
  • Use the command registry instead of hard-coding into main
  • Rebase
dberris updated this revision to Diff 74973.Oct 18 2016, 3:45 AM
  • Use llvm::Error
dberris updated this revision to Diff 75118.Oct 19 2016, 1:29 AM
  • Rebase
dberris planned changes to this revision.Oct 26 2016, 3:36 AM

I'll be adding more tests and sample inputs for this tool, and will probably use DataExtractor too for loading the records from the XRay log files. Please hold off on reviewing, @dblaikie.

dberris updated this revision to Diff 75988.Oct 27 2016, 1:01 AM
  • Add test for converting a simple raw log to YAML

PTAL @dblaikie -- do you think using the DataExtractor here would be a good idea?

dblaikie edited edge metadata.Oct 27 2016, 2:29 PM

Before I get into all the details - this seems similar to the extract tool, except it deals with the log rather than the instrumentation map, right?

So pretty much all the same feedback as there (& I don't necessarily remember the original answers). We ended up with a one-way conversion last time (to YAML, but not back again) - should this one be different? If we only convert one way, is 'convert' the right name? Should we be more explicit about the difference between these two things, or just detect based on file magic & use one command name for both kinds of input files? (Check if it's an ELF file, then assume the user is trying to extract an instrumentation map; otherwise assume the file's a log?)

tools/llvm-xray/func-id-helper.cc
40–42 ↗(On Diff #75988)

We don't usually bother putting {} on single-line if statements.

49–53 ↗(On Diff #75988)

Probably drop the {} here too (maybe even use the conditional operator, if you like)

tools/llvm-xray/xray-record.h
33 ↗(On Diff #73610)

Is the alignas attribute supported on all the platforms/compilers we care about?

53 ↗(On Diff #73610)

There's LLVM_PACKED you may need to use to make this portable to MSVC, by the looks of it.

(though I'm still intrinsically suspicious of splatting in/out of memory, and would be more comfortable seeing just 'obvious' code to read bytes and shift them into values, etc - but perhaps I'm in the minority/an outlier here and this sort of code is OK with everyone else)

A few of the inline comments from my last feedback are probably out of date/not relevant - I had them queued up from weeks/months ago.

That said - taking a guess at it: Yes, I'd use DataExtractor here over splatting/memcpying structs around for the same reasons.

dberris planned changes to this revision.Oct 31 2016, 8:58 PM

Before I get into all the details - this seems similar to the extract tool, except it deals with the log rather than the instrumentation map, right?

So pretty much all the same feedback as there (& I don't necessarily remember the original answers). We ended up with a one-way conversion last time (to YAML, but not back again) - should this one be different? If we only convert one way, is 'convert' the right name? Should we be more explicit about the difference between these two things, or just detect based on file magic & use one command name for both kinds of input files? (Check if it's an ELF file, then assume the user is trying to extract an instrumentation map; otherwise assume the file's a log?)

The value of having a convert function is so that we can support other formats specifically for the xray log. Here are some future things we need to support:

  • Convert from an old version of the log into a newer version.
  • Convert from the XRay log format to another format. One of them is the Chrome Trace Viewer format, something that the Google Performance Tools (https://github.com/gperftools/gperftools) can consume, etc.

For now, the most convenient thing we can do is create YAML files. The important part here is the log file loading library, that we use in the accounting implementation (stacked on top of this change).

So the verb really is to "convert", as opposed to just "extract".

Does this make sense?

That said, I'll apply most of the changes in the 'extract' review here.

Cheers

dberris updated this revision to Diff 76528.Nov 1 2016, 1:21 AM
dberris edited edge metadata.
  • Add more tests, fix some bugs
  • Use DataExtractor, refactor a bit to use Error properly, and use a simpler implementation of the log reader.

Ready for a look now @dblaikie, thanks in advance!

PS. We no longer support round-tripping in this implementation.

dblaikie added inline comments.Nov 1 2016, 9:54 AM
tools/llvm-xray/func-id-helper.cc
28–29 ↗(On Diff #76528)

Fold ResOrErr into the if condition to reduce/match its scope to its use?

49–50 ↗(On Diff #76528)

Fold variable into if condition.

(alternatively - and this applies to the other case of the same - consider a short-exit:

if (!ResOrErr) {
  handleAllErrors(...);
  return F.str();
}

I know this means duplicating the "return F.str()" code - but it allows the main non-error code to be unindented, which can make it easier to follow (by keeping the error handling close to the error, too).)

53–57 ↗(On Diff #76528)

Drop {} on single line blocks, probably

tools/llvm-xray/func-id-helper.h
39 ↗(On Diff #76528)

This should be implicit? Or did you want it to disable the other special members? Any particular reason? (might be worth a comment if that's the case)

tools/llvm-xray/xray-converter.cc
121–124 ↗(On Diff #76528)

If we need to support these formats for both input and output - I don't so much mind whether we support the null conversion/roundtrip. The point in the previous review was that we didn't need to handle binary output nor yaml input - but it sounds like you need all the different input and output modes here? (At the very least the native format will be a valid input (for conversion to non-native formats) and output (for upgrading) format, for example.)

So, up to you - if supporting roundtripping is convenient/nice generalization, that's fine

126–129 ↗(On Diff #76528)

Hmm - now I'm a bit confused after my last comment.

If we don't support roundtripping, and we don't support binary output - then the only thing we support is binary input and YAML output, so why the switch a few lines below that chooses the input format between binary and yaml? Isn't the yaml reading case in that switch unreachable, and thus the YAMLLogLoader unused/untested?

Indeed, no test seems to exercise -i/-input-format?

tools/llvm-xray/xray-extract.cc
193–194 ↗(On Diff #76528)

Roll variable into if condition

203–204 ↗(On Diff #76528)

Guessing it's mapped rather than streamed because that's the API for DataExtractor (dealing with a buffer)? Or does it have other needs to be mapped?

(might be nice if it didn't have to be - but I suppose there's no strong reason/benefit to it... )

210 ↗(On Diff #76528)

Might make sense to pass MappedFile.size() rather than fileSize, even though they're the same (& even maybe just having MappedFile have a way to access a StringRef of the range)

tools/llvm-xray/xray-log-reader.cc
87 ↗(On Diff #76528)

I probably wouldn't bother with explicit scope here (I usually only use them if I need the dtor behavior) - but I appreciate the benefit to scoping the variables and don't mind it if you prefer it this way.

112 ↗(On Diff #76528)

unneeded extra scope here?

tools/llvm-xray/xray-record.h
23 ↗(On Diff #76528)

Guessing this doesn't need an alignment attribute anymore?

38 ↗(On Diff #76528)

Is there an enum that should be used for this member?

dberris updated this revision to Diff 76910.Nov 3 2016, 11:06 PM
dberris marked 16 inline comments as done.
  • Address some comments
  • Add FIXME to support binary conversion

Thanks @dblaikie, PTAL.

tools/llvm-xray/func-id-helper.cc
49–50 ↗(On Diff #76528)

I like just limiting the scope of the variable, so I stuck with that. Also, now using sys::path::filename instead of manually futzing with the string.

tools/llvm-xray/xray-converter.cc
126–129 ↗(On Diff #76528)

Right, I think the better message here is "not yet supported". When we actually need the functionality, it seems better to change this then.

tools/llvm-xray/xray-extract.cc
203–204 ↗(On Diff #76528)

Yeah, the API for the DataExtractor deals with a buffer. We could be reading things in chunks at a time explicitly, but that seems unnecessary (unless mmapping is undesirable for other reasons).

tools/llvm-xray/xray-log-reader.cc
87 ↗(On Diff #76528)

No strong preference, this was a remnant of attempting to use the same variable name for the extractor -- but there's really no good reason for the explicit scope. :)

Just in case it wasn't clear, this one's ready for another look @dblaikie.

dblaikie added inline comments.Nov 21 2016, 1:52 PM
tools/llvm-xray/func-id-helper.cc
49–50 ↗(On Diff #76528)

I'll say, on reflection, that I kind of prefer keeping the diagnostic handling closer - rather than "if (success) { next bit of code } else { error handling for the if condition }"

but I'll leave that up to you - can get a feel for it and always change it later if it starts to get unwieldy

tools/llvm-xray/xray-converter.cc
126–129 ↗(On Diff #76528)

Dead/untested code is to be avoided - so if the YAMLLogLoader is unused/untested, it probably shouldn't be committed yet.

It's difficult to keep track of test coverage if code is committed without coverage - then when the code becomes live it's not obvious that it was previously untested, etc.

152 ↗(On Diff #76910)

Spurious semicolon?

tools/llvm-xray/xray-extract.cc
203–204 ↗(On Diff #76528)

Can be nice to be able to read streaming input (general good for uniformity - lots of tools accept '-' as the filename to read from stdin), etc. Not a strong requirement by any means.

tools/llvm-xray/xray-log-reader.cc
113 ↗(On Diff #76910)

Could use 'emplace_back()' if you reckon that's more readable (I don't think it's less readable than {} at least).

tools/llvm-xray/xray-record.h
38 ↗(On Diff #76528)

The comment "Usually either ENTER = 0 or EXIT = 1" is no longer needed, as it's implied by the type. (perhaps could be reworded, etc - "Identifies this as an enter or exit record" or somesuch if that's particularly useful - though the enum is only a few lines away)

dberris updated this revision to Diff 78832.Nov 21 2016, 9:25 PM
dberris marked 4 inline comments as done.
  • Implement conversion to v1 binary format
tools/llvm-xray/xray-converter.cc
126–129 ↗(On Diff #76528)

Good point. I've re-implemented the binary output, and the round-tripping.

tools/llvm-xray/xray-extract.cc
203–204 ↗(On Diff #76528)

Actually, now that I read this part of the code again, it's the YAML Input that requires a full document to be provided in its constructor.

http://llvm.org/docs/YamlIO.html#input

tools/llvm-xray/xray-record.h
38 ↗(On Diff #76528)

Reworded to be short and sweet.

Ready for another look now.

(I was thinking more going the other direction: removing the dead code, rather than adding a use of it)

What's the purpose of the binary/raw output format? (I think it was mentioned that the binary output format would be for testing - so we could write yaml, then generate raw files - then feed those into other tools for testing? But if the yaml input format is supported everywhere, what's the purpose of that? We'll want to have a couple of binary tests, but I imagine they would test checked-in binary -> yaml output. If they test yaml -> binary -> yaml, then I'm not sure they achieve much because it's just roundtripping, so we have as much code under test (binary -> yaml) as we do in the test harness (yaml -> binary), effectively.)

That's my mental model at least.

(this commit seems to add several .bin test files - but I don't see where those are used?)

tools/llvm-xray/xray-converter.cc
175–176 ↗(On Diff #78832)

This seems like an odd construct - should Extractor not error-out in the case where ConvertInstrMap is empty?

dberris updated this revision to Diff 79006.Nov 22 2016, 5:21 PM
dberris marked an inline comment as done.
  • Address review comments

(I was thinking more going the other direction: removing the dead code, rather than adding a use of it)

What's the purpose of the binary/raw output format? (I think it was mentioned that the binary output format would be for testing - so we could write yaml, then generate raw files - then feed those into other tools for testing? But if the yaml input format is supported everywhere, what's the purpose of that? We'll want to have a couple of binary tests, but I imagine they would test checked-in binary -> yaml output. If they test yaml -> binary -> yaml, then I'm not sure they achieve much because it's just roundtripping, so we have as much code under test (binary -> yaml) as we do in the test harness (yaml -> binary), effectively.)

That's my mental model at least.

Other things going on concurrently (work on the FDR mode for the runtime library) are making me need to be able to turn versions of the binary log from one form to another. I've found that working with the binary versions is much more convenient from a tool perspective, and that this current simple format is more amenable to analysis than the condensed format I'm working on.

So effectively, version 2 of the log format is turning out to have both non-fixed-size records and more interleaving happening. This tool, which is able to turn that complex format into something simpler/different, doesn't make sense if it only generates YAML files.

(this commit seems to add several .bin test files - but I don't see where those are used?)

Some of the tests are looking at symbolization of the function IDs, associating functions with debug info, etc., for coverage of the various modes by which the YAML output could be generated.

tools/llvm-xray/xray-converter.cc
175–176 ↗(On Diff #78832)

Good point.

Technically, that is an error in the use of the extractor's constructor, hence it signals an error. Think of it as if the construction of the extractor had thrown an exception -- in this case, the appropriate action is to ignore that exception.

So this is saying: if the filename was empty, the extractor couldn't be initialised, and therefore we ignore the error. If it wasn't empty, we should emit the error, but still not cause the tool to fail.

Changed both here and the extract sub-command where we check the filename explicitly to not be empty.

It seems Phabricator had been swallowing some of my comments (and not sending out email messages).

This is ready for another look @dblaikie.

(I was thinking more going the other direction: removing the dead code, rather than adding a use of it)

What's the purpose of the binary/raw output format? (I think it was mentioned that the binary output format would be for testing - so we could write yaml, then generate raw files - then feed those into other tools for testing? But if the yaml input format is supported everywhere, what's the purpose of that? We'll want to have a couple of binary tests, but I imagine they would test checked-in binary -> yaml output. If they test yaml -> binary -> yaml, then I'm not sure they achieve much because it's just roundtripping, so we have as much code under test (binary -> yaml) as we do in the test harness (yaml -> binary), effectively.)

That's my mental model at least.

Other things going on concurrently (work on the FDR mode for the runtime library) are making me need to be able to turn versions of the binary log from one form to another. I've found that working with the binary versions is much more convenient from a tool perspective, and that this current simple format is more amenable to analysis than the condensed format I'm working on.

I'm not sure I'm following here - could you describe this in more detail?

My understanding was that tools would likely use these LLVM APIs for data handling (or they'd use YAML input) - so there wouldn't be any benefit to any particular format (except YAML) since these LLVM APIs could handle all the formats. This would make the binary format(s) fairly niche - just convenient for runtime emission, but not desirable for any other use.

Where have I gone wrong here?

So effectively, version 2 of the log format is turning out to have both non-fixed-size records and more interleaving happening. This tool, which is able to turn that complex format into something simpler/different, doesn't make sense if it only generates YAML files.

(this commit seems to add several .bin test files - but I don't see where those are used?)

Some of the tests are looking at symbolization of the function id's, and associating functions with debug info, etc. for coverage of the various modes by which the YAML output could be generated.

Not quite following, more practically: which tests are these files used in?

(I was thinking more going the other direction: removing the dead code, rather than adding a use of it)

What's the purpose of the binary/raw output format? (I think it was mentioned that the binary output format would be for testing - so we could write yaml, then generate raw files - then feed those into other tools for testing? But if the yaml input format is supported everywhere, what's the purpose of that? We'll want to have a couple of binary tests, but I imagine they would test checked-in binary -> yaml output. If they test yaml -> binary -> yaml, then I'm not sure they achieve much because it's just roundtripping, so we have as much code under test (binary -> yaml) as we do in the test harness (yaml -> binary), effectively.)

That's my mental model at least.

Other things going on concurrently (work on the FDR mode for the runtime library) are making me need to be able to turn versions of the binary log from one form to another. I've found that working with the binary versions is much more convenient from a tool perspective, and that this current simple format is more amenable to analysis than the condensed format I'm working on.

I'm not sure I'm following here - could you describe this in more detail?

Sure, there's a patch I've uploaded recently that has a lot more about this different format (see D27038).

My understanding was that tools would likely use these LLVM APIs for data handling (or they'd use YAML input) - so there wouldn't be any benefit to any particular format (except YAML) since these LLVM APIs could handle all the formats. This would make the binary format(s) fairly niche - just convenient for runtime emission, but not desirable for any other use.

Where have I gone wrong here?

That's one half of the story.

The other half is that the new log format (still binary) is not conducive to this simple approach of loading up records with full information. The FDR mode log format is substantially different, in that per-thread buffers are chunked up into fixed-size blocks, and each record might be a metadata record (16 bytes) or a function record (8 bytes). That log has all sorts of information consolidated in very different ways (e.g. the thread ID is stored once, as a metadata record at the beginning of a buffer, CPU IDs only show up when the CPU is first encountered, we store TSC deltas instead of full TSCs in function records), and is not conducive to just "load in memory, read records one by one".

So then, the conversion tool ought to be able to convert the FDR mode logs into this simpler log format that we already support in this tool -- then let all the upcoming tools (accounting, timeline visualisation, etc.) just use the simpler log format.

The closest analogy I can make for existing tools that do something similar would be 'dot' and friends.

So effectively, version 2 of the log format is turning out to have both non-fixed-size records and more interleaving happening. This tool, that's able to take that complex format into something simpler/different doesn't make sense for generating just YAML files.

(this commit seems to add several .bin test files - but I don't see where those are used?)

Some of the tests are looking at symbolization of the function id's, and associating functions with debug info, etc. for coverage of the various modes by which the YAML output could be generated.

Not quite following, more practically: which tests are these files used in?

convert-with-debug-syms.txt
convert-with-standalone-instrmap.txt
convert-with-yaml-instrmap.txt

(I was thinking more going the other direction: removing the dead code, rather than adding a use of it)

What's the purpose of the binary/raw output format? (I think it was mentioned that the binary output format would be for testing - so we could write yaml, then generate raw files - then feed those into other tools for testing? But if the yaml input format is supported everywhere, what's the purpose of that? We'll want to have a couple of binary tests, but I imagine they would test checked-in binary -> yaml output. If they test yaml -> binary -> yaml, then I'm not sure they achieve much because it's just roundtripping, so we have as much code under test (binary -> yaml) as we do in the test harness (yaml -> binary), effectively.)

That's my mental model at least.

Other things going on concurrently (work on the FDR mode for the runtime library) are making me need to be able to turn versions of the binary log from one form to another. I've found that working with the binary versions is much more convenient from a tool perspective, and that this current simple format is more amenable to analysis than the condensed format I'm working on.

I'm not sure I'm following here - could you describe this in more detail?

Sure, there's a patch I've uploaded recently that has a lot more about this different format (see D27038).

My understanding was that tools would likely use these LLVM APIs for data handling (or they'd use YAML input) - so there wouldn't be any benefit to any particular format (except YAML) since these LLVM APIs could handle all the formats. This would make the binary format(s) fairly niche - just convenient for runtime emission, but not desirable for any other use.

Where have I gone wrong here?

That's one half of the story.

The other half is the new log format (still binary) is not conducive to this simple approach of loading up records with full information. The FDR mode log format is substantially different, in that per-thread buffers are chunked up into fixed-sized blocks and each record might be a metadata record (16 bytes) or a function record (8 bytes). That log has all sorts of information consolidated in very different ways (e.g. the thread ID is stored once, as a metadata record at the beginning of a buffer, CPU id's only show up when the CPU is first encountered, we store TSC deltas instead of full TSCs in function records), and is not conducive to just "load in memory, read records one by one".

So then, the conversion tool ought to be able to convert the FDR mode logs into this simpler log format that we already support in this tool -- then let all the other tools upcoming (accounting, timeline visualisation, etc.) just use the simpler log format.

I'm still a bit confused though - why not just convert to the yaml format at that point? (& why convert at all - if we're talking about other tools built on top of LLVM's library (tools built outside of that would presumably only rely on the YAML?) would use a common parser/reader API that could handle all formats, right? Rather than having the user run a conversion tool, then the tool they want to use)

The closest analogy I can make for existing tools that do something similar would be 'dot' and friends.

So effectively, version 2 of the log format is turning out to have both non-fixed-size records and more interleaving happening. This tool, which is able to turn that complex format into something simpler/different, doesn't make sense if it only generates YAML files.

(this commit seems to add several .bin test files - but I don't see where those are used?)

Some of the tests are looking at symbolization of the function id's, and associating functions with debug info, etc. for coverage of the various modes by which the YAML output could be generated.

Not quite following, more practically: which tests are these files used in?

convert-with-debug-syms.txt
convert-with-standalone-instrmap.txt
convert-with-yaml-instrmap.txt

ah, hmm - I saw the .bin files (some of them) added in the last update - so I guess maybe they just hadn't been added previously, but the uses were already there. No worries.

The other half is the new log format (still binary) is not conducive to this simple approach of loading up records with full information. The FDR mode log format is substantially different, in that per-thread buffers are chunked up into fixed-sized blocks and each record might be a metadata record (16 bytes) or a function record (8 bytes). That log has all sorts of information consolidated in very different ways (e.g. the thread ID is stored once, as a metadata record at the beginning of a buffer, CPU id's only show up when the CPU is first encountered, we store TSC deltas instead of full TSCs in function records), and is not conducive to just "load in memory, read records one by one".

So then, the conversion tool ought to be able to convert the FDR mode logs into this simpler log format that we already support in this tool -- then let all the other tools upcoming (accounting, timeline visualisation, etc.) just use the simpler log format.

I'm still a bit confused though - why not just convert to the yaml format at that point? (& why convert at all - if we're talking about other tools built on top of LLVM's library (tools built outside of that would presumably only rely on the YAML?) would use a common parser/reader API that could handle all formats, right? Rather than having the user run a conversion tool, then the tool they want to use)

YAML is a little big and verbose -- the blow-up from the binary format to YAML is a factor of at least 10. In practical terms, dealing with the binary format in terms of storage and transport is a good option.

YAML is convenient for human consumption, and if users ever build tools that use scripting languages or other things that already deal with YAML (or JSON, or some other text format), then that's the convenient path to take.

Maybe the solution here is to move the log reading part into something that's part of the LLVM library, but that still has the problem that reconstructing a consistent timeline representing events on multiple threads is much more complex than reading a linear "simple" binary file. The XRay log format that has non-fixed-size records and more complex encoding ought then to be converted to the simplified form.

So the compromise/trade-off being made here is that we localize the knowledge of dealing with the different XRay binary log formats to this tool, which can turn them into "the simplest XRay log format" that can be read by simpler libraries for consumption. Given that, the ability to convert these XRay log files into other non-XRay-specific formats (the Chrome trace viewer format, which is JSON, or the perftools utilities, etc.) is a huge convenience. This convenience is not only for testing purposes but also for building and integrating with other existing tools that deal with all sorts of traces.

Does this make sense?

dberris updated this revision to Diff 80028.Dec 1 2016, 9:57 PM

Rebase again

dberris updated this revision to Diff 80031.Dec 1 2016, 10:30 PM

Squash local commits in the hopes of producing a simpler patch.

echristo edited edge metadata.Jan 3 2017, 5:38 PM

So then, the conversion tool ought to be able to convert the FDR mode logs into this simpler log format that we already support in this tool -- then let all the other tools upcoming (accounting, timeline visualisation, etc.) just use the simpler log format.

I'm still a bit confused though - why not just convert to the yaml format at that point? (& why convert at all - if we're talking about other tools built on top of LLVM's library (tools built outside of that would presumably only rely on the YAML?) would use a common parser/reader API that could handle all formats, right? Rather than having the user run a conversion tool, then the tool they want to use)

YAML is a little big and verbose -- the blow-up from the binary format to YAML is a factor of at least 10. In practical terms, dealing with the binary format in terms of storage and transport is a good option.

I can agree with this.

YAML is convenient for human consumption, and if users ever build tools that use scripting languages or other things that already deal with YAML (or JSON, or some other text format), then that's the convenient path to take.

Well, not entirely convenient. It would be nice to have a "dumping" output that produces a format that's easy for normal humans to consume, rather than the full YAML format used for specification.

Maybe the solution here is to move the log reading part into something that's part of the LLVM library, but that still has the problem that reconstructing a consistent timeline representing events on multiple threads is much more complex than reading a linear "simple" binary file. The XRay log format that has non-fixed-size records and more complex encoding ought then to be converted to the simplified form.

I think that the format reading should definitely be a part of the llvm library and not the tool in particular. The tool should just wrap the library.

So the compromise/trade-off being made here is that we localize the knowledge of dealing with the different XRay binary log formats to this tool, which can turn them into "the simplest XRay log format" that can be read by simpler libraries for consumption. Given that, the ability to convert these XRay log files into other non-XRay-specific formats (the Chrome trace viewer format, which is JSON, or the perftools utilities, etc.) is a huge convenience. This convenience is not only for testing purposes but also for building and integrating with other existing tools that deal with all sorts of traces.

Does this make sense?

Being able to convert seems useful in general. You want to be able to write testcases in something human readable, but you want the default output to be binary for compactness.

-eric

I think that the format reading should definitely be a part of the llvm library and not the tool in particular. The tool should just wrap the library.

Cool, thanks Eric -- D28345 covers just the log reading library, which depends on this change landing.

So the compromise/trade-off being made here is that we localize the knowledge of dealing with the different XRay binary log formats to this tool, that can turn it into "the simplest XRay log format", that can be read by simpler libraries for consumption. Given that, and the plan of being able to convert these XRay log files into other non-XRay specific formats (Chrome trace viewer format which is JSON, or the perftools utilities, etc.) is a huge convenience. This convenience is not only for testing purposes but also for building and integrating with other existing tools that deal with all sorts of traces.

Does this make sense?

Being able to convert seems useful in general. You want to be able to write testcases in something human readable, but you want the default output to be binary for compactness.

Good point. So I think "conversion" is a bit more loaded than just "dump", where "dump" really just makes the representation more human-readable (for debugging, or just turning machine-readable form to human-readable form).

Will renaming this from 'convert' to 'dump' make that better?

-eric

dblaikie accepted this revision.Jan 9 2017, 11:25 AM
dblaikie edited edge metadata.

Good point. So I think "conversion" is a bit more loaded than just "dump", where "dump" really just makes the representation more human-readable (for debugging, or just turning machine-readable form to human-readable form).

Will renaming this from 'convert' to 'dump' make that better?

I don't think that'd be accurate - this is doing format conversions between semantics-preserving representations intended for machine consumption. Our dumping tools are generally intended only for human consumption (or really hacky machine consumption in scripts for quick work sometimes (e.g. some scripts I'm writing using objdump/dwarfdump for doing some object size analysis)).

tools/llvm-xray/xray-extract.cc
220 ↗(On Diff #80031)

push_back? (since you're specifying the type anyway, etc)

tools/llvm-xray/xray-log-reader.cc
165 ↗(On Diff #80031)

I'd skip the 'Tmp' temporary, and put the result directly into 'Records' (because it seems simpler - but arguably could have slightly improved performance if 'Records' already has a large capacity that can be reused)

This revision is now accepted and ready to land.Jan 9 2017, 11:25 AM
dberris updated this revision to Diff 83764.Jan 9 2017, 6:47 PM
dberris marked 2 inline comments as done.
dberris edited edge metadata.
  • Address review comments
This revision was automatically updated to reflect the committed changes.