This is an archive of the discontinued LLVM Phabricator instance.

[XRay] Implement `llvm-xray extract`, start of the llvm-xray tool
ClosedPublic

Authored by dberris on Jul 5 2016, 3:56 AM.

Details

Summary

Usage:

llvm-xray extract <object file> [-o <filename or '-'>]

The tool gets the XRay instrumentation map from an object file and turns
it into YAML. We first support ELF64 sleds on x86_64 binaries, with
provision for supporting other supported platforms and formats later.

This is the first of a many-part change to fully implement the
llvm-xray tool.

We also define a subcommand registration and dispatch mechanism to be
used by other further subcommand implementations for llvm-xray.

Depends on D21982 for the in-memory logging in compiler-rt.
Depends on D21983 for the changes to flags in clang.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Thanks @dblaikie -- I'll make some changes and ping when ready. :)

tools/llvm-xray/xray-extract.cc
52–58

Could we detect which kind of file we're reading with file magic instead of passing an argument to specify the input format?

Not with how we've defined it in compiler-rt, no. :(

Is it useful to specify the input format as YAML if the only task to be performed is to generate YAML? Perhaps the only input format should be ELF/object file.

Good point. For extraction it doesn't make sense. But the class that does extraction can be used (and actually is used in later patches) to load an instrumentation map in YAML. I'll remove the flag and assume that extraction will only ever make sense on ELF/object files.

63–68

LLVM isn't too fussy about using declarations/directives - you're welcome to "using llvm::yaml" if that seems suitable, I think? (maybe the yaml namespace has particularly vague names that are problematic?)

Right -- I was merely following the examples. I'll try with just using namespace llvm::yaml here.

(also we don't tend to bother with the '::' global scope qualifier)

Will fix this -- force of habit. :)

105

If you can get away without needing to be convertible to error_code, that's probably a good thing (it's just there for backwards compatibility in APIs where we haven't migrated the whole stack to llvm::Error yet)

Unfortunately that seems to be a pure virtual function in ErrorInfo<...>. :(

I like StringError though, I'll change this to use StringError instead (less types, better).

280

Yeah, we're constrained by the structure that's in disk. I thought about only ever using the YAML struct for everything, and it boils down to allowing it to be ported/used by other classes/functions that need to load the instrumentation map and don't need any of the YAML functionality.

286–287

This is vestigial -- it used to be that there was a function defined in the namespace, and it's changed to this global registration approach. I'll remove the namespaces. :)

289

Good idea, yeah I'll change this to make the function return an llvm::Error and consolidate the logging and exit in main.

dberris updated this revision to Diff 74477.Oct 13 2016, 12:30 AM
dberris marked 15 inline comments as done.
  • Address some review comments

I haven't added the tests, will do so when the style questions have been resolved.

tools/llvm-xray/xray-extract.cc
131–133

I'll add more test cases in the next revision.

dblaikie added inline comments.Oct 14 2016, 10:26 AM
tools/llvm-xray/xray-extract.cc
238–240

Previous code only printed this "Cannot extract" message, new code will print that as well as whatever text is in Err, right? Is that a desired change?

Is there some nice/easy way to append/prepend the "Cannot extract" text to the exsiting Error to pass up to main to print there? (splitting the diagnostic printing between the two places seems a bit awkward)

246–247

Ideally pass up the string and error code here, rather than printing then passing?

280

Still, think it might be nice to move away from it being splatted back and forth to memory like that - and having the expected C++ types (like enums and bools).

dberris updated this revision to Diff 74815.Oct 16 2016, 11:05 PM
dberris marked 2 inline comments as done.
  • Use errors exclusively

Please have another look @dblaikie -- before I add more tests?

tools/llvm-xray/xray-extract.cc
280

I agree -- on the ELF section (and the stuff embedded in the binaries), I think we're a bit constrained by what we can effectively write on the compiler-side (and read on the tooling side). On the trace files though I'm a bit more open to working with a unified encoding format, that was a bit more clever than it currently is.

dblaikie added inline comments.Oct 17 2016, 10:31 AM
tools/llvm-xray/xray-extract.cc
280

I agree -- on the ELF section (and the stuff embedded in the binaries), I think we're a bit constrained by what we can effectively write on the compiler-side (and read on the tooling side). On the trace files though I'm a bit more open to working with a unified encoding format, that was a bit more clever than it currently is.

Sorry I think I'm still confused/we're talking past each other. All I mean is that, rather than this tool splatting into a carefully laid out struct, it would use something like DataExtractor to extract the carefully laid out bytes into a more generic struct.

The file formats and writer would remain the same - just the reader would be more robust and more a usable type for the rest of the APIs.

(I suppose put another way: Using the YAML struct with its YAML types isn't really ideal for general tool code that's trying to process these records. Equally, using a carefully laid out struct with surprising types (char instead of bool/enum, explicit padding, etc) has the same problem. The main interchange structure should probably be the generic semantic thing, not YAML or binary file related - and once that happens, we could skip the packed struct entirely and just read in the few records with DataExtractor or a similar API)

dberris updated this revision to Diff 74945.Oct 17 2016, 10:42 PM
dberris marked an inline comment as done.
  • Use a common sled representation, hide ELF64-specific implementation detail

Please have another look @dblaikie.

tools/llvm-xray/xray-extract.cc
280

Ah, yes, this makes sense. I've hidden the ELF64-specific sled layout and defined a higher level SledEntry type that could be exposed later.

dblaikie added inline comments.Oct 18 2016, 10:14 AM
tools/llvm-xray/xray-extract.cc
142–145

I'm still a bit concerned about doing this via memory mapping (what if we are on a different platform that happens to add some extra padding between fields, etc?)

So I'd suggest using DataExtractor or similar (then you don't have to worry about alignment either) techniques/devices/tools, probably?

Maybe it just seems like this function is harder to read than it is in the code review, but might consider breaking it up - perhaps taking all the error handling at the beginning and turning it into one utility function, so this function (LoadBinaryInstrELF) can focus on the parsing, etc (or break out this parsing into another function, etc - or even both).

263–267

Would it make sense to sink this code into InstrumentationMapExtractor's ctor? (it has all the information - it knows it's trying to extract an instrumentation map, and the name of the input - so it seems it should be responsible for creating this message, maybe?)

dberris updated this revision to Diff 75109.Oct 18 2016, 11:46 PM
  • Use DataExtractor to load data from the memory mapped object file.
tools/llvm-xray/xray-extract.cc
142–145

Ah, I get it now. I'm using llvm::DataExtractor now.

Refactoring this further seems pre-mature, given that this is already an implementation detail. When supporting other formats, it would be something to think about, I agree.

263–267

Actually, nope -- because users of this class could choose to ignore the errors (i.e. treat it as if there was no available instrumentation map). It's just that for this sub-command, it wouldn't work if extraction actually failed. :)

Generally looking pretty good - a few little nits & more test coverage (error cases, for example) is probably a good idea.

test/tools/llvm-xray/X86/extract-instrmap.ll
6–7

Convenient how?
I suppose what i'm trying to understand is: why /both/? (which is to say, what does each offer that the other doesn't)

13

Is it worth checking the specific addresses? If this test is simple enough to provide fairly stable values here - to make sure the algorithms, etc, are working correctly & not just printing garbage values?

tools/llvm-xray/xray-extract.cc
86

I'd sink this down into the function/near the use (the general principle of keeping variables in the smallest scope needed)

164

Drop the "? false : true" here, and use !=, perhaps:

"Entry.AlwaysInstrument = AlwaysInstrument != 0" (potentially even drop the "!= 0", but I can see how that helps readability.

182

Move assignment rather than swap? (but equally I wouldn't mind the old code that left the OutputSleds in an unspecified state on failure)

234–238

This code (& LoadYAMLInstrFile) is dead/untested - perhaps should be moved into another/separate change.

263–267

I'm not sure I understand this - could you rephrase?

What I mean is: The joining of this "Cannot extract instrumentation map from" seems like it could go inside the InstrumentationMapExtractor ctor (since it has all the context to make that message and the message seems appropriate at that level). Then at this level (in the CommandRegistration) we just propagate the error up ("if (Err) return Err; or whatever)

tools/llvm-xray/xray-extract.h
39

Doesn't look like this needs to be a member? (it's only used during construction, if I'm reading it correctly)

51

Unused/dead code

tools/llvm-xray/xray-registry.h
32–33

Could just make this a struct - since it's only member's public anyway.

37

It doesn't seem like it actually requires that SC is not null (& pedantically the terminology would be "not null" (or "not a null pointer" - 'nullptr' is just some specific null pointer literal)) - but, sure - seems OK to say that's not acceptable even if it'd be fine for the current implementation.

Is it worth just defining this function to only work if the SubCommand is registered? (assert when not found, instead of returning the empty std::function)

dberris updated this revision to Diff 75393.Oct 20 2016, 11:52 PM
dberris marked 12 inline comments as done.
  • Add failure tests
  • Address more review comments
dberris marked an inline comment as done.Oct 20 2016, 11:53 PM
dberris added inline comments.
test/tools/llvm-xray/X86/extract-instrmap.ll
6–7

JSON was convenient for making it easily parse-able and view-able by anything that will show it on a browser. :)

13

Yep, added a few more functions to be a bit more complete.

tools/llvm-xray/xray-extract.cc
263–267

Ah, right -- I've moved the concatenation of the Twine into the error generation parts.

tools/llvm-xray/xray-extract.h
51

Unused/dead code

Yes, but this is actually used in a later (stacked) change...

dberris marked an inline comment as done.Oct 20 2016, 11:58 PM

This is ready for another look @dblaikie -- added some failure tests that were easy to catch (not found, no instrumentation map) and provided some sample binaries to use instead of creating one on the fly (as it seemed harder to generate a final linked executable from a .ll).

If you have ideas about how to make the sequence:

llc ...; <link the .o into a final binary> ;  llvm-xray extract <final binary>

work in a .ll test, I'd be very interested to try.

This is ready for another look @dblaikie -- added some failure tests that were easy to catch (not found, no instrumentation map) and provided some sample binaries to use instead of creating one on the fly (as it seemed harder to generate a final linked executable from a .ll).

If you have ideas about how to make the sequence:

llc ...; <link the .o into a final binary> ;  llvm-xray extract <final binary>

work in a .ll test, I'd be very interested to try.

Could you explain why you need a linked binary rather than just an object file? What changes? Relocations, I guess? All the addresses look like zero?

Yeah, doubt there's much to do about that - no way to produce linked binaries with the LLVM tools available to the test suite, and no point implementing support in the tool to handle unapplied relocations (unlike dwarfdump which does handle them, and there's more reason to run dwarfdump on unlinked objects, so it's worth having that support for more than just testing).

tools/llvm-xray/xray-extract.cc
90–93

Untested (do we need this test? Presumably we just won't be able to open the file)

98–99

I'm guessing this is the path we take for "no such file or directory"? (ie: covered by the test for that)

101–104

Untested

119–122

Untested (though, granted - not sure quite how to test this, but could look further into libObject to see how getContents can fail)

128–132

Untested

207–209

Guess this should just be an assert for now? (represents a programmer error/isn't reachable/testable/etc?)

tools/llvm-xray/xray-registry.cc
35–36

Replace branch-to-unreachable with assert.

dberris marked 4 inline comments as done.Oct 24 2016, 10:17 PM

This is ready for another look @dblaikie -- added some failure tests that were easy to catch (not found, no instrumentation map) and provided some sample binaries to use instead of creating one on the fly (as it seemed harder to generate a final linked executable from a .ll).

If you have ideas about how to make the sequence:

llc ...; <link the .o into a final binary> ;  llvm-xray extract <final binary>

work in a .ll test, I'd be very interested to try.

Could you explain why you need a linked binary rather than just an object file? What changes? Relocations, I guess? All the addresses look like zero?

The addresses are basically zero on a .o :/

Yeah, doubt there's much to do about that - no way to produce linked binaries with the LLVM tools available to the test suite, and no point implementing support in the tool to handle unapplied relocations (unlike dwarfdump which does handle them, and there's more reason to run dwarfdump on unlinked objects, so it's worth having that support for more than just testing).

Yep, I think the binary should be fine here.

However, I'm having some trouble generating malformed binaries without resorting to hex-editing some existing files. Any recommendations for faking "bad" inputs to test the error conditions?

tools/llvm-xray/xray-extract.cc
90–93

Good point, removed.

98–99

Yes. that's correct.

dberris updated this revision to Diff 75670.Oct 24 2016, 10:17 PM
dberris marked 2 inline comments as done.
  • Add test for empty, convert some if-unreachable to assert
dberris updated this revision to Diff 75672.Oct 24 2016, 11:23 PM
dberris marked an inline comment as done.
  • Add elf32 sample and test for unsupportedness
  • Add test for badly sized instrumentation map entries.

Added a couple more tests, ptal -- I haven't quite resorted to hex-editing files, but close enough with deliberately generating a bad XRay instrumentation map with a modified clang+llvm local build.

tools/llvm-xray/xray-extract.cc
119–122

I looked, and it had something to do with the ELF encoding of a file (say, if the supposed size of the section defined in the header is different in reality (through some checks)). Not sure how to properly test this yet.

dblaikie accepted this revision.Oct 25 2016, 8:52 AM
dblaikie edited edge metadata.
dblaikie added inline comments.
tools/llvm-xray/llvm-xray.cc
36

Perhaps it'd be better to catch this case earlier - disallow registering with a null/empty function (just assert) rather than quietly not executing anything?

This revision is now accepted and ready to land.Oct 25 2016, 8:52 AM
dberris updated this revision to Diff 75825.Oct 25 2016, 6:46 PM
dberris marked an inline comment as done.
dberris edited edge metadata.

Squash local commits and rebase before landing.

Thanks for the review @dblaikie!

tools/llvm-xray/llvm-xray.cc
36

Good idea. Added an assert on the registration path to make sure the function is not "empty".

dberris updated this revision to Diff 75826.Oct 25 2016, 6:51 PM
dberris updated this object.

Update description.

dberris reopened this revision.Oct 25 2016, 9:22 PM

Rolled back in rL285155, broke the tests.

This revision is now accepted and ready to land.Oct 25 2016, 9:22 PM