This is an archive of the discontinued LLVM Phabricator instance.

Add parser for the stackmap section format
AbandonedPublic

Authored by reames on Jun 10 2015, 5:58 PM.

Details

Summary

Add an externally facing parser for the stackmap section generated for stackmaps, patchpoint, and statepoint. The intended users of the parser are consumers of LLVM - mainly users of MCJIT.

Our documented usage model for all of these intrinsics is to have users parse the stackmap format and generate whatever custom side tables they may require. We know have multiple active groups that need such a parser, so it's time to find a place to put it for code sharing. This code is base around code developed for our particular frontend. The LLILC group has expressed interest in getting access to the code and another unrelated user spoke up on llvm-dev today.

To be explicitly clear about support: the parser explicitly and intentionally only supports parsing the section generated by the exact same build of LLVM. There are NO compatibility guarantees provided even across major releases.

I would welcome suggestions on how to test this in tree. I didn't see any obvious places to add a stackmap section parser. Would it make sense to add a dedicated parser/printer tool? Or is there an existing one I can add to? I'm reasonable sure I've introduced at least one bug when migrating the code, but without backporting into my own tree, I don't have a way to flesh these out.

Diff Detail

Event Timeline

reames updated this revision to Diff 27483.Jun 10 2015, 5:58 PM
reames retitled this revision from to Add parser for the stackmap section format.
reames updated this object.
reames edited the test plan for this revision. (Show Details)
reames added a subscriber: Unknown Object (MLST).

For StackMapSection::dump(), my suggestion is to follow the other dumper routines and send the output to dbgs(), instead of using printfs.

void StackMapSection::dump() const { print(dbgs()); }

void StackMapSection::print(raw_ostream &OS) const
{

OS << "Functions (" << static_cast<int>(FnSizeRecords.size()) << ") [\n";
for (uint16_t j = 0; j < FnSizeRecords.size(); j++) {
  OS << "  addr = " << static_cast<unsigned>(FnSizeRecords[j].FunctionAddr);
  OS << ", size = " << static_cast<unsigned>(FnSizeRecords[j].StackSize);
  OS << "\n";
}
OS << "]\n";

OS << "Constants (" << static_cast<int>(Constants.size()) << ") [\n";
for (uint16_t j = 0; j < Constants.size(); j++) {
  OS << "  value = " << static_cast<unsigned>(Constants[j]);
}
OS << "]\n";

OS << "Records (" << static_cast<int>(Records.size()) << ") [\n";
for (unsigned i = 0; i < Records.size(); i++) {
  const StackMapRecord& Rec = Records[i];
  OS << "  id = " << Rec.PatchPointID;
  OS << ", offset" << Rec.InstructionOffset;
  OS << ", flags=" << Rec.ReservedFlags;
  OS << "\n";

  OS << "  Locations (" << static_cast<int>(Rec.Locations.size()) << ") [\n";
  for (uint16_t j = 0; j < Rec.Locations.size(); j++) {
    const LocationRecord& Loc = Rec.Locations[j];
    OS << "    type = " << locationTypeToString(Loc.Type);
    OS << ", size = " << (size_t)Loc.SizeInBytes;
    OS << ", dwarfreg = " << Loc.DwarfRegNum;
    OS << ", offset = " << Loc.Offset;
    OS << "\n";
  }
  OS << "  ]\n";
}
OS << "]\n";

}

swaroop.sridhar accepted this revision.Jun 10 2015, 6:15 PM
swaroop.sridhar edited edge metadata.

Looks good to me, other than the comment above.

This revision is now accepted and ready to land.Jun 10 2015, 6:15 PM
sanjoy edited edge metadata.Jun 10 2015, 6:22 PM

This change mostly has stylistic issues -- I've commented on some; and whitespace needs to be fixed (I'd just run the change through clang-format).

As far as testing is concerned, does it make sense to run the parser over a binary blob in a test in unittests/?

lib/CodeGen/StackMaps.cpp
39

Should be ParsePrimitive.

43

So the machine generating the stackmap section and the machine parsing them should have the same endianness -- should this be documented explicitly?

49

LLVM style is uint8_t *Data etc.

114

foreach?

124

foreach?

pgavlin edited edge metadata.Jun 12 2015, 9:59 AM

+1 to Swaroop's suggestion to use an ostream rather than printf.

include/llvm/CodeGen/StackMaps.h
48

s/it's/its

lib/CodeGen/StackMaps.cpp
43

It would certainly be nice to have his were documented if it is intentional.

52

I don't see anything that guarantees that the layout of a LocationRecord value matches the layout of the fields in the input buffer w.r.t. padding. I think that either LocationRecord needs to have __attribute__((packed)) applied (or something similar that will work across MSVC, GCC, and clang) or each field will need to be set individually using the result of an appropriate call to parse_primitive.

149

Maybe mark these as TODOs to help them stand out a bit better.

lhames added a subscriber: lhames.Jun 12 2015, 4:52 PM

This should definitely be ostream based, rather than printf based.

It also duplicates the contents of the stackmap section, which is unfortunate. The section will already be in memory, so ideally we should just have accessors for the fields. I'm writing up something along these lines at the moment and will post it shortly.

Hi all,

I couldn't figure out how to neatly attach an alternative patch to this review, so I've posted mine at http://reviews.llvm.org/D10434.

That review is only a sketch of the idea. Still on the to-do list are the location data structure, version check, and richer debug-dumps (i.e with named registers, fields).

I'm curious about how the data from the stackmap section is used by clients. If all clients are interested in all stackmap contents then it seems reasonable to provide an abstract StackMap data structure (like the one proposed here) and use my parser to parse it. If different clients use different portions of the stackmap sections, and are likely to want to parse the stackmap section into custom data structures, then maybe we're best of just providing stackmap parsers in-tree.

I like Lang Hames' idea of using copy-free accessors to read the data directly from the __llvm.stackmaps section loaded into memory.
In particular, if the client compiler needs to read the stackmap section and translate it to an alternate format specific to its runtime, its better to avoid creating an intermediate copy between the two versions.

I prefer Lang's approach as well--I certainly wouldn't mind having this checked in as an intermediate step, however.

Unless anyone objects, I'm going to check in a cleaned up version of
this change with style comments addressed and a use in objdump. I agree
that the flyweight parser is probably a better long term approach, but
this a) works, and b) is incremental progress towards that goal. We can
adapt the parser once it's in and tested.

If anyone wants to take lead on the flyweight style parser, I'd welcome
that. It's probably not something I'll get to short term. I am very
happy to review patches.

Philip

reames abandoned this revision.Jul 22 2015, 4:03 PM

Lang landed his version of a stackmap parser a while back.