This is an archive of the discontinued LLVM Phabricator instance.

Coverage: Documentation for the coverage mapping format
ClosedPublic

Authored by arphaman on Jul 30 2014, 12:06 PM.

Details

Summary

This patch provides documentation for the coverage mapping format.

It uses html based examples for high level overview to show the difference
between the mapping regions using different colours.

Diff Detail

Repository
rL LLVM

Event Timeline

arphaman updated this revision to Diff 12042.Jul 30 2014, 12:06 PM
arphaman retitled this revision from to Coverage: Documentation for the coverage mapping format.
arphaman updated this object.
arphaman edited the test plan for this revision. (Show Details)
arphaman added reviewers: bogner, bob.wilson, dexonsmith.
arphaman set the repository for this revision to rL LLVM.
arphaman added a subscriber: Unknown Object (MLST).
silvas added a subscriber: silvas.Jul 30 2014, 6:15 PM
silvas added inline comments.
docs/CoverageMappingFormat.rst
14–16 ↗(On Diff #12042)

Abstracts don't work well for most kinds of documentation (this case included). It's much more important to specify your assumptions about your readers and what the document is *trying to achieve for those readers*, than just trying to summarize the document's contents per se. See http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines for some basic guidelines.

43–44 ↗(On Diff #12042)

A reader could make it here (and actually beyond here) thinking that this format is the run-time data structure that is used to store counts; I speak from personal experience...

Please don't address this comment by just saying somewhere "this is not a discussion of the runtime data structure". Instead, describe where this information fits in the coverage pipeline (i.e. the entire process of going from "a user's source code" to "a user's source code displayed to them with coverage information annotated on it"), which programs handle/create/look at this information, what does "mapping" have to do with anything?, etc. This is key "orienting" information that the reader absolutely has to have clear for the document to make any sense at all!

Obviously, it can be deduced from various aspects of this document what role this information must play. But the reader shouldn't have to play Sherlock. Actually, I can only find three clues that what this document describes is *not* the runtime data structure: "stored in object files and IR", "produced by frontend", "stores its integers in a format not suitable for random-access".

56 ↗(On Diff #12042)

Is there a reason you didn't use a <pre> here? These non-breaking space entities make it basically impossible to edit this manually or review it in source form.

189 ↗(On Diff #12042)

This is Darwin-specific. If this has to be included please accompany it with a discussion of portability and what would be different for other platforms.

224–225 ↗(On Diff #12042)

Before diving into the description of the format, please display (in a suitable format) and do a step-by-step dissection/walkthrough of the contents of the field [48 x i8] c"..." ; Encoded data (not shown) that you ellipsis'd above, so that the reader can "get a feel for it".

238–239 ↗(On Diff #12042)

It doesn't really make sense to talk about "primitives" at all, since your strings aren't even "primitive" (they contain a LEB128, which is itself a primitive).

Better to instead have a section about "the things that can appear after the : in [foo : type]". (see my comment below about simplifying the use of subscripts to use a simpler colon notation)

257 ↗(On Diff #12042)

Everywhere that you currently say uint just say LEB128. That eliminates the need for the reader to look around at what you mean by uint (readers do not always read sequentially). It also saves you from having to explain what a "uint" is in the first place besides simply linking to a description of LEB128.

278–280 ↗(On Diff #12042)

This notation seems more fit for a LaTeX document, but as you can tell it's a bit of a bear in Sphinx. Please use a simpler, flat notation for this. Just [value : uint] would be fine, placed in monospace like you have done for the section "Mapping Region"

332–333 ↗(On Diff #12042)

How do you know which operation to apply? There seems to be a LHS and a RHS but no "opcode".

arphaman updated this revision to Diff 12181.Aug 4 2014, 3:34 PM

Thank you for your feedback, I've attached an updated patch that tries to
address your comments.

silvas added a comment.Aug 5 2014, 3:39 PM

This is an awesome improvement! A few more review comments.

docs/CoverageMappingFormat.rst
51 ↗(On Diff #12181)

Tiny nit: please use ./test here instead of just test so that it is clear that test is not a subcommand of llvm-cov show. (I personally got tripped up by this) Also move the ./test after -instr-profile=test.profdata (unless the syntax of llvm-cov doesn't allow it? which would be weird)

93–95 ↗(On Diff #12181)

The wording here suggests that the function itself creates the data. Consider starting the sentence instead as "For each function that requires code coverage, the fronted has to create...."

104 ↗(On Diff #12181)

Here you discuss what the different regions are used for, but it isn't clear to the reader what the differences "boil down to". Why are these different kinds needed? Are they represented differently? Consider thinking about this from the perspective of what functionality would be impaired by removing each kind.

106–107 ↗(On Diff #12181)

Do the other kinds of regions have coverage mapping counters associated with them? If not, then please make the wording here indicate that, e.g. "Code regions are special in that they associate portions ..." (also give some insight into *why* they should be unlike the others in this respect).

196–197 ↗(On Diff #12181)

If the count is statically known to be zero, then why does it need to be represented at all?

I think it is because the tools that process the coverage information after the fact are "dumb" (don't have the full knowledge of the frontend) and so the frontend has to provide the information to them in this form. It's worth noting this I think.

278 ↗(On Diff #12181)

Please say "8 bits" or "8 bytes" here for clarity.

293–294 ↗(On Diff #12181)

Please somehow note that this "20" comes from the second field in @__llvm_coverage_mapping

302–304 ↗(On Diff #12181)

At this point you should probably say "in LEB128 format, which is used throughout for storing integers", so that users don't get confused at the arbitrary 256 limit (speaking from personal experience here...)

307–308 ↗(On Diff #12181)

Again, just for clarity, please explicitly state where in @__llvm_coverage_mapping this information is coming from.

317–319 ↗(On Diff #12181)

Instead of saying "first byte", "second byte", etc. It is probably simpler to just put the data that is being discussed, e.g.

  • c"\01": stores the number of file ids ...
  • c"\00": an index into the filenames array ...
  • etc.
325–326 ↗(On Diff #12181)

Why is the total number of mapping regions for the whole file (which could contain many functions) encoded in the per-function data?

346 ↗(On Diff #12181)

Please say either "The per-function coverage mapping data ..." or "Each function's coverage mapping data ...". Otherwise the reader could be confused and think that this is a description of the whole data string (the 40 bytes in the example above).

397–405 ↗(On Diff #12181)

This isn't clear to me. What exactly does a fixed-size bit field in a LEB128 mean? Is this after it is supposedly interpreted into a 32-bit int? What endianness? Please clarify.

493 ↗(On Diff #12181)

By pseudo-counter do you mean "expression counter"?

552–559 ↗(On Diff #12181)

This seems a bit out of place here. Isn't the stuff above this all about the structure of the per-function entries, but this is about the filename data (the first 20 bytes of the example above?)

arphaman updated this revision to Diff 12634.Aug 18 2014, 4:11 PM

Thanks for your feedback, sorry about the delayed response.
I've attached an updated patch.

arphaman added inline comments.Aug 18 2014, 4:25 PM
docs/CoverageMappingFormat.rst
51 ↗(On Diff #12181)

Currently, llvm-cov requires that the object file name is the first argument after the command, thus ./test has to be before -instr-profile=test.profdata. This might be changed in the future though.

325–326 ↗(On Diff #12181)

Those coverage mapping file ids are only meaningful in a context of a single function.
The new description says:

The number of mapping regions that are stored in an array for the function's file id #0.

Which hopefully makes it more clear.

397–405 ↗(On Diff #12181)

I've got rid of the

`[tag : 2, data : 30]`

part, but still kept the description. There is no real fixed size limit here, it's just that the implementation assumes that the counter fits in a 32 bit integer.
The updated description reads:

A `coverage mapping counter`_ is stored in a single `LEB value <LEB128_>`_.
It is composed of two things --- the `tag <counter-tag_>`_
which is stored in the lowest 2 bits, and the `counter data`_ which is stored
in the remaining bits.

I think it's clear that the extraction of 2 integers of widths 2 bits and remaining bits happens after the LEB value is decoded. As far as I understand, the endianness here doesn't matter as the LEB values are always encoded in little endian, and the 2 host endian integers are extracted from the decoded LEB integer which is already host endian.

493 ↗(On Diff #12181)

No, It's pseudo-counter from line 503.

332–333 ↗(On Diff #12042)

The opcode is derived from the actual counter that uses this expression, I've added a mention for this in the updated patch.

silvas accepted this revision.Aug 18 2014, 5:01 PM
silvas added a reviewer: silvas.

This looks great. Please commit.

This revision is now accepted and ready to land.Aug 18 2014, 5:01 PM
arphaman closed this revision.Aug 19 2014, 10:15 AM
arphaman updated this revision to Diff 12668.

Closed by commit rL215990 (authored by @arphaman).