This is an archive of the discontinued LLVM Phabricator instance.

Coverage: Documentation for the coverage mapping format
ClosedPublic

Authored by arphaman on Jul 30 2014, 12:06 PM.

Details

Summary

This patch provides documentation for the coverage mapping format.

It uses html based examples for high level overview to show the difference
between the mapping regions using different colours.

Diff Detail

Event Timeline

arphaman updated this revision to Diff 12042.Jul 30 2014, 12:06 PM
arphaman retitled this revision from to Coverage: Documentation for the coverage mapping format.
arphaman updated this object.
arphaman edited the test plan for this revision. (Show Details)
arphaman added reviewers: bogner, bob.wilson, dexonsmith.
arphaman set the repository for this revision to rL LLVM.
arphaman added a subscriber: Unknown Object (MLST).
silvas added a subscriber: silvas.Jul 30 2014, 6:15 PM
silvas added inline comments.
docs/CoverageMappingFormat.rst
15–17

Abstracts don't work well for most kinds of documentation (this case included). It's much more important to specify your assumptions about your readers and what the document is *trying to achieve for those readers*, than just trying to summarize the document's contents per se. See http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines for some basic guidelines.

44–45

A reader could make it here (and actually beyond here) thinking that this format is the run-time data structure that is used to store counts; I speak from personal experience...

Please don't address this comment by just saying somewhere "this is not a discussion of the runtime data structure". Instead, describe where this information fits in the coverage pipeline (i.e. the entire process of going from "a user's source code" to "a user's source code displayed to them with coverage information annotated on it"), which programs handle/create/look at this information, what does "mapping" have to do with anything?, etc. This is key "orienting" information that the reader absolutely has to have clear for the document to make any sense at all!

Obviously, it can be deduced from various aspects of this document what role this information must play. But the reader shouldn't have to play Sherlock. Actually, I can only find three clues that what this document describes is *not* the runtime data structure: "stored in object files and IR", "produced by frontend", "stores its integers in a format not suitable for random-access".

57

Is there a reason you didn't use a <pre> here? These non-breaking space entities make it basically impossible to edit this manually or review it in source form.

190

This is Darwin-specific. If this has to be included please accompany it with a discussion of portability and what would be different for other platforms.

225–226

Before diving into the description of the format, please display (in a suitable format) and do a step-by-step dissection/walkthrough of the contents of the field [48 x i8] c"..." ; Encoded data (not shown) that you ellipsis'd above, so that the reader can "get a feel for it".

239–240

It doesn't really make sense to talk about "primitives" at all, since your strings aren't even "primitive" (they contain a LEB128, which is itself a primitive).

Better to instead have a section about "the things that can appear after the : in [foo : type]". (see my comment below about simplifying the use of subscripts to use a simpler colon notation)

258

Everywhere that you currently say uint just say LEB128. That eliminates the need for the reader to look around at what you mean by uint (readers do not always read sequentially). It also saves you from having to explain what a "uint" is in the first place besides simply linking to a description of LEB128.

279–281

This notation seems more fit for a LaTeX document, but as you can tell it's a bit of a bear in Sphinx. Please use a simpler, flat notation for this. Just [value : uint] would be fine, placed in monospace like you have done for the section "Mapping Region"

333–334

How do you know which operation to apply? There seems to be a LHS and a RHS but no "opcode".

arphaman updated this revision to Diff 12181.Aug 4 2014, 3:34 PM

Thank you for your feedback, I've attached an updated patch that tries to
address your comments.

silvas added a comment.Aug 5 2014, 3:39 PM

This is an awesome improvement! A few more review comments.

docs/CoverageMappingFormat.rst
51

Tiny nit: please use ./test here instead of just test so that it is clear that test is not a subcommand of llvm-cov show. (I personally got tripped up by this) Also move the ./test after -instr-profile=test.profdata (unless the syntax of llvm-cov doesn't allow it? which would be weird)

93–95

The wording here suggests that the function itself creates the data. Consider starting the sentence instead as "For each function that requires code coverage, the fronted has to create...."

104

Here you discuss what the different regions are used for, but it isn't clear to the reader what the differences "boil down to". Why are these different kinds needed? Are they represented differently? Consider thinking about this from the perspective of what functionality would be impaired by removing each kind.

106–107

Do the other kinds of regions have coverage mapping counters associated with them? If not, then please make the wording here indicate that, e.g. "Code regions are special in that they associate portions ..." (also give some insight into *why* they should be unlike the others in this respect).

196–197

If the count is statically known to be zero, then why does it need to be represented at all?

I think it is because the tools that process the coverage information after the fact are "dumb" (don't have the full knowledge of the frontend) and so the frontend has to provide the information to them in this form. It's worth noting this I think.

278

Please say "8 bits" or "8 bytes" here for clarity.

293–294

Please somehow note that this "20" comes from the second field in @__llvm_coverage_mapping

302–304

At this point you should probably say "in LEB128 format, which is used throughout for storing integers", so that users don't get confused at the arbitrary 256 limit (speaking from personal experience here...)

307–308

Again, just for clarity, please explicitly state where in @__llvm_coverage_mapping this information is coming from.

317–319

Instead of saying "first byte", "second byte", etc. It is probably simpler to just put the data that is being discussed, e.g.

  • c"\01": stores the number of file ids ...
  • c"\00": an index into the filenames array ...
  • etc.
325–326

Why is the total number of mapping regions for the whole file (which could contain many functions) encoded in the per-function data?

346

Please say either "The per-function coverage mapping data ..." or "Each function's coverage mapping data ...". Otherwise the reader could be confused and think that this is a description of the whole data string (the 40 bytes in the example above).

397–405

This isn't clear to me. What exactly does a fixed-size bit field in a LEB128 mean? Is this after it is supposedly interpreted into a 32-bit int? What endianness? Please clarify.

493

By pseudo-counter do you mean "expression counter"?

552–559

This seems a bit out of place here. Isn't the stuff above this all about the structure of the per-function entries, but this is about the filename data (the first 20 bytes of the example above?)

arphaman updated this revision to Diff 12634.Aug 18 2014, 4:11 PM

Thanks for your feedback, sorry about the delayed response.
I've attached an updated patch.

arphaman added inline comments.Aug 18 2014, 4:25 PM
docs/CoverageMappingFormat.rst
51

Currently, llvm-cov requires that the object file name is the first argument after the command, thus ./test has to be before -instr-profile=test.profdata. This might be changed in the future though.

325–326

Those coverage mapping file ids are only meaningful in a context of a single function.
The new description says:

The number of mapping regions that are stored in an array for the function's file id #0.

Which hopefully makes it more clear.

333–334

The opcode is derived from the actual counter that uses this expression, I've added a mention for this in the updated patch.

397–405

I've got rid of the

`[tag : 2, data : 30]`

part, but still kept the description. There is no real fixed size limit here, it's just that the implementation assumes that the counter fits in a 32 bit integer.
The updated description reads:

A `coverage mapping counter`_ is stored in a single `LEB value <LEB128_>`_.
It is composed of two things --- the `tag <counter-tag_>`_
which is stored in the lowest 2 bits, and the `counter data`_ which is stored
in the remaining bits.

I think it's clear that the extraction of 2 integers of widths 2 bits and remaining bits happens after the LEB value is decoded. As far as I understand, the endianness here doesn't matter as the LEB values are always encoded in little endian, and the 2 host endian integers are extracted from the decoded LEB integer which is already host endian.

493

No, It's pseudo-counter from line 503.

silvas accepted this revision.Aug 18 2014, 5:01 PM
silvas added a reviewer: silvas.

This looks great. Please commit.

This revision is now accepted and ready to land.Aug 18 2014, 5:01 PM
arphaman closed this revision.Aug 19 2014, 10:15 AM
arphaman updated this revision to Diff 12668.

Closed by commit rL215990 (authored by @arphaman).