This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Print module summary index to assembly
ClosedPublic

Authored by tejohnson on May 10 2018, 8:40 AM.

Details

Summary

Implements AsmWriter support for printing the module summary index to
assembly with the format discussed in the RFC "LLVM Assembly format for
ThinLTO Summary".

Implements just enough of the parsing support to recognize and ignore
the summary entries. As agreed in the RFC thread, this will be the
behavior when assembling the IR. A follow on change will implement
parsing/assembling of the summary entries for use by tools that
currently build the summary index from bitcode.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson created this revision.May 10 2018, 8:40 AM
tejohnson updated this revision to Diff 146140.May 10 2018, 8:56 AM

A little bit of minor comment cleanup

tejohnson updated this revision to Diff 146828.May 15 2018, 7:38 AM

A few changes motivated when writing the AsmParser support:

  • Use "external" instead of "extern" as linkage, so that existing

linkage type parser can be utilized

  • Quote string values (e.g. names and paths)
  • Emit only one of edge hotness and relative branch freq (to be

consistent with bitcode writer), and only write them if they have a
specified value.

tejohnson updated this revision to Diff 146837.May 15 2018, 8:22 AM

Update tests for previous change.

Ping on review. I don't expect more changes from my side at this point.

dexonsmith requested changes to this revision.May 15 2018, 8:59 PM
dexonsmith added a subscriber: dexonsmith.

I haven't had time to give this a full review, but skimming this I noticed that there's no update to llvm/docs/LangRef.rst. Can you document the syntax and semantics there?

I also wonder if it would be cleaner to add support for different constructs one-at-a-time, adding printing/parsing/double-round-trip-testing/LangRef incrementally for each construct. But I'm not totally opposed to doing the printing support first.

tools/llvm-lto/llvm-lto.cpp
383 ↗(On Diff #146837)

It's not obvious to me if this is fixing a pre-existing bug or you need to change this to count from 0 instead of from 1. (If the former, you may as well separate this out into a prep commit.)

dexonsmith added inline comments.May 15 2018, 8:59 PM
include/llvm/IR/ModuleSummaryIndex.h
315 ↗(On Diff #146837)

Can this be split out and committed separately?

519 ↗(On Diff #146837)

Can this be split out and committed separately?

lib/AsmParser/LLLexer.h
68–69 ↗(On Diff #146837)

Rather than making this (temporarily) public, it might be nicer to add a function like "LexModuleSummaryIndex" that calls SkipLineComment. Then it'll be less likely that SkipLineComment grows non-private users.

This revision now requires changes to proceed.May 15 2018, 8:59 PM

I haven't had time to give this a full review, but skimming this I noticed that there's no update to llvm/docs/LangRef.rst. Can you document the syntax and semantics there?

Thanks for taking a look! Good point, I will work on the LangRef changes and add to this patch.

I also wonder if it would be cleaner to add support for different constructs one-at-a-time, adding printing/parsing/double-round-trip-testing/LangRef incrementally for each construct. But I'm not totally opposed to doing the printing support first.

It's a lot easier to do it all at once. Building a partial index on the parsing side has its own issues.

include/llvm/IR/ModuleSummaryIndex.h
315 ↗(On Diff #146837)

Yes, will do.

519 ↗(On Diff #146837)

Yes, will do.

lib/AsmParser/LLLexer.h
68–69 ↗(On Diff #146837)

I realized when working on the parsing that it is going to be useful to keep this functionality public, because for parsing clients that aren't interested in the index, for which we will pass in a null Index pointer, we want to continue to skip the index lines. Similarly, for clients that aren't interested in the IR (e.g. the Thin Link, which when reading a bitcode file skips past all of the IR records), we will pass the parser a null Module pointer and want to skip all the non-summary IR lines. I can add a new public function to call this, not sure of the best name, or just rename SkipLineComment to SkipLine.

tools/llvm-lto/llvm-lto.cpp
383 ↗(On Diff #146837)

Yeah it's a pre-existing inconsistency with the module numbering done by both the new and old LTO APIs used by linkers, which both start at 0. I will commit this separately.

tejohnson updated this revision to Diff 147090.May 16 2018, 8:06 AM

Merge in minor fixes committed separately.

dexonsmith added inline comments.May 16 2018, 8:18 AM
lib/AsmParser/LLLexer.h
68–69 ↗(On Diff #146837)

Or perhaps a more honest name like SkipModuleSummaryIndex.

tejohnson added inline comments.May 16 2018, 8:21 AM
lib/AsmParser/LLLexer.h
68–69 ↗(On Diff #146837)

I can add that for this patch. But that isn't a good name for the other use case I am adding to my parsing patch, of skipping the IR when we only care about reading the summary index. Should we have two interfaces (e.g. SkipModuleSummaryIndex and SkipModuleIR) or a single generic skip interface?

tejohnson updated this revision to Diff 147939.May 21 2018, 7:49 PM

Add LangRef documentation.

The LangRef looks great, thanks for adding it. I have a few nitpicks and a follow-up on the SkipLineComment discussion. I made it most of the way through the AssemblyWriter (and any comments/questions there might apply more generally to the rest)... you may as well respond to those first, and then I can review the rest.

docs/LangRef.rst
5707–5708 ↗(On Diff #147939)

Is this a good place to document the behaviour around when the ThinLTO summary does and does not get dropped?

include/llvm/IR/ModuleSummaryIndex.h
434–435 ↗(On Diff #147939)

This comment isn't clear to me. The current text seems to suggest that the llvm.assume intrinsic is a type of llvm.type.test intrinsic. Assuming that's unintentional, I wonder if one of these would be accurate:

  • "List of type indentifiers used by this function in llvm.type.test intrinsics not referenced by an llvm.assume intrinsic, ..."
  • "List of type indentifiers used by this function in llvm.type.test intrinsics referenced by something other than an llvm.assume intrinsic, ..."

(I just noticed this is an old comment that has just been moved, so feel free to clean up separately.)

871–873 ↗(On Diff #147939)

Is it necessary to allow nullptr here? If not, can we take GlobalValueSummaryMapTy::value_type by reference?

lib/AsmParser/LLLexer.cpp
433–436 ↗(On Diff #147939)

It looks like this is copied exactly from elsewhere. Can/should we factor this out into LexUIntID (or something)? (Maybe even taking the token to return on success?)

lib/AsmParser/LLLexer.h
68–69 ↗(On Diff #146837)

Interesting. Are you sure SkipModuleIR will do what you want if it just skips to the end of a line? IIRC, IR can be split over multiple lines (e.g., metadata).

Assuming it works, I might slightly prefer the explicit names for each use case, but I'm not sure I have a good reason to...

If you do keep it generic, I think SkipLine (as you suggested elsewhere) is a better name.

lib/IR/AsmWriter.cpp
765 ↗(On Diff #147939)

This is a weird mix of naming styles. I think it should either be GUIDIterator (which matches our usual naming style) or guid_iterator (which matches as_iterator above).

767–768 ↗(On Diff #147939)

I don't see these being used anywhere.

907 ↗(On Diff #147939)

Does marking this as inline really gain anything here (besides a weak symbol)? I doubt our inliner really needs a hint here. But I could be wrong...

908 ↗(On Diff #147939)

Invert condition and return early to reduce nesting?

1011 ↗(On Diff #147939)

Is it useful to assert(TheIndex)?

1176–1177 ↗(On Diff #147939)

The DestSlot local seems unnecessary here...

2261 ↗(On Diff #147939)

= nullptr?

2370–2371 ↗(On Diff #147939)

I'd prefer these nullptr assignments to be at the declaration in the class.

2581 ↗(On Diff #147939)

I feel like "First" is liable to bitrot.

2609–2610 ↗(On Diff #147939)
  • Period at the end of the sentence.
  • Did you intend to leave this FIXME until later?
2617 ↗(On Diff #147939)

I think "Next" is unnecessary here.

2624 ↗(On Diff #147939)

I think "Last" is liable to bitrot.

2684 ↗(On Diff #147939)

Why print all or none? (Would it be better to check each for non-zero?)

2698 ↗(On Diff #147939)

Is there an empty method to use instead?

tejohnson marked 17 inline comments as done.May 24 2018, 10:30 AM

Thanks for the comments! I think I have addressed them all.

docs/LangRef.rst
5707–5708 ↗(On Diff #147939)

Added (with a note about current temporary behavior).

include/llvm/IR/ModuleSummaryIndex.h
434–435 ↗(On Diff #147939)

Looking at the code, your second alternative appears to be correct. This is pcc's code and comment, I will send a change to him separately - assuming that goes in first I will merge with this patch.

871–873 ↗(On Diff #147939)

In my single callsite it will never be nullptr. In general, a ValueInfo can have a null pointer though. But given the documented use case (when iterating the index), it shouldn't be null, so I will change to a reference.

lib/AsmParser/LLLexer.h
68–69 ↗(On Diff #146837)

Hmm, it looks like metadata (at least) can be split across multiple lines, so my parsing patch is going to have to do more work to properly skip everything before the summary. I can probably do something simple there like skipping every line that doesn't start with "^". For now I am assuming that the summary entries will be on a single line, so using SkipLineComment should work. I did rename SkipLineComment to SkipLine, which is actually what it is doing. I left it public but removed the FIXME, and added a LLParser method to skip module summary entries by invoking it.

lib/IR/AsmWriter.cpp
765 ↗(On Diff #147939)

Changed to guid_iterator to be consistent with other iterators.

767–768 ↗(On Diff #147939)

Removed

907 ↗(On Diff #147939)

Probably not, I just inherited this because I copied SlotTracker::initialize. Removed.

1011 ↗(On Diff #147939)

Might as well, added.

2609–2610 ↗(On Diff #147939)

Yes, for now. I'd like to revisit this soon, but wanted to focus on getting the parsing side support done first.
Can you clarify your concern about the period at the end of the sentence? There's already one there, which I assume is correct.

2684 ↗(On Diff #147939)

In fact, my parsing WIP patch will handle any subset of these being specified. Fixed here.

2698 ↗(On Diff #147939)

Yep, fixed here and elsewhere in patch.

tejohnson marked 9 inline comments as done.

Address comments.

Manually merge r333212 comment fix.

pcc added a comment.May 24 2018, 1:30 PM

This is somewhat of a bikeshed comment but I'm not really a big fan of putting everything on one line. (I know we do that for debug metadata too, but really I'd be in favour of changing that as well.) I'm wondering whether something like this would be easier to read:

^9 = gv: ( ; guid = 17381606045411660303
  name: "hot_function",
  summaries: (
    function: (
      module: ^0,
      flags: (
        linkage: external,
        notEligibleToImport: 0,
        live: 0,
        dsoLocal: 0
      ),
      insts: 16,
      calls: (
        (callee: ^5, hotness: hot),
        (callee: ^6, hotness: cold),
        (callee: ^4, hotness: hot),
        (callee: ^7, hotness: cold),
        (callee: ^10, hotness: none),
        (callee: ^3, hotness: hot),
        (callee: ^2, hotness: none),
        (callee: ^8, hotness: none),
        (callee: ^1, hotness: critical)
      )
    )
  )
)
docs/LangRef.rst
5721 ↗(On Diff #148440)

Probably wouldn't mention the linker, see this part of the thread with David: http://lists.llvm.org/pipermail/llvm-dev/2018-May/122975.html

lib/IR/AsmWriter.cpp
2776 ↗(On Diff #148440)

Maybe changing the other callers to do:

if (GV->getLinkage() != GlobalValue::ExternalLinkage)
   Out << getLinkagePrintName(GV->getLinkage()) << ' ';

would be clearer than adding the argument?

tejohnson marked an inline comment as done.May 24 2018, 1:53 PM
In D46699#1111535, @pcc wrote:

This is somewhat of a bikeshed comment but I'm not really a big fan of putting everything on one line. (I know we do that for debug metadata too, but really I'd be in favour of changing that as well.) I'm wondering whether something like this would be easier to read:

^9 = gv: ( ; guid = 17381606045411660303
  name: "hot_function",
  summaries: (
    function: (
      module: ^0,
      flags: (
        linkage: external,
        notEligibleToImport: 0,
        live: 0,
        dsoLocal: 0
      ),
      insts: 16,
      calls: (
        (callee: ^5, hotness: hot),
        (callee: ^6, hotness: cold),
        (callee: ^4, hotness: hot),
        (callee: ^7, hotness: cold),
        (callee: ^10, hotness: none),
        (callee: ^3, hotness: hot),
        (callee: ^2, hotness: none),
        (callee: ^8, hotness: none),
        (callee: ^1, hotness: critical)
      )
    )
  )
)

Thanks for the comments.

I see pros and cons. In some ways this is nicer to look at, but I am concerned about the amount of vertical space the summary assembly will consume. In most cases I would like to see more summary entries on my screen at a time, but that is probably personal preference.

Incidentally, I confirmed that my WIP parser can actually handle this (Duncan pointed out and I confirmed BTW that multi-line metadata are supported by the parser, although I don't believe they are emitted that way). Although the support in this patch to skip the summary assembly will not handle this, as it only skips a line at a time and won't understand the subsequent lines which don't start with the summary ID.

docs/LangRef.rst
5721 ↗(On Diff #148440)

Forgot about that, fixed.

lib/IR/AsmWriter.cpp
2776 ↗(On Diff #148440)

I thought about that but felt it was less churn and clearer to handle it in one centralized place. If you much prefer changing the callers, I can do that instead.

tejohnson updated this revision to Diff 148475.May 24 2018, 1:53 PM
tejohnson marked an inline comment as done.

Address LangRef comment

pcc added a comment.May 24 2018, 2:51 PM

Although the support in this patch to skip the summary assembly will not handle this, as it only skips a line at a time and won't understand the subsequent lines which don't start with the summary ID.

You could always count parens and stop when you reach 0. That's probably what the end state for skipping should look like as well.

For printing, if you think that would work better for you I can probably tolerate the single line.

lib/IR/AsmWriter.cpp
2776 ↗(On Diff #148440)

Can you do that please? I do prefer to avoid mysterious bool arguments wherever possible.

dexonsmith accepted this revision.May 24 2018, 3:22 PM

I have a suggestion for how to respond to Peter's feedback, but this LGTM once he's happy.

lib/IR/AsmWriter.cpp
2609–2610 ↗(On Diff #147939)

Can you clarify your concern about the period at the end of the sentence? There's already one there, which I assume is correct.

Yes it is. I have no idea why I didn't see it. Maybe I need to clean my laptop screen :/.

2776 ↗(On Diff #148440)

I agree with Peter that the bool argument is mysterious. We also don't tend to pass around std::strings much.

Here's a third option that should be less boilerplate. First, in a prep NFC patch, do:

  • Simplify this to const char *getLinkageName(GlobalValue::LinkageTypes).
  • Add a function, printLinkageWithSpace(raw_ostream&,GlobalValue::LinkageTypes). Its implementation can call getLinkageName and add a space, but return early for ExternalLinkage.
  • Update current users to call printLinkageWithSpace.

Then in your patch:

  • Change getLinkageName to return "external" for ExternalLinkage.
  • Use getLinkageName where you need it.
This revision is now accepted and ready to land.May 24 2018, 3:22 PM
In D46699#1111653, @pcc wrote:

Although the support in this patch to skip the summary assembly will not handle this, as it only skips a line at a time and won't understand the subsequent lines which don't start with the summary ID.

You could always count parens and stop when you reach 0. That's probably what the end state for skipping should look like as well.

Yeah, let me do that.

For printing, if you think that would work better for you I can probably tolerate the single line.

Ok, let's keep it on a single line for now then.

lib/IR/AsmWriter.cpp
2776 ↗(On Diff #148440)

I like this suggestion - Peter, does that work for you?

pcc added inline comments.May 24 2018, 3:59 PM
lib/IR/AsmWriter.cpp
2776 ↗(On Diff #148440)

Yes, that works for me, thanks.

Uploading new version momentarily with the following changes:

  • Merged in recommended linkage name printer NFC changes (r333281).
  • Changed SkipModuleSummaryEntry so that it would handle multi-line summary entries (by skipping the set of nested parentheses).
  • Undid changes to SkipLineComment due to the above change.
  • Added a test to ensure the round-tripping of summary entries through llvm-as (in test/Bitcode/thinlto-function-summary-callgraph-profile-summary.ll).
  • Synced with HEAD

Peter, I'll wait for you to do final sign off.

tejohnson updated this revision to Diff 148620.May 25 2018, 9:19 AM

Address comments and sync with HEAD.

pcc added a comment.May 25 2018, 3:05 PM

Mostly looking good, a few last comments.

lib/AsmParser/LLParser.cpp
741 ↗(On Diff #148620)

Can you write tests for the parse failures?

748 ↗(On Diff #148620)

This code is reachable only if the first token after the "label" is not an lparen, no? There also seems to be a bug: this code will accept foo: ) (. Maybe a more straightforward way to write this is:

  • expect the lparen
  • do the same parsing loop as before, but with NumOpenParen initialized to 1 and no FoundSomeParens variable
tejohnson added inline comments.May 25 2018, 3:42 PM
lib/AsmParser/LLParser.cpp
741 ↗(On Diff #148620)

will do

748 ↗(On Diff #148620)

This code is reachable only if the first token after the "label" is not an lparen, no?

That's what was intended, since we will error for any other imbalanced parentheses (by hitting Eof).

There also seems to be a bug: this code will accept foo: ) (

True, because NumOpenParens is unsigned, and the rparens first will cause it to wrap around and be >0. If I change NumOpenParens to be a signed integer, the loop will exit after the ")" is parsed and we will get the error here here because FoundSomeParens will be false.

But in any case, I will restructure as you suggest, because the current code won't catch the following case:

label: foobarjunk ( )

i.e. it won't complain about foobarjunk before the "(". I don't really want this code to do a lot of syntax checking, because that's coming in the parsing patch, but it sounds reasonable to expect an initial lparen and start at 1. Will change.

tejohnson updated this revision to Diff 148683.May 25 2018, 4:08 PM

Address comment, add new tests.

tejohnson marked 2 inline comments as done.May 25 2018, 4:09 PM

Comments addressed.

pcc accepted this revision.May 25 2018, 4:25 PM

LGTM

This revision was automatically updated to reflect the committed changes.