This is an archive of the discontinued LLVM Phabricator instance.

Output optimization remarks in YAML
ClosedPublic

Authored by anemet on Sep 14 2016, 2:49 PM.

Details

Summary

This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

1 int foo();
2 int bar();
3
4 int baz() {
5   return foo() + bar();
6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

--- !Missed
Pass:            inline
Name:            NotInlined
DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
Function:        baz
Hotness:         30
Args:
  - Callee: foo
  - String:  will not be inlined into
  - Caller: baz
...
--- !Missed
Pass:            inline
Name:            NotInlined
DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
Function:        baz
Hotness:         30
Args:
  - Callee: bar
  - String:  will not be inlined into
  - Caller: baz
...

This is a summary of the high-level decisions:

  • There is a new streaming interface to emit optimization remarks. E.g. for the inliner remark above:

    ORE.emit(DiagnosticInfoOptimizationRemarkMissed( DEBUG_TYPE, "NotInlined", &I) << NV("Callee", Callee) << " will not be inlined into " << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

  • I am using YAML I/O for writing the YAML file. YAML I/O requires you to specify reading and writing at once but reading is highly non-trivial for some of the more complex LLVM types. Since it's not clear that we (ever) want to use LLVM to parse this YAML file, the code supports and asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

  • The YAML stream is stored in the LLVM context.
  • As before hotness is computed in the analysis pass instead of DiganosticInfo. This avoids the layering problem since BFI is in Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Diff Detail

Repository
rL LLVM

Event Timeline

anemet updated this revision to Diff 71442.Sep 14 2016, 2:49 PM
anemet retitled this revision from to [RFC] Output optimization remarks in YAML.
anemet updated this object.
anemet added reviewers: hfinkel, rjmccall.
anemet added a subscriber: llvm-commits.
anemet updated this object.Sep 14 2016, 2:51 PM
anemet updated this object.Sep 14 2016, 3:00 PM

Ping. @hfinkel, would be great if you could comment on the direction in this RFC. I don't want to start updating the optimization before we settle on the API. Thanks!

hfinkel edited edge metadata.Sep 22 2016, 4:24 PM

Ping. @hfinkel, would be great if you could comment on the direction in this RFC. I don't want to start updating the optimization before we settle on the API. Thanks!

Thanks for working on this! I think this direction makes sense. I think that we'd like the remark classes to be serializable as YAML, but we probably don't want the file writing code to be in the backend. The diagnostic classes get passed to the handler, which is often in the frontend, and we should probably let the handler decide what to do with them (e.g. where/if to write them to a file).

  • string: will not be inlined into

We don't need to fix this all at once, but we need to think carefully about how remarks are identified. I think we want it to be very clear how each diagnostic is identified, and we don't want strings that compose the human-readable summary message to subtly also form an interface contract with external tools. So the string here is fine, but we'll want each diagnostic to have some name that is designed to be machine consumed, and probably key-value pairs which likewise have machine-consumable names.

Thanks for taking a look!

Thanks for working on this! I think this direction makes sense. I think that we'd like the remark classes to be serializable as YAML, but we probably don't want the file writing code to be in the backend. The diagnostic classes get passed to the handler, which is often in the frontend, and we should probably let the handler decide what to do with them (e.g. where/if to write them to a file).

Then maybe I misunderstood what was meant by "doing this in the backend" in earlier conversations. I thought the idea was (which made sense to me) that all the backend user (e.g. clang, lto, opt) does it to instruct the backend to write the diagnostics into a file by invoking Context.setDiagnosticsOutputFile. I agree that this only gives binary control to the backend users but I am not sure we want more control at this time. Do you have something in mind?

It's also has the benefit that we don't need to duplicate this across the backend users.

  • string: will not be inlined into

We don't need to fix this all at once, but we need to think carefully about how remarks are identified. I think we want it to be very clear how each diagnostic is identified, and we don't want strings that compose the human-readable summary message to subtly also form an interface contract with external tools. So the string here is fine, but we'll want each diagnostic to have some name that is designed to be machine consumed, and probably key-value pairs which likewise have machine-consumable names.

OK, so sounds like we want the interface to enforce a single string per message policy. So probably a format like this would work better:

ORE.emit(DiagnosticInfoOptimizationRemarkMissed(DEBUG_TYPE, &I)
         << "% will not be inlined into %"
         << Callee << CS.getCaller());

What do you think?

Thanks for taking a look!

Thanks for working on this! I think this direction makes sense. I think that we'd like the remark classes to be serializable as YAML, but we probably don't want the file writing code to be in the backend. The diagnostic classes get passed to the handler, which is often in the frontend, and we should probably let the handler decide what to do with them (e.g. where/if to write them to a file).

Then maybe I misunderstood what was meant by "doing this in the backend" in earlier conversations. I thought the idea was (which made sense to me) that all the backend user (e.g. clang, lto, opt) does it to instruct the backend to write the diagnostics into a file by invoking Context.setDiagnosticsOutputFile. I agree that this only gives binary control to the backend users but I am not sure we want more control at this time. Do you have something in mind?

It's also has the benefit that we don't need to duplicate this across the backend users.

I agree; this is good. The yaml::Output object can be a memory-mapped buffer. This implementation is fairly generic. The diagnostic handler does not get to filter what to serialize, true, but perhaps that's okay.

  • string: will not be inlined into

We don't need to fix this all at once, but we need to think carefully about how remarks are identified. I think we want it to be very clear how each diagnostic is identified, and we don't want strings that compose the human-readable summary message to subtly also form an interface contract with external tools. So the string here is fine, but we'll want each diagnostic to have some name that is designed to be machine consumed, and probably key-value pairs which likewise have machine-consumable names.

OK, so sounds like we want the interface to enforce a single string per message policy. So probably a format like this would work better:

ORE.emit(DiagnosticInfoOptimizationRemarkMissed(DEBUG_TYPE, &I)
         << "% will not be inlined into %"
         << Callee << CS.getCaller());

What do you think?

I'd much rather we be forced to actually name these things. I don't want to have to change the tools that interpret the YAML files every time we update the messages to make them more helpful. How about this:

ORE.emit(DiagnosticInfoOptimizationRemarkMissed(DEBUG_TYPE, &I, "InlinerNotInlined")
         << NV("Callee", Callee) << " will not be inlined into "
         << NV("Caller", CS.getCaller()));

Where NV is supposed to stand for "named value", and the idea being that the name is used as the key for the YAML output.

lib/Analysis/OptimizationDiagnosticInfo.cpp
131 ↗(On Diff #71442)

Why don't we call this regardless of whether we're saving the output? We might not want to change which messages the user sees while compiling just because we're also recording the output (I don't think that we do).

ORE.emit(DiagnosticInfoOptimizationRemarkMissed(DEBUG_TYPE, &I, "InlinerNotInlined")
         << NV("Callee", Callee) << " will not be inlined into "
         << NV("Caller", CS.getCaller()));

Where NV is supposed to stand for "named value", and the idea being that the name is used as the key for the YAML output.

I think that this makes a lot of sense! Thanks for your input!

lib/Analysis/OptimizationDiagnosticInfo.cpp
131 ↗(On Diff #71442)

Makes sense. The backend user can decide not to emit it at the higher level.

This is probably that we should use now for IsVerbose (D23415).

I will update the patch with all this and also add support for streaming the IsVerbose flag.

anemet updated this revision to Diff 72535.Sep 26 2016, 12:13 PM
anemet edited edge metadata.

Addresses Hal's comments. I also got this ready for a formal review, please
take a look.

A few notes:

  • The format is slightly changed with the named values and the remark name. I will update the review description to reflect this. The testcase is of course already updated.
  • For the remark name I used "NotInlined" rather than "InlinerNotInlined" since the pass name is already provided
  • Added support for IsVerbose in the streaming API. There is a new FIXME to handle IsVerbose in the backend users rather than in the backend. I will do this in a follow-up.
anemet retitled this revision from [RFC] Output optimization remarks in YAML to Output optimization remarks in YAML.Sep 26 2016, 12:17 PM
anemet updated this object.

Great, thanks!

I'm glad you liked by suggestion for 'NV', because it is easy to type, but I do wonder if we'll have any issues because 'NV' is currently used as a variable name in a number of places. Having variable/type shadowing can certainly be confusing. What do you think?

I'm glad you liked by suggestion for 'NV', because it is easy to type, but I do wonder if we'll have any issues because 'NV' is currently used as a variable name in a number of places. Having variable/type shadowing can certainly be confusing. What do you think?

I was worried about the same thing but I haven't started propagating this pattern yet so I don't know if we hit something.

My backup solution is creating a small namespace for NV and perhaps IsVerbose, something like "ore". Then you'd have to qualify NV:

ORE.emit(DiagnosticInfoOptimizationRemarkMissed( DEBUG_TYPE, "NotInlined", &I) 
                 << ore::NV("Callee", Callee) << " will not be inlined into "
                 << ore::NV("Caller", CS.getCaller())
                 << ore::setIsVerbose());

What do you think? Another option I was thinking is using std::make_pair directly.

I'm glad you liked by suggestion for 'NV', because it is easy to type, but I do wonder if we'll have any issues because 'NV' is currently used as a variable name in a number of places. Having variable/type shadowing can certainly be confusing. What do you think?

I was worried about the same thing but I haven't started propagating this pattern yet so I don't know if we hit something.

My backup solution is creating a small namespace for NV and perhaps IsVerbose, something like "ore". Then you'd have to qualify NV:

ORE.emit(DiagnosticInfoOptimizationRemarkMissed( DEBUG_TYPE, "NotInlined", &I) 
                 << ore::NV("Callee", Callee) << " will not be inlined into "
                 << ore::NV("Caller", CS.getCaller())
                 << ore::setIsVerbose());

What do you think? Another option I was thinking is using std::make_pair directly.

Using ore:: seems safer. That's a good option. I don't like std::make_pair, it is pretty long, and implies something about the implementation we might later wish to change.

Using ore:: seems safer. That's a good option. I don't like std::make_pair, it is pretty long, and implies something about the implementation we might later wish to change.

Great. Do you want to see the namespace as part of this patch or is OK as a follow-on?

hfinkel accepted this revision.Sep 26 2016, 3:01 PM
hfinkel edited edge metadata.

Using ore:: seems safer. That's a good option. I don't like std::make_pair, it is pretty long, and implies something about the implementation we might later wish to change.

Great. Do you want to see the namespace as part of this patch or is OK as a follow-on?

Using ore:: seems safer. That's a good option. I don't like std::make_pair, it is pretty long, and implies something about the implementation we might later wish to change.

Great. Do you want to see the namespace as part of this patch or is OK as a follow-on?

This patch (i.e. it should be committed that way if that's what we intend to use), but otherwise, this LGTM.

This revision is now accepted and ready to land.Sep 26 2016, 3:01 PM

This patch (i.e. it should be committed that way if that's what we intend to use), but otherwise, this LGTM.

Sure thing, thanks for the review!

This revision was automatically updated to reflect the committed changes.

For the record, the committed version adds the new namespace 'ore' as we talked about.

I was also able to simplify the underlying NV machinery. Essentially, I always create a string key to string value pair for each argument (if the arg is string, the key is just 'String').

I think there is room to optimize this with some move semantics but I leave that to a follow-on patch.