This is an archive of the discontinued LLVM Phabricator instance.

[OptRemark] RFC: Introduce a message table for OptRemarks
Needs ReviewPublic

Authored by andrew.w.kaylor on Jan 8 2020, 3:58 PM.

Details

Summary

This is a very preliminary proposal to introduce a message table that would let us replace hard-coded strings with an identifier. The basic idea is that instead of something like this:

ORE->emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, "LoadElim", LI)
         << "load of type " << NV("Type", LI->getType()) << " eliminated"
         << setExtraArgs() << " in favor of "
         << NV("InfavorOfValue", AvailableValue);
});

We'd be able to have something like this:

ORE->emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, diag::remark_gvn_load_elim, LI)
         << NV("Type", LI->getType())
         << setExtraArgs() << NV("InfavorOfValue", AvailableValue);
});

I think this opens up a lot of possibilities for more compact storage of remarks and more reliable identification of remarks in the DiagHandler. It also brings up a lot of questions about how much of the information that is currently part of the OptimizationRemark class (and its siblings) should be part of the message table. I hope to discuss that in this review.

Diff Detail

Event Timeline

andrew.w.kaylor created this revision.Jan 8 2020, 3:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 8 2020, 3:58 PM

I like this general idea; couple of thoughts...

  1. Clang has a similar system, and one disadvantage is that any time you change/add any message, it seems to trigger a large rebuild. Could we have this TG system generate separate .inc files for each category of things, so we don't have the same kind of rebuild problem.
  2. Given that are arguments are named, can we use something like "Format string with optional specifier like %{NV}" instead of "Format string with optional specifier like %0"? I think that the former would be better.
anemet added a comment.Jan 8 2020, 4:41 PM

As Francis mentioned it before it would be good derive the pass name from the remark type (diag::remark_gvn_load_elim -> gvn) . I.e. I would drop the DEBUG_TYPE argument.

I like this general idea; couple of thoughts...

  1. Clang has a similar system, and one disadvantage is that any time you change/add any message, it seems to trigger a large rebuild. Could we have this TG system generate separate .inc files for each category of things, so we don't have the same kind of rebuild problem.

Probably. One thing I was imagining coming out of the .td file(s) is an enum somewhere that assigns values to all of the messages. I suppose we could make that multiple enums, each with a fixed starting value to avoid triggering rebuilds. In general, these should lend themselves to logical groupings. We'd probably also want a way for downstream LLVM-based products to painlessly add their own remarks and groups should help with that.

  1. Given that are arguments are named, can we use something like "Format string with optional specifier like %{NV}" instead of "Format string with optional specifier like %0"? I think that the former would be better.

Yes, that's a great idea. So, if I understand what you're suggesting, the entry for the remark I used as an example would look like this:

def remark_gvn_load_elim: OptRemark<"LoadElim", "load of type %Type eliminated", " in favor of %InfavorOfValue">;

Is that the idea?

As Francis mentioned it before it would be good derive the pass name from the remark type (diag::remark_gvn_load_elim -> gvn) . I.e. I would drop the DEBUG_TYPE argument.

This is one of the things that I thought could be a field in the OptRemark class. I think it makes sense to have this kind of property tightly coupled with the message. Francis mentioned a potential problem with keeping that synchronized with where the remarks are emitted, but I think if we think of it in terms of a category of optimization rather than a pass name that isn't a problem, because the nature of the optimization being described won't change even if, for example, you move it from InstCombine to AggressiveInstCombine. Or perhaps I'm introducing a second concept here. There's a bit of a disconnect between compiler developers who want to use this feature and compiler users who want to use the feature. The latter group is probably more interested in being able to say, for example, show me all remarks related to loop optimization rather than show me remarks from the loop rotate pass.

It also occurs to me that we could move the Optimization/Missed/Analysis hierarchy into this table.

I like this general idea; couple of thoughts...

  1. Clang has a similar system, and one disadvantage is that any time you change/add any message, it seems to trigger a large rebuild. Could we have this TG system generate separate .inc files for each category of things, so we don't have the same kind of rebuild problem.

Probably. One thing I was imagining coming out of the .td file(s) is an enum somewhere that assigns values to all of the messages. I suppose we could make that multiple enums, each with a fixed starting value to avoid triggering rebuilds. In general, these should lend themselves to logical groupings. We'd probably also want a way for downstream LLVM-based products to painlessly add their own remarks and groups should help with that.

  1. Given that are arguments are named, can we use something like "Format string with optional specifier like %{NV}" instead of "Format string with optional specifier like %0"? I think that the former would be better.

Yes, that's a great idea. So, if I understand what you're suggesting, the entry for the remark I used as an example would look like this:

def remark_gvn_load_elim: OptRemark<"LoadElim", "load of type %Type eliminated", " in favor of %InfavorOfValue">;

Is that the idea?

Yep, something like that.

As Francis mentioned it before it would be good derive the pass name from the remark type (diag::remark_gvn_load_elim -> gvn) . I.e. I would drop the DEBUG_TYPE argument.

This is one of the things that I thought could be a field in the OptRemark class. I think it makes sense to have this kind of property tightly coupled with the message. Francis mentioned a potential problem with keeping that synchronized with where the remarks are emitted, but I think if we think of it in terms of a category of optimization rather than a pass name that isn't a problem, because the nature of the optimization being described won't change even if, for example, you move it from InstCombine to AggressiveInstCombine. Or perhaps I'm introducing a second concept here. There's a bit of a disconnect between compiler developers who want to use this feature and compiler users who want to use the feature. The latter group is probably more interested in being able to say, for example, show me all remarks related to loop optimization rather than show me remarks from the loop rotate pass.

I think making a distinction here is actually a good idea. We could make foptimization-record-passes allow such groups as well (or add another flag for groups specifically).

Adding it in the same way as the Group<...> in include/clang/Driver/Options.td would be great:

def fsave_optimization_record : Flag<["-"], "fsave-optimization-record">,
  Group<f_Group>, HelpText<"Generate a YAML optimization record file">;

It also occurs to me that we could move the Optimization/Missed/Analysis hierarchy into this table.

That would also be great.

llvm/utils/TableGen/OptRemarkDiagEmitter.cpp
1

OptRemarkDiagEmitter.cpp

Updated to incorporate review suggestions

Any more feedback? Should I proceed in this direction?

fhahn added a subscriber: fhahn.Feb 25 2020, 3:56 AM

This is great! Sorry for the delay. More comments inline.

llvm/include/llvm/Remarks/OptRemarkDiagBase.td
23 ↗(On Diff #239983)

I think a few places call this "Passed". Would that be better than "General"?

33 ↗(On Diff #239983)

In this scheme, it will be hard for a remark to be part of multiple groups, right?

I was imagining something where we can do things like:

def gvn_load_elim : OptRemark<"LoadElim", "...">;
def gvn_load_elim_missed : OptRemark<"LoadElimMissed", "...">, Kind<Kind_Missed>;
def licm_hoisted : OptRemark<"InstHoisted", "...">;
def loop_vectorized : OptRemark<"LoopVectorized", "...">;

def GroupGVN : OptRemarkGroup<"GVN", [ gvn_load_elim, gvn_load_elim_missed ]>;
def GroupLICM : OptRemarkGroup<"LICM", [ licm_hoisted ]>;
def GroupLV : OptRemarkGroup<"LV", [ loop_vectorized ]>;
def GroupLoopStuff : OptRemarkGroup<"LoopStuff", [ licm_hoisted, loop_vectorized ]>;

or even groups of groups to avoid listing everything:

def GroupLoopStuff : OptRemarkGroup<"LoopStuff", [ GroupLICM, GroupLoopStuff ]>;

where we would have the liberty of putting remarks in different groups:

  • for compiler devs: group remarks by pass
  • for users: group remarks by a higher level concept: loop optimizations, actionability, sanitizer-added code, etc.

I don't know how hard it is to get something like this, but let me know what you think.

llvm/test/TableGen/opt-remark-diag.td
21

What would be the use case of these enums? Can't the same be achieved by not quoting RemarkName and Test1 in the OPT_REMARK( macros like in clang/include/clang/Driver/Options.h:

enum ID {
    OPT_INVALID = 0, // This is not an option ID.
#define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM,  \
               HELPTEXT, METAVAR, VALUES)                                      \
  OPT_##ID,
#include "clang/Driver/Options.inc"
    LastOption
#undef OPTION
  };
a.elovikov added inline comments.Jun 8 2020, 4:44 PM
llvm/test/TableGen/opt-remark-diag.td
21

I'm more interested in what other use-cases for OPT_REMARKs are. I believe C++ macros and tablegen serve the same purpose so I'm not sure if it's beneficial to use both at the same time. On the other hand, I want to extend the .td description of remarks to simple statistics so that each remark will be something like "number of <smth>: X" and auto-generate most of the code for handling it. I.e., given

let Group = StatisticsFoo in {
  def remark_statistic_one : OptRemark<"StatisticOne", "Number of EventOne: %{Arg}>;
  def remark_statistic_two : OptRemark<"StatisticTwo", "Number of EventTwo: %{Arg}>;
}

I'd like to be able to generate

struct StatisticsFooStorage {
  NumStatisticOne = 0;
  NumStatisticTwo = 0;

  void emit(RemarkEmitter Emitter) {
    Emitter.emit(StatisticOneRemarkString.format(NumStatisticOne);
    Emitter.emit(StatisticTwoRemarkString.format(NumStatisticTwo);
  }
}

And actual optizmiation

StatisticsFooStorage StatStorage;
// ...
if (something)
  ++StatStorage.NumStatisticOne;
//
RemarksEmitter Emitter;
StatStorage.emit(Emitter);

I think writing direct tblgen emitter for this might be easier than using the OPT_REMARK macros (although that would probably be doable as well).

andrew.w.kaylor marked 3 inline comments as done.Jun 9 2020, 12:45 PM

Sorry for having let this drop for so long. Some other priorities came up, but I am still interested in seeing this through.

llvm/include/llvm/Remarks/OptRemarkDiagBase.td
23 ↗(On Diff #239983)

I don't like "Passed" but "General" isn't very helpful either. I think this will apply to "optimizations that were performed" as opposed to missed opportunities and "analysis" (which seems to mean "information"), but I'm not sure there are no cases where the base OptimizationRemark class is used to mean something else.

The clang documentation describes the groups this way:

  1. When the pass makes a transformation (-Rpass).
  2. When the pass fails to make a transformation (-Rpass-missed).
  3. When the pass determines whether or not to make a transformation (-Rpass-analysis).

That last description seems bad.

I don't really have strong feelings about what we call it. I guess what's important is to give it a good enough name that someone won't accidentally use if for a remark that doesn't align with the intended use. "General" doesn't do that. I guess "Passed" does.

33 ↗(On Diff #239983)

Multiple groups seems useful. I'll see what I can do with that.

llvm/test/TableGen/opt-remark-diag.td
21

@thegameg The enum ID would be used in the optimization pass to emit the remark and somewhere else (possibly the diagnostic handler) to look up the string for the remark. The reason I was generating the enum explicitly is that I wanted to establish the base ID for each group based on other information in the .td file, but I guess that could be accomplished by moving the starting ID to the header file in the way that you suggest.

@a.elovikov I'm not sure I understand what you want to accomplish with the statistics. What does this accomplish that you can't do with existing LLVM statistics other than enabling the information to be reported in a release build?

thegameg added inline comments.Jun 16 2020, 6:34 PM
llvm/include/llvm/Remarks/OptRemarkDiagBase.td
23 ↗(On Diff #239983)

No strong feelings either, I agree with everything you said.