This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profdata] Add support for weighted merge of profile data
ClosedPublic

Authored by slingn on Nov 10 2015, 11:10 AM.

Details

Summary

This change adds support for an optional weight when merging profile data with the llvm-profdata tool.
Weights are specified by adding an option ':<weight>' suffix to the input file names.

Adding support for arbitrary weighting of input profile data allows for relative importance to be placed on the
input data from multiple training runs.

Both sampled and instrumented profiles are supported.

Diff Detail

Repository
rL LLVM

Event Timeline

slingn updated this revision to Diff 39836.Nov 10 2015, 11:10 AM
slingn retitled this revision from to [llvm-profdata] Add support for weighted merge of profile data.
slingn updated this object.
slingn added reviewers: dnovillo, bogner, davidxl.
slingn added a subscriber: llvm-commits.
davidxl edited edge metadata.Nov 10 2015, 11:39 AM

Thanks -- this is a good functionality.

A high level comment about the command line syntax. I think we can do it without introducing a new option. All that needed for user is to add an optional weight suffix to the input file name such as:

llvm-profdata merge -o outp.profdata input1.data:10 input2.data:5 input3.data

When the :weight suffix is missing, the weight is assumed to be 1.

Note that this will like to simplify the implementation quite a bit -- not special parsing is needed.

davidxl added inline comments.Nov 10 2015, 12:07 PM
lib/ProfileData/InstrProfWriter.cpp
108 ↗(On Diff #39836)

value Profile data needs to be weight aware too.

140 ↗(On Diff #39836)

nit: Multiply ... by Weight.

test/tools/llvm-profdata/weight-sample.test
42 ↗(On Diff #39836)

No need for such tests if the weight is always attached to the input file.

tools/llvm-profdata/llvm-profdata.cpp
61 ↗(On Diff #39836)

You can directly fetch the weight from the augmented file name.

69 ↗(On Diff #39836)

Why doing weight here?

74 ↗(On Diff #39836)

Pass the weight to addRecord method instead.

silvas added a subscriber: silvas.Nov 10 2015, 10:17 PM

Thanks -- this is a good functionality.

A high level comment about the command line syntax. I think we can do it without introducing a new option. All that needed for user is to add an optional weight suffix to the input file name such as:

llvm-profdata merge -o outp.profdata input1.data:10 input2.data:5 input3.data

When the :weight suffix is missing, the weight is assumed to be 1.

Note that this will like to simplify the implementation quite a bit -- not special parsing is needed.

I agree with David here. The suffix approach seems a lot more robust and convenient. The only requirement is being able to assume that : doesn't appear in filenames, which might be a problem on windows, but that can probably be mitigated by checking that it is of the form :<decimal integer> I think.

silvas added inline comments.Nov 10 2015, 10:27 PM
include/llvm/ProfileData/SampleProf.h
194 ↗(On Diff #39836)

Why did this suddenly become unaligned with the comment? (I suggest setting up clang-format).

lib/ProfileData/InstrProfWriter.cpp
143 ↗(On Diff #39836)

Just use a lambda. Or I think you might be able to get away with this which is shorter:

for (auto &Count : Dest.Counts)
  Count *= Weight;

Also, it seems like there are multiple places that the weight comes into play. Can you please ensure there is a Single Point Of Truth for where the weight is applied?

slingn marked 6 inline comments as done.Nov 13 2015, 7:46 PM
slingn added inline comments.
lib/ProfileData/InstrProfWriter.cpp
143 ↗(On Diff #39836)

I have refactored things in the next patch set.

tools/llvm-profdata/llvm-profdata.cpp
61 ↗(On Diff #39836)

It's still better in my mind to parse the weight in a separate method since that can be shared between instr and sampled merge - and is a clearer separation of concerns.

69 ↗(On Diff #39836)

Fixed. Yes - this approach was left over from a previous revision.

slingn updated this revision to Diff 40201.Nov 13 2015, 7:46 PM
slingn marked 2 inline comments as done.
slingn edited edge metadata.

Updated for dnovillo, davidxl and silvas feedback.

-Changed to use ':<weight>' file suffix rather than add a --weights=N,M option.
-Refectored so that weights are applied in narrower scope (aka 'Single Point of Truth')

slingn updated this object.Nov 13 2015, 7:58 PM
davidxl added inline comments.Nov 13 2015, 9:13 PM
include/llvm/ProfileData/InstrProf.h
324 ↗(On Diff #40201)

Why moving this definition in-class? Try to avoid bringing irrelevant changes like this. It makes it harder to compare two versions.

413 ↗(On Diff #40201)

Moving this function from InstrProfWriter.cpp to here is fine, but can you submit a different patch for that?

include/llvm/Support/MathExtras.h
656 ↗(On Diff #40201)

Can you submit the saturating math change in a separate patch?

lib/ProfileData/InstrProfWriter.cpp
117 ↗(On Diff #40201)

Result is not checked here.

120 ↗(On Diff #40201)

Since you refactored the code with an else branch, you can sink the max function count update and return statement after the the if-then-else. I think this should be done in a separate patch (same one as moving the merge from writer cpp to instrprof.cpp)

tools/llvm-profdata/llvm-profdata.cpp
88 ↗(On Diff #40201)

There does seem to be a need to exposed the 'scale' method publicly -- it is better to make AddRecord to take an optional weight parameter as the previous version, and let AddRecord's implementation to call private scale method.

138 ↗(On Diff #40201)

Similarly here merge can take an optional weight parameter.

slingn added inline comments.Nov 14 2015, 7:41 AM
include/llvm/ProfileData/InstrProf.h
324 ↗(On Diff #40201)

It looks like this is part of the implementation of merge(). Is there a need for it to be public?

I will refactor in a separate change set.

413 ↗(On Diff #40201)

Will do.

include/llvm/Support/MathExtras.h
656 ↗(On Diff #40201)

Will do.

lib/ProfileData/InstrProfWriter.cpp
120 ↗(On Diff #40201)

OK.

tools/llvm-profdata/llvm-profdata.cpp
88 ↗(On Diff #40201)

I assume you meant 'does not seem to be a need'?

OK. That will mean making a copy of the record internally in order to scale it since AddRecord shouldn't assume that it is allowed to mutate the passed in record.

The other alternative is to revert to the previous method of applying the weight during merge. That would be more efficient (than making a temporary scaled record) but spreads out the weighting logic across a number of methods.

silvas added inline comments.Nov 16 2015, 7:11 PM
tools/llvm-profdata/llvm-profdata.cpp
155 ↗(On Diff #40201)
158 ↗(On Diff #40201)

typo "postive"

slingn updated this revision to Diff 41681.Dec 2 2015, 2:36 PM
slingn marked 6 inline comments as done.

Updated for davidxl and silvas feedback.

Split changes not directly related to weighted merging into separate patches which are now merged.

slingn marked an inline comment as done.Dec 2 2015, 2:39 PM
silvas added inline comments.Dec 2 2015, 2:57 PM
docs/CommandGuide/llvm-profdata.rst
31 ↗(On Diff #41681)

this is confusing. I interpret filenames[:weights]... as that there are multiple weights for a given filename. It seems like filename[:weight] ... is a better way to describe it. The repetition comes from the ... not from the plurality of the name (I would say that what was there previously was incorrect).

40 ↗(On Diff #41681)

This isn't a very user-friendly description. I would describe it as: "instead of <filename> pass <filename>:<weight>, where <weight> is a decimal integer >= 1. A weight of 1 is assume if just <filename> by itself is passed"

slingn updated this revision to Diff 41684.Dec 2 2015, 3:30 PM
slingn marked 2 inline comments as done.

Updated for silvas feedback.

davidxl added inline comments.Dec 2 2015, 3:35 PM
include/llvm/ProfileData/InstrProf.h
445 ↗(On Diff #41684)

It might be better to refactor the 'fused' Multiple/Add into a helper -- it is used in two different places.

include/llvm/ProfileData/SampleProf.h
239 ↗(On Diff #41684)

Why is SaturatingAdd not used here?

245 ↗(On Diff #41684)

Same here.

slingn added inline comments.Dec 3 2015, 3:16 PM
include/llvm/ProfileData/InstrProf.h
445 ↗(On Diff #41684)

That sounds like a good idea. I'll do that in a separate change.

include/llvm/ProfileData/SampleProf.h
239 ↗(On Diff #41684)

Good question. I'll make that change in a separate patch since it is changing the existing semantics and should have its own unit tests.

davidxl accepted this revision.Dec 3 2015, 3:18 PM
davidxl edited edge metadata.
This revision is now accepted and ready to land.Dec 3 2015, 3:18 PM
This revision was automatically updated to reflect the committed changes.