This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profdata] Add overlap command to compute similarity b/w two profile files
ClosedPublic

Authored by xur on Apr 22 2019, 12:19 PM.

Details

Diff Detail

Repository
rL LLVM

Event Timeline

xur created this revision.Apr 22 2019, 12:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2019, 12:19 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
xur updated this revision to Diff 196118.Apr 22 2019, 12:21 PM

Minor fix of the comments.

xur updated this revision to Diff 196121.Apr 22 2019, 12:27 PM

Removed some un-intended changes.

davidxl added inline comments.Apr 23 2019, 10:59 AM
llvm/docs/CommandGuide/llvm-profdata.rst
240 ↗(On Diff #196121)

match underscore length with word.

250 ↗(On Diff #196121)

show --> overlap

253 ↗(On Diff #196121)

as follows

258 ↗(On Diff #196121)

add space between ')' and 'and'

259 ↗(On Diff #196121)

what does 'unique' mean here?

270 ↗(On Diff #196121)

This definition of 'overlap' captures branch probability well (relatvie), but loses the information on absolute count. There should be an option to measure that.

xur marked 7 inline comments as done.Apr 23 2019, 3:43 PM
xur added inline comments.
llvm/docs/CommandGuide/llvm-profdata.rst
259 ↗(On Diff #196121)

I changed to "unmatched counters (or counters only existing in) "

270 ↗(On Diff #196121)

You are right that it does not computer the absolute counter difference.

The overlap is actually the overlap of distribution of the counts.
I changed the description a little bit to reflect this.

The summary of the overlap does print out the sum of the counts for both profiles. But the absolute counter difference is not the main purpose.
If I want that, I think it should belong to another sub-command, like "diff".

xur updated this revision to Diff 196347.Apr 23 2019, 3:45 PM
xur marked an inline comment as done.

Integrated David's review comments.

davidxl added inline comments.Apr 24 2019, 2:19 PM
llvm/docs/CommandGuide/llvm-profdata.rst
268 ↗(On Diff #196347)

not --> no

286 ↗(On Diff #196347)

-o=output or -o output

llvm/lib/ProfileData/InstrProf.cpp
1210 ↗(On Diff #196347)

For function level dump, the # of function is redundant -- perhaps change it to # of counters

The overall dump format looks better with this:

  1. program level:

    Edge Profile: (overlap, # of functions ) 80%, 10 Incall Profile: (overlap, # of functions) 70%, 10 ...
  2. function level Edge Profile: (overlap, # of counters) 75%, 10 ...
1211 ↗(On Diff #196347)

The header and content does not match in fields.

llvm/lib/ProfileData/InstrProfWriter.cpp
190 ↗(On Diff #196347)

Why is this a writer method?

llvm/tools/llvm-profdata/llvm-profdata.cpp
223 ↗(On Diff #196347)

The title print including the summary can be pushed into FuncOverlap::dump() method

228 ↗(On Diff #196347)

The title can be pushed into FunctionOverlap::dump() method.

xur marked 7 inline comments as done.Apr 24 2019, 3:04 PM
xur added inline comments.
llvm/docs/CommandGuide/llvm-profdata.rst
286 ↗(On Diff #196347)

it does take -output.
may be "-output=output, -output output, -o=output, or -o output"? (it's a little verbose). I actually copied this line from merge and show subcommand

llvm/lib/ProfileData/InstrProf.cpp
1210 ↗(On Diff #196347)

I will change to this format.

add number of counts to function level is a good idea.

1211 ↗(On Diff #196347)

why this does not match?
the first column is the percent score and the second is the number if functions in this category. I don't see the problem.

llvm/lib/ProfileData/InstrProfWriter.cpp
190 ↗(On Diff #196347)

because this reuses the merge framework (load the base profile to write context) and load the second one to do the overlap calculation.

llvm/tools/llvm-profdata/llvm-profdata.cpp
223 ↗(On Diff #196347)

This would require to put a flag in the structure to distinguish whether this is function level or program level. It can be done.

228 ↗(On Diff #196347)

This would require add both profile file name to the structure.

xur updated this revision to Diff 196884.Apr 26 2019, 10:59 AM

Integrated David's review comments. Mainly changed the reporting format.

davidxl added inline comments.Apr 29 2019, 10:05 AM
llvm/include/llvm/ProfileData/InstrProf.h
594 ↗(On Diff #196884)

How to tell if count or percentage is stored here?

597 ↗(On Diff #196884)

The name EdgeCount is misleading -- it sounds like number of 'edges' in CFG. Perhaps just 'TotalCount' or 'CountSum'

612 ↗(On Diff #196884)

BaseSum-->Base

614 ↗(On Diff #196884)

TestSum --> Test

llvm/lib/ProfileData/InstrProf.cpp
482 ↗(On Diff #196884)

This is not 'getCountSums' -- but 'accumuateCounts' -- perhaps rename it.

511 ↗(On Diff #196884)

The loop nest can be merged into one single loop:

while (i != ie && j != je) {

  if (i->v == j->v) {
     //match:
     ...
     i++; j++;
  } else if (i->v > j->v) {
      j++;
      // handle mismatch case -- as one value appears in only one record (assuming count == 0)
     ...
  } else {
      i++;
       // handle mismatch
  }
}
// handle the rest of the mismatched cases.

}

553 ↗(On Diff #196884)

already be computed and is nonzero

560 ↗(On Diff #196884)

Perhaps just check the function hash?

596 ↗(On Diff #196884)

+= or just = ?

llvm/tools/llvm-profdata/llvm-profdata.cpp
212 ↗(On Diff #196884)

-> silently

I don't see a return here.

xur marked 12 inline comments as done.Apr 29 2019, 10:39 AM
xur added inline comments.
llvm/include/llvm/ProfileData/InstrProf.h
594 ↗(On Diff #196884)

The caller should keep track of this. This data structure does not know -- it just a floating points.

llvm/lib/ProfileData/InstrProf.cpp
511 ↗(On Diff #196884)

Yes. The version I used is a copy from merge function.
Your version is better here because it will terminate when one of arrays reaches the end.

553 ↗(On Diff #196884)

Yes. it's true for call chains in this patch.
I put an assert here just to catch others callers that not init TestSum.

560 ↗(On Diff #196884)

FuncHash already being checked (In Reader). This is just to handle the cases that escapes the hash check.

596 ↗(On Diff #196884)

they are the same. the incoming value should be 0 by the initializer.

xur marked 5 inline comments as done.Apr 29 2019, 10:41 AM
xur added inline comments.
llvm/tools/llvm-profdata/llvm-profdata.cpp
212 ↗(On Diff #196884)

fixed

xur updated this revision to Diff 197151.Apr 29 2019, 11:31 AM
xur marked an inline comment as done.

Integrated David's review comments

xur updated this revision to Diff 197161.Apr 29 2019, 12:32 PM

Discussed offline with David. We changed the output to a simpler: "Description: <Value>" format. This tool will mostly used in a script. Simpler format will make text grep cleaner.

davidxl added inline comments.Apr 29 2019, 2:25 PM
llvm/lib/ProfileData/InstrProf.cpp
1240 ↗(On Diff #197161)

"mismatched count percentage(EdgeProfile): " seems more obvious.

1243 ↗(On Diff #197161)

percentage of EdgeProfile only in test_profile:

llvm/lib/ProfileData/InstrProfWriter.cpp
219 ↗(On Diff #197161)

why resetting it to zero?

xur marked 4 inline comments as done.Apr 29 2019, 3:10 PM
xur added inline comments.
llvm/lib/ProfileData/InstrProfWriter.cpp
219 ↗(On Diff #197161)

setting to zero so that the comparison always true in function InstrProfRecord::overlap() (so that we output function level information).
this way we only need to pass the cutoff value rather the filter object.

xur updated this revision to Diff 197204.Apr 29 2019, 3:35 PM
xur marked an inline comment as done.

Adjusted the output string with David's comments.
Also made the naming more uniformed for edge and value profiles.

This revision is now accepted and ready to land.Apr 30 2019, 12:00 PM
This revision was automatically updated to reflect the committed changes.