This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profdata] Add overlap command to compute similarity b/w two profile files
ClosedPublic

Authored by xur on Apr 22 2019, 12:19 PM.

Details

Diff Detail

Event Timeline

xur created this revision.Apr 22 2019, 12:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2019, 12:19 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
xur updated this revision to Diff 196118.Apr 22 2019, 12:21 PM

Minor fix of the comments.

xur updated this revision to Diff 196121.Apr 22 2019, 12:27 PM

Removed some un-intended changes.

davidxl added inline comments.Apr 23 2019, 10:59 AM
llvm/docs/CommandGuide/llvm-profdata.rst
240

match underscore length with word.

250

show --> overlap

253

as follows

258

add space between ')' and 'and'

259

what does 'unique' mean here?

270

This definition of 'overlap' captures branch probability well (relatvie), but loses the information on absolute count. There should be an option to measure that.

xur marked 7 inline comments as done.Apr 23 2019, 3:43 PM
xur added inline comments.
llvm/docs/CommandGuide/llvm-profdata.rst
259

I changed to "unmatched counters (or counters only existing in) "

270

You are right that it does not computer the absolute counter difference.

The overlap is actually the overlap of distribution of the counts.
I changed the description a little bit to reflect this.

The summary of the overlap does print out the sum of the counts for both profiles. But the absolute counter difference is not the main purpose.
If I want that, I think it should belong to another sub-command, like "diff".

xur updated this revision to Diff 196347.Apr 23 2019, 3:45 PM
xur marked an inline comment as done.

Integrated David's review comments.

davidxl added inline comments.Apr 24 2019, 2:19 PM
llvm/docs/CommandGuide/llvm-profdata.rst
268

not --> no

286

-o=output or -o output

llvm/lib/ProfileData/InstrProf.cpp
1211

For function level dump, the # of function is redundant -- perhaps change it to # of counters

The overall dump format looks better with this:

  1. program level:

    Edge Profile: (overlap, # of functions ) 80%, 10 Incall Profile: (overlap, # of functions) 70%, 10 ...
  2. function level Edge Profile: (overlap, # of counters) 75%, 10 ...
1212

The header and content does not match in fields.

llvm/lib/ProfileData/InstrProfWriter.cpp
190

Why is this a writer method?

llvm/tools/llvm-profdata/llvm-profdata.cpp
223

The title print including the summary can be pushed into FuncOverlap::dump() method

228

The title can be pushed into FunctionOverlap::dump() method.

xur marked 7 inline comments as done.Apr 24 2019, 3:04 PM
xur added inline comments.
llvm/docs/CommandGuide/llvm-profdata.rst
286

it does take -output.
may be "-output=output, -output output, -o=output, or -o output"? (it's a little verbose). I actually copied this line from merge and show subcommand

llvm/lib/ProfileData/InstrProf.cpp
1211

I will change to this format.

add number of counts to function level is a good idea.

1212

why this does not match?
the first column is the percent score and the second is the number if functions in this category. I don't see the problem.

llvm/lib/ProfileData/InstrProfWriter.cpp
190

because this reuses the merge framework (load the base profile to write context) and load the second one to do the overlap calculation.

llvm/tools/llvm-profdata/llvm-profdata.cpp
223

This would require to put a flag in the structure to distinguish whether this is function level or program level. It can be done.

228

This would require add both profile file name to the structure.

xur updated this revision to Diff 196884.Apr 26 2019, 10:59 AM

Integrated David's review comments. Mainly changed the reporting format.

davidxl added inline comments.Apr 29 2019, 10:05 AM
llvm/include/llvm/ProfileData/InstrProf.h
594

How to tell if count or percentage is stored here?

597

The name EdgeCount is misleading -- it sounds like number of 'edges' in CFG. Perhaps just 'TotalCount' or 'CountSum'

612

BaseSum-->Base

614

TestSum --> Test

llvm/lib/ProfileData/InstrProf.cpp
482

This is not 'getCountSums' -- but 'accumuateCounts' -- perhaps rename it.

511

The loop nest can be merged into one single loop:

while (i != ie && j != je) {

  if (i->v == j->v) {
     //match:
     ...
     i++; j++;
  } else if (i->v > j->v) {
      j++;
      // handle mismatch case -- as one value appears in only one record (assuming count == 0)
     ...
  } else {
      i++;
       // handle mismatch
  }
}
// handle the rest of the mismatched cases.

}

553

already be computed and is nonzero

560

Perhaps just check the function hash?

596

+= or just = ?

llvm/tools/llvm-profdata/llvm-profdata.cpp
212

-> silently

I don't see a return here.

xur marked 12 inline comments as done.Apr 29 2019, 10:39 AM
xur added inline comments.
llvm/include/llvm/ProfileData/InstrProf.h
594

The caller should keep track of this. This data structure does not know -- it just a floating points.

llvm/lib/ProfileData/InstrProf.cpp
511

Yes. The version I used is a copy from merge function.
Your version is better here because it will terminate when one of arrays reaches the end.

553

Yes. it's true for call chains in this patch.
I put an assert here just to catch others callers that not init TestSum.

560

FuncHash already being checked (In Reader). This is just to handle the cases that escapes the hash check.

596

they are the same. the incoming value should be 0 by the initializer.

xur marked 5 inline comments as done.Apr 29 2019, 10:41 AM
xur added inline comments.
llvm/tools/llvm-profdata/llvm-profdata.cpp
212

fixed

xur updated this revision to Diff 197151.Apr 29 2019, 11:31 AM
xur marked an inline comment as done.

Integrated David's review comments

xur updated this revision to Diff 197161.Apr 29 2019, 12:32 PM

Discussed offline with David. We changed the output to a simpler: "Description: <Value>" format. This tool will mostly used in a script. Simpler format will make text grep cleaner.

davidxl added inline comments.Apr 29 2019, 2:25 PM
llvm/lib/ProfileData/InstrProf.cpp
1240

"mismatched count percentage(EdgeProfile): " seems more obvious.

1243

percentage of EdgeProfile only in test_profile:

llvm/lib/ProfileData/InstrProfWriter.cpp
219

why resetting it to zero?

xur marked 4 inline comments as done.Apr 29 2019, 3:10 PM
xur added inline comments.
llvm/lib/ProfileData/InstrProfWriter.cpp
219

setting to zero so that the comparison always true in function InstrProfRecord::overlap() (so that we output function level information).
this way we only need to pass the cutoff value rather the filter object.

xur updated this revision to Diff 197204.Apr 29 2019, 3:35 PM
xur marked an inline comment as done.

Adjusted the output string with David's comments.
Also made the naming more uniformed for edge and value profiles.

This revision is now accepted and ready to land.Apr 30 2019, 12:00 PM
This revision was automatically updated to reflect the committed changes.