This is an archive of the discontinued LLVM Phabricator instance.

[esan|cfrag] Compute the struct field access difference ratio
ClosedPublic

Authored by zhaoqin on Jun 2 2016, 8:42 AM.

Details

Summary

Computes the struct field access variation based on each field access
count.

Adds a flag to control the report thresholds.

Updates struct-simple.cpp with variance report output.

Diff Detail

Repository
rL LLVM

Event Timeline

zhaoqin updated this revision to Diff 59387.Jun 2 2016, 8:42 AM
zhaoqin retitled this revision from to [esan|cfrag] Compute the struct field access variance.
zhaoqin updated this object.
zhaoqin added a reviewer: aizatsky.
zhaoqin added subscribers: bruening, kcc, vitalybuka and 3 others.
zhaoqin updated this revision to Diff 59443.Jun 2 2016, 12:27 PM

Rebase to the ToT, and apply clang-format

aizatsky added inline comments.Jun 2 2016, 1:41 PM
lib/esan/cache_frag.cpp
44 ↗(On Diff #59443)

Is variance a well-established term here? If not, we need to find something else.
Variance has a very strict meaning in my mind: "variance is the expectation of the squared deviation of a random variable from its mean".

62 ↗(On Diff #59443)

As far as I can see you compute this:

Log[t, v1 / (v2 + 1) / t]

Is this right?

This is equivalent to:

Log[t, v1/(v2+1)]-1

So lets:

a) document this formula
b) maybe get rid of this -1? Would simplify things.

62 ↗(On Diff #59443)

If my assumption about formula is right, then how about logratio as a metric name?

64 ↗(On Diff #59443)

Let's add Threshold as a parameter.

66 ↗(On Diff #59443)

Why + 1? Because of 0 values?

What do you imagine computeVarianceScore(v1, 0, t) look like?

108 ↗(On Diff #59443)

I'm not sure that adding log-ratios together is a good thing to do. Do you have any reasoning why you do it this way?

zhaoqin updated this revision to Diff 59464.Jun 2 2016, 3:10 PM
zhaoqin marked 4 inline comments as done.

PTAL

lib/esan/cache_frag.cpp
62 ↗(On Diff #59443)

changed the algorithm a bit with an added weight, not sure if logratio is still suitable.

62 ↗(On Diff #59443)

add comment about the formula.
The reason I have -1 is to make the score 0 if the two values are similar, so it is still 0 even with adding some weight, add comment.

66 ↗(On Diff #59443)

Yes, it is for handling 0.
for computeVarianceScore(v1, 0, t), it should be roughly the same as computeVarianceScore(v1, 1, t)
The rationale of computeVarianceScore is to evaluate how far apart the two value are.
so the variance score for (v1, 0) should be closest to the score of (v1, 1).
Use condition instead.

108 ↗(On Diff #59443)

You are right, I add weight in the computeVarianceScore, so it is no longer a simple log-ratios.

zhaoqin added inline comments.Jun 2 2016, 3:13 PM
lib/esan/cache_frag.cpp
44 ↗(On Diff #59464)

oops, missed this comment, how about just Difference?

zhaoqin updated this revision to Diff 59466.Jun 2 2016, 3:17 PM

Rename variance to difference

aizatsky added inline comments.Jun 2 2016, 3:40 PM
lib/esan/cache_frag.cpp
62 ↗(On Diff #59466)

Before we move to the weight, let's discuss logarithm base for a while.

Obviously:

log(T, V1/V2) = log2(v1/v2) / log2(T)

So changing T would just change a constant factor for all measurements. Why not statically fix it?

Maybe just use log2(v1/v2) without base? you could use __builtin_clzll for that.

63 ↗(On Diff #59466)

As for weight, can you please share your thinking why do you need it?

66 ↗(On Diff #59466)

Is there a "-1" now?

68 ↗(On Diff #59466)

Let's rename Threshold to Base.

73 ↗(On Diff #59466)

if (Val2 > Val1) { swap(Val1, Val2); }

You won't need the Weight then.

80 ↗(On Diff #59466)

In any case I suggest you extract a simple FastLog function.

zhaoqin marked 6 inline comments as done.Jun 3 2016, 9:19 AM
zhaoqin added inline comments.
lib/esan/cache_frag.cpp
62 ↗(On Diff #59466)

The major reason is to have a runtime flag for T is that I am not sure what's the right way to evaluate the variation of the struct field counts.
I want the overall score for the variation having a few features:

  1. The bigger the difference is, the higher the score should be: e.g., { 2^20, 1, 2^20 } should have higher score than { 2^10, 2^8, 2^10, 2^8, 2^10, 2^8, ... }
  2. The higher the count value is, the higher the score should be: e.g., { 2^20, 2^10 } should have higher score than {2^10, 1}.

It looks like I should not use log to compute at all, which gets rid of the count factor, maybe a simple ratio should work, e.g., V1/V2/Base

63 ↗(On Diff #59466)

For two pairs of value: (2^20, 2^10), and (2^10, 1), I want to say (2^20, 2^10) worth more attention, and so should have a higher score.
Now use V1/V2/Base, so do not need weight.

66 ↗(On Diff #59466)

I used the while (Ratio > Threshold) to make the score 0, probably not a good idea, make the code look different from the algo. Update it.

80 ↗(On Diff #59466)

stop using log

zhaoqin marked 4 inline comments as done.Jun 3 2016, 9:20 AM
zhaoqin updated this revision to Diff 59571.Jun 3 2016, 9:21 AM

Use v1/v2/base to calculate the difference score.

aizatsky requested changes to this revision.Jun 3 2016, 11:40 AM
aizatsky edited edge metadata.
aizatsky added inline comments.
lib/esan/cache_frag.cpp
45 ↗(On Diff #59571)

Let's call it ratio then.

54 ↗(On Diff #59571)

If we are sticking to a ratio, then Sum(ratios) doesn't make lots of sense.

This is the whole-program metric, right?

How about considering (one of) the following metrics (Ci - counts, Ri - ratio):

  1. Sum(Abs(Ci - C{i+1}))? - Total "Assymetry". You can divide it by TotalCount to get a relative metric.
  1. Max(Ri) (Mean, Median, StdDev, GeomMean)

BTW if you want this in sooner, I suggest you remove TotalDifference from this CL altogether, and let's figure it out in a separate one?

63 ↗(On Diff #59571)

The only reason to use log is to make huge differences look smaller. I suspect that this is not your case - you want huge differences look huge, because CPUs are linear. I approve the use of the linear scale.

70 ↗(On Diff #59571)

Unless there is an overflow concern (which there shouldn't be, right?), I suggest storing the unscaled ratio. You can always introduce scaling when you print.

This revision now requires changes to proceed.Jun 3 2016, 11:40 AM
zhaoqin marked 3 inline comments as done.Jun 3 2016, 12:11 PM
zhaoqin added inline comments.
lib/esan/cache_frag.cpp
54 ↗(On Diff #59571)

Removed.

63 ↗(On Diff #59571)

ack

zhaoqin marked an inline comment as done.Jun 3 2016, 12:11 PM
zhaoqin updated this object.Jun 3 2016, 12:14 PM
zhaoqin edited edge metadata.
zhaoqin retitled this revision from [esan|cfrag] Compute the struct field access variance to [esan|cfrag] Compute the struct field access difference ratio.
zhaoqin updated this revision to Diff 59596.Jun 3 2016, 12:15 PM
zhaoqin edited edge metadata.

PTAL

zhaoqin updated this revision to Diff 59597.Jun 3 2016, 12:20 PM
zhaoqin edited edge metadata.

update the comment about computing the difference ratio.

aizatsky accepted this revision.Jun 3 2016, 1:19 PM
aizatsky edited edge metadata.
This revision is now accepted and ready to land.Jun 3 2016, 1:19 PM
This revision was automatically updated to reflect the committed changes.