This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Detect more structural changes with the stable hash
ClosedPublic

Authored by vsk on Oct 30 2017, 7:48 PM.

Details

Summary

Lifting from Bob Wilson's notes: The hash value that we compute and
store in PGO profile data to detect out-of-date profiles does not
include enough information. This means that many significant changes to
the source will not cause compiler warnings about the profile being out
of date, and worse, we may continue to use the outdated profile data to
make bad optimization decisions. There is some tension here because
some source changes won't affect PGO and we don't want to invalidate the
profile unnecessarily.

This patch adds a new hashing scheme which is more sensitive to loop
nesting, conditions, and out-of-order control flow. Here are examples
which show snippets which get the same hash under the current scheme,
and different hashes under the new scheme:

Loop Nesting Example

// Snippet 1
while (foo()) {
  while (bar()) {}
}

// Snippet 2
while (foo()) {}
while (bar()) {}

Condition Example

// Snippet 1
if (foo())
  bar();
baz();

// Snippet 2
if (foo())
  bar();
else
  baz();

Out-of-order Control Flow Example

// Snippet 1
while (foo()) {
  if (bar()) {}
  baz();
}

// Snippet 2
while (foo()) {
  if (bar())
    continue;
  baz();
}

In each of these cases, it's useful to differentiate between the
snippets because swapping their profiles gives bad optimization hints.

The new hashing scheme considers some logical operators in an effort to
detect more changes in conditions. This isn't a perfect scheme. E.g, it
does not produce the same hash for these equivalent snippets:

// Snippet 1
bool c = !a || b;
if (d && e) {}

// Snippet 2
bool f = d && e;
bool c = !a || b;
if (f) {}

This would require an expensive data flow analysis. Short of that, the
new hashing scheme looks reasonably complete, based on a scan over the
statements we place counters on.

Profiles which use the old version of the PGO hash remain valid and can
be used without issue (there are tests in tree which check this).

rdar://17068282

Diff Detail

Event Timeline

vsk created this revision.Oct 30 2017, 7:48 PM
arphaman added inline comments.Oct 31 2017, 9:31 AM
lib/CodeGen/CodeGenPGO.cpp
186

Am I missing something or will this call always return the same type as before since the first if will weed out statements which have hash only in v2 (since getHashType(PGO_HASH_V1, S) will return None for the new statements)?

bob.wilson edited edge metadata.Oct 31 2017, 9:47 AM

I'm excited to see some progress on this, but since there is overhead to adding a new hashing scheme, I think we should do more before introducing a new scheme. One of the problems with the previous scheme is that is did not take into account nesting. Distinguishing an if-statement from an if-else statement is good but we also need to distinguish what code is inside the then-block, else-block, and the code afterward. As it stands, I believe the following are all hashed the same, even with this patch:

Loop after the if-else:

if (condition1())
  x = 1;
else
  x = 2;
while (condition2()) body();

Loop in the else-block:

if (condition1())
  x = 1;
else
  while (condition2()) body();

Loop in the then-block:

if (condition1()) {
  while (condition2()) body();
} else
  x = 2

It would be good to fix that.

vsk planned changes to this revision.Oct 31 2017, 9:51 AM

Apart from Alex's point, this patch is missing a compatibility test for v5 of the profile format, and is missing coverage for the new entries in HashType.

lib/CodeGen/CodeGenPGO.cpp
186

You're right, I need to rethink this. At the moment, the hash only changes for IfStmts which have Else blocks.

vsk updated this revision to Diff 121224.Nov 1 2017, 6:16 PM
vsk marked 2 inline comments as done.
vsk edited the summary of this revision. (Show Details)
  • Handle loop nesting, conditions, and out-of-order control flow.
  • Improve test coverage. Add a format compatibility test, and check that functions which were previously hashed the same way get different hashes now.
vsk updated this revision to Diff 122368.Nov 9 2017, 4:42 PM
vsk edited the summary of this revision. (Show Details)
  • Consider logical nots and some binary comparison operators in the hash, per an offline conversation with @bogner
arphaman edited edge metadata.Nov 10 2017, 3:44 PM

Generally it LG

lib/CodeGen/CodeGenPGO.cpp
229

What about ObjCAtTryStmt?

255

Might as well include the ObjCAtThrowStmt as well.

vsk added inline comments.Nov 10 2017, 3:59 PM
lib/CodeGen/CodeGenPGO.cpp
229

I thought about this, but: we don't place a counter on ObjC at-try's, so it seems a bit moot to invalidate the profile if we see one change.

255

If profiling at-try isn't very important, I'm not sure the hash should be sensitive to at-throw either. I do see situations where it could be helpful, but IMO the potential benefit is outweighed by having extra invalidations.

arphaman accepted this revision.Nov 10 2017, 4:02 PM
This revision is now accepted and ready to land.Nov 10 2017, 4:02 PM
vsk added a comment.Nov 14 2017, 3:09 PM

Thanks for the review Alex. There hasn't been any more feedback so I'll commit this soon.

This revision was automatically updated to reflect the committed changes.