This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Include the mem ops into the function hash.
ClosedPublic

Authored by hjyamauchi on Jul 28 2020, 11:40 AM.

Details

Summary

To avoid hash collisions when the only difference is in mem ops.

Diff Detail

Event Timeline

hjyamauchi created this revision.Jul 28 2020, 11:40 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 28 2020, 11:40 AM
Herald added subscribers: llvm-commits, Restricted Project, hiraditya. · View Herald Transcript
hjyamauchi requested review of this revision.Jul 28 2020, 11:40 AM
MaskRay added inline comments.Jul 28 2020, 11:58 AM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
648

2 bits for ValueSites[IPVK_MemOPSize].size(). Is that sufficient?
This deserves a comment.

Rong has pointed out that having 2 bits may not be enough -- also instead of using | operator, add is better.

Example: for a comdat function with two copies: one has 4 memop sites, while the other has zero. Assuming both has one icall site and cfg and selects are the same, then their resulting hash will still clash if Or operator is used.

Another example, similar to the above, but one has 5 sites, the other has 1 sites -- the resulting hash is also the same. Even though both copies will have vp data statically allocated, we may end up with out of bound access if the inline instance that assumes 5 sites is using the profv with only 1 site statically allocated.

I think we should at least change the '|' to +.

Better yet, just use JamRC to compute the higher 32bit hash using the four values.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
646–647

The comment looks stale.

Use JamCRC.

hjyamauchi added inline comments.Jul 28 2020, 2:37 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
646–647

I think the 4 bits are still reserved for CSPGO. See https://reviews.llvm.org/rG6cdf3d8086fe.

MaskRay added inline comments.Jul 28 2020, 2:38 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
627–628

To make sure I understand: we use two CRCs instead of one CRC because we want to extract low/high 32 bits for debugging purposes?

Otherwise, we can just use one CRC.

hjyamauchi added inline comments.Jul 28 2020, 2:47 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
627–628

JamCRC's result is 32-bit, but we have 64-bit (actually 60-bits) of spare room. Using more bits should decrease the chance of hash collisions.

hjyamauchi added inline comments.Jul 28 2020, 2:50 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
647–671

Actually I think it'd be better to shift by 28 here rather than throwing out the top 4 bits for reservation. Also, adding would be better than or'ing. Will update.

We may also need to bump both the raw and index format version with this change. For the profile use side, we also need to keep the hashing scheme of the older version (in profile-use side). More details to come.

Many tests can also be enhanced to filter out the hash details if they are not part the main test.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
627–628

JamCRC produces a 32bit value.

On version bump --- looks like FrontPGO has been bumping version for hashing changes while IR PGO had not, so it is ok to leave the version unchanged (current situation is also confusing as the version of IR and FE PGO are mixed).

On the other hand, just in case, we need to introduce a temporary internal option for people to be able to keep on using old profile (with older scheme of hashing) for a while. Basically, under the option, the hashing scheme falls back to the old way.

other than that, looks fine.

Shift the higher 32 bits by 28 bits and add.

Many tests can also be enhanced to filter out the hash details if they are not part the main test.

Done.

Add a flag to back out to the old hashing.

On version bump --- looks like FrontPGO has been bumping version for hashing changes while IR PGO had not, so it is ok to leave the version unchanged (current situation is also confusing as the version of IR and FE PGO are mixed).

On the other hand, just in case, we need to introduce a temporary internal option for people to be able to keep on using old profile (with older scheme of hashing) for a while. Basically, under the option, the hashing scheme falls back to the old way.

other than that, looks fine.

Thanks. Done.

Can you split out the test changes and commit it separately?

Rebase (converting one more test) and fixing a typo and whitespace.

Can you split out the test changes and commit it separately?

How exactly? Do you mean commit the non-test part without flipping the flag and then flip it with the test changes?

changes like in llvm/test/Transforms/PGOProfile/PR41279.ll etc can be independently committed.

MaskRay added inline comments.Jul 28 2020, 9:35 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
266

Nit: descriptions usually don't end with a period

(I know that some descriptions in this file are not consistent)

Address comment.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
266

Done.

changes like in llvm/test/Transforms/PGOProfile/PR41279.ll etc can be independently committed.

Uploaded D84865 for this.

davidxl added inline comments.Jul 29 2020, 11:31 AM
llvm/test/Transforms/PGOProfile/Inputs/multiple_hash_profile.proftext
1

We can convert this test case to cover the new option introduced:

basically create another profile entry for _Z3fooi with the old hash. Change the counter value a little and expect the profile-use match the old entry when the option is used.

Address comment.

llvm/test/Transforms/PGOProfile/Inputs/multiple_hash_profile.proftext
1

Done.

davidxl accepted this revision.Jul 29 2020, 1:06 PM

lgtm (with the small test enhancement)

llvm/test/Transforms/PGOProfile/Inputs/multiple_hash_profile.proftext
10

nit: change 12 to a different value say 6.(otherwise if the option is an nop, the test will still succeed).

llvm/test/Transforms/PGOProfile/multiple_hash_profile.ll
3

Use a different prefix like CHECKOLD

33–36

CHECKOLD expects a different weight.

This revision is now accepted and ready to land.Jul 29 2020, 1:06 PM
MaskRay accepted this revision.Jul 29 2020, 1:26 PM

LGTM as well with @davidxl's comments addressed.

Address comments. PTAL.

llvm/test/Transforms/PGOProfile/Inputs/multiple_hash_profile.proftext
10

Done.

llvm/test/Transforms/PGOProfile/multiple_hash_profile.ll
3

Done.

33–36

Done.

davidxl accepted this revision.Jul 29 2020, 1:33 PM

lgtm

Just realized that we need a test case to show it fixes the original issue (existence with memop --> different hash). Ok as a follow up .

This revision was landed with ongoing or failed builds.Jul 29 2020, 2:00 PM
This revision was automatically updated to reflect the committed changes.

Builtbot failure: http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/51984

I think it's an endian issue in the hash computation.

Will revert.

hjyamauchi reopened this revision.Jul 29 2020, 3:06 PM
This revision is now accepted and ready to land.Jul 29 2020, 3:06 PM

right. It occurred to me during review, but did not think of the hard coded
values in proftext depends on LE.

MaskRay added inline comments.Jul 29 2020, 3:33 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
661

#include "llvm/Support/Endian.h"

and use support::endian::write64le(&Data.N, (uint64_t)SIVisitor.getNumOfSelectInsts())

MaskRay added inline comments.Jul 29 2020, 3:34 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
661

Sorry, just use a plain uint8_t C[8]; and support::endian::write64le(C, (uint64_t)SIVisitor.getNumOfSelectInsts())

Fix the endianness issue.

hjyamauchi added inline comments.Jul 29 2020, 4:40 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
661

I went for the same style (using a vector) as the above code (line 642).

Here's the diff in the latest update:

< +    union {
< +      uint64_t N;
< +      uint8_t C[8];
< +    } Data;
< +    Data.N = (uint64_t)SIVisitor.getNumOfSelectInsts();
< +    JCH.update(Data.C);
< +    Data.N = (uint64_t)ValueSites[IPVK_IndirectCallTarget].size();
< +    JCH.update(Data.C);
< +    Data.N = (uint64_t)ValueSites[IPVK_MemOPSize].size();
< +    JCH.update(Data.C);
< +    Data.N = (uint64_t)MST.AllEdges.size();
< +    JCH.update(Data.C);
---
> +    auto updateJCH = [&JCH](uint64_t Num) {
> +      std::vector<uint8_t> Data;
> +      for (unsigned I = 0; I < 8; ++I)
> +        Data.push_back((uint8_t)(Num >> (I * 8)));
> +      JCH.update(Data);
> +    };
> +    updateJCH((uint64_t)SIVisitor.getNumOfSelectInsts());
> +    updateJCH((uint64_t)ValueSites[IPVK_IndirectCallTarget].size());
> +    updateJCH((uint64_t)ValueSites[IPVK_MemOPSize].size());
> +    updateJCH((uint64_t)MST.AllEdges.size());

Use endian::write64le.

Latest update diff:

< +      std::vector<uint8_t> Data;
< +      for (unsigned I = 0; I < 8; ++I)
< +        Data.push_back((uint8_t)(Num >> (I * 8)));
---
> +      uint8_t Data[8];
> +      support::endian::write64le(Data, Num);
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
661

Now uses write64le.

MaskRay closed this revision.Jul 30 2020, 10:10 AM

Relanded as 3d6f53018f845e893ad34f64ff2851a2e5c3ba1d

@yamauchi For a reland, you still need to include Differential Revision: so that the differential can be closed automatically when you push the commit.

It is also a convention to explain what has changed compared with the initial land.

As a hindsight, the option -pgo-instr-old-cfg-hashing should probably have been mentioned in the description.

Yup, -pgo-instr-old-cfg-hashing=true to revert back to the old hashing.

I think it'd be good to have them, if possible, though it's a latent, non-recent bug.

hans added a comment.Aug 5 2020, 8:30 AM
In D84782#2191429, @yamauchi wrote:

I think it'd be good to have them, if possible, though it's a latent, non-recent bug.

They're hard to apply without 50da55a58534e9207d8d5e31c8b4b5cf0c624175. Do you think we should take that also? Or maybe this can wait since it's not a new bug.

Via Conduit