To avoid hash collisions when the only difference is in mem ops.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
3,770 ms | linux > Clang.OpenMP::declare_mapper_codegen.cpp | |
8,870 ms | linux > Clang.OpenMP::target_update_codegen.cpp |
Event Timeline
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
648 | 2 bits for ValueSites[IPVK_MemOPSize].size(). Is that sufficient? |
Rong has pointed out that having 2 bits may not be enough -- also instead of using | operator, add is better.
Example: for a comdat function with two copies: one has 4 memop sites, while the other has zero. Assuming both has one icall site and cfg and selects are the same, then their resulting hash will still clash if Or operator is used.
Another example, similar to the above, but one has 5 sites, the other has 1 sites -- the resulting hash is also the same. Even though both copies will have vp data statically allocated, we may end up with out of bound access if the inline instance that assumes 5 sites is using the profv with only 1 site statically allocated.
I think we should at least change the '|' to +.
Better yet, just use JamRC to compute the higher 32bit hash using the four values.
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
646–647 | The comment looks stale. |
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
646–647 | I think the 4 bits are still reserved for CSPGO. See https://reviews.llvm.org/rG6cdf3d8086fe. |
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
627–628 | To make sure I understand: we use two CRCs instead of one CRC because we want to extract low/high 32 bits for debugging purposes? Otherwise, we can just use one CRC. |
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
627–628 | JamCRC's result is 32-bit, but we have 64-bit (actually 60-bits) of spare room. Using more bits should decrease the chance of hash collisions. |
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
647–674 | Actually I think it'd be better to shift by 28 here rather than throwing out the top 4 bits for reservation. Also, adding would be better than or'ing. Will update. |
We may also need to bump both the raw and index format version with this change. For the profile use side, we also need to keep the hashing scheme of the older version (in profile-use side). More details to come.
Many tests can also be enhanced to filter out the hash details if they are not part the main test.
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
627–628 | JamCRC produces a 32bit value. |
On version bump --- looks like FrontPGO has been bumping version for hashing changes while IR PGO had not, so it is ok to leave the version unchanged (current situation is also confusing as the version of IR and FE PGO are mixed).
On the other hand, just in case, we need to introduce a temporary internal option for people to be able to keep on using old profile (with older scheme of hashing) for a while. Basically, under the option, the hashing scheme falls back to the old way.
other than that, looks fine.
Many tests can also be enhanced to filter out the hash details if they are not part the main test.
Done.
How exactly? Do you mean commit the non-test part without flipping the flag and then flip it with the test changes?
changes like in llvm/test/Transforms/PGOProfile/PR41279.ll etc can be independently committed.
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
266 | Nit: descriptions usually don't end with a period (I know that some descriptions in this file are not consistent) |
llvm/test/Transforms/PGOProfile/Inputs/multiple_hash_profile.proftext | ||
---|---|---|
1 | We can convert this test case to cover the new option introduced: basically create another profile entry for _Z3fooi with the old hash. Change the counter value a little and expect the profile-use match the old entry when the option is used. |
Address comment.
llvm/test/Transforms/PGOProfile/Inputs/multiple_hash_profile.proftext | ||
---|---|---|
1 | Done. |
lgtm (with the small test enhancement)
llvm/test/Transforms/PGOProfile/Inputs/multiple_hash_profile.proftext | ||
---|---|---|
10 | nit: change 12 to a different value say 6.(otherwise if the option is an nop, the test will still succeed). | |
llvm/test/Transforms/PGOProfile/multiple_hash_profile.ll | ||
3 ↗ | (On Diff #281708) | Use a different prefix like CHECKOLD |
33 ↗ | (On Diff #281708) | CHECKOLD expects a different weight. |
lgtm
Just realized that we need a test case to show it fixes the original issue (existence with memop --> different hash). Ok as a follow up .
Builtbot failure: http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/51984
I think it's an endian issue in the hash computation.
Will revert.
right. It occurred to me during review, but did not think of the hard coded
values in proftext depends on LE.
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
661 | #include "llvm/Support/Endian.h" and use support::endian::write64le(&Data.N, (uint64_t)SIVisitor.getNumOfSelectInsts()) |
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
661 | Sorry, just use a plain uint8_t C[8]; and support::endian::write64le(C, (uint64_t)SIVisitor.getNumOfSelectInsts()) |
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
661 | I went for the same style (using a vector) as the above code (line 642). |
Here's the diff in the latest update:
< + union { < + uint64_t N; < + uint8_t C[8]; < + } Data; < + Data.N = (uint64_t)SIVisitor.getNumOfSelectInsts(); < + JCH.update(Data.C); < + Data.N = (uint64_t)ValueSites[IPVK_IndirectCallTarget].size(); < + JCH.update(Data.C); < + Data.N = (uint64_t)ValueSites[IPVK_MemOPSize].size(); < + JCH.update(Data.C); < + Data.N = (uint64_t)MST.AllEdges.size(); < + JCH.update(Data.C); --- > + auto updateJCH = [&JCH](uint64_t Num) { > + std::vector<uint8_t> Data; > + for (unsigned I = 0; I < 8; ++I) > + Data.push_back((uint8_t)(Num >> (I * 8))); > + JCH.update(Data); > + }; > + updateJCH((uint64_t)SIVisitor.getNumOfSelectInsts()); > + updateJCH((uint64_t)ValueSites[IPVK_IndirectCallTarget].size()); > + updateJCH((uint64_t)ValueSites[IPVK_MemOPSize].size()); > + updateJCH((uint64_t)MST.AllEdges.size());
Latest update diff:
< + std::vector<uint8_t> Data; < + for (unsigned I = 0; I < 8; ++I) < + Data.push_back((uint8_t)(Num >> (I * 8))); --- > + uint8_t Data[8]; > + support::endian::write64le(Data, Num);
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
661 | Now uses write64le. |
Relanded as 3d6f53018f845e893ad34f64ff2851a2e5c3ba1d
@yamauchi For a reland, you still need to include Differential Revision: so that the differential can be closed automatically when you push the commit.
It is also a convention to explain what has changed compared with the initial land.
As a hindsight, the option -pgo-instr-old-cfg-hashing should probably have been mentioned in the description.
I think it'd be good to have them, if possible, though it's a latent, non-recent bug.
They're hard to apply without 50da55a58534e9207d8d5e31c8b4b5cf0c624175. Do you think we should take that also? Or maybe this can wait since it's not a new bug.
Nit: descriptions usually don't end with a period
(I know that some descriptions in this file are not consistent)