This pass extends StructuralHash to include much more information about
the IR under analysis. This is done with a flag (Detailed) to configure
the behavior. The detailed behavior is intended for use in expensive
checks and downstream.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 130–138 | This seems like a very random collection of things to add to the hash. Why isn't this just hashing all the operands? That should cover the operand types, the called function and the intrinsic ID. | |
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 130–138 | I was under the impression that it wasn't possible to just hash a value. I can hash the pointer, but I'm not sure that would be correct in all cases (unless everything is uniqued appropriately). https://github.com/llvm/llvm-project/blob/d9cb76bc4d5e903fe045c58a42fc791d0c70172b/llvm/include/llvm/Analysis/IRSimilarityIdentifier.h#L261 implements logic that seems to follow those assumptions (and is a similar implementation to what is here). Definitely could be that my assumptions are incorrect here though. | |
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 130–138 | You are right that we can't "just" hash the operand pointers, but I'd still use that as the general approach. If the operand is a Constant you should be able to hash the pointer as those are uniqued, for Arguments you can take the argument number, and for Instructions you could use only the type for now (to handle those we'd have to number instructions). | |
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 130–138 | Also, re: pointers, my intention with this is to have a hash that is stable across different modules (interested in looking at function deduplication across modules) given the same function which makes just hashing pointers provide incorrect results. I'm planning on writing in support for other constant types in the future and instruction operands through numbering (in follow-up patches). My intention with the Detailed flag currently isn't to be make every semantically-meaningful difference produce a different hash but to capture most of the common cases and be "good enough" at most cases. | |
FWIW, I think more details, even if not perfect, seems reasonable. We will experiment with this and come back to it as needed.
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 123 | early exit, plz. | |
Address reviewer feedback
- Early exit where pointed out.
- Switch to hashing operands for the most part (except for in the case of cmp)
- Add a couple additional operands to hash, particularly integer constants and arguments.
- Rebase against upstream to include changes to prevent test regressions on 32-bit arches.
Fix minor formatting issue
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 130–138 | I've split this out into an updateInstruction and an updateValue function. updateInstruction handles everything related to instructions and then calls updateValue when handling operands. I think it makes things a little bit clearer as it avoids any reasoning about recursion depending upon if the value is an instruction or not. I've added a couple additional cases when handling operands (rather than just hashing the type) that aren't meant to be exhaustive. | |
| 143 | Yes. The MergeFunctions pass is now using this hash after https://reviews.llvm.org/D158217 and the original hashing implementation did not include additional details (as I believe the intention was to not hash all details) and then use the FunctionComparator class where necessary. | |
this on its own lg, but I have inline comments
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 48 | we really shouldn't have to re-implement llvm::hash_value. IIUC the reason we're not is because 32-bit vs 64-bit hashes are different which causes MergeFunctions to act differently because it sorts hashes. but IMO MergeFunctions shouldn't be sorting by hash, that's kinda bad. I think we should go back to the original version of https://reviews.llvm.org/D158217 and fix MergeFunctions to be more deterministic and not rely on hash values. It seems like we're just currently lucky in that the hashes for test cases in MergeFunctions are ordered consistently between 32/64 bit. | |
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 48 | I agree that it shouldn't be sorting by hash, but I'd rather not touch that if I don't have to. When I relanded that differential in 64da0be1fc06ee2199bd27c980736986e0eccd9d I updated it to use what MergeFunctions did originally and use hash_16_bytes along with a uint64_t rather than size_t (forgot to update the differential with the changes). The MergeFunction test was consistent before https://reviews.llvm.org/D158217 as they just directly called hash_16_bytes and stored everything as a uint64_t. | |
Update patch
- Switch to native hashing implementations for APInt/APFloat
- Switch from hashing type pointers to hashing some details about the type as the former is non-deterministic (which of course is frustratingly masked by my system having ASLR disabled). Left a TODO to refactor the type hashing into a more formal hash_value implementation within the class.
- Formatting changes.
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 48 | Sorry for not reading your message close enough initially. I missed basically the entire point. I've switched over to using the native hash_value implementations for APInt and APFloat to avoid the reimplementation. I think within StructuralHash we can guarantee the same hash between 32 bit and 64 bit platforms outside of the detailed hash case, but inside of the detailed hash case we don't make that guarantee. It would be good to order by something other than hash within MergeFunctions, but the ordering needs to be deterministic and ideally fast and hashes make quite a bit of sense for that (other than the significant churn when they change). I'm thinking keeping the determinism between 64-bit/32-bit makes sense to preserve the MergeFunctions semantics (at least for now) for the non-detailed case and the detailed case can take advantage of hash_value implementations where appropriate. What do you think? | |
Address reviewer feedback
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 51 | Yes. For some reason I was thinking that there was.a generic floating point type in the enum so did them separately. Thanks for the catch! | |
Looks fine. Please test EXPENSIVE_ CHECKS to make sure this doesn't find new missing invalidations.
| llvm/lib/IR/StructuralHash.cpp | ||
|---|---|---|
| 40 | I don't think we should add hash_value implementations to uniqued types. It encourages misuse. | |
Looks like the changes caught a regression:
-- Testing: 1 tests, 1 workers -- FAIL: LLVM :: CodeGen/X86/statepoint-stack-usage.ll (1 of 1) ******************** TEST 'LLVM :: CodeGen/X86/statepoint-stack-usage.ll' FAILED ******************** Script: -- : 'RUN: at line 1'; /tmp/llvm-expensive-checks/bin/llc -verify-machineinstrs -stack-symbol-ordering=0 < /tmp/llvm-project/llvm/test/CodeGen/X86/statepoint-stack-usage.ll | /tmp/llvm-expensive-checks/bin/FileCheck /tmp/llvm-project/llvm/test/CodeGen/X86/statepoint-stack-usage.ll -- Exit Code: 2 Command Output (stderr): -- + : 'RUN: at line 1' + /tmp/llvm-expensive-checks/bin/llc -verify-machineinstrs -stack-symbol-ordering=0 + /tmp/llvm-expensive-checks/bin/FileCheck /tmp/llvm-project/llvm/test/CodeGen/X86/statepoint-stack-usage.ll Pass modifies its input and doesn't report it: CodeGen Prepare Pass modifies its input and doesn't report it UNREACHABLE executed at /tmp/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1441! PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: /tmp/llvm-expensive-checks/bin/llc -verify-machineinstrs -stack-symbol-ordering=0 1. Running pass 'Function Pass Manager' on module '<stdin>'. 2. Running pass 'CodeGen Prepare' on function '@back_to_back_calls' #0 0x00005555580db4c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/tmp/llvm-expensive-checks/bin/llc+0x2b874c8) #1 0x00005555580d8f7e llvm::sys::RunSignalHandlers() (/tmp/llvm-expensive-checks/bin/llc+0x2b84f7e) #2 0x00005555580dbca8 SignalHandler(int) Signals.cpp:0:0 #3 0x0000155554fd9520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520) #4 0x000015555502da7c pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x96a7c) #5 0x0000155554fd9476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476) #6 0x0000155554fbf7f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3) #7 0x0000555558038700 llvm::install_out_of_memory_new_handler() (/tmp/llvm-expensive-checks/bin/llc+0x2ae4700) #8 0x0000555557922b42 (/tmp/llvm-expensive-checks/bin/llc+0x23ceb42) #9 0x000055555792b232 llvm::FPPassManager::runOnModule(llvm::Module&) (/tmp/llvm-expensive-checks/bin/llc+0x23d7232) #10 0x00005555579233fa llvm::legacy::PassManagerImpl::run(llvm::Module&) (/tmp/llvm-expensive-checks/bin/llc+0x23cf3fa) #11 0x0000555556a5e757 main (/tmp/llvm-expensive-checks/bin/llc+0x150a757) #12 0x0000155554fc0d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90) #13 0x0000155554fc0e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40) #14 0x0000555556a58c65 _start (/tmp/llvm-expensive-checks/bin/llc+0x1504c65) FileCheck error: '<stdin>' is empty. FileCheck command line: /tmp/llvm-expensive-checks/bin/FileCheck /tmp/llvm-project/llvm/test/CodeGen/X86/statepoint-stack-usage.ll -- ******************** ******************** Failed Tests (1): LLVM :: CodeGen/X86/statepoint-stack-usage.ll
I'll file an issue and take a look and land this change after this gets fixed (or I find a bug in my code here).
Any opposition to landing this with the detailed parameter set to false in the PM changed checks temporarily while https://github.com/llvm/llvm-project/issues/64938 is fixed (and then enable detailed structural hashing for change status expensive checks once that is fixed)?
clang-format not found in user’s local PATH; not linting file.