This is an archive of the discontinued LLVM Phabricator instance.

[IR] Add more details to StructuralHash
ClosedPublic

Authored by aidengrossman on Aug 17 2023, 11:21 PM.

Details

Summary

This pass extends StructuralHash to include much more information about
the IR under analysis. This is done with a flag (Detailed) to configure
the behavior. The detailed behavior is intended for use in expensive
checks and downstream.

Diff Detail

Event Timeline

aidengrossman created this revision.Aug 17 2023, 11:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2023, 11:21 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
aidengrossman requested review of this revision.Aug 17 2023, 11:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2023, 11:21 PM
nikic added inline comments.Aug 18 2023, 12:06 AM
llvm/lib/IR/StructuralHash.cpp
129–137

This seems like a very random collection of things to add to the hash. Why isn't this just hashing all the operands? That should cover the operand types, the called function and the intrinsic ID.

aidengrossman added inline comments.Aug 18 2023, 12:44 AM
llvm/lib/IR/StructuralHash.cpp
129–137

I was under the impression that it wasn't possible to just hash a value. I can hash the pointer, but I'm not sure that would be correct in all cases (unless everything is uniqued appropriately).

https://github.com/llvm/llvm-project/blob/d9cb76bc4d5e903fe045c58a42fc791d0c70172b/llvm/include/llvm/Analysis/IRSimilarityIdentifier.h#L261 implements logic that seems to follow those assumptions (and is a similar implementation to what is here).

Definitely could be that my assumptions are incorrect here though.

nikic added inline comments.Aug 18 2023, 2:24 AM
llvm/lib/IR/StructuralHash.cpp
129–137

You are right that we can't "just" hash the operand pointers, but I'd still use that as the general approach. If the operand is a Constant you should be able to hash the pointer as those are uniqued, for Arguments you can take the argument number, and for Instructions you could use only the type for now (to handle those we'd have to number instructions).

aidengrossman added inline comments.Aug 18 2023, 2:31 PM
llvm/lib/IR/StructuralHash.cpp
129–137

Also, re: pointers, my intention with this is to have a hash that is stable across different modules (interested in looking at function deduplication across modules) given the same function which makes just hashing pointers provide incorrect results.

I'm planning on writing in support for other constant types in the future and instruction operands through numbering (in follow-up patches). My intention with the Detailed flag currently isn't to be make every semantically-meaningful difference produce a different hash but to capture most of the common cases and be "good enough" at most cases.

aeubanks added inline comments.Aug 21 2023, 10:46 AM
llvm/lib/IR/StructuralHash.cpp
129–137

we should split out a update(Value*) (which doesn't recursively look at operands) and hash the operands of the current instruction as well as the current instruction itself

144

is there anywhere we don't want this new functionality?

FWIW, I think more details, even if not perfect, seems reasonable. We will experiment with this and come back to it as needed.

llvm/lib/IR/StructuralHash.cpp
122

early exit, plz.

aidengrossman marked 3 inline comments as done.

Address reviewer feedback

  • Early exit where pointed out.
  • Switch to hashing operands for the most part (except for in the case of cmp)
  • Add a couple additional operands to hash, particularly integer constants and arguments.
  • Rebase against upstream to include changes to prevent test regressions on 32-bit arches.

Fix minor formatting issue

llvm/lib/IR/StructuralHash.cpp
129–137

I've split this out into an updateInstruction and an updateValue function. updateInstruction handles everything related to instructions and then calls updateValue when handling operands. I think it makes things a little bit clearer as it avoids any reasoning about recursion depending upon if the value is an instruction or not.

I've added a couple additional cases when handling operands (rather than just hashing the type) that aren't meant to be exhaustive.

144

Yes. The MergeFunctions pass is now using this hash after https://reviews.llvm.org/D158217 and the original hashing implementation did not include additional details (as I believe the intention was to not hash all details) and then use the FunctionComparator class where necessary.

aeubanks accepted this revision.Aug 21 2023, 3:20 PM

this on its own lg, but I have inline comments

llvm/lib/IR/StructuralHash.cpp
48

we really shouldn't have to re-implement llvm::hash_value. IIUC the reason we're not is because 32-bit vs 64-bit hashes are different which causes MergeFunctions to act differently because it sorts hashes. but IMO MergeFunctions shouldn't be sorting by hash, that's kinda bad. I think we should go back to the original version of https://reviews.llvm.org/D158217 and fix MergeFunctions to be more deterministic and not rely on hash values. It seems like we're just currently lucky in that the hashes for test cases in MergeFunctions are ordered consistently between 32/64 bit.

This revision is now accepted and ready to land.Aug 21 2023, 3:20 PM
aidengrossman added inline comments.Aug 21 2023, 3:50 PM
llvm/lib/IR/StructuralHash.cpp
48

I agree that it shouldn't be sorting by hash, but I'd rather not touch that if I don't have to. When I relanded that differential in 64da0be1fc06ee2199bd27c980736986e0eccd9d I updated it to use what MergeFunctions did originally and use hash_16_bytes along with a uint64_t rather than size_t (forgot to update the differential with the changes). The MergeFunction test was consistent before https://reviews.llvm.org/D158217 as they just directly called hash_16_bytes and stored everything as a uint64_t.

Update patch

  • Switch to native hashing implementations for APInt/APFloat
  • Switch from hashing type pointers to hashing some details about the type as the former is non-deterministic (which of course is frustratingly masked by my system having ASLR disabled). Left a TODO to refactor the type hashing into a more formal hash_value implementation within the class.
  • Formatting changes.
aidengrossman added inline comments.Aug 21 2023, 11:36 PM
llvm/lib/IR/StructuralHash.cpp
48

Sorry for not reading your message close enough initially. I missed basically the entire point. I've switched over to using the native hash_value implementations for APInt and APFloat to avoid the reimplementation. I think within StructuralHash we can guarantee the same hash between 32 bit and 64 bit platforms outside of the detailed hash case, but inside of the detailed hash case we don't make that guarantee. It would be good to order by something other than hash within MergeFunctions, but the ordering needs to be deterministic and ideally fast and hashes make quite a bit of sense for that (other than the significant churn when they change).

I'm thinking keeping the determinism between 64-bit/32-bit makes sense to preserve the MergeFunctions semantics (at least for now) for the non-detailed case and the detailed case can take advantage of hash_value implementations where appropriate. What do you think?

nikic added inline comments.Aug 23 2023, 3:41 AM
llvm/lib/IR/StructuralHash.cpp
37

Call hash(ToAdd) instead?

51

Aren't these cases covered by the type ID hash?

72

Use auto with dyn_cast.

73

Hrm, apparently we don't have a public API to get subclass data in a generic way :/

76

Loop over operands().

aidengrossman marked 5 inline comments as done.

Address reviewer feedback

llvm/lib/IR/StructuralHash.cpp
51

Yes. For some reason I was thinking that there was.a generic floating point type in the enum so did them separately. Thanks for the catch!

nikic accepted this revision.Aug 23 2023, 11:49 AM

Looks fine. Please test EXPENSIVE_ CHECKS to make sure this doesn't find new missing invalidations.

llvm/lib/IR/StructuralHash.cpp
40

I don't think we should add hash_value implementations to uniqued types. It encourages misuse.

Address reviewer feedback

  • Remove TODO comment
aidengrossman marked an inline comment as done.Aug 23 2023, 12:09 PM

Looks like the changes caught a regression:

-- Testing: 1 tests, 1 workers --
FAIL: LLVM :: CodeGen/X86/statepoint-stack-usage.ll (1 of 1)
******************** TEST 'LLVM :: CodeGen/X86/statepoint-stack-usage.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /tmp/llvm-expensive-checks/bin/llc -verify-machineinstrs -stack-symbol-ordering=0 < /tmp/llvm-project/llvm/test/CodeGen/X86/statepoint-stack-usage.ll | /tmp/llvm-expensive-checks/bin/FileCheck /tmp/llvm-project/llvm/test/CodeGen/X86/statepoint-stack-usage.ll
--
Exit Code: 2
Command Output (stderr):
--
+ : 'RUN: at line 1'
+ /tmp/llvm-expensive-checks/bin/llc -verify-machineinstrs -stack-symbol-ordering=0
+ /tmp/llvm-expensive-checks/bin/FileCheck /tmp/llvm-project/llvm/test/CodeGen/X86/statepoint-stack-usage.ll
Pass modifies its input and doesn't report it: CodeGen Prepare
Pass modifies its input and doesn't report it
UNREACHABLE executed at /tmp/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1441!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /tmp/llvm-expensive-checks/bin/llc -verify-machineinstrs -stack-symbol-ordering=0
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'CodeGen Prepare' on function '@back_to_back_calls'
 #0 0x00005555580db4c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/tmp/llvm-expensive-checks/bin/llc+0x2b874c8)
 #1 0x00005555580d8f7e llvm::sys::RunSignalHandlers() (/tmp/llvm-expensive-checks/bin/llc+0x2b84f7e)
 #2 0x00005555580dbca8 SignalHandler(int) Signals.cpp:0:0
 #3 0x0000155554fd9520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x000015555502da7c pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x96a7c)
 #5 0x0000155554fd9476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #6 0x0000155554fbf7f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #7 0x0000555558038700 llvm::install_out_of_memory_new_handler() (/tmp/llvm-expensive-checks/bin/llc+0x2ae4700)
 #8 0x0000555557922b42 (/tmp/llvm-expensive-checks/bin/llc+0x23ceb42)
 #9 0x000055555792b232 llvm::FPPassManager::runOnModule(llvm::Module&) (/tmp/llvm-expensive-checks/bin/llc+0x23d7232)
#10 0x00005555579233fa llvm::legacy::PassManagerImpl::run(llvm::Module&) (/tmp/llvm-expensive-checks/bin/llc+0x23cf3fa)
#11 0x0000555556a5e757 main (/tmp/llvm-expensive-checks/bin/llc+0x150a757)
#12 0x0000155554fc0d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#13 0x0000155554fc0e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#14 0x0000555556a58c65 _start (/tmp/llvm-expensive-checks/bin/llc+0x1504c65)
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /tmp/llvm-expensive-checks/bin/FileCheck /tmp/llvm-project/llvm/test/CodeGen/X86/statepoint-stack-usage.ll
--
********************
********************
Failed Tests (1):
  LLVM :: CodeGen/X86/statepoint-stack-usage.ll

I'll file an issue and take a look and land this change after this gets fixed (or I find a bug in my code here).

aeubanks accepted this revision.Aug 23 2023, 1:05 PM

Any opposition to landing this with the detailed parameter set to false in the PM changed checks temporarily while https://github.com/llvm/llvm-project/issues/64938 is fixed (and then enable detailed structural hashing for change status expensive checks once that is fixed)?

Any opposition to landing this with the detailed parameter set to false in the PM changed checks temporarily while https://github.com/llvm/llvm-project/issues/64938 is fixed (and then enable detailed structural hashing for change status expensive checks once that is fixed)?

that sounds fine with a TODO

This revision was automatically updated to reflect the committed changes.