This is an archive of the discontinued LLVM Phabricator instance.

[NFCi][MergeFunctions] Consolidate Hashing Functions
ClosedPublic

Authored by aidengrossman on Aug 17 2023, 2:07 PM.

Details

Summary

A couple years ago, StructuralHash was created, copying the exact
hashing implementation from FunctionComparator (minus a couple small
details/refactorings). Since then, the hashing implementation has not
diverged, but several other areas, like unit testing, have diverged
significantly, with StructuralHash getting more attention in these
areas. This patch aims to consolidate the two hashing functions into
StructuralHash given they do the exact same thing and having less
divergence in areas like unit testing would be beneficial.

The original aim at creating a separate StructuralHash was to make the
implementation divergent and capture additional details like instruction
operands (which neither hashing implementation does currently). The
MergeFunctions pass doesn't need these detaisl, but verification of pass
return values would benefit from this additional data. Setting an option
to calculate these values would allow for divergent behavior where
appropriate while reducing code duplication with little runtime
overhead.

Diff Detail

Event Timeline

aidengrossman created this revision.Aug 17 2023, 2:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2023, 2:07 PM
aidengrossman requested review of this revision.Aug 17 2023, 2:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2023, 2:07 PM

I think it's important to add comments in StructuralHash that different users of it may want different behavior, so in the future if that's desired we should be able to add flags to tune the behavior of the hashing

but otherwise this seems fine

Address reviewer feedback

  • Add comment that different users might want different behavior so options

should be preferred.

aeubanks accepted this revision.Aug 17 2023, 3:29 PM

thanks!

This revision is now accepted and ready to land.Aug 17 2023, 3:29 PM
This revision was landed with ongoing or failed builds.Aug 18 2023, 1:19 PM
This revision was automatically updated to reflect the committed changes.
mgorny added a subscriber: mgorny.Aug 19 2023, 3:43 AM

This change has caused a test regression on 32-bit hosts (reproduced on ARM and x86):

FAIL: LLVM :: Transforms/MergeFunc/inline-asm.ll (14 of 62)
******************** TEST 'LLVM :: Transforms/MergeFunc/inline-asm.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/mgorny/llvm-project/build/bin/opt -passes=mergefunc -S < /home/mgorny/llvm-project/llvm/test/Transforms/MergeFunc/inline-asm.ll | /home/mgorny/llvm-project/build/bin/FileCheck /home/mgorny/llvm-project/llvm/test/Transforms/MergeFunc/inline-asm.ll
--
Exit Code: 1

Command Output (stderr):
--
/home/mgorny/llvm-project/llvm/test/Transforms/MergeFunc/inline-asm.ll:9:16: error: CHECK-LABEL: expected string not found in input
; CHECK-LABEL: @int_ptr_arg_same
               ^
<stdin>:35:26: note: scanning from here
define void @int_ptr_null() {
                         ^
<stdin>:36:19: note: possible intended match here
 tail call void @float_ptr_null()
                  ^

Input file: <stdin>
Check file: /home/mgorny/llvm-project/llvm/test/Transforms/MergeFunc/inline-asm.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          30: define void @int_ptr_arg_same(ptr %0) { 
          31:  tail call void @float_ptr_arg_same(ptr %0) 
          32:  ret void 
          33: } 
          34:  
          35: define void @int_ptr_null() { 
label:9'0                              X~~~~ error: no match found
          36:  tail call void @float_ptr_null() 
label:9'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
label:9'1                       ?                possible intended match
          37:  ret void 
label:9'0     ~~~~~~~~~~
          38: } 
label:9'0     ~~
>>>>>>

--

FWICS the functions are now output in different order depending on the arch.

Thanks for bringing this to my attention! Not sure how I completely missed the buildbot notifications. I've reverted for now so that I can work on a fix out-of-trunk and will resubmit once I have this fixed (and after more thorough testing).

It looks like hash_code stores everything as size_t which causes the difference between the 32 bit and 64 bit platforms. The functions end up in different places as MergeFunctions ends up ordering them by hash. I've switched the hashing a little bit (mainly using uint64_t and switching to hash_16_bytes from hash_combine to get rid of the size_t) so that things are consistent across platforms. I've also updated the magic constant as the function header to prevent any unnecessary churn in the test diffs and I've verified that behavior is consistent between 64 and 32 bit builds now.

Relanded in 64da0be1fc06ee2199bd27c980736986e0eccd9d.

Thank you. I'm going to try again later today.

I've just confirmed that the tests pass on 32-bit and 64-bit x86 now. Thanks, again!