This is an archive of the discontinued LLVM Phabricator instance.

Fix sysvhash function
ClosedPublic

Authored by urnathan on Apr 9 2023, 12:38 PM.

Details

Summary

I had reason to look at hashSysV, the dynlib hash function. It's a rather awkward formulation -- taken almost straight from the SysV docs. But I noticed it is implemented incorrectly. Specifically it treats the symbol name as a set of plain chars, rather than unsigned chars, as the documented SysV hash treats them. For traditional names formed from a subset of 7-bit ASCII that's not a material difference. But for host ABIs with plain char signed (like x86), this will fail for high-bit chars. And we're in unicode UTF8 world now.[*]

So, use uint8_t, as the hashGnu function does.

Now, the loop is hard to optimize, and this reformulation removes 6 insns from the 16 insn loop on x86. There are 3 changes (all of which are too hard for clang and gcc to spot right now)

  1. the 'if (g)' test is meaningless. We're xoring with g, so if it's zero, that's a nop anyway. As it happens, on x86 we compute the xor anyway and then cmov it. So, just drop it.
  1. the ending 'h ^= g' clears the top 4 bits. But those 4 bits will get shifted out at the start of the next iteration. We can remove sink this mask out of the loop.
  1. the final change is to mask wth 0xf0 after shifting by 24 bits, rather than before by 0xf0<<24, as that increases the likelihood of the ISA being able to use a short immediate mask operand. That's not the case for x86, but plenty of risc ISAs would need to materialize the larger constant somewhere.

I've added unit tests, and verified the hash values against binutil's implementation.

  • Elf itself does not talk about character sets for symbol names -- they're merely NUL-terminated sequences of bytes.

FYI the refactoring will shortly be applied to binutils

Diff Detail

Event Timeline

urnathan created this revision.Apr 9 2023, 12:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2023, 12:38 PM
Herald added a subscriber: pengfei. · View Herald Transcript
urnathan requested review of this revision.Apr 9 2023, 12:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2023, 12:38 PM
urnathan updated this revision to Diff 512048.Apr 9 2023, 5:03 PM
urnathan edited the summary of this revision. (Show Details)Apr 11 2023, 2:30 PM
urnathan added reviewers: MaskRay, jhenderson.

Elf itself does not talk about character sets for symbol names -- they're merely NUL-terminated sequences of bytes.

Elf => ELF


Thanks for the patch! The generic ABI code fragment may have an oversight.
When long represents a 64-bit integer, elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns 0x100000002, larger than UINT32_MAX.
I created a post on https://groups.google.com/g/generic-abi/c/8J_jtjsonrE

llvm/include/llvm/Object/ELF.h
1245

return H & 0x0fffffff;

llvm/unittests/Object/ELFTest.cpp
275

utf8 => UTF-8

Please remember to upload your patches with full context.

I'll leave @MaskRay to do the review of the logic, based on his discussion on the gABI list.

llvm/unittests/Object/ELFTest.cpp
275

Comments should end in a trailing "."

277–278

I'm not sure what this comment is trying to say. What do you mean by "feedback"? Could you reword it a bit, please?

MaskRay added inline comments.Apr 12 2023, 12:49 AM
llvm/unittests/Object/ELFTest.cpp
289

Here is a nice test case (all alphanumeric characters!) to trigger an overflow issue for the generic ABI function. elf_hash((const unsigned char *)"ZZZZZW9p")) == 100000000 (https://maskray.me/blog/2023-04-12-elf-hash-function)

emaste added a subscriber: emaste.Apr 12 2023, 6:55 AM
urnathan marked 5 inline comments as done.Apr 14 2023, 12:47 PM

Thanks for the patch! The generic ABI code fragment may have an oversight.
When long represents a 64-bit integer, elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns 0x100000002, larger than UINT32_MAX.

Yeah, I noticed that too, but the original algorithm was when long was always 32 bits and uint32_t and friends either didn't exist or were very rare. (can't respond to the gabi link you provided).

Anyway, adding your testcase.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 14 2023, 2:15 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.