I had reason to look at hashSysV, the dynlib hash function. It's a rather awkward formulation -- taken almost straight from the SysV docs. But I noticed it is implemented incorrectly. Specifically it treats the symbol name as a set of plain chars, rather than unsigned chars, as the documented SysV hash treats them. For traditional names formed from a subset of 7-bit ASCII that's not a material difference. But for host ABIs with plain char signed (like x86), this will fail for high-bit chars. And we're in unicode UTF8 world now.[*]
So, use uint8_t, as the hashGnu function does.
Now, the loop is hard to optimize, and this reformulation removes 6 insns from the 16 insn loop on x86. There are 3 changes (all of which are too hard for clang and gcc to spot right now)
- the 'if (g)' test is meaningless. We're xoring with g, so if it's zero, that's a nop anyway. As it happens, on x86 we compute the xor anyway and then cmov it. So, just drop it.
- the ending 'h ^= g' clears the top 4 bits. But those 4 bits will get shifted out at the start of the next iteration. We can remove sink this mask out of the loop.
- the final change is to mask wth 0xf0 after shifting by 24 bits, rather than before by 0xf0<<24, as that increases the likelihood of the ISA being able to use a short immediate mask operand. That's not the case for x86, but plenty of risc ISAs would need to materialize the larger constant somewhere.
I've added unit tests, and verified the hash values against binutil's implementation.
- Elf itself does not talk about character sets for symbol names -- they're merely NUL-terminated sequences of bytes.
FYI the refactoring will shortly be applied to binutils
return H & 0x0fffffff;