This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/Object/
-
llvm/
-
Object/
1/1
ELF.h
-
unittests/Object/
-
Object/
4/4
ELFTest.cpp

Differential D147890

Fix sysvhash function
ClosedPublic

Authored by urnathan on Apr 9 2023, 12:38 PM.

Download Raw Diff

Details

Reviewers

MaskRay
jhenderson

Commits

rG8bdbebbecdd5: [ELF] Fix SysV hash function.

Summary

I had reason to look at hashSysV, the dynlib hash function. It's a rather awkward formulation -- taken almost straight from the SysV docs. But I noticed it is implemented incorrectly. Specifically it treats the symbol name as a set of plain chars, rather than unsigned chars, as the documented SysV hash treats them. For traditional names formed from a subset of 7-bit ASCII that's not a material difference. But for host ABIs with plain char signed (like x86), this will fail for high-bit chars. And we're in unicode UTF8 world now.[*]

So, use uint8_t, as the hashGnu function does.

Now, the loop is hard to optimize, and this reformulation removes 6 insns from the 16 insn loop on x86. There are 3 changes (all of which are too hard for clang and gcc to spot right now)

the 'if (g)' test is meaningless. We're xoring with g, so if it's zero, that's a nop anyway. As it happens, on x86 we compute the xor anyway and then cmov it. So, just drop it.

the ending 'h ^= g' clears the top 4 bits. But those 4 bits will get shifted out at the start of the next iteration. We can remove sink this mask out of the loop.

the final change is to mask wth 0xf0 after shifting by 24 bits, rather than before by 0xf0<<24, as that increases the likelihood of the ISA being able to use a short immediate mask operand. That's not the case for x86, but plenty of risc ISAs would need to materialize the larger constant somewhere.

I've added unit tests, and verified the hash values against binutil's implementation.

Elf itself does not talk about character sets for symbol names -- they're merely NUL-terminated sequences of bytes.

FYI the refactoring will shortly be applied to binutils

Diff Detail

Event Timeline

urnathan created this revision.Apr 9 2023, 12:38 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2023, 12:38 PM

Herald added a subscriber: pengfei. · View Herald Transcript

urnathan requested review of this revision.Apr 9 2023, 12:38 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2023, 12:38 PM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Harbormaster completed remote builds in B224481: Diff 512031.Apr 9 2023, 1:22 PM

urnathan updated this revision to Diff 512048.Apr 9 2023, 5:03 PM

Harbormaster completed remote builds in B224496: Diff 512048.Apr 9 2023, 5:20 PM

urnathan edited the summary of this revision. (Show Details)Apr 11 2023, 2:30 PM

urnathan added reviewers: MaskRay, jhenderson.

Elf itself does not talk about character sets for symbol names -- they're merely NUL-terminated sequences of bytes.

Elf => ELF

Thanks for the patch! The generic ABI code fragment may have an oversight.
When long represents a 64-bit integer, elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns 0x100000002, larger than UINT32_MAX.
I created a post on https://groups.google.com/g/generic-abi/c/8J_jtjsonrE

llvm/include/llvm/Object/ELF.h
1245	`return H & 0x0fffffff;`
llvm/unittests/Object/ELFTest.cpp
275	utf8 => UTF-8

Please remember to upload your patches with full context.

I'll leave @MaskRay to do the review of the logic, based on his discussion on the gABI list.

llvm/unittests/Object/ELFTest.cpp
275	Comments should end in a trailing "."
277–278	I'm not sure what this comment is trying to say. What do you mean by "feedback"? Could you reword it a bit, please?

MaskRay added inline comments.Apr 12 2023, 12:49 AM

llvm/unittests/Object/ELFTest.cpp
289	Here is a nice test case (all alphanumeric characters!) to trigger an overflow issue for the generic ABI function. `elf_hash((const unsigned char *)"ZZZZZW9p")) == 100000000` (https://maskray.me/blog/2023-04-12-elf-hash-function)

emaste added a subscriber: emaste.Apr 12 2023, 6:55 AM

In D147890#4260002, @MaskRay wrote:

Thanks for the patch! The generic ABI code fragment may have an oversight.
When long represents a 64-bit integer, elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns 0x100000002, larger than UINT32_MAX.

Yeah, I noticed that too, but the original algorithm was when long was always 32 bits and uint32_t and friends either didn't exist or were very rare. (can't respond to the gabi link you provided).

Anyway, adding your testcase.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 14 2023, 2:15 PM

This revision was landed with ongoing or failed builds.

Closed by commit rG8bdbebbecdd5: [ELF] Fix SysV hash function. (authored by urnathan). · Explain Why

This revision was automatically updated to reflect the committed changes.

urnathan added a commit: rG8bdbebbecdd5: [ELF] Fix SysV hash function..

Revision Contents

Path

Size

llvm/

include/

llvm/

Object/

ELF.h

13 lines

unittests/

Object/

ELFTest.cpp

11 lines

Diff 512031

llvm/include/llvm/Object/ELF.h

Show First 20 Lines • Show All 1,231 Lines • ▼ Show 20 Lines	return createError("a section " + getSecIndexForError(*this, Section) +
"section name string table");		"section name string table");
return StringRef(DotShstrtab.data() + Offset);		return StringRef(DotShstrtab.data() + Offset);
}		}

/// This function returns the hash value for a symbol in the .dynsym section		/// This function returns the hash value for a symbol in the .dynsym section
/// Name of the API remains consistent as specified in the libelf		/// Name of the API remains consistent as specified in the libelf
/// REF : http://www.sco.com/developers/gabi/latest/ch5.dynamic.html#hash		/// REF : http://www.sco.com/developers/gabi/latest/ch5.dynamic.html#hash
inline unsigned hashSysV(StringRef SymbolName) {		inline unsigned hashSysV(StringRef SymbolName) {
unsigned h = 0, g;		uint32_t H = 0;
for (char C : SymbolName) {		for (uint8_t C : SymbolName) {
h = (h << 4) + C;		H = (H << 4) + C;
g = h & 0xf0000000L;		H ^= (H >> 24) & 0xf0;
if (g != 0)
h ^= g >> 24;
h &= ~g;
}		}
return h;		return H & 0xfffffff;
		MaskRayUnsubmitted Done Reply Inline Actions `return H & 0x0fffffff;` MaskRay: `return H & 0x0fffffff;`
}		}

/// This function returns the hash value for a symbol in the .dynsym section		/// This function returns the hash value for a symbol in the .dynsym section
/// for the GNU hash table. The implementation is defined in the GNU hash ABI.		/// for the GNU hash table. The implementation is defined in the GNU hash ABI.
/// REF : https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elf.c#l222		/// REF : https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elf.c#l222
inline uint32_t hashGnu(StringRef Name) {		inline uint32_t hashGnu(StringRef Name) {
uint32_t H = 5381;		uint32_t H = 5381;
for (uint8_t C : Name)		for (uint8_t C : Name)
H = (H << 5) + H + C;		H = (H << 5) + H + C;
return H;		return H;
}		}

} // end namespace object		} // end namespace object
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_OBJECT_ELF_H		#endif // LLVM_OBJECT_ELF_H

llvm/unittests/Object/ELFTest.cpp

Show First 20 Lines • Show All 265 Lines • ▼ Show 20 Lines	TEST(ELFTest, DataRegionTest) {
// data end pointers.		// data end pointers.
Region = {Data.data(), Data.data() + Data.size()};		Region = {Data.data(), Data.data() + Data.size()};

CheckOperator(Region);		CheckOperator(Region);
const char *ErrMsg2 = "can't read past the end of the file";		const char *ErrMsg2 = "can't read past the end of the file";
EXPECT_THAT_ERROR(Region[3].takeError(), FailedWithMessage(ErrMsg2));		EXPECT_THAT_ERROR(Region[3].takeError(), FailedWithMessage(ErrMsg2));
EXPECT_THAT_ERROR(Region[4].takeError(), FailedWithMessage(ErrMsg2));		EXPECT_THAT_ERROR(Region[4].takeError(), FailedWithMessage(ErrMsg2));
}		}

		// Test the sysV and the gnu hash functions, particularly with utf8 unicode
		MaskRayUnsubmitted Done Reply Inline Actions utf8 => UTF-8 MaskRay: utf8 => UTF-8
		jhendersonUnsubmitted Done Reply Inline Actions Comments should end in a trailing "." jhenderson: Comments should end in a trailing "."
		TEST(ELFTest, Hash) {
		// Symbols long enough to have feedback
		EXPECT_EQ(hashSysV("FooBarBazToto"), 0x5ec3e8fU);
		jhendersonUnsubmitted Done Reply Inline Actions I'm not sure what this comment is trying to say. What do you mean by "feedback"? Could you reword it a bit, please? jhenderson: I'm not sure what this comment is trying to say. What do you mean by "feedback"? Could you…
		EXPECT_EQ(hashGnu("FooBarBazToto"), 0x5478be61U);
		EXPECT_EQ(hashSysV(u8"boom💥pants"), 0x5a0cf53U);
		EXPECT_EQ(hashGnu(u8"boom💥pants"), 0xf5dda2deU);
		EXPECT_EQ(hashSysV(u8"woot!🧙 💑 🌈"), 0x3522e38U);
		EXPECT_EQ(hashGnu(u8"woot!🧙 💑 🌈"), 0xf7603f3U);
		}
		MaskRayUnsubmitted Done Reply Inline Actions Here is a nice test case (all alphanumeric characters!) to trigger an overflow issue for the generic ABI function. `elf_hash((const unsigned char )"ZZZZZW9p")) == 100000000` (https://maskray.me/blog/2023-04-12-elf-hash-function) MaskRay:* Here is a nice test case (all alphanumeric characters!) to trigger an overflow issue for the…