This is an archive of the discontinued LLVM Phabricator instance.

[PR24643] Speeding up FoldingSetNodeID::AddPointer
AbandonedPublic

Authored by bcraig on Sep 1 2015, 9:59 AM.

Download Raw Diff

Details

Reviewers

klimek
krememek
bkramer

Summary

The old code resulted in placement new calls on Linux64, and memmove
calls on Windows. The new code gets to use memcpy instead. This
change makes an analysis of a large .c file on Linux64 go from
6m57.730s to 6m44.317s.

This effectively reverts r135364.
The commit log for that change is...

Simplify & microoptimize code. No intended functionality change.

It doesn't include any rationale or metrics regarding the speedup.

Diff Detail

Event Timeline

bcraig updated this revision to Diff 33706.Sep 1 2015, 9:59 AM

bcraig retitled this revision from to [PR24643] Speeding up FoldingSetNodeID::AddPointer.

bcraig updated this object.

bcraig added reviewers: chandlerc, krememek.

bcraig added a subscriber: llvm-commits.

ping

bcraig added reviewers: bkramer, klimek.Sep 16 2015, 8:12 AM

This is very strange. What compiler are you using to compile LLVM?

In trunk, SmallVector<unsigned>::append will always use memcpy, never memmove. It also has a constant size so the copy should be lowered into a single movq on x86_64. I verified this with clang -O3 and gcc 5.1 -O3, both generate optimal code for append and messy code for AddInteger.

What's possible is that all your pointers fit into 32 bits so the overall footprint of the FoldingSetNodeID is reduced (AddInteger truncates to 32 bits if the integer fits). Can you confirm that?

I am building with Visual Studio 2013, targeting x86 (not x64).
I am also building with clang 3.4, targeting Linux x64.

The memmove call was only on the Win32 builds, and it was definitely a result of the append call. It was the top 'inclusive time' function for the toy program that I was profiling. MSVC's std::uninitialized_copy eventually turns into memmove.

I will dig in a bit more and see if my typical pointers are small.

I can confirm that on my machine, the addresses that were typically passed to AddPointer were "small". They tended to fit in 32-bits. Stack addresses tended to be large.

I will also say that the generated assembly on Linux for my patch is significantly larger / more complicated than the original code. Despite the increased complexity, it does seem to run faster. There's also the substantial benefits that I get on Windows by avoiding the memmove.

Ping. Hoping to get another round of review on this since I've added my investigation notes.

Is there any further change or investigation necessary for this patch?

chandlerc removed a reviewer: chandlerc.Apr 6 2016, 11:03 PM

bcraig abandoned this revision.Jul 5 2016, 10:43 AM

Revision Contents

Path

Size

lib/

Support/

FoldingSet.cpp

12 lines

Diff 33706

lib/Support/FoldingSet.cpp

Context not available.
	// depend on the host. It doesn't matter, however, because hashing on	// depend on the host. It doesn't matter, however, because hashing on
	// pointer values is inherently unstable. Nothing should depend on the	// pointer values is inherently unstable. Nothing should depend on the
	// ordering of nodes in the folding set.	// ordering of nodes in the folding set.
	Bits.append(reinterpret_cast<unsigned *>(&Ptr),	uintptr_t casted = reinterpret_cast<uintptr_t>(Ptr);
	reinterpret_cast<unsigned *>(&Ptr+1));	if (sizeof(uintptr_t) == sizeof(unsigned))
		AddInteger(unsigned(casted));
		else if (sizeof(uintptr_t) == sizeof(unsigned long))
		AddInteger((unsigned long)casted);
		else if (sizeof(uintptr_t) == sizeof(unsigned long long))
		AddInteger((unsigned long long)casted);
		else
		llvm_unreachable("unexpected pointer size");
	}	}

	void FoldingSetNodeID::AddInteger(signed I) {	void FoldingSetNodeID::AddInteger(signed I) {
	Bits.push_back(I);	Bits.push_back(I);
	}	}
Context not available.