This is an archive of the discontinued LLVM Phabricator instance.

Avoid insertion sorting each new range in MemCpyOptimizer
Closed, Public

Authored by inolen on Jul 14 2015, 2:39 PM.


Details

Reviewers

chandlerc
nicholas

Summary

While working on a project, I wound up generating a fairly large lookup table (10k entries) of callbacks inside a static constructor. Clang was taking upwards of 10 minutes to compile the lookup table. I generated a smaller test case (http://www.inolen.com/static_initializer_test.ll) that, when compiled with -ftime-report, pointed at GlobalOpt and MemCpyOptimizer.

Running memcpyopt through opt accounted for about 1 minute. The main culprit was MemCpyOptimizer insertion-sorting the ranges as it discovered them in tryMergingIntoMemset. I've changed this so that ranges are always appended to the list, and once they've all been added they're sorted and merged (O(n log n) vs. O(n^2)).

I'm not really sure who to tag as a reviewer; Lang mentioned that Chandler may be appropriate.
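
A minimal sketch of the append-then-sort idea described in this summary (the first revision of the patch, not the version that eventually landed). The `Range` struct and the `appendRange` and `sortAndMerge` helpers are hypothetical stand-ins, and `std::vector` is assumed in place of the pass's real container; the actual MemsetRange also carries the stores, pointer, and alignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MemsetRange: a half-open byte range [Start, End).
struct Range {
  int64_t Start;
  int64_t End;
};

// Append without maintaining any order: amortized O(1) per range.
inline void appendRange(std::vector<Range> &Ranges, int64_t Start,
                        int64_t Size) {
  assert(Size >= 0 && "range size must be non-negative");
  Ranges.push_back({Start, Start + Size});
}

// Sort once by Start (O(n log n)), then coalesce overlapping or adjacent
// ranges in a single linear pass.
inline std::vector<Range> sortAndMerge(std::vector<Range> Ranges) {
  std::sort(Ranges.begin(), Ranges.end(),
            [](const Range &L, const Range &R) { return L.Start < R.Start; });
  std::vector<Range> Merged;
  for (const Range &R : Ranges) {
    if (!Merged.empty() && R.Start <= Merged.back().End)
      Merged.back().End = std::max(Merged.back().End, R.End);
    else
      Merged.push_back(R);
  }
  return Merged;
}
```

The trade-off is that every discovered range is stored until the final pass, which is the memory concern raised in the review comments.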

Diff Detail

Repository: rL LLVM

Event Timeline

inolen updated this revision to Diff 29712. Jul 14 2015, 2:39 PM

inolen retitled this revision from "" to "Avoid insertion sorting each new range in MemCpyOptimizer".

inolen updated this object.

inolen added a reviewer: chandlerc.

inolen set the repository for this revision to rL LLVM.

inolen added a subscriber: llvm-commits.

nmusgrave mentioned this in D11251: updated tests for correct commit, concerning D11198. Jul 15 2015, 4:59 PM

nmusgrave mentioned this in rL242365: updated tests for correct commit, concerning D11198. Jul 15 2015, 5:26 PM

I'm worried about the memory usage of storing all of them. It feels like you traded CPU time for more memory when you didn't need to.

The problem was that it did a linear scan to find the range with the nearest start value. That list is already stored sorted. Why not just use bisection (std::lower_bound) to find it, saving CPU time without increasing memory usage?
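
A minimal sketch of the bisection approach suggested here, assuming the ranges are kept sorted, disjoint, and non-adjacent in a random-access container. The `Range` struct is a hypothetical stand-in for MemsetRange and `std::vector` stands in for whatever container the pass uses; only the comparator shape mirrors the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MemsetRange: a half-open byte range [Start, End).
struct Range {
  int64_t Start;
  int64_t End;
};

// Invariant assumed on entry and restored on exit: Ranges is sorted by Start
// and no two ranges overlap or touch.
inline void addRange(std::vector<Range> &Ranges, int64_t Start, int64_t Size) {
  assert(Size >= 0 && "range size must be non-negative");
  int64_t End = Start + Size;

  // Binary search for the first range whose End is >= Start.
  auto I = std::lower_bound(
      Ranges.begin(), Ranges.end(), Start,
      [](const Range &LHS, int64_t RHS) { return LHS.End < RHS; });

  // Nothing to merge with: insert a new range at the sorted position.
  if (I == Ranges.end() || End < I->Start) {
    Ranges.insert(I, Range{Start, End});
    return;
  }

  // The new range overlaps or abuts *I, so grow *I to cover it.
  I->Start = std::min(I->Start, Start);
  if (End > I->End) {
    I->End = End;
    // Absorb every following range that the grown range now reaches.
    auto NextI = I + 1;
    while (NextI != Ranges.end() && End >= NextI->Start) {
      I->End = std::max(I->End, NextI->End);
      ++NextI;
    }
    // Erase the absorbed ranges in one call; I itself stays valid.
    Ranges.erase(I + 1, NextI);
  }
}
```

The search is O(log n) per insertion. Inserting into the middle of a contiguous container still shifts elements, so this is a sketch of the lookup improvement rather than a claim about overall complexity.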

lib/Transforms/Scalar/MemCpyOptimizer.cpp
258	Should be named "Head". See http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly .
403–404	I don't have a suggestion, but I think the terminology of add/merge feels weird. You're really doing an enqueue (to be resolved later) and then resolve, but that's not much better wording.

Alright, patch updated to use std::lower_bound.

I converted over from std::list to SmallVector to have a random access iterator for std::lower_bound.

It originally used std::list to avoid expensive copies when merging ranges, but now with move semantics, I don't think this is as serious of a concern.
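
A small sketch of why move semantics soften the copy-cost concern. std::lower_bound does accept std::list iterators, but it then needs a linear number of iterator increments, which is why a random-access container helps. Here `std::vector` is assumed as a stand-in for SmallVector, and `Tracked` is a hypothetical payload that counts copies and moves; neither name comes from the patch.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical payload that owns heap data, with counters for illustration.
struct Tracked {
  static int Copies;
  static int Moves;
  std::vector<int> Stores;

  Tracked() = default;
  explicit Tracked(std::vector<int> S) : Stores(std::move(S)) {}
  Tracked(const Tracked &O) : Stores(O.Stores) { ++Copies; }
  Tracked(Tracked &&O) noexcept : Stores(std::move(O.Stores)) { ++Moves; }
  Tracked &operator=(const Tracked &O) {
    Stores = O.Stores;
    ++Copies;
    return *this;
  }
  Tracked &operator=(Tracked &&O) noexcept {
    Stores = std::move(O.Stores);
    ++Moves;
    return *this;
  }
};
int Tracked::Copies = 0;
int Tracked::Moves = 0;

// Inserting at the front shifts every existing element. With a noexcept move
// constructor and move assignment, common standard library implementations
// perform those shifts as moves (pointer swaps), not deep copies.
inline void insertAtFront(std::vector<Tracked> &V, Tracked T) {
  V.insert(V.begin(), std::move(T));
  assert(!V.empty());
}
```

Each shifted element still costs a move, so the insertion remains linear in the number of elements after the insertion point; the moves are just cheap compared with copying the owned data.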

nicholas accepted this revision. Jul 16 2015, 11:44 PM

nicholas added a reviewer: nicholas.

nicholas added a subscriber: nicholas.

nicholas added inline comments.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
255	Optional: consider hoisting "Ranges.end()" out to "range_iterator E = Ranges.end();". See http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop .

This revision is now accepted and ready to land. Jul 16 2015, 11:44 PM

inolen added inline comments. Jul 17 2015, 1:56 AM

lib/Transforms/Scalar/MemCpyOptimizer.cpp
255	I'd removed the end iterator local due to it being invalidated during the loop at the bottom which calls Ranges.erase().
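
A minimal sketch of the invalidation issue mentioned in this reply, assuming a contiguous container such as `std::vector` (used here as a stand-in for SmallVector). The `Range` struct and `absorbFollowing` helper are hypothetical. Erasing an element invalidates iterators at or after the erased position, including a previously saved end iterator, so the loop compares against a fresh `Ranges.end()` each time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MemsetRange: a half-open byte range [Start, End).
struct Range {
  int64_t Start;
  int64_t End;
};

// Merge every following range that Ranges[Idx] reaches into Ranges[Idx].
inline void absorbFollowing(std::vector<Range> &Ranges, std::size_t Idx) {
  assert(Idx < Ranges.size() && "index out of bounds");
  auto I = Ranges.begin() + Idx;
  auto NextI = I + 1;
  // Ranges.end() is deliberately re-evaluated on every iteration: the erase()
  // call below invalidates any end iterator cached before the loop.
  while (NextI != Ranges.end() && I->End >= NextI->Start) {
    I->End = std::max(I->End, NextI->End);
    // erase() returns an iterator to the element after the erased one.
    // I points before the erased position, so it remains valid.
    NextI = Ranges.erase(NextI);
  }
}
```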

Committed in rL242843.

Revision Contents

Path	Size
lib/Transforms/Scalar/MemCpyOptimizer.cpp	25 lines

Diff 29933

lib/Transforms/Scalar/MemCpyOptimizer.cpp

Context not available.
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Transforms/Utils/Local.h"
-#include <list>
+#include <algorithm>
 using namespace llvm;

 #define DEBUG_TYPE "memcpyopt"
Context not available.

 namespace {
 class MemsetRanges {
-  /// Ranges - A sorted list of the memset ranges. We use std::list here
-  /// because each element is relatively large and expensive to copy.
-  std::list<MemsetRange> Ranges;
-  typedef std::list<MemsetRange>::iterator range_iterator;
+  /// Ranges - A sorted list of the memset ranges.
+  SmallVector<MemsetRange, 8> Ranges;
+  typedef SmallVectorImpl<MemsetRange>::iterator range_iterator;
   const DataLayout &DL;
 public:
   MemsetRanges(const DataLayout &DL) : DL(DL) {}

-  typedef std::list<MemsetRange>::const_iterator const_iterator;
+  typedef SmallVectorImpl<MemsetRange>::const_iterator const_iterator;
   const_iterator begin() const { return Ranges.begin(); }
   const_iterator end() const { return Ranges.end(); }
   bool empty() const { return Ranges.empty(); }
Context not available.
 /// addRange - Add a new store to the MemsetRanges data structure. This adds a
 /// new range for the specified store at the specified offset, merging into
 /// existing ranges as appropriate.
-///
-/// Do a linear search of the ranges to see if this can be joined and/or to
-/// find the insertion point in the list. We keep the ranges sorted for
-/// simplicity here. This is a linear search of a linked list, which is ugly,
-/// however the number of ranges is limited, so this won't get crazy slow.
 void MemsetRanges::addRange(int64_t Start, int64_t Size, Value *Ptr,
                             unsigned Alignment, Instruction *Inst) {
   int64_t End = Start+Size;
-  range_iterator I = Ranges.begin(), E = Ranges.end();

-  while (I != E && Start > I->End)
-    ++I;
+  range_iterator I = std::lower_bound(Ranges.begin(), Ranges.end(), Start,
+    [](const MemsetRange &LHS, int64_t RHS) { return LHS.End < RHS; });

   // We now know that I == E, in which case we didn't find anything to merge
   // with, or that Start <= I->End. If End < I->Start or I == E, then we need
   // to insert a new range. Handle this now.
-  if (I == E || End < I->Start) {
+  if (I == Ranges.end() || End < I->Start) {
	nicholas: Optional: consider hoisting "Ranges.end()" out to "range_iterator E = Ranges.end();". See http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop .
	inolen (author): I'd removed the end iterator local due to it being invalidated during the loop at the bottom which calls Ranges.erase().
     MemsetRange &R = *Ranges.insert(I, MemsetRange());
     R.Start = Start;
     R.End = End;
	nlewycky: Should be named "Head". See http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly .
Context not available.
   if (End > I->End) {
     I->End = End;
     range_iterator NextI = I;
-    while (++NextI != E && End >= NextI->Start) {
+    while (++NextI != Ranges.end() && End >= NextI->Start) {
       // Merge the range in.
       I->TheStores.append(NextI->TheStores.begin(), NextI->TheStores.end());
       if (NextI->End > I->End)
Context not available.
	nlewycky: I don't have a suggestion, but I think the terminology of add/merge feels weird. You're really doing an enqueue (to be resolved later) and then resolve, but that's not much better wording.