This is an archive of the discontinued LLVM Phabricator instance.

Avoid insertion sorting each new range in MemCpyOptimizer
ClosedPublic

Authored by inolen on Jul 14 2015, 2:39 PM.

Details

Summary

While working on a project I wound up generating a fairly large lookup table (10k entries) of callbacks inside of a static constructor. Clang was taking upwards of 10 minutes to compile the lookup table. I generated a smaller test case (http://www.inolen.com/static_initializer_test.ll) that, after running with -ftime-report, pointed fingers at GlobalOpt and MemCpyOptimizer.

Running memcpyopt through opt accounted for around ~1 minute. The main culprit was MemCpyOptimizer insertion sorting the ranges as it discovered them in tryMergingIntoMemset. I've changed this up such that ranges are always appended to the list, and once they've all been added they're sorted and merged (n log n vs n^2).
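The append-then-sort-and-merge approach can be sketched as follows. This is a minimal illustration with a plain `std::vector`, not the patch's actual code; the `Range` struct and `sortAndMerge` helper are hypothetical stand-ins for MemCpyOptimizer's `MemsetRange` machinery.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-in for MemCpyOptimizer's MemsetRange: only the [Start, End)
// byte interval matters for this sketch.
struct Range {
  int64_t Start, End;
};

// Append ranges as they are discovered, then sort once and merge
// adjacent or overlapping entries in a single pass. Total work is
// O(n log n), versus O(n^2) for insertion sorting each new range
// as it arrives.
std::vector<Range> sortAndMerge(std::vector<Range> Ranges) {
  std::sort(Ranges.begin(), Ranges.end(),
            [](const Range &A, const Range &B) { return A.Start < B.Start; });
  std::vector<Range> Merged;
  for (const Range &R : Ranges) {
    if (!Merged.empty() && R.Start <= Merged.back().End)
      Merged.back().End = std::max(Merged.back().End, R.End);
    else
      Merged.push_back(R);
  }
  return Merged;
}
```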

I'm not really sure who to tag as a reviewer, Lang mentioned that Chandler may be appropriate.

Diff Detail

Repository
rL LLVM

Event Timeline

inolen updated this revision to Diff 29712.Jul 14 2015, 2:39 PM
inolen retitled this revision from to Avoid insertion sorting each new range in MemCpyOptimizer.
inolen updated this object.
inolen added a reviewer: chandlerc.
inolen set the repository for this revision to rL LLVM.
inolen added a subscriber: llvm-commits.

I'm worried about the memory usage of storing all of them. It feels like you traded CPU time for more memory when you didn't need to.

The problem was that it did a linear scan to find the range with the nearest start value. That list is already stored sorted. Why not just use bisection (std::lower_bound) to find it, saving CPU time without increasing memory usage?
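The bisection suggestion looks roughly like this. Again a hypothetical sketch with a plain vector rather than the actual patch: `std::lower_bound` finds the insertion point in O(log n) comparisons, so the sorted order is preserved without scanning the list linearly and without buffering all ranges first.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Range {
  int64_t Start, End;
};

// Keep Ranges sorted by Start and bisect for the insertion point
// instead of scanning linearly. Finding the slot is O(log n); the
// insert itself still shifts trailing elements, but no extra storage
// is needed beyond the list that was already being kept.
void insertSorted(std::vector<Range> &Ranges, Range R) {
  auto I = std::lower_bound(
      Ranges.begin(), Ranges.end(), R.Start,
      [](const Range &A, int64_t Start) { return A.Start < Start; });
  Ranges.insert(I, R);
}
```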

lib/Transforms/Scalar/MemCpyOptimizer.cpp
258
403–404

I don't have a suggestion, but I think the terminology of add/merge feels weird. You're really doing an enqueue (to be resolved later) and then resolve, but that's not much better wording.

inolen updated this revision to Diff 29933.Jul 16 2015, 12:39 PM

Alright, patch updated to use std::lower_bound.

I converted over from std::list to SmallVector to have a random access iterator for std::lower_bound.

It originally used std::list to avoid expensive copies when merging ranges, but now with move semantics, I don't think this is as serious a concern.
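The move-semantics point can be sketched as below. This is illustrative only (the `Range` struct and `insertAt` helper are made up, with a `std::string` standing in for a range's instruction list): inserting into the middle of a vector-like container shifts the tail, but each shifted element is moved, not deep-copied, so the old copy-cost argument for std::list largely disappears.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical range with heap-allocated payload; Payload stands in
// for the list of store instructions a real range would carry.
struct Range {
  std::string Payload;
};

// A middle insert shifts Ranges[Idx..end) one slot to the right, but
// with move semantics each shift transfers the payload's heap buffer
// instead of copying its contents.
void insertAt(std::vector<Range> &Ranges, std::size_t Idx, Range R) {
  Ranges.insert(Ranges.begin() + Idx, std::move(R));
}
```

A vector also provides the random-access iterators that make std::lower_bound's bisection cheap, which std::list's bidirectional iterators do not.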

nicholas accepted this revision.Jul 16 2015, 11:44 PM
nicholas added a reviewer: nicholas.
nicholas added a subscriber: nicholas.
nicholas added inline comments.
lib/Transforms/Scalar/MemCpyOptimizer.cpp
255

Optional: consider hoisting "Ranges.end()" out to "range_iterator E = Ranges.end();". See http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop .

This revision is now accepted and ready to land.Jul 16 2015, 11:44 PM
inolen added inline comments.Jul 17 2015, 1:56 AM
lib/Transforms/Scalar/MemCpyOptimizer.cpp
255

I'd removed the end iterator local because it's invalidated by the loop at the bottom, which calls Ranges.erase().
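The invalidation concern can be illustrated with a generic erase loop; this is not the patch's code, just a sketch of the safe pattern. Since `erase()` on a vector invalidates iterators at and after the erased element (including any cached end iterator), the loop must re-evaluate `end()` each iteration and resume from the iterator that `erase()` returns.

```cpp
#include <vector>

// Erase all negative elements. end() is re-evaluated on every
// iteration because erase() invalidates a previously cached end
// iterator; erase() returns the next valid iterator, so the loop
// only advances manually when nothing was erased.
int eraseNegatives(std::vector<int> &V) {
  int Erased = 0;
  for (auto I = V.begin(); I != V.end(); /* advance inside the body */) {
    if (*I < 0) {
      I = V.erase(I);
      ++Erased;
    } else {
      ++I;
    }
  }
  return Erased;
}
```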

Eugene.Zelenko added a subscriber: Eugene.Zelenko.

Committed in rL242843.