Currently popFromQueueImpl iterates over all candidates to find the best
one. While the candidate queue is small, this is not a problem. But it
becomes a problem once the queue gets larger. For example, the snippet
below takes 330s to compile with llc -O0, but completes in 3s with this
patch.
define void @test(i4000000* %ptr) {
entry:
store i4000000 0, i4000000* %ptr, align 4 ret void
}
On backends that use the MachineScheduler, there should be no changes in
the generated code (e.g. for X86 there are no binary changes with this
patch when building MultiSource, SPEC2000, SPEC2006 with -O3 -lto).
On backends that are not using the MachineScheduler, there is a slight
change in behavior: previously, the first candidate in the list would be
picked if there are multiple candidates with the same score.
For small worklists, maintaining the heap can be more expensive than it
is actually worth it, so the new approach is only used for candidate
lists with more than 100 candidates. See
http://llvm-compile-time-tracker.com/compare.php?from=058af835063ff9afc39fc53279fa660e075564ed&to=390d055bee63860be70caf57515b3f29f7728d91&stat=instructions
where the first commit is with a limit and the second one always uses
a heap. The first commit is slightly faster.
Overall on CTMark, the change is mostly neutral
(http://llvm-compile-time-tracker.com/compare.php?from=f7522a5823d66303edfd8d872232dd6b07190f42&to=058af835063ff9afc39fc53279fa660e075564ed&stat=instructions)
but it is beneficial for very large inputs.