Page MenuHomePhabricator

[ScheduleDAGRRList] Limit number of candidates to explore.
ClosedPublic

Authored by fhahn on Wed, Jul 22, 7:59 AM.

Details

Summary

Currently popFromQueueImpl iterates over all candidates to find the best
one. While the candidate queue is small, this is not a problem. But it
becomes a problem once the queue gets larger. For example, the snippet
below takes 330s to compile with llc -O0, but completes in 3s with this
patch.

define void @test(i4000000* %ptr) {
entry:

store i4000000 0, i4000000* %ptr, align 4
ret void

}

This patch limits the number of candidates to check to 1000. This limit
ensures that it never triggers for test-suite/SPEC2000/SPEC2006 on X86
and AArch64 with -O3, while still drastically limiting the compile-time
in case of very large queues.

It would be even better to use a binary heap to manage to queue
(D83335), but some heuristics change the score of a node in the queue
after another node has been scheduled. I plan to address this for
backends that use the MachineScheduler in the future, but that requires
a more careful evaluation. In the meantime, the limit should help users
impacted by this issue.

The patch includes a slightly smaller version of the motivating example
as test case, to guard against the issue.

Diff Detail

Event Timeline

fhahn created this revision.Wed, Jul 22, 7:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Jul 22, 7:59 AM
This revision is now accepted and ready to land.Wed, Jul 22, 2:34 PM
This revision was automatically updated to reflect the committed changes.
RKSimon added inline comments.
llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll
12

@fhahn Should this be in x86 or generic codegen tests? I wouldn't have noticed it but its causing a notable increase in runtime for the x86 codegen tests.....

fhahn marked an inline comment as done.Thu, Jul 23, 10:31 AM
fhahn added inline comments.
llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll
12

It's intentional for X86 to check that we don't have excessive compile-times (the patch improves compile-time by ~ 100x). On my system, it takes ~1-2 seconds, but if that's too long we can reduce the width of the type further.

RKSimon added inline comments.Thu, Jul 23, 11:59 AM
llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll
12

Hmm - please can you check an EXPENSIVE_CHECKS build? I'm seeing > 6min for this.

fhahn marked an inline comment as done.Thu, Jul 23, 12:04 PM
fhahn added inline comments.
llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll
12

oh right, I think there is some very expensive checks somewhere in SelectionDAG and there will be a very large number of nodes.... Is there a way to disable tests for builds with expensive checks? Otherwise we probably should remove the test.

RKSimon added inline comments.Sat, Jul 25, 7:38 AM
llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll
12

I can't think of a way for lit to check for non EXPENSIVE_CHECKS builds - and I'd be worried that such an approach would be abused tbh.

fhahn added a comment.Sat, Jul 25, 7:48 AM

OK, I removed the test in rGc09a10845b42. it seems like there's no way to work around the expensive-checks issue and reducing the width of the store much further will defeat the actual purpose of the test.