Emitting schedules is hard. There are many prologs and epilogs to stitch together, and, importantly, there are different strategies for creating a fully-pipelined loop. The current emission code is tangled up with the schedule discovery code, which makes it harder to understand than it needs to be and makes it difficult to plumb in new emission strategies.
I'm not particularly happy that I couldn't split this into a set of incremental changes. The changes are fairly invasive (we now model the emitted stages and blocks explicitly as a CFG), and I couldn't find a reasonable way to stage them.
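To make "model the emitted stages and blocks explicitly as a CFG" a bit more concrete, here is a toy C++ sketch of the textbook prolog/kernel/epilog layout for a schedule with N stages. It is not the code in this patch: `PipelinedBlock` and `buildPipelinedCFG` are hypothetical names, and the sketch assumes the trip count is large enough that no prolog ever needs an early exit to an epilog (real emission has to handle that case).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model (not LLVM's API): one emitted block of a pipelined loop.
struct PipelinedBlock {
  std::string Name;         // e.g. "prolog0", "kernel", "epilog1"
  std::vector<int> Stages;  // schedule stages whose instructions this block holds
  std::vector<int> Succs;   // indices of successor blocks in the CFG
};

// Builds the straight-line prolog/kernel/epilog CFG for a schedule with
// NumStages stages. Assumes the trip count is large enough that no prolog
// needs an early exit to an epilog.
std::vector<PipelinedBlock> buildPipelinedCFG(int NumStages) {
  assert(NumStages >= 1 && "a schedule has at least one stage");
  std::vector<PipelinedBlock> Blocks;

  // Prolog I fills the pipeline: it runs stages 0..I.
  for (int I = 0; I < NumStages - 1; ++I) {
    PipelinedBlock B{"prolog" + std::to_string(I), {}, {}};
    for (int S = 0; S <= I; ++S)
      B.Stages.push_back(S);
    Blocks.push_back(B);
  }

  // The kernel runs every stage, each for a different in-flight iteration.
  const int KernelIdx = static_cast<int>(Blocks.size());
  PipelinedBlock Kernel{"kernel", {}, {}};
  for (int S = 0; S < NumStages; ++S)
    Kernel.Stages.push_back(S);
  Blocks.push_back(Kernel);

  // Epilog J drains the pipeline: it runs stages J+1..NumStages-1.
  for (int J = 0; J < NumStages - 1; ++J) {
    PipelinedBlock B{"epilog" + std::to_string(J), {}, {}};
    for (int S = J + 1; S < NumStages; ++S)
      B.Stages.push_back(S);
    Blocks.push_back(B);
  }

  // Fall through from each block to the next; the kernel also loops back
  // to itself.
  for (std::size_t I = 0; I + 1 < Blocks.size(); ++I)
    Blocks[I].Succs.push_back(static_cast<int>(I + 1));
  Blocks[KernelIdx].Succs.push_back(KernelIdx);
  return Blocks;
}
```

For a three-stage schedule this yields prolog0 (stage 0), prolog1 (stages 0 and 1), a kernel running all three stages with a back edge to itself, then epilog0 (stages 1 and 2) and epilog1 (stage 2). Keeping the block layout in an explicit structure like this, separate from schedule discovery, is the kind of split that lets another emission strategy walk the same stages and produce a different layout.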
This patch still needs a bit of work: the Hexagon tests fail because we now construct the kernel loop slightly differently (an easy fix), and the PPC tests need a small register allocation tweak. Still, at this point I'm pretty happy with the overall structure of the code. It emits code comparable to what we emitted before, and I think only small changes remain. It also passes all of Jinsong's recent tests (thanks for the sms-phi.ll test, that one was nasty!).
I'd appreciate some feedback on the direction! The overall diffstat is -650 LoC, so this makes the Pipeliner smaller too.