This makes polly generate a CFG which is closer to what we want in LLVM IR, with a loop preheader for the original loop. This is just a cleanup, but it exposes some fragile assumptions.
I'm not completely happy with the changes related to expandCodeFor; RTCBB->getTerminator() is basically a random insertion point which happens to work due to the way we generate runtime checks. I'm not sure what the right answer looks like, though.