If the block being cloned contains a PHI node, we generally need to clone that PHI node too, even though the clone is trivial. If the PHI's operand is an instruction in the block being cloned, the correct value for that operand doesn't exist until SSAUpdater constructs it.
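To make the shape of the problem concrete, here is a minimal sketch using a toy IR rather than LLVM's real classes. The types `Phi`, `Inst`, and `Block`, the function `cloneForPred`, and the name suffix are all hypothetical and exist only for illustration. The sketch assumes the clone has exactly one predecessor, and it shows the "keep the trivial PHI" strategy: each PHI in the original block becomes a one-entry PHI in the clone instead of being folded into its incoming value.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy IR: every value is identified by name. These are NOT LLVM's classes.
struct Phi {
    std::string name;
    std::map<std::string, std::string> incoming; // predecessor block -> value
};

struct Inst {
    std::string name;
    std::vector<std::string> operands;
};

struct Block {
    std::string name;
    std::vector<Phi> phis;
    std::vector<Inst> insts;
};

// Clone `bb` for the single predecessor `pred`. Each PHI is cloned as a
// trivial one-entry PHI rather than replaced by its incoming value.
Block cloneForPred(const Block &bb, const std::string &pred,
                   const std::string &suffix) {
    Block clone;
    clone.name = bb.name + suffix;
    std::map<std::string, std::string> valueMap; // original name -> clone name

    for (const Phi &phi : bb.phis) {
        Phi newPhi;
        newPhi.name = phi.name + suffix;
        // The incoming value is copied unmapped. It may name an instruction
        // of `bb` itself; a later SSA repair step would then have a PHI
        // operand on the `pred` edge to rewrite.
        newPhi.incoming[pred] = phi.incoming.at(pred);
        valueMap[phi.name] = newPhi.name;
        clone.phis.push_back(newPhi);
    }

    for (const Inst &inst : bb.insts) {
        Inst newInst;
        newInst.name = inst.name + suffix;
        // Operands defined in `bb` are redirected to their clones; anything
        // defined elsewhere is left untouched.
        for (const std::string &op : inst.operands) {
            auto it = valueMap.find(op);
            newInst.operands.push_back(it == valueMap.end() ? op : it->second);
        }
        valueMap[inst.name] = newInst.name;
        clone.insts.push_back(newInst);
    }
    return clone;
}
```

For a block `header` with a PHI `p` (taking `init` from `entry` and `next` from `latch`) and an instruction `next` that uses `p`, cloning for `latch` yields a PHI `p.thread` whose only incoming value is `next` on the `latch` edge, and `next.thread` uses `p.thread`. The reference to the original block's `next` is confined to that PHI operand, instead of being baked directly into the cloned instructions.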
We usually don't hit this issue because we try to avoid threading across loop headers, but it can still occur in some cases involving irreducible CFGs. To make the testcase simpler, I added a flag that allows threading across loop headers.
Thanks to Brian Rzycki for reducing the testcase.
I have concerns about adding this flag in this patch without a bit more testing and some general test cases for loop header threading. I fear this is a proverbial can of worms for any third-party user of opt who discovers this flag and starts using it.