Ignore ephemeral values (only feeding llvm.assume intrinsics) when
computing the instruction count to decide if a block is small enough for
threading. This is similar to the handling of these values in the
InlineCost computation. These instructions will eventually be removed
and shouldn't count against code size (similar to the existing ignoring
of phis).
Without this change, when enabling -fwhole-program-vtables, which causes
type test / assume sequences to be inserted by clang, we can get
different threading decisions. In particular, when building with
instrumentation FDO it can affect the optimizations decisions before FDO
matching, leading to some mismatches.
One issue with the reversed loop is that the Size is checked against the limit the iteration after it is incremented. When iterating in forward order, this means that the branch is not counted against the limit, since the loop exits before the subsequent check. But with the reversed order it gets counted and the test started failing since we no longer did the threading. I decided to consolidate the Size increment and check to make it more consistent, and simply bumped up the default limit and the one used in the test so that there is no change to the status quo in terms of non-ephemeral values.