Building on the newly added RuntimeVFxUF, replace the CalculateTripCountMinusVF
opcode with explicit Sub, Cmp and Select VPInstructions that compute the trip
count minus VFxUF, clamped to 0. This removes a very specific opcode and allows
re-using the already computed VFxUF.
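
For illustration only, the value produced by the Sub/Cmp/Select sequence can
be modelled on plain scalars as below. This is a sketch, not code from the
patch: the function name tripCountMinusVFxUF is hypothetical, and it assumes
the compare is an unsigned greater-than and that the result is 0 when the
trip count does not exceed VFxUF.

```cpp
#include <cstdint>

// Hypothetical scalar model of the Sub + Cmp + Select sequence; the name and
// signature are illustrative and are not part of the VPlan API.
constexpr std::uint64_t tripCountMinusVFxUF(std::uint64_t TC,
                                            std::uint64_t VFxUF) {
  // Sub: unsigned subtraction, which wraps around when TC < VFxUF.
  const std::uint64_t Sub = TC - VFxUF;
  // Cmp: assumed to be an unsigned greater-than compare.
  const bool Cmp = TC > VFxUF;
  // Select: keep the difference only when it did not wrap, otherwise use 0.
  return Cmp ? Sub : 0;
}

static_assert(tripCountMinusVFxUF(10, 4) == 6, "TC > VFxUF yields TC - VFxUF");
static_assert(tripCountMinusVFxUF(3, 8) == 0, "TC < VFxUF is clamped to 0");
```

Expressing the clamp as a select over the compare result means the wrapped
difference is never used, so no separate saturating-subtract opcode is needed.
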
Note that this highlights an existing challenge with using VPInstruction to
model computation in the vector preheader: when a VPInstruction computes a
scalar value outside the vector loop, we may have to explicitly access
VPIteration(0, 0) when retrieving the corresponding value from state.
The reason is that some recipes in the preheader must explicitly set
VPIteration(0, 0), because some of their users in the vector loop may need
to access a vector with lane 0 broadcast (e.g. the trip count for the
top-level mask compare when tail folding). For now, this patch works around
the issue by checking whether the VPInstruction has a parent region.
Should we generalize this for all opcodes in generateInstruction?
Note: A number of tests still need updating.
Depends on D157322.