Update VPInstruction::execute to only generate code for the first part
if the recipe is outside the vector loop region (i.e. the pre-headers
for now).
This avoids generating unnecessary code in the pre-headers (see adjusted
tests) and going forward allows using VPInstructions more generally for
code-generation outside the vector loop.
To facilitate this, VPTransformState::set/get() have been extended with
versions that take a std::variant with either the Part or a VPIteration.
When getting the result for a definition outside the vector loop
region, get() always returns the value for Part 0. This means callers
do not need to check where the definition resides.
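As a rough illustration of the lookup rule above, here is a minimal
standalone sketch. It is not the LLVM implementation: State, Def,
Iteration and PartOrIteration are hypothetical stand-ins for
VPTransformState, a VPlan definition, VPIteration and the std::variant
argument, generated IR values are modelled as strings, and lanes are
ignored for brevity.

```cpp
#include <map>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for VPIteration: an unroll part plus a lane.
struct Iteration {
  unsigned Part;
  unsigned Lane;
};

// Either a plain part index or a full (part, lane) iteration.
using PartOrIteration = std::variant<unsigned, Iteration>;

// Hypothetical stand-in for a VPlan definition. The flag records
// whether the defining recipe sits inside the vector loop region.
struct Def {
  std::string Name;
  bool InVectorLoopRegion;
};

// Hypothetical stand-in for the per-part value map in the transform state.
class State {
  std::map<std::pair<const Def *, unsigned>, std::string> PerPart;

  // Extract the part index from either alternative of the variant.
  static unsigned partOf(const PartOrIteration &P) {
    if (const auto *Part = std::get_if<unsigned>(&P))
      return *Part;
    return std::get<Iteration>(P).Part;
  }

  // Definitions outside the vector loop region only ever use Part 0.
  static unsigned effectivePart(const Def &D, const PartOrIteration &P) {
    return D.InVectorLoopRegion ? partOf(P) : 0;
  }

public:
  void set(const Def &D, std::string V, const PartOrIteration &P) {
    PerPart[{&D, effectivePart(D, P)}] = std::move(V);
  }

  // Throws std::out_of_range if nothing was registered for the part.
  const std::string &get(const Def &D, const PartOrIteration &P) const {
    return PerPart.at({&D, effectivePart(D, P)});
  }
};
```

In this sketch a caller can request any part of a pre-header definition
and receive the single value stored for Part 0, so the caller never has
to ask where the definition lives.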
If a recipe should generate a single IR value that is invariant or
uniform across all UF parts, whether the recipe is inside or outside the
vector loop region, it can do so by generating the value once and
registering it for all parts, as done here.