If the destination is tied, then user has some control of the
register used for input. They would have the ability to control
the value of any tail elements. By using tail agnostic we take
this option away from them.
Its not clear that the intrinsics are defined such that this isn't
supposed to work. And undisturbed is a valid implementation for agnostic
so code wouldn't even fail to work on all systems if we always used
agnostic.
The vcompress intrinsic is defined to require tail undisturbed so
at minimum we need this for that instruction or need to redefine
the intrinsic.
I've made an exception here for vmv.s.x/fmv.s.f and reduction
instructions which only write to element 0 regardless of the tail
policy. This allows us to keep the agnostic policy on those which
should allow better redundant vsetvli removal.
An enhancement would be to check for undef input and keep the
agnostic policy, but we don't have good test coverage for that yet.
If the tail policy is ignored on WritesElement0 instructions, why not set tail policy as tu?
Is it because spec said If anything, the default should be tail-agnostic/mask-agnostic, so software has to specify when it cares about the non-participating elements?
If we set anything as tu except undefined maskedoff, could it also make vsetvli removal easier.