A microarchitecture can give vfirst an early out, but it cannot do
that for vcpop. I don't know of a microarchitecture that does this, though.
For and/or we only need to know whether any 1 exists in the mask (after
inverting it for AND), so we can use vfirst.
Unfortunately, there is no sgez instruction so we end up needing
to invert an sltz for OR. That might make this not worthwhile if there
isn't a microarchitecture that optimizes vfirst. If the start value
is known to be 0 and the result is used by a branch we will hopefully
end up with a bgez instead.
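
A rough scalar model of the two lowerings, for reference. This is only a
sketch, not the code in this patch: the function names and the 64-bit mask
representation are made up for illustration, and it assumes the and/or in
question is a reduction over a mask with a scalar start value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of vfirst.m: index of the first set bit among the low vl bits,
   or -1 if none is set. vl is assumed to be at most 64. */
static long model_vfirst(uint64_t mask, unsigned vl) {
    for (unsigned i = 0; i < vl; ++i)
        if ((mask >> i) & 1)
            return (long)i; /* hardware could, in principle, stop here */
    return -1;
}

/* Model of vcpop.m: number of set bits among the low vl bits.
   Every element has to be examined to produce the count. */
static long model_vcpop(uint64_t mask, unsigned vl) {
    long n = 0;
    for (unsigned i = 0; i < vl; ++i)
        n += (long)((mask >> i) & 1);
    return n;
}

/* OR via vfirst: a 1 exists iff vfirst >= 0. With no sgez, this is
   sltz followed by an invert (e.g. xori 1). */
static bool reduce_or_vfirst(bool start, uint64_t mask, unsigned vl) {
    bool none_set = model_vfirst(mask, vl) < 0; /* sltz */
    return start | !none_set;                   /* invert, then or */
}

/* AND via vfirst on the inverted mask: all bits are 1 iff the inverted
   mask has no 1, i.e. vfirst < 0, which sltz gives directly. */
static bool reduce_and_vfirst(bool start, uint64_t mask, unsigned vl) {
    bool all_set = model_vfirst(~mask, vl) < 0; /* sltz */
    return start & all_set;
}

/* The vcpop-based equivalents, using snez/seqz on the count. */
static bool reduce_or_vcpop(bool start, uint64_t mask, unsigned vl) {
    return start | (model_vcpop(mask, vl) != 0);  /* snez */
}

static bool reduce_and_vcpop(bool start, uint64_t mask, unsigned vl) {
    return start & (model_vcpop(~mask, vl) == 0); /* seqz */
}
```

In this model the AND case costs the same scalar instruction count either
way, while the OR case needs the extra invert after sltz unless a branch
can absorb it.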
Posting to collect other opinions.