Adapt tail-predication to the new semantics of get.active.lane.mask as proposed in D86147. This means that:
- we can remove the BTC + 1 overflow checks, because the loop trip count is now passed directly to the intrinsic,
- we can immediately use that value to set up a counter for the number of elements processed by the loop, and don't need to materialize BTC + 1.
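To illustrate the two points above, here is a minimal scalar sketch in C++. It is not the pass's code: `VF`, `elementCountFromBTC`, `activeLaneMask` and `countProcessed` are hypothetical names, and the sketch assumes the new semantics are that lane i is active when base + i is unsigned-less-than the trip count.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical vector width, chosen only for this sketch.
constexpr unsigned VF = 4;

// Old scheme: the element count had to be materialized as BTC + 1,
// which wraps to 0 when BTC is the maximum value of its type.
uint32_t elementCountFromBTC(uint32_t btc) {
  return btc + 1; // unsigned wrap-around: UINT32_MAX + 1 == 0
}

// Assumed model of the new semantics: lane i is active iff
// base + i < tripCount (unsigned compare).
std::array<bool, VF> activeLaneMask(uint32_t base, uint32_t tripCount) {
  std::array<bool, VF> mask{};
  for (unsigned i = 0; i < VF; ++i) {
    // Widen so the lane index cannot wrap in this model.
    uint64_t index = static_cast<uint64_t>(base) + i;
    mask[i] = index < tripCount;
  }
  return mask;
}

// The trip count seeds the element counter directly; no BTC + 1 needed.
uint32_t countProcessed(uint32_t tripCount) {
  uint32_t remaining = tripCount;
  uint32_t processed = 0;
  for (uint32_t base = 0; remaining > 0; base += VF) {
    for (bool active : activeLaneMask(base, tripCount))
      processed += active ? 1 : 0;
    remaining = remaining > VF ? remaining - VF : 0;
  }
  assert(processed == tripCount);
  return processed;
}
```

With a trip count of 10 and a width of 4, the model runs three vector iterations and the last one has only two active lanes.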
nit: maybe use a more descriptive name, one that highlights that this is the original scalar trip count or the number of elements.