This patch updates the lowering of arm_sme.zero to intrinsics so that
it calculates the correct mask for the tile to zero.
The zero instruction takes an 8-bit mask that specifies which 64-bit
tiles to zero: bits 0 to 7 correspond to ZA0.D to ZA7.D. Zeroing a tile
with an element size of 8 to 32 bits just requires zeroing the 64-bit
tiles that it overlaps.
This is easy to calculate: each element size has a "base mask" (the
mask for tile 0 of that size), which can be shifted left by the tile ID
to get the mask for that tile:
base_mask << tile_id
After tile allocation, the tile ID is a constant, so this expression
will be folded to a constant mask.
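
As an illustration only, the calculation can be sketched in standalone
C++. The helper name zeroMaskForTile is hypothetical and is not the code
in this patch. The base mask values are an assumption derived from the
SME tile layout, in which ZA0.B overlaps all eight 64-bit tiles, ZA0.H
overlaps ZA0.D, ZA2.D, ZA4.D and ZA6.D, and ZA0.S overlaps ZA0.D and
ZA4.D.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): returns the 8-bit mask
// for the zero instruction, given a tile's element width in bits and
// its tile ID.
inline uint8_t zeroMaskForTile(unsigned elementWidthBits, unsigned tileId) {
  uint8_t baseMask = 0;  // Mask for tile 0 of this element size.
  unsigned numTiles = 0; // Number of tiles available at this element size.
  switch (elementWidthBits) {
  case 8: // ZA0.B: all eight 64-bit tiles.
    baseMask = 0b11111111;
    numTiles = 1;
    break;
  case 16: // ZA0.H: ZA0.D, ZA2.D, ZA4.D, ZA6.D.
    baseMask = 0b01010101;
    numTiles = 2;
    break;
  case 32: // ZA0.S: ZA0.D, ZA4.D.
    baseMask = 0b00010001;
    numTiles = 4;
    break;
  case 64: // ZA0.D: a single 64-bit tile.
    baseMask = 0b00000001;
    numTiles = 8;
    break;
  default:
    assert(false && "unsupported element width");
  }
  assert(tileId < numTiles && "tile ID out of range for element width");
  // base_mask << tile_id, truncated back to the 8-bit mask.
  return static_cast<uint8_t>(baseMask << tileId);
}
```

For example, under these assumptions zeroMaskForTile(32, 1) gives
0b00100010, which selects ZA1.D and ZA5.D, the two 64-bit tiles that
make up ZA1.S.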