A TBL instruction will fill out-of-range values with 0's, something used in D121139 to turn tbl2 with a zero input into tbl1s. This works OK for v16i8, but for v8i8 the input is still treated as a v16i8, so "out-of-range" values (like a lane index of 8) would end up loading values from the top half of the input register. Clean this up by detecting the out of range values and making sure they really use out of range values. There is a fix for swapped indices of 64bit input vectors too, which could be incorrectly adjusted if the zerovector was the first operand.
Fixes #55545