The immediate for the final instruction is the second argument. It should be 32 bits wide to match the software spec, then truncated to 8 bits to match the instruction format. The mask is the last argument, which needs to be 16 bits wide for the 512-bit instruction.
Also put the tests for the 128-bit and 256-bit forms in the AVX512VL file instead of the AVX512BW+VL file, since this is not a BW instruction.
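
For illustration only, the width reasoning in the first paragraph can be sketched in plain C. This is not the code from the change itself: the helper names mask_bits and encode_imm8 are hypothetical, and the 32-bit lane width is an assumption inferred from the 16-bit mask on the 512-bit form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of write-mask bits needed for a vector of
   vector_bits total width whose lanes are elem_bits wide (one bit per lane). */
static unsigned mask_bits(unsigned vector_bits, unsigned elem_bits) {
    assert(elem_bits != 0 && vector_bits % elem_bits == 0);
    return vector_bits / elem_bits;
}

/* Hypothetical helper: the software-visible immediate is a 32-bit int,
   but only its low 8 bits fit in the instruction encoding. */
static uint8_t encode_imm8(int32_t imm) {
    return (uint8_t)(imm & 0xFF);
}
```

Under the 32-bit lane assumption, mask_bits(512, 32) gives 16, while the 256-bit and 128-bit forms need only 8 and 4 mask bits.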