Add vector costs for ceil/floor/trunc/round. As can be seen in the tests, the prior default costs significantly underestimated the code actually generated.
These costs are computed by generating code with the current backend and then counting the number of instructions. I discount one vsetvli and ignore the return.
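For reviewers who want to sanity-check a number, the counting rule above can be sketched roughly as follows. This is an illustration only, not the script actually used: the function name `estimate_cost`, the decision to treat `vsetivli` the same as `vsetvli`, and the hand-written sample listing are all assumptions made for the example.

```python
# Hypothetical sketch of the counting rule: count instructions in an
# assembly listing, discount one vsetvli, and ignore the return.
# Assumption: vsetivli is treated like vsetvli for the discount.
VSET_MNEMONICS = {"vsetvli", "vsetivli"}


def estimate_cost(asm: str) -> int:
    count = 0
    vset_discounted = False
    for raw in asm.splitlines():
        # Drop trailing '#' comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines, assembler directives, and labels.
        if not line or line.startswith(".") or line.endswith(":"):
            continue
        mnemonic = line.split()[0]
        if mnemonic == "ret":
            continue  # the return is not part of the cost
        if mnemonic in VSET_MNEMONICS and not vset_discounted:
            vset_discounted = True  # only the first one is free
            continue
        count += 1
    return count


# Hand-written listing for illustration; NOT real backend output.
SAMPLE_ASM = """
floor_sketch:                            # hypothetical function
    vsetivli zero, 4, e32, m1, ta, ma
    vfabs.v v9, v8
    vmflt.vf v0, v9, fa5
    vsetvli zero, zero, e32, m1, ta, mu
    vfcvt.x.f.v v9, v8, v0.t
    vfcvt.f.x.v v9, v9, v0.t
    ret
"""

if __name__ == "__main__":
    print(estimate_cost(SAMPLE_ASM))
```

In this sketch only the first vsetvli-style instruction is discounted, so a sequence that needs a second one still pays for it.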
These costs were generated by some simple scripting. Assuming we're happy with the code structure, I plan to add a bunch more intrinsics using the same approach (in individual reviews).
Add v8f16 for floor