This patch handles back-end folding of generic patterns created by D48067 in cases where the instruction isn't a straightforward packed values rounding operation, but a masked operation or a scalar operation.
Can we just do this with isel patterns like we do for ADDSS?
Can you generate %k from a compare instruction rather than passing in a X x i1 type. It will make the code a little cleaner since we won't have to extend and split the mask in such crazy ways.
I've considered that, but decided to fold it here. To do it in .td patterns we'd need to add 4 new patterns in 2 separate files. 32 and 64 bit patterns would need to be added for VROUNDS* on AVX and ROUNDS* on SSE4.1. Writing this pattern here both makes it easier to track and produces less check complexity.
Added zero extension of mask to i32 in the masked scalar tests and added more ways to represent the mask, testing the 8-bit mask pattern among others. 16-bit mask patterns removed due to scalar_to_vector errors.
Corrected the scalar pattern predicates, added packed zero-masked instruction patterns and tests to cover zero-masking. Changed the RUN line of vec_floor.ll to give different results for AVX512F and AVX512VL where needed (e.g. in 128- and 256-bit masked operations).