This fixes most of the scheduling info for SKX vector operations.
I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM.
I've left out the DQ/QQ issues for now. I'll fix them in a separate change.
There seem to be several cases of broadcasts/permutations that are specialized on the mask value; I have not touched these.
The before/after llvm-exegesis analyses are in the Phabricator diff.