Added a missing memory folding relationship for the CVTDQ2PS instruction (and its AVX variants). We can safely fold these, but not the CVTDQ2PD versions, which have a register/memory size discrepancy in the source operand. I've added a test case demonstrating that stack folding now works.
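To illustrate the size-discrepancy point, here is a minimal sketch of a folding table that refuses to fold when the register and memory forms read different source widths. The `FoldEntry` struct, the `foldReload` helper, and the opcode strings are hypothetical names for illustration only and are not the actual X86InstrInfo tables. The widths are for the 128-bit forms: CVTDQ2PS reads a full 128-bit source in both forms, while the memory form of CVTDQ2PD reads only 64 bits.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one row of a memory folding table.
struct FoldEntry {
  std::string RegOpcode; // register-source form
  std::string MemOpcode; // memory-source form
  unsigned RegSrcBits;   // width of the source register (and of its spill slot)
  unsigned MemSrcBits;   // width actually read by the memory form
};

// Illustrative entries only; the opcode names are assumptions.
const std::vector<FoldEntry> &candidateEntries() {
  static const std::vector<FoldEntry> Entries = {
      {"CVTDQ2PSrr", "CVTDQ2PSrm", 128, 128}, // widths match: safe to fold
      {"CVTDQ2PDrr", "CVTDQ2PDrm", 128, 64},  // widths differ: do not fold
  };
  return Entries;
}

// Returns the memory-form opcode if a reload feeding RegOpcode may be
// folded, or an empty string if there is no entry or the widths differ.
std::string foldReload(const std::string &RegOpcode) {
  for (const FoldEntry &E : candidateEntries())
    if (E.RegOpcode == RegOpcode && E.RegSrcBits == E.MemSrcBits)
      return E.MemOpcode;
  return std::string();
}
```

In this sketch `foldReload("CVTDQ2PSrr")` yields the memory form, while `foldReload("CVTDQ2PDrr")` yields an empty string because of the width mismatch.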
Also fixed an issue where the VCVTTPD2DQ / VCVTTPS2DQ instructions were incorrectly placed in the two-source-operand folding table instead of the one-source-operand table.
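A rough sketch of why the table choice matters, assuming each table is consulted only for one specific operand index (table 1 for operand 1, table 2 for operand 2). `FoldTable`, `Instr`, `lookupFold`, and the opcode strings are hypothetical and only model the idea. A single-source instruction has no operand 2, so an entry filed in the two-source table is never found.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model: maps a register-form opcode to its
// memory-form opcode. Real folding tables carry more information.
using FoldTable = std::map<std::string, std::string>;

// Hypothetical instruction description. Operand 0 is the destination;
// operands 1..NumOperands-1 are sources.
struct Instr {
  std::string Opcode;
  unsigned NumOperands;
};

// Returns the memory-form opcode if operand OpIdx of MI can be folded,
// or an empty string otherwise. Table1 is consulted only for operand 1
// and Table2 only for operand 2 (an assumption of this sketch).
std::string lookupFold(const FoldTable &Table1, const FoldTable &Table2,
                       const Instr &MI, unsigned OpIdx) {
  if (OpIdx == 0 || OpIdx >= MI.NumOperands)
    return std::string();
  const FoldTable *Table = nullptr;
  if (OpIdx == 1)
    Table = &Table1;
  else if (OpIdx == 2)
    Table = &Table2;
  else
    return std::string();
  auto It = Table->find(MI.Opcode);
  return It == Table->end() ? std::string() : It->second;
}
```

With a two-operand instruction (destination plus one source), an entry placed only in the second table is unreachable for operand 1, and operand 2 does not exist, so no fold ever happens until the entry is moved to the first table.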
Finally, tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1).
Is there a guideline regarding IR test cases with types that aren't specified by the ABI?
I.e., is the lowering of 128-element vectors to legal x86 types stable? Would it be better to define a struct here?