Added a missing memory folding relationship for the CVTDQ2PS instruction (and its AVX variants) - we can safely fold these (but not the CVTDQ2PD versions which have a register/memory size discrepancy in the source operand). I've added a test case demonstrating that stack folding now works.
Also fixed an issue with the VCVTTPD2DQ / VCVTTPS2DQ instructions being incorrectly put in the 2 source operand folding tables instead of the 1 source operand.
Finally, tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1).