Allow the ParallelDSP pass to vectorise add instructions in a (typically unrolled) loop by converting them to sadd16.
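For reviewers less familiar with the target instruction, here is a rough scalar model of what a two-lane 16-bit add computes. It is only an illustration: `sadd16_model` and `pack` are hypothetical helper names, not part of this patch, and the model ignores any status flags the real instruction may set.

```cpp
#include <cstdint>

// Hypothetical helper: pack two 16-bit values into one 32-bit word,
// with 'lo' in bits [15:0] and 'hi' in bits [31:16].
constexpr uint32_t pack(int16_t lo, int16_t hi) {
  return (static_cast<uint32_t>(static_cast<uint16_t>(hi)) << 16) |
         static_cast<uint32_t>(static_cast<uint16_t>(lo));
}

// Hypothetical scalar model of a two-lane 16-bit add: each halfword is
// added independently and wraps modulo 2^16, so a carry out of the low
// lane never reaches the high lane.
constexpr uint32_t sadd16_model(uint32_t a, uint32_t b) {
  return ((((a >> 16) + (b >> 16)) & 0xFFFFu) << 16) | ((a + b) & 0xFFFFu);
}

// Two independent 16-bit adds are done in one operation.
static_assert(sadd16_model(pack(1, 2), pack(3, 4)) == pack(4, 6),
              "lanes are added independently");

// The low lane wraps (-1 + 1 == 0) without disturbing the high lane,
// which is where this differs from a plain 32-bit add.
static_assert(sadd16_model(pack(-1, 1), pack(1, 1)) == pack(0, 2),
              "no carry between lanes");
```

The second check shows why pairs of 16-bit adds cannot simply be merged into one 32-bit add: the lanes have to stay isolated.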
- ParallelChains has been introduced to collate multiple parallel OpChains and Reduction now inherits from this class.
- The SuperWord class is introduced, also inheriting from ParallelChains, to represent parallel chains rooted at different store instructions. SuperWords are created while searching for sequential stores; those stores form the roots, and the chains rooted at them are then compared in the usual way.
- AreAliased is now given a ParallelChains object instead of the OpChainList 'Candidates', which allows us to query other writes in the region.
- Finally, to help memory management, OpChainList now holds a unique_ptr to each OpChain.
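The relationships described in the list above can be sketched roughly as follows. This is a simplified illustration, not the code in the patch: the member names (`Root`, `addChain`, `size`), the use of `std::vector`, and the `int` standing in for an instruction are all assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a chain of operations; the 'int' is only a
// placeholder for whatever instruction the real chain is rooted at.
struct OpChain {
  int Root;
  explicit OpChain(int R) : Root(R) {}
};

// The list owns its chains, so they are freed when the list goes away.
// (Container type is an assumption for this sketch.)
using OpChainList = std::vector<std::unique_ptr<OpChain>>;

// Base class that collates several parallel chains.
class ParallelChains {
  OpChainList Chains;

public:
  virtual ~ParallelChains() = default;

  void addChain(int Root) { Chains.push_back(std::make_unique<OpChain>(Root)); }
  std::size_t size() const { return Chains.size(); }
  const OpChain &operator[](std::size_t I) const { return *Chains[I]; }
};

// Both kinds of candidate share the same chain storage.
class Reduction : public ParallelChains {};
class SuperWord : public ParallelChains {};

// Anything that only needs to inspect chains (an alias check, say) can
// take the base class and work with either derived type.
inline std::size_t countChains(const ParallelChains &PC) { return PC.size(); }

inline void exampleUsage() {
  SuperWord SW;
  SW.addChain(1); // e.g. chain rooted at the first of two sequential stores
  SW.addChain(2); // e.g. chain rooted at the second store
  assert(countChains(SW) == 2);
}
```

With ownership held by the list, no explicit deletes are needed when a candidate is discarded.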
I've also made some miscellaneous changes, such as initialising pointers and moving a couple of things into lambda helpers.