This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:
1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'index' operand is constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index and length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if it's useful)
3 - Adds constant folding support
4 - Adds zeroinitializer handling
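For reference, here is a rough standalone model of what items 1 to 3 rely on. It is only a sketch, not the code in the patch: it models just the low 64 bits (the upper 64 bits of the real results are undefined), and every name in it (lowMask, decodeLength, decodeExtrqControl, isByteAligned, foldExtrqi, foldInsertqi) is hypothetical. It assumes the documented SSE4A encoding, where both immediates are 6-bit fields, a length of 0 means 64 bits, and index + length > 64 gives an undefined result.

```cpp
#include <cstdint>
#include <optional>

// Mask with the low `len` bits set, for len in 1..64.
constexpr uint64_t lowMask(unsigned len) {
  return len >= 64 ? ~0ULL : ((1ULL << len) - 1);
}

// Both immediates are 6-bit fields; a length of 0 encodes 64 bits.
constexpr unsigned decodeLength(unsigned len) {
  return (len & 0x3F) == 0 ? 64 : (len & 0x3F);
}

struct Control {
  unsigned length;
  unsigned index;
};

// Item 1: the register form of EXTRQ carries the length in bits [5:0] and the
// index in bits [13:8] of the control operand's low 64 bits. If that operand
// is constant, the call can be rewritten to the immediate form.
constexpr Control decodeExtrqControl(uint64_t lo) {
  return {static_cast<unsigned>(lo & 0x3F),
          static_cast<unsigned>((lo >> 8) & 0x3F)};
}

// Item 2: only byte-aligned fields can be expressed as a byte shuffle.
constexpr bool isByteAligned(unsigned len, unsigned idx) {
  return decodeLength(len) % 8 == 0 && (idx & 0x3F) % 8 == 0;
}

// Item 3: fold EXTRQI on a constant source. Returns nullopt when
// index + length > 64, where the hardware result is undefined.
constexpr std::optional<uint64_t> foldExtrqi(uint64_t src, unsigned len,
                                             unsigned idx) {
  len = decodeLength(len);
  idx &= 0x3F;
  if (idx + len > 64)
    return std::nullopt;
  return (src >> idx) & lowMask(len);
}

// Item 3: fold INSERTQI on constant operands. The bottom `len` bits of `src`
// replace bits [idx, idx + len) of `dst`.
constexpr std::optional<uint64_t> foldInsertqi(uint64_t dst, uint64_t src,
                                               unsigned len, unsigned idx) {
  len = decodeLength(len);
  idx &= 0x3F;
  if (idx + len > 64)
    return std::nullopt;
  const uint64_t mask = lowMask(len) << idx;
  return (dst & ~mask) | ((src << idx) & mask);
}

// Compile-time checks of the model.
static_assert(decodeExtrqControl(0x0810).length == 16);
static_assert(decodeExtrqControl(0x0810).index == 8);
static_assert(isByteAligned(16, 8) && !isByteAligned(12, 8));
static_assert(foldExtrqi(0x1122334455667788ULL, 16, 8).value() == 0x6677ULL);
static_assert(foldInsertqi(0, 0xABCD, 8, 16).value() == 0xCD0000ULL);
static_assert(!foldExtrqi(0, 8, 60).has_value());
```

The optional return keeps the undefined index + length > 64 case separate from a real folded value, so a caller of this sketch can decide what to do with it.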
Michael - I've also removed some old INSERTQI 'bundling' code that attempted to merge 2 INSERTQI calls. It doesn't actually work, as it assumed that we were inserting from the same source and that the ranges could be merged together. That isn't true, as we always insert the bottom bits from a source. Technically we could reduce this to a single case where both calls insert at the same destination index, and then emit a single insertion using the maximum of the 2 lengths. If you think this is worth it, I'll add back the code for that case only.
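To make the bundling point concrete, here is a small model of the low 64 bits. It is a sketch with a hypothetical helper (insertLow), and it assumes both inserts use the same source, 1 <= len, and idx + len <= 64. It shows that adjacent ranges cannot be merged, because each insert takes the bottom bits of the source, while two inserts at the same destination index collapse to one insert of the larger length.

```cpp
#include <cstdint>

// Hypothetical model of INSERTQI on the low 64 bits: the bottom `len` bits of
// `src` replace bits [idx, idx + len) of `dst`.
// Assumes 1 <= len, idx < 64 and idx + len <= 64.
constexpr uint64_t insertLow(uint64_t dst, uint64_t src, unsigned len,
                             unsigned idx) {
  const uint64_t field = len >= 64 ? ~0ULL : ((1ULL << len) - 1);
  const uint64_t mask = field << idx;
  return (dst & ~mask) | ((src << idx) & mask);
}

constexpr uint64_t A = ~0ULL;  // destination
constexpr uint64_t B = 0xABCD; // source

// Adjacent ranges do NOT merge: both inserts take the bottom byte of B, so
// the result is 0xCDCD rather than the 0xABCD a merged 16-bit insert gives.
static_assert(insertLow(insertLow(0, B, 8, 0), B, 8, 8) == 0xCDCDULL);
static_assert(insertLow(0, B, 16, 0) == 0xABCDULL);

// Same destination index and same source: the pair equals a single insert of
// the maximum of the 2 lengths, in either order.
static_assert(insertLow(insertLow(A, B, 8, 16), B, 16, 16) ==
              insertLow(A, B, 16, 16));
static_assert(insertLow(insertLow(A, B, 16, 16), B, 8, 16) ==
              insertLow(A, B, 16, 16));
```

With different sources the same-index case only collapses when the second insert is at least as long as the first, so a restored combine would need to check for that.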