This is a follow-up to D24681, which added support for lowerInterleavedLoad() on X86.
This change-set adds support for lowerInterleavedStore(). It mainly provides the infrastructure and utilities needed to put lowerInterleavedStore() in place. It does not try to support any patterns beyond those X86InterleavedAccess already handles (currently, X86InterleavedAccess supports interleaved access with a factor of 4 on 64-bit elements, in transpose4_4()).
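
For reviewers less familiar with the pattern, here is a minimal scalar sketch of what a factor-4 interleaved store of 64-bit elements computes. It is only an illustration of the memory layout, not the X86 lowering and not code from this patch; the names interleaveStore, Vec4, Wide, Factor and NumElts are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration only; none of this is LLVM API.
constexpr std::size_t Factor = 4;  // interleave factor (stride)
constexpr std::size_t NumElts = 4; // elements per source vector

using Vec4 = std::array<std::uint64_t, NumElts>;
using Wide = std::array<std::uint64_t, Factor * NumElts>;

// Scalar model of a factor-4 interleaved store: element Elt of source
// vector Src ends up at position Elt * Factor + Src of the wide result.
// Viewed as a 4x4 matrix whose rows are the sources, this is a transpose.
Wide interleaveStore(const std::array<Vec4, Factor> &Srcs) {
  Wide Out{};
  for (std::size_t Elt = 0; Elt < NumElts; ++Elt)
    for (std::size_t Src = 0; Src < Factor; ++Src)
      Out[Elt * Factor + Src] = Srcs[Src][Elt];
  return Out;
}
```

The interleaved load handled in D24681 is the inverse mapping: one wide vector is split back into four strided vectors.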