This allows vector-sized store merging of constants in DAGCombiner using the existing code in MergeConsecutiveStores(). All of the twisted logic that decides exactly what vector operations are legal and fast for each particular CPU is handled separately in there using the appropriate hooks.
Some notes:
- For the motivating tests in merge-store-constants.ll, we already produce the same vector code in IR via the SLP vectorizer. So this is just providing a backend backstop for code that doesn't go through that pass (-O1). More details in PR24449:
https://bugs.llvm.org/show_bug.cgi?id=24449 (this change should be the last step to resolve that bug)
- At the minimum vector size limit (16 bytes), we're trading two 8-byte scalar immediate stores for one 16-byte constant pool load and one 16-byte store (e.g., fold-vector-sext-crash2.ll::test_sext1). I think that's a reasonable trade-off because offloading any work to vector registers should ease pressure on register-starved scalar code, but let me know if there are other considerations. We could adjust this in the hook by returning true only for >2*max scalar size, so we know there would be an instruction reduction.
- There's a likely regression in vector-sext-crash2.ll::test_zext1 and mod128.ll where we materialize a constant in scalar and then send it over to the vector unit. I know we have some bug reports related to that. A quick scan turned up:
https://bugs.llvm.org/show_bug.cgi?id=26301
...but there are probably others.
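
To make the instruction-count trade-off at the 16-byte limit concrete, here is a minimal standalone sketch of the arithmetic. It is not LLVM code: the function names are hypothetical, and it assumes each scalar chunk takes exactly one immediate store and that the merged constant fits in a single vector register (so the vector path is always one constant pool load plus one store).

```cpp
// Hypothetical cost model for illustration only; none of these names exist in LLVM.

// Scalar path: one immediate store per chunk of at most maxScalarBytes.
constexpr unsigned scalarStoreCount(unsigned totalBytes, unsigned maxScalarBytes) {
  return (totalBytes + maxScalarBytes - 1) / maxScalarBytes; // round up
}

// Vector path: one constant pool load plus one vector store
// (assumes the merged constant fits in one vector register).
constexpr unsigned vectorInstrCount() { return 2; }

// The stricter policy floated above: merge only when the merged size
// is greater than 2 * max scalar size.
constexpr bool mergeReducesInstructions(unsigned totalBytes, unsigned maxScalarBytes) {
  return totalBytes > 2 * maxScalarBytes;
}

// 16 bytes with 8-byte scalars: 2 scalar stores vs. 2 vector instructions (a wash).
static_assert(scalarStoreCount(16, 8) == vectorInstrCount(), "16 bytes is break-even");
static_assert(!mergeReducesInstructions(16, 8), "stricter policy rejects 16 bytes");

// 32 bytes with 8-byte scalars: 4 scalar stores vs. 2 vector instructions.
static_assert(scalarStoreCount(32, 8) > vectorInstrCount(), "32 bytes is a reduction");
static_assert(mergeReducesInstructions(32, 8), "stricter policy accepts 32 bytes");
```

Under this model the 16-byte case is break-even on instruction count, so any benefit there comes from moving work off the scalar registers rather than from fewer instructions.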