The original motivation for this was to implement moreElementsVector of shuffles
on AArch64, which resulted in complex sequences of artifacts like unmerge(unmerge(concat...))
which the combiner couldn't handle. It seemed here that the better option,
instead of writing ever-more-complex combines, was to have a way to find
the original "non-artifact" source registers for a given definition, walking
through arbitrary expressions of unmerge/concat/insert. As long as the bits
aren't extended or truncated, this is a pretty simple algorithm that avoids
the need for lots of combines and instead jumps straight to the final result
we want.
I've only used this new technique in 2 places within tryCombineUnmerge, using it
in more general situations resulted in infinite loops in AMDGPU. So for now
it's used when we would otherwise fail to combine and that seems to work.
In order to support looking through G_INSERTs, I also had to add it as an
artifact in isArtifact(), which caused a whole lot of issues in tests. AMDGPU
started infinite looping since full legalization of G_INSERT doensn't seem to
be there. To work around this, I've temporarily added a CLI option to use the
old behaviour so that the MIR tests will still run and terminate.
Other minor changes include no longer making >128b G_MERGE/UNMERGE legal.
We never had isel support for that anyway and it was a remnant of the legacy
legalizer rules. However being legal prevented the combiner from checking if it
was dead and deleting them.
I don't think we want to combine G_INSERT but instead not generate them at all. Is this insert from ir or from legalizer?
Did you consider this case? I had also encountered infinite loops when trying to deal with this combine when one of the vector sizes is odd.
I wanted to avoid making G_INSERT and use LCM merge/unmerge pad with undef instead. AMDGPU can deal with merge/unmerge when sizes are even (in the end it would turn everything into vectors of size 2 in worst case).
With odd vector size at some point it has to do "pad with undef". U guess that we could make combine that recognizes value that is spanned over real element and undef but it looks unnecessary complicated to combine.
I thought about adding new artifacts equivalent to G_ANYEXT/G_TRUNC but for vectors (these would be simplified versions of sdag's INSERT_SUBVECTOR and EXTRACT_SUBVECTOR) with intent to be trivial to combine. They would pad vector with undef elements/trunc elements. These do not exist in llvm-ir, and when used in more/fewer_elements_vector would always end up combining with each other if we consistently ask for same number of elements in all more/fewer_elements_vector legalizer rules.
Would this work for you (I would expect that there would be no need to combine insert)?