When combiner AA is enabled, look at stores on the same chain.
Non-aliasing stores are moved to the same chain, so the existing
code fails because it expects to find an adjacent store on a consecutive
chain.

This change introduces some ordering issues that prevent the optimal
store merge from being found, and I'm not sure how best to solve them.

Because of how DAGCombiner tries these store combines,
MergeConsecutiveStores doesn't see the correct set of stores on the chain
when it visits the other stores. Each store has its chain fixed
individually, right before MergeConsecutiveStores is tried on it, so the
merge is attempted from that store before the other stores have been
processed and had their chains fixed.

Suppose you have 4 32-bit stores that should be merged into 1 vector
store. One store is visited first, fixing its chain. Because not all of
the store chains have been fixed yet, only 2 of the stores are merged.
The other 2 stores later have their chains fixed, but because the first
2 were already merged, the stores now have different memory types, and
merging stores of different sizes is not supported and would be more
difficult to handle.
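
The four-store example can be sketched with a toy model. This is not
LLVM code: ToyStore, canMerge and mergeEagerly are hypothetical names,
a "chain" is reduced to a single flag, and each original store is
visited exactly once, which is simpler than the real combiner worklist.
It only illustrates how merging right after each chain fix changes what
later visits see.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of these names are LLVM APIs.
struct ToyStore {
  unsigned Offset;  // byte offset from a common base pointer
  unsigned Bytes;   // width of the stored value in bytes
  bool ChainFixed;  // true once the store sits on the shared chain
};

// Two stores merge only if both chains are fixed, the widths match,
// and B starts exactly where A ends.
static bool canMerge(const ToyStore &A, const ToyStore &B) {
  return A.ChainFixed && B.ChainFixed && A.Bytes == B.Bytes &&
         A.Offset + A.Bytes == B.Offset;
}

// Visit each original store once, in order. A visit fixes that store's
// chain and immediately merges the run of mergeable neighbours around
// it, without waiting for the remaining stores to be fixed.
std::vector<ToyStore> mergeEagerly(std::vector<ToyStore> Stores) {
  for (std::size_t I = 1; I < Stores.size(); ++I)
    assert(Stores[I - 1].Offset < Stores[I].Offset &&
           "stores must be sorted by offset");

  std::vector<unsigned> VisitOrder;
  for (const ToyStore &S : Stores)
    VisitOrder.push_back(S.Offset);

  for (unsigned Off : VisitOrder) {
    // Find the visited store; skip it if an earlier merge absorbed it.
    std::size_t I = 0;
    while (I < Stores.size() && Stores[I].Offset != Off)
      ++I;
    if (I == Stores.size())
      continue;
    Stores[I].ChainFixed = true;

    // Grow the run of mergeable neighbours in both directions.
    std::size_t First = I, Last = I;
    while (First > 0 && canMerge(Stores[First - 1], Stores[First]))
      --First;
    while (Last + 1 < Stores.size() &&
           canMerge(Stores[Last], Stores[Last + 1]))
      ++Last;
    if (First == Last)
      continue;  // nothing to merge with yet

    unsigned Width =
        static_cast<unsigned>(Last - First + 1) * Stores[I].Bytes;
    ToyStore Merged{Stores[First].Offset, Width, true};
    Stores.erase(Stores.begin() + First, Stores.begin() + Last + 1);
    Stores.insert(Stores.begin() + First, Merged);
  }
  return Stores;
}
```

Called on four unfixed 4-byte stores at offsets 0, 4, 8 and 12, this
model returns two 8-byte stores rather than one 16-byte store: when the
third store is visited, its left neighbour is already 8 bytes wide, so
the width check rejects it.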

I can get the result I want by only performing
MergeConsecutiveStores on the legalized DAG, at which point
all of the non-aliasing stores have had their chains fixed. However,
it is also desirable to have MergeConsecutiveStores process stores of
illegal types like it does now, so I'm not sure how best to ensure all
stores have their chains fixed before store merging happens. Should
FindBetterChain search through consecutive chains and try to handle
other consecutive stores at the same time? Should there be some DAG
preprocess step to fix all store chains, or something else?
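
As a rough illustration of the "fix all chains first" idea, here is a
toy two-phase model. It is only a sketch under simplifying assumptions:
ToyStore, fixAllChains and mergeRuns are hypothetical names rather than
LLVM APIs, a chain is reduced to one flag, and legality of the merged
width is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of these names are LLVM APIs.
struct ToyStore {
  unsigned Offset;  // byte offset from a common base pointer
  unsigned Bytes;   // width of the stored value in bytes
  bool ChainFixed;  // true once the store sits on the shared chain
};

// Phase 1 (hypothetical pre-pass): fix every store's chain up front.
void fixAllChains(std::vector<ToyStore> &Stores) {
  for (ToyStore &S : Stores)
    S.ChainFixed = true;
}

// Phase 2: merge each maximal run of adjacent, equal-width, chain-fixed
// stores into one wider store. Stores must be sorted by offset.
std::vector<ToyStore> mergeRuns(const std::vector<ToyStore> &Stores) {
  std::vector<ToyStore> Out;
  std::size_t I = 0;
  while (I < Stores.size()) {
    assert(Stores[I].Bytes > 0 && "zero-width store");
    std::size_t J = I + 1;
    while (J < Stores.size() && Stores[J - 1].ChainFixed &&
           Stores[J].ChainFixed && Stores[J].Bytes == Stores[I].Bytes &&
           Stores[J - 1].Offset + Stores[J - 1].Bytes == Stores[J].Offset)
      ++J;
    // A run of length 1 is emitted unchanged.
    unsigned Total = static_cast<unsigned>(J - I) * Stores[I].Bytes;
    Out.push_back({Stores[I].Offset, Total, Stores[I].ChainFixed});
    I = J;
  }
  return Out;
}
```

In this model, calling fixAllChains before mergeRuns on four 4-byte
stores at offsets 0, 4, 8 and 12 yields a single 16-byte store, while
calling mergeRuns alone leaves all four stores untouched. Whether such
a pre-pass fits into DAGCombiner is exactly the open question above.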
Why? Expanding memcpy, etc. also creates these kinds of parallel stores.