Previously, X86 would call getTypeLegalizationCost, do the table lookup for the legalized type, and multiply the result by the splitting cost. That doesn't really represent how the reduction is lowered. For example, it counts the cost of the final extract element multiple times (once per split part). And in the non-pairwise case, a reduction on a large vector type first performs packed operations on full legal registers until only a single register of values remains; only then does the horizontal shuffling start.
This patch makes the base implementation call back into the target implementation once it has reached a legal type, since that is where the target is most likely to have horizontal tricks. If the target has no special handling for that type, it should simply call back into the base implementation, which will then take the lower (legal-type) part of the code. I've changed X86 to do table lookups only on simple types; illegal types now go to the base implementation for legalization.
Please add a comment describing how this is supposed to work (similar to the patch's summary).