D104868 removed an incorrect fold that distributed BFI instructions in a chain and combined them into a single instruction. BFIs like that are hard to test, as the patterns are often destroyed before they become BFIs, but the removal did lead to regressions in some of our tests.
This patch adds a replacement, which reassociates BFI instructions with non-overlapping insertion masks so that the low bits are inserted first. This can end up sorting the nodes so that adjacent inserts sit next to one another, allowing the existing folds to combine them into a single BFI.
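To illustrate why the reordering is safe, here is a minimal sketch that models a 32-bit BFI in plain C++. It is not the DAG combine from this patch: `fieldMask`, `bfi`, `highFirst`, `lowFirst` and `twoAdjacent` are hypothetical helpers, and the 8-bit fields at bits 0 and 8 are chosen only for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not LLVM code: mask covering `width` bits from `lsb`.
constexpr uint32_t fieldMask(unsigned lsb, unsigned width) {
  return (0xFFFFFFFFu >> (32 - width)) << lsb;
}

// Model of BFI: copy the low `width` bits of `src` into `dst` at bit `lsb`,
// leaving every other bit of `dst` unchanged.
constexpr uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb,
                       unsigned width) {
  assert(width >= 1 && lsb + width <= 32 && "field must fit in 32 bits");
  return (dst & ~fieldMask(lsb, width)) | ((src << lsb) & fieldMask(lsb, width));
}

// Chain as it might first appear: the high field is inserted first.
constexpr uint32_t highFirst(uint32_t base, uint32_t lo, uint32_t hi) {
  return bfi(bfi(base, hi, 8, 8), lo, 0, 8);
}

// Chain after reassociation: the low field is inserted first.
constexpr uint32_t lowFirst(uint32_t base, uint32_t lo, uint32_t hi) {
  return bfi(bfi(base, lo, 0, 8), hi, 8, 8);
}

// Two adjacent inserts taken from the same source value `x`.
constexpr uint32_t twoAdjacent(uint32_t base, uint32_t x) {
  return bfi(bfi(base, x, 0, 8), x >> 8, 8, 8);
}

// The masks 0x000000FF and 0x0000FF00 do not overlap, so order is irrelevant.
static_assert(highFirst(0xAABBCCDDu, 0x11u, 0x22u) ==
                  lowFirst(0xAABBCCDDu, 0x11u, 0x22u),
              "non-overlapping inserts commute");

// Once sorted, the adjacent pair is equivalent to one 16-bit insert.
static_assert(twoAdjacent(0xAABBCCDDu, 0x12345678u) ==
                  bfi(0xAABBCCDDu, 0x12345678u, 0, 16),
              "adjacent inserts combine into a single BFI");
```

Because each insert only touches the bits under its own mask, swapping two inserts with disjoint masks cannot change the result. With the low field first, an insert of `x` at bit 0 followed by an insert of `x >> 8` at bit 8 has the same effect as a single 16-bit insert of `x`.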
Please add a comment explaining what this is doing.