If a collection of interconnected phi nodes is only ever loaded, stored or bitcast, then we can convert the whole set to the bitcast type, potentially reducing the number of register moves needed as the phis are passed across basic block boundaries. This has to be done in CodeGenPrepare, as it naturally straddles basic blocks. The transform starts from a phi node and walks its uses and operands, looking for a collection of nodes that are all together bitcast between float and integer types. We record visited phi nodes so that we don't process them more than once. The whole subgraph is then replaced with the new type. Loads and stores are bitcast to the correct type; those bitcasts should then be folded into the load/store, changing its type.
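To make the walk described above concrete, here is a minimal, self-contained sketch on a toy graph. It is not the code from the patch and does not use LLVM's classes: `Kind`, `Node` and `collectConvertiblePhis` are hypothetical names invented for illustration, type checks are omitted entirely, and the requirement that at least one bitcast touches the set is my assumption about what makes the conversion worthwhile.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for IR instructions (hypothetical, not LLVM's real classes).
enum class Kind { Phi, Load, Store, BitCast, ExtractElement, Other };

struct Node {
    Kind kind;
    std::vector<int> operands; // indices of the nodes this node reads
    std::vector<int> users;    // indices of the nodes that read this node
};

// Starting at phi `start`, collect every phi reachable through phi-to-phi
// edges into `phis`. Returns true only if every non-phi operand is a load,
// extractelement or bitcast, every non-phi user is a store or bitcast, and
// at least one bitcast was seen (assumption: otherwise nothing is gained).
bool collectConvertiblePhis(const std::vector<Node>& g, int start,
                            std::set<int>& phis) {
    assert(start >= 0 && static_cast<std::size_t>(start) < g.size());
    assert(g[start].kind == Kind::Phi);

    std::vector<int> worklist{start};
    phis.insert(start); // the visited set doubles as the result
    bool sawBitCast = false;

    while (!worklist.empty()) {
        const Node& n = g[worklist.back()];
        worklist.pop_back();

        for (int op : n.operands) {
            switch (g[op].kind) {
            case Kind::Phi:
                if (phis.insert(op).second) // not visited yet
                    worklist.push_back(op);
                break;
            case Kind::Load:
            case Kind::ExtractElement:
                break; // acceptable def, can be retyped
            case Kind::BitCast:
                sawBitCast = true;
                break;
            default:
                return false; // unknown def, give up on the whole set
            }
        }

        for (int user : n.users) {
            switch (g[user].kind) {
            case Kind::Phi:
                if (phis.insert(user).second)
                    worklist.push_back(user);
                break;
            case Kind::Store:
                break; // acceptable use, can be retyped
            case Kind::BitCast:
                sawBitCast = true;
                break;
            default:
                return false; // unknown use, give up on the whole set
            }
        }
    }
    return sawBitCast;
}
```

For a graph in which a load feeds a pair of mutually dependent phis, one of which is bitcast and the other stored, the function accepts the set and returns both phis. Swapping the store for any other kind of user makes it reject the whole set.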
This comes up in the biquad test case due to the way MVE needs to keep values in integer registers. I have also seen it come up in AArch64 partner example code, where a complicated sequence of SROA and inlining produced integer phis where float would have been a better choice. Unfortunately I can't provide an example (and you might argue it should be fixed earlier in that case). It also comes up in an X86 atomic test case, where the output looks better and functionally OK to my untrained eyes. I can make this ARM-only or MVE-only if that would be better. I also added extract_element handling, which increased this from a 60% improvement to a 109% improvement.
The list of Defs you're handling here is sort of a heuristic. The transform itself doesn't really care.
I guess the choice of LoadInst and ExtractElementInst here is a reasonable conservative guess. If we wanted to be more aggressive, we'd probably want to try to do some real cost estimate.