GEP merging can sometimes increase the number of live values, and therefore
register pressure, across control edges. This can cause performance problems,
particularly if the increased register pressure results in spills.
This change implements GEP unmerging around an IndirectBr in certain cases to
mitigate the issue. It is done in the CodeGenPrepare pass, after all the GEP
merging has happened.
With this patch, the Python interpreter loop runs faster by ~5%.
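
To illustrate the idea, here is a minimal toy sketch of unmerging. It is not the
CodeGenPrepare code and does not use LLVM's API: ToyGep, liveAcrossEdge and
unmerge are hypothetical names, GEPs are modelled as "Result = Base + constant
Offset", and the source-block GEP is assumed to have a direct use in a successor
block. The conditions the patch actually checks ("in certain cases") are not
modelled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model (assumption, not LLVM's API): Result = Base + Offset.
struct ToyGep {
  std::string Result;
  std::string Base;
  long Offset;
};

// Collect the source-block values that the successor-block GEPs read, i.e.
// the values that must stay live across the indirect branch. SrcResult is
// assumed to be used directly in some successor, so it is always live.
std::set<std::string> liveAcrossEdge(const std::vector<ToyGep> &SuccGeps,
                                     const std::string &SrcResult) {
  std::set<std::string> Live;
  Live.insert(SrcResult);
  for (const ToyGep &G : SuccGeps)
    Live.insert(G.Base);
  return Live;
}

// Unmerge: rewrite every successor GEP that shares SrcGep's base so that it
// is computed from SrcGep's result instead, adjusting the constant offset.
std::vector<ToyGep> unmerge(const ToyGep &SrcGep,
                            std::vector<ToyGep> SuccGeps) {
  for (ToyGep &G : SuccGeps) {
    if (G.Base == SrcGep.Base) {
      G.Base = SrcGep.Result;
      G.Offset -= SrcGep.Offset;
    }
  }
  return SuccGeps;
}
```

In this model, with a source-block GEP at offset 1 and successor GEPs at offsets
2 and 3 on the same base, both the base and the source GEP are live across the
edge before unmerging. Afterwards only the source GEP is, provided the base has
no other uses past the branch.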
I don't think this check is necessary. GEPIOp is constrained to be defined in SrcBlock, and it's SrcBlock that has the IndirectBr terminator, so *any* use of GEPIOp outside of SrcBlock keeps it live over the indirect edge. I don't see why we wouldn't unmerge regardless of the parent block here.