The option splits each BasicBlock into minimal statements such that no additional scalar dependencies are introduced.
The algorithm is based on a union-find structure and unites two sets if putting them into separate statements would introduce a scalar dependency. As a consequence, instructions may be split into separate statements such that their relative order differs from the order of the statements they end up in. This is taken into account for instructions whose relative order matters (e.g. memory accesses).
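A minimal sketch of the idea, assuming a toy model of a basic block rather than LLVM IR. The names `Block`, `UnionFind` and `splitBlock` are hypothetical, and the ordering rule is a simplification: whenever a set's ordered instruction reappears after another set's ordered instruction, all sets seen in between are merged.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical block model: instruction I uses the values defined by the
// instructions in Operands[I]; Ordered[I] marks instructions whose relative
// order must be preserved (e.g. memory accesses).
struct Block {
  std::vector<std::vector<int>> Operands;
  std::vector<bool> Ordered;
};

// Minimal union-find with path halving.
struct UnionFind {
  std::vector<int> Parent;
  explicit UnionFind(int N) : Parent(N) {
    std::iota(Parent.begin(), Parent.end(), 0);
  }
  int find(int X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]];
      X = Parent[X];
    }
    return X;
  }
  // The root of B remains the root of the merged set.
  void unite(int A, int B) { Parent[find(A)] = find(B); }
};

// Returns the statement id assigned to each instruction.
std::vector<int> splitBlock(const Block &B) {
  int N = static_cast<int>(B.Operands.size());
  UnionFind UF(N);

  // 1. Join every user with the instructions defining its operands, so no
  //    value defined in one statement is used in another (no new scalar dep).
  for (int I = 0; I < N; ++I)
    for (int Op : B.Operands[I])
      UF.unite(I, Op);

  // 2. Keep ordered instructions in order: Seen lists set leaders in the order
  //    of their first ordered instruction. If a leader reappears, every set
  //    seen after it would have to run both before and after it, so merge them.
  std::vector<int> Seen;
  for (int I = 0; I < N; ++I) {
    if (!B.Ordered[I])
      continue;
    int L = UF.find(I);
    auto It = std::find(Seen.begin(), Seen.end(), L);
    if (It == Seen.end()) {
      Seen.push_back(L);
      continue;
    }
    for (auto J = It + 1; J != Seen.end(); ++J)
      UF.unite(*J, L); // L stays the root of the merged set.
    Seen.erase(It + 1, Seen.end());
  }

  // 3. Number the resulting sets in order of their first instruction.
  std::vector<int> StmtOf(N), IdOfLeader(N, -1);
  int Next = 0;
  for (int I = 0; I < N; ++I) {
    int L = UF.find(I);
    if (IdOfLeader[L] < 0)
      IdOfLeader[L] = Next++;
    StmtOf[I] = IdOfLeader[L];
  }
  return StmtOf;
}
```

For a block computing `B[i] = A[i] + 1; D[i] = C[i] * 2;` (instructions 0 to 5: load, add, store, load, mul, store), the two independent chains end up in two statements, which is what later makes loop distribution possible. If the loads and stores of the two chains were interleaved, step 2 would merge them back into one statement.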
The algorithm is generic in that heuristic changes can be made with relative ease. For instance, we might relax the order requirement for pairs of reads or for accesses to different base pointers, and forwardable instructions could be exempted from causing a join.
This implementation gives us a speed-up of 82% in the SPEC 2006 456.hmmer benchmark by enabling loop distribution in a hot loop.
Nit: Could you add a one line explanation as to what the granularity refers to, please?