At its heart, this change simply extends the reasoning for proving that B must execute if A does to allow a single-successor or loop preheader chain of blocks. The majority of the change is in making that reasonably efficient.
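To make the chain reasoning concrete, here is a minimal sketch on a toy CFG. It is not the code in this patch: `ToyBlock`, `mayNotTransfer`, and `mustExecuteViaChain` are hypothetical names, the per-block property is assumed to be "contains an instruction that may throw or not return", and the loop preheader case is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy block; this is not llvm::BasicBlock.
struct ToyBlock {
  std::vector<int> succs;      // indices of successor blocks
  bool mayNotTransfer = false; // assumed property: some instruction in the
                               // block may throw or never return
};

// Returns true if executing block `from` implies that block `to` is entered,
// following only blocks that have exactly one successor and are guaranteed
// to transfer control to it.
bool mustExecuteViaChain(const std::vector<ToyBlock> &cfg, int from, int to) {
  int cur = from;
  // Bound the walk by the block count so a cycle of single-successor
  // blocks cannot loop forever.
  for (std::size_t steps = 0; steps <= cfg.size(); ++steps) {
    if (cur == to)
      return true;
    const ToyBlock &bb = cfg[cur];
    if (bb.mayNotTransfer || bb.succs.size() != 1)
      return false; // the chain is broken here, so give up conservatively
    cur = bb.succs[0];
  }
  return false;
}
```

The walk gives up at the first branch or potentially non-returning block, so a `false` result only means "not proven", never "proven not to execute".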
To make this efficient, we need to cache the per-block queries for the intermediate nodes in the found path. (I'd love to do the edge cases too, but the invalidation is trickier.) This patch does so by taking an existing loop-level cache and essentially splitting it into a per-block cache whose results are then summarized back to loop level. In particular, the new per-block cache is constructed and invalidated at exactly the same points as the existing loop-level cache, so no new invalidation events should be needed. The new cache should be "as correct" as the original code.
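The split can be pictured roughly as follows. This is only a sketch under assumptions: `SketchBlock`, `SketchLoop`, and `BlockPropertyCache` are made-up types, the cached property is a single boolean, and the `forgetLoop` method here is a toy stand-in for the existing invalidation point, not the real API.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; these are not LLVM types.
struct SketchBlock {
  int id;
  bool mayNotTransfer; // pretend result of scanning the block's instructions
};

struct SketchLoop {
  std::vector<const SketchBlock *> blocks;
};

class BlockPropertyCache {
  std::unordered_map<int, bool> perBlock; // block id -> cached property
  int scans = 0;                          // how many blocks were really scanned

public:
  // Per-block query: scan once, then answer from the cache.
  bool blockMayNotTransfer(const SketchBlock &bb) {
    auto it = perBlock.find(bb.id);
    if (it != perBlock.end())
      return it->second;
    ++scans;
    perBlock.emplace(bb.id, bb.mayNotTransfer);
    return bb.mayNotTransfer;
  }

  // Loop-level summary, rebuilt from the per-block entries. Every block is
  // visited so the per-block cache is fully populated for later queries.
  bool loopMayNotTransfer(const SketchLoop &loop) {
    bool any = false;
    for (const SketchBlock *bb : loop.blocks)
      any |= blockMayNotTransfer(*bb);
    return any;
  }

  // Toy invalidation hook: drops the entries at the same point where a
  // loop-level cache would have been dropped.
  void forgetLoop(const SketchLoop &loop) {
    for (const SketchBlock *bb : loop.blocks)
      perBlock.erase(bb->id);
  }

  int scanCount() const { return scans; }
};
```

Because the loop-level answer is derived from the per-block entries, invalidating at the loop-level points is enough to keep both views in step in this sketch.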
The invalidation actions we need to worry about are adding and removing instructions from a block. For removal, we might end up in an imprecise (overly conservative) state. For addition, we might end up in an incorrect state. The existing LoopProperties cache has exactly the same issues, and depends on forgetLoop calls for correctness when we insert new instructions (with interesting properties) into a loop.
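The asymmetry between the two cases can be shown with a toy cache entry. The names below (`MutableBlock`, `CachedBlockInfo`, `forget`) are hypothetical and only illustrate the idea: a stale entry after an insertion claims too little, while a stale entry after a removal claims too much.

```cpp
#include <cassert>

// Hypothetical block whose contents can change after the cache was filled.
struct MutableBlock {
  int throwingInsts = 0; // number of "interesting" instructions in the block
};

// Hypothetical single-entry cache for one block.
struct CachedBlockInfo {
  bool valid = false;
  bool mayThrow = false;

  // Answers from the cache when possible; otherwise "scans" the block.
  bool query(const MutableBlock &bb) {
    if (!valid) {
      mayThrow = bb.throwingInsts > 0;
      valid = true;
    }
    return mayThrow;
  }

  // Stand-in for the invalidation the client is expected to trigger.
  void forget() { valid = false; }
};
```

Without a `forget()` after an insertion, `query` keeps returning `false` for a block that can now throw, which is the incorrect case. Without one after a removal, it keeps returning `true`, which only loses precision.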
Long term, I'm actually hoping to sink the notion of block properties into BasicBlock itself, but starting here with a standalone patch makes a lot of sense.
This is a really strange and counter-intuitive limitation. I'm OK with it for now, but I think we might want to remove it in the future. Should we add a TODO here or in the method description?