Both llvm.experimental.guard and llvm.assume intrinsics can introduce facts in the middle of a basic block. Previously, EarlyCSE was not exploiting these facts when visiting later instructions in the same block.
Worth noting is that the choice to visit each operand does make the algorithm O(n^2) in the number of instructions. However, this is not new. We have to hash the operands, so we're already O(n^2). Apparently, most real code doesn't consist of long series of calls which consume each previous instruction. Who knew?
In the pathological case that you mentioned won't this be O(I^3), since you're doing:
Maybe we can find an better bound as a function of the number of total uses?