The test case from:
https://bugs.llvm.org/show_bug.cgi?id=42771
...shows a ~30x slowdown caused by the awkward loop iteration (rL207302) that is seemingly done just to avoid invalidating the instruction iterator. We can instead delay instruction deletion until we reach the end of the block (or we could delay until we reach the end of all blocks).
There's a test diff here for a degenerate case with llvm.assume that I didn't step through, but I think that's caused because isInstructionTriviallyDead() may not be true for a function call even if that call has no uses.
This change probably doesn't result in much overall compile-time improvement because we call -instsimplify as a standalone pass only once in the standard -O2 opt pipeline from what I see.
Why did you change the logic here? Now we might try to simplify an instruction that has no uses but is not trivially dead, which at least makes the comment on L45 slightly wrong.