This is both more efficient and more ergonomic to use, as inverting a
bit vector is trivial while inverting a set is annoying.
Sadly this leaks into a bunch of APIs downstream, so adapt them as well.
This would be NFC, but there is an ordering dependency in MemRefOps's
computeMemRefRankReductionMask. This is now deterministic, previously it
was dependent on SmallDenseSet's unspecified iteration order.
Is this mlir:: necessary?