It was discovered that this pass can be slow on huge functions, taking 20% of compile time instead of the usual ~0.5% (one test case spent ~19 minutes on SystemZ in the backend alone with llc).
The problem relates to the necessary clearing of kill flags when a redundant instruction is removed. This was done by scanning backwards over the instructions until a use (or def) was found. (I tried simply clearing all kill flags on the involved registers, but that changed the outcome of later optimizations in a few cases, so it seems this should not be simplified like that.)
This patch replaces the scanning of instructions with a map from register to the last seen use (kill).
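To illustrate the idea, here is a minimal standalone sketch of the map-based approach. It is not the code from the patch: `Operand`, `Instr`, and `LastUseTracker` are hypothetical, simplified stand-ins for the real machine-IR classes, and the sketch ignores definitions and sub/super-register aliasing. It only shows how a register-to-last-use map lets the kill flag be cleared with one lookup instead of a backwards scan.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a register use operand.
struct Operand {
  unsigned Reg;
  bool IsKill; // true if this use is marked as the last use of Reg
};

// Hypothetical, simplified stand-in for a machine instruction.
struct Instr {
  std::vector<Operand> Uses;
};

// Remembers, per register, the index of the most recently visited
// instruction that uses it (assumption: one basic block, visited in order).
class LastUseTracker {
  std::unordered_map<unsigned, std::size_t> LastUse; // Reg -> instruction index

public:
  // Record every register used by the instruction at index Idx.
  void visit(const std::vector<Instr> &Block, std::size_t Idx) {
    assert(Idx < Block.size() && "instruction index out of range");
    for (const Operand &Op : Block[Idx].Uses)
      LastUse[Op.Reg] = Idx;
  }

  // Reg's live range is about to be extended past its last seen use, so any
  // kill flag on that use is stale. Clear it via a map lookup instead of
  // scanning backwards. Returns true if a flag was actually cleared.
  bool clearKill(std::vector<Instr> &Block, unsigned Reg) {
    auto It = LastUse.find(Reg);
    if (It == LastUse.end())
      return false; // Reg has not been used so far
    bool Cleared = false;
    for (Operand &Op : Block[It->second].Uses) {
      if (Op.Reg == Reg && Op.IsKill) {
        Op.IsKill = false;
        Cleared = true;
      }
    }
    return Cleared;
  }
};
```

In this sketch a caller would invoke `visit` on each instruction while walking the block forward, and `clearKill` on the involved register whenever a redundant instruction is removed. The cost per removal is then independent of how far back the last use lies, which is what matters for huge functions.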
Compile time on the huge file has been remedied fully (-> 0.4%), but I see on SystemZ/SPEC a slight average compile time increase (0.46% -> 0.52%). This is small enough that it seems wise to go ahead and use the map to help the huge functions (or..?).
Some alternative experiments with this have already been made at https://reviews.llvm.org/D146810, but this turned out to be the simpler version with the same performance.
If the number of map pairs is usually small, I would use SmallDenseMap, as it has pre-allocated storage, especially when you're placing it in a vector.