Labels are matched using a regexp of the form '^(pattern):', which
requires the addition of a "suffix" concept to NamelessValue.
Aside from that, the key challenge is that block labels are values, and
we typically capture values together with their '%' prefix. However,
when a label appears at the start of a basic block, the '%' prefix is
not included, so we must capture block label values *without* the
prefix.
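To make the suffix idea concrete, here is a minimal Python sketch. Only
the NamelessValue name comes from this change. The fields 'ir_prefix',
'ir_regexp' and 'suffix', the two methods, and the label pattern
'[\w.-]+' are illustrative assumptions, not the script's actual code.
The sketch shows that a label definition matches '^(pattern):' without
the '%', while the '%'-prefixed use pattern cannot tell labels apart
from other values.

```python
import re
from dataclasses import dataclass


@dataclass
class NamelessValue:
    """Hypothetical, simplified model of a value kind the script matches."""
    ir_prefix: str    # sigil written before the value at its uses, e.g. "%"
    ir_regexp: str    # pattern for the value's name (assumed, see lead-in)
    suffix: str = ""  # text that must follow the name, e.g. ":" for labels

    def definition_regexp(self):
        # Block labels are defined as '^(pattern):', without the '%' prefix.
        return re.compile(
            r"^(" + self.ir_regexp + r")" + re.escape(self.suffix), re.MULTILINE
        )

    def use_regexp(self):
        # Uses keep the '%' prefix, but this matches any local value,
        # label or not.
        return re.compile(re.escape(self.ir_prefix) + r"(" + self.ir_regexp + r")")


BLOCK_LABEL = NamelessValue(ir_prefix="%", ir_regexp=r"[\w.-]+", suffix=":")

SAMPLE_IR = (
    "entry:\n"
    "  br label %loop\n"
    "loop:\n"
    "  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]\n"
)

print(BLOCK_LABEL.definition_regexp().findall(SAMPLE_IR))  # ['entry', 'loop']
print(BLOCK_LABEL.use_regexp().findall(SAMPLE_IR))  # ['loop', 'i', 'entry', 'i.next', 'loop']
```

The use pattern also matches 'i' and 'i.next', which are not labels, so
on its own it cannot decide whether the '%' belongs inside the capture.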

We don't know ahead of time whether an IR value is a label or not. In
most cases, labels are preceded by the word "label" (their type), but
this isn't the case in phi nodes. We solve this issue by turning the
variable generalization into a two-pass algorithm: the first pass finds
all occurrences of a variable and determines whether the '%' prefix can
be included in the capture or not. The second pass does the actual
substitution.
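A minimal Python sketch of the two-pass idea, under simplifying
assumptions: it works on named values for readability (the script itself
targets nameless values), uses a hypothetical name pattern '[\w.-]+' and
a plain '.*' capture, and derives FileCheck variable names by
upper-casing. The helpers 'find_unprefixed' and 'generalize' are
hypothetical. Pass 1 collects names that occur at least once without
'%' (a label at the start of a block); pass 2 captures the '%' only for
names that always carry it.

```python
import re

# Hypothetical pattern for local IR names; real LLVM names can also be quoted.
NAME = r"[\w.-]+"
# Either a block label at the start of a line ('name:') or a '%'-prefixed use.
OCCURRENCE_RE = re.compile(
    r"^(?P<label>" + NAME + r"):|%(?P<use>" + NAME + r")", re.MULTILINE
)


def find_unprefixed(ir):
    """Pass 1: collect names that occur at least once without a '%' prefix."""
    return {
        m.group("label") for m in OCCURRENCE_RE.finditer(ir) if m.group("label")
    }


def generalize(ir):
    """Pass 2: replace each occurrence with a FileCheck variable."""
    unprefixed = find_unprefixed(ir)
    defined = set()

    def substitute(m):
        name = m.group("label") or m.group("use")
        var = re.sub(r"[.-]", "_", name).upper()
        # Capture the '%' only if every occurrence of this name carries it.
        keep_prefix = name not in unprefixed
        if name in defined:
            ref = f"[[{var}]]"  # later occurrence: reuse the variable
        else:
            defined.add(name)
            pattern = "%.*" if keep_prefix else ".*"
            ref = f"[[{var}:{pattern}]]"  # first occurrence: define it
        if m.group("label") is not None:
            return ref + ":"  # block label definition keeps its ':' suffix
        # A use whose '%' was not captured keeps it outside the variable.
        return ref if keep_prefix else "%" + ref

    return OCCURRENCE_RE.sub(substitute, ir)


SAMPLE_IR = (
    "entry:\n"
    "  br label %loop\n"
    "loop:\n"
    "  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]\n"
    "  %i.next = add i32 %i, 1\n"
    "  br label %loop\n"
)

print(generalize(SAMPLE_IR))
```

On this sample, the first 'br label %loop' becomes
'br label %[[LOOP:.*]]' and 'loop:' becomes '[[LOOP]]:', while '%i' is
captured together with its prefix as '[[I:%.*]]'. The phi operands
'%entry' and '%loop' are handled even though no "label" keyword precedes
them, because pass 1 has already seen their block definitions.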

I did consider the alternative of trying to detect the phi node case
with more regular expression special cases, but ultimately decided
against it because it seemed more fragile. Also, the approach of keeping
a tentative prefix that may later be discarded could perhaps eventually
be applied to some metadata cases as well.
This would require more refactoring, though: right now, the two-pass
approach is only used within a single function, which isn't sufficient
for metadata. Tackling this larger refactoring felt like too much for
this change.

Maybe you can split out the part duplicated between this loop and the one below into something like
?