Size-wise, BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM is the most
expensive opcode, since it comes with an associated symbol string. We
were previously emitting it once per binding, instead of once per
symbol. This diff groups all bindings for a given symbol together and
ensures we only emit one such opcode per symbol. This matches ld64's
behavior.
While this is a relatively small win on chromium_framework (-72KiB), for
programs that have more dynamic bindings, the difference can be quite
large.
This change is perf-neutral when linking chromium_framework.
Is it guaranteed that we have filtered out separate symbols pointing at the same address by now? If not, this has nondeterminism issues (here and in the sort below)