Use an arena allocator to reduce the number of allocations and frees. Individual allocations become extremely cheap, and individual frees are free. Change from std::string to a custom string implementation that is a reference to immutable data, so that substr and copies are free. Remove the null terminator so that strings can be appended without copying the base.
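For readers skimming the thread, here is a minimal sketch of the two ideas in the summary. It is not the code in this patch: `Arena`, `StrRef`, the 4096-byte block size, and all member names are hypothetical, chosen only for illustration. It assumes a bump-pointer arena that releases everything at destruction, and a string type that is just a pointer plus a length into immutable, non-null-terminated data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical bump-pointer arena: allocate() only advances an offset in the
// current block, and there is no per-object free. All blocks are released
// together in the destructor.
class Arena {
  static constexpr std::size_t BlockSize = 4096; // assumed default block size
  std::vector<char *> Blocks;
  std::size_t Capacity = 0; // size of the current (last) block
  std::size_t Used = 0;     // bytes handed out from the current block

public:
  Arena() = default;
  Arena(const Arena &) = delete;
  Arena &operator=(const Arena &) = delete;
  ~Arena() {
    for (char *B : Blocks)
      std::free(B);
  }

  void *allocate(std::size_t Size) {
    // Round up so every returned pointer stays maximally aligned.
    constexpr std::size_t Align = alignof(std::max_align_t);
    Size = (Size + Align - 1) & ~(Align - 1);
    if (Blocks.empty() || Size > Capacity - Used) {
      // Start a new block; oversized requests get a block of their own size.
      Capacity = Size > BlockSize ? Size : BlockSize;
      char *B = static_cast<char *>(std::malloc(Capacity));
      if (!B)
        std::abort();
      Blocks.push_back(B);
      Used = 0;
    }
    char *P = Blocks.back() + Used;
    Used += Size;
    return P;
  }
};

// Hypothetical immutable string reference: pointer plus length, with no null
// terminator. Copying it or taking a substring never allocates or copies
// character data.
struct StrRef {
  const char *Data;
  std::size_t Size;

  StrRef() : Data(nullptr), Size(0) {}
  StrRef(const char *D, std::size_t S) : Data(D), Size(S) {}

  StrRef substr(std::size_t Pos, std::size_t Len) const {
    assert(Pos <= Size && Len <= Size - Pos);
    return StrRef(Data + Pos, Len);
  }
};
```

In this sketch a caller would copy characters into memory obtained from `Arena::allocate` and wrap them in a `StrRef`. The referenced data must not outlive the arena, which is the main trade-off of the design.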
I have some concerns about the approach. Some of the code is non-portable, I guess, but that can be fixed.
You might want to try reducing the amount of memory allocated rather than growing a homemade allocator to amortize malloc()/free() cost. I'm not positive this can be achieved with the current design, but there have been several discussions in the past about a full rewrite. (Please note that the demangler has deeper issues, including looking at its own output, which has caused many bugs.)
I'm not sure this will work with MSVC.
The admittedly non-portable extension works on MSVC, gcc, and clang. So it's mostly portable ;-) But it's a hack, I admit.
Though I would prefer to remove the hack and plumb the arena through to all the callsites. That would make the diff less readable, so I thought I'd start with this version. Sorry, I should have made that clear from the beginning.