A few items I came across:
a) Using a do...while loop in the number formatter means we do not have to special-case zero.
b) Let's use 'if (auto size = ...) {}' for appending to the output buffer.
c) We should also be using memcpy there, not memmove -- the string being appended is never part of the current buffer.
d) Let's put all the operator<< functions together.
e) I find 'if (cond) frob(..., true); else frob(..., false)' somewhat confusing. Let's just use std::abs in the signed integer printer and let CSE decide about the duplicate < 0 testing.
f) Let's have as many of these functions as possible return *this. That's both more consistent, and allows tail calls in some cases (the actual number formatter has a local array though).
These changes removed around 100 bytes from the demangler's instructions on x86_64.
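
For illustration only, a minimal sketch of how items a), b), c), e) and f) might fit together. SketchBuffer, grow, writeUnsigned and str are hypothetical names chosen for this example and are not taken from the demangler source; the real buffer class may differ in layout and growth policy.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for the demangler's output buffer (illustrative only).
class SketchBuffer {
  char *Buf = nullptr;
  std::size_t Pos = 0;
  std::size_t Cap = 0;

  // Ensure room for N more bytes; aborts on allocation failure.
  void grow(std::size_t N) {
    if (Pos + N <= Cap)
      return;
    std::size_t NewCap = (Pos + N) * 2;
    char *NewBuf = static_cast<char *>(std::realloc(Buf, NewCap));
    if (!NewBuf)
      std::abort();
    Buf = NewBuf;
    Cap = NewCap;
  }

  SketchBuffer &writeUnsigned(unsigned long long N, bool IsNeg = false) {
    std::array<char, 21> Temp; // 20 digits for a 64-bit value, plus a sign
    char *End = Temp.data() + Temp.size();
    char *Ptr = End;
    // (a) do...while always emits one digit, so zero is not a special case.
    do {
      *--Ptr = static_cast<char>('0' + N % 10);
      N /= 10;
    } while (N);
    if (IsNeg)
      *--Ptr = '-';
    // (f) Return *this; not a tail call here because Temp is a local array.
    return *this << std::string_view(Ptr, static_cast<std::size_t>(End - Ptr));
  }

public:
  SketchBuffer() = default;
  SketchBuffer(const SketchBuffer &) = delete;
  SketchBuffer &operator=(const SketchBuffer &) = delete;
  ~SketchBuffer() { std::free(Buf); }

  SketchBuffer &operator<<(std::string_view R) {
    // (b) Declare and test the size in the condition; empty appends do nothing.
    if (auto Size = R.size()) {
      grow(Size);
      // (c) memcpy suffices on the assumption that R never aliases Buf.
      std::memcpy(Buf + Pos, R.data(), Size);
      Pos += Size;
    }
    return *this;
  }

  SketchBuffer &operator<<(unsigned long long N) { return writeUnsigned(N); }

  // (e) One call instead of an if/else pair. Caveat: std::abs is undefined
  // for the most negative long long, so this sketch assumes N is above it.
  SketchBuffer &operator<<(long long N) {
    return writeUnsigned(static_cast<unsigned long long>(std::abs(N)), N < 0);
  }

  std::string str() const { return Pos ? std::string(Buf, Pos) : std::string(); }
};
```

Because every operator<< returns *this, calls chain naturally, e.g. 'B << "x" << -42LL << "y" << 0ULL'. The integer literals need an LL or ULL suffix in this sketch, since it only provides the two 64-bit overloads.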