The old code compiled to placement new calls on Linux64 and to memmove
calls on Windows. The new code uses memcpy instead. With this change,
analysis of a large .c file on Linux64 goes from 6m57.730s to
6m44.317s, about 3.2% faster.
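
For illustration only, the difference between the two copy strategies can
be sketched as below. This is not the code touched by this change: Elem,
copyWithPlacementNew, and copyWithMemcpy are hypothetical names, and the
sketch assumes the element type is trivially copyable, which is what makes
a bulk memcpy valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical element type standing in for whatever the real container stores.
struct Elem {
  int kind;
  void *ptr;
};

// memcpy is only a valid way to copy objects of trivially copyable types.
static_assert(std::is_trivially_copyable<Elem>::value,
              "bulk memcpy requires a trivially copyable element type");

// Element-by-element copy: one placement new (copy construction) per element.
void copyWithPlacementNew(Elem *dst, const Elem *src, std::size_t n) {
  for (std::size_t i = 0; i != n; ++i)
    new (dst + i) Elem(src[i]);
}

// Bulk copy: a single memcpy over the whole range. The ranges must not
// overlap; overlapping ranges would need memmove instead.
void copyWithMemcpy(Elem *dst, const Elem *src, std::size_t n) {
  if (n == 0)
    return; // avoid passing possibly-null pointers to memcpy
  assert(dst != nullptr && src != nullptr);
  std::memcpy(dst, src, n * sizeof(Elem));
}
```

Both functions leave the destination with the same contents. Whether the
bulk version is faster depends on the compiler and standard library, so any
gain should be measured, as with the timings above.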
This effectively reverts r135364.
The commit log for that change reads:
"Simplify & microoptimize code. No intended functionality change."
It gives no rationale or metrics to support the claimed speedup.