Dyld on arm64 macOS has strict requirements for alignment and sequence of segments and sections.
I developed this diff by incrementally changing alignments and sequences to match the output of ld64. I stopped once arm64 macOS began executing my test programs instead of rejecting them immediately in execve(2) with errno == EBADMACHO ("Malformed Mach-O file").
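To make the alignment side of this concrete, here is a minimal sketch of how the chosen alignment changes where a section or segment ends up. It is not the linker's actual code: the helper name `align_up`, the sample offset, and the three alignment values (4, 8, and a 16 KiB page) are assumptions picked purely for illustration.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment.

    alignment must be a positive power of two (hypothetical helper,
    not taken from the diff under review).
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# Assumed, illustrative offset where the previous section ended.
end_of_previous = 0x1003

# The same offset rounded to three different alignments.
by_4 = align_up(end_of_previous, 4)          # 4-byte alignment
by_8 = align_up(end_of_previous, 8)          # 8-byte (64-bit word) alignment
by_page = align_up(end_of_previous, 0x4000)  # assumed 16 KiB page alignment

print(hex(by_4), hex(by_8), hex(by_page))
```

Even a small change in alignment shifts every later offset, which is one reason matching ld64's layout step by step was a practical way to find a combination the loader accepts.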
Hmm, this is 4 and not WordSize? That is plausible, but it would be worth adding a comment explaining why 4 is needed/correct on 64-bit systems.