The alignment code was added in r252352, probably to address some
layout issues. However, PT_TLS's p_memsz does not need to be a multiple
of p_align, and ld.bfd does not round it up either.
With a larger alignment (e.g. 64 for Android Bionic on AArch64, see
D62055), leaving p_memsz unaligned may make the overhead much smaller.
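
As a rough illustration of the arithmetic only (not the lld code itself),
the sketch below computes how many bytes rounding p_memsz up to p_align
would add. The helper names alignUp and paddingBytes and the sample sizes
are hypothetical, and the sketch assumes p_align is a power of two.

```cpp
#include <cstdint>

// Hypothetical helper: round `value` up to the next multiple of `align`.
// Assumes `align` is a nonzero power of two.
constexpr uint64_t alignUp(uint64_t value, uint64_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Bytes that would be added to p_memsz if it were rounded up to p_align.
constexpr uint64_t paddingBytes(uint64_t memsz, uint64_t align) {
  return alignUp(memsz, align) - memsz;
}

// Illustrative sizes: an 8-byte TLS segment gains no padding with
// p_align = 8, but gains 56 bytes with p_align = 64.
static_assert(paddingBytes(8, 8) == 0, "already a multiple of p_align");
static_assert(paddingBytes(8, 64) == 56, "rounded up from 8 to 64");
```

The padding is at most p_align - 1 bytes, so the potential saving grows
with the alignment.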