Currently the computations of p_filesz and p_memsz are mixed together
in a loop over fragments. After recent changes it is possible to
avoid using a loop for the computation of p_filesz, since we know that fragments
are sorted by their file offsets.
The main benefit of this change is that it separates the computation of p_filesz
from that of p_memsz, which is simpler and allows us to fix the computation of
p_memsz independently (D78005 shows the issue that we currently have).
It also fixes the bug in program-header-size-offset.yaml.
Depends on D78627.
This is a little subtle, and probably deserves a comment to explain why we pay attention to the offset but not the size of a single trailing SHT_NOBITS section.
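
To make the subtlety concrete, here is a minimal sketch of a loop-free p_filesz computation. It is not the code from the patch: the `Fragment` struct, the `IsNoBits` flag and the `computeFileSize` helper are hypothetical names used only for illustration. The sketch assumes that fragments are already sorted by file offset and that the last fragment does not start before the segment offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a fragment record; not the real yaml2obj type.
struct Fragment {
  uint64_t Offset; // File offset of the fragment.
  uint64_t Size;   // Size of the fragment in bytes.
  bool IsNoBits;   // True for SHT_NOBITS sections, which occupy no file space.
};

// Computes p_filesz for a segment that starts at SegmentOffset.
// Assumption: Fragments is sorted by Offset, so only the last one matters.
uint64_t computeFileSize(uint64_t SegmentOffset,
                         const std::vector<Fragment> &Fragments) {
  if (Fragments.empty())
    return 0;

  // Check the assumed precondition: fragments are ordered by file offset.
  assert(std::is_sorted(Fragments.begin(), Fragments.end(),
                        [](const Fragment &A, const Fragment &B) {
                          return A.Offset < B.Offset;
                        }));

  const Fragment &Last = Fragments.back();

  // The offset of the last fragment always counts, even for SHT_NOBITS:
  // everything before it is present in the file.
  uint64_t FileSize = Last.Offset - SegmentOffset;

  // A trailing SHT_NOBITS section has an offset but no bytes in the file,
  // so its size is not added.
  if (!Last.IsNoBits)
    FileSize += Last.Size;
  return FileSize;
}
```

In this sketch, a segment at offset 0x100 holding a 0x10-byte fragment at 0x100 followed by a SHT_NOBITS fragment at 0x110 gets a file size of 0x10: the trailing fragment's offset marks where the file contents end, while its size would only matter for p_memsz.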