This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF] Support --[no-]mmap-output-file with F_no_mmap
ClosedPublic

Authored by terrelln on Oct 21 2019, 8:42 PM.

Details

Summary

Add a flag F_no_mmap to FileOutputBuffer to support
--[no-]mmap-output-file in ELF LLD. LLD currently explicitly ignores
this flag for compatibility with GNU ld and gold.

We need this flag to speed up link time for large binaries in certain
scenarios. When we link some of our larger binaries we find that LLD
takes 50+ GB of memory, which causes memory pressure. The memory
pressure causes the VM to flush dirty pages of the output file to disk.
This is normally okay, since we should be flushing cold pages. However,
when using BtrFS with compression we need to write 128KB at a time when
we flush a page. If any page in that 128KB block is written again, then
it must be flushed a second time, and so on. Since LLD doesn't write
sequentially this causes write amplification. The same 128KB block will
end up being flushed multiple times, causing the linker to many times
more IO than necessary. We've observed 3-5x faster builds with
-no-mmap-output-file when we hit this scenario.

The bad scenario only applies to compressed filesystems, which group
together multiple pages into a single compressed block. I've tested
BtrFS, but the problem will be present for any compressed filesystem
on Linux, since it is caused by the VM.

Silently ignoring --no-mmap-output-file caused a silent regression when
we switched from gold to lld. We pass --no-mmap-output-file to fix this
edge case, but since lld silently ignored the flag we didn't realize it
wasn't being respected.

Test Plan:

ninja check-llvm
ninja check-lld

Benchmark building a 9 GB binary that exposes this edge case. I linked 3
times with --mmap-output-file and 3 times with --no-mmap-output-file and
took the average. The machine has 24 cores @ 2.4 GHz, 112 GB of RAM,
BtrFS mounted with -compress-force=zstd, and an 80% full disk.

ModeTime
mmap894 s
no mmap126 s

When compression is disabled, BtrFS performs just as well with and
without mmap on this benchmark.

I was unable to reproduce the regression with any binaries in
lld-speed-test.

Diff Detail

Event Timeline

terrelln created this revision.Oct 21 2019, 8:42 PM
Herald added a project: Restricted Project. · View Herald Transcript
MaskRay added a comment.EditedOct 22 2019, 11:20 AM

I don't know btrfs well but my current understanding is that there is a trade-off between compression rates and performance.

--no-mmap-output-file is supported though currently ignored, so filling in its functionality seems fine. I've got some suggestions and questions about the description.

Add support for -[no-]mmap-output-file to ELF. LLD currently explicitly ignores this flag for compatibility with the GNU linker.

GNU linker*s*, or more explicitly, GNU ld and gold.

However, when BtrFS compression is enabled, BtrFS writes 128KB blocks. Since LLD doesn't write the output file sequentially, this causes massive write and compression amplification.

Do the flushes of random accessed dirty pages make it slow under memory pressure?

FileOutputBuffer::F_in_memory

FileOutputBuffer does no have this mode. Do you have another patch that has not been sent?

-compress-force=zstd

What is compression is turned off? Have you measured the performance differences with and without --no-mmap-output-file? I think I have asked too much, but if you have access to some ext4/XFS/ZFS machines, the numbers will be useful... CentOS has defaulted to XFS and ext4 is the most widely used fs on Linux.

lld/test/ELF/mmap-output-file.s
4 ↗(On Diff #225990)

Just check the output is the same.

llvm-mc -filetype=obj -triple=x86-unknown-linux /dev/null -o %t.o
ld.lld %t.o -o %t
ld.lld --no-mmap-output-file %t.o -o %t1
cmp %t %t1
ld.lld --mmap-output-file %t.o -o %t2
cmp %t %t2
ruiu added a comment.Oct 22 2019, 11:45 AM

I should have reviewed this one first, sorry. I'm a little sad that BtrFS behaves poorly under a certain condition, and I feel like the filesystem is at fault than the linker, but I'm fine with actually supporting -no-mmap-output-file option as a workaround. That's practical and easy to implement.

lld/ELF/Config.h
203

Sort asciibetically.

lld/ELF/Options.td
414

Map -> Mmap

415

map -> mmap

I'll update the description to answer questions, and make the requested changes. I've also answered questions inline below.

I'm a little sad that BtrFS behaves poorly under a certain condition, and I feel like the filesystem is at fault than the linker

I'd say it is a combination of the linker and BtrFS causing the problem. With compression enabled, BtrFS has to write 128 KB at a time. When there is memory pressure, because the linker uses ~50 GB of memory, the kernel is forced to flush the page cache, so BtrFS is forced to do a write. I suspect you'd see the same problem on any compressed filesystem when under memory pressure. Reducing the memory footprint of the linker would also alleviate the problem.

It is possible the kernel page eviction algorithm could be improved if it knew the "block size" of the filesystem, and I've asked our BtrFS folks if they see a way this case can be improved.

Do the flushes of random accessed dirty pages make it slow under memory pressure?

Yeah, BtrFS is forced to flush the dirty pages, but it has to write 128 KB at a time. This drastically slows down non-sequential write patterns since the same 128 KB block is written multiple times.

FileOutputBuffer::F_in_memory

Thats the parent diff D69293

What is compression is turned off? Have you measured the performance differences with and without --no-mmap-output-file? I think I have asked too much, but if you have access to some ext4/XFS/ZFS machines, the numbers will be useful... CentOS has defaulted to XFS and ext4 is the most widely used fs on Linux.

When compression is disabled I don't see the same problem, and I don't expect to see it on other filesystems (without compression enabled). The reason is we don't see as much write amplification. When compression is disabled flushing a 4 KB page only writes 4 KB. And, the kernel should only flush colder pages, so the write amplification should be minimized. But, maybe the kernel isn't as smart about flushing a cold 4 KB page in a warm 128 KB range.

Can you try posix_fallocate(fd, 0, len) and see how it performs under memory pressure with btrfs+compression?

Can you try posix_fallocate(fd, 0, len) and see how it performs under memory pressure with btrfs+compression?

I see the same behavior, which is expected, since we'll still see the write amplification.

terrelln updated this revision to Diff 226369.Oct 24 2019, 6:02 PM
  • Merge D69293 into this diff.
  • Fix all comments.
terrelln retitled this revision from [LLD][ELF] Support -[no-]mmap-output-file to [LLD][ELF] Support -[no-]mmap-output-file with F_no_mmap.Oct 24 2019, 6:04 PM
terrelln edited the summary of this revision. (Show Details)
terrelln added a project: lld.
terrelln edited the summary of this revision. (Show Details)Oct 24 2019, 6:09 PM
ruiu accepted this revision.Oct 24 2019, 7:52 PM

LGTM

lld/ELF/Writer.cpp
2598–2600

nit: can you replace this = with |= for consistency?

This revision is now accepted and ready to land.Oct 24 2019, 7:52 PM
MaskRay accepted this revision.Oct 24 2019, 10:32 PM
MaskRay added inline comments.
lld/test/ELF/mmap-output-file.s
5 ↗(On Diff #226369)

Prefer --m in tests. Due to the way the parser works, -m emulation is not considered, but we don't want users to use -mmap-output-file

terrelln updated this revision to Diff 226466.Oct 25 2019, 11:10 AM
terrelln edited the summary of this revision. (Show Details)

Fix comments

terrelln marked 2 inline comments as done.Oct 25 2019, 11:11 AM

Will you need someone to commit this for you?

-[no-]mmap-output-file

Please also change the title to use the double-dashed form.

lld/test/ELF/mmap-output-file.s
3 ↗(On Diff #226466)

Just use x86_64. This is not Linux specific.

terrelln retitled this revision from [LLD][ELF] Support -[no-]mmap-output-file with F_no_mmap to [LLD][ELF] Support --[no-]mmap-output-file with F_no_mmap.Oct 28 2019, 9:47 PM
terrelln edited the summary of this revision. (Show Details)
terrelln updated this revision to Diff 226835.Oct 28 2019, 9:50 PM

Use triple x86_64 in tests.

terrelln marked an inline comment as done.Oct 28 2019, 9:50 PM
ruiu added a comment.Oct 28 2019, 9:58 PM

Feel free to submit.

I can commit this for you tomorrow if one of your reviewers hasn't gotten to it by then.

@smeenai I just tried

curl -L 'https://reviews.llvm.org/D69294?download=true' | patch -p1; git commit -m 'Differential Revision: https://reviews.llvm.org/D69294'; arc amend

:)

The nice thing with git is that %an and %ae (author name and email) will be just correct. The old Patch by Name svn way does not give enough attribution to the author :(

This revision was automatically updated to reflect the committed changes.