This is an archive of the discontinued LLVM Phabricator instance.

Differential D86905

Flush bitcode incrementally for LTO output
ClosedPublic

Authored by stephan.yichao.zhao on Aug 31 2020, 8:48 PM.

Details

Reviewers

MaskRay
tejohnson
• espindola
mehdi_amini

Commits

rG11201315d588: Flush bitcode incrementally for LTO output

Summary

By default, the bitcode writer does not flush its buffer until the end. This is
fine for small bitcode files. When -flto,--plugin-opt=emit-llvm,-gmlt are
used, the final bitcode file is large (for example, >8G), and keeping all of
the data in memory consumes a lot of memory.

This change allows the bitcode writer to flush data to disk early when the
buffered data size is above some threshold. This is only enabled when lld emits
LLVM bitcode.

One issue to address is backpatching bitcode: subblock lengths, function
body indexes, and metadata indexes need to be backfilled. Because the buffer can
now be flushed partially, we introduce raw_fd_stream, which supports
read/seek/write and enables backpatching bitcode already flushed to disk.
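
To make the flushing idea in the summary concrete, here is a minimal sketch that uses only the C++ standard library. It is not the code from this patch: `FlushingBuffer`, `writeByte`, `offset` and `isFlushed` are hypothetical names, and a `std::iostream` stands in for `raw_fd_stream`. It shows the two pieces the summary describes: flushing once the buffer reaches a threshold, and tracking the logical offset as flushed bytes plus buffered bytes, so a later backpatch can tell whether its target is still in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the writer's buffer handling; not LLVM code.
class FlushingBuffer {
  std::vector<char> Out;     // bytes that have not been flushed yet
  std::iostream *FS;         // read/seek/write stream, or nullptr
  std::uint64_t Flushed = 0; // number of bytes already written to FS
  std::size_t Threshold;     // flush once Out holds at least this many bytes

  void flushIfNeeded() {
    // Without a stream, everything stays in memory until the end.
    if (!FS || Out.size() < Threshold)
      return;
    FS->write(Out.data(), static_cast<std::streamsize>(Out.size()));
    Flushed += Out.size();
    Out.clear();
  }

public:
  FlushingBuffer(std::iostream *Stream, std::size_t FlushThreshold)
      : FS(Stream), Threshold(FlushThreshold) {
    assert(FlushThreshold > 0 && "threshold must be positive");
  }

  void writeByte(char C) {
    Out.push_back(C);
    flushIfNeeded();
  }

  // Bytes currently held in memory.
  std::size_t buffered() const { return Out.size(); }

  // Logical offset in the output: flushed bytes plus buffered bytes.
  std::uint64_t offset() const { return Flushed + Out.size(); }

  // True if a backpatch at ByteNo would have to go to the stream.
  bool isFlushed(std::uint64_t ByteNo) const { return ByteNo < Flushed; }
};
```

With a threshold of 4 and ten bytes written, eight bytes end up in the stream and two stay buffered; with a null stream, all ten stay in memory.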

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

stephan.yichao.zhao created this revision. Aug 31 2020, 8:48 PM

Herald added a reviewer: • espindola. Aug 31 2020, 8:48 PM

Herald added a reviewer: MaskRay.

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, dexonsmith, steven_wu and 4 others.

stephan.yichao.zhao requested review of this revision. Aug 31 2020, 8:48 PM

Harbormaster completed remote builds in B70161: Diff 289055. Aug 31 2020, 8:50 PM

updated

stephan.yichao.zhao edited reviewers, added: tejohnson; removed: • espindola. Aug 31 2020, 8:55 PM

Herald added a reviewer: • espindola. Aug 31 2020, 8:55 PM

stephan.yichao.zhao removed a reviewer: • espindola. Aug 31 2020, 8:56 PM

Herald added a reviewer: • espindola. Aug 31 2020, 8:56 PM

I can understand the read-write stream requirement, but the changes to lib/Support may require an additional set of reviewers. You will need some unittests (see llvm/unittests/Support/raw_ostream_test.cpp for example). Probably consider splitting the patch into two.

lld/ELF/LTO.cpp
69	Avoid braces around simple statements http://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
71	`return {};` openFile would likely fail for the same reason.
llvm/include/llvm/Support/raw_ostream.h
318 ↗	(On Diff #289070)	https://llvm.org/docs/HowToSetUpLLVMStyleRTTI.html We usually use static `classof`
llvm/lib/Support/raw_ostream.cpp
32 ↗	(On Diff #289070)	Can this be avoided?

Harbormaster completed remote builds in B70173: Diff 289070. Aug 31 2020, 9:34 PM

stephan.yichao.zhao mentioned this in D86913: Add raw_fd_stream that supports reading/seeking/writing. Sep 1 2020, 12:25 AM

addressed comments

In D86905#2248592, @MaskRay wrote:

I can understand the read-write stream requirement, but the changes to lib/Support may require an additional set of reviewers. You will need some unittests (see llvm/unittests/Support/raw_ostream_test.cpp for example). Probably consider splitting the patch into two.

created D86913 with unit tests.

lld/ELF/LTO.cpp
71	openFile opens raw_fd_ostream (write-only) but not raw_fd_stream (read-write-seek). So openFile may work on a platform that does not support seek or read-write mode.
llvm/include/llvm/Support/raw_ostream.h
318 ↗	(On Diff #289070)	Thank you. I implemented classof, and dyn_cast now works! But I did not make kind a class member initialized in the constructor as that link does, because that would require changing a lot of code in the various child classes. Is the way kind is defined in the link a requirement in the LLVM codebase?

Harbormaster completed remote builds in B70185: Diff 289088. Sep 1 2020, 12:30 AM

update

Harbormaster completed remote builds in B70188: Diff 289091. Sep 1 2020, 12:35 AM

update

Harbormaster completed remote builds in B70189: Diff 289092. Sep 1 2020, 12:40 AM

evgeny777 added a subscriber: evgeny777. Sep 1 2020, 5:19 AM

evgeny777 added inline comments.

llvm/include/llvm/Bitstream/BitstreamWriter.h
167	Can we use memory mapped I/O and avoid backpatching on disk?
llvm/include/llvm/Support/raw_ostream.h
570 ↗	(On Diff #289092)	Comment is misleading

MaskRay added inline comments. Sep 1 2020, 12:16 PM

llvm/include/llvm/Support/raw_ostream.h
318 ↗	(On Diff #289070)	Can `Kind` use a default member initializer?
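
For readers unfamiliar with the `classof` convention discussed in these comments, the following is a minimal, self-contained sketch of LLVM-style RTTI. The class names `Stream` and `FDStream`, the `StreamKind` enum, and the `dynCast` helper are hypothetical illustrations, not the `raw_ostream` code under review; in LLVM itself the cast helper would be `llvm::dyn_cast`. The sketch passes the kind through a defaulted constructor argument, which is one possible way to avoid touching every existing subclass.

```cpp
#include <cassert>

// Hypothetical base class carrying a kind tag, in the style of LLVM RTTI.
class Stream {
public:
  enum class StreamKind { SK_Stream, SK_FDStream };

private:
  StreamKind Kind;

public:
  // The default argument means existing subclasses need no changes.
  explicit Stream(StreamKind K = StreamKind::SK_Stream) : Kind(K) {}
  virtual ~Stream() = default;
  StreamKind getKind() const { return Kind; }
};

// Hypothetical subclass that identifies itself through a static classof.
class FDStream : public Stream {
public:
  FDStream() : Stream(StreamKind::SK_FDStream) {}
  static bool classof(const Stream *S) {
    return S->getKind() == StreamKind::SK_FDStream;
  }
};

// Minimal dyn_cast-like helper: returns nullptr when the kind does not match.
template <typename To, typename From> To *dynCast(From *F) {
  assert(F && "dynCast on a null pointer");
  return To::classof(F) ? static_cast<To *>(F) : nullptr;
}
```

A `Stream *` that points at an `FDStream` casts successfully, while a plain `Stream` yields `nullptr`, without relying on C++ `dynamic_cast`.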

stephan.yichao.zhao marked 2 inline comments as done. Sep 1 2020, 2:53 PM

stephan.yichao.zhao added inline comments.

llvm/include/llvm/Bitstream/BitstreamWriter.h
167	Our use case is likely not what mmap is good at. I assume mmap on Linux loads pages on demand. If code reads/writes data on pages that are already loaded, the access has no IO cost, for example when code randomly accesses a chunk of contiguous addresses or addresses within the same page. Although the memory copy and page fault costs are still paid the first time a page is loaded, that cost is negligible asymptotically. Our case is a bit different. Given a 512M incremental flush threshold, I tested an LTO build that outputs a 5G bitcode file. BackpatchWord is called 16,613,927 times, of which only 12 need a disk seek. Plus, each access visits 4-8 bytes on a page, and all visited pages are far away from each other. It is likely that the pages are not cached and need to be loaded anyway, and after a load, our code does not access enough data on a page to 'cancel' the page fault cost. So its cost could be very similar to seek. Note that if a BackpatchWord needs to access disk, we need 1 seek to load existing data, 1 seek to overwrite the data, and 1 seek to jump back. The first 2 seek addresses are very close, so hopefully the disk cache can handle them. Although the last jump-back seek is a very long jump, if the page cache is based on time or frequency, the page that it jumps back to may not be evicted yet. Overall the ratio of disk accesses introduced is very small, so hopefully the additional cost is small. I also did a perf profile; no observable latency shows up (because LTO takes too much time). Given the above, and given that mmap support differs across systems, the seek-based approach seems fine.
llvm/include/llvm/Support/raw_ostream.h
570 ↗	(On Diff #289092)	Will be updating this at https://reviews.llvm.org/D86913
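
The three-seek sequence described in the comment on line 167 (load the existing data, overwrite it, jump back) can be sketched with standard C++ streams. This is an illustration under stated assumptions, not the patch's `BackpatchWord`: `backpatch32` is a hypothetical name, a `std::iostream` stands in for `raw_fd_stream`, and the word is assumed to be byte-aligned and stored little-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <iostream>
#include <sstream>
#include <string>

// Overwrite a 4-byte, zero-filled placeholder at byte offset Pos in a stream
// that has already been written past Pos, then restore the write position.
// Returns false if the placeholder was not all zeros or the stream failed.
bool backpatch32(std::iostream &FS, std::uint64_t Pos, std::uint32_t Word) {
  const std::streampos Saved = FS.tellp(); // where appending will resume

  // Seek 1: load the existing placeholder bytes.
  char Bytes[4] = {0, 0, 0, 0};
  FS.seekg(static_cast<std::streamoff>(Pos), std::ios::beg);
  FS.read(Bytes, 4);
  bool Ok = FS.good();
  for (char B : Bytes)
    Ok = Ok && B == 0;

  if (Ok) {
    // Encode the word as little-endian bytes.
    for (int I = 0; I < 4; ++I)
      Bytes[I] = static_cast<char>((Word >> (8 * I)) & 0xFF);
    // Seek 2: overwrite the placeholder in place.
    FS.seekp(static_cast<std::streamoff>(Pos), std::ios::beg);
    FS.write(Bytes, 4);
    Ok = FS.good();
  }

  // Seek 3: jump back so later writes continue at the end.
  FS.clear();
  FS.seekp(Saved);
  return Ok && FS.good();
}
```

Only backpatches whose target has already been flushed would take this path; targets still in the in-memory buffer can be patched directly, which is why the comment counts only 12 disk seeks out of millions of calls.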

stephan.yichao.zhao marked 3 inline comments as done. Sep 1 2020, 4:11 PM

stephan.yichao.zhao added inline comments.

llvm/include/llvm/Support/raw_ostream.h
318 ↗	(On Diff #289070)	yes. will be updating https://reviews.llvm.org/D86913

stephan.yichao.zhao marked an inline comment as done. Sep 1 2020, 4:12 PM

Our case is a bit different. Given a 512M incremental flush threshold, I tested an LTO build that outputs a 5G bitcode file. BackpatchWord is called 16,613,927 times, of which only 12 need a disk seek. Plus, each access visits 4-8 bytes on a page, and all visited pages are far away from each other. It is likely that the pages are not cached and need to be loaded anyway, and after a load, our code does not access enough data on a page to 'cancel' the page fault cost. So its cost could be very similar to seek.

It seems that you're trying to implement your own I/O caching. I don't understand why you're not letting the OS do this for you. For instance, on systems with a larger amount of memory (I have 64 GB on my home PC; a typical build server may have even more), mmap will buffer your whole 5G bc file in memory and then write it back to disk without any seek operations (which are costly on a traditional HDD).

Given the above, and given that mmap support differs across systems, the seek-based approach seems fine.

LLVM has a FileOutputBuffer class which abstracts the underlying OS differences. The LLVM linker (ld.lld) uses it for output file generation.

In D86905#2251511, @evgeny777 wrote:

Our case is a bit different. Given a 512M incremental flush threshold, I tested an LTO build that outputs a 5G bitcode file. BackpatchWord is called 16,613,927 times, of which only 12 need a disk seek. Plus, each access visits 4-8 bytes on a page, and all visited pages are far away from each other. It is likely that the pages are not cached and need to be loaded anyway, and after a load, our code does not access enough data on a page to 'cancel' the page fault cost. So its cost could be very similar to seek.

It seems that you're trying to implement your own I/O caching. I don't understand why you're not letting the OS do this for you. For instance, on systems with a larger amount of memory (I have 64 GB on my home PC; a typical build server may have even more), mmap will buffer your whole 5G bc file in memory and then write it back to disk without any seek operations (which are costly on a traditional HDD).

My local machine also has a lot of memory to make this work. :) The problem is that when LTO-ing thousands of such targets, the build server I am using throttles memory usage.
ThinLTO (https://www.youtube.com/watch?v=9OIEZAj243g) has a similar motivation: build services do not allow memory consumption above a threshold of some number of gigabytes.

Although disk seeks have a cost, they happen in fewer than 1 in a million backpatches, and only when generating large bitcode files merged by LTO. The bitcode generated from each compilation unit is much smaller and does not need any disk seek.
So in practice, the disk seek overhead is incurred very rarely, in exchange for saving a lot of memory, which makes it possible for a build service to build thousands of large targets.

Given the above, and given that mmap support differs across systems, the seek-based approach seems fine.

LLVM has a FileOutputBuffer class which abstracts the underlying OS differences. The LLVM linker (ld.lld) uses it for output file generation.

Thank you for sharing FileOutputBuffer. This is a useful platform-independent mmap.
If it uses mmap, maybe it does not need to buffer the entire file contents in memory and can still leverage OS page management.
In that case it still needs to reload evicted pages, which is similar to seek. If it buffers all 5G in memory, the memory issue still exists.

The current WriteBitcodeToFile API takes a raw_ostream argument. If we used FileOutputBuffer, we might want to encapsulate it as a subclass of raw_ostream, like raw_fd_stream.
So raw_fd_stream can be extended to use FileOutputBuffer internally when necessary in the future.

WriteBitcodeToFile does not know the size beforehand - this makes FileOutputBuffer impractical. I think raw_fd_stream is still needed.

MaskRay added a reviewer: mehdi_amini. Sep 2 2020, 3:24 PM

rebased from D86913

Harbormaster completed remote builds in B70555: Diff 289756. Sep 3 2020, 10:35 AM

ping

Can you rebase this on top of D86913 so that it doesn't include those changes?

tejohnson added inline comments. Sep 11 2020, 7:06 PM

llvm/include/llvm/Bitstream/BitstreamWriter.h
100	Would it be valuable to make this configurable? How sensitive is performance to the value chosen here?
163	Why is this guarded by NDEBUG? I'm not convinced there is much value in doing this code even when StartBit is 0 in the debug case.

Jianzhou Zhao <jianzhouzh@google.com> mentioned this in rG0ece51c60c51: Add raw_fd_stream that supports reading/seeking/writing. Sep 12 2020, 12:36 AM

addressed comments

Herald added a subscriber: dang. Sep 12 2020, 1:07 AM

stephan.yichao.zhao added inline comments. Sep 12 2020, 1:08 AM

llvm/include/llvm/Bitstream/BitstreamWriter.h
100	Added a flag to plugin-opt. Is it the right way to do this?
163	This relates to the assert at line 165. In debug mode, the assert at line 165 checks that the value being backpatched over is 0, so the code needs to read data from disk. In non-debug mode, the assert does not verify existing data, so the code does not need to read data from disk for this reason. But if StartBit is nonzero, the code still needs to read the existing data because the backpatched value is not byte-aligned. For example, when backpatching with StartBit = 2, the aligned data on disk are c0 00 00 00 3f, so we need to read them out, fill in those 0s, then write back. Although the code could always read these bytes from disk, I wanted to save some IO overhead. Added comments.
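
The unaligned case described above, where a 32-bit word that does not start on a byte boundary spans five bytes and the surrounding bits must be preserved, can be sketched as follows. This is a standalone illustration, not the code under review: `bytesTouched`, `writeWordAtBit` and `readWordAtBit` are hypothetical names, and the sketch assumes a little-endian layout in which bit 0 is the least significant bit of the first byte. Under that assumption the low StartBit bits of the first byte and the high 8 - StartBit bits of the fifth byte belong to neighbouring data.

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit word touches 4 bytes when byte-aligned and 5 bytes otherwise.
inline unsigned bytesTouched(unsigned StartBit) { return StartBit ? 5u : 4u; }

// Read the 32-bit word that starts StartBit bits into the byte at P.
inline std::uint32_t readWordAtBit(const unsigned char *P, unsigned StartBit) {
  assert(StartBit < 8 && "StartBit is an offset within one byte");
  std::uint64_t Window = 0;
  for (unsigned I = 0; I < bytesTouched(StartBit); ++I)
    Window |= static_cast<std::uint64_t>(P[I]) << (8 * I);
  return static_cast<std::uint32_t>((Window >> StartBit) & 0xFFFFFFFFu);
}

// Write Word at that position, preserving the bits that surround it.
inline void writeWordAtBit(unsigned char *P, std::uint32_t Word,
                           unsigned StartBit) {
  assert(StartBit < 8 && "StartBit is an offset within one byte");
  const unsigned N = bytesTouched(StartBit);
  // Load the existing bytes; this is the read that the aligned case can skip.
  std::uint64_t Window = 0;
  for (unsigned I = 0; I < N; ++I)
    Window |= static_cast<std::uint64_t>(P[I]) << (8 * I);
  // Replace only the 32 bits that belong to the word.
  const std::uint64_t Mask = static_cast<std::uint64_t>(0xFFFFFFFFu)
                             << StartBit;
  Window = (Window & ~Mask) | (static_cast<std::uint64_t>(Word) << StartBit);
  for (unsigned I = 0; I < N; ++I)
    P[I] = static_cast<unsigned char>((Window >> (8 * I)) & 0xFF);
}
```

When StartBit is 0 the existing bytes are overwritten completely, so reading them first is only needed to check that the placeholder was zero, which is the distinction the NDEBUG guard in the patch relies on.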

Harbormaster completed remote builds in B71454: Diff 291377. Sep 12 2020, 1:51 AM

Please mention in the summary somewhere that this is only enabled for lld right now.

lld/ELF/LTO.cpp
171	Nit, document constant parameters with /*parameter_name=*/
llvm/include/llvm/Bitstream/BitstreamWriter.h
100	Oh sorry, I just meant an llvm internal option (cl::opt<int>) in this file. Will let @MaskRay comment on whether they want it as an lld option.
159	s/path/patch/. But I think the whole comment block would be clearer if written in the affirmative sense, e.g. something like: // When unaligned, copy existing data into Bytes from the file FS and the buffer Out so // that it can be updated before writing. For debug builds read bytes unconditionally // in order to check that the existing value is 0 as expected. Also as noted below suggest moving comment just above the #ifdef check below.
163	Ah ok. I suggest moving that assert up under the #ifdef too then just for clarity, since they go together logically. And as suggested above, move that whole comment about filling in Bytes to here just above the #ifdef.
172	Why this line added? Oh ic, presumably to avoid an unused variable warning in the NDEBUG case. Maybe just add a comment to that effect.

addressed comments

stephan.yichao.zhao marked 3 inline comments as done. Sep 12 2020, 1:34 PM

stephan.yichao.zhao added inline comments.

lld/ELF/LTO.cpp
171	reverted the change at lld options.
llvm/include/llvm/Bitstream/BitstreamWriter.h
100	Switched to cl::opt. Thank you for the suggestion, I did not know this option. What is the proper way to call the options with clang or lld? For example, I tried clang/ld.lld with --bitcode-mdindex-threshold=1 or -bitcode-mdindex-threshold=1. They are not accepted. Do we need any prefix before them?

stephan.yichao.zhao marked an inline comment as done. Sep 12 2020, 1:37 PM

stephan.yichao.zhao added inline comments.

llvm/include/llvm/Bitstream/BitstreamWriter.h
100	Added cl::opt at BitcodeWriter.cpp instead of here. BitstreamWriter.h does not have a cpp file. If we add one, we also need to add a library, and a lot of code needs to be updated to use the new library, like this (https://reviews.llvm.org/D63899).

Harbormaster completed remote builds in B71475: Diff 291416. Sep 12 2020, 2:14 PM

update

stephan.yichao.zhao edited the summary of this revision. Sep 12 2020, 6:27 PM

Harbormaster completed remote builds in B71488: Diff 291433. Sep 12 2020, 7:08 PM

There are a number of single statement if and for bodies in the patch that have braces but should not per llvm coding style.

llvm/include/llvm/Bitstream/BitstreamWriter.h
163	I just realized the first part of this suggestion doesn't make sense, what I should have said is to move the assert up into within the braces, since it goes with that code. I.e. if the code within the braces isn't executed Bytes isn't filled in so there is no point asserting whether Bytes is 0.

updated

fixed the if and for clauses style issue.

updated

Harbormaster completed remote builds in B71497: Diff 291443. Sep 13 2020, 12:17 AM

Harbormaster completed remote builds in B71498: Diff 291444. Sep 13 2020, 1:05 AM

lgtm but please wait to see if @MaskRay or @evgeny777 have more comments

This revision is now accepted and ready to land. Sep 13 2020, 8:02 AM

stephan.yichao.zhao added a child revision: D86913: Add raw_fd_stream that supports reading/seeking/writing. Sep 14 2020, 2:52 PM

This revision was landed with ongoing or failed builds. Sep 16 2020, 8:33 PM

Closed by commit rG11201315d588: Flush bitcode incrementally for LTO output (authored by Jianzhou Zhao <jianzhouzh@google.com>).

This revision was automatically updated to reflect the committed changes.

Jianzhou Zhao <jianzhouzh@google.com> added a commit: rG11201315d588: Flush bitcode incrementally for LTO output.

See comment about a VC++ warning being generated.

llvm/include/llvm/Bitstream/BitstreamWriter.h
158	Microsoft Visual C++ is warning that this needs to be char Bytes[9]. Can this be updated (at least to silence the warning; we use warnings as errors in our builds)?

stephan.yichao.zhao added inline comments. Sep 18 2020, 9:37 AM

llvm/include/llvm/Bitstream/BitstreamWriter.h
158	Yes. I will be fixing it. What is the name of this warning from Visual C++?

stephan.yichao.zhao marked an inline comment as done. Sep 18 2020, 9:44 AM

stephan.yichao.zhao added inline comments.

llvm/include/llvm/Bitstream/BitstreamWriter.h
158	https://github.com/llvm/llvm-project/commit/cab6f5b2ab814a4be3fd71aacdbe10298f512833

dstuttard added inline comments. Sep 21 2020, 1:28 AM

llvm/include/llvm/Bitstream/BitstreamWriter.h
158	Sorry - should have included that in the original report. It's warning C6386: \llvm-project\llvm\include\llvm\Bitstream\BitstreamWriter.h(177) : warning C6386: Buffer overrun while writing to 'Bytes': the writable size is '8' bytes, but '9' bytes might be written.

MTC added a subscriber: MTC. Oct 9 2021, 12:51 AM

Herald added a subscriber: ormris. Oct 9 2021, 12:51 AM

This looks like a nice idea: can you include perf numbers in the description?

llvm/include/llvm/Bitcode/BitcodeWriter.h
50	I think this is the main public API entry point change? Likely worth updating the doc clearly here.

In D86905#3053390, @mehdi_amini wrote:

This looks like a nice idea: can you include perf numbers in the description?

I didn't notice this was already committed, but numbers are still interesting to go with the change, can you post some here?

In D86905#3053392, @mehdi_amini wrote:

I didn't notice this was already committed, but numbers are still interesting to go with the change, can you post some here?

I don't have statistical results.
This change mainly helps large binaries. In my case, a binary is ~8G, and the other RAM used for LTO is ~40G, so this change reduces memory usage by about 17% (8G out of ~48G).
This 17% reduction is important to me because otherwise I have to find a machine with more than the normal RAM limit to build this binary.
For small applications, this change does not help much.

In D86905#3053455, @stephan.yichao.zhao wrote:

In D86905#3053392, @mehdi_amini wrote:

I didn't notice this was already committed, but numbers are still interesting to go with the change, can you post some here?

I don't have statistical results.
This change mainly helps large binaries. In my case, a binary is ~8G, and the other RAM used for LTO is ~40G, so this change reduces memory usage by about 17% (8G out of ~48G).
This 17% reduction is important to me because otherwise I have to find a machine with more than the normal RAM limit to build this binary.

Thanks, this is exactly the kind of "numbers" I was looking for actually.

Enna1 mentioned this in D112297: [LTO] Fix assertion failed when flushing bitcode incrementally for LTO output. Oct 22 2021, 2:01 AM

MaskRay mentioned this in rGb5149f4e66a4: [LTO] Fix assertion failed when flushing bitcode incrementally for LTO output. Jan 4 2022, 9:40 PM

Revision Contents

Path

Size

lld/

ELF/

LTO.cpp

16 lines

llvm/

include/

llvm/

Bitcode/

BitcodeWriter.h

2 lines

Bitstream/

BitstreamWriter.h

100 lines

lib/

Bitcode/

Writer/

BitcodeWriter.cpp

12 lines

Diff 292403

lld/ELF/LTO.cpp

Show First 20 Lines • Show All 51 Lines • ▼ Show 20 Lines	auto ret =
std::make_unique<raw_fd_ostream>(file, ec, sys::fs::OpenFlags::OF_None);		std::make_unique<raw_fd_ostream>(file, ec, sys::fs::OpenFlags::OF_None);
if (ec) {		if (ec) {
error("cannot open " + file + ": " + ec.message());		error("cannot open " + file + ": " + ec.message());
return nullptr;		return nullptr;
}		}
return ret;		return ret;
}		}

		// The merged bitcode after LTO is large. Try opening a file stream that
		// supports reading, seeking and writing. Such a file allows BitcodeWriter to
		// flush buffered data to reduce memory consumption. If this fails, open a file
		// stream that supports only write.
		static std::unique_ptr<raw_fd_ostream> openLTOOutputFile(StringRef file) {
		std::error_code ec;
		std::unique_ptr<raw_fd_ostream> fs =
		std::make_unique<raw_fd_stream>(file, ec);
		if (!ec)
		return fs;
		return openFile(file);
		}

static std::string getThinLTOOutputFile(StringRef modulePath) {		static std::string getThinLTOOutputFile(StringRef modulePath) {
return lto::getThinLTOOutputFile(		return lto::getThinLTOOutputFile(
std::string(modulePath), std::string(config->thinLTOPrefixReplace.first),		std::string(modulePath), std::string(config->thinLTOPrefixReplace.first),
std::string(config->thinLTOPrefixReplace.second));		std::string(config->thinLTOPrefixReplace.second));
}		}

static lto::Config createConfig() {		static lto::Config createConfig() {
lto::Config c;		lto::Config c;
▲ Show 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	static lto::Config createConfig() {
c.TimeTraceEnabled = config->timeTraceEnabled;		c.TimeTraceEnabled = config->timeTraceEnabled;
c.TimeTraceGranularity = config->timeTraceGranularity;		c.TimeTraceGranularity = config->timeTraceGranularity;

c.CSIRProfile = std::string(config->ltoCSProfileFile);		c.CSIRProfile = std::string(config->ltoCSProfileFile);
c.RunCSIRInstr = config->ltoCSProfileGenerate;		c.RunCSIRInstr = config->ltoCSProfileGenerate;

if (config->emitLLVM) {		if (config->emitLLVM) {
c.PostInternalizeModuleHook = [](size_t task, const Module &m) {		c.PostInternalizeModuleHook = [](size_t task, const Module &m) {
if (std::unique_ptr<raw_fd_ostream> os = openFile(config->outputFile))		if (std::unique_ptr<raw_fd_ostream> os =
		openLTOOutputFile(config->outputFile))
WriteBitcodeToFile(m, *os, false);		WriteBitcodeToFile(m, *os, false);
return false;		return false;
};		};
}		}

if (config->ltoEmitAsm)		if (config->ltoEmitAsm)
c.CGFileType = CGFT_AssemblyFile;		c.CGFileType = CGFT_AssemblyFile;

if (config->saveTemps)		if (config->saveTemps)
checkError(c.addSaveTemps(config->outputFile.str() + ".",		checkError(c.addSaveTemps(config->outputFile.str() + ".",
/*UseInputModulePath*/ true));		/*UseInputModulePath*/ true));

llvm/include/llvm/Bitcode/BitcodeWriter.h

Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	class BitcodeWriter {
bool WroteStrtab = false, WroteSymtab = false;		bool WroteStrtab = false, WroteSymtab = false;

void writeBlob(unsigned Block, unsigned Record, StringRef Blob);		void writeBlob(unsigned Block, unsigned Record, StringRef Blob);

std::vector<Module *> Mods;		std::vector<Module *> Mods;

public:		public:
/// Create a BitcodeWriter that writes to Buffer.		/// Create a BitcodeWriter that writes to Buffer.
BitcodeWriter(SmallVectorImpl<char> &Buffer);		BitcodeWriter(SmallVectorImpl<char> &Buffer, raw_fd_stream *FS = nullptr);

~BitcodeWriter();		~BitcodeWriter();

/// Attempt to write a symbol table to the bitcode file. This must be called		/// Attempt to write a symbol table to the bitcode file. This must be called
/// at most once after all modules have been written.		/// at most once after all modules have been written.
///		///
/// A reader does not require a symbol table to interpret a bitcode file;		/// A reader does not require a symbol table to interpret a bitcode file;
/// the symbol table is needed only to improve link-time performance. So		/// the symbol table is needed only to improve link-time performance. So

llvm/include/llvm/Bitstream/BitstreamWriter.h

Show All 14 Lines
#define LLVM_BITSTREAM_BITSTREAMWRITER_H		#define LLVM_BITSTREAM_BITSTREAMWRITER_H

#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Bitstream/BitCodes.h"		#include "llvm/Bitstream/BitCodes.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"
		#include "llvm/Support/raw_ostream.h"
#include <vector>		#include <vector>

namespace llvm {		namespace llvm {

class BitstreamWriter {		class BitstreamWriter {
		/// Out - The buffer that keeps unflushed bytes.
SmallVectorImpl<char> &Out;		SmallVectorImpl<char> &Out;

		/// FS - The file stream that Out flushes to. If FS is nullptr, it does not
		/// support read or seek, Out cannot be flushed until all data are written.
		raw_fd_stream *FS;

		/// FlushThreshold - If FS is valid, this is the threshold (unit B) to flush
		/// FS.
		const uint64_t FlushThreshold;

/// CurBit - Always between 0 and 31 inclusive, specifies the next bit to use.		/// CurBit - Always between 0 and 31 inclusive, specifies the next bit to use.
unsigned CurBit;		unsigned CurBit;

/// CurValue - The current value. Only bits < CurBit are valid.		/// CurValue - The current value. Only bits < CurBit are valid.
uint32_t CurValue;		uint32_t CurValue;

/// CurCodeSize - This is the declared size of code values used for the		/// CurCodeSize - This is the declared size of code values used for the
/// current block, in bits.		/// current block, in bits.
unsigned CurCodeSize;		unsigned CurCodeSize;

/// BlockInfoCurBID - When emitting a BLOCKINFO_BLOCK, this is the currently		/// BlockInfoCurBID - When emitting a BLOCKINFO_BLOCK, this is the currently
/// selected BLOCK ID.		/// selected BLOCK ID.
Show All 17 Lines	class BitstreamWriter {
struct BlockInfo {		struct BlockInfo {
unsigned BlockID;		unsigned BlockID;
std::vector<std::shared_ptr<BitCodeAbbrev>> Abbrevs;		std::vector<std::shared_ptr<BitCodeAbbrev>> Abbrevs;
};		};
std::vector<BlockInfo> BlockInfoRecords;		std::vector<BlockInfo> BlockInfoRecords;

void WriteByte(unsigned char Value) {		void WriteByte(unsigned char Value) {
Out.push_back(Value);		Out.push_back(Value);
		FlushToFile();
}		}

void WriteWord(unsigned Value) {		void WriteWord(unsigned Value) {
Value = support::endian::byte_swap<uint32_t, support::little>(Value);		Value = support::endian::byte_swap<uint32_t, support::little>(Value);
Out.append(reinterpret_cast<const char *>(&Value),		Out.append(reinterpret_cast<const char *>(&Value),
reinterpret_cast<const char *>(&Value + 1));		reinterpret_cast<const char *>(&Value + 1));
		FlushToFile();
}		}

size_t GetBufferOffset() const { return Out.size(); }		uint64_t GetNumOfFlushedBytes() const { return FS ? FS->tell() : 0; }

		size_t GetBufferOffset() const { return Out.size() + GetNumOfFlushedBytes(); }

size_t GetWordIndex() const {		size_t GetWordIndex() const {
size_t Offset = GetBufferOffset();		size_t Offset = GetBufferOffset();
assert((Offset & 3) == 0 && "Not 32-bit aligned");		assert((Offset & 3) == 0 && "Not 32-bit aligned");
return Offset / 4;		return Offset / 4;
}		}

		/// If the related file stream supports reading, seeking and writing, flush
		/// the buffer if its size is above a threshold.
		void FlushToFile() {
		if (!FS)
		return;
		if (Out.size() < FlushThreshold)
		return;
		FS->write((char *)&Out.front(), Out.size());
		Out.clear();
		}

public:		public:
explicit BitstreamWriter(SmallVectorImpl<char> &O)		/// Create a BitstreamWriter that writes to Buffer \p O.
: Out(O), CurBit(0), CurValue(0), CurCodeSize(2) {}		///
		/// \p FS is the file stream that \p O flushes to incrementally. If \p FS is
		/// null, \p O does not flush incrementally, but writes to disk at the end.
		///
		/// \p FlushThreshold is the threshold (unit M) to flush \p O if \p FS is
		/// valid.
		BitstreamWriter(SmallVectorImpl<char> &O, raw_fd_stream *FS = nullptr,
		uint32_t FlushThreshold = 512)
		: Out(O), FS(FS), FlushThreshold(FlushThreshold << 20), CurBit(0),
		CurValue(0), CurCodeSize(2) {}

~BitstreamWriter() {		~BitstreamWriter() {
assert(CurBit == 0 && "Unflushed data remaining");		assert(CurBit == 0 && "Unflushed data remaining");
assert(BlockScope.empty() && CurAbbrevs.empty() && "Block imbalance");		assert(BlockScope.empty() && CurAbbrevs.empty() && "Block imbalance");
}		}

/// Retrieve the current position in the stream, in bits.		/// Retrieve the current position in the stream, in bits.
uint64_t GetCurrentBitNo() const { return GetBufferOffset() * 8 + CurBit; }		uint64_t GetCurrentBitNo() const { return GetBufferOffset() * 8 + CurBit; }

/// Retrieve the number of bits currently used to encode an abbrev ID.		/// Retrieve the number of bits currently used to encode an abbrev ID.
unsigned GetAbbrevIDWidth() const { return CurCodeSize; }		unsigned GetAbbrevIDWidth() const { return CurCodeSize; }

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Basic Primitives for emitting bits to the stream.		// Basic Primitives for emitting bits to the stream.
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

/// Backpatch a 32-bit word in the output at the given bit offset		/// Backpatch a 32-bit word in the output at the given bit offset
/// with the specified value.		/// with the specified value.
void BackpatchWord(uint64_t BitNo, unsigned NewWord) {		void BackpatchWord(uint64_t BitNo, unsigned NewWord) {
using namespace llvm::support;		using namespace llvm::support;
uint64_t ByteNo = BitNo / 8;		uint64_t ByteNo = BitNo / 8;
		uint64_t StartBit = BitNo & 7;
		uint64_t NumOfFlushedBytes = GetNumOfFlushedBytes();

		if (ByteNo >= NumOfFlushedBytes) {
assert((!endian::readAtBitAlignment<uint32_t, little, unaligned>(		assert((!endian::readAtBitAlignment<uint32_t, little, unaligned>(
&Out[ByteNo], BitNo & 7)) &&		&Out[ByteNo - NumOfFlushedBytes], StartBit)) &&
"Expected to be patching over 0-value placeholders");		"Expected to be patching over 0-value placeholders");
endian::writeAtBitAlignment<uint32_t, little, unaligned>(		endian::writeAtBitAlignment<uint32_t, little, unaligned>(
&Out[ByteNo], NewWord, BitNo & 7);		&Out[ByteNo - NumOfFlushedBytes], NewWord, StartBit);
		return;
		}

		// If the byte offset to backpatch is flushed, use seek to backfill data.
		// First, save the file position to restore later.
		uint64_t CurPos = FS->tell();

		// Copy data to update into Bytes from the file FS and the buffer Out.
		char Bytes[8];
		dstuttard: Microsoft Visual C++ warns that this needs to be char Bytes[9]. Can this be updated, at least to silence the warning? We treat warnings as errors in our builds.
		stephan.yichao.zhao (author): Yes, I will fix it. What is the name of this warning from Visual C++?
		stephan.yichao.zhao (author): https://github.com/llvm/llvm-project/commit/cab6f5b2ab814a4be3fd71aacdbe10298f512833
		dstuttard: Sorry, I should have included that in the original report. It's warning C6386: \llvm-project\llvm\include\llvm\Bitstream\BitstreamWriter.h(177) : warning C6386: Buffer overrun while writing to 'Bytes': the writable size is '8' bytes, but '9' bytes might be written.
		size_t BytesNum = StartBit ? 8 : 4;
		tejohnson: s/path/patch/. But I think the whole comment block would be clearer if written in the affirmative sense, e.g. something like:
		// When unaligned, copy existing data into Bytes from the file FS and the buffer Out so
		// that it can be updated before writing. For debug builds read bytes unconditionally
		// in order to check that the existing value is 0 as expected.
		Also, as noted below, I suggest moving the comment to just above the #ifdef check below.
		size_t BytesFromDisk = std::min(BytesNum, NumOfFlushedBytes - ByteNo);
		size_t BytesFromBuffer = BytesNum - BytesFromDisk;

		// When unaligned, copy existing data into Bytes from the file FS and the
		tejohnson: Why is this guarded by NDEBUG? I'm not convinced there is much value in running this code even when StartBit is 0 in the debug case.
		stephan.yichao.zhao (author): This relates to the assert at line 165. In debug mode, the assert at line 165 checks that the value being backpatched is 0, so the code needs to read the data from disk. In non-debug mode, that assert does not verify the existing data, so the code does not need to read from disk for that reason. But if StartBit is nonzero, the code still needs to read the existing data because the backpatched value is not aligned. For example, when backpatching with StartBit = 2, the aligned data on disk are c0 00 00 00 3f, so we need to read them out, fill in those 0s, then write them back. Although the code could always read these bytes from disk, I wanted to save some IO overhead. Added comments.
		tejohnson: Ah ok. I suggest moving that assert up under the #ifdef too then, just for clarity, since they go together logically. And as suggested above, move that whole comment about filling in Bytes to here, just above the #ifdef.
		tejohnson: I just realized the first part of this suggestion doesn't make sense. What I should have said is to move the assert up into the braces, since it goes with that code. I.e. if the code within the braces isn't executed, Bytes isn't filled in, so there is no point asserting whether Bytes is 0.
		// buffer Out so that it can be updated before writing. For debug builds
		// read bytes unconditionally in order to check that the existing value is 0
		// as expected.
		#ifdef NDEBUG
		evgeny777: Can we use memory mapped I/O and avoid backpatching on disk?
		stephan.yichao.zhao (author): Our use case is likely not one that mmap handles well. I assume mmap on Linux loads pages on demand. If code reads or writes data on pages that are already loaded, the access has no IO cost, for example when code randomly accesses a contiguous chunk of addresses or addresses within the same page. The memory copy and page fault costs are still paid the first time a page is loaded, but they are negligible asymptotically.
		Our case is a bit different. With a 512M incremental flush threshold, I tested an LTO build that outputs a 5G bitcode file. BackpatchWord is called 16,613,927 times, of which only 12 need a disk seek. In addition, each access touches 4-8 bytes on a page, and the visited pages are all far apart. The pages are likely not cached and need to be loaded anyway, and after a load our code does not access enough data on the page to amortize the page fault cost. So the cost could be very similar to seek.
		Note that if a BackpatchWord call needs to access disk, we need one seek to load the existing data, one seek to overwrite it, and one seek to jump back. The first two seek addresses are very close, so hopefully the disk cache can handle them. The final seek back is a very long jump, but if the page cache evicts based on time or frequency, the page it jumps back to may not have been evicted yet. Overall, the fraction of calls that access disk is very small, so hopefully the additional cost is small. I also ran a perf profile, and no observable latency shows up (because LTO itself takes so much time).
		Given the above, and given that mmap support differs across systems, the seek-based approach seems fine.
		if (StartBit)
		#endif
		{
		FS->seek(ByteNo);
		ssize_t BytesRead = FS->read(Bytes, BytesFromDisk);
		tejohnson: Why was this line added? Oh I see, presumably to avoid an unused variable warning in the NDEBUG case. Maybe just add a comment to that effect.
		(void)BytesRead; // silence warning
		assert(BytesRead >= 0 && static_cast<size_t>(BytesRead) == BytesFromDisk);
		for (size_t i = 0; i < BytesFromBuffer; ++i)
		Bytes[BytesFromDisk + i] = Out[i];
		assert((!endian::readAtBitAlignment<uint32_t, little, unaligned>(
		Bytes, StartBit)) &&
		"Expected to be patching over 0-value placeholders");
		}

		// Update Bytes in terms of bit offset and value.
		endian::writeAtBitAlignment<uint32_t, little, unaligned>(Bytes, NewWord,
		StartBit);

		// Copy updated data back to the file FS and the buffer Out.
		FS->seek(ByteNo);
		FS->write(Bytes, BytesFromDisk);
		for (size_t i = 0; i < BytesFromBuffer; ++i)
		Out[i] = Bytes[BytesFromDisk + i];

		// Restore the file position.
		FS->seek(CurPos);
}		}
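
The backpatching logic above is a read-modify-write over a 32-bit field that may sit in the flushed file, in the in-memory buffer, or straddle both. The sketch below illustrates that split under stated assumptions: `SplitStream` and `backpatchWord` are hypothetical names, a `std::stringstream` stands in for the seekable file stream, and the sketch always reads the existing bytes and touches only the 5 bytes an unaligned 32-bit field can span, whereas the patch works on an 8-byte scratch array and skips the read in some builds. Bits are assumed to be packed little-endian, least significant bit first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the writer state.
struct SplitStream {
  std::stringstream Flushed; // plays the role of the file FS
  std::vector<char> Out;     // bytes still buffered in memory
  uint64_t NumFlushed = 0;   // size of the flushed prefix, in bytes
};

// Overwrite the 32 bits starting at bit offset BitNo of the logical stream.
inline void backpatchWord(SplitStream &S, uint64_t BitNo, uint32_t NewWord) {
  const uint64_t ByteNo = BitNo / 8;
  const unsigned StartBit = static_cast<unsigned>(BitNo & 7);
  const size_t BytesNum = StartBit ? 5 : 4; // bytes the field touches

  // Split the touched bytes between the flushed prefix and the buffer.
  size_t FromDisk = 0, BufStart = 0;
  if (ByteNo < S.NumFlushed)
    FromDisk = static_cast<size_t>(
        std::min<uint64_t>(BytesNum, S.NumFlushed - ByteNo));
  else
    BufStart = static_cast<size_t>(ByteNo - S.NumFlushed);
  const size_t FromBuffer = BytesNum - FromDisk;
  assert(BufStart + FromBuffer <= S.Out.size() && "patch runs past the data");

  // Read: collect the existing bytes, remembering the append position.
  char Bytes[5] = {};
  const std::streampos SavedPos = S.Flushed.tellp();
  if (FromDisk) {
    S.Flushed.seekg(static_cast<std::streamoff>(ByteNo));
    S.Flushed.read(Bytes, static_cast<std::streamsize>(FromDisk));
  }
  std::copy_n(S.Out.begin() + BufStart, FromBuffer, Bytes + FromDisk);

  // Modify: clear the 32-bit field inside a 64-bit window, then set it.
  uint64_t Window = 0;
  for (size_t I = 0; I < BytesNum; ++I)
    Window |= static_cast<uint64_t>(static_cast<unsigned char>(Bytes[I]))
              << (8 * I);
  const uint64_t Mask = uint64_t{0xFFFFFFFF} << StartBit;
  Window = (Window & ~Mask) | (static_cast<uint64_t>(NewWord) << StartBit);
  for (size_t I = 0; I < BytesNum; ++I)
    Bytes[I] = static_cast<char>(static_cast<unsigned char>(Window >> (8 * I)));

  // Write: put the bytes back and restore the append position.
  if (FromDisk) {
    S.Flushed.seekp(static_cast<std::streamoff>(ByteNo));
    S.Flushed.write(Bytes, static_cast<std::streamsize>(FromDisk));
    S.Flushed.seekp(SavedPos);
  }
  std::copy_n(Bytes + FromDisk, FromBuffer, S.Out.begin() + BufStart);
}
```

With 6 flushed bytes and a 4-byte buffer, patching at bit 34 touches logical bytes 4 through 8, so two bytes come from the stream and three from the buffer. Bits outside the 32-bit field are preserved by the mask.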

void BackpatchWord64(uint64_t BitNo, uint64_t Val) {		void BackpatchWord64(uint64_t BitNo, uint64_t Val) {
BackpatchWord(BitNo, (uint32_t)Val);		BackpatchWord(BitNo, (uint32_t)Val);
BackpatchWord(BitNo + 32, (uint32_t)(Val >> 32));		BackpatchWord(BitNo + 32, (uint32_t)(Val >> 32));
}		}

void Emit(uint32_t Val, unsigned NumBits) {		void Emit(uint32_t Val, unsigned NumBits) {
...

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp

...
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

static cl::opt<unsigned>		static cl::opt<unsigned>
IndexThreshold("bitcode-mdindex-threshold", cl::Hidden, cl::init(25),		IndexThreshold("bitcode-mdindex-threshold", cl::Hidden, cl::init(25),
cl::desc("Number of metadatas above which we emit an index "		cl::desc("Number of metadatas above which we emit an index "
"to enable lazy-loading"));		"to enable lazy-loading"));
		static cl::opt<uint32_t> FlushThreshold(
		"bitcode-flush-threshold", cl::Hidden, cl::init(512),
		cl::desc("The threshold (unit M) for flushing LLVM bitcode."));

static cl::opt<bool> WriteRelBFToSummary(		static cl::opt<bool> WriteRelBFToSummary(
"write-relbf-to-summary", cl::Hidden, cl::init(false),		"write-relbf-to-summary", cl::Hidden, cl::init(false),
cl::desc("Write relative block frequency to function summary "));		cl::desc("Write relative block frequency to function summary "));

extern FunctionSummary::ForceSummaryHotnessType ForceSummaryEdgesCold;		extern FunctionSummary::ForceSummaryHotnessType ForceSummaryEdgesCold;

namespace {		namespace {
...
		static void writeBitcodeHeader(BitstreamWriter &Stream) {
Stream.Emit((unsigned)'B', 8);		Stream.Emit((unsigned)'B', 8);
Stream.Emit((unsigned)'C', 8);		Stream.Emit((unsigned)'C', 8);
Stream.Emit(0x0, 4);		Stream.Emit(0x0, 4);
Stream.Emit(0xC, 4);		Stream.Emit(0xC, 4);
Stream.Emit(0xE, 4);		Stream.Emit(0xE, 4);
Stream.Emit(0xD, 4);		Stream.Emit(0xD, 4);
}		}

BitcodeWriter::BitcodeWriter(SmallVectorImpl<char> &Buffer)		BitcodeWriter::BitcodeWriter(SmallVectorImpl<char> &Buffer, raw_fd_stream *FS)
: Buffer(Buffer), Stream(new BitstreamWriter(Buffer)) {		: Buffer(Buffer), Stream(new BitstreamWriter(Buffer, FS, FlushThreshold)) {
writeBitcodeHeader(*Stream);		writeBitcodeHeader(*Stream);
}		}

BitcodeWriter::~BitcodeWriter() { assert(WroteStrtab); }		BitcodeWriter::~BitcodeWriter() { assert(WroteStrtab); }

void BitcodeWriter::writeBlob(unsigned Block, unsigned Record, StringRef Blob) {		void BitcodeWriter::writeBlob(unsigned Block, unsigned Record, StringRef Blob) {
Stream->EnterSubblock(Block, 3);		Stream->EnterSubblock(Block, 3);

...
		void llvm::WriteBitcodeToFile(const Module &M, raw_ostream &Out,
...
Buffer.reserve(256*1024);		Buffer.reserve(256*1024);

// If this is darwin or another generic macho target, reserve space for the		// If this is darwin or another generic macho target, reserve space for the
// header.		// header.
Triple TT(M.getTargetTriple());		Triple TT(M.getTargetTriple());
if (TT.isOSDarwin() || TT.isOSBinFormatMachO())		if (TT.isOSDarwin() || TT.isOSBinFormatMachO())
Buffer.insert(Buffer.begin(), BWH_HeaderSize, 0);		Buffer.insert(Buffer.begin(), BWH_HeaderSize, 0);

BitcodeWriter Writer(Buffer);		BitcodeWriter Writer(Buffer, dyn_cast<raw_fd_stream>(&Out));
Writer.writeModule(M, ShouldPreserveUseListOrder, Index, GenerateHash,		Writer.writeModule(M, ShouldPreserveUseListOrder, Index, GenerateHash,
ModHash);		ModHash);
Writer.writeSymtab();		Writer.writeSymtab();
Writer.writeStrtab();		Writer.writeStrtab();

if (TT.isOSDarwin() || TT.isOSBinFormatMachO())		if (TT.isOSDarwin() || TT.isOSBinFormatMachO())
emitDarwinBCHeaderAndTrailer(Buffer, TT);		emitDarwinBCHeaderAndTrailer(Buffer, TT);

// Write the generated bitstream to "Out".		// Write the generated bitstream to "Out".
		if (!Buffer.empty())
Out.write((char*)&Buffer.front(), Buffer.size());		Out.write((char *)&Buffer.front(), Buffer.size());
}		}

void IndexBitcodeWriter::write() {		void IndexBitcodeWriter::write() {
Stream.EnterSubblock(bitc::MODULE_BLOCK_ID, 3);		Stream.EnterSubblock(bitc::MODULE_BLOCK_ID, 3);

writeModuleVersion();		writeModuleVersion();

// Write the module paths in the combined index.		// Write the module paths in the combined index.
...
