This reduces peak allocations from 564.2 MB to 545.7 MB when linking Chrome.
At least on Linux with libstdc++ it doesn't seem to move the data to a smaller buffer. I still get a peak of 564.2 MB with it.
Does your code make this part of the code faster? It reduces memory allocation, but it does an extra memcpy, so I'm wondering.
On my machine the difference is in the noise. Note that it is not a given that this has more copying. When the vector is resized there is a copy, and we should expect fewer resizes with the single buffer.
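To make the copy-counting argument concrete, here is a small standalone illustration (not from the patch) of how element-by-element growth behaves versus a single copy into a fresh buffer:

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
      // Growing element by element reallocates several times (geometric
      // growth) and typically leaves capacity well above the final size.
      std::vector<int> V;
      std::size_t Reallocs = 0, LastCap = 0;
      for (int I = 0; I < 100000; ++I) {
        V.push_back(I);
        if (V.capacity() != LastCap) {
          ++Reallocs;
          LastCap = V.capacity();
        }
      }
      std::printf("grown:  size=%zu capacity=%zu reallocations=%zu\n",
                  V.size(), V.capacity(), Reallocs);

      // A single copy into a fresh vector yields a buffer sized to the
      // contents; with a reused scratch buffer the growth copies happen at
      // most once, so the extra copy does not necessarily mean more copying.
      std::vector<int> Exact(V);
      std::printf("copied: size=%zu capacity=%zu\n",
                  Exact.size(), Exact.capacity());
    }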
I confirm the peak allocation difference. I think that implementation can be a bit simpler though, my suggestion is below.
ELF/Writer.cpp:875

It can be a bit simpler I think:

    std::vector<Relocation> Buffer; // <new comment>
    auto WithBuffer = [&](InputSectionBase &Sec) {
      Fn(Sec);
      Buffer = std::move(Sec.Relocations);
      Sec.Relocations = Buffer;
    };
ELF/Writer.cpp:875

That would reduce the peak allocation, but each call to Fn still allocates a new buffer, no? It does suggest a way to simplify things a bit, though.
ELF/Writer.cpp:875

CPU time was about the same as the original code for my version, I think. I observed a minor speedup between the original and your version of about 0.5%, I think, and the results seemed more or less stable. I do not have precise numbers for my version at hand (I can retest it if you think it is worth it, but it was roughly the same as the original code's numbers). I think the simplicity of the implementation probably matters more here (and I hope we will be able to use shrink_to_fit one day).
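Since shrink_to_fit comes up here: the standard makes it a non-binding request, so whether the data actually moves to a smaller buffer is up to the implementation. A minimal standalone illustration (not from the patch), also showing the copy-and-swap idiom that always allocates a fresh, content-sized buffer:

    #include <cstdio>
    #include <vector>

    int main() {
      std::vector<int> V;
      V.reserve(1000); // large up-front reservation
      for (int I = 0; I < 10; ++I)
        V.push_back(I);

      // shrink_to_fit() is only a request; an implementation may keep the
      // existing, larger allocation.
      V.shrink_to_fit();
      std::printf("after shrink_to_fit: capacity = %zu\n", V.capacity());

      // Copy-and-swap always produces a new buffer sized for the current
      // contents (typically exactly V.size()).
      std::vector<int>(V).swap(V);
      std::printf("after copy-and-swap: capacity = %zu\n", V.capacity());
    }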
ELF/Writer.cpp:875

Testing the suggestion, the peak memory is indeed the same, but the total memory allocated is actually higher:

    master  765.92 MB total, 564.2 MB peak
I wonder if you can make a guess on how many relocations will be inserted into the vector. We know the number of relocations for each input section, so calling reserve() might work.
I think reserve() is exactly what causes high peaks now. We already call it. See:
https://github.com/llvm-mirror/lld/blob/master/ELF/Relocations.cpp#L1020
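To illustrate why that reserve() can backfire, here is a self-contained toy model (the types and helper below are simplified stand-ins, not the real LLD code): reserving one slot per input relocation is an upper bound, and when most relocations end up emitted as dynamic relocations, almost all of the reserved memory goes unused.

    #include <cstdio>
    #include <vector>

    // Stand-in for LLD's Relocation; not the real type.
    struct Relocation {
      int Kind;
      unsigned long Offset;
    };

    // Toy version of the scan: reserve an upper bound, but only push entries
    // that stay as static relocations.
    static void scanRelocs(std::vector<Relocation> &SecRelocations,
                           const std::vector<Relocation> &Rels, bool IsPie) {
      SecRelocations.reserve(Rels.size()); // upper bound; overshoots under -pie
      for (const Relocation &R : Rels) {
        if (IsPie && R.Kind == 0)
          continue; // becomes a dynamic relocation, never stored here
        SecRelocations.push_back(R);
      }
    }

    int main() {
      // Model a .data.rel.ro-like section: many absolute entries, all of
      // which turn into dynamic relocations under -pie.
      std::vector<Relocation> Rels(100000, Relocation{0, 0});
      std::vector<Relocation> SecRelocations;
      scanRelocs(SecRelocations, Rels, /*IsPie=*/true);
      std::printf("reserved %zu slots, used %zu\n", SecRelocations.capacity(),
                  SecRelocations.size());
    }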
Use end() when inserting. It doesn't make a difference here, but it is the canonical way of concatenating vectors.
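For reference, a tiny standalone example of the idiom being suggested:

    #include <cstdio>
    #include <vector>

    int main() {
      std::vector<int> Dst = {1, 2, 3};
      std::vector<int> Src = {4, 5, 6};
      // Canonical concatenation: insert the whole source range at the end.
      Dst.insert(Dst.end(), Src.begin(), Src.end());
      for (int X : Dst)
        std::printf("%d ", X); // prints: 1 2 3 4 5 6
      std::printf("\n");
    }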
ELF/Writer.cpp:875

I did not see that (my profiler seems to show only peak memory).
Just removing it would be disastrous for performance. It was a 6% improvement when it was added in r319976.
It might be possible to remove it after this patch.
I tried this patch and found that, at least for Chrome, we waste memory by calling reserve() on ".data.rel.ro" sections. Such sections in Chrome don't add any items to Sec.Relocations even though their Rels.size() is relatively large, so the reserved memory is completely wasted. If you don't call reserve() on such sections, you can save almost the same amount of memory as you did in this patch. Can you take a look?
The issue is that Chrome is linked with -pie. The section in question has a lot of entries like

    .quad foo
    .quad bar
    ...

They become dynamic relocations instead of static ones.
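To make that concrete, here is a small illustrative example (not taken from Chrome; foo, bar, and Handlers are made-up names) of the kind of data that produces such entries and why -pie changes their relocation type:

    // Illustrative only; names are made up.
    void foo() {}
    void bar() {}

    // A const array of code addresses. Built as position-independent code,
    // this data typically lands in .data.rel.ro: the values depend on the
    // image base, so with -pie the linker must emit dynamic relocations
    // (e.g. R_X86_64_RELATIVE) for the loader to patch at run time instead
    // of resolving them statically at link time.
    void (*const Handlers[])() = {foo, bar};

    int main() { return Handlers[0] == foo ? 0 : 1; }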
I don't see how to predict that in general. I will try removing the reserve in combination with this patch.
Delete call to reserve.
The results are (in MB):

    master           765.5 total,  563.7 peak
    with reserve     747.76 total, 545.3 peak
    without reserve  747.63 total, 545.3 peak
A few more ideas about this one.
ELF/Writer.cpp:861–874

A less tricky way would probably be to pass a std::vector<Relocation> Buffer to Fn, so that it could populate it. The code would then be something like:

    auto WithBuffer = [&](InputSectionBase &Sec) {
      Fn(Sec, Buffer);
      Sec.Relocations.insert(Sec.Relocations.end(), Buffer.begin(), Buffer.end());
      Buffer.clear();
    };

ELF/Writer.cpp:866

I would probably add that we do that to decrease total memory consumption.
ELF/Writer.cpp:861–874

This would require passing a buffer to a lot of places. For example, RelocationBaseSection::addReloc would need it. As written, the code with swap is a bit more complicated, but it is completely local.
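For readers without the diff open, here is a self-contained sketch of roughly what the swap-based, local approach described above could look like. It is reconstructed from this thread with stand-in types (Relocation, InputSectionBase, and Fn below are simplified models, not the real LLD definitions), so treat it as an illustration rather than the actual patch:

    #include <cstdio>
    #include <utility>
    #include <vector>

    // Simplified stand-ins for the LLD types involved.
    struct Relocation { unsigned long Offset; };
    struct InputSectionBase { std::vector<Relocation> Relocations; };

    int main() {
      std::vector<InputSectionBase> Sections(100);

      // Fn stands for the per-section relocation scan that appends entries.
      auto Fn = [](InputSectionBase &Sec) {
        for (unsigned long I = 0; I < 1000; ++I)
          Sec.Relocations.push_back(Relocation{I});
      };

      // Swap-based wrapper: every section scans into one shared, already-grown
      // scratch buffer, then keeps only a tightly sized copy of the result.
      std::vector<Relocation> Storage;
      auto WithBuffer = [&](InputSectionBase &Sec) {
        std::swap(Storage, Sec.Relocations); // hand Fn the warm scratch buffer
        Fn(Sec);
        std::swap(Storage, Sec.Relocations); // take the populated buffer back
        Sec.Relocations = Storage;           // copy sized to the contents
        Storage.clear();                     // keep capacity for the next section
      };

      for (InputSectionBase &Sec : Sections)
        WithBuffer(Sec);

      std::printf("per-section: size=%zu capacity=%zu; scratch capacity=%zu\n",
                  Sections[0].Relocations.size(),
                  Sections[0].Relocations.capacity(), Storage.capacity());
    }

The point of the swap dance is that everything stays inside the wrapper: the scratch buffer keeps its capacity across sections, so Fn rarely triggers growth, and each section ends up holding only a copy sized to what it actually uses.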
> I would probably add that we do that to decrease total memory consumption.

Maybe "the exact internal buffer size" would make the comment more understandable.