This is an archive of the discontinued LLVM Phabricator instance.

[BOLT] Replace relocations container type
Needs ReviewPublic

Authored by yota9 on Mar 20 2022, 2:48 PM.

Details

Summary

Synchronize binary function and binary section relocations container
type in order to be able to change relocation struct fields inplace.

Diff Detail

Event Timeline

yota9 created this revision.Mar 20 2022, 2:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2022, 2:48 PM
Herald added a subscriber: ayermolo. · View Herald Transcript
yota9 requested review of this revision.Mar 20 2022, 2:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2022, 2:48 PM
ayermolo added inline comments.Mar 20 2022, 2:51 PM
bolt/include/bolt/Core/Relocation.h
58 ↗(On Diff #416800)

Can you add description of this function.

yota9 marked an inline comment as done.Mar 20 2022, 2:53 PM
yota9 added inline comments.
bolt/include/bolt/Core/Relocation.h
58 ↗(On Diff #416800)

Sure will do

yota9 updated this revision to Diff 416801.Mar 20 2022, 2:55 PM
yota9 marked an inline comment as done.

clang format + comment

yota9 updated this revision to Diff 418236.Mar 25 2022, 8:27 AM

Change runtime test to ordinary

yota9 retitled this revision from [BOLT] NFC: Fix dynamic relocation offset for constant islands to [BOLT] Fix dynamic relocation offset for constant islands.Mar 25 2022, 12:14 PM

Thanks!

bolt/include/bolt/Core/BinarySection.h
290

Here you remove the accessors to the non-const version of relocs, but I suspect we will need them to avoid using the mutable hack.

bolt/include/bolt/Core/Relocation.h
53 ↗(On Diff #418236)

Why mutable? Except when storing mutexes, usually mutable indicates a problem with the class that should be solved with a redesign (removing const where appropriate). Is there a way to use non-const Relocation objects?

yota9 marked an inline comment as done.Mar 28 2022, 3:47 PM
yota9 added inline comments.
bolt/include/bolt/Core/BinarySection.h
290

The problem is deeper as I understand. I'm not c++ expert, but the relocations are stored in std::set<Relocation, std::less<>>container, which does not give the in container edit abilities due to the std::less sorting. The mutable fields are hacky, but AFAIU such fields are not not participated in the container sorting mechanisms and though might be edited in place. Correct me if I wrong, I really don't like c++ some times like this :) Maybe you can suggest some of the alternatives to change the container type for example :)

We can change BinarySection's RelocationSetType to be a sorted std::vector instead, as per https://llvm.org/docs/ProgrammersManual.html#a-sorted-vector

We would need to insert the relocations in any order, and then sort this vector and remove duplicates. Although I'm not convinced that we need to keep relocations in a sorted order or to remove duplicates. @maksfb What do you think?

yota9 updated this revision to Diff 419578.Mar 31 2022, 3:39 PM
yota9 marked an inline comment as done.

@rafauler After some thinking I've desided to replace std::set with std::map. The relocations are identified by the offset field,
so it is good-enough solution that gives us fast search abilities and any inplace manipulations of the relocations struct.

yota9 updated this revision to Diff 419579.Mar 31 2022, 3:41 PM

clang-format

yota9 added a comment.Mar 31 2022, 3:43 PM

BTW the std::map is used as a container for the function relocations, so now we've got "synced up" them :)

I think the strategy looks good. I've been looking at this at the memory profiler, and it doesn't increase memory utilization by much by adding these extra fields to the Relocation class because after BinaryFunction::disassembly(), we call clearList(Relocations), which frees the memory used by relocation objects that are in the text sections. Relocations associated with data sections still use 0.4% of all memory allocation. The increase in memory utilization is from 48B to 72B (50% increase for this class), which should translate to an overall 0.2% increase in memory utilization for one of our large binaries.

I'm concerned that by emitting so many labels in the data sections, we could slow down the emitter. Could you check by how much time BOLT is slowing down after this patch, in a typical large binary? Just so we are aware and tracking memory and time regressions.

Tagging @maksfb to be aware of the memory regression.

I also recommend checking peak memory utilization with /usr/bin/time

Hello @rafaelauler . I will try to measure the results later, but I see that something going on with the buildbots, some of the tests are failed with 60 sec timeout. Since now I'm able to run x86 test on arm I've tested it and everything went momentally, I don't see the reason why the tests are failing. Do you have any idea? Thank you!

If there is a timeout, I guess we can just run/try again.

yota9 added a comment.Apr 5 2022, 2:22 PM

@rafaelauler I've tried couple of times, but the result seems to be the same...

Do you have an environment that is the same as the one used in pre-merge checks? (Ubuntu, I guess). That might be important to repro the timeout.

yota9 added a comment.Apr 5 2022, 2:58 PM

@rafaelauler Right now I've got only ubuntu aarch64 :( I will try to check it on x86 ubuntu locally soon, although I don't really see the reason for such a strange behaviour..

yota9 updated this revision to Diff 421269.Apr 7 2022, 10:26 AM

nit
@rafauler It was my fault, not all tests should be OK, sorry for bothering :)

yota9 added a comment.EditedApr 7 2022, 2:41 PM

@rafauler I've checked the biggest binary I have. As for the time after couple of launches I see 380-390 sec before patch and 390-400sec after patch, which I assume is just a negligible difference. As for the memory the I've got 38279868kb before and 38348116kb after (+68mb or +0.2%) which is small amount too and matches yours results. So I assume it's a reasonable change with an acceptable overhead :)

I suspect the regression in time might be because we are emitting too many labels unnecessarily. We are emitting a label for every relocation, right? Maybe the best approach would be to create a special class for DynamicRelocation and only that will have extra fields to store a label, and then change our emitter to emit a label only for those objects (as opposed to every static reloc), and then later in updateOutputValues fetch that label? I think this would avoid the memory/time regression, unless I'm missing something.

@rafauler Well, the time regression is about couple of %, even before patch I got up to 390sec, and 390+ sec after. The memory regression is negligible. It is possible to do what you're suggesting, but it looks to me like extra complication that personally me would like to avoid. I really don't think it is worse it, but it is discussible question

yota9 updated this revision to Diff 422477.Apr 13 2022, 4:49 AM

@rafauler I've decided to add extra arg for relocation emission, it is easy to add and easy to revert in case we won't need it anymore

I've been discussing this diff with the team and in general people are not only concerned about the memory/time regression, but it feels like a hacky approach. We need proper support for dynamic relocations in BOLT, and in doing so, the code should become easier to understand and maintain. What's your take? Do you think that you could design things in a way to make these concerns go away? If you would like to, we can set up a time to discuss my suggestion over VC or text chat, if it's not clear.

yota9 added a comment.Apr 13 2022, 4:35 PM

@rafaelauler As I understand we will need to keep dynamic relocations in binary function/data like we do it for static relocation and emit labels in the right places. Let's discuss it in discord in the nearest time if you have any suggestion. I don't really like the binary function part of the code too (initially I've made the review NFC). As for this review I suggest to separate the container type refactoring and to think about the dynamic relocations emission separately, what do you think?

I think we are on the same page. I'm fine with separating the refactoring in two separate diffs too.

yota9 updated this revision to Diff 423069.Apr 15 2022, 6:01 AM

Remove all changes but refactoring

yota9 retitled this revision from [BOLT] Fix dynamic relocation offset for constant islands to [BOLT] Replace relocations container type.Apr 15 2022, 6:01 AM
yota9 edited the summary of this revision. (Show Details)

This looks good to me, but I don't think it makes sense to commit this yet unless we have a real use case for this (perhaps in a follow-up diff).