Synchronize binary function and binary section relocations container
type in order to be able to change relocation struct fields inplace.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
bolt/include/bolt/Core/Relocation.h | ||
---|---|---|
58 ↗ | (On Diff #416800) | Can you add description of this function. |
bolt/include/bolt/Core/Relocation.h | ||
---|---|---|
58 ↗ | (On Diff #416800) | Sure will do |
Thanks!
bolt/include/bolt/Core/BinarySection.h | ||
---|---|---|
290 | Here you remove the accessors to the non-const version of relocs, but I suspect we will need them to avoid using the mutable hack. | |
bolt/include/bolt/Core/Relocation.h | ||
53 ↗ | (On Diff #418236) | Why mutable? Except when storing mutexes, usually mutable indicates a problem with the class that should be solved with a redesign (removing const where appropriate). Is there a way to use non-const Relocation objects? |
bolt/include/bolt/Core/BinarySection.h | ||
---|---|---|
290 | The problem is deeper as I understand. I'm not c++ expert, but the relocations are stored in std::set<Relocation, std::less<>>container, which does not give the in container edit abilities due to the std::less sorting. The mutable fields are hacky, but AFAIU such fields are not not participated in the container sorting mechanisms and though might be edited in place. Correct me if I wrong, I really don't like c++ some times like this :) Maybe you can suggest some of the alternatives to change the container type for example :) |
We can change BinarySection's RelocationSetType to be a sorted std::vector instead, as per https://llvm.org/docs/ProgrammersManual.html#a-sorted-vector
We would need to insert the relocations in any order, and then sort this vector and remove duplicates. Although I'm not convinced that we need to keep relocations in a sorted order or to remove duplicates. @maksfb What do you think?
@rafauler After some thinking I've desided to replace std::set with std::map. The relocations are identified by the offset field,
so it is good-enough solution that gives us fast search abilities and any inplace manipulations of the relocations struct.
BTW the std::map is used as a container for the function relocations, so now we've got "synced up" them :)
I think the strategy looks good. I've been looking at this at the memory profiler, and it doesn't increase memory utilization by much by adding these extra fields to the Relocation class because after BinaryFunction::disassembly(), we call clearList(Relocations), which frees the memory used by relocation objects that are in the text sections. Relocations associated with data sections still use 0.4% of all memory allocation. The increase in memory utilization is from 48B to 72B (50% increase for this class), which should translate to an overall 0.2% increase in memory utilization for one of our large binaries.
I'm concerned that by emitting so many labels in the data sections, we could slow down the emitter. Could you check by how much time BOLT is slowing down after this patch, in a typical large binary? Just so we are aware and tracking memory and time regressions.
Tagging @maksfb to be aware of the memory regression.
Hello @rafaelauler . I will try to measure the results later, but I see that something going on with the buildbots, some of the tests are failed with 60 sec timeout. Since now I'm able to run x86 test on arm I've tested it and everything went momentally, I don't see the reason why the tests are failing. Do you have any idea? Thank you!
Do you have an environment that is the same as the one used in pre-merge checks? (Ubuntu, I guess). That might be important to repro the timeout.
@rafaelauler Right now I've got only ubuntu aarch64 :( I will try to check it on x86 ubuntu locally soon, although I don't really see the reason for such a strange behaviour..
@rafauler I've checked the biggest binary I have. As for the time after couple of launches I see 380-390 sec before patch and 390-400sec after patch, which I assume is just a negligible difference. As for the memory the I've got 38279868kb before and 38348116kb after (+68mb or +0.2%) which is small amount too and matches yours results. So I assume it's a reasonable change with an acceptable overhead :)
I suspect the regression in time might be because we are emitting too many labels unnecessarily. We are emitting a label for every relocation, right? Maybe the best approach would be to create a special class for DynamicRelocation and only that will have extra fields to store a label, and then change our emitter to emit a label only for those objects (as opposed to every static reloc), and then later in updateOutputValues fetch that label? I think this would avoid the memory/time regression, unless I'm missing something.
@rafauler Well, the time regression is about couple of %, even before patch I got up to 390sec, and 390+ sec after. The memory regression is negligible. It is possible to do what you're suggesting, but it looks to me like extra complication that personally me would like to avoid. I really don't think it is worse it, but it is discussible question
@rafauler I've decided to add extra arg for relocation emission, it is easy to add and easy to revert in case we won't need it anymore
I've been discussing this diff with the team and in general people are not only concerned about the memory/time regression, but it feels like a hacky approach. We need proper support for dynamic relocations in BOLT, and in doing so, the code should become easier to understand and maintain. What's your take? Do you think that you could design things in a way to make these concerns go away? If you would like to, we can set up a time to discuss my suggestion over VC or text chat, if it's not clear.
@rafaelauler As I understand we will need to keep dynamic relocations in binary function/data like we do it for static relocation and emit labels in the right places. Let's discuss it in discord in the nearest time if you have any suggestion. I don't really like the binary function part of the code too (initially I've made the review NFC). As for this review I suggest to separate the container type refactoring and to think about the dynamic relocations emission separately, what do you think?
I think we are on the same page. I'm fine with separating the refactoring in two separate diffs too.
This looks good to me, but I don't think it makes sense to commit this yet unless we have a real use case for this (perhaps in a follow-up diff).
Here you remove the accessors to the non-const version of relocs, but I suspect we will need them to avoid using the mutable hack.