MutableRange provides an interface to effectively make a const container mutable without copying every element or modifying the const container in any way.
This will eventually be used in D64281.
abrachet on Jul 9 2019, 6:11 PM.Authored by
Nice work so far. I think it would be more natural to me if the "main" template argument was that of the underlying container, rather than the iterator. This shouldn't affect the functionality, if I've followed it correctly.
I've got a number of other comments, but don't want to overload things too much in the first instance.
Moved files from Support to ADT. Removed base class because it wasn't adding anything. Changed enable_if to specialize on the container type, between AssociativeContainer and SequenceContainer's. Added doxygen comments for all public methods. Added collect() method to return the UpdatableRange as the same type of the original container. Updated unit test to be more comprehensive and use test fixtures. Addressed review comments.
If you haven't already, it's worth looking at the other ADT classes like SmallVector to make sure your code matches the same style etc used there.
I think the name "MutableRange" might be more consistent with naming schemes within LLVM.
Overall, I quite like this. I think it's at the point where it's worth getting a wider audience by adding some more people as reviewers.
You probably need a data structure to process symbol table/section header table/program header table. To that end, this patch adds an overlay data structure (MutableRange as the name made me puzzled) to make operations more efficient. Have you benchmarked whether the librarified llvm-objcopy will benefit from a fancy data structure like this? I'm thinking plain vectors (as is) may just meet the needs.
If you just want to make interleaved insertion/deletion efficient:
Note you pay the O(n) cost for each operation. If you group them (online algorithm->offline) by insertion/deletion:
a) batched insertion: 1,3,5: O(n)
You pay the O(n) cost just twice.
If this is going to be promoted to ADT, I think it should be justified by a second use case.
The idea here was the llvm-objcopy needs to reference the original sections/symbols etc and that such a data structure could be used to keep both the original and update or add new entries.
Not yet. But this can keep the original entries without needing to make a copy of every single one, only the ones that get modified.
So like cache insertion/deletion and then execute them whenever the data structure is read? Can this be done with std::transform efficiently?
Not married to the name by any means :) Just called such because it creates a mutable layer over an immutable data structure.
Using a overlay data structure for section header table is not necessary. There are not many sections. ~50 is a cap for most applications, so O(N) isn't slow at all. In lld, orphan placement algorithm is quadratic in terms of the number of output sections. This isn't an performance issue at all. I believe ld.bfd has a similar complexity.
Speaking of the dynamic symbol table and the static symbol table. Some shared objects have an awful number of .dynsym entries, e.g. my libclangAST.so has 5000+ .dynsym entries and 7000+ .symtab entries. We do should keep an eye on its performance. IMHO as long as we don't make it O(|syms|^2) it should be fine. It is currently O(|syms| * constant).
After making llvm-objcopy a library, we may do a bunch of operations on it. As I think about it, the operations may look like:
Objcopy o; o.addSymbol(); o.removeSymbol(); // x100 o.addSymbol(); // x100 o.addSection(); // x100 o.removeSection(); // x100 o.commit(); // actually perform the operations. Make sure at this point, aggregate the previous operations and do addSymbol() in batch, removeSymbol() in batch, addSection() in batch, and removeSection() in batch.
The time complexity will be just O(|syms|*|types of operations|) instead of O(|syms|*|operations|).
Not true: llvm-objcopy can be easily used on object files, which can have tens of thousands of sections, possibly more, in real-world cases that I work with, due to -ffunction-sections/-fdata-sections, COMDATs etc. Symbols are also sometimes in the order of hundreds of thousands at least, especially in linked applications. In other words, we need to be careful about the performance trade-offs we make (though linear is probably fine if anything simpler is more complex).
I'll come back to this tomorrow. Ran out of time to look more today.
I think you're trying to do too much in this patch. Having an idea of how it could be extended is very different from going ahead and implementing it, and I really don't see a need to add support for all the types you are currently supporting. Certainly, I think this is too big to review as is.
I know I suggested bringing it across here, but @MaskRay is probably right in that we probably need another concrete use case for this code to put it into ADT. I think it would be wise to scale it back to what is needed for the MutableObject patch series, sorry.
Rather than completely throwing this all away, maybe you should create a new patch that just keeps it tightly focused to what is needed (perhaps similar to what you had before), referencing this patch, and then abandoning this patch. Then, if someone in the future feels there's a need for a more general solution, this patch can be resurrected.